Success Story

Multilingual topic model for article search

Client's motivation: the client wanted to extend its product with a new feature, the ability to find translations of a scientific article across the most widely used languages.

Starting point: Antiplagiat had no functionality for finding translations of scientific articles; the task was to add it.

Project goals: to build a topic model that solves two problems: semantic search for translations of scientific articles, and classification of scientific articles by scientific heading.

MIL Team's solution: the team's experience in topic modelling and microservice architecture made it possible to build a service that finds translations of scientific articles and assigns scientific headings to them. The service can be launched in a virtual machine.
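
The idea behind translation search with a topic model can be sketched as follows: if articles in different languages are mapped into one shared topic space, a translation is expected to have a topic distribution close to that of the original, so candidates can be ranked by vector similarity. This is only an illustrative sketch, not the team's actual implementation. The function names, the document ids and the three-topic vectors are hypothetical, and in a real system the vectors would come from a trained topic model rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_translations(query_vec, index, top_k=3):
    """Rank indexed documents by topic-vector similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical topic distributions over 3 shared topics (assumed values).
index = {
    "ru_article_1": [0.80, 0.15, 0.05],
    "de_article_7": [0.10, 0.10, 0.80],
    "fr_article_3": [0.05, 0.90, 0.05],
}

# Hypothetical topic vector of an English article whose translation we seek.
query = [0.75, 0.20, 0.05]

results = find_translations(query, index, top_k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With these made-up vectors, the document whose topic profile is closest to the query is ranked first. A production service would also need a similarity threshold to decide whether any candidate is actually a translation.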

Data used to build the model:
  • A parallel corpus of scientific articles from the library website;
  • A parallel corpus of Wikipedia articles in 100 languages;
  • Labels linking articles to scientific headings in different rubricators (UDC, OECD).

Project results:
  • a topic model of scientific headings;
  • a virtual machine on which the model can run.

Client: Antiplagiat
Technology stack: gRPC, Python, scikit-learn, BigARTM
Natural Language Processing Research Division