Recognition of fixed-structure documents

Description: Efficient document management with a minimum of errors requires storing electronic versions of documents and quickly digitizing printed ones. To improve the quality and speed of this process, we propose automatically recognizing the key information in images of a printed document and using it to create an electronic document template. The solution uses neural network algorithms to recognize information in images.

Context: When processing a large flow of documents, a lot of time is spent on routine work, and processing is difficult to automate because a single document type can come in many different forms. Manual markup takes a long time.

Solution: We created a document recognition system based on our own OCR and Text Detection models.

Results:
The solution is integrated into the customer's business process:

Load on specialists reduced by 60%;
Document processing speed in the business process increased by 10%;
Costs due to lost documents reduced by 5%.

We built solutions for automatic recognition of the following document types:

Receipts;
Invoices;
Order-out;
Contract agreement.

To build the model, we used:

The marked-up document template;
Document reference fields with their characteristics;
A set of document images.
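
As an illustration of how a marked-up template and its reference fields might be represented, here is a minimal Python sketch. The class names, the relative-coordinate convention, and the `kind` characteristic are all assumptions made for this example; the source does not describe the actual template format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ReferenceField:
    """One reference field of a document template (hypothetical schema)."""
    name: str
    # Box as (left, top, right, bottom), in fractions of page width/height.
    rel_box: Tuple[float, float, float, float]
    # Assumed field characteristic, e.g. "text", "date", "amount".
    kind: str = "text"

    def pixel_box(self, width: int, height: int) -> Tuple[int, int, int, int]:
        """Scale the relative box to pixel coordinates of a given image."""
        left, top, right, bottom = self.rel_box
        return (round(left * width), round(top * height),
                round(right * width), round(bottom * height))


@dataclass
class DocumentTemplate:
    """A document type together with its marked-up reference fields."""
    doc_type: str
    fields: List[ReferenceField] = field(default_factory=list)


# Made-up example values, for illustration only.
invoice = DocumentTemplate(
    doc_type="invoice",
    fields=[
        ReferenceField("number", (0.60, 0.05, 0.95, 0.10)),
        ReferenceField("date", (0.60, 0.10, 0.95, 0.15), kind="date"),
    ],
)

# Field boxes for a 1000 x 2000 pixel scan of the document.
boxes = {f.name: f.pixel_box(1000, 2000) for f in invoice.fields}
```

Storing boxes as fractions of the page size is one way to let the same template apply to scans of different resolutions.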

Resulting models:

OCR model for extracting characters;
Text Detection model for locating text blocks;
Table Detection model for recognizing tables;
Model for building an electronic version of the document.
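
The way such models could be chained is sketched below in Python: text blocks are detected, each block is passed to OCR, and the recognized text is assigned to the template field it overlaps most. This is only a sketch under stated assumptions. The function names, the overlap-based matching rule, and the stand-in models are hypothetical, table detection is left out for brevity, and the real system's TensorFlow models are replaced by plain functions so that the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def build_document(
    image: object,
    template_fields: Dict[str, Box],
    detect_text: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], str],
) -> Dict[str, str]:
    """Detect text blocks, OCR each one, and fill the template fields."""
    document = {name: "" for name in template_fields}
    for block in detect_text(image):
        # Pick the template field that this block overlaps most.
        name = max(template_fields,
                   key=lambda n: overlap_area(block, template_fields[n]))
        if overlap_area(block, template_fields[name]) == 0:
            continue  # the block lies outside every reference field
        text = recognize(image, block)
        document[name] = (document[name] + " " + text).strip()
    return document


# Stand-in "models" with made-up outputs (assumption: the real ones are
# neural networks that take an image and return boxes or text).
FAKE_BLOCKS: Dict[Box, str] = {
    (610, 105, 700, 145): "INV-001",
    (610, 160, 720, 195): "2024-01-15",
    (10, 900, 50, 920): "page 1",
}


def fake_detect(image: object) -> List[Box]:
    return list(FAKE_BLOCKS)


def fake_ocr(image: object, box: Box) -> str:
    return FAKE_BLOCKS[box]


fields = {"number": (600, 100, 950, 150), "date": (600, 150, 950, 200)}
result = build_document(None, fields, fake_detect, fake_ocr)
```

Passing the detection and OCR steps in as callables keeps the assembly logic independent of any particular model, so the stand-ins can be swapped for real inference functions without changing `build_document`.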

Customer: Telecom, Finance

Technology stack: TensorFlow, Python, Flask.