Context: when processing a large flow of documents, a lot of time goes to routine work, and processing is hard to automate because a single document type comes in many different forms. Manual markup takes a long time.
Solution: we created a document recognition system based on our own OCR and Text Detection models.
Results:
The solution is integrated into the customer's business process:
- Reduced the load on specialists by 60%;
- Increased the speed of document processing in the business process by 10%;
- Reduced costs caused by lost documents by 5%.
Built solutions for automatic recognition of:
- Receipts;
- Invoices;
- Order-out;
- Contract agreements.
To build the models, we used:
- A marked-up document template;
- The reference fields of the document with their characteristics;
- A set of document images.
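The first two inputs listed above (a marked-up template and reference fields with their characteristics) could be represented roughly as follows. This is only a minimal sketch: the names `FieldSpec` and `DocumentTemplate`, the relative bounding boxes, and the use of a regular expression as a field "characteristic" are all assumptions made for illustration, not the schema used in the project.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldSpec:
    """One reference field of a document template (hypothetical schema)."""
    name: str
    # Expected location (x0, y0, x1, y1), relative to the page size (0..1).
    bbox: tuple[float, float, float, float]
    # Assumed "characteristic": a regex that a valid value must fully match.
    pattern: str = r".+"
    required: bool = True


@dataclass
class DocumentTemplate:
    """A marked-up template: a document type plus its reference fields."""
    doc_type: str
    fields: list[FieldSpec] = field(default_factory=list)

    def validate(self, values: dict[str, str]) -> dict[str, bool]:
        """Check recognized values against each field's characteristics."""
        report = {}
        for spec in self.fields:
            value = values.get(spec.name, "").strip()
            if not value:
                # A missing value is acceptable only for optional fields.
                report[spec.name] = not spec.required
            else:
                report[spec.name] = re.fullmatch(spec.pattern, value) is not None
        return report


# Illustrative template; the field names and patterns are made up.
invoice = DocumentTemplate(
    doc_type="invoice",
    fields=[
        FieldSpec("number", (0.60, 0.05, 0.95, 0.10), r"[A-Z0-9\-]+"),
        FieldSpec("date", (0.60, 0.10, 0.95, 0.15), r"\d{2}\.\d{2}\.\d{4}"),
        FieldSpec("comment", (0.05, 0.80, 0.95, 0.95), required=False),
    ],
)
```

Calling `invoice.validate(...)` on the values returned by recognition gives a per-field pass/fail report, which is one possible way to route only the failing documents to manual review.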
Resulting models:
- OCR model for character recognition;
- Text Detection model for locating text blocks;
- Table Detection model for recognizing tables;
- Model for building an electronic version of the document.
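One way the four models listed above might be chained is sketched below. The function `recognize_document`, its callable parameters, and the JSON layout of the electronic document are hypothetical; the real models are TensorFlow networks, while here they are replaced by stand-in functions so that the sketch runs without trained weights.

```python
import json
from dataclasses import dataclass
from typing import Callable

Box = tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class TextBlock:
    box: Box
    text: str


def contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies fully inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def recognize_document(
    image: object,
    detect_text: Callable[[object], list[Box]],    # Text Detection model
    read_text: Callable[[object, Box], str],       # OCR model
    detect_tables: Callable[[object], list[Box]],  # Table Detection model
) -> str:
    """Chain the models and return the electronic document as JSON."""
    blocks = [TextBlock(box, read_text(image, box)) for box in detect_text(image)]
    tables = detect_tables(image)
    # Assumed reading order: top to bottom, then left to right.
    blocks.sort(key=lambda b: (b.box[1], b.box[0]))
    document = {
        # Free text: blocks that fall outside every detected table.
        "text": [b.text for b in blocks
                 if not any(contains(t, b.box) for t in tables)],
        # One list of cell texts per detected table.
        "tables": [[b.text for b in blocks if contains(t, b.box)]
                   for t in tables],
    }
    return json.dumps(document, ensure_ascii=False)


# Stand-in "page": a mapping from text boxes to their content.
fake_page = {
    (10, 10, 200, 30): "Invoice INV-001",
    (20, 110, 90, 130): "Item",
    (100, 110, 180, 130): "Price",
}
result = recognize_document(
    fake_page,
    detect_text=lambda img: list(img),
    read_text=lambda img, box: img[box],
    detect_tables=lambda img: [(0, 100, 300, 200)],
)
```

Passing the models in as callables keeps the assembly step independent of how each model is implemented, so any stage can be swapped or tested in isolation.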
Customer industries: Telecom, Finance.
Technology stack: TensorFlow, Python, Flask.