Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Authors: Filipp Nikitin, Vladimir Dokholyan, Ilya Zharikov, Vadim Strijov
Abstract: With the increasing popularity of document analysis and recognition systems, text detection (TD) and text binarization (TB) in document images remain challenging tasks. In this paper, we introduce a two-step architecture for the TD task. First, a U-net based model predicts a text mask composed of word-level bounding boxes. Second, we approximate this mask with rectangles using a classic computer vision method. The model achieves state-of-the-art results on document images and outperforms other popular approaches. Moreover, we introduce the Hybrid U-net architecture, which solves the TB and TD tasks simultaneously. The model performs well on both tasks. Its shared convolutional encoder reduces the number of parameters and the memory consumption compared to separate models, without degrading performance.
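The abstract does not name the "classic computer vision method" used in the second step. A minimal sketch, assuming that method is connected-component labeling followed by one axis-aligned bounding rectangle per component, might look like the code below. The threshold of 0.5, the `min_area` filter, and the name `mask_to_boxes` are hypothetical choices for illustration only. This is not the authors' implementation. In practice a library routine such as OpenCV's `connectedComponentsWithStats` would usually replace the hand-written flood fill.

```python
from collections import deque
from typing import List, Tuple

# (x_min, y_min, x_max, y_max), inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def mask_to_boxes(prob_map: List[List[float]],
                  threshold: float = 0.5,   # hypothetical binarization threshold
                  min_area: int = 1) -> List[Box]:  # hypothetical noise filter
    """Binarize a per-pixel text probability map (e.g. a U-net output) and
    return one axis-aligned rectangle per 4-connected foreground component."""
    h = len(prob_map)
    w = len(prob_map[0]) if h else 0
    fg = [[prob_map[y][x] >= threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y0 in range(h):
        for x0 in range(w):
            if not fg[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill of one connected component,
            # tracking its extent and pixel count.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop tiny components, which are likely noise rather than words.
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy probability map containing two separate "words".
prob = [
    [0.9, 0.8, 0.0, 0.0, 0.0],
    [0.7, 0.9, 0.0, 0.6, 0.7],
    [0.0, 0.0, 0.0, 0.8, 0.9],
]
print(mask_to_boxes(prob))  # [(0, 0, 1, 1), (3, 1, 4, 2)]
```

Axis-aligned rectangles are the simplest choice for horizontal document text. If rotated boxes were needed, each component's pixels could instead be fitted with a minimum-area rotated rectangle.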
Link: U-Net Based Architectures for Document Text Detection and Binarization