
dc.contributor.author: Nguyen, Quoc-Dung
dc.contributor.author: Le, Duc-Anh
dc.contributor.author: Phan, Nguyet-Minh
dc.contributor.author: Zelinka, Ivan
dc.date.accessioned: 2021-01-29T08:25:50Z
dc.date.available: 2021-01-29T08:25:50Z
dc.date.issued: 2020
dc.identifier.citation: Pattern Analysis and Applications. 2020. [cs]
dc.identifier.issn: 1433-7541
dc.identifier.issn: 1433-755X
dc.identifier.uri: http://hdl.handle.net/10084/142603
dc.description.abstract: Optical character recognition (OCR) systems help to digitize paper-based historical archives. However, the poor quality of scanned documents and the limitations of text recognition techniques result in different kinds of errors in OCR outputs. Post-processing is an essential step in improving the output quality of OCR systems by detecting and correcting these errors. In this paper, we present an automatic model consisting of both error detection and error correction phases for OCR post-processing. We propose a novel approach to OCR post-processing error correction using correction pattern edits and an evolutionary algorithm, a class of methods mainly used for solving optimization problems. Our model adopts a variant of the self-organizing migrating algorithm along with a fitness function based on modifications of important linguistic features. We illustrate how to construct the table of correction pattern edits, which involves all types of edit operations and is learned directly from the training dataset. With efficient settings of the algorithm parameters, our model achieves high-quality candidate generation and error correction. The experimental results show that our proposed approach outperforms various baseline approaches when evaluated on the benchmark dataset of the ICDAR 2017 Post-OCR Text Correction competition. [cs]
dc.language.iso: en [cs]
dc.publisher: Springer Nature [cs]
dc.relation.ispartofseries: Pattern Analysis and Applications [cs]
dc.relation.uri: http://doi.org/10.1007/s10044-020-00936-y [cs]
dc.rights: Copyright © 2020, Springer-Verlag London Ltd., part of Springer Nature [cs]
dc.subject: OCR [cs]
dc.subject: n-grams [cs]
dc.subject: similarity [cs]
dc.subject: context [cs]
dc.subject: correction pattern [cs]
dc.subject: evolutionary algorithm [cs]
dc.title: OCR error correction using correction patterns and self-organizing migrating algorithm [cs]
dc.type: article [cs]
dc.identifier.doi: 10.1007/s10044-020-00936-y
dc.type.status: Peer-reviewed [cs]
dc.description.source: Web of Science [cs]
dc.identifier.wos: 000591971700001


Files in this item


There are no files associated with this item.

This item appears in the following collections
