An efficient unsupervised approach for OCR error correction of Vietnamese OCR text
Publisher
IEEE
Abstract
Different types of OCR errors often occur in OCR texts due to the low quality of scanned
document images or limitations in OCR software. In this paper, we propose a novel unsupervised approach
for OCR error correction. Correction candidates for OCR errors are generated and explored in their
neighborhoods using correction character edits controlled by an adapted hill-climbing algorithm. Correction
characters are extracted only from the original ground truth texts, so they do not depend on the OCR texts in the training
data. A weighted objective function used to score and rank correction candidates is heuristically tested to
find optimal weight combinations. The proposed model is evaluated on an OCR text dataset originating from
the Vietnamese handwritten database in the ICFHR 2018 Vietnamese online handwritten text recognition
competition. We also assess the stability and complexity of the proposed model. The experimental
results show that our model achieves competitive performance compared to the other models in the ICFHR
2018 competition.
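
The search idea described in the abstract (character edits explored in a candidate's neighborhood, guided by hill climbing and a weighted score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy ground truth string, the two score terms (word frequency and string similarity), the weights `w_freq` and `w_sim`, and all function names are assumptions made purely for illustration.

```python
import unicodedata
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy "ground truth" text; the paper's data are not reproduced here.
GROUND_TRUTH = unicodedata.normalize("NFC", "xin chào việt nam xin cảm ơn việt nam")

# Word frequencies and the set of correction characters both come from the
# ground truth only; no OCR output is needed to build them.
WORD_FREQ = Counter(GROUND_TRUTH.split())
MAX_FREQ = max(WORD_FREQ.values())
ALPHABET = sorted(set(GROUND_TRUTH) - {" "})


def neighbors(word):
    """Return all strings one character edit away from `word`."""
    out = set()
    for i in range(len(word) + 1):
        for c in ALPHABET:
            out.add(word[:i] + c + word[i:])  # insertion
        if i < len(word):
            out.add(word[:i] + word[i + 1:])  # deletion
            for c in ALPHABET:
                out.add(word[:i] + c + word[i + 1:])  # substitution
    out.discard(word)
    out.discard("")
    return sorted(out)  # sorted so that tie-breaking is deterministic


def score(candidate, ocr_word, w_freq=0.7, w_sim=0.3):
    """Weighted objective (assumed form): frequency term plus similarity term."""
    freq = WORD_FREQ.get(candidate, 0) / MAX_FREQ
    sim = SequenceMatcher(None, candidate, ocr_word).ratio()
    return w_freq * freq + w_sim * sim


def hill_climb(ocr_word, max_steps=5):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    ocr_word = unicodedata.normalize("NFC", ocr_word)
    current, current_score = ocr_word, score(ocr_word, ocr_word)
    for _ in range(max_steps):
        best = max(neighbors(current), key=lambda c: score(c, ocr_word))
        best_score = score(best, ocr_word)
        if best_score <= current_score:
            break  # local optimum reached
        current, current_score = best, best_score
    return current


if __name__ == "__main__":
    for token in ["nan", "xim", "nam"]:
        print(token, "->", hill_climb(token))
```

Because this toy score rewards only exact lexicon matches, it mainly resolves errors that are a single edit away; a smoother objective would be needed for the climb to cross longer edit paths. Trying different values of `w_freq` and `w_sim` on held-out tokens would mirror, in a very simplified way, the heuristic weight search mentioned in the abstract.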
Subject(s)
OCR, character edit, error correction, attention-based encoder-decoder, hill climbing
Citation
IEEE Access. 2023, vol. 11, p. 58406-58421.