Unsupervised spelling correction for Slovak

Authors

Hládek, Daniel
Staš, Ján
Juhár, Jozef

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

This paper introduces a method that automatically proposes and selects a correction for each incorrectly written word in a large text corpus written in Slovak. The task can be described as finding the sequence of correct words that best matches the incorrectly spelled words found in the input. The knowledge base of the classification system, which consists of statistics about sequences of correctly typed words and possible corrections for incorrectly typed words, can be described mathematically as a hidden Markov model. The best-matching sequence of correct words is found using the Viterbi algorithm. The system will be evaluated on a manually corrected test set.
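
To make the decoding step concrete, the following is a minimal sketch of Viterbi search over a hidden Markov model in which the hidden states are candidate correct words and the observations are the words as typed. It is an illustration under stated assumptions, not the authors' implementation: the function name `viterbi_correct`, the candidate lists, the start symbol `<s>`, the `floor` value for unseen events, and all probabilities are hypothetical toy values chosen for the example. Transition scores stand in for statistics about sequences of correct words, and emission scores stand in for the likelihood that a correct word was typed as the observed form.

```python
import math

def viterbi_correct(observed, candidates, bigram_logp, emit_logp, floor=-20.0):
    """Return the most probable sequence of correct words for `observed`.

    observed    : list of typed words (possibly misspelled)
    candidates  : dict typed word -> list of candidate correct words
    bigram_logp : dict (previous word, word) -> log P(word | previous word)
    emit_logp   : dict (typed word, correct word) -> log P(typed | correct)
    floor       : log-probability assumed for unseen events (hypothetical)
    """
    # best[i][w] = (log score of the best path ending in w at position i,
    #               previous word on that path)
    best = [{}]
    first = observed[0]
    for w in candidates[first]:
        score = bigram_logp.get(("<s>", w), floor) + emit_logp.get((first, w), floor)
        best[0][w] = (score, None)

    for i in range(1, len(observed)):
        typed = observed[i]
        layer = {}
        for w in candidates[typed]:
            # Choose the predecessor that maximises path score plus transition.
            prev, score = max(
                ((p, s + bigram_logp.get((p, w), floor))
                 for p, (s, _) in best[i - 1].items()),
                key=lambda pair: pair[1],
            )
            layer[w] = (score + emit_logp.get((typed, w), floor), prev)
        best.append(layer)

    # Backtrace from the best final state.
    word = max(best[-1], key=lambda w: best[-1][w][0])
    path = [word]
    for i in range(len(observed) - 1, 0, -1):
        word = best[i][word][1]
        path.append(word)
    return path[::-1]


# Toy data (hypothetical numbers): input typed without diacritics.
observed = ["mam", "rad", "knihy"]
candidates = {"mam": ["mám", "mam"], "rad": ["rád", "rad"], "knihy": ["knihy"]}
bigram_logp = {
    ("<s>", "mám"): math.log(0.1),   ("<s>", "mam"): math.log(0.001),
    ("mám", "rád"): math.log(0.2),   ("mám", "rad"): math.log(0.01),
    ("mam", "rád"): math.log(0.001), ("mam", "rad"): math.log(0.001),
    ("rád", "knihy"): math.log(0.05), ("rad", "knihy"): math.log(0.01),
}
emit_logp = {
    ("mam", "mám"): math.log(0.3), ("mam", "mam"): math.log(0.7),
    ("rad", "rád"): math.log(0.3), ("rad", "rad"): math.log(0.7),
    ("knihy", "knihy"): math.log(1.0),
}

corrected = viterbi_correct(observed, candidates, bigram_logp, emit_logp)
print(corrected)  # ['mám', 'rád', 'knihy']
```

In this toy setup the unchanged forms "mam" and "rad" have the higher emission scores on their own, but the word-sequence statistics outweigh them, so the decoder selects the forms with diacritics. Working in log-probabilities avoids numerical underflow on long inputs.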

Subject(s)

automatic spelling correction, hidden Markov model, natural language processing

Citation

Advances in electrical and electronic engineering. 2013, vol. 11, no. 5, p. 392-397 : ill.