
dc.contributor.author: Staš, Ján
dc.contributor.author: Zlacký, Daniel
dc.contributor.author: Hládek, Daniel
dc.contributor.author: Juhár, Jozef
dc.date: 2013
dc.date.accessioned: 2014-01-15T12:07:04Z
dc.date.available: 2014-01-15T12:07:04Z
dc.date.issued: 2013
dc.identifier.citation: Advances in electrical and electronic engineering. 2013, vol. 11, no. 5, p. 398-403 : ill. [cs]
dc.identifier.issn: 1804-3119
dc.identifier.issn: 1336-1376
dc.identifier.uri: http://hdl.handle.net/10084/101403
dc.description.abstract: This paper describes the categorization of unorganized text data gathered from the Internet into in-domain and out-of-domain data for better domain-specific language modeling and speech recognition. An algorithm for text categorization and topic detection based on the most frequent key phrases is presented. In this scheme, each document entering the categorization process is represented by a vector space model with term weighting based on term frequency and inverse document frequency. Text documents are then automatically classified as in-domain or out-of-domain by comparing them to the list of key phrases using one of the selected distance/similarity measures and a predefined threshold. Experimental results for language modeling and adaptation to the judicial domain show a significant improvement in model perplexity of about 19% and a decrease in the word error rate of the Slovak transcription and dictation system of about 5.54% relative. [cs]
dc.format.extent: 277132 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en [cs]
dc.publisher: Vysoká škola báňská - Technická univerzita Ostrava [cs]
dc.relation.ispartofseries: Advances in electrical and electronic engineering [cs]
dc.relation.uri: http://advances.utc.sk/index.php/AEEE/article/download/897/898 [cs]
dc.rights: © Vysoká škola báňská - Technická univerzita Ostrava
dc.rights: Creative Commons Attribution 3.0 Unported (CC BY 3.0)
dc.subject: language modeling [cs]
dc.subject: large vocabulary continuous speech recognition [cs]
dc.subject: similarity measure [cs]
dc.subject: term weighting [cs]
dc.subject: text categorization [cs]
dc.subject: topic detection [cs]
dc.title: Categorization of unorganized text corpora for better domain-specific language modeling [cs]
dc.type: article [cs]
dc.rights.access: openAccess
dc.type.version: publishedVersion [cs]
dc.type.status: Peer-reviewed [cs]
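
The categorization scheme summarized in the abstract (TF-IDF term weighting, a similarity measure against a key-phrase list, and a threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: whitespace tokenization into single words, a smoothed IDF, cosine similarity as the chosen measure, the threshold value 0.1, and all function names.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the paper works with key phrases.
    return text.lower().split()


def idf_table(tokenized_docs):
    # Smoothed inverse document frequency, so terms found in every document
    # still receive a non-zero weight.
    n = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tf_idf(tokens, idf):
    # Sparse vector {term: weight}; terms unseen in the corpus get weight 0.
    tf = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    # Cosine similarity of two sparse vectors; 0.0 if either has zero norm.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def categorize(docs, key_phrases, threshold=0.1):
    # Label each document by its similarity to the key-phrase list.
    # The threshold is a hypothetical value, not one reported in the paper.
    tokenized = [tokenize(d) for d in docs]
    idf = idf_table(tokenized)
    key_vec = tf_idf(tokenize(" ".join(key_phrases)), idf)
    labels = []
    for tokens in tokenized:
        sim = cosine(tf_idf(tokens, idf), key_vec)
        labels.append("in-domain" if sim >= threshold else "out-of-domain")
    return labels


docs = [
    "the court issued a judgment in the appeal",
    "the football match ended in a draw",
]
labels = categorize(docs, ["court judgment", "appeal"])
```

In this toy run the first document shares terms with the key phrases and the second shares none, so only the first clears the threshold. In practice the threshold and the similarity measure would have to be tuned on held-out data.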


Files in this item

This item appears in the following collections

  • AEEE. 2013, vol. 11 [58]
  • OpenAIRE [5085]
    Collection intended for harvesting by the OpenAIRE infrastructure; it contains openly accessible publications and, where applicable, other publications resulting from projects of the European Commission framework programmes (FP7, H2020, Horizon Europe).
