Zpracování a kategorizace textů v přirozeném jazyce (Processing and Categorization of Natural Language Texts)

Kubica, Jan

Files

KUB0506_FEI_B2647_2612R025_2020.pdf (2.42 MB)

KUB0506_FEI_B2647_2612R025_2020_priloha.zip (5.85 MB)

KUB0506_FEI_B2647_2612R025_2020_posudek_vedouci_Saloun_Petr.pdf (55.22 KB)

KUB0506_FEI_B2647_2612R025_2020_posudek_oponent_Andresic_David.pdf (92.48 KB)

Downloads

48

Date issued

2020

Authors

Kubica, Jan

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

This thesis addresses the processing and categorization of natural-language text, and specifically the development of a program that processes texts in Czech and English and then analyzes them. Python was chosen as the implementation language after the alternatives were considered, and its Scrapy library is used to extract data from the Internet. Texts are lemmatized with the Majka library. After learning from the supplied datasets, the program can compare several text categorization algorithms and assign new data to the given categories. The program also implements grouping of texts, so that they can be categorized without initial datasets.

Subject(s)

Text categorization, Scrapy, Python, Majka, web crawler, machine learning

Item identifier

http://hdl.handle.net/10084/140472

Collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI)
