Processing and Categorization of Natural Language Texts (Zpracování a kategorizace textů v přirozeném jazyce)
Publisher
Vysoká škola báňská - Technická univerzita Ostrava
Abstract
This work addresses the processing and categorization of natural language text. Its specific aim was to develop a program that processes texts in Czech and English and then analyzes them. After the candidate implementation languages were considered, Python was selected, and its Scrapy library was used to extract data from the Internet. Texts are lemmatized with the Majka library. After learning from the supplied datasets, the program can compare several text categorization algorithms and assign new data to the given categories. The program also implements grouping of texts, which allows categorization without initial datasets.
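To make the comparison step concrete, the following is a minimal sketch of training two text categorization algorithms on a labeled dataset and comparing their accuracy on held-out texts. It is not the implementation from the thesis: the abstract does not name the algorithms it compares, so the two shown here (multinomial Naive Bayes and a nearest-centroid classifier), the toy dataset, and every class and function name are assumptions chosen for illustration. The whitespace tokenizer stands in for the lemmatization step, which the thesis performs with Majka.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (placeholder for real lemmatization)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # per-category term counts
        self.doc_counts = Counter(labels)        # documents per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class NearestCentroid:
    """Picks the category whose summed term-count vector is most cosine-similar."""

    def fit(self, texts, labels):
        self.centroids = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.centroids[label].update(tokenize(text))
        return self

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def predict(self, text):
        vec = Counter(tokenize(text))
        return max(sorted(self.centroids),
                   key=lambda label: self._cosine(vec, self.centroids[label]))


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted category matches the true one."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Toy dataset, invented for this sketch only.
train_texts = [
    "the team won the football match",
    "the player scored a goal in the match",
    "the government passed a new tax law",
    "parliament voted on the election law",
]
train_labels = ["sport", "sport", "politics", "politics"]
test_texts = ["the player won the match", "parliament passed the tax law"]
test_labels = ["sport", "politics"]

results = {}
for name, model in [("naive_bayes", NaiveBayes()),
                    ("nearest_centroid", NearestCentroid())]:
    model.fit(train_texts, train_labels)
    results[name] = accuracy(model, test_texts, test_labels)
    print(name, results[name])
```

Both classifiers expose the same `fit`/`predict` interface, so further algorithms can be added to the comparison loop without changing the evaluation code. A dataset this small says nothing about which algorithm is better in practice; it only illustrates the shape of the comparison.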
Subject(s)
Text categorization, Scrapy, Python, Majka, web crawler, machine learning