Determining Document Similarity Using Traditional Computational Methods and Crowd Collaboration (Určování podobnosti dokumentů s použitím tradičních výpočetních metod a spolupráce davu)
Publisher
Vysoká škola báňská - Technická univerzita Ostrava
Abstract
This master's thesis deals with the categorization of text documents and its improvement through crowdsourcing. Its goal is to design and implement a prototype text document classifier based on document similarity, and to design a way of evaluating and improving the categorization using crowdsourcing. An N-gram algorithm, implemented in Java, was chosen for the categorization. A crowdsourcing interface was then created using the WordPress CMS. In addition to collecting data, the interface is used to evaluate categorization accuracy; this extends the classifier's test data set, which in turn makes the categorization more successful. Both parts of the thesis should serve as a basis for a planned project between the University of Ostrava and VŠB - Technical University of Ostrava.
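As an illustration of categorization by N-gram similarity, the following is a minimal sketch only. The abstract does not say whether the thesis uses character or word N-grams, or which similarity measure it applies, so this sketch assumes character trigrams compared by cosine similarity. The class name `NGramSimilarity` and its methods are hypothetical and are not taken from the thesis.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: character n-gram profiles compared by cosine similarity.
class NGramSimilarity {

    // Count overlapping character n-grams in lower-cased, whitespace-normalized text.
    static Map<String, Integer> profile(String text, int n) {
        String normalized = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= normalized.length(); i++) {
            counts.merge(normalized.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Euclidean length of an n-gram count vector.
    static double norm(Map<String, Integer> m) {
        double sum = 0.0;
        for (int v : m.values()) {
            sum += (double) v * v;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity of two n-gram count vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int mine = e.getValue();
            int other = b.getOrDefault(e.getKey(), 0);
            dot += (double) mine * other;
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Return the label of the category whose sample text is most similar to the document.
    static String classify(String document, Map<String, String> categorySamples, int n) {
        Map<String, Integer> docProfile = profile(document, n);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, String> e : categorySamples.entrySet()) {
            double score = cosine(docProfile, profile(e.getValue(), n));
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy category samples, invented for illustration only.
        Map<String, String> samples = new HashMap<>();
        samples.put("sport", "football match goal player team league");
        samples.put("cooking", "recipe oven flour sugar bake butter");
        System.out.println(classify("the team scored a goal in the match", samples, 3));
    }
}
```

In a workflow like the one the abstract outlines, documents whose category has been confirmed through the crowdsourcing interface could be added to the per-category samples. How the thesis actually feeds that data back into the classifier is not stated here.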
Subject(s)
Categorization, text documents, natural language, document similarity, N-grams, crowdsourcing, WordPress, Java, PHP