Keyword Detection in Scholarly Articles (Detekce klíčových slov v odborných článcích)
Authors
Blažek, Ondřej
Publisher
Vysoká škola báňská - Technická univerzita Ostrava
Abstract
The subject of this thesis is a typical task in the field of text mining: detecting keywords in documents, which can be used, for example, to sort documents into categories.
The theoretical part is divided into two parts. The first introduces and explains the basic concepts of the field; in essence, it covers how to represent documents properly in a vector space.
The second part surveys existing methods for determining document categories and for detecting keywords based on the categories into which the documents are merged.
An important part of the work is my own implementation, whose steps are described in detail. These include, for example, creating a vector that represents a document and clustering a set of documents into a given number of categories based on their similarity. The clustering serves as a categorization tool; the keywords of each category are then detected by frequency analysis.
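The steps named in the abstract (vector representation, similarity-based clustering into a given number of categories, frequency analysis per category) can be illustrated with a minimal sketch. The abstract does not say which term weighting, similarity measure, or clustering algorithm the thesis uses, so this example assumes raw term frequencies, cosine similarity, and a simple k-means-style loop seeded with the first k documents. All function names and the sample documents are hypothetical.

```python
from collections import Counter
import math


def vectorize(text, vocabulary):
    """Represent a document as a term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors, k, iterations=10):
    """Assumed clustering step: k-means-style loop using cosine similarity.

    Centroids are seeded with the first k vectors (an arbitrary choice
    made here only to keep the sketch deterministic).
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def category_keywords(docs, labels, k, top_n=2):
    """Frequency analysis: the most frequent words within each category."""
    keywords = []
    for c in range(k):
        counts = Counter(
            word
            for doc, lab in zip(docs, labels)
            if lab == c
            for word in doc.lower().split()
        )
        keywords.append([word for word, _ in counts.most_common(top_n)])
    return keywords


# Hypothetical toy corpus, not taken from the thesis.
docs = [
    "neural network training neural network",
    "protein folding protein structure",
    "network training data",
    "protein structure prediction",
]
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [vectorize(doc, vocabulary) for doc in docs]
labels = cluster(vectors, k=2)
keywords = category_keywords(docs, labels, k=2)
print(labels, keywords)
```

A realistic pipeline would also remove stop words and would likely weight terms (for example with TF-IDF) before clustering; those refinements are left out here to keep the three steps visible.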
Description
Import 26/06/2013
Subject(s)
categorization, thematization, text mining, key words