Keyword Detection in Scholarly Articles (Detekce klíčových slov v odborných článcích)

Authors

Blažek, Ondřej

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

This thesis addresses a typical task in the field of text mining: detecting keywords in documents, which can be used, for example, to sort documents into categories. The theoretical part is divided into two sections. The first introduces the basic concepts of the field, essentially how to represent documents properly in a vector space. The second surveys existing methods for determining document categories and for detecting keywords on the basis of those categories. An important part of the work is the author's own implementation, which describes the steps of the process. These include creating a vector that represents each document and clustering a set of documents into a given number of categories based on their similarity. The clustering serves as the categorization tool; the keywords of each category are then detected by frequency analysis.
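
The steps named in the abstract (vector representation, clustering by similarity, frequency analysis per category) can be illustrated with a minimal sketch. It assumes raw term counts as the vector representation, cosine similarity as the similarity measure, and a simple k-means-style loop seeded with the first k documents; the abstract does not state which weighting scheme, similarity measure, or clustering algorithm the thesis actually uses. All function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text, vocabulary):
    """Represent a document as raw term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors, k, iterations=10):
    """Assign each vector to one of k clusters (k-means-style, cosine similarity).

    Assumption for illustration: the first k vectors seed the centroids,
    which keeps the result deterministic but is not a robust initialization.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for each vector.
        labels = [
            max(range(k), key=lambda c: cosine_similarity(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def category_keywords(documents, labels, k, top_n=3):
    """Frequency analysis: the top_n most frequent terms in each cluster."""
    keywords = []
    for c in range(k):
        counts = Counter()
        for doc, label in zip(documents, labels):
            if label == c:
                counts.update(doc.lower().split())
        keywords.append([term for term, _ in counts.most_common(top_n)])
    return keywords


# Hypothetical sample documents, two about each of two topics.
documents = [
    "neural network training neural network",
    "protein gene protein sequence",
    "network training layers neural",
    "gene sequence protein expression",
]
vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
vectors = [tf_vector(doc, vocabulary) for doc in documents]
labels = cluster(vectors, k=2)
keywords = category_keywords(documents, labels, k=2)
```

In practice a stop-word filter and a weighting such as TF-IDF would usually precede the clustering step; both are omitted here to keep the sketch short.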

Description

Import 26/06/2013

Subject(s)

categorization, thematization, text mining, key words

Citation