Sledování frekvence slov v internetových zpravodajských serverech

Činčala, Radoslav

Files

CIN020_FEI_N2647_2612T025_2012.pdf (8.69 MB)

CIN020_FEI_N2647_2612T025_2012_priloha.zip (7.65 MB)

CIN020_FEI_N2647_2612T025_2012_posudek_vedouci_Baca_Radim.pdf (49.82 KB)

CIN020_FEI_N2647_2612T025_2012_posudek_oponent_Kratky_Michal.pdf (52.6 KB)

Downloads

69

Date issued

2012

Authors

Činčala, Radoslav

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

This thesis deals with processing articles from public Czech news servers. Its output is the frequency of the most frequent words within a given time period or on a given news server. Article formats differ considerably from server to server, so automatically extracting the main body of an article is not easy. The work therefore focuses on data extraction methods that make it easy to add further news servers to the monitoring. The result is a robust tool for automatic data extraction from news server articles, together with a tool for quickly and easily adding news servers to the automatic monitoring and extraction. The extracted data are processed and stored in a database along with the frequencies of individual words and other related data, so that statistics can be obtained for different time intervals and different servers. The extraction output can be influenced by lists of stop words and equivalent words, which can be changed dynamically. A simple web interface provides access to the tool and allows efficient searching of word frequencies in a given time interval or on a given server.
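The counting step described in the abstract (stop words filtered out, equivalent words merged into one form) could look roughly like the Java sketch below. This is only an illustration under stated assumptions, not the thesis's actual implementation: the class name `WordFrequency`, the method `count`, the tokenization rule, and the sample data are all hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: counts word frequencies in already extracted article text.
public class WordFrequency {

    // stopWords: words to ignore; equivalents: maps a word form to its canonical form.
    // Both lists are assumed to be supplied by the caller (e.g. loaded from a database).
    static Map<String, Integer> count(String text,
                                      Set<String> stopWords,
                                      Map<String, String> equivalents) {
        Map<String, Integer> freq = new TreeMap<>();
        // Assumption: a token is any run of Unicode letters or digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue; // skip empty fragments and stop words
            }
            // Replace the token by its canonical form if one is listed.
            String canonical = equivalents.getOrDefault(token, token);
            freq.merge(canonical, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        String article = "The government approved the budget; governments criticise the budget.";
        Map<String, Integer> freq = count(
                article,
                Set.of("the"),
                Map.of("governments", "government"));
        freq.forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

Because the stop-word and equivalent-word lists are passed in as plain parameters, they can be swapped between runs without touching the counting code, which matches the abstract's remark that the lists can be changed dynamically.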

Description

Import 26/06/2013

Subject(s)

time, article, database, extraction, frequency, information, internet journalism, Java, HTML, lemmatization, RSS feed, word, news server, information retrieval

Item identifier

http://hdl.handle.net/10084/98628

Collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI)
