Extrakce zpráv z webových stránek (Extraction of News from Web Pages)

Blanár, Štefan

Files

BLA0043_FEI_N2647_2612T025_2014.pdf (4.36 MB)

BLA0043_FEI_N2647_2612T025_2014_priloha.zip (8.08 MB)

BLA0043_FEI_N2647_2612T025_2014_posudek_vedouci_Kudelka_Milos.pdf (49.41 KB)

BLA0043_FEI_N2647_2612T025_2014_posudek_oponent_Horak_Zdenek.pdf (119.9 KB)

Downloads

59

Date issued

2014

Authors

Blanár, Štefan

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

The main goal of this diploma thesis is to conduct a broad survey of text mining methods, in particular the extraction of structured data from the web, specifically from HTML documents, which is a well-known problem. The results of this survey are summarized in the first part of the document. Next, I examine several web wrappers, looking for an existing wrapper that could serve as a solution for extracting news from the web. I also carry out an extensive survey of the best-known news portals and the news published on them. Finally, the acquired knowledge is used to develop my own solution to the problem of extracting news from web pages. I define what web news is and how it differs from information. I then test my solution under real conditions on well-known news portals. All results of this testing are presented in the last chapter of the thesis.

Description

Import 05/08/2014

Subject(s)

text mining, regular expression, extraction, news, Internet, Web, URL, method, algorithm, ReLIE, ONTEA, DOM, XML, HTML, XPath, MDF, TPC, NCSCA, TTR, wrapper, crawler, automatic wrapper, semi-automatic wrapper, keywords, scheme

Item identifier

http://hdl.handle.net/10084/103914

Collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI)
