Extraction of News from Web Pages
Downloads
12
Authors
Blanár, Štefan
Publisher
Vysoká škola báňská - Technická univerzita Ostrava
Abstract
The main goal of this diploma thesis is to carry out a broad survey of text mining methods, in particular the mining of structured data from the web, specifically from HTML documents, which is a well-known problem. The results of this survey are summarized in the first part of the thesis. Next, I examine several web wrappers, looking in particular for an existing wrapper that could serve as a solution for extracting news from the web. I also carry out an extensive survey of the best-known news portals and the news published on them. Finally, the acquired knowledge is used to develop my own solution to the problem of extracting news from web pages. I define what web news is and how it differs from other information. I then test my solution under real conditions on real, well-known news portals. All results of this testing are presented in the last chapter of the thesis.
Description
Import 05/08/2014
Subject(s)
text mining, regular expression, extraction, news, Internet, Web, URL, method, algorithm, ReLIE, ONTEA, DOM, XML, HTML, XPath, MDF, TPC, NCSCA, TTR, wrapper, crawler, automatic wrapper, semi-automatic wrapper, keywords, scheme