Zpracování velkých objemů nestrukturovaných dat na platformě Hadoop (Processing Large Volumes of Unstructured Data on the Hadoop Platform)

Prouza, Martin

Files

PRO0099_FEI_B2647_2612R025_2015.pdf (3.01 MB)

PRO0099_FEI_B2647_2612R025_2015_priloha.zip (51.73 MB)

PRO0099_FEI_B2647_2612R025_2015_posudek_vedouci_Skacelik_Jiri.pdf (572.93 KB)

PRO0099_FEI_B2647_2612R025_2015_posudek_oponent_Baca_Radim.pdf (54.13 KB)

Downloads

49

Date issued

2015

Authors

Prouza, Martin

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

This thesis deals with processing unstructured text on the Hadoop platform. The first part focuses on the reasons behind the emergence of the Big Data concept. I explain the data problem of today and show why common database systems are unsuitable for working with huge amounts of unstructured data. The next part covers the theory behind the Big Data concept and data processing on the Hadoop platform. I introduce the Hadoop architecture and how it differs from a conventional data warehouse database. I also explain the theory of parallel data processing and how parallel processing is handled on the Hadoop platform. The last, practical part addresses the processing of huge amounts of unstructured server logs on the Hadoop platform: how these data can be analyzed and how valuable information can be extracted from them. The results are presented in the QlikView reporting program. The processing was also carried out in a classic SQL database in order to assess the contribution of the Hadoop platform to processing unstructured data.

Description

Import 22/07/2015

Subject(s)

Distributed file system, Hadoop, HDFS, MapReduce, NoSQL, parallel data processing, QlikView, unstructured data, ngram, Hive

Item identifier

http://hdl.handle.net/10084/108911

Collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI)
