Parallel Data Processing and Data Analytics Options in Big Data (original title: Paralelní zpracování dat a možnosti datové analytiky v rámci Big Data)

Derján, Lukáš

Files

DER0007_EKF_N6209_6209T025_2015.pdf (2.86 MB)

DER0007_EKF_N6209_6209T025_2015_posudek_vedouci_Tvrdikova_Milena.pdf (485.39 KB)

DER0007_EKF_N6209_6209T025_2015_posudek_oponent_Dragolov_Daniel.pdf (704.15 KB)

Downloads

92

Date issued

2015

Authors

Derján, Lukáš

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

This diploma thesis analyses how high-volume unstructured datasets, known as Big Data, are handled and processed. It describes the architecture of Big Data-oriented solutions and compares it with the traditional architecture of Business Intelligence (BI) solutions. Traditional BI tools and solutions are still not technologically ready to process Big Data, which has led to the emergence of new approaches to parallel data processing and of new Big Data-oriented technologies. Data analytics plays an important role in Big Data: with relevant analyses, organizations can learn more about their customers, uncover hidden relationships in data, and increase their profits and customer loyalty. One platform that is technologically ready for processing and analysing Big Data is Apache Hadoop. The platform is described in the theoretical part, which explains the terms Big Data and parallel data processing, and in the practical part, where it is used for analytical processing of a pre-selected data file. The basic features of the MapReduce programming framework and the HDFS distributed file system, which together form the Hadoop implementation, are explained accordingly. In terms of applicability, the real outcome is the implementation of analytical tasks according to customer requirements. In social terms, the growing number of analytical platforms deployed on top of existing BI solutions in organizations, together with the ever-increasing volume of publicly available data, is a potentially problematic area that will sooner or later run up against the limits of personal privacy. The practical part of the thesis is based on project requirements from a client company. The project examines the suitability of the Hadoop Big Data platform for running analytical tasks over relatively small datasets.
To verify this suitability, n-gram analysis was applied to the selected data file. The MapReduce framework and the in-memory solutions Spark and TEZ were used as processing engines within the Hadoop platform. The conclusions of the thesis served as input for further decisions on building the Big Data architecture within the organization and on evaluating the necessary transformation of the existing BI solution for the Hadoop platform.

Description

Import 22/07/2015

Subject(s)

Big Data, Apache Hadoop, Data analysis, Parallel data processing, Business Intelligence, n-gram analysis, in-memory solutions, Hive, MapReduce, Spark, TEZ

Item identifier

http://hdl.handle.net/10084/107001

Collections

Vysokoškolské kvalifikační práce Ekonomické fakulty / Theses and dissertations of Faculty of Economics (EKF)
