Sampling of Large-Scale Data (Vzorkování rozsáhlých dat)

Abstract

The aim of this diploma thesis is to analyze, describe, and implement sampling methods for vector and network data in the context of big data. The thesis begins with the theoretical foundations of big data. It then introduces the concepts of a vector, a vector space, and network data, which are represented as graphs. The next chapter covers data sampling from a statistical point of view. The last theoretical chapter presents the MPI standard for parallel processing in .NET. The following two chapters are more practical: they present existing sampling methods for vector and network data, together with showcase implementations. Finally, all algorithms are tested on real datasets and evaluated.

Description

Import 03/11/2016

Subject(s)

data sampling, big data, vector data, network data, C#, message passing interface, master's thesis

Citation