Distributed Data Structures for Massively Parallel Data Processing (Distribuované datové struktury pro masivně paralelní zpracování dat)
Authors
Nedbálek, Aleš
Publisher
Vysoká škola báňská - Technická univerzita Ostrava
Abstract
The growing trend of processing large amounts of data leads to distributing the load
among multiple nodes and to the creation of scalable distributed data structures (SDDS).
Distributing the data allows parallel processing and increases throughput, and replicating
data across nodes can ensure availability when a node fails. These properties are essential
for applications that emphasize availability and serve a large number of clients.
In this work we summarize the individual SDDS and describe how each of them distributes
and decomposes data among nodes. By their underlying concept, these structures fall into
two groups: those based on linear hashing and those based on tree data structures.
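To illustrate the linear-hashing family, the sketch below shows the standard linear-hashing address rule that LH*-style structures build on. A key c is first hashed with h_i(c) = c mod (N * 2^i). If the resulting bucket lies below the split pointer n, that bucket has already been split, so the key is rehashed with h_{i+1}. The `LhState` struct and its field names are hypothetical, and this is not presented as the thesis's own implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state of a linear-hashing file (assumed for illustration).
struct LhState {
    std::uint64_t n0;    // initial number of buckets N (must be >= 1)
    unsigned level;      // current file level i
    std::uint64_t split; // split pointer n, 0 <= n < N * 2^i
};

// h_i(c) = c mod (N * 2^i)
std::uint64_t lh_hash(std::uint64_t key, std::uint64_t n0, unsigned level) {
    return key % (n0 << level);
}

// Linear-hashing address rule: buckets below the split pointer were already
// split at this level, so keys that map there are rehashed with h_{i+1}.
std::uint64_t lh_address(std::uint64_t key, const LhState& s) {
    std::uint64_t a = lh_hash(key, s.n0, s.level);
    if (a < s.split) {
        a = lh_hash(key, s.n0, s.level + 1);
    }
    return a;
}
```

With N = 1, i = 2 and n = 1, the file has five buckets (0 to 4). Key 4 first maps to bucket 0, which has already been split, so the key is rehashed into bucket 4. Key 5 stays in bucket 1. In a distributed LH* setting, each bucket would live on a separate server. Clients may hold an outdated (i, n) image, which is why such structures also need a forwarding and image-adjustment step that is not shown here.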
We formulated design rules and followed them to create our own SDDS concept, including
our own solution for decomposing and distributing the data. The whole concept is
implemented in C++. We originally intended to take the serialization of method calls and the
network communication from publicly available API libraries, but we then decided on our
own implementation. We designed and implemented a mechanism for remote method
calls using two objects, Command and ResultSet, and tested the communication over the TCP and
UDP protocols. R-tree and B-tree data structures were supplied for testing. The implementation
also raised many problems (serialized access, the network environment, threads), for which
we tried and tested several different solutions.
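The abstract says that remote method calls are built on two objects, Command and ResultSet, and that they use a custom serializer rather than a third-party library. It does not describe the wire format, so the following is only a minimal sketch under assumed conventions:

- each string is written as a 4-byte little-endian length followed by its bytes;
- a Command carries a method name and string arguments;
- a ResultSet carries a status code and string rows.

All field names, the `wire` helpers and the encoding itself are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wire format: u32 values are 4 little-endian bytes,
// strings are a u32 length followed by the raw bytes.
namespace wire {

void put_u32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

std::uint32_t get_u32(const std::string& in, std::size_t& pos) {
    if (pos + 4 > in.size()) throw std::runtime_error("truncated message");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
    pos += 4;
    return v;
}

void put_str(std::string& out, const std::string& s) {
    put_u32(out, static_cast<std::uint32_t>(s.size()));
    out += s;
}

std::string get_str(const std::string& in, std::size_t& pos) {
    std::uint32_t len = get_u32(in, pos);
    if (pos + len > in.size()) throw std::runtime_error("truncated message");
    std::string s = in.substr(pos, len);
    pos += len;
    return s;
}

}  // namespace wire

// Request sent by the client: name of the remote method plus its arguments.
struct Command {
    std::string method;
    std::vector<std::string> args;

    std::string serialize() const {
        std::string out;
        wire::put_str(out, method);
        wire::put_u32(out, static_cast<std::uint32_t>(args.size()));
        for (const auto& a : args) wire::put_str(out, a);
        return out;
    }

    static Command deserialize(const std::string& in) {
        std::size_t pos = 0;
        Command c;
        c.method = wire::get_str(in, pos);
        std::uint32_t n = wire::get_u32(in, pos);
        for (std::uint32_t i = 0; i < n; ++i) c.args.push_back(wire::get_str(in, pos));
        return c;
    }
};

// Reply sent by the server: a status code and the rows produced by the call.
struct ResultSet {
    std::uint32_t status = 0;  // assumed convention: 0 means success
    std::vector<std::string> rows;

    std::string serialize() const {
        std::string out;
        wire::put_u32(out, status);
        wire::put_u32(out, static_cast<std::uint32_t>(rows.size()));
        for (const auto& r : rows) wire::put_str(out, r);
        return out;
    }

    static ResultSet deserialize(const std::string& in) {
        std::size_t pos = 0;
        ResultSet rs;
        rs.status = wire::get_u32(in, pos);
        std::uint32_t n = wire::get_u32(in, pos);
        for (std::uint32_t i = 0; i < n; ++i) rs.rows.push_back(wire::get_str(in, pos));
        return rs;
    }
};
```

Length prefixes make the encoding independent of the transport. Over TCP, a receiver could read an outer length and then exactly that many bytes to frame one message in the byte stream. Over UDP, one serialized message would fit into a single datagram. The explicit byte order also keeps the format independent of the host's endianness.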
The result of the implementation is a multi-threaded server application and a client that
can use various data structures. The application found practical use in the SGS project
Detection of Plagiarism in Documents, where it is accessed through a web client written in
ASP.NET. Tests of the network communication revealed bandwidth constraints in
a real network.
Finally, we tested the SDDS against embedded solutions for the B-tree and R-tree.
Unfortunately, the tests were limited by the virtualized environment and a lack of hardware
resources, and we did not achieve the expected throughput when scaling data replication.
Despite these difficulties, the results are interesting. For inserts, throughput decreases
as data replication increases. For point queries, throughput grows proportionally with the
number of data replicas, and for range queries it comes quite close to the throughput of the
embedded solutions.
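A simple routing model is consistent with the opposite trends for inserts and point queries. An insert must be applied to every replica of a partition, so its cost grows with the replication factor r. A point query needs only one copy, so reads can be spread over r nodes. The abstract does not say how replicas are selected. The class name `ReplicaRouter`, the node numbering and the round-robin read policy below are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: partition p is stored on nodes
// p * replicas + k for k = 0 .. replicas - 1.
class ReplicaRouter {
public:
    ReplicaRouter(std::size_t partitions, std::size_t replicas)
        : partitions_(partitions), replicas_(replicas) {
        if (partitions == 0 || replicas == 0) throw std::invalid_argument("empty layout");
    }

    std::size_t partition_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % partitions_;
    }

    // Inserts go to every replica: r messages per insert.
    std::vector<std::size_t> write_targets(const std::string& key) const {
        std::size_t p = partition_of(key);
        std::vector<std::size_t> nodes;
        for (std::size_t k = 0; k < replicas_; ++k) nodes.push_back(p * replicas_ + k);
        return nodes;
    }

    // Point queries go to one replica, chosen round-robin so that read load
    // is spread over all r copies (atomic counter: safe across threads).
    std::size_t read_target(const std::string& key) {
        std::size_t p = partition_of(key);
        std::size_t k = next_.fetch_add(1, std::memory_order_relaxed) % replicas_;
        return p * replicas_ + k;
    }

private:
    std::size_t partitions_;
    std::size_t replicas_;
    std::atomic<std::size_t> next_{0};
};
```

Under this model, doubling r roughly doubles the work per insert while doubling the number of nodes able to answer a given point query. This matches the direction of the reported trends, though the actual measurements also depended on the test environment.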
Description
Import 26/06/2013
Subject(s)
Linear hash, Tree data structures, Distributed data structures, Massively parallel data management, R-tree, B-tree