Distributed data structures for massively parallel data processing

Authors

Nedbálek, Aleš

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

The growing trend of processing large amounts of data leads to distributing the load among multiple nodes and to the creation of scalable distributed data structures (SDDS). Distributing the data allows parallel processing and higher throughput, and duplicating data between nodes can ensure availability in the event of a failure. These properties are necessary for applications that emphasize availability and serve a large number of clients. In this work we present a summary of individual SDDS, describing how each one distributes and decomposes data between nodes. By their underlying concept, these structures can be divided into those based on linear hashing and those based on tree data structures. Following the design rules proposed for SDDS, we created our own SDDS concept, including our own solution for decomposing and distributing the data. The whole concept is implemented in C++. We originally intended to take method-call serialization and communication from publicly available API libraries, but then decided on our own implementation. We designed and implemented a remote method call mechanism using two commands, Command and ResultSet, and tested communication over both TCP and UDP. R-tree and B-tree data structures were supplied for testing. The implementation also raised many problems (serialized access, the network environment, threads), which we addressed with different solutions backed by tests. The result is a multi-threaded server application and a client able to use various data structures. The application has found real use in the SGS project on plagiarism detection in documents, where a web client written in ASP.NET provides access to it. Tests of network communication revealed bandwidth limitations in a real network. Finally, we tested the SDDS against embedded solutions for the B-tree and R-tree. Unfortunately, the tests were affected by the virtualized environment and a lack of hardware resources.
We did not achieve the expected throughput when scaling data replication. Despite these difficulties, the results are interesting. For inserts, throughput decreased as data replication increased. For point queries, throughput grew in proportion to the number of data replicas, and range-query throughput was fairly close to that of the embedded solutions.

Description

Import 26/06/2013

Subject(s)

Linear hash, Tree data structures, Distributed data structures, Massively parallel data management, R-tree, B-tree