Tool for Distributed Data Processing (Nástroj pro distribuované zpracování dat)

Abstract

The goal of this bachelor thesis is to create a tool for planning and managing distributed computations on a server used for processing optical mapping data. The server holds terabytes of data spread across hundreds of thousands of files, so manual processing is very time-consuming. The main contribution of this work is a tool consisting of two parts. The first part, the configurator, helps the user organize input data using filters, assign the selected data to existing calculation scripts, and set parameters for parallel processing. The second part, the execution module, schedules tasks on the server's resources, manages parallel computations, and monitors processing progress. The thesis also analyzes the specific requirements of processing optical mapping data and presents a solution that makes it possible to work efficiently with large numbers of files.
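The two-part split described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `select_inputs` and `run_tasks`, the `*.dat` filter, the file names, and the stand-in calculation are all hypothetical, and a thread pool is assumed here only as one simple way to run tasks in parallel and report progress.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from fnmatch import fnmatch


def select_inputs(paths, pattern):
    """Configurator step (hypothetical): keep paths whose file name matches the filter."""
    return [p for p in paths if fnmatch(p.rsplit("/", 1)[-1], pattern)]


def run_tasks(inputs, task, max_workers=4):
    """Execution step (hypothetical): run `task` on each input in parallel, printing progress."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the input path it processes.
        futures = {pool.submit(task, path): path for path in inputs}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            print(f"{done}/{len(inputs)} tasks finished")
    return results


def fake_calculation(path):
    # Stand-in for an existing calculation script; returns the path length.
    return len(path)


# Hypothetical file names, used only for illustration.
all_files = ["run1/sample_a.dat", "run1/notes.txt", "run2/sample_b.dat"]
selected = select_inputs(all_files, "*.dat")
results = run_tasks(selected, fake_calculation, max_workers=2)
```

Keeping selection and execution as separate functions mirrors the configurator and execution-module split: the filter result can be inspected or saved before any computation starts.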

Subject(s)

distributed data processing, optical mapping, bachelor's thesis
