Ergonomie a efektivita workflow na HPC klastrech (Ergonomics and efficiency of workflows on HPC clusters)

Beránek, Jakub

Files

BER0134_USP_P2658_2612V078_2024.pdf (3.16 MB)

BER0134_USP_P2658_2612V078_2024_posudek_oponent_Cardoso_Joao_Manuel_Paiva.pdf (153.62 KB)

BER0134_USP_P2658_2612V078_2024_posudek_oponent_Dvorsky_Jiri.pdf (146.43 KB)

BER0134_USP_P2658_2612V078_2024_posudek_oponent_Toroslu_Ismail_Hakki.pdf (141.22 KB)

Date issued

2024

Authors

Beránek, Jakub

Publisher

Vysoká škola báňská – Technická univerzita Ostrava

Abstract

This thesis deals with the execution of task graphs on High-performance Computing (HPC) clusters (supercomputers), with a focus on efficient usage of hardware resources and ergonomic interfaces for task graph submission. Task-based programming is a popular approach for defining scientific workflows that can be computed on distributed clusters. However, executing task graphs on supercomputers introduces unique challenges, such as performance issues caused by the large scale of HPC workflows or cumbersome interactions with HPC allocation managers like PBS (Portable Batch System) or Slurm. This work examines the main challenges in this area and how they affect task graph execution, and it proposes various approaches for alleviating these challenges, both in terms of efficiency and developer ergonomics.

This thesis provides three main contributions. Firstly, it provides a task graph simulation environment that enables prototyping and benchmarking of various task scheduling algorithms, and performs a comprehensive study of the performance of various task schedulers using this environment. Secondly, it analyzes the bottlenecks and overall performance of the state-of-the-art task runtime Dask and provides an implementation of an alternative Dask server which significantly improves its performance in HPC use-cases. Thirdly, and as its primary contribution, it introduces a unified meta-scheduling and resource management design for effortless execution of task graphs on heterogeneous HPC clusters that facilitates efficient usage of hardware resources. It also provides a reference implementation of this design within an HPC-tailored task runtime called HyperQueue, which is available as open-source software under the MIT (Massachusetts Institute of Technology) license at https://github.com/it4innovations/hyperqueue.

Subject(s)

distributed computing, task graphs, heterogeneous resources, high-performance computing

Item identifier

http://hdl.handle.net/10084/155691

Collections

Vysokoškolské kvalifikační práce univerzitních studijních programů / Theses and dissertations of University Study Programmes
