HyperQueue: Efficient and ergonomic task graphs on HPC clusters

dc.contributor.author: Beránek, Jakub
dc.contributor.author: Böhm, Ada
dc.contributor.author: Palermo, Gianluca
dc.contributor.author: Martinovič, Jan
dc.contributor.author: Jansík, Branislav
dc.date.accessioned: 2025-04-02T07:39:05Z
dc.date.available: 2025-04-02T07:39:05Z
dc.date.issued: 2024
dc.description.abstract: Task graphs are a popular method for defining complex scientific simulations and experiments that run on distributed and HPC (high-performance computing) clusters, because they allow their authors to focus on the problem domain instead of low-level communication between nodes, and also enable quick prototyping. However, executing task graphs on HPC clusters can be problematic in the presence of allocation managers like PBS or Slurm, which are not designed for executing a large number of potentially short-lived tasks with dependencies. To make task graph execution on HPC clusters more efficient and ergonomic, we have created HyperQueue, an open-source task graph execution runtime tailored for HPC use cases. It enables the execution of large task graphs on top of an allocation manager by aggregating tasks into a smaller number of PBS/Slurm allocations and dynamically load-balancing tasks among all available nodes. It can also automatically submit allocations on behalf of the user, supports arbitrary task resource requirements and heterogeneous HPC clusters, is trivial to deploy, and does not require elevated privileges.
dc.description.firstpage: art. no. 101814
dc.description.source: Web of Science
dc.description.volume: 27
dc.identifier.citation: SoftwareX. 2024, vol. 27, art. no. 101814.
dc.identifier.doi: 10.1016/j.softx.2024.101814
dc.identifier.issn: 2352-7110
dc.identifier.uri: http://hdl.handle.net/10084/155847
dc.identifier.wos: 001267389800001
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.ispartofseries: SoftwareX
dc.relation.uri: https://doi.org/10.1016/j.softx.2024.101814
dc.rights: © 2024 The Authors. Published by Elsevier B.V.
dc.rights.access: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: distributed computing
dc.subject: task scheduling
dc.subject: high performance computing
dc.subject: job manager
dc.title: HyperQueue: Efficient and ergonomic task graphs on HPC clusters
dc.type: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion

Files

Original bundle

Name: 2352-7110-2024v27an101814.pdf
Size: 641.01 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 718 B
Format: Item-specific license agreed upon to submission