Data management for distributed computational workflows: An iRODS-based setup and its performance

dc.contributor.author: Hayek, Mohamad
dc.contributor.author: Golasowski, Martin
dc.contributor.author: Hachinger, Stephan
dc.contributor.author: García-Hernández, Ruben J.
dc.contributor.author: Munke, Johannes
dc.contributor.author: Lindner, Gabriel
dc.contributor.author: Slaninová, Kateřina
dc.contributor.author: Tunka, Philipp
dc.contributor.author: Vondrák, Vít
dc.contributor.author: Kranzlmüller, Dieter
dc.contributor.author: Martinovič, Jan
dc.date.accessioned: 2026-05-04T10:16:40Z
dc.date.available: 2026-05-04T10:16:40Z
dc.date.issued: 2026
dc.description.abstract: Modern data-management frameworks promise a flexible and efficient management of data and metadata across storage backends. However, such claims need to be put to a meaningful test in daily practice. We conjecture that such frameworks should be fit to construct a data backend for workflows which use geographically distributed high-performance and cloud computing systems. Cross-site data transfers within such a backend should largely saturate network bandwidth, in particular when parameters such as buffer sizes are optimized. To explore this further, we evaluate the "integrated Rule-Oriented Data System" iRODS with EUDAT's B2SAFE module as data backend for the "Distributed Data Infrastructure" within the LEXIS Platform for complex computing workflow orchestration and distributed data management. The focus of our study is on testing our conjectures, i.e., on construction and assessment of the data infrastructure and on measurements of data-transfer performance over the wide-area network between two selected supercomputing sites connected to LEXIS. We analyze limitations and identify optimization opportunities. Efficient utilization of the available network bandwidth is possible and depends on suitable client configuration and file size. Our work shows that systems such as iRODS nowadays fit the requirements for integration in federated computing infrastructures involving web-based authentication flows with OpenID Connect and rich on-line services. We are continuing to exploit these properties in the EXA4MIND project, where we aim at optimizing data-heavy workflows, integrating various systems for managing structured and unstructured data.
dc.description.firstpage: art. no. e0340757
dc.description.issue: 1
dc.description.source: Web of Science
dc.description.volume: 21
dc.identifier.citation: PLOS One. 2026, vol. 21, issue 1, art. no. e0340757.
dc.identifier.doi: 10.1371/journal.pone.0340757
dc.identifier.issn: 1932-6203
dc.identifier.uri: http://hdl.handle.net/10084/158553
dc.identifier.wos: 001660620300010
dc.language.iso: en
dc.publisher: PLOS
dc.relation.ispartofseries: PLOS One
dc.relation.uri: https://doi.org/10.1371/journal.pone.0340757
dc.rights: © 2026 Mohamad Hayek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
dc.rights.access: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Data management for distributed computational workflows: An iRODS-based setup and its performance
dc.type: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
local.files.count: 1
local.files.size: 3352125
local.has.files: yes

Files

Original bundle

Name: 1932-6203-2026v21i1ane0340757.pdf
Size: 3.2 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 718 B
Format: Item-specific license agreed to upon submission