D3.2 : Data Preprocessing Toolbox

dc.contributor.author: Gurbanov, Huseyn
dc.contributor.author: Chaudhary, Smaran
dc.contributor.author: Hachinger, Stephan
dc.contributor.author: Öztürk, Aslı Umay
dc.contributor.author: Hurych, David
dc.contributor.author: Gutta, Kiran Babu
dc.contributor.referee: Foltýn, Ladislav
dc.contributor.referee: Ventsenostseva, Zhanna
dc.date.accessioned: 2026-02-03T11:31:44Z
dc.date.available: 2026-02-03T11:31:44Z
dc.date.issued: 2025-10-31
dc.description.abstract: This deliverable introduces the “Toolboxes” module of the EXA4MIND Platform, which extends and encompasses the idea of a Data Preprocessing Toolbox developed within Work Package 3 (WP3, Extreme Data Analytics and Processing) of the EXA4MIND project. The Preprocessing Toolbox contains a set of generic and application-specific preprocessing tools that enable data cleaning, transformation, fusion, and harmonisation across heterogeneous data sources. Processing tools focusing on validation have been moved into a separate Validation Toolbox submodule, and an Analytics & Artificial Intelligence (AI) Toolbox is being compiled as well. With a uniform approach to command-line interfacing and curated code, the Toolboxes provide examples that users can use or adapt for their own Extreme Data applications. Because such applications are too individual to be covered by a reasonably limited set of tools, the idea of the Toolboxes (Preprocessing, Validation, and Analytics & AI) is to enable users to construct their own processing steps within advanced data-driven workflows. Preprocessing steps in particular ensure the consistency and quality of input data, and thus lay the foundation for effective querying and analytics services across the Extreme Data platform. The Preprocessing Toolbox has been designed in close alignment with WP2 (Data Spaces Management), enabling seamless integration between distributed data spaces and advanced analytics capabilities. In particular, it is easy to employ the Toolboxes and individual tools on data from different data sources, constructing data-driven workflows within an instance of the EXA4MIND Advanced Query and Indexing System (AQIS, from WP3). By including a fairness-check tool and validation mechanisms, the Toolboxes support the overall project objective of enabling trustworthy, green, and fair AI. This document complements the open-source code repositories, which are the main part of the deliverable. It briefly presents the motivation and design principles, and gives pointers to the actual implementations and usage instructions.
dc.description.placeofpublication: Ostrava
dc.format: 25 leaves : illustrations
dc.identifier.uri: http://hdl.handle.net/10084/158254
dc.language.iso: en
dc.rights: Attribution-NoDerivatives 4.0 International (language: en)
dc.rights.access: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nd/4.0/
dc.subject: Data Preprocessing
dc.subject: Data Validation
dc.subject: Data Analytics
dc.subject: Artificial Intelligence
dc.subject: EXA4MIND Toolboxes
dc.subject: Extreme Data
dc.title: D3.2 : Data Preprocessing Toolbox
dc.type: report
dc.type.version: submittedVersion
local.files.count: 1
local.files.size: 3202398
local.has.files: yes

Files

Original bundle

Name: EXA4MIND_D3.2_DataPreprocessingToolbox.pdf
Size: 3.05 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 718 B
Format: Item-specific license agreed upon to submission