D3.2 : Data Preprocessing Toolbox
| dc.contributor.author | Gurbanov, Huseyn | |
| dc.contributor.author | Chaudhary, Smaran | |
| dc.contributor.author | Hachinger, Stephan | |
| dc.contributor.author | Öztürk, Aslı Umay | |
| dc.contributor.author | Hurych, David | |
| dc.contributor.author | Gutta, Kiran Babu | |
| dc.contributor.referee | Foltýn, Ladislav | |
| dc.contributor.referee | Ventsenostseva, Zhanna | |
| dc.date.accessioned | 2026-02-03T11:31:44Z | |
| dc.date.available | 2026-02-03T11:31:44Z | |
| dc.date.issued | 2025-10-31 | |
| dc.description.abstract | This deliverable introduces the “Toolboxes” module of the EXA4MIND Platform, which extends the idea of a Data Preprocessing Toolbox developed within Work Package 3 (WP3, Extreme Data Analytics and Processing) of the EXA4MIND project. The Preprocessing Toolbox contains a set of generic and application-specific preprocessing tools that enable data cleaning, transformation, fusion, and harmonisation across heterogeneous data sources. Processing tools focusing on validation have been split out into a separate Validation Toolbox submodule, and an Analytics & Artificial Intelligence (AI) Toolbox is being compiled as well. With a uniform approach to command-line interfacing and curated code, our Toolboxes provide examples that users can use or adapt for their own Extreme Data applications. Because such applications are too individual to be served by a reasonably limited set of tools, the idea of our Toolboxes (Preprocessing, Validation, and Analytics & AI) is to enable users to construct their own processing steps within advanced data-driven workflows. Preprocessing steps in particular ensure consistency and quality of input data, and thus lay the foundation for effective querying and analytics services across the Extreme Data platform. The Preprocessing Toolbox has been designed in close alignment with WP2 (Data Spaces Management), enabling seamless integration between distributed data spaces and advanced analytics capabilities. In particular, it is easy to employ the Toolboxes and individual tools on data from different data sources, constructing data-driven workflows within an instance of the EXA4MIND Advanced Query and Indexing System (AQIS, from WP3). By including a fairness-check tool and validation mechanisms, the Toolboxes support the overall project objective of enabling trustworthy, green, and fair AI. This document complements the open-source code repositories, which are the main part of the deliverable. It briefly presents motivation and design principles, and gives pointers to the actual implementations and usage instructions. | |
| dc.description.placeofpublication | Ostrava | |
| dc.format | 25 leaves : illustrations | |
| dc.identifier.uri | http://hdl.handle.net/10084/158254 | |
| dc.language.iso | en | |
| dc.rights | Attribution-NoDerivatives 4.0 International | en |
| dc.rights.access | openAccess | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nd/4.0/ | |
| dc.subject | Data Preprocessing | |
| dc.subject | Data Validation | |
| dc.subject | Data Analytics | |
| dc.subject | Artificial Intelligence | |
| dc.subject | EXA4MIND Toolboxes | |
| dc.subject | Extreme Data | |
| dc.title | D3.2 : Data Preprocessing Toolbox | |
| dc.type | report | |
| dc.type.version | submittedVersion | |
| local.files.count | 1 | |
| local.files.size | 3202398 | |
| local.has.files | yes |
Files
Original bundle
- Name: EXA4MIND_D3.2_DataPreprocessingToolbox.pdf
- Size: 3.05 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 718 B
- Format: Item-specific license agreed upon to submission