Projekt EXA4MIND
Permanent URI for this collection: http://hdl.handle.net/10084/152251
Recent Submissions
D4.2 : Final Version of the ADAMS4SIMS System (2026-03-31)
Číž, David; Furmánek, Radek; Pylat, Nestor; Otyepka, Michal; Banáš, Pavel; Ventsenostseva, Zhanna; Bogdanov, Evgeny
The Automated Data Mining System for Systematic Improvement of Molecular Simulations (ADAMS4SIMS) integrates a comprehensive suite of functionalities essential for the preprocessing, storage, and analysis of molecular dynamics (MD) data. The system consists of a back-end, a web-based GUI, and a PostgreSQL database. The HDF5 format is employed for managing large trajectory datasets effectively. While the preprocessing tools focus on metadata extraction and data preparation, the analytical tools available provide users with the capability to perform scalable analyses of their simulations, enabling systematic improvements in structural modeling.

D1.4 : Final Architecture, Lessons Learned, Benchmarks and Validation (2026-05-14)
Bogdanov, Evgeny; Teran, Jose; Golasowski, Martin; Otyepka, Michal; Číž, David; Furmánek, Radek; Gurbanov, Huseyn; Chaudhary, Smaran; Pauw, Viktoria; Hayek, Mohamad; Karagöz, Pınar; Aslan, Bora; Duruoglu, Gokhan; Gordon, Stuart; Hurych, David; Schumann, Martin; Hachinger, Stephan; Culell, Maria
The EXA4MIND project enables Extreme Data analytics at supercomputing facilities with optimum data backends in its “Extreme Data Database” (EDD) – from object stores and flat file structures towards relational and document databases. The EXA4MIND/EDD platform facilitates advanced querying and indexing and leverages state-of-the-art orchestration of processing workflows and pipelines for reproducibility and high performance. It provides connectivity to major European data ecosystems such as EUDAT, EOSC, or European Data Spaces.
This deliverable describes the final architecture of the solution, a multitude of lessons learned collected across the work packages throughout the project period, and our successful qualitative and quantitative platform validation efforts, including benchmarks.

D2.4 : Report on FAIRification of Database Data (2026-04-30)
Chaudhary, Smaran; Číž, David; Golasowski, Martin; Gordon, Stuart; Hachinger, Stephan; Hayek, Mohamad; Khormali, Shahab; Hurych, David; Karagöz, Pınar; Pauw, Viktoria; Mlýnský, Vojtěch; Ozdemir, Aysel Yagmur
EXA4MIND provides a flexibly deployable platform for Extreme Data analytics with the Extreme Data Database (EDD) framework at the core. On top of the actual data stores (databases, object stores) in the EDD, the project implements a “Dataset connectivity and FAIR Support” module, supporting the assignment of metadata and persistent identifiers to data according to the FAIR principles (findable, accessible, interoperable, reusable) of research data management. This module facilitates data publication and data exchange with European Data ecosystems (EUDAT, EOSC, Data Spaces). This deliverable describes the module (including a FAIR Data Subsetting Application Programming Interface as a novel approach to FAIR data in databases) and gives examples of how datasets from EXA4MIND are published. The work described is important for implementing EXA4MIND’s Data Management Plan and for reaching project objectives related to FAIR data publication and persistent-identifier assignment.

D3.4 : Collection of APIs to External Applications (2026-04-30)
Gelecek, Atakan; Karagöz, Pınar; Hachinger, Stephan; Číž, David; Golasowski, Martin; Hurych, David; Öztürk, Aslı Umay; Sever, Yiğit; Furmánek, Radek; Ponimatkin, Georgy
This deliverable presents the collection of Application Programming Interfaces (APIs) integrated by the EXA4MIND project to enable interaction between the platform and external applications.
The document describes the APIs that expose the core services of the EXA4MIND ecosystem, including AI inference capabilities, workflow orchestration, distributed data processing, database access, and supporting infrastructure services. The APIs provide standardized and infrastructure-agnostic access – in particular to the Advanced Querying and Indexing System (AQIS), which acts as the central analytics engine of the platform. Through these interfaces, external applications can trigger analytics workflows, execute machine learning inference tasks, access heterogeneous data storage systems, and integrate with distributed computing environments. The deliverable also briefly describes APIs developed in other work packages, including data ingestion pipelines, semantic search services, and domain-specific analytics applications such as the Aquaview soil moisture analysis system. Together, these APIs form a coherent integration layer that enables scalable data processing, advanced analytics, and interoperability across the EXA4MIND platform. The documented interfaces demonstrate how external tools and applications can utilise the platform’s capabilities for large-scale data analysis and AI-driven workflows.

D4.3 : Validation and Assessment Report of the ADAMS4SIMS System (2026-04-30)
Banáš, Pavel; Číž, David; Mlýnský, Vojtěch; Otyepka, Michal; Furmánek, Radek; Pauw, Viktoria; Gelecek, Atakan; Gordon, Stuart
Molecular-dynamics (MD) simulations generate large and heterogeneous datasets that are difficult to store, reuse, compare, and exploit systematically. This limits force-field development, where assessment and tuning often still rely on manual post-processing of selected simulations and are therefore difficult to scale to broad benchmark datasets. ADAMS4SIMS (Automated Data Mining System for Systematic Improvement of Molecular Simulations), developed as the Scientific Application Case of the EXA4MIND project, addresses this bottleneck.
It integrates MD trajectories, force-field libraries, and experimental Nuclear Magnetic Resonance observables into a platform for FAIR (findable, accessible, interoperable, reusable) research data management, analysis, and force-field benchmarking. The system supports the full workflow from raw-data upload and preprocessing to metadata extraction, standardized trajectory representation, database integration, interactive analysis, comparison with experimental data, and reweighting-based evaluation of alternative force-field parametrizations. This deliverable reports the validation and assessment of the ADAMS4SIMS system as an integrated scientific data platform. The validation covers three complementary dimensions: (i) internal technical validation of functionality, security, reliability, and role-based access control; (ii) workflow-level validation of scientific correctness against established reference tools and independently prepared calculations; and (iii) performance validation of preprocessing, data representation, and analysis efficiency. Technical validation confirms reliable operation of the backend services, authentication and authorization mechanisms, data-processing workflows, and graphical user interface for the defined user roles. Scientific validation shows that ADAMS4SIMS reproduces reference results for MD trajectory analyses, experimental-data comparison using normalized χ² statistics, and reweighting-based force-field evaluation. This was checked against analyses based on the well-proven CPPTRAJ tool, manual post-processing workflows, and independently prepared reweighting calculations. Performance validation demonstrates that HDF5-based trajectory representation and distributed computation enable efficient access to large MD datasets and scalable analysis workflows.
Overall, the results confirm that ADAMS4SIMS is a robust and scientifically reliable final EXA4MIND demonstrator for FAIR molecular-simulation data management, reproducible analysis, and data-driven force-field benchmarking.

D5.3 : Demonstration of the Tool for Industry Application Case and KPI Measures (2026-03-30)
Hurych, David; Zanaska, Karel; Kartal, Ecem; Teye-Adjei, Anthony; Hachinger, Stephan
This deliverable presents the final demonstration and validation of the tools and AI models developed for the EXA4MIND Industry Application Case, specifically targeting Advanced Driving Assistance Systems (ADAS). It outlines a data processing pipeline that leverages the Advanced Query and Indexing System (AQIS) for scalable, AI-based pre-annotation and smart data retrieval on extreme-scale datasets. The report validates the successful achievement of three primary Key Performance Indicators (KPIs). For KPI 6.1, the deployment of models such as YOLO v11, YOLO World, and SAM resulted in a 76% reduction in manual annotation time for object detection and a 74% reduction for semantic and instance segmentation. For KPI 6.2, the project built a robust database encompassing 19,259 hours of multi-sensor driving data. Finally, to fulfill KPI 6.3, the document demonstrates a “smart database querying” mechanism that fuses natural language and image similarity searches to rapidly retrieve and analyze ADAS failure scenarios. These advancements enable targeted data curation and rapid, closed-loop AI adaptation, significantly improving the efficiency of ADAS development.

D7.4 : Dissemination and Communication Final Report (2026-05-07)
Culell Elustondo, María Ignacia; Espino, Alejandra; Slaninová, Kateřina; Dobiašová, Markéta; Derquennes, Marc
This deliverable provides the final overview of communication and dissemination activities carried out throughout the EXA4MIND project.
It consolidates the outreach actions implemented during the project lifetime and assesses how these activities contributed to the visibility of project results, scientific outputs and ecosystem collaborations. The report also documents engagement with research communities, participation in international events and Open Science dissemination practices, highlighting how EXA4MIND positioned itself within the European landscape of extreme data analytics and high-performance computing.

D8.3 : Data Management Plan (Updated) (2026-04-30)
Slaninová, Kateřina; Hachinger, Stephan; Bogdanov, Evgeny; Mlýnský, Vojtěch; Otyepka, Martin; Hurych, David; Grakova, Ekaterina; Culell, María Ignacia
The Data Management Plan lays out our concept for handling the main aspects of the life cycle of the project data (data organisation and long-term storage, access, preservation and sharing). This document also includes a specification of outputs (datasets generated during the project).

D6.2 : Demonstration of the TERRAVIEW Application Case (2025-10-31)
Bogdanov, Evgeny; Gutta, Kiran; Teran, José; Chaudhary, Smaran; Furmánek, Radek; Hachinger, Stephan; Ventsenostseva, Zhanna; Öztürk, Aslı Umay
This deliverable demonstrates the outcome of the SME/Smart-Viticulture application case of EXA4MIND. The work has focused on a Soil Moisture Content (SMC) monitoring system capable of processing large-scale satellite imagery to support viticulture. This “Aquaview” platform is shown, assessed and validated. A central component of this endeavour is a rigorous benchmarking study that evaluates the efficiency of the EXA4MIND “Extreme Data” architecture against traditional data retrieval methods. EXA4MIND with its Advanced Query and Indexing System offers a continuously-updated satellite data cache for Aquaview via an iRODS object store. The performance of selective Cloud-Optimized GeoTIFF (COG) access via this cache is compared to legacy full-file downloads.
The results demonstrate that the integrated solution achieves significant improvements in data ingestion speeds and substantial reductions in payload sizes. These findings validate the scalability of the Aquaview pipeline, confirming its readiness to address the critical water management challenges of the wine industry using infrastructure at supercomputing centres.

D3.3 : Query Performance Report (2025-10-31)
Gelecek, Atakan; Koç, Robin; Karagöz, Pinar; Pauw, Viktoria; Ventsenostseva, Zhanna; Furmánek, Radek
The EXA4MIND project enables Extreme Data analytics, leveraging top-notch computing facilities and optimum data stores – from object stores over relational databases to NoSQL databases. EXA4MIND aims to facilitate data-flow connectivity between its database and object-storage deployments, computing systems at supercomputing centres, and the European data ecosystem (EUDAT, EOSC, European Data Spaces). In such a demanding setting, using the databases efficiently and effectively is critical. Deliverable D3.3 presents how different types of databases and data storage systems are used in the Platform and Application Cases, with a focus on the improvements provided by effective data management in the AQIS workflows. The obtained improvements are presented under two headings: (i) Functional Improvements, and (ii) Performance Improvements. The functional improvements describe the new functionalities provided to the Application Cases through the use of databases and database querying, whereas the performance improvements are gains over the previous status of the Application Cases’ functionalities and services, obtained by integrating databases or by tailoring the way databases are accessed and queried.
The solutions we present are also applicable to a variety of other possible use cases.

D3.2 : Data Preprocessing Toolbox (2025-10-31)
Gurbanov, Huseyn; Chaudhary, Smaran; Hachinger, Stephan; Öztürk, Aslı Umay; Hurych, David; Gutta, Kiran Babu; Foltýn, Ladislav; Ventsenostseva, Zhanna
This deliverable introduces the “Toolboxes” module of the EXA4MIND Platform, extending and covering the idea of a Data Preprocessing Toolbox developed within Work Package 3 (WP3, Extreme Data Analytics and Processing) of the EXA4MIND project. The Preprocessing Toolbox contains a set of generic and application-specific preprocessing tools that enable data cleaning, transformation, fusion, and harmonisation across heterogeneous data sources. Processing tools focusing on validation have been moved into a separate Validation Toolbox submodule, and an Analytics & Artificial Intelligence (AI) Toolbox is being compiled as well. With a uniform approach to command line interfacing and curated code, our Toolboxes provide examples that users can use or adapt for their own Extreme Data applications. Since such applications are too individual to be served by a reasonably limited set of tools, the idea of our Toolboxes (Preprocessing, Validation and Analytics & AI) is to enable users to construct their own processing steps within advanced data-driven workflows. Preprocessing steps in particular ensure consistency and quality of input data, and thus lay the foundation for effective querying and analytics services across the Extreme Data platform. The Preprocessing Toolbox has been designed in close alignment with WP2 (Data Spaces Management), enabling seamless integration between distributed data spaces and advanced analytics capabilities. In particular, it is easy to employ the Toolboxes and individual tools on data from different data sources, constructing data-driven workflows within an instance of the EXA4MIND Advanced Query and Indexing System (AQIS, from WP3).
Including a fairness-check tool and validation mechanisms, the Toolboxes support the overall project objective of enabling trustworthy, green, and fair AI. This document complements the Open Source code repositories which are the main part of the deliverable. It briefly presents motivation and design principles, and gives pointers to the actual implementations and usage instructions.

D5.2 : New AI Models to Reduce Manual Annotation (2024)
Vobecký, Antonín; Šivic, Josef; Hurych, David; Martinovič, Tomáš; Görkem Özer, Arif
This deliverable presents two new AI models developed in Task 5.3 to reduce the amount of required manual annotations in the context of annotating large-scale multi-modal data in autonomous driving in WP5. The developed models build on recent breakthrough advancements in (multi-modal) self-supervised machine learning. We address two key perception tasks for the automatic annotation of data from autonomous driving. We start with models for object instance detection and segmentation. Then we proceed to models for semantic segmentation but address the semantic segmentation problem in a novel 2D-3D setting. For both problems, we address the open vocabulary setting, which enables reasoning about any object class that can be specified in natural language. This is an important set-up for autonomous driving as it also enables describing unusual situations and corner cases.
The developed models have been deployed on use-case-specific data provided by VALEO and are ready to be integrated into the tool for testing and validation of advanced driver assistance systems developed in Task 5.2.

D1.3 : Training Plan (2024)
Echarte, Arantxa; Karagöz, Pinar; Derquennes, Marc
Deliverable D1.3 Training Plan, developed under Task T1.4 of Work Package 1 (WP1) within the EXA4MIND project, outlines a comprehensive strategy for fostering a collaborative community that will drive innovation, knowledge exchange, and the widespread adoption of the project’s outcomes. The primary goal is to ensure the long-term impact, sustainability, and relevance of EXA4MIND technologies by engaging stakeholders such as researchers, industry professionals, policymakers, and end users. The training program includes diverse activities such as webinars, hands-on activities, and industry engagement sessions, designed to build expertise in areas like Extreme Data Mining, Large Language Models, and High-Performance Computing. Key training materials and resources will be delivered through the EXAKI self-service platform and other online and in-person events. Feedback mechanisms will ensure EXA4MIND considers continuous improvement and alignment with community needs. Through these initiatives, the EXA4MIND project aims to create a dynamic learning environment, empowering participants to implement cutting-edge technologies and fostering a vibrant community focused on advanced data analytics and innovation.

D8.1 : Data Management Plan (Vysoká škola báňská – Technická univerzita Ostrava. IT4I, 2023)
Slaninová, Kateřina; Hachinger, Stephan; Harsh, Piyush; Mlýnský, Vojtěch; Šivic, Josef; Vobecký, Antonín; Hurych, David; Zahradník, Jan; Freani, Jérôme; Karagoz, Pinar; Derquennes, Marc
The Data Management Plan lays out our planning for handling the main aspects of the life cycle of the project data (data organisation and long-term storage, access, preservation, and sharing). This document also includes a preliminary specification of outputs (what data will be generated during the project). It is a living document and will be continuously updated during the project.

D7.1 : Impact Master Plan (Vysoká škola báňská – Technická univerzita Ostrava. IT4I, 2023)
Echarte, Arantxa; Espino, Alejandra; Derquennes, Marc; Lopez, Diana; Martinovič, Jan; Slaninová, Kateřina; Dobiašová, Markéta; Harsh, Piyush; Hachinger, Stephan
This document outlines the planning of the dissemination, communication, exploitation and standardisation strategies for the EXA4MIND Horizon Europe project. This planning will be of relevance throughout the duration of the project and will be revisited periodically as it progresses.

D2.2 : Data and Workflow Management Toolbox Alpha Status Report (Vysoká škola báňská – Technická univerzita Ostrava. IT4I, 2023)
Hayek, Mohamad; Golasowski, Martin; Hachinger, Stephan; Martinovič, Jan; Číž, David; Hurych, David; Harsh, Piyush; Martinovič, Tomáš; Zahradník, Jan
The EXA4MIND project connects pre-eminent databases and data management systems to supercomputing systems and European Data Spaces as well as the world of FAIR research data. The core purpose of this endeavour is running next-generation Extreme Data workflows, with emphasis on data analytics, Machine Learning / Artificial Intelligence, or classical simulations. This deliverable reports on the Data and Workflow Management Toolbox provided for this purpose, building upon the successful LEXIS Platform (delivered by the H2020 project, GA 825532).
Furthermore, it illustrates the first workflows run by our application cases at supercomputing centres as a basis for the milestone MS5 “First Data-driven Workflows have been Executed using Systems at Supercomputing Centres”.

D2.1 : Extreme Data Flow Patterns – Report (Vysoká škola báňská – Technická univerzita Ostrava. IT4I, 2023)
Golasowski, Martin; Hayek, Mohamad; Číž, David; Hurych, David; Zahradník, Jan; Harsh, Piyush; Serbetot, Loïck; Karagoz, Pinar; Freani, Jérôme
This deliverable of the EXA4MIND project collects and analyses data flow patterns from all the project application cases. The collected data flow descriptions are used to identify a set of commonly occurring patterns that will be taken into account when designing the Extreme Data Database.

D1.1 : Application Cases and Architecture Requirements (Vysoká škola báňská – Technická univerzita Ostrava. IT4I, 2023)
Hayek, Mohamad; Golasowski, Martin; Karagosz, Pinar; Číž, David; Zahradník, Jan; Hurych, David; Harsh, Piyush; Freani, Jérôme; Vobecký, Antonín; Hakki Toroslu, Ismail
This document is the first deliverable of the EXA4MIND project. It contains requirements provided by the project’s application-case work packages WP4-WP6 and their mapping to the EXA4MIND Platform features. The document is roughly divided into two parts. The first part contains a unified description of each application case and its requirements. The second part contains the mapping of the requirements to the technical features of the EXA4MIND Platform and the project objectives provided by the technical work packages WP1-WP3.