A COMPREHENSIVE FRAMEWORK ENABLING THE DELIVERY OF TRUSTWORTHY DATASETS FOR EFFICIENT AIOT OPERATION
Artificial Intelligence (AI) and data engineering technologies are now prominent topics of research and innovation. Their adoption is expected to drive industrial innovation and boost the EU economy, as they are key enablers of the ongoing digital transformation strategies in various business sectors and critical infrastructures. At the same time, with the emergence of the Internet of Things (IoT) and computing continuum (IoT-Edge-Cloud) technologies (e.g., edge networks, storage), huge amounts of sensor data are rapidly generated and processed to address strategic challenges in various industrial domains, serving as critical fuel for AI. This observation is in line with the European strategy for data and plays a prominent role in the development of ICT systems that manage and control IoT data generation and analysis across the computing continuum, enabling smarter, more efficient, and more responsive devices for running critical business operations and infrastructures.
Such Artificial Intelligence of Things (AIoT) systems and their respective services/applications require a large amount of real data at scale to increase their accuracy, robustness, and sustainability. However, the availability of such datasets in many sectors is rather limited due to various restrictions, such as: i) the cost of installing and instrumenting an IoT smart space and the infrastructure for real data collection and maintenance in local and/or federal data warehouses; ii) the labour effort involved, which might be unaffordable considering that the resulting dataset might be limited in scale; iii) the absence of sensor data for some scenarios of interest; and iv) issues concerning confidentiality rights and vulnerability disclosure. To overcome these restrictions, scientists rely on offline modelling and simulation techniques, which raise essential difficulties due to the domain gap between real and synthetically derived data. To address this gap, Machine Learning (ML)-based techniques have been applied to learn modelling parameters directly. While such efforts can help in understanding the properties of the space for similar scenarios, adopting them in other settings of interest is difficult without considering the semantics of the underlying attributes and relationships in the smart space. However, properly accounting for domain knowledge and efficiently harnessing it for better data-driven AI models is far from trivial. In addition, the data is often imbalanced: the nominal system behaviour is usually over-represented and needs to be reduced in dimensionality, while data about anomalous or rare events and non-typical system behaviour is scarce. Finally, the dynamic nature of smart spaces, especially critical infrastructure ecosystems, should also be taken into consideration.
In this context, the PANDORA project aims to develop a comprehensive AI-based and domain-informed framework to optimise the preparation and delivery of complete and trustworthy datasets for training and enhancing AI models deployed in AIoT systems (Phase 1). Furthermore, PANDORA aims to increase the degree of autonomy, trustworthiness, and energy efficiency of the processes for designing these AI models and for managing and using the respective IoT-enabled datasets in smart space ecosystems (Phase 2).
To advance research excellence in the development of resilient, transparent, and human-centred AI approaches towards optimised and autonomous data processing and use.
To provide novel methods, mechanisms, and tools for the development of customisable and trustworthy datasets for model-based AI developments.
To support the development and the continual autonomous operations of robust and energy efficient “data in AI” pipelines across the computing continuum.
To provide a cross-sector and multi-variant smart data space to realise the PANDORA framework and validate the data-enabled trustworthy AI mechanisms in real life scenarios.
To foster synergetic approaches in the EU industrial and scientific research communities and promote international collaboration on efficient and trustworthy AI approaches.
To enhance the EU multidisciplinary competencies in the fields of industrial AI, data and robotics and embrace open innovation.
In this project, Eunomia Ltd is responsible for monitoring all project activities to ensure compliance with the applicable normative framework at international, EU, and national level, focusing especially on privacy (Privacy by Design - GDPR) and AI ethics issues. It will also ensure that research ethics and gender equity issues arising during PANDORA experimentation are handled in line with European values. A preliminary analysis of the requirements imposed by the existing national and European legal and regulatory framework applicable to PANDORA will also be conducted. Finally, Eunomia will take part in the experiments to ensure compliance with data protection regulations, with particular attention to gender balance (EU’s GEP) in evaluation panels and other relevant advisory bodies.