Automated Data-provenance Extraction

Automated Data-provenance Extraction (multiple projects)

Supervisor: ()

Background

Machine learning pipelines have become increasingly complex with the rise of big data and the growth in computational resources. While this has revolutionized the field of AI, it has also made it increasingly difficult to determine which data was used in which part of the process. This is a problem, as it adds to the ‘black-box’ behaviour of the resulting models. The recent push for eXplainable AI (XAI) aims to demystify this behaviour. This project falls under the larger umbrella of XAI: the aim is to represent the data transformations within a machine learning pipeline such that the provenance of the data used can be easily extracted. The final product is what we will refer to as a ‘data journey’. For an example, see [Daga and Groth (2023)].

Description

In theses on this topic, students can explore various provenance representation frameworks and extraction techniques. The goal is to provide a provenance overview that is easily understandable for humans, but also allows for complex query answering. Individual students can decide to work on:

Adapting the [PROV data model] to fit within the context of machine learning.
Exploring alternative ways of automating the provenance extraction process.
Usability validation of the resulting data journeys.
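
To give a feel for what a queryable provenance record could look like, the following is a minimal sketch in plain Python. It borrows two relations from the PROV data model (an activity "used" an entity, an entity "was generated by" an activity); everything else is an assumption made for illustration: the `ProvGraph` class, its method names, and the file and step names are hypothetical, and the sketch assumes the recorded pipeline contains no cycles. It is not the method of the mandatory reading, only a toy starting point.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Hypothetical in-memory store for two PROV-style relations."""
    used: dict = field(default_factory=dict)          # activity -> set of input entities
    generated_by: dict = field(default_factory=dict)  # entity -> activity that produced it

    def record(self, activity, inputs, outputs):
        """Record one pipeline step: `activity` used `inputs` and generated `outputs`."""
        self.used.setdefault(activity, set()).update(inputs)
        for entity in outputs:
            self.generated_by[entity] = activity

    def sources(self, entity):
        """Return the entities without a recorded producer that `entity` derives from."""
        activity = self.generated_by.get(entity)
        if activity is None:
            # No step produced this entity, so it is an original input.
            return {entity}
        result = set()
        for parent in self.used[activity]:
            result |= self.sources(parent)
        return result


# Illustrative pipeline; all step and file names are made up.
graph = ProvGraph()
graph.record("clean", inputs=["raw.csv"], outputs=["clean.csv"])
graph.record("split", inputs=["clean.csv"], outputs=["train.csv", "test.csv"])
graph.record("fit", inputs=["train.csv", "labels.csv"], outputs=["model.pkl"])

print(sorted(graph.sources("model.pkl")))  # ['labels.csv', 'raw.csv']
```

The query at the end answers the question "which original data ended up in this model?", which is the kind of question a data journey is meant to make easy. A thesis project would likely replace this toy store with a proper PROV serialization and an automated way of filling it.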

Literature

Mandatory reading:

“Data journeys: Explaining AI workflows through abstraction” [PDF]