Automated Data-provenance Extraction (multiple projects)

Supervisor: ()


Machine learning pipelines have become increasingly complex with the rise of big data and the growth in computational resources. While this has revolutionized the field of AI, it has also made it increasingly difficult to determine which data was used in which part of the process. This is a problem, as it adds to the ‘black-box’ behaviour of the resulting models. The recent push for eXplainable AI (XAI) aims to demystify this ‘black-box’ behaviour. This project falls under the larger umbrella of XAI, where the aim is to represent the data transformations within the machine learning pipeline such that the provenance of the data used can be easily extracted. The final product is what we will refer to as a ‘data journey’. For an example, see [Daga and Groth (2023)].


In theses on this topic, students can explore various provenance representation frameworks and extraction techniques. The goal is to provide a provenance overview that is easily understandable for humans, but also allows for complex query answering. Individual students can decide to work on:


Mandatory reading: