At KAI, we set clear expectations for students. We also want to make sure students know what to expect from us as their supervisors. We have prepared a short document which touches upon some important points such as meetings, planning and the writing of your thesis.
Most of the topics below can be investigated by either BSc or MSc AI students. We also welcome groups of students working on the same or similar topic.
If you are interested in one of the projects below, please contact the supervisor(s) listed to receive more information about the topics. Where available, have a look at the detailed description first. Also, keep in mind that all theses can be shaped to accommodate your interests.
Information Extraction
Supervisors: Benno Kruit (b.b.kruit@vu.nl), Ilaria Tiddi (i.tiddi@vu.nl), Lise Stork (l.stork@vu.nl)
We offer multiple projects under the umbrella of information extraction with varying foci. Information extraction focuses on generating structured data from unstructured inputs in an automated manner. The input as well as the output can vary based on the application or end usage of the extracted data.
Automated Processing of Scholarly Data: (Ilaria) The goal of this project is to support the automated processing of the CEUR-WS proceedings data. For a BSc thesis, the objective is to extract an ontology of CEUR knowledge. For an MSc thesis, this would be extended with analysing abstracts or creating an interface for data input and knowledge graph population.
Making social history research papers machine interpretable: (Lise) The goal of this project is to see if we can partially automate or support the construction of a knowledge graph of social history hypotheses from the literature. Here is a more detailed description.
Information Extraction from Structured Lists (BSc, Benno): Many different kinds of documents contain lists because they are a simple way of enumerating several related items. We want to investigate ways of extracting the information from the lists while retaining the inherent relationships between list items.
Higher-arity Relation Extraction with Qualifiers (MSc, Benno): This project aims to investigate new techniques for extracting complex statements with meta-information from text. The goal is to leverage ontology/schema-level information about types, relations, and meta-relations to overcome the incompleteness problem. More information can be found here.
Multilingual Travel Knowledge Extraction (MSc, Benno): In this project, we want to explore approaches for linking geographical locations and information together in the travel domain. More details are given here.
Data Schema Induction for Shopping (MSc, Benno): The goal is to investigate approaches for inducing a coherent ontology for product descriptions. More information is available here.
Ontologies, Knowledge Engineering
Supervisors: Romana Pernisch (r.pernisch@vu.nl), Ilaria Tiddi (i.tiddi@vu.nl)
Ontologies model specific domains. As domains evolve over time, ontologies have to be changed as well. Not only are the ontologies themselves affected, but also the applications that use those ontologies for various purposes. We have multiple theses in this domain.
ChImp 2.0: The ChImp Protégé plugin helps ontology engineers during this process by summarising and displaying changes and the effects of changes on the ontology as a whole. We have multiple possible projects with ChImp. More information is here.
Materialisation/Reasoning: We have previously investigated the impact of ontology changes on the materialisation (making implicit knowledge explicit) and want to take this analysis further. This means that we want to investigate the types of changes in more detail, and also study the effect of the changes locally within the materialisation rather than looking at the materialisation as a whole. More details are given here.
Embeddings: We have previously developed a method to compare embeddings as the underlying knowledge graph changes. We want to develop this method further and analyse its capabilities in more detail. Here is the thesis description.
Knowledge Engineering for Hybrid Intelligence (BSc, MSc): Inspired by Software Design and Engineering, Knowledge Engineering deals with the formal design, maintenance and usage of knowledge-based systems. In this project, we will look at modelling Hybrid Intelligent systems using knowledge engineering techniques.
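To make the idea of materialisation in the Materialisation/Reasoning topic more concrete, the following is a minimal sketch in plain Python. It assumes a toy triple set and only two RDFS-style rules (subclass transitivity and type inheritance); the class and instance names are hypothetical, and this is not the method or tooling used in the project itself. It materialises two versions of a tiny ontology and shows which triples disappear after a single change.

```python
# Toy illustration only: all class and instance names are hypothetical.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialise(triples):
    """Return the input triples plus everything derivable by two rules,
    applied until no new triple appears (a fixpoint):
      1. subclass transitivity: A subClassOf B, B subClassOf C => A subClassOf C
      2. type inheritance:      x type A, A subClassOf B       => x type B
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        for s, p, o in closure:
            if p == TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, TYPE, b))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


# Version 1 of a tiny ontology with one instance.
v1 = {
    ("Dog", SUBCLASS, "Mammal"),
    ("Mammal", SUBCLASS, "Animal"),
    ("rex", TYPE, "Dog"),
}
# Version 2 applies a single change: one subclass axiom is removed.
v2 = v1 - {("Mammal", SUBCLASS, "Animal")}

m1, m2 = materialise(v1), materialise(v2)
inferred_v1 = m1 - v1   # implicit knowledge made explicit in version 1
lost = m1 - m2          # triples no longer in the materialisation after the change

for triple in sorted(lost):
    print(triple)
```

Comparing `m1` and `m2` triple by triple, rather than only comparing their sizes, is one simple way to look at where in the materialisation a change has its effect.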
Argument and Rule Mining
Supervisors: Loan Ho (t.t.l.ho@vu.nl), Lise Stork (l.stork@vu.nl)
There are multiple projects in the domain of argument mining with different objectives:
Argumentation-based explanation from KGs (MSc): the goal is to run a Tableau reasoner on a DL-formalised ontology.
Case-based reasoning in the legal domain (BSc): the goal is to create an ontology (DL-Lite) for tabular data input and to run the Tableau algorithm to predict whether a case has a (non)violation.
Rule mining, similarly to information extraction, aims at finding structures. In this case, we want to learn rules that describe the data best to help us understand it better. There are multiple projects that involve rule mining:
Rule Mining on Hypergraphs to Forecast Recipe Popularity: The goal of this project is to use a temporal knowledge graph and some auxiliary data to run temporal rule mining algorithms to predict future purchases at restaurants. Here are more details.
Community Detection from a Species Interaction Network: The goal of this project is to extract interesting communities in species interaction networks and/or detect interesting logic rules that relate to these communities. More details are given here.
Robotics and Knowledge Representation
Supervisors: Mark Adamik (m.adamik@vu.nl), Ilaria Tiddi (i.tiddi@vu.nl)
These projects look at the intersection of robotics and knowledge graphs. Knowledge of the Robot Operating System (ROS) is a plus:
Semantic Mapping with a mobile service robot (MSc): The goal of this project is to implement semantic mapping for an indoor mobile robot using knowledge graphs. More information about this project can be found here.
Visual Scene Understanding for Indoor Mobile Robots (BSc): In this project, you will use the camera feed of the robot and off-the-shelf image recognition algorithms to understand the environment by generating knowledge graphs. More information about this project can be found here.
Energy-efficient robots through knowledge-awareness (BSc, MSc): We have an ontology representing the capabilities of the robots (picking objects, moving, scanning surroundings, etc.). The ontology needs to be expanded with energy budgets so the robot can choose the actions to perform based on its capabilities and energy efficiency. Collaboration with the Software Engineering group.
Explanations and Narratives
Supervisors: Lise Stork (l.stork@vu.nl), Ilaria Tiddi (i.tiddi@vu.nl)
The following topics are aimed at providing a more human-like AI, by creating explanations or creating narratives.
Identifying Formal Narratives from KGs: (Ilaria) The goal of this project is to extract as many narratives as possible (in terms of sets of facts) from existing KGs such as DBpedia or Wikidata. In order to collect these facts, we will use the narrative formal structure as presented in this paper.
A Benchmark for understanding Narratives: (Ilaria) Language Models and KGs. Work on extending the three existing benchmarks (1, 2, 3) for understanding narratives.
Recipe variation using gastronomic recipe explanations: (Lise) In this project, the goal is to investigate recipe variability by finding explanations for why certain ingredients taste better than others by using the Food Knowledge Graph. More details can be found here.
A tool for Publishing Social Inequality Hypotheses: (Lise) The goal of this project is to create a web interface to facilitate the semantic annotation of social history research papers. More information is located here.
Question Answering
Supervisors: Benno Kruit (b.b.kruit@vu.nl), Stefan Schlobach (k.s.schlobach@vu.nl)
QA is a very broad topic. We, however, focus on QA over structured data in various forms:
Playing “20 Questions” with a KG (BSc): detailed description is located here.
Graph Queries on Relational Databases (MSc CS): Description here.
Multi-lingual problems
Supervisor: Benno Kruit (b.b.kruit@vu.nl)
Even though these topics would also fit under other headings described above, we wanted to highlight them as they both address the problem of multiple languages in different tasks:
Multilingual Entity Linking (BSc): Many names can refer to several different entities. In this project, we want to look at the problem of disambiguation and linking of entities.
Multilingual Travel Knowledge Extraction (MSc): This project is about integrating Wikivoyage location data with data from OpenStreetMap. The approach will leverage distant supervision, relation extraction, data integration and deep learning techniques. More details can be found here.
Semantics of Deep Learning Methods
Supervisor: Ilaria Tiddi (i.tiddi@vu.nl)
Analysing multi-task deep models with Graph Analysis: This is a collaboration with the Bioinformatics department. The goal is to extend previous work (1,2) to include knowledge about the data inputs of a multi-task, multi-class classifier for bioinformatics data using a KG. We will then use this information to elicit the model’s inner workings and generate textual explanations for its decisions. A combination of link prediction and graph summarisation will be used.
Incorporating Semantics in Message Passing methods: Message passing models are neural network architectures that operate by propagating information along the structure of a graph over which they are trained end-to-end. Currently, these methods treat all relationships in the same way, while in knowledge graphs some edges carry more semantics than others (e.g. entity type or subclass hierarchies). Here, we will look at feeding such information into a message passing model such as R-GCN and test it in a node labelling or link prediction scenario.
Extracting Semantics from neural co-activation graphs: Here, we will look at extracting taxonomical or ontological information using the analysis of the co-activation graph (CoAG) of a neural network architecture. We will use community analysis incrementally to analyse different knowledge granularities (CoAG alone, CoAG with rdfs:subClassOf, CoAG with other relationships). The taxonomy will help understand what the neural representation has learned, and it will be compared with a ground-truth KG to see how correct the neural representation is.
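As a rough illustration of the message passing idea in the Incorporating Semantics topic, here is a minimal sketch of one relation-aware propagation step in plain Python. It follows the general shape of an R-GCN-style update (a self-loop term plus a mean over incoming neighbours per relation, each relation with its own weight matrix, followed by a ReLU). The node names, the relation names and the hand-picked weights are assumptions for the example; in a real model the weights are learned, and this is not the approach prescribed for the thesis.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) with vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relational_message_passing(h, edges, W_rel, W_self):
    """One R-GCN-style propagation step.

    h      : dict node -> feature vector (list of floats)
    edges  : list of (source, relation, target) triples
    W_rel  : dict relation -> weight matrix for that relation
    W_self : weight matrix applied to a node's own features
    """
    out = {}
    for node in h:
        total = matvec(W_self, h[node])
        for rel, W in W_rel.items():
            # Incoming neighbours of `node` under this relation.
            neighbours = [s for s, r, t in edges if r == rel and t == node]
            for s in neighbours:
                msg = matvec(W, h[s])
                # Mean normalisation per relation.
                total = [a + m / len(neighbours) for a, m in zip(total, msg)]
        out[node] = [max(0.0, v) for v in total]  # ReLU
    return out


# Hypothetical toy graph with two relation types.
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "rdf:type", "c"), ("b", "relatedTo", "c")]
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {
    "rdf:type": [[2.0, 0.0], [0.0, 2.0]],   # a relation given more weight
    "relatedTo": [[0.5, 0.0], [0.0, 0.5]],  # a relation given less weight
}

out = relational_message_passing(h, edges, W_rel, W_self)
print(out["c"])
```

Because each relation has its own matrix, a message arriving over `rdf:type` can influence the target node differently from one arriving over `relatedTo`, which is the hook for feeding in additional semantics.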
Internships
Elsevier
Supervisor: Romana Pernisch (r.pernisch@vu.nl)
Elsevier is offering many theses, which were presented at the VU Theses Fair on the 11th November. The list can be found here.
The following theses from the list could potentially be supervised by Romana:
Automatic Taxonomy Construction
Quality Metrics for Knowledge Graphs
Distributional Bias and Drift in Biomedical Embeddings and Corpora
Impact of Ontology Changes on Document Annotations
CFLW Cyber Strategies
Supervisors: Eljo Haspels (eljo.haspels@cflw.com), Romana Pernisch (r.pernisch@vu.nl)
CFLW is a tech startup from the Netherlands, founded at the end of 2019 and based in The Hague. They develop intelligence services for law enforcement agencies, cybersecurity agencies and financial/fintech organizations. Their core product is Dark Web Monitor, which is used by various agencies around the world. For more details on CFLW, see their website.
Internship details
Requirements:
Students are expected to be at the office at least once a week on Thursdays. The CFLW office is at the Hague Security Delta, next to Laan van NOI railway station.
Students must be technically skilled in computer science, artificial intelligence, digital technologies, forensics or any other related background.
Students need to be intrinsically motivated to make cyberspace a little bit safer.
Benefits:
Students will receive an internship fee.
Students will learn how a startup works as they will take part in the company operations.
Student projects are formulated so that students can have a real-world impact on security.
CFLW is offering two theses; follow the link for more details on the projects:
Lareb
Supervisors: Romana Pernisch (r.pernisch@vu.nl), Ilaria Tiddi (i.tiddi@vu.nl)
Lareb is a Pharmacovigilance Research Lab studying the effects of drugs on the human body. The goal of this thesis is to extract and model some domain data on drug reactions so that link prediction approaches can be deployed over this data. Additionally, there is also interest in aligning the extracted model with existing Knowledge Graphs on drugs. BSc or MSc.
Accenture
Supervisor: Ilaria Tiddi
We will be offering projects around KGs, ML and Hybrid Intelligence in collaboration with Accenture.
Triply
Triply is a company offering infrastructure solutions for knowledge graph-based data. There are several projects available in collaboration with Triply DB on using Machine Learning and NLP over large-scale KGs. Group work is possible and projects can be either BSc or MSc.
Semantic Data Quality: Develop a method to automatically detect common mistakes in ontologies. Examples can be: taxonomic loops, inconsistency, or redundancies in ontologies. Use these mistakes to identify, for instance, outdated knowledge.
Semantic Explanations: The goal here is to automatically generate natural language explanations from complex data structures (graphs). We will be looking at building explanations in the form of argumentative structures (cf. Toulmin’s model) for specific given facts.
Semantic Search Engine: The project aims at creating a search engine over several knowledge graphs that allows looking up all IRIs in the Semantic Web.
Fair Semantics: Here, we aim at detecting discriminatory features in datasets such as most recurring classes or properties. We can then use them to create “prescriptions” or “manuals” for a dataset, describing its biases and the underlying assumptions that may be found in the data. This could be extended to develop a method to overcome such bias.
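For the Semantic Data Quality topic listed above, one of the mentioned mistakes (taxonomic loops) can be illustrated with a short sketch. The example assumes the subclass hierarchy is already available as a plain Python dictionary and uses a depth-first search to report cycles; the class names and the function `find_taxonomic_loops` are hypothetical and only meant to show the kind of check involved, not a method from the project.

```python
def find_taxonomic_loops(subclass_of):
    """Find cycles in a subclass hierarchy with a depth-first search.

    subclass_of : dict mapping a class to a list of its direct superclasses
    Returns a list of loops, each given as a path that starts and ends
    with the same class.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    loops = []

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for parent in subclass_of.get(node, []):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: a loop.
                loops.append(path[path.index(parent):] + [parent])
            elif state == WHITE:
                visit(parent, path)
        path.pop()
        colour[node] = BLACK

    for cls in subclass_of:
        if colour.get(cls, WHITE) == WHITE:
            visit(cls, [])
    return loops


# Hypothetical hierarchy containing one loop: Car -> Vehicle -> Machine -> Car.
hierarchy = {
    "Car": ["Vehicle"],
    "Vehicle": ["Machine"],
    "Machine": ["Car"],
    "Bike": ["Vehicle"],
}

loops = find_taxonomic_loops(hierarchy)
print(loops)
```

A check like this only covers one class of mistake; inconsistencies and redundancies would typically need a reasoner rather than a plain graph traversal.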