Supervisors: Benno Kruit (firstname.lastname@example.org) and Stefan Schlobach (email@example.com)
Keywords: Hierarchical Clustering, Rule Learning, Sparse Encoding
“Twenty Questions” is a popular parlor game where one player tries to guess what another player is thinking of, in less than 20 yes/no questions. It is also known as Breadbox, after a popular first question: “Is it bigger than a breadbox?”, and is similar to the (simpler) game called “Guess Who?”. Such games are based on iteratively splitting the set of candidate answers in half, effectively performing a binary search using properties of entities. A similar approach may be useful for supporting entity search in Knowledge Graphs (KGs) for users who are not sure what they are looking for, or cannot remember a name. It could also speed up entity prediction in Machine Learning classifiers, by reducing a scan over all N candidates to a binary search in log(N) steps. However, optimally splitting a set of entities in half is a hard problem because there are 2N options. Furthermore, the best partitioning may involve paths of multiple steps in the KG, such as “albums by bands from the Netherlands”, of which there are too many to compute completely.
This project aims to investigate efficient approaches for automatically finding semantically coherent questions that split a set of entities from a KG in half. The approach should be able to trade the complexity of such questions against the balance of their split, because complex questions will result in more equal splits at the cost of coherence. Ideally, it should also be able to formulate alternative questions when the user does not know the answer, and take into account the incompleteness of the KG under the open-world assumption. However, the specific focus of the project is for the student to choose.
The starting point is a simple property-based approach that works for small, closed-world KGs. Then, it should investigate more scalable approaches that don’t involve enumerating all possible options, and account for the open-world assumption. Possible avenues for addressing these challenges are using KG embeddings and ontologies.