Leveraging Large Language Models for Ontology Extraction through Question-Answering

Supervisor: Jieying Chen (j.chen2@vu.nl)

Abstract

The aim of this thesis is to explore how Large Language Models (LLMs) can automate and refine the ontology extraction process. Ontologies, which offer a structured representation of concepts and their relationships within a specific domain, are fundamental to Semantic Web applications, knowledge representation, and intelligent systems. Historically, creating and updating ontologies has required considerable manual effort from domain specialists. LLMs, which can interpret context, generate meaningful responses, and extract structured information, suggest that much of this effort could be automated. By querying LLMs through a Question-Answering (QA) mechanism, the thesis seeks to enable faster, incremental ontology development and modification that keeps pace with continually changing knowledge domains.

Objectives

Examine the current methodologies and tools for ontology extraction, and understand the capabilities and limitations of LLMs in knowledge extraction, especially within a QA paradigm.
Design and develop a framework that uses LLMs to extract ontological structures by posing domain-specific questions, iteratively refining and expanding the ontology based on answers.
Fine-tune selected LLMs on domain-specific datasets to enhance their precision in ontology extraction and to ensure that the responses align with the terminologies and structures of the particular knowledge domain.
Design an evaluation protocol that measures the accuracy, depth, and comprehensiveness of the ontologies extracted with the LLM-QA approach, comparing them with manually built ontologies and with ontologies produced by other automated methods.
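
The second and fourth objectives can be illustrated with a minimal sketch. It is not a proposed design; it only shows one possible shape of the question-answer loop and of a triple-level comparison against a reference ontology. Everything named here is an assumption made for illustration: the `ask` callable stands in for a real LLM call, `fake_llm` and its canned replies are hypothetical, the question template and the comma-separated answer format are arbitrary choices, and only subclass relations are extracted.

```python
from collections import deque
from typing import Callable, Set, Tuple

# A triple is (subject, relation, object), e.g. ("Car", "subClassOf", "Vehicle").
Triple = Tuple[str, str, str]


def extract_ontology(seed: str, ask: Callable[[str], str],
                     max_concepts: int = 20) -> Set[Triple]:
    """Expand a taxonomy breadth-first by asking one question per concept.

    `ask` is any function that takes a question and returns a text answer
    (hypothetical stand-in for an LLM call).
    """
    triples: Set[Triple] = set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        concept = queue.popleft()
        answer = ask(
            f"List the direct subtypes of '{concept}', comma-separated, or 'none'."
        )
        for name in (part.strip() for part in answer.split(",")):
            if not name or name.lower() == "none":
                continue
            triples.add((name, "subClassOf", concept))
            # Queue newly discovered concepts so they are asked about later.
            if name not in seen and len(seen) < max_concepts:
                seen.add(name)
                queue.append(name)
    return triples


def precision_recall(extracted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float]:
    """Exact-match precision and recall of extracted triples against a reference."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical canned answers replacing a real model, so the sketch runs offline.
CANNED = {"Vehicle": "Car, Bicycle", "Car": "Sedan"}


def fake_llm(question: str) -> str:
    for concept, reply in CANNED.items():
        if f"'{concept}'" in question:
            return reply
    return "none"


extracted = extract_ontology("Vehicle", fake_llm)
gold = {
    ("Car", "subClassOf", "Vehicle"),
    ("Bicycle", "subClassOf", "Vehicle"),
    ("Truck", "subClassOf", "Vehicle"),
}
precision, recall = precision_recall(extracted, gold)
```

Exact triple matching is the simplest possible comparison; a real evaluation would likely also need to handle synonyms, differing hierarchy depth, and relations other than subclass.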

References

Tom B. Brown et al. "Language Models are Few-Shot Learners." In NeurIPS, 2020.
Hugo Touvron et al. "LLaMA: Open and Efficient Foundation Language Models." arXiv:2302.13971, 2023. https://arxiv.org/abs/2302.13971