Active Learning - Improving the Dark Web Classifier based on expert input

Supervisors:

The selection process for this thesis has started, and we are no longer accepting applications.

Background:

Dark Web pages are classified along several dimensions, so every page can carry more than one label, which makes this a multi-label problem. At the moment, pages are labeled by humans and by a (static) pretrained AI classifier.

Description:

Although precision and recall scores are good, it would be highly beneficial to update the classifier as new data arrives. Labels can also change over time, for example when new crimes are defined. Instead of first labeling a number of pages with the new labels and then training a new classifier, it would be preferable to update the model on the go. We envision doing this with the help of human experts who collaborate with the AI by accepting, rejecting, or even modifying the labels assigned by the model. The model would then take these inputs and continue to learn as new examples are given to the human expert to annotate.
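
To make the envisioned feedback loop concrete, the following is a minimal sketch and not the existing classifier. It assumes a binary-relevance setup (one online logistic-regression model per label) over simple word features. The class name `OnlineMultiLabelClassifier`, the example pages, and the labels "drugs" and "fraud" are all hypothetical and chosen only for illustration. The expert's decision is represented as the corrected label set: accepting keeps the suggestion, rejecting removes labels, and modifying may introduce labels the model has never seen.

```python
import math
from collections import defaultdict


class OnlineMultiLabelClassifier:
    """Binary relevance: one online logistic-regression model per label.

    Hypothetical sketch; no bias term, to keep the example short.
    """

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.labels = set()  # grows when an expert introduces a new label
        self.weights = defaultdict(dict)  # weights[label][token] -> float

    @staticmethod
    def _tokens(text):
        # Set of lowercase words; a real system would use richer features.
        return set(text.lower().split())

    def _probability(self, label, tokens):
        w = self.weights[label]
        score = sum(w.get(t, 0.0) for t in tokens)
        return 1.0 / (1.0 + math.exp(-score))

    def predict(self, text, threshold=0.5):
        """Return the set of labels whose probability exceeds the threshold."""
        tokens = self._tokens(text)
        return {l for l in self.labels
                if self._probability(l, tokens) > threshold}

    def update(self, text, expert_labels):
        """One gradient step per label, using the expert's label set as truth."""
        tokens = self._tokens(text)
        self.labels |= set(expert_labels)  # new labels are added on the fly
        for label in self.labels:
            target = 1.0 if label in expert_labels else 0.0
            gradient = self._probability(label, tokens) - target
            w = self.weights[label]
            for t in tokens:
                w[t] = w.get(t, 0.0) - self.learning_rate * gradient


clf = OnlineMultiLabelClassifier()
page_1 = "pure drugs shipped worldwide"
page_2 = "stolen credit cards shipped"

first_suggestion = clf.predict(page_1)   # empty: no labels are known yet
clf.update(page_1, {"drugs"})            # expert modifies: adds a label
second_suggestion = clf.predict(page_2)  # model suggests a label for page_2
clf.update(page_2, {"fraud"})            # expert rejects it, adds a new label
final_prediction = clf.predict(page_2)   # prediction after the correction
```

In this sketch every expert decision triggers an immediate update, so no separate retraining phase is needed when a new label appears. Whether per-example updates, mini-batches, or periodic retraining work best for the actual classifier is one of the open questions of this thesis.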

Tasks:

  1. What are best practices with regard to updating AI models?
  2. What can we learn from the data stream learning community?
  3. How can expert confirmation or rejection of AI-based annotations be used to retrain the classifier (in a continuous way)?
  4. What infrastructure is required for such a setup at CFLW?

Literature: