Active Learning using Detectron2

Tired of labeling all your data? Learn more about how model predictions and embeddings can help you select the right data.

Supervised machine learning requires labeled data. In computer vision applications such as autonomous driving, labeling a single frame can cost up to $10. The fast growth of new, connected devices and cheaper sensors leads to a continuous increase in new data. Labeling everything is simply not possible anymore. In fact, many companies only label between 0.1% and 1% of the data they collect. But finding the right 0.1% of data is like finding a needle in a haystack without knowing what the needle looks like. So, how can we do it efficiently?

One approach to tackle the problem is active learning. In active learning, we take a pre-trained model and use its predictions to select the next batch of data for labeling. Different algorithms exist that help you select the right data based on model predictions. For example, the well-known approach of uncertainty sampling selects new data based on low model confidence. Let’s assume a scenario where we have two images with cats: one where the model says it’s 60% sure it’s a cat and one where the model is 90% certain that there is a cat. We would now pick the image where the model has only 60% confidence. We essentially pick the “harder” example.
With active learning, we iterate this prediction and selection process until we reach our target metrics.
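
The cat example above can be sketched in a few lines of Python. This is only an illustration: the filenames, the confidence values, and the helper select_least_confident are made up for this post and are not part of any library.

```python
# Hypothetical model confidences for the predicted class of each image.
confidences = {
    "cat_a.png": 0.60,
    "cat_b.png": 0.90,
}


def select_least_confident(confidences, n_samples):
    """Return the n_samples filenames with the lowest model confidence."""
    ranked = sorted(confidences, key=lambda name: confidences[name])
    return ranked[:n_samples]


selected = select_least_confident(confidences, n_samples=1)
print(selected)  # ['cat_a.png']
```

The image with 60% confidence is selected because the model is less sure about it.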

Example image from Comma10k with model predictions of a Faster R-CNN model with a ResNet-50 backbone trained on MS COCO.

In this post, we won’t go into detail about how active learning works. There are many great resources about active learning. Instead, we will focus on how you can use active learning with just a few lines of code using the Active Learning feature of LightlyOne. LightlyOne is a data curation platform for computer vision. It leverages recent advances in self-supervised learning and active learning to help you work with unlabeled datasets.

The Datasets: From MS COCO to Comma10k

It is very common these days to use pre-trained models and fine-tune them on new tasks using transfer learning. Since we are interested in object detection here, we use a model pre-trained on MS COCO (or COCO). Consisting of more than 100k labeled images, it is a very common dataset used for transfer learning for image segmentation, object detection, or keypoint/pose estimation.

Our goal is to take a COCO pre-trained model and use active learning to fine-tune it on a dataset for autonomous driving. For this transfer task, we are using the Comma10k dataset. From the repository: “It’s 10,000 PNGs of real driving captured from the comma fleet. It’s MIT license, no academic-only restrictions or anything.”

As you might have noticed already, the Comma10k dataset has annotations for training “segnets” (semantic segmentation networks). However, it has no bounding box annotations, which we require for our transfer task. We therefore have to add the missing annotations. Instead of annotating all 10k images, we will use active learning to pick the 100 images where we expect the highest return in model improvement and annotate them first.

Let’s have a look at how active learning can help us select the first 100 images for annotation.

Let’s get started

This post is based on the Active Learning using Detectron2 on Comma10k tutorial. If you want to run the code yourself, there is also a ready-to-use Google Colab Notebook.

Getting active learning to work can be really hard. Many companies fail to implement active learning properly and get little to no value out of it. One of the main reasons for this is that they focus only on uncertainty sampling. Uncertainty sampling is one of two big categories of active learning algorithms. In the illustration below, you find the two active learning approaches on the right side.

Knowledge Quadrant — The right column is Active Learning. (see Active Learning with PyTorch)

Uncertainty sampling is probably the most common approach. You pick new samples based on where model predictions have low confidence.

The second approach is diversity sampling. You can use it to diversify the dataset by picking images that are visually or semantically distinct from each other.

The LightlyOne Platform supports both uncertainty sampling and diversity sampling algorithms.

Uncertainty sampling can be used with a variety of scores (least confidence, margin, entropy…).
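
To make the three scores concrete, here is a minimal sketch of how they are commonly defined for a single vector of class probabilities. These are textbook formulations written for this post; they are an assumption and not necessarily the exact definitions used by the LightlyOne Platform.

```python
import math


def least_confidence(probs):
    """1 - highest class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin_uncertainty(probs):
    """1 - (best - second best); a small margin gives a high score."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def entropy(probs):
    """Shannon entropy in nats; highest for a uniform distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Two made-up predictions over three classes.
confident = [0.90, 0.05, 0.05]
unsure = [0.40, 0.35, 0.25]
for score in (least_confidence, margin_uncertainty, entropy):
    print(score.__name__, round(score(confident), 3), round(score(unsure), 3))
```

For all three scores, the unsure prediction gets the higher value and would be picked first.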
For diversity sampling LightlyOne uses the coreset algorithm and embeddings obtained from its open-source self-supervised learning framework LightlySSL.
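
The idea behind coreset selection can be illustrated with the greedy k-center heuristic: repeatedly pick the point that is farthest from everything selected so far. The sketch below is a simplified, pure-Python version written for this post, with made-up 2D embeddings and a hypothetical function name; it is not the implementation used by LightlyOne.

```python
import math


def greedy_coreset(embeddings, n_samples, first_index=0):
    """Greedy k-center selection (assumes n_samples >= 1).

    Repeatedly adds the unselected point that is farthest from its
    nearest already-selected point.
    """
    selected = [first_index]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(e, embeddings[first_index]) for e in embeddings]
    while len(selected) < min(n_samples, len(embeddings)):
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        next_index = max(candidates, key=lambda i: min_dist[i])
        selected.append(next_index)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_index]))
    return selected


# Made-up embeddings: two tight pairs and one isolated point.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
print(greedy_coreset(embeddings, n_samples=3))  # [0, 3, 4]
```

Each of the three selected points comes from a different region of the embedding space, so near-duplicates are skipped.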

However, there is more.

LightlyOne has another algorithm for active learning called CORAL (COReset Active Learning) which uses a combination of diversity and uncertainty sampling.

The goal is to overcome the limitations of the individual methods by selecting images with low model confidence but at the same time making sure that they are visually distinct from each other.
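
One simple way to picture such a combination is to weight the coreset distance by an uncertainty score, so that a sample is only attractive if it is both far from the current selection and uncertain. The sketch below is an assumption made for illustration (function name, weighting, and data are all hypothetical); it is not the actual CORAL implementation.

```python
import math


def uncertainty_weighted_coreset(embeddings, scores, n_samples):
    """Greedy selection maximizing (distance to selection) * uncertainty.

    Illustrative only; assumes n_samples >= 1 and scores in [0, 1].
    """
    # Start with the most uncertain sample.
    first = max(range(len(scores)), key=lambda i: scores[i])
    selected = [first]
    min_dist = [math.dist(e, embeddings[first]) for e in embeddings]
    while len(selected) < min(n_samples, len(embeddings)):
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i] * scores[i])
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[nxt]))
    return selected


# Made-up data: samples 0 and 1 are near-duplicates and both uncertain.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
scores = [0.9, 0.8, 0.2, 0.7]
print(uncertainty_weighted_coreset(embeddings, scores, n_samples=2))  # [0, 3]
```

Pure uncertainty sampling would pick samples 0 and 1 here, which are almost identical. The weighted variant picks sample 3 instead, which is still uncertain but visually distinct.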

Let’s see how we can make use of active learning and the LightlyOne Platform.

Embed and Upload your Dataset

Let’s start by creating embeddings and uploading the dataset to the LightlyOne Platform. We will use the embeddings later for the diversification part of the CORAL algorithm.

You can easily train, embed, and upload a dataset using the LightlySSL Python package.
First, we need to install the package. We recommend using pip for this. Make sure you’re in a Python 3.6+ environment. If you’re on Windows, you should create a conda environment.

To install the latest version of LightlySSL, run pip install lightly in your shell.

Now that we have LightlySSL installed, we can run the lightly-magic command-line tool to train, embed, and upload our dataset. You need to pass a token and a dataset_id argument to the command. You can find both in the LightlyOne Platform after creating a new dataset.

Once you have run the lightly-magic CLI command, you should see the uploaded dataset in the LightlyOne Platform. You can have a look at the 2D visualizations of your dataset. Do you spot the two clusters formed by the day and night images?

2D visualization of the Comma10k dataset on the LightlyOne Platform

Active Learning Workflow

Now that we have a dataset with embeddings uploaded to the LightlyOne Platform, we can start with the active learning workflow. We are interested in the part where you have a trained model and are ready to run predictions on unlabeled data. We start by creating an ActiveLearningAgent. This agent will help us manage the unlabeled images and make sure we interface with the platform properly.

In our case, we don’t have a model loaded yet. Let’s load it from disk and get it ready to run predictions on unlabeled data.

Finally, we can use our pre-trained model and run predictions on the unlabeled data. It’s important that we use the same order of the individual files as we have on the LightlyOne Platform. We can simply do this by iterating over the al_agent.query_set which contains a list of filenames in the right order.
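
The ordering requirement boils down to building the list of predictions by iterating over the filenames, never over the files as they happen to appear on disk. Here is a minimal sketch, where query_set stands in for al_agent.query_set and the detector is replaced by a hypothetical lookup of made-up confidences.

```python
def predict_in_order(filenames, predict_fn):
    """Run predict_fn on each file, keeping predictions aligned with filenames."""
    return [predict_fn(name) for name in filenames]


# Hypothetical stand-ins for al_agent.query_set and a real detector.
query_set = ["0003.png", "0001.png", "0002.png"]
fake_outputs = {"0001.png": [0.9], "0002.png": [0.4, 0.7], "0003.png": []}

predictions = predict_in_order(query_set, fake_outputs.__getitem__)

# predictions[i] now belongs to query_set[i].
for name, prediction in zip(query_set, predictions):
    print(name, prediction)
```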

In order to upload the predictions, we need to turn them into scores. And since we’re working on an “object detection” problem here we use the ScorerObjectDetection.
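
To give an intuition for what "turning predictions into scores" means, the sketch below reduces the detections of each image to a single uncertainty value. The scoring rule (uncertainty of the least confident detection, 0.0 for images without detections) is an assumption chosen for illustration and is not necessarily what ScorerObjectDetection computes.

```python
def image_uncertainty(detection_confidences):
    """Uncertainty of the least confident detection; 0.0 if nothing was detected."""
    if not detection_confidences:
        return 0.0
    return max(1.0 - c for c in detection_confidences)


# Made-up detection confidences for three images, in query order.
predictions = [[0.95, 0.90], [0.55, 0.98], []]
scores = [image_uncertainty(p) for p in predictions]
print([round(s, 2) for s in scores])  # [0.1, 0.45, 0.0]
```

The second image contains a detection the model is unsure about, so it gets the highest score.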

We’re finally ready to query the first batch of images.

Query the first 100 images for labeling

To query data based on the model predictions and our embeddings on the LightlyOne Platform, we can use the .query(...) method of the agent. We can pass it a SamplerConfig object that describes the sampling algorithm we want to run and its parameters.

After querying the 100 new images, we can simply access their filenames using the added_set of the active learning agent.

Congratulations, you did your (first) active learning iteration!
Now you can label the 100 images and train your model with them.
Active learning is usually done in a continuous feedback loop. After training your model using the new data, you would do another iteration: predict and select another batch of images for labeling.
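
The feedback loop can be summarized with the following skeleton. It is a simplified sketch: active_learning_loop and score_fn are hypothetical names, and labeling and retraining are only indicated by comments because they depend on your own tooling.

```python
def active_learning_loop(pool, score_fn, batch_size, n_rounds):
    """Repeatedly move the highest-scoring unlabeled samples into the labeled set."""
    unlabeled = list(pool)
    labeled = []
    for _ in range(n_rounds):
        if not unlabeled:
            break
        # In practice: run the current model on the unlabeled data and score it.
        scores = score_fn(unlabeled, labeled)
        ranked = sorted(zip(scores, unlabeled), key=lambda pair: pair[0], reverse=True)
        batch = [name for _, name in ranked[:batch_size]]
        # In practice: annotate the batch, then retrain the model on all labels.
        labeled.extend(batch)
        unlabeled = [name for name in unlabeled if name not in batch]
    return labeled, unlabeled


# Made-up, fixed uncertainty scores in place of a real model.
fixed_scores = {"a.png": 0.1, "b.png": 0.9, "c.png": 0.5, "d.png": 0.7}
labeled, unlabeled = active_learning_loop(
    pool=fixed_scores,
    score_fn=lambda unl, lab: [fixed_scores[name] for name in unl],
    batch_size=2,
    n_rounds=1,
)
print(labeled)    # ['b.png', 'd.png']
print(unlabeled)  # ['a.png', 'c.png']
```

With a real model, the scores would change after every retraining step, which is what makes each new batch different from the previous one.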

I hope you got a good idea of how you can use active learning for your next computer vision project. For more information, check out the Active Learning using Detectron2 on Comma10k tutorial.

Igor Susmelj, co-founder
Lightly