Introducing LightlyTrain: Better Vision Models, Faster - No Labels Needed

LightlyTrain lets you pretrain vision models on your own unlabeled data — no labels needed. Improve performance on classification, detection, and segmentation tasks while reducing labeling costs and speeding up deployment. Compatible with popular architectures like YOLO, RT-DETR, and ResNet, LightlyTrain adapts models to your domain and integrates easily into existing pipelines — fully on-premise and scalable to millions of images.


TL;DR

We are excited to announce LightlyTrain: a solution that makes foundation models work on your data.

Pretrained models have enabled huge breakthroughs in many computer vision applications. However, they are typically trained on generic datasets like ImageNet or COCO, which limits their effectiveness in domain-specific applications.

LightlyTrain bridges this gap by unlocking the potential of foundation models pretrained on your own data. Our self-supervised pretraining framework is tailored for industrial applications. With LightlyTrain, teams can pretrain models on unlabeled data from their own domain and see substantial gains in performance across classification, detection, and segmentation tasks.

Models pretrained with LightlyTrain achieve consistently higher performance across different model architectures and dataset sizes, outperforming both ImageNet-pretrained models and models trained from scratch.

Figure 1: On COCO, YOLOv8-s models pretrained with LightlyTrain achieve high performance across all tested label fractions. These improvements hold for other architectures like YOLOv11, RT-DETR, and Faster R-CNN.

Bringing Pretraining to Industry

Pretraining on your own data works. We already built one of the most widely used self-supervised learning frameworks for research. But industry adoption lagged: the expertise required to make pretraining work kept it out of reach for most teams. So we set out to make pretraining easy. LightlyTrain is the result of many iterations with our clients: we abstracted away the nitty-gritty details and focused on what teams really care about: a simple way to train better models.

Why LightlyTrain?

LightlyTrain helps industry clients unlock the potential of foundation models on domain-specific tasks. By pretraining a model on your unlabeled, domain-specific data, you significantly reduce the amount of labeling needed to reach high model performance. LightlyTrain therefore reduces labeling costs and speeds up model deployment.

This allows you to focus on new features and new domains instead of managing your labeling cycles. LightlyTrain is designed for simple integration into existing training pipelines and supports a wide range of model architectures and use-cases out of the box. It is available as a Python package or Docker container and runs fully on-premises.

Key value proposition:

  • No Labels Required: Speed up development by pretraining models on your unlabeled image and video data.
  • Domain Adaptation: Improve models by pretraining on your domain-specific data (e.g. video analytics, agriculture, automotive, healthcare, manufacturing, retail, and more).
  • Model & Task Agnostic: Compatible with any architecture and task, including detection, classification, and segmentation.
  • Industrial-Scale Support: LightlyTrain scales from thousands to millions of images and supports on-prem, cloud, single-GPU, and multi-GPU setups.

Supported models and libraries:

  • YOLOv5–v12, RT-DETR, ResNet, Vision Transformers, and more.
  • Torchvision, Ultralytics, TIMM, SuperGradients, and more.
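
Models from all of these libraries plug into the same train() entry point; the model is selected with a single "library/model" identifier string. As a rough sketch (the exact identifiers below, such as "ultralytics/yolov8s.yaml", are assumptions, so check the LightlyTrain documentation for the names your version supports):

import lightly_train

if __name__ == "__main__":
    # Pretrain a YOLOv8-s backbone instead of a torchvision ResNet.
    lightly_train.train(
        out="out/yolo_pretrain",
        data="my_data_dir",
        model="ultralytics/yolov8s.yaml",  # assumed identifier; TIMM models etc. follow the same "library/model" pattern
    )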

Benchmarks

LightlyTrain has been benchmarked across multiple regimes: 

  • Diverse model architectures
  • Different dataset sizes
  • Domain-specific datasets

1. LightlyTrain Model Architecture Generalization

This benchmark highlights how LightlyTrain works with various models. For this purpose, the models were first pretrained with LightlyTrain on the full COCO dataset without labels. Then each model was fine-tuned with labels using only 10% of the COCO dataset to highlight typical industry use-cases where datasets are only partially labeled. All models pretrained with LightlyTrain show improved detection performance compared to their ImageNet or non-pretrained counterparts.
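
In recipe form, this benchmark corresponds to a two-stage run roughly like the sketch below (the dataset paths, the 10% split file, and the Ultralytics model identifier are illustrative assumptions, not the exact benchmark setup):

import lightly_train
from ultralytics import YOLO

if __name__ == "__main__":
    # Stage 1: self-supervised pretraining on all COCO images, no labels used.
    lightly_train.train(
        out="out/coco_pretrain",
        data="coco/images",                # directory of unlabeled images
        model="ultralytics/yolov8s.yaml",  # assumed model identifier
    )

    # Stage 2: supervised fine-tuning on the 10% labeled subset.
    model = YOLO("out/coco_pretrain/exported_models/exported_last.pt")
    model.train(data="coco_10pct.yaml", epochs=100)  # hypothetical split config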

YOLO Series

YOLO is one of the most popular model series for object detection tasks. Pretraining with LightlyTrain yields up to 14% higher mAP compared to starting from ImageNet pretrained weights and up to 34% higher mAP compared to no pretraining. We observe consistent improvements for different YOLO versions (v8, v11, and v12) and model sizes (S and L).

Figure 2: Object detection performance on 10% of the COCO dataset for different YOLO versions. All versions benefit from pretraining with LightlyTrain. Note that no supervised ImageNet weights are available for YOLOv12.

RT-DETR

RT-DETR is a modern alternative to YOLO object detectors. For RT-DETR, pretraining with LightlyTrain yields a +1.2% mAP improvement over initializing the model with supervised ImageNet weights.

Faster R-CNN

Classic architectures like Faster R-CNN, which combines a ResNet50 backbone with a detection head, also benefit from pretraining with LightlyTrain. In this case, we observe up to 3.6% higher mAP compared to ImageNet weights. This showcases how LightlyTrain can be integrated into established training pipelines to further improve model performance.

Figure 3: Object detection performance on 10% of the COCO dataset for RT-DETR and Faster R-CNN.

2. LightlyTrain Performance Across Dataset Sizes

The following plot highlights the benefits of pretraining across different dataset sizes. Especially when only little labeled data is available (left side of the plot), the benefit of starting from a pretrained model becomes even more evident: when fine-tuning on only 1,200 labeled images, LightlyTrain yields a +50% mAP gain over ImageNet weights.

This can be attributed to the pretraining, which has access to a large number of unlabeled images from the COCO dataset. The pretrained model is therefore already well adapted to images from this dataset before fine-tuning. The ImageNet model, on the other hand, is pretrained on different data and sees only the 1,200 COCO images during fine-tuning, making it much harder for the model to adapt to the new dataset.

Even with larger dataset sizes, LightlyTrain pretraining continues to outperform the baseline by a large margin.

Figure 4: Object detection performance on the COCO dataset for different fine-tune label fractions. LightlyTrain improves model performance across all tested label fractions.

3. Domain-Specific Results

There are many pretrained models publicly available. However, most of them are trained on a few well-known datasets like ImageNet or COCO. While such models have demonstrated great performance on many tasks, they struggle to generalize to data that is significantly different from the data they were originally trained on. With LightlyTrain you can avoid this issue by pretraining models on data from the same domain as your downstream task.

Automotive

BDD100K (Automotive - Detection, RT-DETR)

Pretraining on unlabeled video frames from BDD100K allows LightlyTrain to deliver the best performance when fine-tuning on labeled driving scenes.

Figure 5: Object detection performance on the BDD100K autonomous driving dataset.

Medical

DeepLesion (Medical - Detection, YOLO11x)

Under strong domain shifts like medical imaging, LightlyTrain also outperforms ImageNet pretraining. In this case, we observe a +1.1% mAP improvement on a lesion detection task on CT scan images.

Figure 6: Object detection performance on the DeepLesion dataset.

Agriculture

DeepWeeds (Agriculture - Classification, ResNet50)

LightlyTrain helps models adapt more effectively to agricultural data, outperforming ImageNet pretraining by +4.3% top-1 accuracy on a classification task.

Figure 7: Classification performance on the DeepWeeds dataset.

How to Get Started with LightlyTrain

Getting started with LightlyTrain is easy.

Installation:

pip install lightly-train

Then start pretraining with:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",       # Output directory
        data="my_data_dir",            # Directory with images
        model="torchvision/resnet50",  # Model to train
    )

Finally, load the pretrained model and fine-tune it using your existing training pipeline:

import torch
from torchvision import models

# Load the pretrained model
model = models.resnet50()
model.load_state_dict(torch.load("out/my_experiment/exported_models/exported_last.pt"))

# Fine-tune the model with your existing training pipeline
...
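
As an illustration, a minimal classification fine-tuning loop might look like the sketch below; the dataset path, class count, and hyperparameters are placeholders, assuming a labeled dataset in torchvision ImageFolder layout:

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 10  # placeholder: number of classes in your labeled dataset

# Load the LightlyTrain checkpoint and attach a fresh classification head.
model = models.resnet50()
model.load_state_dict(torch.load("out/my_experiment/exported_models/exported_last.pt"))
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)

# Placeholder labeled dataset in ImageFolder layout: my_labeled_data/<class>/<image>.
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
loader = DataLoader(
    datasets.ImageFolder("my_labeled_data", transform=transform),
    batch_size=32,
    shuffle=True,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for images, labels in loader:  # one epoch; wrap in an outer loop as needed
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()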

Resources: 

LightlyTrain is available as a Python package and Docker container. All training runs can be executed fully on-premise — no internet access or telemetry required.

Licensing & Commercial Usage

LightlyTrain is available under AGPL-3.0 for open-source use. Commercial licenses are available — book a demo with our team to learn more.
