Customer Success Stories

From Millions of Road Images to High-Value Training Data: How Greenwood Engineering Uses LightlyTrain

Lightly helped Greenwood Engineering curate millions of road-surface images by training a custom DINOv2 model with LightlyTrain, enabling a more effective selection of high-value samples for labeling and model training.

Vijay Gill Hansted
Machine Learning Engineer
Overview


Industry: Manufacturing
Location: Brøndby, Denmark
Employees: <50

Get Started with Lightly

Talk to Lightly’s computer vision team about your use case.
Book a Demo
Products: LightlyTrain
Results: 2M+ curated images
Use case: Data curation

About

Greenwood Engineering, based in Denmark, develops advanced measurement systems used across road, rail, and airport infrastructure. Their equipment continuously records high-resolution road-surface images, generating extremely large datasets that capture the texture, condition, and marking patterns of road networks at scale.

As the number of captured images grew into the millions, Greenwood began exploring machine learning approaches to classify surface types, detect patterns, and support automated quality assessment.

Problem

While collecting data wasn’t an issue, making it useful was the real challenge.

Greenwood Engineering trains models that detect and measure road-surface defects, such as cracks and potholes, and that assess lane-marking quality. But labeling millions of images was infeasible, and manually selecting which samples to label was slow, repetitive, and prone to redundancy:

  • Many images captured nearly identical road segments.
  • Important edge cases were buried in the dataset.
  • Labeling at scale was costly, and manual filtering didn’t scale.

On top of that, manual sampling lacked consistency.

Testimonials

"We collect millions of road surface images, but since surface imagery is highly spatially correlated, labelling every sample is redundant, and finding sets of diverse data was a challenge."

Vijay Gill Hansted

Machine Learning Engineer

Scalable and Efficient Data Curation using Lightly

To curate their dataset efficiently, Greenwood used LightlyTrain to train their own DINOv2 model on unlabeled road-surface images. Because it was trained on their own data, the resulting model captures differences between road-surface conditions much better than an off-the-shelf model. These domain-specific representations made data curation the most effective lever for improving model performance, and LightlyTrain enabled that shift.

Using LightlyTrain and the custom DINOv2 model, the team generated embeddings for their entire dataset. These embeddings gave them a scalable way to explore the data, run similarity search, remove redundancy across millions of road-surface images, and extract valuable samples for labeling.
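The embedding workflow described above can be illustrated with a minimal, library-free Python sketch. It assumes each image has already been turned into an embedding vector (in Greenwood's case, by the custom DINOv2 model). The helper names `knn` and `deduplicate`, the toy vectors, and the similarity threshold are hypothetical illustrations, not LightlyTrain's API or Greenwood's actual pipeline.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def knn(query, embeddings, k):
    """Brute-force kNN: indices of the k embeddings most similar to `query`."""
    q = normalize(query)
    sims = [cosine(q, normalize(e)) for e in embeddings]
    # Stable sort, so ties keep their original (lower-index-first) order.
    return sorted(range(len(embeddings)), key=lambda i: -sims[i])[:k]


def deduplicate(embeddings, threshold):
    """Greedily keep an image only if it is below `threshold` similarity to every kept image."""
    unit = [normalize(e) for e in embeddings]
    kept = []
    for i, e in enumerate(unit):
        if all(cosine(e, unit[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical toy 3-D "embeddings"; real DINOv2 embeddings have far more dimensions.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.01, 0.0],  # near-duplicate of image 0, e.g. the next frame on the same road stretch
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(knn([1.0, 0.0, 0.0], embeddings, k=2))    # [0, 1]
print(deduplicate(embeddings, threshold=0.95))  # [0, 2, 3]
```

The pairwise loops here are quadratic in the number of images, so at a scale of millions of images an approximate nearest-neighbor index (for example FAISS) would be the more realistic choice. The 0.95 threshold is likewise an arbitrary example value.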

Why Curation Was Essential

With a dataset of this size, the team quickly reached the point where finding relevant samples for labeling required manually inspecting hundreds or thousands of images.

What they needed instead was a clearer understanding of which images were actually informative, and LightlyTrain helped provide that structure. The team's experience had shown that:

  • Training improvements from additional labeled data were modest
  • Adding labels without addressing redundancy led to diminishing returns
  • The dataset needed to be organized before annotation could have impact

Results

LightlyTrain helped Greenwood organize their large road-surface dataset into a meaningful embedding space. This gave the team a much clearer understanding of where redundancy existed, which textures and markings were visually similar, and which samples were diverse enough to prioritize for labeling.

With this visibility, annotation could focus on the examples most likely to improve downstream models. The embeddings let the team:

  • Identify clusters of similar road textures and markings
  • Run k-nearest-neighbor (kNN) queries to find visually related samples
  • Detect redundant images across long road stretches
  • Build a more balanced and representative labeled subset
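Building a balanced, representative labeled subset from embeddings is often done with a diversity-based selection strategy. The sketch below uses greedy k-center selection (also known as farthest-point sampling) as one such technique. It is an illustrative assumption, not a description of how Lightly or Greenwood select samples, and the function name and toy 2-D vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_center_greedy(embeddings, budget, start=0):
    """Select up to `budget` diverse indices with greedy k-center (farthest-point) sampling."""
    n = len(embeddings)
    if n == 0 or budget <= 0:
        return []
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [euclidean(e, embeddings[start]) for e in embeddings]
    while len(selected) < min(budget, n):
        chosen = set(selected)
        # Pick the unselected point that is farthest from the current selection.
        next_idx = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(next_idx)
        # Update nearest-selected distances with the newly added point.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[next_idx]))
    return selected


# Hypothetical toy embeddings.
embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],  # cluster A: near-identical asphalt frames
    [5.0, 5.0], [5.1, 5.0],              # cluster B: e.g. worn lane markings
    [10.0, 0.0],                         # a rare edge case
]
print(k_center_greedy(embeddings, budget=3))  # [0, 5, 3]
```

Because each new pick maximizes the distance to everything already chosen, near-duplicate frames from the same road stretch are deprioritized, while rare appearances are reached early. The same property means outliers such as corrupted frames are also favored, so they may need to be filtered out first.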


