Trusted by top ML teams

Build state-of-the-art ML Pipelines

with Active Learning

Lightly selects the subset of your data with the biggest impact on model accuracy, allowing you to improve your model iteratively by using the best data for retraining.

Get the most out of your data by reducing redundancy and bias and by focusing on edge cases.

Scale

Select the best 1% from millions of images or videos

Lightly's algorithms can process large volumes of data (e.g., 10k videos or 10M images) in less than 24 hours

Automation

Use our API to automate the whole data selection process

Connect Lightly to your existing Cloud buckets and process new data automatically

Science

Use state-of-the-art active learning algorithms

Lightly combines active- and self-supervised learning algorithms for data selection

Features that lift you to the next level

Left side: Python code with commands to run Lightly
Right side: Input/Output Dataset Class Distribution graph

We help customers achieve up to

90%

lower labeling costs

Data redundancy can not only hurt model performance but also create significant costs for data labeling, storage, and compute. Don't waste your money on bad data!

Tired of selecting only 1 frame per minute to reduce the data load?
Want to take the randomness out of your data selection?
Manual picking doesn't give you the best results?

Lightly can help with all of that and makes selecting the best data easy

Python script showing commands to run Lightly

20%

better models

Selecting training data that is difficult for your model can yield significant gains in accuracy. With Lightly, you can apply active and self-supervised learning approaches.
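One common way to find data that is difficult for a model is uncertainty sampling: rank unlabeled samples by the entropy of the model's predicted class probabilities and send the highest-ranked ones to labeling. The sketch below is a generic illustration, not Lightly's API or its actual selection algorithm. The function names and the sample predictions are hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k samples with the highest prediction entropy."""
    ranked = sorted(
        predictions,
        key=lambda sample_id: prediction_entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical softmax outputs for four unlabeled images (assumed data).
predictions = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_002": [0.40, 0.35, 0.25],
    "img_003": [0.70, 0.20, 0.10],
    "img_004": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
}

selected = select_most_uncertain(predictions, k=2)
print(selected)
```

Here the two images with the flattest predicted distributions are chosen, while the confidently classified image is left out.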

Python code showing commands to run Lightly

Send the best subset of your data to labeling at the click of a button
Trigger retraining and model deployment
Automatically build datapools and datasets

2x

faster retraining cycles

Managing your data and machine learning pipeline efficiently saves a lot of time and reduces errors. Leave hacky in-house solutions and scripts behind for a scalable and reliable solution.

Take the human error factor out of your data curation equation
Get access to cutting-edge data curation technology
No need to implement research papers yourself

Research papers
1) The 10% you don't need, Google, CVPR 2019
2) Less is More: An exploration of data redundancy with active data subsampling
Tweet from Andrej Karpathy:
"We see more significant improvements from training data distribution search (data splits + oversampling factor ratios) than neural architecture search. The latter is so overrated :)"

We trust in hard math and base Lightly on it

Ask our customers

"After training a model on the filtered data suggested by Lightly, I saw a dramatic increase in performance on our key metrics. Part of this is certainly because this was the first time we trained a model on any data that we've collected, but I'm fairly certain that performance would not have been as good if we had chosen what data to label at random."

Angelo Stekardis

Former Computer Vision Lead

CurbFlow

“By integrating Lightly into our existing workflow, we achieved a 90% reduction in dataset size and doubled the efficiency of our deployment process. The tool’s seamless implementation significantly enhanced our data pipeline.”

Usman Khan

Sr. Data Scientist

Aigen

"Through Lightly we were able to see, that a lot of data being collected was not meaningful enough for training an accurate model. This led us to change the way we gathered data and allowed us to ultimately create a much more information-dense and higher-quality dataset overall. Needless to say, the performance of our final model was greatly improved."

Nasib Adriano Naimi

Autonomy Engineer

DroGone

“Lightly gave us transparency to a part of the ML development that is a black box, data. Furthermore, Lightly enabled us to do Active Learning at scale and helped us improve recall and F1-score of our object detector by 32% and 10% compared to our previous data selection method. We finally saw the light in our data using Lightly.”

Gonzalo Urquieta

Project Leader

Lythium

"Lightly is hyper-focused on finding thousands of relevant images from millions of video frames to improve deep learning models. The Lightly platform enabled us to build models and deploy features more than 2x faster and unlock completely new development workflows. I can recommend every MLOps team with a lot of data to integrate Lightly."

Isura Ranatunga

Co-Founder and CTO

Rabot

"I was truly amazed once we received the results of Lightly. We knew we had a lot of similar images due to our video feed but the results showed us how we can work more efficiently by selecting the right data"

Alejandro Garcia

CEO

AI Retailer Systems

4 easy steps to configure your ML pipeline

Connect

Connect Lightly to your data, whether it is stored locally or in GCP, Azure, or AWS S3 buckets. Your data stays on your infrastructure, which keeps it secure.

Configure

Use a combination of model predictions, embeddings, and metadata to reach your desired data distribution
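As a rough illustration of how these three signals can be combined, the sketch below filters a pool by metadata, seeds the selection with the sample the model is least sure about, and then greedily adds the sample whose embedding is farthest from everything already chosen (farthest-point sampling). It is a generic example under stated assumptions, not Lightly's configuration format or algorithm. The dictionary keys, the `keep` filter, and the sample data are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(samples, k, keep=lambda meta: True):
    """Greedy farthest-point selection over embeddings after a metadata filter.

    Each sample is a dict with the hypothetical keys "id", "embedding",
    "uncertainty" (e.g. 1 - max softmax probability) and "meta".
    """
    pool = [s for s in samples if keep(s["meta"])]
    if not pool or k <= 0:
        return []
    # Seed with the sample the model is least sure about.
    chosen = [max(range(len(pool)), key=lambda i: pool[i]["uncertainty"])]
    while len(chosen) < min(k, len(pool)):
        remaining = [i for i in range(len(pool)) if i not in chosen]
        # Add the sample farthest from its nearest already-chosen neighbour.
        best = max(
            remaining,
            key=lambda i: min(
                euclidean(pool[i]["embedding"], pool[j]["embedding"])
                for j in chosen
            ),
        )
        chosen.append(best)
    return [pool[i]["id"] for i in chosen]


# Assumed toy data: 2-D embeddings, uncertainty scores, and weather metadata.
samples = [
    {"id": "a", "embedding": [0.0, 0.0], "uncertainty": 0.10, "meta": {"weather": "sunny"}},
    {"id": "b", "embedding": [0.1, 0.0], "uncertainty": 0.60, "meta": {"weather": "sunny"}},
    {"id": "c", "embedding": [5.0, 5.0], "uncertainty": 0.30, "meta": {"weather": "rain"}},
    {"id": "d", "embedding": [0.0, 4.0], "uncertainty": 0.20, "meta": {"weather": "sunny"}},
    {"id": "e", "embedding": [9.0, 9.0], "uncertainty": 0.90, "meta": {"weather": "night"}},
]

# Exclude night-time images, then pick three diverse samples.
picked = select_diverse(samples, k=3, keep=lambda meta: meta["weather"] != "night")
print(picked)
```

Sample "a" is skipped because it is nearly identical to "b" in embedding space, which is the kind of redundancy that diversity-based selection is meant to avoid.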

Run

Process data on your infrastructure using a Docker container. Our solution streams data from the bucket without cluttering disks.

Use

Get your curated dataset labeled, train your machine learning model, and check the accuracy improvement

Integrate with your ML Stack

Designed to plug seamlessly into your favorite storage, tooling, and service providers so you can build an automated machine learning data pipeline with a closed-loop feedback cycle.

Data Storage

Label Tooling

Model tooling

Select the right training data
for Computer Vision

Features that lift you to the next level

Data Selection

Data Insights

Pipeline Management

Easy Integration & Security