Trusted by top ML teams
Audi, Frontify, ARM, Nautilus, Nauto, Syngenta

Build state-of-the-art ML Pipelines with Active Learning

Lightly selects the subset of your data with the biggest impact on model accuracy, allowing you to improve your model iteratively by using the best data for retraining.

Get the most out of your data by reducing redundancy and bias and by focusing on edge cases.

Scale

Select the best 1% from millions of images or videos

Lightly's algorithms can process large datasets (e.g., 10k videos or 10M images) in less than 24 hours.

Automation

Use our API to automate the whole data selection process

Connect Lightly to your existing cloud buckets and process new data automatically

Science

Use state-of-the-art active learning algorithms

Lightly combines active- and self-supervised learning algorithms for data selection
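To illustrate what diversity-based data selection can look like, here is a minimal sketch of one well-known technique, greedy k-center (farthest-point) selection over embeddings. It assumes you already have one embedding vector per image, for example from a self-supervised model. The function name `select_diverse` is hypothetical; this is a generic textbook method, not Lightly's own algorithm or API.

```python
import math
from typing import List, Sequence


def select_diverse(embeddings: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedy k-center (farthest-point) selection (hypothetical helper).

    Each new pick is the sample farthest from everything selected so far.
    Near-duplicates sit close together in embedding space, so they are
    unlikely to be picked twice.
    """
    n = len(embeddings)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    selected = [0]  # start from an arbitrary sample (here: the first one)
    chosen = {0}
    # Distance from every sample to its nearest already-selected sample.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < k:
        # Pick the unselected sample that is worst covered by the selection.
        next_idx = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(next_idx)
        chosen.add(next_idx)
        # Update each sample's distance to its nearest selected sample.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_idx]))
    return selected


# Three well-separated clusters; two of them contain near-duplicates.
points = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.0, 0.1], [0.0, 10.0]]
print(select_diverse(points, 3))  # → [0, 3, 4]
```

The selection returns one sample per cluster instead of spending the budget on near-identical images, which is the intuition behind reducing redundancy before labeling.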

We help customers cut labeling costs by up to 90%

Redundant data not only hurts model performance but also drives up costs for labeling, storage, and compute. Don't waste your money on bad data!

  • Tired of selecting only 1 frame per minute to reduce the data load?
  • Want to take the randomness out of your data selection?
  • Is manual picking not giving you the best results?

Lightly can help with all of this and makes selecting the best data easy.
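As a contrast to fixed-rate sampling such as one frame per minute, here is a minimal sketch of content-aware frame selection: a frame is kept only if its embedding differs enough from the last kept frame. It assumes each frame already has an embedding vector; the helper name `keep_changed_frames` and the `threshold` value are hypothetical, and this simple threshold rule is an illustration, not Lightly's method.

```python
import math
from typing import List, Sequence


def keep_changed_frames(frame_embeddings: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Keep a frame only if it differs enough from the last kept frame (hypothetical helper).

    Unlike fixed-rate subsampling, this keeps many frames when the scene
    changes quickly and few frames when the scene is static.
    """
    kept: List[int] = []
    for i, emb in enumerate(frame_embeddings):
        # Always keep the first frame; afterwards compare to the last kept one.
        if not kept or math.dist(emb, frame_embeddings[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Frames 0-2 are nearly identical; frame 3 shows a new scene, frame 4 repeats it.
frames = [[0.0, 0.0], [0.01, 0.0], [0.02, 0.01], [1.0, 1.0], [1.0, 1.01]]
print(keep_changed_frames(frames, threshold=0.5))  # → [0, 3]
```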

20% better models

Selecting training data that your model finds difficult can yield significant accuracy gains. Lightly lets you apply active and self-supervised learning approaches to find exactly that data.
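One common way to find samples that are difficult for a model is uncertainty sampling: rank unlabeled samples by the entropy of the model's predicted class probabilities and send the most uncertain ones to labeling. This is a minimal sketch assuming per-sample softmax outputs are available as plain lists; `entropy` and `select_uncertain` are hypothetical helpers, not part of Lightly's API.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain(predictions: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k samples with the highest prediction entropy."""
    ranked = sorted(range(len(predictions)), key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:k]


preds = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # very uncertain prediction
    [0.60, 0.30, 0.10],  # somewhat uncertain prediction
]
print(select_uncertain(preds, 2))  # → [1, 2]
```

Entropy is only one possible uncertainty score; least-confidence or margin between the top two classes are common alternatives.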

2x faster retraining cycles

Managing your data and machine learning pipeline efficiently saves time and reduces errors. Replace hacky in-house scripts with a scalable, reliable solution.

  • Send the best subset of your data to labeling at the click of a button
  • Trigger retraining and model deployment
  • Automatically build datapools and datasets
  • Take the human error factor out of your data curation equation
  • Get access to cutting-edge data curation technology
  • No need to implement research papers yourself
Research papers
1) The 10% You Don't Need, Google, CVPR 2019
2) Less is More: An Exploration of Data Redundancy with Active Data Subsampling
Tweet from Andrej Karpathy:
"We see more significant improvements from training data distribution search (data splits + oversampling factor ratios) than neural architecture search. The latter is so overrated :)"

We believe in rigorous math and build Lightly on it

Ask our customers

“Lightly enabled us to improve our ML data pipeline in all regards: Selection, Efficiency, and Functionality. This allowed us to cut customer onboarding time by 50% while achieving better model performance.”

Harishma Dayanidhi

Co-Founder and VP of Engineering

Voxel

"Through this collaboration, SDSC and Lightly have combined their expertise to revolutionize the process of frame selection in surgical videos, making it more efficient and accurate than ever before to find the best subset of frames for labeling and model training."

Margaux Masson-Forsythe

Director of Machine Learning

SDSC

"After training a model on the filtered data suggested by Lightly, I saw a dramatic increase in performance on our key metrics. Part of this is certainly because this was the first time we trained a model on any data that we've collected, but I'm fairly certain that performance would not have been as good if we had chosen what data to label at random."

Angelo Stekardis

Former Computer Vision Lead

CurbFlow

"Lightly is hyper-focused on finding thousands of relevant images from millions of video frames to improve deep learning models. The Lightly platform enabled us to build models and deploy features more than 2x faster and unlock completely new development workflows. I can recommend every MLOps team with a lot of data to integrate Lightly."

Isura Ranatunga

Co-Founder and CTO

Rabot

"Lightly helped us understand more about our own data-gathering process. Through their service, we were able to see that a lot of the data being collected was not meaningful enough for training an accurate model. This led us to change the way we gathered data and allowed us to ultimately create a much more information-dense and higher-quality dataset overall. Needless to say, the performance of our final model was greatly improved."

Nasib Adriano Naimi

Autonomy Engineer

DroGone

“Lightly gave us transparency to a part of the ML development that is a black box, data. Furthermore, Lightly enabled us to do Active Learning at scale and helped us improve recall and F1-score of our object detector by 32% and 10% compared to our previous data selection method. We finally saw the light in our data using Lightly.”

Gonzalo Urquieta

Project Leader

Lythium

4 easy steps to configure your ML pipeline

1. Connect

Connect Lightly to your data in GCP, Azure, or AWS S3 buckets. Your data stays on your own infrastructure, which keeps it secure.

2. Configure

Use a combination of model predictions, embeddings, and metadata to reach your desired data distribution
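As a rough illustration of steering a selection toward a target distribution, this sketch combines two of the signals named above: a metadata field (to balance the selection across groups) and model predictions (to prefer low-confidence samples within each group). Embeddings are left out to keep it short. The field names `filename`, `weather`, and `confidence`, as well as the helper `select_balanced`, are hypothetical and not part of Lightly's configuration format.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def select_balanced(samples: Sequence[Dict], group_key: str, per_group: int) -> List[str]:
    """Pick up to `per_group` samples for each value of a metadata field (hypothetical helper).

    Within each group, samples with the lowest model confidence come first,
    so the selection is balanced across metadata values and focused on hard examples.
    """
    groups: Dict[str, List[Dict]] = defaultdict(list)
    for s in samples:
        groups[s[group_key]].append(s)
    selected: List[str] = []
    for value in sorted(groups):  # deterministic group order
        hardest = sorted(groups[value], key=lambda s: s["confidence"])
        selected.extend(s["filename"] for s in hardest[:per_group])
    return selected


samples = [
    {"filename": "a.jpg", "weather": "sunny", "confidence": 0.95},
    {"filename": "b.jpg", "weather": "sunny", "confidence": 0.40},
    {"filename": "c.jpg", "weather": "sunny", "confidence": 0.70},
    {"filename": "d.jpg", "weather": "rain", "confidence": 0.55},
    {"filename": "e.jpg", "weather": "rain", "confidence": 0.90},
]
print(select_balanced(samples, "weather", per_group=1))  # → ['d.jpg', 'b.jpg']
```

Although sunny images dominate the pool, the selection contains the same number of rainy and sunny images, each being the one the model is least sure about.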

3. Run

Process data on your own infrastructure using a Docker container. Data is streamed directly from your bucket, so it doesn't clutter your disks.

4. Use

Get your curated dataset labeled, train your machine learning model, and check the accuracy improvement

Integrate with your ML Stack

Designed to plug seamlessly into your favorite storage, tooling, and service providers, so you can build an automated machine learning data pipeline with a closed-loop feedback cycle.

Data Storage

Amazon S3
Google Cloud
Microsoft Azure

Label Tooling

Sama
V7 Labs
Scale
CVAT
Labelbox
Label Studio

Model Tooling

PyTorch
TensorFlow
Weights & Biases

Improve your data
Today is the day to get the most out of your data. Share our mission with the world and unleash your data's true potential.
Contact us