Training data is the collection of examples used to fit a machine learning model’s parameters. It consists of input features (and, in supervised learning, the corresponding labels or target values) from which the learning algorithm learns patterns that it later uses to make predictions. For instance, to train a face detection model, the training dataset might include thousands of images in which each face is annotated (labeled) with a bounding box. The quality and representativeness of training data are critical: a model can only learn patterns that are present in the data it sees during training. Typically, the available data is partitioned into training, validation, and test sets. The model is optimized on the training set, while the validation set guides choices such as hyperparameters and the test set is held out for a final performance estimate. Good training data should be large, diverse, and accurately labeled so that the model generalizes well to unseen inputs. Issues such as label noise or sampling bias in training data can lead to poor model behavior, so data curation and augmentation are important steps in the ML pipeline.
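The partition into training, validation, and test sets described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the function name `split_dataset`, the 70/15/15 proportions, the fixed seed, and the toy `(feature, label)` pairs are all assumptions made for the example.

```python
import random


def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle examples and partition them into train, validation, and test lists.

    Hypothetical helper for illustration; the fractions are arbitrary defaults.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")

    shuffled = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility

    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)

    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]      # everything left over is used for fitting
    return train, val, test


# Toy supervised dataset: (feature, label) pairs, purely illustrative.
data = [(x, x % 2) for x in range(100)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Shuffling before slicing guards against any ordering in the source data leaking into one split. For class-imbalanced or grouped data, a stratified or group-aware split is usually a better choice than this simple random one.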