A dataset is a collection of data used for analysis or for training models. It is often presented in tabular form (rows as instances, columns as features) or as a set of structured records. In machine learning, datasets are typically divided into training, validation, and test sets. Specialized formats exist for different domains: in computer vision, a dataset might be a set of images with annotations (labels, bounding boxes, etc.), while in NLP it could be a text corpus with labels (such as sentiment). Famous ML datasets include MNIST (handwritten digits), ImageNet (images for classification), and COCO (images with object annotations). Key aspects of a dataset include its size, its feature structure, and how well it represents the problem domain.
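The division into training, validation, and test sets can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name `split_dataset`, the 80/10/10 proportions, and the toy records are all assumptions chosen for the example.

```python
import random


def split_dataset(records, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle records and split them into train, validation, and test lists.

    The fractions and the seed are illustrative defaults, not recommendations.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    # Copy before shuffling so the caller's data is left untouched.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_val - n_test

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy tabular dataset: each row is an instance, each key is a feature.
rows = [{"id": i, "feature": i * 0.5, "label": i % 2} for i in range(100)]
train, val, test = split_dataset(rows, val_fraction=0.1, test_fraction=0.1, seed=42)
print(len(train), len(val), len(test))
```

Shuffling with a fixed seed keeps the split reproducible. A purely random split like this one does not guarantee that each subset is representative; for imbalanced labels, a stratified split is usually a better fit.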