In computer vision, embedding spaces are vector representations where images or image regions are mapped into a continuous space that captures visual similarity and semantic meaning. Models learn to project images into these spaces such that visually or conceptually similar images are close together, while dissimilar ones are far apart.
These embeddings are typically produced by convolutional neural networks (CNNs) or vision transformers trained using classification, contrastive, or self-supervised objectives. Common applications include image retrieval, clustering, active learning, anomaly detection, and similarity-based search.
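Similarity-based search over such embeddings usually reduces to comparing vectors with cosine similarity. A minimal sketch, using toy hand-written vectors in place of real CNN/ViT outputs (the `retrieve` helper and the 4-dimensional embeddings are illustrative, not from any particular library):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query, gallery, k=2):
    """Return indices of the k gallery embeddings most similar to the query."""
    sims = np.array([cosine_similarity(query, g) for g in gallery])
    return np.argsort(-sims)[:k]

# Toy 4-dimensional embeddings standing in for model outputs.
gallery = np.array([
    [1.0, 0.0, 0.0, 0.0],   # image A
    [0.9, 0.1, 0.0, 0.0],   # image B (visually similar to A)
    [0.0, 0.0, 1.0, 0.0],   # image C (dissimilar)
])
query = np.array([0.95, 0.05, 0.0, 0.0])
print(retrieve(query, gallery))  # nearest neighbors: images A and B
```

In a well-trained embedding space, this simple nearest-neighbor lookup is what powers image retrieval and duplicate detection at scale (typically accelerated with an approximate-nearest-neighbor index rather than a linear scan).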
For example, in contrastive learning (e.g., SimCLR, DINO), embeddings of augmented views of the same image are pulled together, while those from different images are pushed apart. Embedding spaces also support zero-shot transfer by aligning images with text or labels (e.g., CLIP).
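The pull-together/push-apart mechanism can be made concrete with a simplified InfoNCE-style loss. This is a sketch of the general idea, not SimCLR's exact batched implementation; the single positive pair and the toy 2-D vectors are assumptions for illustration:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Simplified InfoNCE loss: low when the anchor is close to its
    positive and far from the negatives (vectors are L2-normalized)."""
    def norm(v):
        return v / np.linalg.norm(v)
    anchor, positive = norm(anchor), norm(positive)
    negatives = [norm(n) for n in negatives]
    # Similarity of the anchor to the positive and to each negative.
    logits = np.array([np.dot(anchor, positive)] +
                      [np.dot(anchor, n) for n in negatives]) / temperature
    # Cross-entropy with the positive treated as the correct "class".
    return -logits[0] + np.log(np.sum(np.exp(logits)))

# Embeddings of two augmented views of the same image give a low loss...
same = info_nce(np.array([1.0, 0.0]), np.array([0.98, 0.02]),
                [np.array([0.0, 1.0])])
# ...while views of different images give a high one.
diff = info_nce(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                [np.array([0.98, 0.02])])
```

Minimizing this loss during training is what shapes the geometry of the space: gradients pull positive pairs together and push negatives apart.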
Embedding quality depends on the training objective, data diversity, and model architecture. A well-structured embedding space supports efficient downstream tasks, such as retrieval or classification, without retraining the full model.
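One common way to use frozen embeddings downstream is nearest-neighbor classification: labels are assigned by majority vote among the closest stored embeddings, with no gradient updates to the backbone. A minimal sketch with hypothetical toy embeddings and labels:

```python
import numpy as np

def knn_predict(query, embeddings, labels, k=3):
    """Classify a query by majority vote among its k cosine-nearest
    frozen embeddings -- no retraining of the encoder required."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = normed @ q                      # cosine similarity to every stored embedding
    nearest = np.argsort(-sims)[:k]        # indices of the k most similar
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy frozen embeddings for two classes.
embeddings = np.array([
    [1.0, 0.1], [0.9, 0.2], [0.95, 0.0],   # cat-like region of the space
    [0.1, 1.0], [0.2, 0.9], [0.0, 0.95],   # dog-like region of the space
])
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
print(knn_predict(np.array([0.8, 0.15]), embeddings, labels))
```

The same frozen-embedding idea underlies linear probing, clustering, and anomaly detection: only a lightweight head or index is fit on top of the fixed representation.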