The Jaccard Index, also known as the Jaccard similarity coefficient, measures the similarity between two sets by dividing the size of their intersection by the size of their union. It’s defined as:
J(A, B) = |A ∩ B| / |A ∪ B|
Values range from 0 (no overlap) to 1 (identical sets). The Jaccard Index is commonly used in information retrieval, clustering evaluation, recommender systems, and computer vision tasks like semantic segmentation.
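As a minimal sketch of the definition above, the ratio can be computed directly with Python's built-in set operations. The function name `jaccard_index` is illustrative, and returning 1.0 for two empty sets is an assumed convention, since the formula is undefined when the union is empty.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        # The formula itself is undefined here (0 / 0).
        return 1.0
    return len(a & b) / len(union)


# Two shared elements out of four distinct elements overall.
print(jaccard_index({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Disjoint sets give 0.0 and identical non-empty sets give 1.0, matching the range described above.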
In classification, it’s useful for evaluating models on imbalanced data, particularly for multilabel tasks where traditional accuracy may be misleading. Intersection over Union (IoU), used in object detection and segmentation, is the same ratio applied to bounding boxes or pixel masks.
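The two uses mentioned here can be sketched in a few lines of plain Python. The helper names `mean_multilabel_jaccard` and `box_iou`, the `(x1, y1, x2, y2)` box layout, and the handling of empty cases are assumptions made for illustration, not the API of any particular library.

```python
def mean_multilabel_jaccard(y_true, y_pred):
    """Average the per-sample Jaccard Index over lists of label sets."""
    scores = []
    for true_labels, pred_labels in zip(y_true, y_pred):
        union = true_labels | pred_labels
        # Assumption: a sample with no true and no predicted labels scores 1.0.
        scores.append(len(true_labels & pred_labels) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped to zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    # Assumption: degenerate boxes with zero total area score 0.0.
    return intersection / union if union > 0 else 0.0


y_true = [{"cat", "dog"}, {"bird"}]
y_pred = [{"cat"}, {"bird", "fish"}]
print(mean_multilabel_jaccard(y_true, y_pred))  # 0.5

# Intersection area 1, union area 4 + 4 - 1 = 7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.143
```

In the multilabel example neither prediction matches its label set exactly, so exact-match accuracy would be 0, while the Jaccard score still credits the partial overlap.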
Compared to cosine similarity or Euclidean distance, the Jaccard Index focuses purely on set membership, ignoring frequency or magnitude. It works best for binary or categorical data and is often paired with Jaccard distance (1 - Jaccard Index) in clustering or similarity-based search.
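The contrast with cosine similarity can be illustrated with a small sketch. The word-count dictionaries and the helper names below are made up for the example; the point is only that the Jaccard Index sees which items are present, while cosine similarity also reacts to how often they occur.

```python
import math


def jaccard_index(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|; two empty sets score 1.0 by assumption."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance is defined as 1 - Jaccard Index."""
    return 1.0 - jaccard_index(a, b)


def cosine_similarity(counts_a: dict, counts_b: dict) -> float:
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(value * counts_b.get(key, 0) for key, value in counts_a.items())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    # Assumption: an all-zero vector has similarity 0.0 with anything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical word counts: same vocabulary, very different frequencies.
doc_a = {"apple": 5, "pear": 1}
doc_b = {"apple": 1, "pear": 5}

print(jaccard_index(set(doc_a), set(doc_b)))     # 1.0, membership is identical
print(jaccard_distance(set(doc_a), set(doc_b)))  # 0.0
print(cosine_similarity(doc_a, doc_b))           # 10 / 26, about 0.385
```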