A-Z of Machine Learning and Computer Vision Terms

PyTorch
Q
Quantum Machine Learning
Query Strategy (Active Learning)
Query Synthesis Methods
R
RAG Architecture
ROC (Receiver Operating Characteristic) Curve
Random Forest
Recall (Sensitivity or True Positive Rate)
Recurrent Neural Network (RNN)
Region-Based CNN (R-CNN)
Regression (Regression Analysis)
Regularization Algorithms
Reinforcement Learning
Responsible AI
S
Scale Imbalance
Scikit-Learn
Segment Anything Model (SAM)
Selective Sampling
Self-Supervised Learning
Semantic Segmentation
Semi-supervised Learning
Sensitivity and Specificity of Machine Learning
Sentiment Analysis
Sliding Window Attention
Stream-Based Selective Sampling
Supervised Learning
Support Vector Machine (SVM)
Surrogate Model
Synthetic Data
T
Tabular Data
Text Generation Inference
Training Data
Transfer Learning
Transformers (Transformer Networks)
Triplet Loss
True Positive Rate (TPR)
Type I Error (False Positive)
Type II Error (False Negative)
U
Unsupervised Learning
V
Variance (Model Variance)
Variational Autoencoders
W
Weak Supervision
Weight Decay (L2 Regularization)
X
XAI (Explainable AI)
XGBoost
Y
YOLO (You Only Look Once)
Yolo Object Detection
Z
Zero-Shot Learning
C

Clustering

Clustering is an unsupervised learning technique that groups a set of data points into clusters such that points in the same cluster are more similar to each other than to points in other clusters. Unlike classification, clustering operates on unlabeled data: the algorithm tries to discover inherent groupings or structure in the data without any ground-truth labels. The goal is to maximize intra-cluster similarity (data points within a cluster should be as alike as possible) and to maximize inter-cluster difference (distinct clusters should be well separated or differ in their characteristics).

A classic example is clustering customers based on their purchase behavior: the algorithm might find one cluster of customers who buy mainly baby products, another cluster who buy luxury items, and so on, without having been told what those groups are beforehand. "Similarity" is defined via a distance or similarity measure (Euclidean distance is common for numeric data, but other measures or learned embeddings can be used).

There are many clustering algorithms, each with different assumptions about cluster shape or formation. K-means clustering assumes clusters are roughly spherical in the feature space and partitions the data into $k$ clusters by iteratively assigning points to the nearest cluster centroid and updating the centroids. Hierarchical clustering builds a tree of clusters by either successively merging the closest clusters (agglomerative) or splitting clusters (divisive), which lets one choose a clustering at any level of granularity. DBSCAN defines clusters as areas of high density and can find arbitrarily shaped clusters while marking outliers as noise, which makes it a good fit for datasets with irregular cluster shapes. Gaussian mixture models assume the data is generated from a mixture of Gaussian distributions and use statistical inference (the EM algorithm) to soft-cluster points. Despite the different approaches, the common theme is that clustering algorithms try to capture the natural structure in the data.

Clustering is often used for exploratory data analysis, to discover patterns that weren't immediately apparent. In biology, gene expression data might be clustered to find groups of genes with similar expression profiles (perhaps indicating co-regulation). In image processing, one might cluster pixel colors to compress images (color quantization) or cluster images in an unsupervised way to organize a photo collection by content. Clustering is also used in anomaly detection: points that don't fit well into any cluster can be treated as anomalies.

One challenge with clustering is evaluating the results: since there are no true labels, validation relies on metrics such as the silhouette score or the Davies–Bouldin index (which assess cohesion and separation of clusters), or on domain knowledge to interpret the clusters. Another challenge is that clustering can be sensitive to feature scaling and to the choice of distance metric, so some preprocessing (such as PCA for dimensionality reduction or feature normalization) is often done to make clustering more effective. Overall, clustering is a powerful tool for letting the data speak for itself, revealing potential groupings that can lead to insights or serve as a preprocessing step for other tasks (e.g., cluster then classify, or initialize labels via clustering).
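To make the above concrete, here is a minimal sketch of clustering with scikit-learn. It is not part of the original glossary entry: the two-feature toy "customer" data, the choice of k = 2, and the eps/min_samples values for DBSCAN are all illustrative assumptions.

```python
# Minimal clustering sketch (illustrative toy data and parameters).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score

# Toy "customer" features: rows are customers, columns are spend per category.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[5.0, 1.0], scale=0.5, size=(50, 2)),  # e.g., baby-product buyers
    rng.normal(loc=[1.0, 6.0], scale=0.5, size=(50, 2)),  # e.g., luxury-item buyers
])

# Clustering is sensitive to feature scaling, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# K-means: alternate between assigning points to the nearest centroid
# and recomputing centroids, for a fixed k (assumed k = 2 here).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print("K-means silhouette score:", silhouette_score(X_scaled, kmeans.labels_))

# DBSCAN: density-based, finds arbitrarily shaped clusters and marks outliers as -1.
dbscan = DBSCAN(eps=0.5, min_samples=5).fit(X_scaled)
print("DBSCAN labels found:", set(dbscan.labels_))
```

The standardization step reflects the sensitivity to feature scaling noted above, and the silhouette score stands in for the label-free evaluation discussed in the entry.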
