Imbalanced data refers to datasets where the distribution of classes is highly skewed—one class appears far more frequently than others. This is common in real-world scenarios like fraud detection, medical diagnosis, or fault prediction, where the event of interest (e.g., fraud, disease, failure) is rare.
Standard machine learning models tend to be biased toward the majority class in such cases, often achieving high accuracy while completely ignoring the minority class. As a result, alternative evaluation metrics like precision, recall, F1-score, and AUC-ROC are preferred over plain accuracy.
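To see why plain accuracy misleads, here is a minimal stdlib-only sketch of a hypothetical 5% minority-class dataset and a classifier that always predicts the majority class; the sample counts and labels are illustrative assumptions, not from any real dataset:

```python
# Toy example: 1000 samples, 5% positive (minority) class.
y_true = [1] * 50 + [0] * 950
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives

recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(accuracy)  # 0.95 — looks impressive
print(recall)    # 0.0  — yet no minority case is ever detected
```

The classifier scores 95% accuracy while catching zero fraud/disease cases, which is exactly what recall and F1 expose and accuracy hides.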
Dealing with imbalanced data involves techniques like resampling (oversampling the minority class or undersampling the majority class), synthetic oversampling methods such as SMOTE, cost-sensitive learning, or modified loss functions that penalize misclassification of the minority class more heavily. Ensemble methods like Random Forest or XGBoost also tend to perform well when combined with proper tuning.
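As a concrete sketch of the simplest resampling technique, the stdlib-only snippet below randomly oversamples a hypothetical minority class until both classes are the same size (the data here is synthetic and purely illustrative; SMOTE goes a step further by interpolating between minority-class neighbours instead of duplicating samples):

```python
import random

random.seed(0)

# Hypothetical dataset: 95 majority samples (label 0), 5 minority (label 1).
data = [([random.random()], 0) for _ in range(95)] + \
       [([random.random()], 1) for _ in range(5)]

majority = [s for s in data if s[1] == 0]
minority = [s for s in data if s[1] == 1]

# Random oversampling: duplicate minority samples (drawn with replacement)
# until the class counts match.
extra = random.choices(minority, k=len(majority) - len(minority))
balanced = majority + minority + extra

print(sum(1 for s in balanced if s[1] == 1))  # 95 — classes are now balanced
```

Undersampling is the mirror image (randomly dropping majority samples), and cost-sensitive learning achieves a similar effect without touching the data, by weighting minority-class errors more heavily in the loss.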
Handling imbalance properly is crucial for applications where false negatives carry a high cost, such as missed fraud or undiagnosed diseases.