A-Z of Machine Learning and Computer Vision Terms

C

Class Imbalance

Class imbalance refers to an uneven distribution of classes in a dataset, where one class (the “majority” class) has many more samples than another (the “minority” class). This situation is common in real-world classification tasks. In fraud detection, for instance, fraudulent transactions might make up only 1% of the data (minority) while legitimate transactions make up 99% (majority). Similarly, in medical diagnostic data, healthy cases often vastly outnumber disease cases.

Class imbalance is problematic because most machine learning algorithms assume, or perform best when, the classes are roughly balanced. A model trained on imbalanced data tends to be biased toward the majority class, since simply predicting the majority every time minimizes overall error. As a result, it may largely ignore the minority class, which is usually the class of greater interest. For example, a classifier can achieve 99% accuracy on the fraud dataset by always predicting “not fraud,” but such a model is useless for catching actual fraud.

Because of this, plain accuracy becomes much less informative on imbalanced data. One must instead look at metrics that capture minority-class performance, such as precision, recall, F1-score, and area under the ROC curve.

Class imbalance also calls for special techniques during modeling. Data-level methods re-sample the training data: one can over-sample the minority class (e.g. duplicate minority examples or generate synthetic ones using methods like SMOTE) or under-sample the majority class (remove some majority examples) to obtain a more balanced dataset. Algorithm-level methods use cost-sensitive learning or class weight adjustments: mistakes on the minority class are assigned a higher penalty during training, so the model is incentivized to get those examples right. In practice, a combination of approaches may be used. For instance, one might slightly over-sample the minority class and also use a weighted loss function that emphasizes minority-class accuracy. Another strategy is to use one-vs-all or threshold-moving techniques, adjusting the decision threshold for the minority class to reach a desired recall. It is also important to use a properly stratified validation scheme, and evaluation on imbalanced data should reflect the costs of the different errors.

In summary, class imbalance is a common challenge that can lead to biased models if not addressed. The key is to recognize it and apply techniques that restore focus on minority-class performance without introducing too much overfitting or noise through naive oversampling.
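To make the ideas above concrete, here is a minimal pure-Python sketch using the 99%/1% fraud example. It shows why an always-"not fraud" classifier scores high accuracy but zero recall, and illustrates one data-level method (random over-sampling by duplication) and one algorithm-level method (class weights). The function names `binary_metrics`, `balanced_class_weights`, and `random_oversample` are hypothetical helpers written for this illustration, and the toy labels are made up. The weighting formula `n_samples / (n_classes * class_count)` is one common heuristic, not the only option.

```python
import random
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority samples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


# Toy fraud data: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = [0] * 990 + [1] * 10
always_majority = [0] * len(y_true)  # a "model" that never predicts fraud

metrics = binary_metrics(y_true, always_majority)
weights = balanced_class_weights(y_true)
x_bal, y_bal = random_oversample(list(range(len(y_true))), y_true)

print(metrics)          # high accuracy, but recall and F1 are zero
print(weights)          # minority class gets a much larger weight
print(Counter(y_bal))   # both classes now have the same count
```

Duplicating minority samples, as `random_oversample` does, is the naive form of over-sampling and carries the overfitting risk noted above. The over-sampling should be applied to the training split only, so that validation data keeps the original class distribution.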
