The learning rate is a hyperparameter that controls how much a model's weights are updated during training in response to the estimated error (loss) gradient. In gradient descent optimization, the weight update is typically w := w - η * (∂L/∂w), where η (eta) is the learning rate.

A high learning rate can speed up training but may overshoot minima or cause outright divergence: the training loss fails to decrease because each step is too large. A low learning rate gives more stable convergence, but training becomes slow and can stall on plateaus or in poor local minima. In practice, learning rate schedules or adaptive methods are common: start with a higher rate and decay it over time, or use an optimizer such as Adam, which adapts the effective learning rate per parameter.

Tuning the learning rate is crucial for efficient training. A common heuristic is to try different powers of 10 (e.g., 1e-1, 1e-2, 1e-3, 1e-4) or to use a learning rate finder to pick a good value. Too high a rate typically shows up as divergence (the loss increases); too low a rate shows up as a very slow decrease in loss.
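The effect of the learning rate can be illustrated with a minimal sketch: gradient descent on a toy quadratic loss L(w) = (w - 3)^2, whose minimum is at w = 3. The loss function and the specific η values here are illustrative choices, not from the text.

```python
def gradient_descent(eta, steps=50, w=0.0):
    """Run plain gradient descent on L(w) = (w - 3)**2 for a fixed
    number of steps, starting from w, with learning rate eta."""
    for _ in range(steps):
        grad = 2 * (w - 3)   # dL/dw for the quadratic loss
        w = w - eta * grad   # the update rule w := w - eta * dL/dw
    return w

# Moderate learning rate: converges close to the minimum at w = 3.
print(gradient_descent(eta=0.1))

# Very small learning rate: still far from 3 after 50 steps (slow).
print(gradient_descent(eta=0.001))

# Too-large learning rate: each step overshoots and |w| blows up.
print(gradient_descent(eta=1.1))
```

For this loss, each update multiplies the distance to the minimum by (1 - 2η), so the iterates converge only when that factor has magnitude below 1, i.e. 0 < η < 1, which mirrors the high/low trade-off described above.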