Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of the steepest descent, as defined by the negative of the gradient. In machine learning, it's commonly used to minimize the loss function of a model by adjusting its parameters (e.g., weights in a neural network).
At each step, parameters are updated using the gradient of the loss with respect to the parameters, scaled by a learning rate. A small learning rate may result in slow convergence, while a large one can cause the algorithm to overshoot or diverge. Variants like Stochastic Gradient Descent (SGD), Mini-Batch Gradient Descent, and Momentum introduce trade-offs between convergence speed, noise, and stability.
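To make the update rule concrete, here is a minimal sketch of vanilla gradient descent in Python with NumPy. The function names and the toy quadratic objective are illustrative assumptions, not part of any particular library:

```python
import numpy as np

def gradient_descent(grad_fn, theta, learning_rate=0.1, num_steps=100):
    """Repeatedly move theta in the direction of the negative gradient."""
    for _ in range(num_steps):
        theta = theta - learning_rate * grad_fn(theta)
    return theta

# Toy example: minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
grad_fn = lambda theta: 2 * (theta - 3)
theta_min = gradient_descent(grad_fn, theta=np.array(0.0), learning_rate=0.1)
print(theta_min)  # converges toward 3.0
```

Swapping the full-dataset gradient in `grad_fn` for a gradient computed on a single example or a small batch turns this into SGD or mini-batch gradient descent, which trades exact gradients for cheaper, noisier updates.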
More advanced forms like Adam, RMSprop, and AdaGrad adapt the learning rate based on parameter-specific behavior over time, often leading to faster or more reliable convergence.
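As an illustration of how such adaptive methods work, the sketch below shows a single Adam-style update step. It assumes the commonly cited defaults (beta1=0.9, beta2=0.999, eps=1e-8) and is a simplified standalone version, not the API of any specific framework:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its square
    give each parameter its own effective step size."""
    m = beta1 * m + (1 - beta1) * grad        # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Dividing by the square root of the second moment shrinks the step for parameters with consistently large gradients and enlarges it for parameters with small ones, which is what "adapting the learning rate to parameter-specific behavior" means in practice.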
Gradient descent assumes the loss surface is continuous and differentiable, and it may get stuck in local minima or saddle points—especially in high-dimensional, non-convex problems like deep learning.