Weight decay is a regularization technique used during model training to discourage large weights and thus prevent overfitting. In practice, weight decay works by adding a penalty term to the loss function equal to the sum of squared weights (the squared L2 norm of the weights) scaled by a regularization factor λ. This means the optimizer, when updating model parameters, will push weights toward zero unless they provide sufficient reduction in the original loss to justify their magnitude.

By penalizing large weights, the model is biased toward simpler (smoother) functions that generalize better to unseen data. Weight decay is commonly applied in training neural networks and, for plain SGD, is mathematically equivalent to L2 regularization. The strength of weight decay (λ) is a hyperparameter: too high a value can lead to underfitting (weights overly constrained), while too low a value might not effectively combat overfitting.
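The update rule described above can be sketched as follows. This is a minimal illustration, not a production optimizer: the function name, the toy regression data, and the hyperparameter values (`lr`, `lam`) are all illustrative assumptions.

```python
import numpy as np

def sgd_step_with_decay(w, grad, lr=0.1, lam=0.01):
    """One SGD step with weight decay: w <- w - lr * (grad + lam * w).

    The extra lam * w term is the gradient of the L2 penalty
    0.5 * lam * ||w||^2 and pushes each weight toward zero.
    """
    return w - lr * (grad + lam * w)

# Toy linear regression to show the shrinkage effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.0])
y = X @ true_w  # noiseless targets, so OLS would recover true_w exactly

w = np.zeros(3)
for _ in range(200):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of the data loss only
    w = sgd_step_with_decay(w, grad, lr=0.1, lam=0.01)

print(w)  # close to true_w, but shrunk slightly toward zero
```

Because the penalty gradient is proportional to the weight itself, large weights are decayed more strongly than small ones; the learned solution ends up with a smaller norm than the unregularized one, which is the source of the smoothing effect described above.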