Concept drift refers to the phenomenon where the statistical properties of the target variable (or the underlying relationship between features and target) change over time, degrading a predictive model's performance. In other words, the "concept" the model is trying to learn is not stationary; it evolves. This is a common situation in many real-world applications. For example, suppose we have a model for predicting which topics are popular on social media. Over time, people's interests shift (say, from one fad to another), so the patterns the model learned from last year's data may no longer hold next year. Similarly, in fraud detection, as fraudsters adapt their strategies, the patterns of fraudulent transactions change, causing a model trained on last month's fraud behavior to gradually become less effective. The result of concept drift is that the model's accuracy or error rate worsens as time goes on, unless the model is updated.

There are a few types of concept drift. Gradual drift is when the change happens slowly over time (e.g., consumer preferences shifting gradually). Sudden (or abrupt) drift is when the change happens all at once: an entirely new pattern appears overnight, perhaps due to a policy change or a sudden event like the COVID-19 pandemic causing a shift in human behavior. Seasonal or recurring drift is when concepts change in a cyclical manner (e.g., clothing sales patterns differ in winter vs. summer).

Detecting concept drift is an important part of maintaining machine learning systems. Drift-detection techniques monitor incoming data and model predictions: for example, one might track the model's error rate on a rolling window of data, or use dedicated statistical tests and algorithms (such as DDM, EDDM, or ADWIN) that raise an alert if the distribution of predictions or errors changes significantly.
When drift is detected, the typical response is to update the model, either by retraining it on more recent data or by using online learning algorithms that adapt continuously. Some systems maintain a weighted ensemble of models, with newer models gradually replacing older ones as they prove more accurate on recent data (this is sometimes combined with windowing strategies that train only on the most recent window of data).

It is also useful to distinguish concept drift from data drift (or covariate shift). Data drift usually refers to changes in the input feature distribution (for example, a sensor starts producing higher readings due to calibration issues), whereas concept drift refers to changes in the functional relationship between inputs and outputs (the meaning of the output changes relative to the inputs). Data drift can lead to concept drift if the model's predictions depend on those features. In any case, both are challenges for deployed models. Managing concept drift is an active area of research in machine learning operations (MLOps) and calls for robust pipeline design: continuously logging data, retraining periodically, and validating that model assumptions still hold over time. By accounting for concept drift, practitioners ensure that their models remain accurate and relevant in dynamic environments, rather than "decaying" as the world changes around them.
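Input-distribution (data) drift can also be scored directly, without waiting for labels. One common heuristic, not named in the text above, is the Population Stability Index (PSI), which compares a feature's binned distribution in a reference window against a current window; a minimal stdlib-only sketch (the 0.1/0.25 thresholds mentioned in the comment are a conventional rule of thumb, not a hard standard):

```python
import math

def psi(reference, current, bins=10):
    """Population Stability Index for one numeric feature.
    Bins are derived from the reference window; rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift."""
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0], edges[-1] = float("-inf"), float("inf")  # catch outliers

    def bin_fractions(data):
        counts = [0] * bins
        for x in data:
            for b in range(bins):
                if edges[b] <= x < edges[b + 1]:
                    counts[b] += 1
                    break
        n = len(data)
        # floor at a small epsilon so empty bins don't break the log
        return [max(c / n, 1e-6) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r)
               for r, c in zip(ref_frac, cur_frac))
```

In a monitoring pipeline, a score like this would be computed per feature on each new batch and logged; a persistent high PSI on a feature is a cue to investigate upstream data issues or trigger retraining.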