Outlier detection is the process of identifying data points that differ markedly from the rest of the data. Such points may indicate errors, novelties, or rare events. The term is closely related to anomaly detection, and the two are often used interchangeably. Techniques vary. Model-based methods assume the data comes from a distribution and flag points that have low probability under the fitted model. Distance-based and density-based methods look at a point's neighborhood instead: DBSCAN labels points that belong to no cluster as noise, and the local outlier factor (LOF) flags points that sit in a region of low density relative to their neighbors. Simpler methods include the Z-score method, which flags a point whose feature values lie many standard deviations from the mean, and robust variants that replace the mean and standard deviation with the median and the median absolute deviation (MAD), so that the outliers themselves distort the estimates less. Outlier detection can be univariate or multivariate. High-dimensional outlier detection is difficult because of the curse of dimensionality: as the number of dimensions grows, distances between points become increasingly similar, which makes distance-based and density-based notions of "far away" less informative. Applications include fraud detection (fraudulent transactions appear as outliers), network intrusion detection, identifying mislabeled or erroneous data, and discovering novel events such as a new type of error in a system.
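To make the two simpler methods concrete, here is a minimal univariate sketch using only the Python standard library. The function names, the sample data, and the thresholds are assumptions chosen for illustration: 3.0 is a common rule of thumb for Z-scores, and the 0.6745 scaling constant with a 3.5 cutoff is a widely used convention for the MAD-based "modified Z-score". Neither is prescribed by the text above.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified Z-score (median/MAD based) exceeds `threshold`."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # more than half the values equal the median; score is undefined
    # 0.6745 rescales the MAD so the score is comparable to a Z-score
    # when the data is roughly normal (an assumed convention).
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical sample: seven readings near 10 and one extreme value at index 7.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 55.0]

print(zscore_outliers(data))  # []
print(mad_outliers(data))     # [7]
```

In this small sample the plain Z-score misses the extreme value at the default threshold, because that value inflates both the mean and the standard deviation it is measured against. The median and MAD barely move, so the robust score flags it clearly. Multivariate, density-based methods such as LOF need a neighbor search and are usually taken from a library rather than written by hand.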