K-means is an unsupervised machine learning algorithm that partitions a dataset into K distinct, non-overlapping groups (clusters) based on similarity. It initializes K cluster centroids, then alternates between assigning each data point to its nearest centroid and recomputing each centroid as the mean of its assigned points. The process repeats until convergence, usually when assignments no longer change or a maximum number of iterations is reached.
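The assign-and-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, the choice to initialize from randomly picked data points, and the toy dataset are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Converged when no assignment changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
centroids, labels = kmeans(X, k=2)
```

On this toy data the first three points should end up in one cluster and the last three in the other; which cluster gets label 0 depends on the random initialization.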
The goal of K-means is to minimize intra-cluster variance (the sum of squared distances between points and their cluster centroid), which for a fixed dataset also maximizes inter-cluster separation. It performs well on roughly spherical, similarly sized clusters but can struggle with complex shapes or outliers. The number of clusters (K) must be set in advance; it is often chosen using techniques such as the Elbow Method or the Silhouette Score.
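Written out, the quantity being minimized is the within-cluster sum of squares. The notation here is an assumption for illustration: C_k is the set of points in cluster k, \mu_k its centroid, \bar{x} the mean of all n points. The second identity is the standard decomposition of total scatter, which shows why lowering the within-cluster term raises the between-cluster term when the data are fixed.

```latex
% Within-cluster sum of squares minimized by K-means
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i

% Total scatter = within-cluster scatter + between-cluster scatter
\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2
  = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
  + \sum_{k=1}^{K} |C_k| \, \lVert \mu_k - \bar{x} \rVert^2
```

The Elbow Method plots J against K and looks for the point where adding another cluster stops reducing J by much.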
K-means is commonly used in customer segmentation, image compression, anomaly detection, and organizing large datasets by similarity. Its simplicity and scalability make it a popular choice, but it assumes clusters of roughly equal variance and, depending on the initialization, may converge to a local minimum rather than the best possible clustering.
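The sensitivity to initialization can be seen on a tiny example. The sketch below is illustrative only: the helper `lloyd`, the four-point dataset, and the two hand-picked starting configurations are assumptions chosen so that one start reaches the better clustering and the other gets stuck.

```python
import numpy as np

def lloyd(X, init, max_iter=100):
    """Run K-means from explicit starting centroids; returns (labels, inertia)."""
    centroids = np.array(init, dtype=float)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and within-cluster sum of squares (inertia).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), float(dists.min(axis=1).sum())

# Four corners of a wide rectangle: the natural split is left vs. right.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

_, good_inertia = lloyd(X, init=[[0.0, 0.0], [4.0, 0.0]])  # one start on each side
_, bad_inertia = lloyd(X, init=[[0.0, 0.0], [0.0, 1.0]])   # both starts on the left
print(good_inertia, bad_inertia)  # 1.0 16.0
```

The second run settles on a top/bottom split that no single reassignment can improve, even though its inertia is sixteen times higher. A common mitigation is to run the algorithm from several initializations and keep the result with the lowest inertia.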