K-Means Clustering is an unsupervised learning algorithm that divides an unlabeled dataset into clusters. K is the number of clusters, chosen in advance, that the algorithm must produce: if K=2, two clusters are created; if K=3, three clusters are created; and so on.
Pick k items at random from the dataset to serve as cluster representatives. Using Euclidean distance as the similarity measure, link each remaining item in the collection to its closest cluster representative. Then recalculate the representative of each newly formed cluster.
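1. Choose the number of clusters k.
2. Select k random points from the data as the initial centroids.
3. Assign every point to its closest cluster centroid.
4. Recompute the centroid of each newly formed cluster.
5. Repeat steps 3 and 4 until the centroids stop changing or a maximum number of iterations is reached.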
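The procedure above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production implementation: the function name `kmeans`, the `max_iters` and `seed` parameters, and the sample points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Pick k data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical sample data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are chosen at random, which group receives label 0 and which receives label 1 depends on the seed. On data that is less cleanly separated, different seeds can also lead to different final clusters, so it is common to run the algorithm several times and keep the best result.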
K-Means Clustering is a method of dividing data into groups, or clusters. It works by assigning each point to the cluster whose center is nearest to it. The algorithm alternates between reassigning points to their nearest cluster center and recomputing each center as the mean of the points assigned to it, and it stops when the assignments no longer change.
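The two alternating steps can be written compactly. The notation here is an assumption introduced for illustration: x_i is a data point, \mu_j is the center of cluster j, S_j is the set of points currently assigned to cluster j, and c_i is the cluster index assigned to x_i.

```latex
% Assignment step: each point goes to the nearest center.
c_i = \arg\min_{j \in \{1,\dots,K\}} \lVert x_i - \mu_j \rVert^2

% Update step: each center becomes the mean of its assigned points.
\mu_j = \frac{1}{\lvert S_j \rvert} \sum_{x_i \in S_j} x_i

% Quantity that neither step can increase (within-cluster sum of squares).
J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - \mu_j \rVert^2
```

Since neither step can increase J and there are only finitely many ways to assign the points, the procedure eventually stops changing, although the result it settles on may be a local rather than a global minimum of J.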
More generally, clustering can be carried out by hand on a small dataset or automated with a machine learning algorithm. The steps typically involved are:
Determine the number of clusters
Select a clustering method
Group similar data together
Calculate dissimilarities and similarities between groups
Create an output table with the cluster assignment and related values for each row in the input table
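As a rough illustration of the last step, the sketch below builds an output table that records, for each input row, its assigned cluster and its distance to that cluster's centroid. The input rows, the centroid values, and the column names are all hypothetical; in practice the centroids would come from a clustering run.

```python
import math

# Hypothetical input table: one (x, y) row per item.
rows = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
# Hypothetical centroids, as if produced by an earlier K-Means run.
centroids = [(1.25, 1.5), (8.5, 8.75)]

output = []
for row in rows:
    # Distance from this row to every centroid.
    distances = [math.dist(row, c) for c in centroids]
    # The row belongs to the cluster with the smallest distance.
    cluster = distances.index(min(distances))
    output.append({
        "x": row[0],
        "y": row[1],
        "cluster": cluster,
        "distance": round(distances[cluster], 3),
    })

# Print the output table, one line per input row.
print(f"{'x':>5} {'y':>5} {'cluster':>8} {'distance':>9}")
for r in output:
    print(f"{r['x']:>5} {r['y']:>5} {r['cluster']:>8} {r['distance']:>9}")
```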