Unsupervised learning algorithms learn from unlabeled data. They identify patterns and relationships in the data without being given target labels or any other indication of the correct output. Techniques and algorithms often used in unsupervised learning:
Clustering is an unsupervised learning technique that groups data points by the similarity of their characteristics. Clustering algorithms look for groups whose members are similar to each other and different from the members of other groups. This can reveal hidden patterns in unlabeled data and give insight into the underlying structure of the dataset. Some popular clustering algorithms include k-means, hierarchical clustering, and DBSCAN.
K-means clustering is a popular unsupervised learning algorithm used in data mining, pattern recognition, and image analysis. It partitions the data into k clusters, where k is a user-defined parameter. The algorithm starts by choosing k initial centroids. It then assigns each data point to the nearest centroid, measured by distance. Each centroid is then moved to the mean of the data points assigned to it, and the assignment and update steps are repeated until convergence, that is, until the assignments no longer change. The result is k clusters of data points that are similar to each other and different from data points in other clusters. K-means clustering is often used in market segmentation, image compression, and anomaly detection.
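The assign-and-update loop described above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names (`assign_points`, `update_centroids`, `kmeans`), the random initialisation from the data, the choice to leave an empty cluster's centroid where it was, and the toy `points` are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_points(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        dims = len(members[0])
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dims))
        )
    return new_centroids


def kmeans(points, k, max_iters=100, seed=0):
    """Alternate assignment and update steps; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points sampled from the data.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels


# Toy data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialisation, so the labels should be read as group identifiers rather than as meaningful numbers.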
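The cost function for k-means clustering is defined as the average of the squared distances between each data point and its assigned centroid (the sum of squared distances scaled by $1/m$). Mathematically, it can be represented as:
$$ J(c, \mu) = \frac{1}{m} \sum_{i=1}^m ||x^{(i)} - \mu_{c^{(i)}}||^2 $$
where $m$ is the number of data points, $x^{(i)}$ is the $i$-th data point, $c^{(i)}$ is the index of the cluster to which $x^{(i)}$ is currently assigned, and $\mu_{c^{(i)}}$ is the centroid of that cluster.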
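The objective of the k-means algorithm is to minimize this cost function by finding good values for both the centroids and the assignments of the data points. The algorithm achieves this by iteratively updating the assignments and the centroids until convergence. Neither update can increase the cost, so it never rises from one iteration to the next, although the final result may be a local rather than a global minimum.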
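To make the cost function concrete, the following sketch evaluates it for one fixed assignment under two choices of centroids. The helper name `kmeans_cost` and the four sample points are assumptions chosen for this illustration; the computation itself is the mean squared distance described above.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, labels, centroids):
    """Mean squared distance from each point to its assigned centroid."""
    m = len(points)
    total = sum(squared_distance(x, centroids[c]) for x, c in zip(points, labels))
    return total / m


# Hypothetical data: two clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]

# Arbitrary centroids versus the per-cluster means for the same assignment.
rough_centroids = [(1.0, 1.0), (9.0, 1.0)]
mean_centroids = [(0.0, 1.0), (10.0, 1.0)]

cost_rough = kmeans_cost(points, labels, rough_centroids)  # 2.0
cost_mean = kmeans_cost(points, labels, mean_centroids)    # 1.0
```

With the assignment held fixed, moving each centroid to the mean of its cluster lowers the cost from 2.0 to 1.0, which is why the centroid update step uses the mean.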