K-Means and X-Means Clustering
Clustering methods group observations into a small number of segments so that observations within a segment are alike while observations in different segments are dissimilar. K-means requires the number of clusters, K, to be fixed in advance and begins by choosing K initial centroids at random. It assigns each data point to its closest centroid, which partitions the data into K clusters, and then recomputes each centroid as the mean of the points currently assigned to it. These two steps repeat until the assignments stop changing or a specified maximum number of iterations is reached. X-means clustering is a variation of K-means that does not need K fixed in advance: it repeatedly attempts to split existing clusters and keeps only the splits that improve the clustering, until a stopping criterion is reached.
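The K-means loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not BrandIdea's implementation: the function name `kmeans`, its parameters, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Illustrative K-means: returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initialisation: pick k distinct observations as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        # Assumption: an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Example: two small, well-separated groups of points.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds can give different results on harder data; running several seeds and keeping the tightest clustering is a common precaution.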
Goals:
Determine the intrinsic grouping in a set of unlabeled data. Provide a fast and efficient way to cluster unstructured data: concurrency speeds up model construction, and the Bayesian Information Criterion (BIC) gives a mathematically sound measure of clustering quality.
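To make the role of the Bayesian Information Criterion concrete, the sketch below scores a hard clustering under one simplifying assumption: every cluster is a spherical Gaussian sharing a single pooled variance. The function name `spherical_bic` and the parameter count are illustrative choices, and the convention used here is that a higher score is better. It needs more points than clusters and a non-zero within-cluster spread.

```python
import numpy as np

def spherical_bic(X, labels, centroids):
    """Illustrative BIC of a hard clustering (higher is better)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centroids = np.asarray(centroids, dtype=float)
    R, M = X.shape          # number of points, number of dimensions
    K = len(centroids)      # number of clusters
    # Total squared distance from each point to its own centroid.
    sse = float(np.sum((X - centroids[labels]) ** 2))
    # Pooled per-dimension variance estimate (assumed shared by all clusters).
    variance = sse / (M * (R - K))
    sizes = np.bincount(labels, minlength=K)
    sizes = sizes[sizes > 0]
    # Log-likelihood: mixing proportions + Gaussian normaliser + residual term.
    log_lik = (np.sum(sizes * np.log(sizes / R))
               - 0.5 * R * M * np.log(2 * np.pi * variance)
               - 0.5 * M * (R - K))
    # Free parameters: K-1 mixing weights, K*M centroid coordinates, 1 variance.
    n_params = (K - 1) + M * K + 1
    return log_lik - 0.5 * n_params * np.log(R)

# Compare one cluster against a two-way split of the same points.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
one = spherical_bic(points, np.zeros(6, dtype=int), points.mean(axis=0, keepdims=True))
split_labels = np.array([0, 0, 0, 1, 1, 1])
split_centroids = np.array([points[:3].mean(axis=0), points[3:].mean(axis=0)])
two = spherical_bic(points, split_labels, split_centroids)
print(one, two)
```

An X-means style procedure would keep the split when the two-cluster score exceeds the one-cluster score, and reject it otherwise.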
BrandIdea’s Implementation:
Consumer Segmentation, Geo-demographic Segmentation, SEC Affinity, Progressive Index