Category

Cluster analysis algorithms

k-means clustering
Vector quantization algorithm that minimizes the sum of squared deviations between points and their nearest mean
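The objective described above can be made concrete with a small sketch. The version below uses the common alternating scheme (assign each point to its nearest mean, then recompute each mean), which is an assumption about the procedure rather than something this listing states. The function names `assign`, `update`, `sse`, and `kmeans` are hypothetical.

```python
from math import dist

def assign(points, means):
    # Index of the nearest mean for each point.
    return [min(range(len(means)), key=lambda k: dist(p, means[k])) for p in points]

def update(points, labels, means):
    # Move each mean to the centroid of its assigned points;
    # a mean with no assigned points is left where it is.
    new_means = []
    for k, old in enumerate(means):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_means.append(old)
    return new_means

def sse(points, labels, means):
    # Sum of squared distances between each point and its assigned mean.
    return sum(dist(p, means[label]) ** 2 for p, label in zip(points, labels))

def kmeans(points, means, iters=100):
    # Alternate assignment and update until the means stop moving.
    for _ in range(iters):
        new_means = update(points, assign(points, means), means)
        if new_means == means:
            break
        means = new_means
    return means, assign(points, means)
```

Calling `kmeans(points, initial_means)` with a list of coordinate tuples returns the final means and one cluster index per point; `sse` then reports the value of the objective for that result. The outcome depends on the initial means, which is why seeding methods exist.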
expectation–maximization algorithm
iterative method for finding maximum likelihood estimates in statistical models
self-organizing map
machine learning technique useful for dimensionality reduction
DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu in 1996. It is a non-parametric, density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (those whose nearest neighbors are too far away). DBSCAN is one of the most commonly used and cited clustering algorithms.
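The grouping rule can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' reference implementation: the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense point) are assumptions, and the neighbor search is a brute-force scan.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for outliers."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # j is dense, so keep growing the cluster
    return labels
```

Points that never fall within `eps` of a dense point keep the label -1, which corresponds to the outliers described above.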
hierarchical clustering
method of cluster analysis which seeks to build a hierarchy of clusters
neighbor joining
clustering method in bioinformatics
UPGMA
UPGMA (unweighted pair group method with arithmetic mean) is a simple agglomerative (bottom-up) hierarchical clustering method. It also has a weighted variant, WPGMA, and they are generally attributed to Sokal and Michener.
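A minimal sketch of the agglomerative procedure follows, assuming the usual UPGMA update in which the distance from a merged cluster to any other cluster is the average of the two old distances weighted by cluster size. The function name `upgma` and the `frozenset`-keyed distance table are hypothetical choices made for this illustration.

```python
from itertools import combinations

def upgma(names, dist):
    """names: list of leaf labels.
    dist: dict mapping frozenset({a, b}) to the distance between leaves a and b.
    Returns a nested tuple that records the order of the merges."""
    sizes = {name: 1 for name in names}  # cluster -> number of leaves it holds
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Size-weighted average of the old distances to every remaining cluster.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))
```

With three leaves A, B, C and distances d(A,B) = 2, d(A,C) = 6, d(B,C) = 8, the sketch merges A and B first and then joins the result with C at an averaged distance of 7. Replacing the size-weighted average with a plain average of the two old distances would give the WPGMA variant.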
spectral clustering
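clustering methods that use the eigenvalues of a similarity matrix of the data to reduce dimensionality before clustering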
fuzzy clustering
cluster analysis where membership of a data point in a cluster is fuzzy
OPTICS algorithm
algorithm for finding density-based clusters in spatial data
k-medoids
The k-medoids method is a classical partitioning technique of clustering that splits a data set of n objects into k clusters, where the number of clusters k is assumed to be known a priori (which implies that the programmer must specify k before running a k-medoids algorithm). The "goodness" of a given value of k can be assessed with methods such as the silhouette method. The name of the clustering method was coined by Leonard Kaufman and Peter J. Rousseeuw with their PAM (Partitioning Around Medoids) algorithm.
K-means++
In data mining and machine learning, k-means++ is an algorithm for choosing the initial values/centroids (or "seeds") for the k-means clustering algorithm. It was proposed in 2007 by David Arthur and Sergei Vassilvitskii, as an approximation algorithm for the NP-hard k-means problem and a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm. It is similar to the first of three seeding methods proposed, in independent work, in 2006 by Rafail Ostrovsky, Yuval Rabani, Leonard Schulman and Chaitanya Swamy. (The distribution of the first seed is diff
BIRCH
BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised data mining algorithm used to perform hierarchical clustering over particularly large data sets. With modifications it can also be used to accelerate k-means clustering and Gaussian mixture modeling with the expectation–maximization algorithm. An advantage of BIRCH is its ability to incrementally and dynamically cluster incoming, multi-dimensional metric data points in an attempt to produce the best quality clustering for a given set of resources (memory and time constraints). In most cases, BIRCH only requ
Mean-shift
algorithm used to locate the modes (maxima) of the probability density underlying a dataset
Ward's method
criterion applied in hierarchical cluster analysis
K-medians clustering
cluster analysis algorithm, a variant of k-means that uses the median of each cluster instead of the mean
complete-linkage clustering
agglomerative hierarchical clustering method