Category

Cluster analysis algorithms

k-means clustering

Vector quantization algorithm that minimizes the sum of squared deviations between points and their nearest mean

expectation–maximization algorithm

iterative method for finding maximum likelihood estimates in statistical models

self-organizing map

machine learning technique useful for dimensionality reduction

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu in 1996. It is a non-parametric, density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (those whose nearest neighbors are too far away). DBSCAN is one of the most commonly used and cited clustering algorithms.
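The grouping rule described above can be illustrated with a minimal sketch. It assumes Euclidean distance, a brute-force neighbor search, and the conventional parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size, counting the point itself); none of these names come from the entry itself, and practical implementations normally use a spatial index instead of scanning every point.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue                 # already assigned to some cluster
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels


sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=3)
```

In this sample the two tight groups of three points each become clusters, and the isolated point at (50, 50) is labelled as noise.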

hierarchical clustering

method of cluster analysis which seeks to build a hierarchy of clusters

neighbor joining

clustering method in bioinformatics

UPGMA (unweighted pair group method with arithmetic mean) is a simple agglomerative (bottom-up) hierarchical clustering method. It also has a weighted variant, WPGMA; both are generally attributed to Sokal and Michener.
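As a rough illustration of the bottom-up procedure, the sketch below repeatedly merges the two clusters with the smallest average pairwise leaf distance, which is the linkage rule UPGMA uses. The function name `upgma` and the input layout (a dictionary keyed by `frozenset` pairs of leaf names) are assumptions made for this example. It recomputes averages from scratch at each step and returns only the tree topology, without branch heights.

```python
from itertools import combinations


def upgma(distances):
    """Build a nested-tuple tree from pairwise leaf distances.

    distances maps frozenset({a, b}) to the distance between leaves a and b.
    """
    leaves = sorted({leaf for pair in distances for leaf in pair})
    # Each cluster is (subtree, list of leaves under that subtree).
    clusters = [(leaf, [leaf]) for leaf in leaves]

    def average_distance(a, b):
        # Arithmetic mean over all leaf pairs, one leaf from each cluster.
        total = sum(distances[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (tree_i, leaves_i), (tree_j, leaves_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), leaves_i + leaves_j)
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters[0][0]


example = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): 6.0,
    frozenset(("b", "c")): 6.0,
}
tree = upgma(example)
```

Here the leaves `a` and `b` are joined first because they are closest, and `c` is attached to that pair in the final merge.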

spectral clustering

clustering methods

fuzzy clustering

cluster analysis where membership of a data point in a cluster is fuzzy

OPTICS algorithm

algorithm for finding density based clusters in spatial data

The k-medoids method is a classical partitioning technique of clustering that splits a data set of objects into k clusters, where the number k of clusters is assumed to be known a priori (which implies that the programmer must specify k before running a k-medoids algorithm). The "goodness" of a given value of k can be assessed with methods such as the silhouette method. The name of the clustering method was coined by Leonard Kaufman and Peter J. Rousseeuw with their PAM (Partitioning Around Medoids) algorithm.
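The partitioning idea can be sketched as follows. This is not the PAM algorithm named above; it is a simpler alternating scheme (assign each point to its nearest medoid, then re-pick each medoid as the cluster member with the smallest total distance to the others), shown only to make the role of k and of medoids concrete. The function name, the Euclidean distance, and the naive choice of the first k points as initial medoids are all assumptions of the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids sketch; returns (medoids, clusters)."""
    medoids = list(points[:k])  # naive initialisation (assumption of this sketch)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [
            min(c, key=lambda m: sum(dist(m, q) for q in c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, clusters


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, clusters = k_medoids(data, k=2)
```

Unlike a k-means centroid, every medoid returned here is one of the original data points, which is the defining property of the method.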

In data mining and machine learning, k-means++ is an algorithm for choosing the initial values/centroids (or "seeds") for the k-means clustering algorithm. It was proposed in 2007 by David Arthur and Sergei Vassilvitskii as an approximation algorithm for the NP-hard k-means problem, a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm. It is similar to the first of three seeding methods proposed, in independent work, in 2006 by Rafail Ostrovsky, Yuval Rabani, Leonard Schulman and Chaitanya Swamy. (The distribution of the first seed is diff
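A minimal sketch of the seeding step, under the usual description of k-means++: the first seed is drawn uniformly at random, and each later seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. The function name `kmeanspp_seeds` and the optional `rng` parameter are assumptions made for this example.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial centers from points using squared-distance weighting."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first seed: uniform over the data
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        weights = [min(dist(p, c) ** 2 for c in centers) for p in points]
        # Points far from all current centers are more likely to be picked;
        # already chosen points have weight 0 and cannot be picked again.
        # Note: raises ValueError if every weight is 0 (fewer than k distinct points).
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


pts = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
seeds = kmeanspp_seeds(pts, 3, random.Random(0))
```

The returned seeds would then be passed to an ordinary k-means iteration as its starting centroids.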

BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised data mining algorithm used to perform hierarchical clustering over particularly large data sets. With modifications, it can also be used to accelerate k-means clustering and Gaussian mixture modeling with the expectation–maximization algorithm. An advantage of BIRCH is its ability to incrementally and dynamically cluster incoming, multi-dimensional metric data points in an attempt to produce the best-quality clustering for a given set of resources (memory and time constraints). In most cases, BIRCH only requ

algorithm used to locate the modes of the probability distribution of a dataset

criterion applied in hierarchical cluster analysis

K-medians clustering

cluster analysis algorithm

complete-linkage clustering

agglomerative hierarchical clustering method