Category
Clustering criteria
Jaccard index
measure of similarity and diversity between sets
F1 score
In statistical analysis of binary classification and information retrieval systems, the F-score or F-measure is a measure of predictive performance. It is calculated from the precision and recall of the test. Precision is the number of true positive results divided by the number of all samples predicted to be positive, including those not identified correctly. Recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value.
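As a rough illustration of the definitions above, the following Python sketch computes precision, recall, and the F1 score from raw counts. The function name `precision_recall_f1` and its parameters are hypothetical, chosen for this example only. It assumes the common F1 variant, the harmonic mean of precision and recall, and it returns 0.0 where a ratio would otherwise be undefined, which is a convention rather than part of the definition.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Illustrative helper; names are assumptions, not a library API.
    """
    predicted_positive = tp + fp   # everything the classifier labelled positive
    actual_positive = tp + fn      # everything that should have been labelled positive

    # Convention used here: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / predicted_positive if predicted_positive else 0.0
    recall = tp / actual_positive if actual_positive else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = precision_recall_f1(tp=8, fp=2, fn=8)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With 8 true positives, 2 false positives, and 8 false negatives, precision is 8/10 and recall is 8/16, so the harmonic mean lies between the two and closer to the smaller value.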
Silhouette
method in cluster analysis
Similarity measure
function that quantifies the similarity between two objects
Dunn index
metric for evaluating clustering algorithms
Rand index
measure of similarity between two data clusterings
Davies–Bouldin index
metric for evaluating clustering algorithms
MinHash
In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was published by Andrei Broder at a 1997 conference and was initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied to large-scale clustering problems, such as clustering documents by the similarity of their sets of words.
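The idea can be sketched in a few lines of Python. This is a minimal illustration, not the original AltaVista implementation. It assumes the similarity being estimated is the Jaccard index, and it simulates independent permutations with seeded BLAKE2b hashes from the standard library. The names `minhash_signature`, `estimate_similarity`, and `jaccard` are hypothetical, and the input sets are assumed to be non-empty.

```python
import hashlib


def _seeded_hash(seed: int, item: str) -> int:
    """64-bit hash of item under a given seed (stands in for one random permutation)."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash over the set.

    Assumes items is non-empty; min() raises ValueError on an empty set.
    """
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree; this estimates the Jaccard index."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard index, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = set("the quick brown fox jumps over the lazy dog".split())
    doc_b = set("the quick brown cat sleeps near the lazy dog".split())
    sig_a = minhash_signature(doc_a, num_hashes=256)
    sig_b = minhash_signature(doc_b, num_hashes=256)
    print("exact    :", jaccard(doc_a, doc_b))
    print("estimated:", estimate_similarity(sig_a, sig_b))
```

The signatures have a fixed length regardless of set size, so comparing two documents costs time proportional to the number of hash functions rather than to the number of words. More hash functions give a tighter estimate at the cost of larger signatures.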