Category
page 1Probabilistic data structures
Bloom filter
hashing-based data structure for maintaining a set of items in limited memory, allowing false positives but no false negatives
skip list
data structure that allows fast search within an ordered sequence of elements
nearest neighbor search
(as a form of proximity search (metric space)) optimization problem of finding the point in a given set that is closest (or most similar) to a given point
treap
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.
Locality-sensitive hashing
method of dimension reduction in which closer items have greater probability of being mapped to the same hash bucket
Rapidly exploring random tree
search algorithm
MinHash
In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was published by Andrei Broder in a 1997 conference, and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.