Category

Probabilistic data structures

page 1

hashing-based data structure for maintaining a set of items in limited memory, allowing false positives but no false negatives

data structure that allows fast search within an ordered sequence of elements

nearest neighbor search

(as a form of proximity search (metric space)) optimization problem of finding the point in a given set that is closest (or most similar) to a given point

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

Locality-sensitive hashing

method of dimension reduction in which closer items have greater probability of being mapped to the same hash bucket

Rapidly exploring random tree

search algorithm

In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was published by Andrei Broder in a 1997 conference, and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.