FALCONN (FAst Lookups of Cosine and Other Nearest Neighbors) is a C++ library with a Python wrapper for similarity search over high-dimensional data. It supports cosine similarity and the Euclidean distance. The main ingredient of FALCONN is a Locality-Sensitive Hashing family for cosine similarity that is:
Below, FALCONN is compared with other open-source similarity search algorithms. The dataset consists of GloVe word vectors: 1.2M points in 100 dimensions. The results for the other algorithms are taken from ann-benchmarks, created by Erik Bernhardsson. FALCONN performs especially well in the high-accuracy regime (accuracy of 0.8 or more).
The plot shows the accuracy of retrieving the 10 closest data points versus the number of queries per second.
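To make the two plot axes concrete, here is a minimal sketch of how such measurements can be computed. It assumes that "accuracy" means recall@10, the fraction of the true 10 nearest neighbors (under cosine similarity, found by brute force) that appear among the 10 points a search method returns. `search_fn` is a hypothetical placeholder for any approximate search routine taking `(query, k)` and returning a list of point indices; it is not part of FALCONN's API. Vectors are assumed to be nonzero.

```python
import math
import time


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exact_top_k(dataset, query, k):
    """Indices of the k points most cosine-similar to query (brute-force ground truth)."""
    order = sorted(range(len(dataset)),
                   key=lambda i: cosine_similarity(dataset[i], query),
                   reverse=True)
    return order[:k]


def recall_at_k(true_ids, returned_ids, k):
    """Fraction of the true k nearest neighbors present among the first k returned ids."""
    return len(set(true_ids[:k]) & set(returned_ids[:k])) / k


def measure(search_fn, dataset, queries, k=10):
    """Return (average recall@k, queries per second) for a hypothetical search_fn(query, k)."""
    start = time.perf_counter()
    results = [search_fn(q, k) for q in queries]  # only the searches are timed
    elapsed = time.perf_counter() - start
    recalls = [recall_at_k(exact_top_k(dataset, q, k), r, k)
               for q, r in zip(queries, results)]
    qps = len(queries) / elapsed if elapsed > 0 else float("inf")
    return sum(recalls) / len(recalls), qps
```

Plugging brute-force search itself in as `search_fn` gives a recall of 1.0 at a low query rate; an LSH-based index trades some recall for many more queries per second, which is the trade-off the plot traces.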
The underlying algorithms are described and analyzed in the following paper: