FALCONN
Classes | Typedefs | Enumerations | Functions
falconn Namespace Reference

Classes

class  FalconnError
 
struct  LSHConstructionParameters
 
class  LSHNearestNeighborQuery
 
class  LSHNearestNeighborQueryPool
 
class  LSHNearestNeighborTable
 
class  LSHNearestNeighborTableError
 
class  LSHNNTableSetupError
 
struct  PlainArrayPointSet
 
struct  PointTypeTraits
 
struct  PointTypeTraits< DenseVector< CoordinateType > >
 
struct  PointTypeTraits< SparseVector< CoordinateType, IndexT > >
 
struct  QueryStatistics
 

Typedefs

template<typename CoordinateType >
using DenseVector = Eigen::Matrix< CoordinateType, Eigen::Dynamic, 1, Eigen::ColMajor >
 
template<typename CoordinateType , typename IndexType = int32_t>
using SparseVector = std::vector< std::pair< IndexType, CoordinateType > >
 

Enumerations

enum  LSHFamily { Unknown = 0, LSHFamily::Hyperplane = 1, LSHFamily::CrossPolytope = 2 }
 
enum  DistanceFunction { Unknown = 0, DistanceFunction::NegativeInnerProduct = 1, DistanceFunction::EuclideanSquared = 2 }
 
enum  StorageHashTable {
  Unknown = 0, StorageHashTable::FlatHashTable = 1, StorageHashTable::BitPackedFlatHashTable = 2, StorageHashTable::STLHashTable = 3,
  StorageHashTable::LinearProbingHashTable = 4
}
 

Functions

template<typename PointType >
void compute_number_of_hash_functions (int_fast32_t number_of_hash_bits, LSHConstructionParameters *params)
 
template<typename PointType >
LSHConstructionParameters get_default_parameters (int_fast64_t dataset_size, int_fast32_t dimension, DistanceFunction distance_function, bool is_sufficiently_dense)
 
template<typename PointType , typename KeyType = int32_t, typename PointSet = std::vector<PointType>>
std::unique_ptr< LSHNearestNeighborTable< PointType, KeyType > > construct_table (const PointSet &points, const LSHConstructionParameters &params)
 

Detailed Description

The main namespace.

Typedef Documentation

◆ DenseVector

template<typename CoordinateType >
using falconn::DenseVector = typedef Eigen::Matrix<CoordinateType, Eigen::Dynamic, 1, Eigen::ColMajor>

Type for dense points / vectors. The coordinate type can be either float or double (i.e., use DenseVector<float> or DenseVector<double>). In most cases, float (single precision) should be sufficient.

◆ SparseVector

template<typename CoordinateType , typename IndexType = int32_t>
using falconn::SparseVector = typedef std::vector<std::pair<IndexType, CoordinateType> >

Type for sparse points / vectors. The coordinate type can be either float or double (i.e., use SparseVector<float> or SparseVector<double>). In most cases, float (single precision) should be sufficient.

The elements of the vector must be sorted by the index (the first component of the pair).

Optionally, you can also change the type of the coordinate indices. This might be useful if you have indices that fit into an int16_t and you want to save memory.

Enumeration Type Documentation

◆ DistanceFunction

Supported distance functions.

Note that we use distance functions only to filter the candidates in find_nearest_neighbor, find_k_nearest_neighbors, and find_near_neighbors.

Enumerator
NegativeInnerProduct 

The distance between p and q is -<p, q>. For unit vectors p and q, this means that the nearest neighbor to q has the smallest angle with q.

EuclideanSquared 

The distance is the squared Euclidean distance (same order as the actual Euclidean distance / l2-distance).

◆ LSHFamily

enum falconn::LSHFamily
strong

Supported LSH families.

Enumerator
Hyperplane 

The hyperplane hash proposed in

"Similarity estimation techniques from rounding algorithms" Moses S. Charikar STOC 2002

CrossPolytope 

The cross polytope hash first proposed in

"Spherical LSH for Approximate Nearest Neighbor Search on Unit Kengo Terasawa, Yuzuru Tanaka WADS 2007

Our implementation uses the algorithmic improvements described in

"Practical and Optimal LSH for Angular Distance", Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, Ludwig Schmidt NIPS 2015

◆ StorageHashTable

Supported low-level storage hash tables.

Enumerator
FlatHashTable 

The set of points whose hash is equal to a given one is retrieved using "naive" buckets. One table takes space O(#points + #bins).

BitPackedFlatHashTable 

The same as FlatHashTable, but everything is packed using as few bits as possible. This option is recommended unless the number of bins is much larger than the number of points (in which case we recommend to use LinearProbingHashTable).

STLHashTable 

Here, std::unordered_map is used. One table takes space O(#points), but the leading constant is much higher than that of bucket-based approaches.

LinearProbingHashTable 

The same as STLHashTable, but the custom implementation based on linear probing is used. This option is recommended if the number of bins is much higher than the number of points.

Function Documentation

◆ compute_number_of_hash_functions()

template<typename PointType >
void falconn::compute_number_of_hash_functions ( int_fast32_t  number_of_hash_bits,
LSHConstructionParameters params 
)

Computes the number of hash functions in order to get a hash with the given number of relevant bits. For the cross polytope hash, the last cross polytope dimension will also be determined. The input struct params must contain valid values for the following fields:

  • lsh_family
  • dimension (for the cross polytope hash)
  • feature_hashing_dimension (for the cross polytope hash with sparse vectors) The function will then set the following fields of params:
  • k
  • last_cp_dim (for the cross polytope hash, both dense and sparse)

◆ construct_table()

template<typename PointType , typename KeyType = int32_t, typename PointSet = std::vector<PointType>>
std::unique_ptr<LSHNearestNeighborTable<PointType, KeyType> > falconn::construct_table ( const PointSet &  points,
const LSHConstructionParameters params 
)

Function for constructing an LSH table wrapper. The template parameters PointType and KeyType are as in LSHNearestNeighborTable above. The PointSet template parameter default is set so that a std::vector<PointType> can be passed as the set of points for which a LSH table should be constructed.

For dense data stored in a single large array, you can also use the PlainArrayPointSet struct as the PointSet template parameter in order to pass a densly stored data array.

The points object must stay valid for the lifetime of the LSH table.

The caller assumes ownership of the returned pointer.

◆ get_default_parameters()

template<typename PointType >
LSHConstructionParameters falconn::get_default_parameters ( int_fast64_t  dataset_size,
int_fast32_t  dimension,
DistanceFunction  distance_function,
bool  is_sufficiently_dense 
)

A function that sets default parameters based on the following dataset properties:

  • the size of the dataset (i.e., the number of points)
  • the dimension
  • the distance function
  • and a flag indicating whether the dataset is sufficiently dense (for dense data, fewer pseudo-random rotations suffice)

The parameters should be reasonable for sufficiently nice datasets. If the dataset has special structure, or you want to maximize the performance of FALCONN, you need to set the parameters by hand. See the documentation and the GloVe example to make sense of the parameters.

In particular, the default setting should give very fast preprocessing time. If you are willing to spend more time building the data structure to improve the query time, you should increase l (the number of tables) after calling this function.