Crate faiss_next_sys
§faiss-next-sys
faiss-next-sys wraps the c_api of faiss into Rust with bindgen.
The currently supported faiss version is v1.7.4.
§Build faiss from source
faiss-next-sys requires faiss compiled with FAISS_ENABLE_C_API=ON and BUILD_SHARED_LIBS=ON.
Some facebookresearch/faiss distributions, such as the brew package on macOS, do not provide the faiss_c library, so building faiss from source is sometimes necessary.
facebookresearch/faiss officially provides an installation document that explains how to build faiss from source.
On Windows, however, building faiss fails because of the msvc C++ compiler’s implementation of C++17 syntax: issue.
A patched v1.7.4 version was therefore made to solve the issue: link.
If Windows is not the target platform, just clone faiss and check out v1.7.4; it works fine.
- link: patched version
- link: official version
Pick one of the above, download and unzip it, then start building:
§MacOS
xcode and brew are needed; install them in advance.
# install cmake openblas and llvm
brew install cmake openblas llvm
# configure
cmake -B build -DCMAKE_C_COMPILER=/opt/homebrew/opt/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/homebrew/opt/llvm/bin/clang++ -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=OFF
# compile
cmake --build build --config Release
# install
cmake --install build --prefix=$HOME/faiss
cp build/c_api/libfaiss_c.dylib $HOME/faiss/lib/
§Linux
gcc, cmake, intelmkl, and cuda are needed; install them in advance.
# configure
cmake -B build -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DFAISS_ENABLE_GPU=ON -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=OFF
# compile
cmake --build build --config Release
# install
cmake --install build --prefix=$HOME/faiss
cp build/c_api/libfaiss_c.so $HOME/faiss/lib/
§Windows
Visual Studio 2022, cmake, intelmkl, and cuda are needed; install them in advance.
# configure
cmake -B build -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DFAISS_ENABLE_GPU=ON -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=OFF
# compile
cmake --build build --config Release
# install
cmake --install build --prefix=%USERPROFILE%\faiss
copy build\c_api\Release\faiss_c.dll %USERPROFILE%\faiss\bin
copy build\c_api\Release\faiss_c.lib %USERPROFILE%\faiss\lib\
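After installing, it can help to confirm that the C API library actually landed in the install prefix. The following is a minimal sketch, not part of faiss-next-sys: the function names `expected_artifacts` and `missing_artifacts` are hypothetical, and the file names are taken only from the copy commands in the three platform sections above.

```rust
use std::path::{Path, PathBuf};

/// Relative paths of the C API artifacts that the install steps above copy
/// into the prefix, keyed by platform name ("macos", "linux", "windows").
/// Returns an empty list for any other platform name.
/// (Hypothetical helper; the file names come from the copy commands above.)
pub fn expected_artifacts(os: &str) -> Vec<PathBuf> {
    let names: &[&str] = match os {
        "macos" => &["lib/libfaiss_c.dylib"],
        "linux" => &["lib/libfaiss_c.so"],
        "windows" => &["bin/faiss_c.dll", "lib/faiss_c.lib"],
        _ => &[],
    };
    names.iter().map(|n| PathBuf::from(*n)).collect()
}

/// Returns the expected artifacts that do not exist under `prefix`.
pub fn missing_artifacts(prefix: &Path, os: &str) -> Vec<PathBuf> {
    expected_artifacts(os)
        .into_iter()
        .map(|rel| prefix.join(rel))
        .filter(|p| !p.exists())
        .collect()
}
```

Passing `std::env::consts::OS` as `os` and the chosen install prefix (for example `$HOME/faiss`) as `prefix` gives the list of files that still need to be copied; an empty result means the copy steps succeeded.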
§Bindings
Bindings were generated by running the following command under the faiss-next-sys folder:
cargo build --features bindgen
or, to generate bindings with gpu enabled:
cargo build --features bindgen,gpu
The generated bindings look like:
└── src
    ├── lib.rs
    ├── linux
    │   ├── bindings.rs      # linux cpu bindings
    │   └── bindings_gpu.rs  # linux gpu bindings
    ├── macos
    │   └── bindings.rs      # macos cpu bindings, gpu is not supported
    └── windows
        ├── bindings.rs      # windows cpu bindings
        └── bindings_gpu.rs  # windows gpu bindings
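The layout above implies a simple mapping from target platform and gpu setting to one bindings file. The sketch below expresses that mapping as a plain function; `bindings_path` is a hypothetical name, and how `lib.rs` actually selects a file is an assumption not confirmed by this page.

```rust
/// Relative path (from the crate root) of the generated bindings file for a
/// platform, following the directory layout shown above.
/// Returns `None` for macOS with gpu (not supported) and for unknown platforms.
/// (Hypothetical helper; the crate's real selection logic may differ.)
pub fn bindings_path(os: &str, gpu: bool) -> Option<String> {
    let file = if gpu { "bindings_gpu.rs" } else { "bindings.rs" };
    match (os, gpu) {
        // The layout lists no gpu bindings for macOS.
        ("macos", true) => None,
        ("linux", _) | ("macos", _) | ("windows", _) => Some(format!("src/{os}/{file}")),
        _ => None,
    }
}
```

For example, `bindings_path("linux", true)` yields `src/linux/bindings_gpu.rs`, while `bindings_path("macos", true)` yields `None`.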
Structs§
- List of temporary buffers used to store results before they are copied to the RangeSearchResult object.
- Class for the clustering parameters. Can be passed to the constructor of the Clustering object.
Enums§
- An error code which depends on the exception thrown from the previous operation. See faiss_get_last_error to retrieve the error message.
- Some algorithms support both an inner product version and an L2 search version.
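The error-code enum and faiss_get_last_error suggest a common pattern on the Rust side: turn the returned code and the message pointer into a `Result`. The sketch below is self-contained and does not call faiss; `check` is a hypothetical helper, and it assumes that a return code of 0 means success and that the message is a NUL-terminated C string.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Turns a C API return code plus an error-message pointer into a `Result`.
///
/// Assumption: 0 means success and any other value is an error code.
///
/// # Safety
/// `msg` must be null or point to a valid NUL-terminated C string.
pub unsafe fn check(code: c_int, msg: *const c_char) -> Result<(), String> {
    if code == 0 {
        return Ok(());
    }
    let text = if msg.is_null() {
        String::from("unknown error")
    } else {
        // Copy the message into an owned String right away, so the result
        // does not depend on how long the C side keeps the buffer alive.
        unsafe { CStr::from_ptr(msg) }.to_string_lossy().into_owned()
    };
    Err(format!("faiss error {code}: {text}"))
}
```

In real use, `msg` would presumably be the pointer returned by faiss_get_last_error, read immediately after the failing call and before any other Faiss function runs.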
Functions§
- copy elements ofs:ofs+n-1 seen as linear data in the buffers to tables dest_ids, dest_dis
- Sets the ClusteringParameters object with reasonable defaults
- getter for centroids (size = k * d)
- getter for iteration stats
- the only mandatory parameters are k and d
- called before computing distances
- compute distance between two stored vectors
- Compute distance of vector i to current query. This function corresponds to the function call operator: DistanceComputer::operator()
- Remove ids from a set. Repetitions of ids in the indices set passed to the constructor do not hurt performance. The hash function used for the bloom filter and GCC’s implementation of unordered_set are just the least significant bits of the id. This works fine for random ids or ids in sequences but will produce many hash collisions if lsb’s are always the same
- remove ids between [imin, imax)
- Encapsulates a set of ids to remove.
- Getter for random_rotation
- Add n vectors of dimension d to the index.
- Same as add, but stores xids instead of sequential ids.
- return the indexes of the k vectors closest to the query x.
- query n vectors of dimension d to the index.
- Reconstruct a stored vector (or an approximation if lossy coding)
- Reconstruct vectors i0 to i0 + ni - 1
- removes IDs from the index. Not supported by all indexes @param index opaque pointer to index object @param nremove output for the number of IDs removed
- removes all elements from the database. @param index opaque pointer to index object
- query n vectors of dimension d to the index.
- Perform training on a representative set of vectors
- Opaque type for IndexFlat1D
- Opaque type for IndexFlatIP
- Opaque type for IndexFlatL2
- compute distance with a subset of vectors
- Opaque type for IndexFlat
- get a pointer to the index’s internal data (the xb field). The outputs become invalid after any data addition or removal operation.
- make the rev_map from scratch
- get a pointer to the index map’s internal ID vector (the id_map field). The outputs of this function become invalid after any operation that can modify the index.
- same as IndexIDMap but also provides an efficient reconstruction implementation via a 2-way index
- get a pointer to the sub-index (the index field). The outputs of this function become invalid after any operation that can modify the index.
- get a pointer to the index map’s internal ID vector (the id_map field). The outputs of this function become invalid after any operation that can modify the index.
- Index that translates search results to ids
- get a pointer to the sub-index (the index field). The outputs of this function become invalid after any operation that can modify the index.
- whether object owns the quantizer
- Update a subset of vectors.
- whether object owns the quantizer
- Opaque type for IndexIVFScalarQuantizer
- copy a subset of the entries index to the other index
- Check the inverted lists’ imbalance factor.
- Get the IDs in an inverted list. IDs are written to invlist, which must be large enough to accommodate the full list.
- initialize a direct map
- moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
- display some stats about the inverted lists of the index
- search a set of vectors, that are pre-quantized by the IVF quantizer. Fill in the corresponding heaps with the query results. search() calls this.
- The sign of each vector component is put in a binary signature
- Index that applies a LinearTransform transform on vectors before handing them over to a sub-index
- Opaque type for IndexRefineFlat
- Index that concatenates the results from several sub-indexes
- Opaque type for IndexScalarQuantizer
- Index that concatenates the results from several sub-indexes
- Add n vectors of dimension d to the index.
- Same as add, but stores xids instead of sequential ids.
- return the indexes of the k vectors closest to the query x.
- Computes a residual vector after indexing encoding.
- Computes a residual vector after indexing encoding.
- query n vectors of dimension d to the index.
- Reconstruct a stored vector (or an approximation if lossy coding)
- Reconstruct vectors i0 to i0 + ni - 1
- removes IDs from the index. Not supported by all indexes @param index opaque pointer to index object @param nremove output for the number of IDs removed
- removes all elements from the database. @param index opaque pointer to index object
- The size of the produced codes in bytes.
- decode a set of vectors
- encode a set of vectors
- query n vectors of dimension d to the index.
- query n vectors of dimension d with search parameters to the index.
- Perform training on a representative set of vectors
- compute A^T * A to set the is_orthonormal flag
- compute x = A^T * (x - b); this is the reverse transform if A has orthonormal lines
- Getter for do_pca
- Getter for the values in the range. The output values are invalidated upon any other modification of the range.
- add a new parameter (or return it if it exists)
- get string representation of the combination by writing it to the given character buffer. A buffer size of 1000 ensures that the full name is collected.
- print a description on stdout
- nb of combinations, = product of values sizes
- Parameter space default constructor
- set one of the parameters
- set a combination of parameters described by a string
- set a combination of parameters on an index
- Getter for is_orthonormal
- result structure for a single query
- called by range_search before do_allocation
- called when lims contains the nb of elements result entries for each query
- getter for labels and respective distances (not sorted): result for query i is labels[lims[i]:lims[i+1]]
- getter for lims: size (nq + 1)
- apply the random rotation, return new allocated matrix @param x size n * d_in @return size n * d_out
- apply transformation and result is pre-allocated @param x size n * d_in @param xt size n * d_out
- reverse transformation. May not be implemented or may return approximate result
- Perform training on a representative set of vectors
- Clone an index. This is equivalent to faiss::clone_index
- compute ny squared L2 distances between x and a set of contiguous y vectors
- compute the inner product between nx vectors x and one y
- squared norm of a vector
- compute the L2 norms for a set of vectors
- same as fvec_norms_L2, but computes squared norms
- L2-renormalize a set of vectors. Nothing is done if a vector is 0-normed
- Getter of block sizes value for BLAS distance computations
- Getter of block sizes value for BLAS distance computations
- Getter of threshold value on nx above which we switch to BLAS to compute distances
- Getter of the number of results above which we switch to a reservoir to collect results rather than a heap
- global var that collects all statistics
- Get the error message of the last failed operation performed by Faiss. The given pointer is only valid until another Faiss function is called.
- Build an index with the sequence of processing steps described in the string.
- simplified interface
- Compute pairwise distances between sets of vectors
- Compute pairwise distances between sets of vectors, taking the arguments of “faiss_pairwise_L2sqr” with ldq, ldb, and ldd equal to -1 by default
- Read index from a file. This is equivalent to faiss::read_index when a file descriptor is given.
- Read index from a file. This is equivalent to faiss::read_index_binary when a file descriptor is given.
- Read index from a file. This is equivalent to faiss::read_index_binary when a file path is given.
- Read index from a file. This is equivalent to faiss::read_index when a file path is given.
- Setter of block sizes value for BLAS distance computations
- Setter of block sizes value for BLAS distance computations
- Setter of threshold value on nx above which we switch to BLAS to compute distances
- Setter of the number of results above which we switch to a reservoir to collect results rather than a heap
- Write index to a file. This is equivalent to faiss::write_index when a file descriptor is provided.
- Write index to a file. This is equivalent to faiss::write_index_binary when a file descriptor is provided.
- Write index to a file. This is equivalent to faiss::write_index_binary when a file path is provided.
- Write index to a file. This is equivalent to faiss::write_index when a file path is provided.
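The range-search getters listed above describe a flat layout: lims has size nq + 1, and the result for query i is labels[lims[i]:lims[i+1]]. The sketch below shows how that layout could be split into per-query slices once the buffers have been copied into Rust. `per_query` is a hypothetical helper; the element types (usize for lims, i64 for labels, f32 for distances) are assumptions and not taken from this page.

```rust
/// Splits flat range-search output into one (labels, distances) pair per
/// query, where query i owns the half-open range lims[i]..lims[i + 1].
/// Returns `None` if the buffers are inconsistent with each other.
/// (Hypothetical helper; element types are assumptions.)
pub fn per_query<'a>(
    lims: &[usize],
    labels: &'a [i64],
    distances: &'a [f32],
) -> Option<Vec<(&'a [i64], &'a [f32])>> {
    if labels.len() != distances.len() {
        return None;
    }
    let mut out = Vec::with_capacity(lims.len().saturating_sub(1));
    // Each adjacent pair of limits delimits the results of one query.
    for w in lims.windows(2) {
        let (start, end) = (w[0], w[1]);
        if start > end || end > labels.len() {
            return None;
        }
        out.push((&labels[start..end], &distances[start..end]));
    }
    Some(out)
}
```

With lims = [0, 2, 2, 5], the first query owns two results, the second owns none, and the third owns three. The results within each slice are not sorted, as the getter description notes.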