Crate diskann_quantization

Expand description

A collection of utilities and algorithms for training quantizers, compressing data, and performing distance computations on compressed data.

§Quantizers

The currently implemented quantizers are listed below. Refer to the documentation within each module for more information.

Note that the capabilities of each quantizer are varied and some aspects are still in progress. If you have any questions or need something implemented, please reach out!

Scalar: Compressing dimensions to 8-bits and lower.
Binary: Light weight 1-bit compression.
Product: Compressing multiple dimensions into a single code. NOTE: Support for product quantization is currently limited as the implementation is spread between this crate and the original implementation in DiskANN.
Spherical: A compression technique similar to scalar quantization, put with the characteristic that vectors are normalized, transformed, and then projected onto a unit sphere.

This class of algorithms is based off the RabitQ paper and associated work.
MinMax: A compression similar to scalar quantization, but where the the coefficients to quantize are computed on a per-vector basis; allowing this to be used in a streaming setting (i.e. without any need for training).

§BitSlice

A primary abstraction introduced is a BitSlice. Think of it as a Rust slice, but for packing together bits. It currently supports unsigned integers of widths 1 to 8. The borrowed version of the slice consists of a pointer (with a lifetime) and a length and is therefore just 16-bytes in size and amenable to the niche optimization. It also comes in a boxed variant for convenience.

Some trickery is involved in the constructor to ensure that the pointer can only ever be obtained in a safe way.

§Distances over BitSlices

Distances over bit-slices are a key-component for building efficient distances for binary and scalar quantized vectors.

Performance characteristics of the implemented functions are listed here:

crate::distances::SquaredL2/crate::distances::InnerProduct: Fast SIMD implementations available for 4 and 8-bit vectors. Fallback implementations are used for 2, 3, 5, 6, and 7-bit vectors. Scalar pop-count used for 1-bit distances.
crate::distances::Hamming: Scalar pop-count used.

§Adding New Quantizers

Adding new quantizers is a fairly straight-forward.

Implement training using a similar style to existing quantizers by:

A. Define a quantizer that will hold the final any central quantization state.

B. Implement a training module with training parameters and an associated train method (available either as a trait or inherhent method, whichever makes more sense).

Determine the representation for compressed vectors. This can be done using combinations of crate::bits::BitSlices, utilities in the ownership module and more.

Ensure that this representation implements diskann_utils::Reborrow and diskann_utils::ReborrowMut so that working with generalized references types is sane.
Implement distance functors (e.g. crate::scalar::CompensatedIP) to enable distance computations on compressed vectors and between compress and full-precision vectors.

Additionally, implement crate::AsFunctor for the central quantizer for each distance functor.

§Design Considerations

In this section, we will cover some of the design philosophy regarding quantizers, particularly how this interacts with vector search and data providers.

§No Inherent Methods for Distances in Quantizers

Inherent methods for distances on quantizer structs such as

fn distance_l2(&self, full_precision: &[f32], quantized: &[u8]) -> f32
fn distance_qq_l2(&self, quantized_0: &[u8], quantized_1: &[u8]) -> f32

do not work well for a number of reasons:

Inherent methods are unavailable to call via trait restrictions on generic arguments.
Fixed argument/return types limit flexibility. One example is marking the difference between diskann_vector::MathematicalValue and diskann_vector::SimilarityScore.

The solution to accepting multiple argument/return types is exactly the approach taken in this crate, i.e. using auxiliary diskann_vector::DistanceFunction implementations on light-weight types (e.g. crate::scalar::CompensatedIP).
This approach does not work when hefty query preprocessing can make distance computations faster, such as with table-based PQ implementations.

Instead, creating a diskann_vector::PreprocessedDistanceFunction can be used.
Does not really extend well to performing batches of distance computations when that can be computed more efficiently.

§No Common Trait for Quantizer-based Distances

Creating a trait for quantizer-based distances would solve point number 1 above about usability in generic contexts, but the other points still remain. I argue that a functor-based interface is generally more flexible and useful.

Additionally, not all quantizers implement the same distance functions. For example, binary quantization only uses crate::distances::Hamming.

§No Common Trait for Training

The requirements for training between scalar quantization and produce quantization are sufficiently different that using a common training interface feels overly restrictive.

§Thoughts on Canonical Layouts for Compressed Vector Representations

As much as possible, we try to avoid using raw &[u8] as the representation for compressed vectors in favor of richer types such as crate::bits::BitSlice and crate::scalar::CompensatedVectorRef (though product-quantization still uses this for historical reasons). This is because compressed representations may need to contain more than just a raw encoding (e.g. the compensation coefficient for scalar quantization) and using a richer type makes this less error prone.

But this opens a can of worms regarding storage within the data provider. Lets take, for example, crate::scalar::CompensatedVector, which consists of a slice for scalar quantization codes and a compensation coefficient. The provider can choose to store these two components in separate data structures or fuse them together into a single block of memory (note that crate::scalar::MutCompensatedVectorRef is defined so that both the codes and compensation coefficient can be updated in-place. The provider may also wish to perform cache-line padding.

Because of the myriad of ways data can be stored and retrieved, the utilities in this crate instead focus on providing tools to express the compressed state that allow for this design space exploration on the data provider level.

Eventually, however, we may wish to impose canonical layouts for these representations within the quantization crate if that turns out to be necessary.

Modules§

algorithms
alloc
binary: Binary quantization compression and distance comparisions.
bits: Utilities for working with packed 1-bit to 8-bit integers (inclusive).
cancel: Utilities to support cancelation of long-running operations.
distances: Zero-sized types representing distance functions.
error: Formatting utilities for error chains.
meta
minmax: MinMax Quantization
multi_vector: Multi-vector matrix types and distance functions.
num: Number types with limited dynamic range.
product: Product quantization training and compression.
random
scalar: Scalar quantization training, compression data, and distance comparisons.
spherical
views

Macros§

check_lengths: Check that x.len() == y.len(), returning Err(diskann_vector::distances::UnequalLengths) if the results are different.
poly

Enums§

Parallelism: Selector for the parallelization strategy used by some algorithms.

Traits§

AsFunctor: Create a distance function (i.e. diskann_vector::DistanceFunction) from a quantizer.
CompressInto: A trait implemented by a quantizer, indicating that it is capable of quantizing the contents oF From into To.
CompressIntoWith: A trait implemented by a quantizer, indicating that it is capable of quantizing the contents oF From into To with additional help from an argument of type A.