vq 0.2.0-alpha.1

Vq

Vq (v[ector] q[uantizer]) is a vector quantization library for Rust. It provides implementations of popular quantization algorithms, including binary quantization (BQ), scalar quantization (SQ), product quantization (PQ), and tree-structured vector quantization (TSVQ).

Vector quantization reduces the storage cost of high-dimensional vectors by approximating them with a smaller set of representative vectors. It is used in applications such as data compression and nearest neighbor search, where it lowers the memory footprint and speeds up search.
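To make the idea concrete, here is a minimal standalone sketch of scalar quantization in plain Rust. It does not use the vq crate and is not its implementation; the function names `quantize_u8` and `dequantize_u8` are hypothetical, and the value range is assumed to be known in advance.

```rust
/// Map each value in [min, max] to one of 256 evenly spaced levels (one byte each).
/// Values outside the range are clamped first. Assumes max > min.
fn quantize_u8(v: &[f32], min: f32, max: f32) -> Vec<u8> {
    let scale = 255.0 / (max - min);
    v.iter()
        .map(|&x| ((x.clamp(min, max) - min) * scale).round() as u8)
        .collect()
}

/// Approximately reconstruct the original values from their one-byte codes.
fn dequantize_u8(codes: &[u8], min: f32, max: f32) -> Vec<f32> {
    let step = (max - min) / 255.0;
    codes.iter().map(|&c| min + c as f32 * step).collect()
}

fn main() {
    let v = [0.0_f32, 0.25, 0.5, 1.0];
    let codes = quantize_u8(&v, 0.0, 1.0);
    let approx = dequantize_u8(&codes, 0.0, 1.0);
    println!("{codes:?} -> {approx:?}");
}
```

Each value is stored in one byte instead of four, at the cost of a reconstruction error of at most about half a quantization step.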

Features

A simple and generic API for all quantizers
Can reduce storage size by up to 75%
Good performance via SIMD acceleration, multi-threading, and zero-copy operations
Support for multiple distance metrics, including Euclidean, cosine, and Manhattan
Python bindings via the PyVq package

See ROADMAP.md for the list of implemented and planned features.

[!IMPORTANT] Vq is in early development, so bugs and breaking changes are expected. Please use the issues page to report bugs or request features.

Supported Algorithms

Algorithm	Training Complexity	Quantization Complexity	Supported Distances	Input Type	Output Type	Compression
BQ	$O(1)$	$O(nd)$	—	`&[f32]`	`Vec<u8>`	75%
SQ	$O(1)$	$O(nd)$	—	`&[f32]`	`Vec<u8>`	75%
PQ	$O(nkd)$	$O(nd)$	All	`&[f32]`	`Vec<f16>`	50%
TSVQ	$O(n \log k)$	$O(d \log k)$	All	`&[f32]`	`Vec<f16>`	50%

$n$: number of vectors
$d$: dimensionality of vectors
$k$: number of centroids or clusters
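The Compression column appears to follow from the per-element sizes of the input and output types (an assumption based on the table, not a statement from the library). A minimal sketch of that arithmetic, using a hypothetical helper `savings_percent`:

```rust
use std::mem::size_of;

/// Percentage of storage saved when each element shrinks
/// from `from_bytes` to `to_bytes`.
fn savings_percent(from_bytes: usize, to_bytes: usize) -> f64 {
    100.0 * (1.0 - to_bytes as f64 / from_bytes as f64)
}

fn main() {
    let f32_bytes = size_of::<f32>(); // 4 bytes per input element
    let u8_bytes = size_of::<u8>(); // 1 byte per BQ/SQ output element
    let f16_bytes = 2; // a 16-bit float occupies 2 bytes

    println!("f32 -> u8:  {}%", savings_percent(f32_bytes, u8_bytes));
    println!("f32 -> f16: {}%", savings_percent(f32_bytes, f16_bytes));
}
```

Under this reading, a 768-dimensional `f32` vector takes 3072 bytes, its `u8` form takes 768 bytes, and its `f16` form takes 1536 bytes.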

Getting Started

Installation

Add vq to your Cargo.toml:

cargo add vq --features parallel,simd

[!NOTE] The parallel and simd features enable multi-threading and SIMD acceleration for the training phase of the PQ and TSVQ algorithms. This can significantly reduce training time, especially for large datasets. Note that enabling the simd feature requires a modern C compiler (such as GCC or Clang) that supports the C11 standard.

Vq requires Rust 1.85 or later.

Python Bindings

Python bindings for Vq are available via the PyVq package.

pip install pyvq

For more information, check out the pyvq directory.

Documentation

Check out the latest API documentation on docs.rs.

Quick Example

Here's a simple example using the BQ and SQ algorithms to quantize vectors:

use vq::{BinaryQuantizer, ScalarQuantizer, Quantizer, VqResult};

fn main() -> VqResult<()> {
    // Binary quantization
    let bq = BinaryQuantizer::new(0.0, 0, 1)?;
    let quantized = bq.quantize(&[0.5, -0.3, 0.8])?;

    // Scalar quantization
    let sq = ScalarQuantizer::new(0.0, 1.0, 256)?;
    let quantized = sq.quantize(&[0.1, 0.5, 0.9])?;

    Ok(())
}
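For intuition about the binary case, the following standalone sketch shows threshold-based binarization in plain Rust. It is not the crate's implementation: the helper `binarize` is hypothetical, and it assumes the three arguments in the example above mean a threshold, a low value, and a high value, which this README does not state. Whether a value exactly at the threshold maps to high or low is also an assumption.

```rust
/// Map each value to `high` if it is at or above `threshold`, otherwise to `low`.
fn binarize(v: &[f32], threshold: f32, low: u8, high: u8) -> Vec<u8> {
    v.iter()
        .map(|&x| if x >= threshold { high } else { low })
        .collect()
}

fn main() {
    // Same input as the binary quantization call in the example above.
    let codes = binarize(&[0.5, -0.3, 0.8], 0.0, 0, 1);
    println!("{codes:?}");
}
```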

Product Quantizer Example

use vq::{ProductQuantizer, Distance, VqResult};

fn main() -> VqResult<()> {
    // Training data (each inner slice is a vector)
    let training: Vec<Vec<f32>> = (0..100)
        .map(|i| (0..10).map(|j| ((i + j) % 50) as f32).collect())
        .collect();
    let training_refs: Vec<&[f32]> = training.iter().map(|v| v.as_slice()).collect();

    // Train the quantizer
    let pq = ProductQuantizer::new(
        &training_refs,
        2,  // m: number of subspaces
        4,  // k: centroids per subspace
        10, // max_iters
        Distance::Euclidean,
        42, // seed
    )?;

    // Quantize a vector
    let quantized = pq.quantize(&training[0])?;

    Ok(())
}
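The roles of `m` and `k` can be illustrated with a textbook product quantization encoding step: split the vector into `m` subvectors and replace each with the index of its nearest centroid among `k`. The sketch below is standalone Rust with hand-written codebooks; `sq_dist` and `pq_encode` are hypothetical names, and it returns centroid indices, which may differ from what `ProductQuantizer::quantize` returns.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encode `v` as one centroid index per subspace.
/// `codebooks[s]` holds the centroids of subspace `s`.
/// Assumes `v.len()` is a non-zero multiple of `codebooks.len()`.
fn pq_encode(v: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<usize> {
    let sub_dim = v.len() / codebooks.len();
    v.chunks(sub_dim)
        .zip(codebooks)
        .map(|(sub, centroids)| {
            // Linear scan for the nearest centroid in this subspace.
            let mut best = 0;
            for (i, c) in centroids.iter().enumerate() {
                if sq_dist(sub, c) < sq_dist(sub, &centroids[best]) {
                    best = i;
                }
            }
            best
        })
        .collect()
}

fn main() {
    // m = 2 subspaces, k = 2 centroids per subspace, 4-dimensional input.
    let codebooks: Vec<Vec<Vec<f32>>> = vec![
        vec![vec![0.0, 0.0], vec![1.0, 1.0]],
        vec![vec![0.0, 0.0], vec![5.0, 5.0]],
    ];
    let codes = pq_encode(&[0.9, 1.1, 0.5, -0.5], &codebooks);
    println!("{codes:?}");
}
```

In a trained quantizer the codebooks would come from clustering the training data in each subspace rather than being written by hand.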

Contributing

See CONTRIBUTING.md for details on how to make a contribution.

License

Vq is available under either of the following licenses:

MIT License (LICENSE-MIT)
Apache License, Version 2.0 (LICENSE-APACHE)

Acknowledgements

This project uses the Hsdlib library for SIMD acceleration.