Crate lnmp_quant

§lnmp-quant

Quantization and compression for LNMP embedding vectors.

This crate provides efficient quantization schemes that compress embedding vectors while maintaining high semantic accuracy. Several schemes are available:

  • QInt8: 4x compression with ~99% accuracy
  • QInt4: 8x compression with ~95-97% accuracy
  • Binary: 32x compression with ~85-90% similarity

§Quick Start

use lnmp_quant::{quantize_embedding, dequantize_embedding, QuantScheme};
use lnmp_embedding::Vector;

// Create an embedding
let embedding = Vector::from_f32(vec![0.12, -0.45, 0.33]);

// Quantize to QInt8
let quantized = quantize_embedding(&embedding, QuantScheme::QInt8).unwrap();
println!("Original size: {} bytes", embedding.dim * 4);
println!("Quantized size: {} bytes", quantized.data_size());
println!("Compression ratio: {:.1}x", quantized.compression_ratio());

// Dequantize back
let restored = dequantize_embedding(&quantized).unwrap();
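
To sanity-check the accuracy figures above, compare the restored vector against the original. A minimal sketch, assuming a hypothetical as_slice() accessor for the raw f32 values (substitute whatever lnmp_embedding::Vector actually provides):

// `as_slice()` is hypothetical; use the accessor lnmp_embedding::Vector
// actually exposes for its underlying f32 values.
let a = embedding.as_slice();
let b = restored.as_slice();

// Cosine similarity between the original and the dequantized vector;
// for QInt8 this should land around 0.99.
let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
println!("Cosine similarity: {:.4}", dot / (norm_a * norm_b));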

§Quantization Schemes

  • QInt8: 8-bit signed integer quantization (4x compression, ~99% accuracy)
  • QInt4: 4-bit packed quantization (8x compression, ~95-97% accuracy)
  • Binary: 1-bit sign-based quantization (32x compression, ~85-90% similarity)
  • FP16: Half-precision float (2x compression, ~99.9% accuracy, near-lossless)
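
One quick way to compare these trade-offs is to quantize the same vector under each scheme and print the resulting sizes. A minimal sketch; only QuantScheme::QInt8 is confirmed by the example above, so the QInt4, Binary, and Fp16 variant names (and a derived Debug impl) are assumptions based on the scheme names:

use lnmp_quant::{quantize_embedding, QuantScheme};
use lnmp_embedding::Vector;

let embedding = Vector::from_f32(vec![0.12, -0.45, 0.33, 0.87]);

// Variant names other than QInt8 are assumed from the scheme list above,
// as is a derived Debug impl for QuantScheme.
for scheme in [QuantScheme::QInt8, QuantScheme::QInt4, QuantScheme::Binary, QuantScheme::Fp16] {
    print!("{:?}: ", scheme);
    let q = quantize_embedding(&embedding, scheme).unwrap();
    println!("{} bytes, {:.1}x compression", q.data_size(), q.compression_ratio());
}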

§Re-exports

pub use decode::dequantize_embedding;
pub use encode::quantize_embedding;
pub use error::QuantError;
pub use metrics::QuantMetrics;
pub use scheme::QuantScheme;
pub use vector::QuantizedVector;

§Modules

adaptive
Adaptive quantization module
batch
Batch quantization module
binary
Binary: 1-bit quantization for maximum compression
decode
Dequantization of quantized vectors (dequantize_embedding)
encode
Quantization of embedding vectors (quantize_embedding)
error
Error types for quantization (QuantError)
fp16
FP16: Half-precision floating-point quantization
metrics
Quantization quality metrics (QuantMetrics)
qint4
QInt4: 4-bit quantization with nibble packing
scheme
Quantization scheme selection (QuantScheme)
vector
Quantized vector representation (QuantizedVector)
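
For intuition about how the binary module reaches 32x compression, here is a standalone sketch of 1-bit sign quantization. This is illustrative only, not the crate's implementation: each f32 value (4 bytes) collapses to a single sign bit, so eight dimensions pack into one byte, and similarity can then be estimated from the fraction of matching sign bits.

/// Pack each value's sign into one bit: 1 for non-negative, 0 for negative.
fn binarize(values: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (values.len() + 7) / 8];
    for (i, v) in values.iter().enumerate() {
        if *v >= 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}

/// Estimate similarity as the fraction of sign bits on which two packed
/// vectors agree (1.0 = identical signs everywhere).
fn sign_agreement(a: &[u8], b: &[u8], dim: usize) -> f32 {
    let differing: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    1.0 - differing as f32 / dim as f32
}

let a = [0.12, -0.45, 0.33, 0.87, -0.02, 0.51, -0.76, 0.09];
let b = [0.10, -0.40, 0.30, 0.90, 0.01, 0.55, -0.70, 0.11];
let (qa, qb) = (binarize(&a), binarize(&b));
// Eight f32 values (32 bytes) pack into a single byte: 32x compression.
println!("packed sizes: {} and {} bytes", qa.len(), qb.len());
println!("sign agreement: {:.2}", sign_agreement(&qa, &qb, a.len()));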