# lnmp-quant

Quantization and compression for LNMP embedding vectors with minimal accuracy loss.

FID Registry: All examples use official Field IDs from `registry/fids.yaml`.
## Overview

`lnmp-quant` provides efficient quantization schemes that compress embedding vectors while maintaining high semantic accuracy. It offers a spectrum of compression options from 2x to 32x:

- Multiple schemes: FP16 (2x), QInt8 (4x), QInt4 (8x), Binary (32x)
- Fast quantization/dequantization (sub-microsecond for 512-dim vectors)
- LNMP protocol integration for efficient agent-to-agent communication
- Flexible accuracy trade-offs (~99.9% down to ~85% similarity preservation)
## Key Benefits
| Scheme | Compression | Accuracy | 512-dim Quantize | 512-dim Dequantize |
|---|---|---|---|---|
| FP16 | 2x | ~99.9% | ~300 ns | ~150 ns |
| QInt8 | 4x | ~99% | 1.17 µs | 457 ns |
| QInt4 | 8x | ~95-97% | ~600 ns | ~230 ns |
| Binary | 32x | ~85-90% | ~200 ns | ~100 ns |
## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
lnmp-quant = "0.5.2"
lnmp-core = "0.5.2"  # companion crate name reconstructed; see Related Crates
```
### Basic Usage

```rust
use lnmp_quant::{quantize_embedding, dequantize_embedding, QuantScheme};
use lnmp_embedding::Vector;

// Create an embedding
let embedding = Vector::from_f32(vec![0.1, -0.5, 0.3, 0.8]);

// Quantize to QInt8
let quantized = quantize_embedding(&embedding, QuantScheme::QInt8)?;
println!("original: {} bytes", embedding.len() * 4);
println!("quantized: {} bytes", quantized.data.len()); // field/method names illustrative
println!("scheme: {:?}", quantized.scheme);

// Dequantize back to F32
let restored = dequantize_embedding(&quantized)?;

// Verify accuracy
use lnmp_embedding::SimilarityMetric;
let similarity = embedding.similarity(&restored, SimilarityMetric::Cosine)?;
assert!(similarity > 0.99);
```
### LNMP Integration

```rust
use lnmp_core::{LnmpRecord, TypeHint};
use lnmp_quant::{quantize_embedding, QuantScheme};

// Quantize an embedding
let quantized = quantize_embedding(&embedding, QuantScheme::QInt8)?;

// Add to LNMP record (F512 = embedding, from registry/fids.yaml)
let mut record = LnmpRecord::new();
record.add_field(512, quantized);

// Type hint support
let hint = TypeHint::QuantizedEmbedding; // :qv
assert_eq!(hint.to_string(), ":qv"); // accessor illustrative
```
### Adaptive Quantization

Automatically select the best scheme based on your requirements:

```rust
use lnmp_quant::{quantize_adaptive, AccuracyRequirement}; // requirement type name illustrative

// Maximum accuracy (FP16)
let q = quantize_adaptive(&embedding, AccuracyRequirement::Maximum)?;

// High accuracy (QInt8)
let q = quantize_adaptive(&embedding, AccuracyRequirement::High)?;

// Balanced (QInt4)
let q = quantize_adaptive(&embedding, AccuracyRequirement::Balanced)?;

// Compact (Binary)
let q = quantize_adaptive(&embedding, AccuracyRequirement::Compact)?;
```
### Batch Processing

Efficiently process multiple embeddings with statistics tracking:

```rust
use lnmp_quant::{quantize_batch, QuantScheme};

let embeddings = vec![emb_a, emb_b, emb_c];
let result = quantize_batch(&embeddings, QuantScheme::QInt8);

println!("count: {}", result.results.len());
println!("avg compression: {:.1}x", result.stats.avg_compression); // stats field illustrative

for q in result.results {
    // ...store or transmit each quantized vector
}
```
For detailed benchmarks, see PERFORMANCE.md.
## Quantization Schemes

### FP16Passthrough: Near-Lossless (2x)

- Compression: 2x (half-precision float)
- Accuracy: ~99.9% (near-lossless)
- Use Case: High accuracy with moderate space savings
- Status: ✅ Production Ready
- Note: Hardware-accelerated on modern GPUs/CPUs (see the sketch below)
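FP16 passthrough is just a width conversion, so no scale metadata is needed. A minimal sketch using the `half` crate, purely as an illustration (not necessarily what lnmp-quant uses internally):

```rust
use half::f16; // `half` crate, used here for illustration only

// 2x compression: store each f32 component as an IEEE 754 half.
let original: Vec<f32> = vec![0.1, -0.5, 0.3, 0.8];
let compressed: Vec<f16> = original.iter().map(|&v| f16::from_f32(v)).collect();
let restored: Vec<f32> = compressed.iter().map(|h| h.to_f32()).collect();
// Each component now occupies 2 bytes instead of 4.
```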
### QInt8: High Accuracy (4x)
- Range: -128 to 127 (8-bit signed)
- Compression: 4x (F32 → Int8)
- Accuracy: ~99% cosine similarity
- Use Case: General purpose, high accuracy needed
- Status: ✅ Production Ready
### QInt4: Balanced (8x)

- Range: 0 to 15 (4-bit unsigned, nibble-packed; packing sketched below)
- Compression: 8x (2 values per byte)
- Accuracy: ~95-97% cosine similarity
- Use Case: Large-scale storage, balanced compression
- Status: ✅ Production Ready
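To make the "2 values per byte" layout concrete, here is a minimal nibble-packing sketch. The pairing order (low nibble first) and zero-padding for odd lengths are assumptions, not necessarily lnmp-quant's internal layout:

```rust
// Pack 4-bit values (0..=15) two per byte, low nibble first (assumed order).
fn pack_nibbles(values: &[u8]) -> Vec<u8> {
    values
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F; // pad odd lengths with 0
            lo | (hi << 4)
        })
        .collect()
}

// Unpack back to one byte per value, dropping any padding nibble.
fn unpack_nibbles(packed: &[u8], len: usize) -> Vec<u8> {
    packed
        .iter()
        .flat_map(|&b| [b & 0x0F, b >> 4])
        .take(len)
        .collect()
}
```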
### Binary: Maximum Compression (32x)

- Range: {0, 1} (1-bit sign-based)
- Compression: 32x (8 values per byte)
- Accuracy: ~85-90% similarity preservation
- Use Case: Similarity search, ANN indexing, maximum compression
- Status: ✅ Production Ready
- Note: Dequantizes to normalized +1/-1 values (see the sketch below)
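A minimal sketch of the sign-based scheme: each component contributes one bit (set for non-negative values, by assumption), and dequantization maps bits back to ±1.0. Bit ordering within a byte is also an assumption:

```rust
// One bit per component: bit set means the value was >= 0 (assumed convention).
fn binary_quantize(values: &[f32]) -> Vec<u8> {
    let mut packed = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v >= 0.0 {
            packed[i / 8] |= 1 << (i % 8);
        }
    }
    packed
}

// Dequantize to normalized +1/-1 values, as described above.
fn binary_dequantize(packed: &[u8], len: usize) -> Vec<f32> {
    (0..len)
        .map(|i| if (packed[i / 8] >> (i % 8)) & 1 == 1 { 1.0 } else { -1.0 })
        .collect()
}
```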
## How It Works

### Quantization Algorithm (QInt8)

- Min/Max Calculation: find the value range `[min_val, max_val]`
- Scale Computation: `scale = (max_val - min_val) / 255`
- Normalization: `normalized = (value - min_val) / scale`
- Quantization: `quantized = int8(normalized - 128)`
- Storage: pack into a byte vector with metadata

### Dequantization

- Unpack: read quantized bytes as `i8` values
- Reconstruction: `value = (quantized + 128) * scale + min_val`
- Return: F32 vector with approximate values
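The same arithmetic as standalone Rust, as a minimal sketch of the steps above rather than the crate's internals (which also store the scale/min_val metadata alongside the bytes):

```rust
// QInt8 round trip following the algorithm above.
// Assumes max_val > min_val; a real implementation must guard
// against constant vectors (scale == 0).
fn qint8_quantize(values: &[f32]) -> (Vec<i8>, f32, f32) {
    let min_val = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max_val = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max_val - min_val) / 255.0;
    let bytes = values
        .iter()
        .map(|&v| ((v - min_val) / scale - 128.0).round() as i8)
        .collect();
    (bytes, scale, min_val)
}

fn qint8_dequantize(bytes: &[i8], scale: f32, min_val: f32) -> Vec<f32> {
    bytes
        .iter()
        .map(|&q| (q as f32 + 128.0) * scale + min_val)
        .collect()
}
```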
## Use Cases
### 🤖 Robot Control

```rust
// Brake-sensitivity embedding quantized for microsecond transfer
let brake_embedding = Vector::from_f32(brake_features);
let quantized = quantize_embedding(&brake_embedding, QuantScheme::QInt8)?;

// Send over a low-latency channel
send_to_controller(&quantized);
```
### 🧠 Multi-Agent Systems

```rust
// 30 agents sharing an embedding pool with minimal bandwidth
for agent in agents {
    agent.send(&quantized)?; // loop body illustrative
}
```
### 🌐 Edge AI

```rust
// Low bandwidth, high intelligence
let edge_embedding = get_local_embedding();
let quantized = quantize_embedding(&edge_embedding, QuantScheme::QInt8)?;

// 4x smaller payload for network transfer
send_to_cloud(&quantized);
```
## API Reference

### Main Functions

#### `quantize_embedding`

Quantizes an F32 embedding vector using the specified scheme.

#### `dequantize_embedding`

Dequantizes back to an approximate F32 representation.

### Types

- `QuantizedVector`
- `QuantScheme`
- `QuantError`
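Written out as a hypothetical sketch, the usage examples above imply signatures along these lines; the generated rustdoc is authoritative:

```rust
// Hypothetical signatures inferred from the usage examples;
// not verified against the crate source.
pub fn quantize_embedding(
    embedding: &Vector,
    scheme: QuantScheme,
) -> Result<QuantizedVector, QuantError> {
    unimplemented!()
}

pub fn dequantize_embedding(
    quantized: &QuantizedVector,
) -> Result<Vector, QuantError> {
    unimplemented!()
}
```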
## Performance

Benchmarks on standard hardware (512-dimensional embeddings):
### QInt8 (Optimized)

```text
quantize_512dim     time: [1.17 µs]
dequantize_512dim   time: [457 ns]
roundtrip_512dim    time: [1.63 µs]
accuracy (cosine):  >0.99
```
### QInt4 (Nibble-Packed)

```text
quantize_512dim     time: [~600 ns]
dequantize_512dim   time: [~230 ns]
compression ratio:  8.0x
accuracy (cosine):  >0.95
```
### Binary (Bit-Packed)

```text
quantize_512dim       time: [~200 ns]
dequantize_512dim     time: [~100 ns]
compression ratio:    32.0x
accuracy (similarity): >0.85
```
Run benchmarks:
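The suite uses Criterion (see Roadmap), so the standard Cargo invocation applies:

```bash
cargo bench
```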
## Examples

See the `examples/` directory:

- `quant_basic.rs` - Basic quantization/dequantization
- `lnmp_integration.rs` - Integration with LNMP records
- `quant_debug.rs` - Debugging quantization behavior
Run an example:
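Assuming the standard Cargo example layout, with names taken from the list above:

```bash
cargo run --example quant_basic
```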
## Testing

```bash
# Run all tests
cargo test

# Run roundtrip tests only (filter names assumed from the test suite)
cargo test roundtrip

# Run accuracy tests
cargo test accuracy
```
## Roadmap

### Completed ✅

- QInt8 quantization (4x compression)
- QInt4 packed quantization (8x compression)
- Binary (1-bit) quantization (32x compression)
- FP16 passthrough (2x, near-lossless)
- Adaptive quantization (auto-select scheme)
- Batch quantization APIs
- LNMP TypeHint integration (`:qv`)
- Comprehensive test suite (32 tests)
- Benchmark suite with Criterion
- Sub-microsecond quantization performance
- Codec integration (text & binary)
### Future Enhancements

- SIMD optimization (AVX2/NEON)
- GPU-accelerated quantization
## Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Related Crates

- `lnmp-core` - Core LNMP type definitions
- `lnmp-embedding` - Vector embedding support with delta encoding
- `lnmp-codec` - Binary codec for LNMP protocol