Scalar Int8 Quantization for Embedding Retrieval
Implements the scalar int8 rescoring retriever specification with:
- 4x memory reduction (f32 -> i8)
- 99% accuracy retention with rescoring
- 3.66x speedup via SIMD acceleration
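The memory and accuracy trade-off above can be sketched as symmetric int8 scalar quantization: each f32 vector is mapped to i8 with a per-vector scale, and dot products run in integer arithmetic before being rescaled. This is a minimal illustrative sketch, not this crate's API; the function names `quantize` and `dot_i8` are assumptions.

```rust
// Illustrative sketch of symmetric int8 scalar quantization.
// Names (`quantize`, `dot_i8`) are hypothetical, not this crate's API.
fn quantize(v: &[f32]) -> (Vec<i8>, f32) {
    // Scale so the largest magnitude maps to 127.
    let max_abs = v.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q = v
        .iter()
        .map(|x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

// Integer dot product; the f32 result is recovered by multiplying
// with both vectors' scales.
fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(x, y)| *x as i32 * *y as i32).sum()
}

fn main() {
    let a = [0.1f32, -0.5, 0.9];
    let b = [0.2f32, 0.4, -0.3];
    let (qa, sa) = quantize(&a);
    let (qb, sb) = quantize(&b);
    // Approximate f32 dot product from the int8 dot product.
    let approx = dot_i8(&qa, &qb) as f32 * sa * sb;
    let exact: f32 = a.iter().zip(&b).map(|(x, y)| x * y).sum();
    // Quantization error stays small, which is why a cheap int8
    // first pass plus f32 rescoring retains accuracy.
    assert!((approx - exact).abs() < 0.01);
    println!("exact={exact:.4} approx={approx:.4}");
}
```

Storing `Vec<i8>` plus one f32 scale per vector is what yields the roughly 4x memory reduction relative to `Vec<f32>`.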
References
- Jacob et al. (2018) - Quantization and Training of Neural Networks
- Gholami et al. (2022) - Survey of Quantization Methods
- Wu et al. (2020) - Integer Quantization Principles
Toyota Way Principles
- Jidoka: Auto-stop on quantization error > threshold
- Poka-Yoke: Type-safe precision levels, compile-time checks
- Heijunka: Batched rescoring with backpressure
- Kaizen: Continuous calibration improvement
- Genchi Genbutsu: Hardware-specific benchmarks
- Muda: 4x memory reduction via quantization
Structs
- CalibrationStats - Calibration statistics for quantization
- QuantizationParams - Quantization parameters for int8 scalar quantization
- QuantizedEmbedding - Int8 quantized embedding with metadata
- RescoreResult - Result from rescoring retrieval
- RescoreRetriever - Two-stage rescoring retriever
- RescoreRetrieverConfig - Two-stage rescoring retriever configuration
Enums
- QuantizationError - Error types for quantization operations (Jidoka halt conditions)
- SimdBackend - SIMD backend selection (Jidoka auto-detection)
Functions
- compute_hash - Compute content hash for integrity verification (Poka-Yoke)
- dot_i8_scalar - Scalar dot product fallback
- validate_embedding - Validate an embedding for common error conditions (Poka-Yoke)
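The two-stage flow behind the rescoring retriever can be sketched as follows: stage 1 scores every candidate with cheap int8 dot products, then stage 2 rescores only a shortlist at full f32 precision. This is a hedged sketch under assumed names (`rescore_topk`, `dot_i8`, `dot_f32`); the crate's actual `RescoreRetriever` API may differ.

```rust
// Hypothetical two-stage rescoring sketch; not this crate's API.
fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(x, y)| *x as i32 * *y as i32).sum()
}

fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn rescore_topk(
    query_q: &[i8],
    query_f: &[f32],
    corpus_q: &[Vec<i8>],
    corpus_f: &[Vec<f32>],
    shortlist: usize,
    k: usize,
) -> Vec<(usize, f32)> {
    // Stage 1: cheap int8 scores over the whole corpus.
    let mut coarse: Vec<(usize, i32)> = corpus_q
        .iter()
        .enumerate()
        .map(|(i, d)| (i, dot_i8(d, query_q)))
        .collect();
    coarse.sort_by_key(|&(_, s)| std::cmp::Reverse(s));
    // Stage 2: exact f32 rescoring of the shortlist only.
    let mut fine: Vec<(usize, f32)> = coarse
        .iter()
        .take(shortlist)
        .map(|&(i, _)| (i, dot_f32(&corpus_f[i], query_f)))
        .collect();
    fine.sort_by(|a, b| b.1.total_cmp(&a.1));
    fine.truncate(k);
    fine
}

fn main() {
    // Three toy documents; the query is closest to doc 1.
    let corpus_f = vec![vec![1.0f32, 0.0], vec![0.6, 0.8], vec![0.0, 1.0]];
    // Pretend-quantized copies (per-vector scale omitted; only the
    // ranking matters for the first stage).
    let corpus_q: Vec<Vec<i8>> = corpus_f
        .iter()
        .map(|d| d.iter().map(|x| (x * 127.0).round() as i8).collect())
        .collect();
    let query_f = [0.7f32, 0.7];
    let query_q = [89i8, 89];
    let top = rescore_topk(&query_q, &query_f, &corpus_q, &corpus_f, 2, 1);
    assert_eq!(top[0].0, 1);
}
```

Only `shortlist` documents ever touch f32 arithmetic, which is how the design combines the int8 speed/memory win with near-full-precision accuracy.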