
Module quantization


Scalar Int8 Quantization for Embedding Retrieval

Implements the scalar int8 rescoring retriever specification with:

  • 4x memory reduction (f32 -> i8)
  • 99% accuracy retention with rescoring
  • 3.66x speedup via SIMD acceleration
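The memory reduction comes from symmetric scalar quantization: one f32 scale per vector maps each 4-byte component to a single signed byte. A minimal sketch of the idea (function names here are illustrative, not this module's actual API):

```rust
/// Quantize an f32 embedding to i8 with a single symmetric scale chosen
/// so the largest-magnitude component maps to 127. (Illustrative sketch.)
fn quantize(embedding: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = embedding.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q = embedding
        .iter()
        // Round to the nearest integer code and clamp into the i8 range.
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Invert the mapping: each code times the scale approximates the original.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

Each component shrinks from 4 bytes to 1 (the 4x reduction), at the cost of a round-trip error of at most half a quantization step, which the rescoring stage then compensates for.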

§References

  • Jacob et al. (2018) - Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
  • Gholami et al. (2022) - A Survey of Quantization Methods for Efficient Neural Network Inference
  • Wu et al. (2020) - Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

§Toyota Way Principles

  • Jidoka: Auto-stop on quantization error > threshold
  • Poka-Yoke: Type-safe precision levels, compile-time checks
  • Heijunka: Batched rescoring with backpressure
  • Kaizen: Continuous calibration improvement
  • Genchi Genbutsu: Hardware-specific benchmarks
  • Muda: 4x memory reduction via quantization
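The Jidoka principle above means quantization halts with an error rather than silently degrading retrieval quality. A hedged sketch of that kind of check, assuming a mean-absolute-error criterion (the module's actual error variants and threshold logic may differ):

```rust
/// Illustrative error type for a Jidoka-style halt condition.
#[derive(Debug, PartialEq)]
enum QuantError {
    /// Mean absolute round-trip error exceeded the configured threshold.
    ErrorAboveThreshold { mae: f32, threshold: f32 },
}

/// Compare an embedding to its quantize-dequantize round trip and stop
/// (return Err) when the mean absolute error is above `threshold`.
fn check_roundtrip(original: &[f32], restored: &[f32], threshold: f32) -> Result<(), QuantError> {
    let mae = original
        .iter()
        .zip(restored)
        .map(|(a, b)| (a - b).abs())
        .sum::<f32>()
        / original.len() as f32;
    if mae > threshold {
        Err(QuantError::ErrorAboveThreshold { mae, threshold })
    } else {
        Ok(())
    }
}
```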

Structs§

CalibrationStats
Calibration statistics for quantization
QuantizationParams
Quantization parameters for int8 scalar quantization
QuantizedEmbedding
Int8 quantized embedding with metadata
RescoreResult
Result from rescoring retrieval
RescoreRetriever
Two-stage rescoring retriever
RescoreRetrieverConfig
Two-stage rescoring retriever configuration
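The two-stage retriever ranks all candidates with cheap int8 dot products, then rescores only the leaders at full f32 precision, which is how most of the accuracy is retained. A self-contained sketch of that flow, assuming plain dot-product similarity (function names and signatures are illustrative, not the types listed above):

```rust
/// Stage-1 score: integer dot product over quantized codes.
fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

/// Stage-2 score: exact f32 dot product.
fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Rank every document by quantized score, then rescore the top
/// `rescore_k` with f32 similarity and return their indices, best first.
fn rescore_retrieve(
    query_q: &[i8],
    query_f: &[f32],
    docs_q: &[Vec<i8>],
    docs_f: &[Vec<f32>],
    rescore_k: usize,
) -> Vec<usize> {
    let mut coarse: Vec<(usize, i32)> = docs_q
        .iter()
        .enumerate()
        .map(|(i, d)| (i, dot_i8(query_q, d)))
        .collect();
    coarse.sort_by_key(|&(_, s)| std::cmp::Reverse(s));
    let mut exact: Vec<(usize, f32)> = coarse
        .into_iter()
        .take(rescore_k)
        .map(|(i, _)| (i, dot_f32(query_f, &docs_f[i])))
        .collect();
    exact.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    exact.into_iter().map(|(i, _)| i).collect()
}
```

The design point: stage 1 touches every document but only 1-byte codes; stage 2 touches f32 data for a small, fixed number of leaders, so total cost stays close to the quantized scan.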

Enums§

QuantizationError
Error types for quantization operations (Jidoka halt conditions)
SimdBackend
SIMD backend selection (Jidoka auto-detection)

Functions§

compute_hash
Compute content hash for integrity verification (Poka-Yoke)
dot_i8_scalar
Scalar dot product fallback
validate_embedding
Validate an embedding for common error conditions (Poka-Yoke)
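The Poka-Yoke validation rejects inputs that would corrupt quantization downstream. A minimal sketch of the kind of checks such a function performs (the module's actual checks and error type may differ; this version returns a plain `String` error):

```rust
/// Illustrative embedding validation: reject shapes of input that make
/// scalar quantization meaningless or numerically unsafe.
fn validate(embedding: &[f32]) -> Result<(), String> {
    if embedding.is_empty() {
        return Err("empty embedding".to_string());
    }
    // NaN or infinity would poison the max-abs scale computation.
    if embedding.iter().any(|x| !x.is_finite()) {
        return Err("non-finite component (NaN or infinity)".to_string());
    }
    // An all-zero vector has no magnitude to scale against.
    if embedding.iter().all(|&x| x == 0.0) {
        return Err("zero vector".to_string());
    }
    Ok(())
}
```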