pub struct Codebook { /* private fields */ }
Immutable lookup table mapping quantized u8 indices to f32 values.
Construction always validates three invariants:
- entries.len() == 1 << bit_width (matches CodecConfig::num_codebook_entries).
- Entries are strictly ascending under f32::total_cmp.
- All entries are distinct (no adjacent equals).
The inner buffer is an Arc<[f32]> so Clone is O(1). Equality
compares the numerical contents, not the allocation identity.
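The cloning and equality behavior described above can be sketched with a small stand-in type. `SketchCodebook`, `from_boxed`, and `shares_allocation` are hypothetical names used only for illustration; the real `Codebook` keeps its fields private and may be implemented differently.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `Codebook`: only the shared buffer is modeled.
#[derive(Clone, Debug)]
struct SketchCodebook {
    // Cloning an `Arc` bumps a reference count; the f32 data is not copied.
    entries: Arc<[f32]>,
}

impl PartialEq for SketchCodebook {
    fn eq(&self, other: &Self) -> bool {
        // Compare the numerical contents, not the allocation identity.
        self.entries[..] == other.entries[..]
    }
}

/// Move a caller-owned boxed slice into the shared buffer.
fn from_boxed(entries: Box<[f32]>) -> SketchCodebook {
    SketchCodebook { entries: Arc::from(entries) }
}

/// True when both values point at the same allocation.
fn shares_allocation(a: &SketchCodebook, b: &SketchCodebook) -> bool {
    Arc::ptr_eq(&a.entries, &b.entries)
}
```

In this sketch a clone shares its allocation with the original, while two values built separately from the same numbers live in different allocations and still compare equal.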
Implementations§
impl Codebook
pub fn new(entries: Box<[f32]>, bit_width: u8) -> Result<Self, CodecError>
Build a codebook from a caller-owned Box<[f32]>.
§Errors
- CodecError::CodebookEntryCount: entries.len() does not equal 2^bit_width.
- CodecError::CodebookNotSorted: entries are not in ascending order (adjacent equal entries are reported as CodecError::CodebookDuplicate instead).
- CodecError::CodebookDuplicate: two adjacent entries compare equal.
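As a rough illustration of the three checks, the following standalone sketch validates an entry slice in one pass. `SketchError` and `validate_entries` are hypothetical names, not part of the crate. The real `CodecError` variants may carry payloads, and treating "compare equal" as `Ordering::Equal` under `total_cmp` is an assumption.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for `CodecError`; only the three variants named
/// in the docs are modeled, without payloads.
#[derive(Debug, PartialEq)]
enum SketchError {
    CodebookEntryCount,
    CodebookNotSorted,
    CodebookDuplicate,
}

/// Check the three documented invariants on a candidate entry buffer.
fn validate_entries(entries: &[f32], bit_width: u8) -> Result<(), SketchError> {
    // Invariant 1: exactly 2^bit_width entries. `checked_shl` returns None
    // when the shift amount is too large for usize.
    let expected = 1usize.checked_shl(u32::from(bit_width));
    if expected != Some(entries.len()) {
        return Err(SketchError::CodebookEntryCount);
    }
    // Invariants 2 and 3: every adjacent pair must be strictly ascending
    // under the total order on f32.
    for pair in entries.windows(2) {
        match pair[0].total_cmp(&pair[1]) {
            Ordering::Less => {}
            // Assumption: "compare equal" means equal under total_cmp.
            Ordering::Equal => return Err(SketchError::CodebookDuplicate),
            Ordering::Greater => return Err(SketchError::CodebookNotSorted),
        }
    }
    Ok(())
}
```

Checking adjacent pairs is sufficient here because a strictly ascending sequence cannot contain equal entries anywhere, so a single pass covers both the ordering and the distinctness invariant.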
pub fn train(vectors: &[f32], config: &CodecConfig) -> Result<Self, CodecError>
Train a codebook by uniform-quantile estimation over a flattened f32 sample buffer.
Mirrors Python’s
np.quantile(flat.astype(np.float64), np.linspace(0, 1, num_entries)).astype(np.float32) exactly:
- Promote every sample to f64.
- Sort with f64::total_cmp.
- For each k in 0..num_entries, compute the linearly interpolated quantile value in f64.
- Cast to f32 (round-to-nearest-even) and enforce distinctness.
config.bit_width determines the number of entries; config.seed
and config.dimension are not consulted by this function.
§Errors
- CodecError::InsufficientTrainingData: vectors is empty or produces fewer than num_entries distinct quantile representatives.
- Any error from Codebook::new on the freshly built entries.
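The four training steps can be sketched as a standalone function. `quantile_entries` is a hypothetical helper, not the crate's implementation. It returns `None` where the real function would return an error, it rejects `num_entries < 2` as a simplifying assumption, and it makes no claim of bit-for-bit agreement with NumPy's interpolation.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: uniform-quantile estimation over a flat sample buffer.
fn quantile_entries(samples: &[f32], num_entries: usize) -> Option<Vec<f32>> {
    // Assumption: fewer than two entries is rejected to avoid dividing by zero.
    if samples.is_empty() || num_entries < 2 {
        return None;
    }
    // Step 1: promote every sample to f64.
    let mut sorted: Vec<f64> = samples.iter().map(|&x| f64::from(x)).collect();
    // Step 2: sort under the total order on f64.
    sorted.sort_by(|a, b| a.total_cmp(b));

    let last = sorted.len() - 1;
    let mut out = Vec::with_capacity(num_entries);
    for k in 0..num_entries {
        // Step 3: quantile q = k / (num_entries - 1), linearly interpolated
        // between the two neighboring order statistics.
        let q = k as f64 / (num_entries - 1) as f64;
        let pos = q * last as f64;
        let lo = pos.floor() as usize;
        let hi = (lo + 1).min(last);
        let frac = pos - lo as f64;
        let value = sorted[lo] + (sorted[hi] - sorted[lo]) * frac;
        // Step 4: an `as` cast from f64 to f32 rounds to nearest, ties to even.
        out.push(value as f32);
    }
    // Enforce distinctness: every adjacent pair must be strictly ascending.
    let strictly_ascending = out
        .windows(2)
        .all(|w| w[0].total_cmp(&w[1]) == Ordering::Less);
    if strictly_ascending { Some(out) } else { None }
}
```

A constant sample buffer illustrates the insufficient-data case: every quantile collapses to the same value, so the distinctness check fails.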
pub fn num_entries(&self) -> u32
Number of entries (2^bit_width).
pub fn quantize_into( &self, values: &[f32], indices: &mut [u8], ) -> Result<(), CodecError>
Quantize values into indices by finding the nearest entry for
each value. Ties favor the right (higher-valued) neighbor, matching
Python’s strict < tie-break.
Under feature = "simd" this delegates to
crate::codec::simd_api::quantize_into, which is the single
source of truth for dispatch selection. Without the feature,
it calls the scalar reference kernel directly.
§Errors
- CodecError::LengthMismatch: values.len() != indices.len().
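A scalar nearest-entry search with the documented tie-break might look like the sketch below. `quantize_scalar` is a hypothetical function, not the crate's reference kernel. It assumes `entries` is non-empty, strictly ascending, and holds at most 256 values, and it uses a plain string in place of `CodecError`.

```rust
/// Hypothetical scalar kernel: nearest codebook entry per value, with ties
/// going to the right (higher-valued) neighbor.
fn quantize_scalar(
    entries: &[f32],
    values: &[f32],
    indices: &mut [u8],
) -> Result<(), &'static str> {
    if values.len() != indices.len() {
        return Err("length mismatch");
    }
    for (slot, &v) in indices.iter_mut().zip(values) {
        // Index of the first entry that is >= v (entries are sorted).
        let right = entries.partition_point(|&e| e < v);
        let idx = if right == 0 {
            // v is at or below the smallest entry.
            0
        } else if right == entries.len() {
            // v is above the largest entry.
            entries.len() - 1
        } else {
            let left = right - 1;
            // Strict `<`: the left neighbor wins only when it is strictly
            // closer, so an exact tie selects the right neighbor.
            if v - entries[left] < entries[right] - v { left } else { right }
        };
        // Assumption: at most 256 entries, so the index fits in a u8.
        *slot = idx as u8;
    }
    Ok(())
}
```

With entries `[0.0, 1.0, 2.0, 3.0]`, the value `0.5` sits exactly between the first two entries and maps to index 1 in this sketch, while `0.4` maps to index 0.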
pub fn dequantize_into( &self, indices: &[u8], values: &mut [f32], ) -> Result<(), CodecError>
Dequantize indices into values by gathering the corresponding
codebook entries.
Under feature = "simd" this delegates to
crate::codec::simd_api::dequantize_into.
§Errors
- CodecError::LengthMismatch: indices.len() != values.len().
- CodecError::IndexOutOfRange: any index is >= num_entries().
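The gather step and its two error conditions can be sketched as follows. `GatherError` and `dequantize_scalar` are hypothetical names for illustration. Whether the real function validates all indices before writing any output is not stated in the docs, so this sketch may leave `values` partially written when it fails.

```rust
/// Hypothetical stand-in for the two documented `CodecError` variants.
#[derive(Debug, PartialEq)]
enum GatherError {
    LengthMismatch,
    IndexOutOfRange,
}

/// Hypothetical scalar gather: look up each index in the entry table.
fn dequantize_scalar(
    entries: &[f32],
    indices: &[u8],
    values: &mut [f32],
) -> Result<(), GatherError> {
    if indices.len() != values.len() {
        return Err(GatherError::LengthMismatch);
    }
    for (out, &i) in values.iter_mut().zip(indices) {
        // `get` returns None for an index >= entries.len(), which maps to
        // the out-of-range error instead of panicking.
        *out = *entries
            .get(usize::from(i))
            .ok_or(GatherError::IndexOutOfRange)?;
    }
    Ok(())
}
```

For a 2-bit codebook with four entries, index 4 is the smallest value that triggers the out-of-range error in this sketch.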