Function resolve_codec

pub fn resolve_codec(
    input: &[u64],
    spec: CodecSpec,
) -> Result<Encoding, IntVecError>

Resolves a user-provided CodecSpec into a concrete Encoding variant.

This function is the core of the codec selection mechanism. It translates the user’s high-level request (the spec) into a fully-parameterized, concrete Encoding that can be used for compression.

If the spec includes requests for automatic parameter selection (e.g., CodecSpec::Auto or variants with None parameters), this function analyzes the provided input data slice to determine the optimal settings.

§Arguments

  • input: The data slice analyzed to choose parameters during automatic selection. It is ignored when every parameter in the spec is already fixed.
  • spec: The CodecSpec indicating the desired codec and parameter settings.

§Returns

A Result containing the concrete Encoding variant on success, or an IntVecError::InvalidParameters error if the configuration is invalid.

§Heuristics and Justification for Automatic Selection

When a CodecSpec variant with None parameters or CodecSpec::Auto is provided, this function uses data-driven heuristics.

  • CodecSpec::FixedLength { num_bits: None }: The function scans the entire input slice to find the maximum value, then calculates the minimum number of bits required to represent it. A full scan is necessary to guarantee correctness: a sample could miss the true maximum, and any value wider than the chosen bit width could not be stored.

  • CodecSpec::Rice { log2_b: None } / CodecSpec::Golomb { b: None }: These codes are optimal for geometrically distributed data. This function computes the average of the input data to estimate the optimal parameter.

  • CodecSpec::Zeta { k: None }, Pi { k: None }, ExpGolomb { k: None }: These do not inspect the input; they fall back to reasonable default parameters (k=3, k=3, and k=2 respectively).

  • CodecSpec::Auto: This triggers the most sophisticated heuristic. It uses a dynamic sampling strategy to balance analysis speed and accuracy:

    1. For small inputs (<= 10,000 elements), it analyzes the entire dataset. This gives exact statistics for the input, so the codec choice does not depend on sampling, and the cost of the analysis is small at this size.
    2. For larger inputs, it takes a uniform sample of ~10,000 elements by selecting values at regular intervals across the entire input slice. This yields a sample that covers the whole input while keeping the cost of the analysis roughly constant, regardless of input size.
    3. Based on this analysis, it uses the CodesStats utility to select the variable-length code predicted to be the most space-efficient.
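The FixedLength heuristic described above (a full scan for the maximum, then the minimum bit width) can be sketched as follows. This is an illustrative sketch only, not the crate's code: the function name min_bits is hypothetical, and treating an empty or all-zero input as needing 1 bit is an assumption.

```rust
/// Hypothetical helper: minimum number of bits needed to represent
/// every value in `input`.
/// Assumption: an empty or all-zero slice is given a width of 1 bit.
fn min_bits(input: &[u64]) -> u32 {
    // Full scan: a sampled maximum could underestimate the true one.
    let max = input.iter().copied().max().unwrap_or(0);
    // Position of the highest set bit, counted from 1.
    (u64::BITS - max.leading_zeros()).max(1)
}
```

Under these assumptions, a slice whose largest value is 255 needs 8 bits, and one containing 256 needs 9.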
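For the Rice and Golomb case, the description says only that the average of the input is used to estimate the parameter; the exact formula is not given. The sketch below uses one common rule of thumb for geometrically distributed data, b ≈ mean · ln 2, and derives the Rice parameter as floor(log2(b)). Both function names and the choice of formula are assumptions made for illustration.

```rust
/// Hypothetical estimate of the Golomb modulus `b` from the mean,
/// using the rule of thumb b ≈ mean * ln(2). The result is at least 1.
fn estimate_golomb_b(input: &[u64]) -> u64 {
    if input.is_empty() {
        return 1;
    }
    // Sum in u128 so that large u64 values cannot overflow.
    let sum: u128 = input.iter().map(|&v| v as u128).sum();
    let mean = sum as f64 / input.len() as f64;
    ((mean * std::f64::consts::LN_2).round() as u64).max(1)
}

/// Hypothetical estimate of the Rice parameter: floor(log2(b)).
fn estimate_rice_log2_b(input: &[u64]) -> u32 {
    let b = estimate_golomb_b(input); // always >= 1
    u64::BITS - 1 - b.leading_zeros()
}
```

With this rule, an input whose mean is 10 gives b = 7 and a Rice parameter of 2.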
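Steps 1 and 2 of the Auto strategy can be sketched as a single sampling function. This is a minimal sketch under stated assumptions: the names SAMPLE_TARGET and sample_for_analysis are hypothetical, and the index formula is one way to pick values at regular intervals, not necessarily the one the crate uses. Step 3 (the CodesStats analysis) is not shown.

```rust
/// Hypothetical constant mirroring the 10,000-element threshold above.
const SAMPLE_TARGET: usize = 10_000;

/// Returns the values to analyze: the whole input when it is small,
/// otherwise SAMPLE_TARGET values taken at regular intervals.
fn sample_for_analysis(input: &[u64]) -> Vec<u64> {
    if input.len() <= SAMPLE_TARGET {
        // Small input: analyze everything.
        return input.to_vec();
    }
    (0..SAMPLE_TARGET)
        .map(|i| {
            // Evenly spaced index in [0, len); u128 avoids overflow
            // of i * len for very long slices.
            let idx = (i as u128 * input.len() as u128 / SAMPLE_TARGET as u128) as usize;
            input[idx]
        })
        .collect()
}
```

Because the sample size is capped, the later analysis does the same amount of work for a slice of one hundred thousand elements as for one of several billion.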