Function resolve_codec

pub fn resolve_codec(
    input: &[u64],
    spec: CodecSpec,
) -> Result<Encoding, IntVecError>

Resolves a user-provided CodecSpec into a concrete Encoding variant.

This function is the core of the codec selection mechanism. It translates the user’s high-level request (the spec) into a fully-parameterized, concrete Encoding that can be used for compression.

If the spec includes requests for automatic parameter selection (e.g., CodecSpec::Auto or variants with None parameters), this function analyzes the provided input data slice to determine the optimal settings.

§Arguments

input: The data slice analyzed when parameters must be selected automatically. It is ignored for specs whose parameters are all fixed.
spec: The CodecSpec indicating the desired codec and parameter settings.

§Returns

A Result containing the concrete Encoding variant or an IntVecError::InvalidParameters if the configuration is invalid.
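
The contract described above can be sketched with stand-in types. This is only an illustration: the enums below are simplified mock-ups with a single variant each, the `u32` field type and the `String` payload of `InvalidParameters` are assumptions, the `1..=64` validation rule is hypothetical, and `resolve_codec_sketch` is not the crate's function.

```rust
// Stand-in types for illustration only. The real CodecSpec, Encoding, and
// IntVecError have more variants and may use different field types.
#[derive(Debug, PartialEq)]
enum CodecSpec {
    FixedLength { num_bits: Option<u32> },
}

#[derive(Debug, PartialEq)]
enum Encoding {
    FixedLength { num_bits: u32 },
}

#[derive(Debug, PartialEq)]
enum IntVecError {
    // Assumed payload: a human-readable message.
    InvalidParameters(String),
}

// Hypothetical sketch that mirrors the documented signature.
fn resolve_codec_sketch(input: &[u64], spec: CodecSpec) -> Result<Encoding, IntVecError> {
    match spec {
        // Fully fixed parameter: `input` is not consulted.
        CodecSpec::FixedLength { num_bits: Some(n) } => {
            // Hypothetical validation rule: a u64 needs between 1 and 64 bits.
            if n == 0 || n > 64 {
                return Err(IntVecError::InvalidParameters(format!(
                    "num_bits must be in 1..=64, got {n}"
                )));
            }
            Ok(Encoding::FixedLength { num_bits: n })
        }
        // Automatic parameter: derive the width from the largest value.
        CodecSpec::FixedLength { num_bits: None } => {
            let max = input.iter().copied().max().unwrap_or(0);
            // Bits needed for `max`; at least 1 so that zero is representable.
            let num_bits = (u64::BITS - max.leading_zeros()).max(1);
            Ok(Encoding::FixedLength { num_bits })
        }
    }
}
```

A caller would typically match on the returned Result, using the Encoding on success and reporting the InvalidParameters case otherwise.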

§Heuristics and Justification for Automatic Selection

When a CodecSpec variant with None parameters or CodecSpec::Auto is provided, this function uses data-driven heuristics.

CodecSpec::FixedLength { num_bits: None }: The function scans the entire input slice to find the maximum value, then calculates the minimum number of bits required to represent it. A full scan is necessary to guarantee correctness: a sample could miss the maximum, and the resulting width would be too small to hold it.
CodecSpec::Rice { log2_b: None } / CodecSpec::Golomb { b: None }: These codes are optimal for geometrically distributed data. This function computes the average of the input data to estimate the optimal parameter.
CodecSpec::Zeta { k: None } / CodecSpec::Pi { k: None } / CodecSpec::ExpGolomb { k: None }: No data analysis is performed for these. They fall back to fixed default parameters (k=3, k=3, and k=2 respectively).
CodecSpec::Auto: This triggers the most sophisticated heuristic. It uses a dynamic sampling strategy to balance analysis speed and accuracy:
1. For small inputs (<= 10,000 elements), it analyzes the entire dataset. The statistics are then exact rather than estimated, and at this size the full pass is cheap.
2. For larger inputs, it takes a uniform sample of ~10,000 elements by selecting values at regular intervals across the entire input slice. The sample covers the whole slice rather than only its beginning, and the cost of the analysis is bounded by the sample size, regardless of input size.
3. Based on this analysis, it uses the CodesStats utility to select the variable-length code predicted to be the most space-efficient.
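
Two of the heuristics above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the function names are hypothetical, the `floor(log2(mean))` rule is a common approximation for the Rice parameter that the text does not confirm, and the final CodesStats step is omitted because its API is not described here.

```rust
/// Hypothetical Rice parameter estimate from the mean of the data.
/// Assumption: log2_b is taken as floor(log2(mean)), a common rule of thumb.
fn rice_log2_b_estimate(input: &[u64]) -> u32 {
    if input.is_empty() {
        return 0;
    }
    // Accumulate in u128 so the sum of many u64 values cannot overflow.
    let sum: u128 = input.iter().map(|&v| v as u128).sum();
    let mean = (sum / input.len() as u128) as u64;
    if mean == 0 {
        0
    } else {
        // floor(log2(mean)) for a non-zero u64.
        63 - mean.leading_zeros()
    }
}

/// Hypothetical uniform sampler: returns the whole slice when it has at most
/// `target` elements, otherwise `target` values taken at regular intervals
/// spanning the entire slice.
fn uniform_sample(input: &[u64], target: usize) -> Vec<u64> {
    if target == 0 || input.len() <= target {
        // Small input (or no usable target): analyze everything.
        return input.to_vec();
    }
    let len = input.len();
    // Index i * len / target is always < len because i < target.
    (0..target).map(|i| input[i * len / target]).collect()
}
```

Computing each index as `i * len / target` instead of stepping by a fixed stride keeps the sample spread over the full slice even when the length is not a multiple of the sample size.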
