pub fn resolve_codec(
    input: &[u64],
    spec: CodecSpec,
) -> Result<Encoding, IntVecError>
Resolves a user-provided CodecSpec into a concrete Encoding variant.
This function is the core of the codec selection mechanism. It translates the user’s high-level request (the spec) into a fully-parameterized, concrete Encoding that can be used for compression.
If the spec includes requests for automatic parameter selection (e.g., CodecSpec::Auto or variants with None parameters), this function analyzes the provided input data slice to determine the optimal settings.
§Arguments
- input: The data slice used to determine optimal parameters for automatic selection. This is ignored for specs with fully-fixed parameters.
- spec: The CodecSpec indicating the desired codec and parameter settings.
§Returns
A Result containing the concrete Encoding variant, or an IntVecError::InvalidParameters if the configuration is invalid.
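The resolution pattern described above can be illustrated with a minimal sketch. The types Spec, Concrete, and ResolveError below are hypothetical stand-ins rather than the crate's actual CodecSpec, Encoding, and IntVecError, and the 1..=64 validity rule for num_bits is an assumption made only to show the error path.

```rust
/// Hypothetical stand-in for a user-facing spec with optional parameters.
#[derive(Debug, PartialEq)]
enum Spec {
    FixedLength { num_bits: Option<u32> },
    ExpGolomb { k: Option<u32> },
}

/// Hypothetical stand-in for a fully-parameterized encoding.
#[derive(Debug, PartialEq)]
enum Concrete {
    FixedLength { num_bits: u32 },
    ExpGolomb { k: u32 },
}

/// Hypothetical stand-in for the error type.
#[derive(Debug, PartialEq)]
enum ResolveError {
    InvalidParameters(String),
}

/// Turns a spec with optional parameters into a concrete encoding,
/// consulting `input` only when a parameter was left as `None`.
fn resolve(input: &[u64], spec: Spec) -> Result<Concrete, ResolveError> {
    match spec {
        // Fully-fixed parameter: validate it and ignore `input`.
        Spec::FixedLength { num_bits: Some(n) } => {
            // Assumed validity rule (not taken from the crate): 1..=64 bits.
            if n == 0 || n > 64 {
                return Err(ResolveError::InvalidParameters(format!(
                    "num_bits must be in 1..=64, got {n}"
                )));
            }
            Ok(Concrete::FixedLength { num_bits: n })
        }
        // Automatic parameter: full scan for the maximum, then its bit width.
        Spec::FixedLength { num_bits: None } => {
            let max = input.iter().copied().max().unwrap_or(0);
            // Assumption: an empty or all-zero slice still gets 1 bit.
            let n = (u64::BITS - max.leading_zeros()).max(1);
            Ok(Concrete::FixedLength { num_bits: n })
        }
        // Missing `k` falls back to the documented default of 2.
        Spec::ExpGolomb { k } => Ok(Concrete::ExpGolomb { k: k.unwrap_or(2) }),
    }
}
```

In this sketch, resolving Spec::FixedLength { num_bits: None } against the values 1, 2, and 1000 produces a width of 10 bits, since 1000 lies between 2^9 and 2^10.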
§Heuristics and Justification for Automatic Selection
When a CodecSpec variant with None parameters or CodecSpec::Auto is provided, this function uses data-driven heuristics.
- CodecSpec::FixedLength { num_bits: None }: The function scans the entire input slice to find the maximum value, then calculates the minimum number of bits required to represent it. A full scan is necessary here to guarantee correctness, because a sample could miss the largest value.
- CodecSpec::Rice { log2_b: None } / CodecSpec::Golomb { b: None }: These codes are optimal for geometrically distributed data. The function computes the average of the input data to estimate the optimal parameter.
- CodecSpec::Zeta { k: None }, Pi { k: None }, ExpGolomb { k: None }: These fall back to reasonable default parameters (k=3, k=3, and k=2, respectively).
- CodecSpec::Auto: This triggers the most sophisticated heuristic. It uses a dynamic sampling strategy to balance analysis speed and accuracy:
  - For small inputs (<= 10,000 elements), it analyzes the entire dataset. This yields an exact statistical profile at negligible cost.
  - For larger inputs, it takes a uniform sample of ~10,000 elements by selecting values at regular intervals across the entire input slice. This keeps the sample representative of the whole slice while bounding the analysis cost regardless of input size.
  - Based on this analysis, it uses the CodesStats utility to select the variable-length code predicted to be the most space-efficient.
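Two of the heuristics above can be sketched in a few lines. This is an illustration under stated assumptions, not the crate's implementation: the function names are hypothetical, the rule floor(log2(mean)) is one common way to derive a Rice parameter from an average (the text above only says that the average is used), and the index formula is one possible way to pick values at regular intervals.

```rust
/// Estimates a Rice parameter from the mean of the data.
/// Assumed rule: log2_b = floor(log2(mean)), with 0 for an empty
/// slice or a mean below 1.
fn estimate_rice_log2_b(input: &[u64]) -> u32 {
    if input.is_empty() {
        return 0;
    }
    // Accumulate in u128 so the sum of u64 values cannot overflow.
    let sum: u128 = input.iter().map(|&v| v as u128).sum();
    let mean = (sum / input.len() as u128) as u64;
    if mean == 0 {
        0
    } else {
        // Position of the highest set bit, i.e. floor(log2(mean)).
        63 - mean.leading_zeros()
    }
}

/// Returns the whole slice when it has at most `target` elements,
/// otherwise exactly `target` values taken at regular intervals
/// spanning the entire slice.
fn uniform_sample(input: &[u64], target: usize) -> Vec<u64> {
    if target == 0 {
        return Vec::new();
    }
    if input.len() <= target {
        return input.to_vec();
    }
    // i < target implies i * len / target < len, so indexing is in bounds.
    (0..target)
        .map(|i| input[i * input.len() / target])
        .collect()
}
```

With a target of 10,000, a slice of 25,000 elements is sampled at indices 0, 2, 5, 7, and so on up to index 24,997, so the sample covers the tail of the slice rather than stopping early as a fixed integer stride would.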