Codec selection for variable-length integer compression.
This module defines VariableCodecSpec, an enum for controlling the compression strategy of an IntVec. The choice of codec is a critical performance parameter, as its effectiveness depends on the statistical properties of the data being compressed.
§Codec Selection Strategy
Codec selection is performed by a statistical analysis of the entire input dataset at construction time.
The VariableCodecSpec enum provides several ways to specify the compression method:
- Explicit Specification: A specific codec and all its parameters are provided. This is suitable when the data characteristics are known in advance.
use compressed_intvec::prelude::*;

let data: &[u32] = &(0..1000).collect::<Vec<_>>();

// Explicitly specify a non-parametric codec
let delta_vec: UIntVec<u32> = IntVec::builder()
    .codec(VariableCodecSpec::Delta)
    .k(16)
    .build(&data)
    .unwrap();

// Explicitly specify a parametric codec with a fixed parameter
let zeta_vec: UIntVec<u32> = IntVec::builder()
    .codec(VariableCodecSpec::Zeta { k: Some(3) })
    .build(&data)
    .unwrap();
- Automatic Parameter Estimation: A specific codec family is chosen, but the optimal parameter is determined by the builder based on a full data analysis. This is achieved by providing None as the parameter value. For example, Rice { log2_b: None } will find the best log2_b for the given data:
use compressed_intvec::prelude::*;

let data: &[u32] = &(0..1000).collect::<Vec<_>>();

// Automatically select the best Rice parameter
let rice_vec: UIntVec<u32> = IntVec::builder()
    .codec(VariableCodecSpec::Rice { log2_b: None })
    .build(&data)
    .unwrap();
- Fully Automatic Selection: The builder analyzes the data against all available codecs and their standard parameter ranges to find the single best configuration. This is activated by using VariableCodecSpec::Auto.
use compressed_intvec::prelude::*;

let data: &[u32] = &(0..1000).collect::<Vec<_>>();

// Automatically select the best codec and parameters for the data
let auto_vec: UIntVec<u32> = IntVec::builder()
    .codec(VariableCodecSpec::Auto)
    .build(&data)
    .unwrap();
§Analysis Mechanism
The selection logic uses the CodesStats utility from the dsi-bitstream crate. For a given sequence of integers, CodesStats calculates the exact total bit cost for encoding the sequence with a wide range of instantaneous codes and their common parameterizations.
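To make the idea of "exact total bit cost" concrete, here is a minimal, standard-library-only sketch. It is not the CodesStats implementation: the Candidate enum, the helper names, and the Rice range 0..=10 are hypothetical choices for illustration, and only three code families are covered. The length formulas are the usual ones for Elias gamma, Elias delta, and Rice codes over values starting at 0.

```rust
/// Hypothetical candidate set used only in this sketch
/// (not the crate's VariableCodecSpec).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Candidate {
    Gamma,
    Delta,
    Rice { log2_b: u32 },
}

/// Bits used by the Elias gamma code of `x` (values start at 0).
/// Assumes `x < u64::MAX`; inputs here always come from `u32`.
fn gamma_len(x: u64) -> u64 {
    2 * u64::from((x + 1).ilog2()) + 1
}

/// Bits used by the Elias delta code of `x` (values start at 0).
fn delta_len(x: u64) -> u64 {
    let l = u64::from((x + 1).ilog2());
    l + gamma_len(l)
}

/// Bits used by the Rice code of `x` with divisor 2^log2_b:
/// a unary quotient plus `log2_b` remainder bits.
fn rice_len(x: u64, log2_b: u32) -> u64 {
    (x >> log2_b) + 1 + u64::from(log2_b)
}

/// Exact total cost, in bits, of encoding `data` with one candidate.
fn total_bits(data: &[u32], candidate: Candidate) -> u64 {
    data.iter()
        .map(|&x| {
            let x = u64::from(x);
            match candidate {
                Candidate::Gamma => gamma_len(x),
                Candidate::Delta => delta_len(x),
                Candidate::Rice { log2_b } => rice_len(x, log2_b),
            }
        })
        .sum()
}

/// Evaluates every candidate on the whole input and keeps the cheapest.
/// On ties, the candidate listed first wins.
fn cheapest_codec(data: &[u32]) -> Option<(Candidate, u64)> {
    let mut candidates = vec![Candidate::Gamma, Candidate::Delta];
    // Assumed parameter range for illustration only.
    candidates.extend((0..=10).map(|log2_b| Candidate::Rice { log2_b }));
    candidates
        .into_iter()
        .map(|c| (c, total_bits(data, c)))
        .min_by_key(|&(_, bits)| bits)
}
```

For the input [100, 100, 100], gamma costs 39 bits, delta 33 bits, and the cheapest Rice parameter in the scanned range is log2_b = 6 at 24 bits, so cheapest_codec returns that configuration. Each candidate requires one pass over the data, which is where the N * C term in the construction cost comes from.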
§Construction Overhead
The full-dataset analysis has a one-time computational cost at construction.
The complexity is O(N * C), where N is the number of elements in the input and C is the number of codec configurations tested by CodesStats (approximately 70).
This trade-off is suitable for read-heavy workloads where a higher initial cost is acceptable for better compression and subsequent read performance.
§Implementation Notes
- The parameter ranges for codecs like Zeta and Rice are defined by the const generics of the CodesStats struct in dsi-bitstream. The default values cover common and effective parameter ranges.
- If a data distribution benefits from a parameter outside of the tested range (e.g., Zeta with k=20), it must be specified explicitly in the builder via .codec(VariableCodecSpec::Zeta { k: Some(20) }).
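The effect of a bounded parameter scan can be illustrated with a small standalone sketch. It does not call dsi-bitstream; zeta_len follows the published definition of zeta codes (a unary block index followed by a minimal binary code) applied to x + 1, and the scanned range 1..=10 is an assumption chosen for illustration, not a statement about the crate's defaults. All function names are hypothetical.

```rust
use std::ops::RangeInclusive;

/// Bits used by the zeta_k code of `x` (values start at 0, so x + 1 is coded).
/// With n = x + 1 and h = floor(log2(n) / k), the code is h + 1 unary bits
/// followed by a minimal binary code of (h + 1) * k bits, or one bit fewer
/// when n < 2^(h * k + 1).
fn zeta_len(x: u32, k: u32) -> u64 {
    assert!(k >= 1, "zeta codes require k >= 1");
    let n = u64::from(x) + 1;
    let h = n.ilog2() / k;
    // n < 2^(h*k + 1) is equivalent to (n >> (h*k)) < 2.
    let short = (n >> (h * k)) < 2;
    let tail = (h + 1) * k - u32::from(short);
    u64::from(h + 1 + tail)
}

/// Exact total cost, in bits, of encoding `data` with zeta_k.
fn total_zeta_bits(data: &[u32], k: u32) -> u64 {
    data.iter().map(|&x| zeta_len(x, k)).sum()
}

/// Scans only the given parameter range and returns the cheapest (k, bits).
/// On ties, the smallest k wins.
fn best_zeta_k(data: &[u32], ks: RangeInclusive<u32>) -> Option<(u32, u64)> {
    ks.map(|k| (k, total_zeta_bits(data, k)))
        .min_by_key(|&(_, bits)| bits)
}
```

With 1000 values just above 2^19, a scan over 1..=10 settles on k = 10 at 22 bits per value, while k = 20 would need only 21 bits per value. A scan limited to that range never considers the cheaper setting, which is why such a parameter has to be passed explicitly.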
Enums§
- VariableCodecSpec - Specifies the compression codec and its parameters for an IntVec.