pub trait Scheme: Debug + Send + Sync {
    // Required methods
    fn scheme_name(&self) -> &'static str;
    fn matches(&self, canonical: &Canonical) -> bool;
    fn expected_compression_ratio(
        &self,
        _data: &mut ArrayAndStats,
        _ctx: CompressorContext,
    ) -> CompressionEstimate;
    fn compress(
        &self,
        compressor: &CascadingCompressor,
        data: &mut ArrayAndStats,
        ctx: CompressorContext,
    ) -> VortexResult<ArrayRef>;

    // Provided methods
    fn stats_options(&self) -> GenerateStatsOptions { ... }
    fn num_children(&self) -> usize { ... }
    fn descendant_exclusions(&self) -> Vec<DescendantExclusion> { ... }
    fn ancestor_exclusions(&self) -> Vec<AncestorExclusion> { ... }
}
A single compression encoding that the CascadingCompressor can select from.
The compressor evaluates every registered scheme whose matches returns true for a given
array, picks the one with the highest expected_compression_ratio, and calls compress on
the winner.
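The selection rule described above can be sketched with a simplified stand-in. Everything in this block is hypothetical and is not the crate's API: `ToyScheme`, `ToyConstant`, `ToyBitPacking`, and `pick_best` are illustration names, the input is a plain `&[i64]` rather than a canonical array, and the estimate is a bare `f64` rather than a `CompressionEstimate`.

```rust
/// Hypothetical, simplified stand-in for `Scheme`; not the crate's real trait.
trait ToyScheme {
    fn scheme_name(&self) -> &'static str;
    fn matches(&self, data: &[i64]) -> bool;
    /// Simplified: a plain ratio instead of a `CompressionEstimate`.
    fn expected_compression_ratio(&self, data: &[i64]) -> f64;
}

struct ToyConstant;
struct ToyBitPacking;

impl ToyScheme for ToyConstant {
    fn scheme_name(&self) -> &'static str {
        "toy.constant"
    }
    fn matches(&self, data: &[i64]) -> bool {
        !data.is_empty()
    }
    fn expected_compression_ratio(&self, data: &[i64]) -> f64 {
        // One stored value replaces `len` values, but only if all are equal.
        if data.windows(2).all(|w| w[0] == w[1]) {
            data.len() as f64
        } else {
            0.0 // stands in for "do not pick me"
        }
    }
}

impl ToyScheme for ToyBitPacking {
    fn scheme_name(&self) -> &'static str {
        "toy.bitpacking"
    }
    fn matches(&self, data: &[i64]) -> bool {
        !data.is_empty() && data.iter().all(|v| *v >= 0)
    }
    fn expected_compression_ratio(&self, data: &[i64]) -> f64 {
        // Bits needed for the largest value, compared with the 64 bits of an i64.
        let max = data.iter().copied().max().unwrap_or(0) as u64;
        let bits = (64 - max.leading_zeros()).max(1);
        64.0 / f64::from(bits)
    }
}

/// Keep only schemes whose `matches` is true, then return the name of the
/// one with the highest estimated ratio (the first one wins on ties).
fn pick_best(schemes: &[Box<dyn ToyScheme>], data: &[i64]) -> Option<&'static str> {
    let mut best: Option<(&'static str, f64)> = None;
    for scheme in schemes {
        if !scheme.matches(data) {
            continue;
        }
        let ratio = scheme.expected_compression_ratio(data);
        if best.map_or(true, |(_, best_ratio)| ratio > best_ratio) {
            best = Some((scheme.scheme_name(), ratio));
        }
    }
    best.map(|(name, _)| name)
}
```

With both toy schemes registered, a long run of identical values selects `toy.constant`, while small distinct non-negative values select `toy.bitpacking`. The real compressor would then call `compress` on the winner.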
One of the key features of the compressor in this crate is that schemes may “cascade”. A
scheme’s compress can call back into the compressor via
CascadingCompressor::compress_child to compress child or transformed arrays, building up
multiple encoding layers (e.g. frame-of-reference and then bit-packing).
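To illustrate why layering helps, here is a minimal sketch of the frame-of-reference then bit-packing example on plain integers. The functions `frame_of_reference` and `packed_bit_width` are hypothetical helpers written for this illustration; they do not call `CascadingCompressor::compress_child` and are not how the crate implements either encoding.

```rust
/// Frame-of-reference: store the minimum once and keep non-negative residuals.
fn frame_of_reference(values: &[i64]) -> (i64, Vec<u64>) {
    let min = values.iter().copied().min().unwrap_or(0);
    // `wrapping_sub` followed by the cast yields the exact unsigned difference,
    // even when the value range exceeds `i64::MAX`.
    let residuals: Vec<u64> = values
        .iter()
        .map(|v| v.wrapping_sub(min) as u64)
        .collect();
    (min, residuals)
}

/// Bits per value that a bit-packed layout would need for these values.
fn packed_bit_width(values: &[u64]) -> u32 {
    let max = values.iter().copied().max().unwrap_or(0);
    64 - max.leading_zeros()
}
```

For the values 1000, 1003, 1001, 1007, bit-packing alone needs 10 bits per value, while bit-packing the residuals 0, 3, 1, 7 left by the frame-of-reference layer needs only 3.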
§Scheme IDs
Every scheme has a globally unique name returned by scheme_name. The SchemeExt::id
method (auto-implemented, cannot be overridden) wraps that name in an opaque SchemeId used
for equality, hashing, and exclusion rules (see below).
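One common Rust pattern that gives an auto-implemented, non-overridable method is an extension trait with a blanket implementation. The sketch below shows that pattern with hypothetical names (`NamedScheme`, `NamedSchemeExt`, `ToySchemeId`, `ToyDelta`); whether `SchemeExt` is built exactly this way is an assumption.

```rust
/// Hypothetical stand-in for the `scheme_name` part of `Scheme`.
trait NamedScheme {
    fn scheme_name(&self) -> &'static str;
}

/// Hypothetical opaque ID: the inner name is private to this module,
/// and the derives provide equality and hashing.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ToySchemeId(&'static str);

/// Extension trait carrying the provided `id` method.
trait NamedSchemeExt: NamedScheme {
    fn id(&self) -> ToySchemeId {
        ToySchemeId(self.scheme_name())
    }
}

// Blanket impl: every `NamedScheme` (including `dyn NamedScheme`) gets `id`.
// Because this impl already covers all such types, no type can add its own
// `NamedSchemeExt` impl, so `id` cannot be overridden.
impl<T: NamedScheme + ?Sized> NamedSchemeExt for T {}

struct ToyDelta;

impl NamedScheme for ToyDelta {
    fn scheme_name(&self) -> &'static str {
        "toy.delta"
    }
}
```

Two values with the same name produce equal IDs, so the IDs can be used as keys in a `HashSet` or `HashMap`.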
§Cascading and children
Schemes that produce child arrays for further compression must declare num_children > 0.
Each child should be identified by a stable index. Cascading schemes should use
CascadingCompressor::compress_child to compress each child array, which handles cascade
level / budget tracking and context management automatically.
No scheme may appear twice in a cascade (descendant) chain (enforced by the compressor). This keeps the search space a tree.
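The no-repeat rule can be pictured as a filter over the registered schemes at each cascade level. This is only a sketch of the rule itself: `still_eligible` and the use of plain names for the chain are hypothetical, and the compressor's actual bookkeeping is not shown in this documentation.

```rust
/// Returns the registered scheme names that may still be tried, given the
/// names already used by ancestors on the current cascade chain.
fn still_eligible<'a>(registered: &[&'a str], chain: &[&str]) -> Vec<&'a str> {
    registered
        .iter()
        .copied()
        .filter(|name| !chain.iter().any(|used| *used == *name))
        .collect()
}
```

Each level removes at least one candidate, so under this rule any chain is at most as long as the number of registered schemes.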
§Exclusion rules
Schemes declare exclusion rules to prevent incompatible scheme combinations in the cascade chain:
descendant_exclusions (push): “exclude scheme X from my child Y’s subtree.” Used when the declaring scheme knows about the excluded scheme.
ancestor_exclusions (pull): “exclude me if ancestor X’s child Y is above me.” Used when the declaring scheme knows about the ancestor.
Both directions exist because schemes live in different crates, so the dependency direction between any two schemes cannot be known ahead of time: whichever scheme can name the other declares the rule.
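A minimal sketch of how the push and pull rules could be checked against a cascade chain follows. All types here (`ToyDescendantExclusion`, `ToyAncestorExclusion`, `RuleScheme`, `ChainStep`) and the function `is_excluded` are hypothetical; the fields of the real `DescendantExclusion` and `AncestorExclusion` types may differ.

```rust
/// Push rule: "exclude scheme `excluded` from my child `child_index`'s subtree."
struct ToyDescendantExclusion {
    excluded: &'static str,
    child_index: usize,
}

/// Pull rule: "exclude me if `ancestor`'s child `child_index` is above me."
struct ToyAncestorExclusion {
    ancestor: &'static str,
    child_index: usize,
}

/// Hypothetical scheme description holding only its name and rules.
struct RuleScheme {
    name: &'static str,
    descendant_exclusions: Vec<ToyDescendantExclusion>,
    ancestor_exclusions: Vec<ToyAncestorExclusion>,
}

/// One step of the cascade chain: which scheme ran, and which of its
/// children the chain descended into.
struct ChainStep<'a> {
    scheme: &'a RuleScheme,
    child_index: usize,
}

/// True if any ancestor step makes `candidate` ineligible.
fn is_excluded(candidate: &RuleScheme, chain: &[ChainStep<'_>]) -> bool {
    chain.iter().any(|step| {
        // Push: the ancestor bans the candidate from this child's subtree.
        let pushed = step.scheme.descendant_exclusions.iter().any(|rule| {
            rule.excluded == candidate.name && rule.child_index == step.child_index
        });
        // Pull: the candidate bans itself below this ancestor's child.
        let pulled = candidate.ancestor_exclusions.iter().any(|rule| {
            rule.ancestor == step.scheme.name && rule.child_index == step.child_index
        });
        pushed || pulled
    })
}
```

Both rule kinds are keyed on the same pair (ancestor scheme, child index), so either side of a conflict can declare it and the check stays symmetric.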
§Implementing a scheme
expected_compression_ratio should return CompressionEstimate::Sample when a cheap
heuristic is not available, asking the compressor to estimate via sampling. Implementors should
return a more specific variant when possible (e.g. CompressionEstimate::AlwaysUse for
constant detection or CompressionEstimate::Skip for early rejection based on stats).
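The choice between these variants might look like the sketch below. `ToyEstimate` mirrors only the three variant names mentioned above and `ToyStats` is an invented stats holder; the real `CompressionEstimate` has at least one more variant (`Estimate`) and its variants may carry payloads.

```rust
/// Hypothetical mirror of three variant names; not the real enum.
#[derive(Debug, PartialEq, Eq)]
enum ToyEstimate {
    AlwaysUse,
    Skip,
    Sample,
}

/// Hypothetical stats: `None` means the stat was not computed.
struct ToyStats {
    distinct_count: Option<usize>,
}

/// A constant-detection style estimate driven purely by available stats.
fn toy_constant_estimate(stats: &ToyStats) -> ToyEstimate {
    match stats.distinct_count {
        // Exactly one distinct value: no other scheme can do better.
        Some(1) => ToyEstimate::AlwaysUse,
        // The stats already rule this scheme out: reject early.
        Some(_) => ToyEstimate::Skip,
        // No cheap heuristic available: let the compressor sample.
        None => ToyEstimate::Sample,
    }
}
```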
Schemes that need statistics that may be expensive to compute should override stats_options
to declare what they require. The compressor merges all eligible schemes’ options before
generating stats, so each stat is always computed at most once for a given array.
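Merging can be pictured as a field-wise OR over every eligible scheme's request. The struct `ToyStatsOptions`, its two fields, and `merged_options` are hypothetical; the actual fields of `GenerateStatsOptions` are not listed on this page.

```rust
/// Hypothetical options: each flag asks for one potentially expensive stat.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct ToyStatsOptions {
    count_distinct_values: bool,
    count_runs: bool,
}

impl ToyStatsOptions {
    /// A stat is generated if either side asked for it.
    fn merge(self, other: Self) -> Self {
        ToyStatsOptions {
            count_distinct_values: self.count_distinct_values || other.count_distinct_values,
            count_runs: self.count_runs || other.count_runs,
        }
    }
}

/// Fold all requests into the single set of options used for one stats pass.
fn merged_options(requests: &[ToyStatsOptions]) -> ToyStatsOptions {
    requests
        .iter()
        .copied()
        .fold(ToyStatsOptions::default(), ToyStatsOptions::merge)
}
```

Because the merged options are computed before stats generation, two schemes asking for the same stat cost one computation, not two.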
Required Methods§
fn scheme_name(&self) -> &'static str
The globally unique name for this scheme (e.g. "vortex.int.bitpacking").
fn matches(&self, canonical: &Canonical) -> bool
Whether this scheme can compress the given canonical array.
fn expected_compression_ratio(
    &self,
    _data: &mut ArrayAndStats,
    _ctx: CompressorContext,
) -> CompressionEstimate
Cheaply estimate the compression ratio for this scheme on the given array.
This method should be fast and infallible. Any expensive or fallible work should be deferred
to the compressor by returning CompressionEstimate::Sample or
CompressionEstimate::Estimate.
The compressor will ask all schemes what their expected compression ratio is given the array and statistics. The scheme with the highest estimated ratio will then be applied to the entire array.
Note that the compressor also calls this method when compressing samples, so a property that
holds for a sample may not hold for the entire array (e.g., a sample can be constant when the
full array is not). Implementations should check ctx.is_sample to make sure that they are
returning the correct information.
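A sample-aware check might look like the following sketch. `ToyOutcome`, `ToyContext`, and `constant_outcome` are hypothetical, and declining to pick a constant encoding on samples is an assumption made for this illustration, not documented behavior of any scheme in the crate.

```rust
/// Hypothetical two-way outcome standing in for a `CompressionEstimate`.
#[derive(Debug, PartialEq, Eq)]
enum ToyOutcome {
    AlwaysUse,
    Skip,
}

/// Hypothetical context carrying only the sample flag.
struct ToyContext {
    is_sample: bool,
}

/// Constant detection that refuses to trust a sample.
fn constant_outcome(values: &[i64], ctx: &ToyContext) -> ToyOutcome {
    let all_equal = values.windows(2).all(|w| w[0] == w[1]);
    if all_equal && !ctx.is_sample {
        ToyOutcome::AlwaysUse
    } else {
        // Either the values differ, or only a sample is visible and a
        // constant sample says nothing certain about the rest of the array.
        ToyOutcome::Skip
    }
}
```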
The compressor guarantees that empty and all-null arrays are handled before this method is called. Implementations may assume the array has at least one valid element. However, a constant scheme should still be registered with the compressor to detect single-value arrays that are not all-null.
fn compress(
    &self,
    compressor: &CascadingCompressor,
    data: &mut ArrayAndStats,
    ctx: CompressorContext,
) -> VortexResult<ArrayRef>
Provided Methods§
fn stats_options(&self) -> GenerateStatsOptions
Returns the stats generation options this scheme requires. The compressor merges all eligible schemes’ options before generating stats so that a single stats pass satisfies every scheme.
fn num_children(&self) -> usize
The number of child arrays this scheme produces when cascading. Returns 0 for leaf schemes that produce a final encoded array.
fn descendant_exclusions(&self) -> Vec<DescendantExclusion>
Schemes to exclude from specific children’s subtrees (push direction).
Each rule says: “when I cascade through child Y, do not use scheme X anywhere in that
subtree.” Only meaningful when num_children > 0.
fn ancestor_exclusions(&self) -> Vec<AncestorExclusion>
Ancestors that make this scheme ineligible (pull direction).
Each rule says: “if ancestor X cascaded through child Y somewhere above me in the chain, do not try me.”