Trait Scheme

pub trait Scheme: Debug + Send + Sync {
    // Required methods
    fn scheme_name(&self) -> &'static str;
    fn matches(&self, canonical: &Canonical) -> bool;
    fn expected_compression_ratio(
        &self,
        _data: &mut ArrayAndStats,
        _ctx: CompressorContext,
    ) -> CompressionEstimate;
    fn compress(
        &self,
        compressor: &CascadingCompressor,
        data: &mut ArrayAndStats,
        ctx: CompressorContext,
    ) -> VortexResult<ArrayRef>;

    // Provided methods
    fn stats_options(&self) -> GenerateStatsOptions { ... }
    fn num_children(&self) -> usize { ... }
    fn descendant_exclusions(&self) -> Vec<DescendantExclusion> { ... }
    fn ancestor_exclusions(&self) -> Vec<AncestorExclusion> { ... }
}

A single compression encoding that the CascadingCompressor can select from.

The compressor evaluates every registered scheme whose matches returns true for a given array, picks the one with the highest expected_compression_ratio, and calls compress on the winner.
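
The selection step described above can be sketched with a toy model. Everything here (`ToyScheme`, `pick_best`, the plain `f64` ratio, the two example schemes) is hypothetical and only mirrors the shape of the real trait; the real compressor works on arrays, statistics, and `CompressionEstimate` values, and this sketch does not model skipping unprofitable schemes.

```rust
// Toy stand-in for the selection step. `ToyScheme`, `pick_best`, and the
// f64 ratio are hypothetical; the real trait works on arrays and estimates.
trait ToyScheme {
    fn scheme_name(&self) -> &'static str;
    fn matches(&self, values: &[i64]) -> bool;
    fn expected_ratio(&self, values: &[i64]) -> f64;
}

struct ConstantLike;
struct NarrowLike;

impl ToyScheme for ConstantLike {
    fn scheme_name(&self) -> &'static str { "toy.constant" }
    fn matches(&self, _values: &[i64]) -> bool { true }
    // One stored value replaces `len` values when every element is equal.
    fn expected_ratio(&self, values: &[i64]) -> f64 {
        if values.windows(2).all(|w| w[0] == w[1]) { values.len() as f64 } else { 0.0 }
    }
}

impl ToyScheme for NarrowLike {
    fn scheme_name(&self) -> &'static str { "toy.narrow" }
    // Only applicable to non-negative values in this sketch.
    fn matches(&self, values: &[i64]) -> bool { values.iter().all(|v| *v >= 0) }
    // 64 bits per value shrink to the bit width of the largest value.
    fn expected_ratio(&self, values: &[i64]) -> f64 {
        let max = values.iter().copied().max().unwrap_or(0) as u64;
        let bits = (64 - max.leading_zeros()).max(1);
        64.0 / bits as f64
    }
}

// Keep only schemes whose `matches` is true, then take the highest ratio.
fn pick_best(schemes: &[Box<dyn ToyScheme>], values: &[i64]) -> Option<&'static str> {
    schemes
        .iter()
        .filter(|s| s.matches(values))
        .map(|s| (s.scheme_name(), s.expected_ratio(values)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
}
```

The function returns the winner's name rather than compressing anything, which keeps the sketch focused on the filter-then-maximize order of operations.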

A key feature of this crate's compressor is that schemes may “cascade”: a scheme’s compress can call back into the compressor via CascadingCompressor::compress_child to compress child or transformed arrays, building up multiple encoding layers (e.g. frame-of-reference followed by bit-packing).
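
To see why layering helps, here is a minimal numeric sketch of the frame-of-reference then bit-packing example. The helper names `frame_of_reference` and `bit_width` are made up for illustration and are not part of this crate; the sketch only computes the bit width a bit-packing layer would need and does not pack anything.

```rust
// Hypothetical helpers, not this crate's API.

// Subtract the minimum from every value; returns (base, residuals).
fn frame_of_reference(values: &[u32]) -> (u32, Vec<u32>) {
    let base = values.iter().copied().min().unwrap_or(0);
    let residuals = values.iter().map(|v| v - base).collect();
    (base, residuals)
}

// Number of bits needed to represent the largest value (0 for all-zero input).
fn bit_width(values: &[u32]) -> u32 {
    let max = values.iter().copied().max().unwrap_or(0);
    32 - max.leading_zeros()
}
```

For the values 1000, 1003, 1001, 1007, packing directly needs 10 bits per value, while packing the residuals 0, 3, 1, 7 after subtracting the base 1000 needs 3.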

§Scheme IDs

Every scheme has a globally unique name returned by scheme_name. The SchemeExt::id method (auto-implemented, cannot be overridden) wraps that name in an opaque SchemeId used for equality, hashing, and exclusion rules (see below).
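
One common Rust pattern that gives an auto-implemented, non-overridable method is an extension trait with a blanket impl. The sketch below assumes that pattern; `NamedScheme`, `NamedSchemeExt`, and `ToyId` are illustrative names, and the real SchemeExt and SchemeId may be built differently.

```rust
use std::collections::HashSet;

// Hypothetical opaque ID: the field is private to this module in real code,
// so callers can only compare and hash it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ToyId(&'static str);

trait NamedScheme {
    fn scheme_name(&self) -> &'static str;
}

// Extension trait: `id` has a default body and implementors never write it.
trait NamedSchemeExt: NamedScheme {
    fn id(&self) -> ToyId {
        ToyId(self.scheme_name())
    }
}

// The blanket impl covers every `NamedScheme`, so any hand-written impl of
// `NamedSchemeExt` for such a type would conflict and fail to compile.
impl<T: NamedScheme + ?Sized> NamedSchemeExt for T {}

struct ForLike;
struct BitPackLike;

impl NamedScheme for ForLike {
    fn scheme_name(&self) -> &'static str { "toy.for" }
}
impl NamedScheme for BitPackLike {
    fn scheme_name(&self) -> &'static str { "toy.bitpack" }
}

// IDs can be stored in hash sets, e.g. to represent a set of excluded schemes.
fn is_excluded(excluded: &HashSet<ToyId>, scheme: &dyn NamedScheme) -> bool {
    excluded.contains(&scheme.id())
}
```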

§Cascading and children

Schemes that produce child arrays for further compression must declare num_children > 0. Each child should be identified by a stable index. Cascading schemes should use CascadingCompressor::compress_child to compress each child array, which handles cascade level / budget tracking and context management automatically.

The compressor enforces that no scheme appears twice in a cascade (descendant) chain. This keeps the search space a tree.
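
The bookkeeping that compress_child is described as handling can be pictured with a toy context. `ToyContext`, its fields, and `descend` are assumptions made for this sketch; the real CompressorContext is not shown on this page and may track levels and budgets differently.

```rust
// Hypothetical model of cascade tracking, not the crate's CompressorContext.
#[derive(Debug, Clone)]
struct ToyContext {
    // Names of the schemes already applied above the current array.
    chain: Vec<&'static str>,
    // How many more cascade levels are allowed.
    remaining_depth: usize,
}

impl ToyContext {
    // Returns the context for a child compressed under `scheme`, or `None`
    // if the depth budget is spent or `scheme` is already in the chain.
    fn descend(&self, scheme: &'static str) -> Option<ToyContext> {
        if self.remaining_depth == 0 || self.chain.contains(&scheme) {
            return None;
        }
        let mut chain = self.chain.clone();
        chain.push(scheme);
        Some(ToyContext { chain, remaining_depth: self.remaining_depth - 1 })
    }
}
```

Because each context carries its own copy of the chain, sibling children never see each other's schemes, only their shared ancestors.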

§Exclusion rules

Schemes declare exclusion rules to prevent incompatible scheme combinations in the cascade chain:

descendant_exclusions (push): “exclude scheme X from my child Y’s subtree.” Used when the declaring scheme knows about the excluded scheme.
ancestor_exclusions (pull): “exclude me if ancestor X’s child Y is above me.” Used when the declaring scheme knows about the ancestor.

Both directions exist because schemes live in different crates, and the dependency direction between any two of them is not known ahead of time: whichever scheme can name the other declares the rule.
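
A rough sketch of how the two rule kinds could be checked against a cascade chain follows. The `ToyRules` struct, its tuple layouts, and `is_allowed` are invented for illustration; the fields of the real DescendantExclusion and AncestorExclusion types are not shown on this page.

```rust
// Hypothetical rule storage, not the crate's exclusion types.
struct ToyRules {
    name: &'static str,
    // Push rules: (my child index, scheme excluded from that child's subtree).
    push: Vec<(usize, &'static str)>,
    // Pull rules: (ancestor scheme, ancestor's child index that excludes me).
    pull: Vec<(&'static str, usize)>,
}

// `chain` lists every ancestor together with the child index through which
// the cascade descended. A candidate is rejected if any ancestor pushes an
// exclusion onto it or if the candidate pulls one from any ancestor.
fn is_allowed(candidate: &ToyRules, chain: &[(&ToyRules, usize)]) -> bool {
    for (ancestor, child) in chain {
        let pushed = ancestor
            .push
            .iter()
            .any(|(c, excluded)| c == child && *excluded == candidate.name);
        let pulled = candidate
            .pull
            .iter()
            .any(|(a, c)| *a == ancestor.name && c == child);
        if pushed || pulled {
            return false;
        }
    }
    true
}
```

Either rule alone is enough to reject a candidate, so two crates never both need to know about each other.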

§Implementing a scheme

expected_compression_ratio should return CompressionEstimate::Sample when a cheap heuristic is not available, asking the compressor to estimate via sampling. Implementors should return a more specific variant when possible (e.g. CompressionEstimate::AlwaysUse for constant detection or CompressionEstimate::Skip for early rejection based on stats).
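
The guidance above might translate into heuristics like the following. `ToyEstimate` is a simplified stand-in that borrows three variant names from CompressionEstimate without their real payloads, and `ToyStats` plus the 50% distinct-value threshold are assumptions made up for this sketch.

```rust
// Simplified stand-in; the real CompressionEstimate may carry data and has
// more variants than shown here.
#[derive(Debug, PartialEq)]
enum ToyEstimate {
    Skip,      // reject early based on stats
    AlwaysUse, // certain win, no sampling needed
    Sample,    // no cheap answer, let the compressor sample
}

// Hypothetical precomputed statistics.
struct ToyStats {
    len: usize,
    distinct: usize,
}

// Constant detection: a single distinct value is always worth encoding.
fn constant_like_estimate(stats: &ToyStats) -> ToyEstimate {
    if stats.distinct == 1 { ToyEstimate::AlwaysUse } else { ToyEstimate::Skip }
}

// Dictionary-like scheme: reject when more than half the values are distinct
// (an arbitrary threshold), otherwise defer to sampling.
fn dict_like_estimate(stats: &ToyStats) -> ToyEstimate {
    if stats.distinct * 2 > stats.len {
        ToyEstimate::Skip
    } else {
        ToyEstimate::Sample
    }
}
```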

Schemes that need statistics that may be expensive to compute should override stats_options to declare what they require. The compressor merges all eligible schemes’ options before generating stats, so each stat is computed at most once for a given array.
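
Merging can be pictured as a union of per-scheme requests. The `ToyStatsOptions` struct and its two boolean fields are hypothetical; the actual fields of GenerateStatsOptions are not listed on this page.

```rust
// Hypothetical options struct, not the crate's GenerateStatsOptions.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct ToyStatsOptions {
    count_distinct: bool,
    count_runs: bool,
}

impl ToyStatsOptions {
    // A stat is generated if any scheme asked for it.
    fn merge(self, other: Self) -> Self {
        ToyStatsOptions {
            count_distinct: self.count_distinct || other.count_distinct,
            count_runs: self.count_runs || other.count_runs,
        }
    }
}

// Fold every eligible scheme's request into one set of options, which a
// single stats pass can then satisfy.
fn merged_options(requests: &[ToyStatsOptions]) -> ToyStatsOptions {
    requests
        .iter()
        .copied()
        .fold(ToyStatsOptions::default(), ToyStatsOptions::merge)
}
```

Starting the fold from the default (nothing requested) means an array with no eligible schemes pays for no optional statistics.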

Required Methods§

fn scheme_name(&self) -> &'static str

The globally unique name for this scheme (e.g. "vortex.int.bitpacking").

fn matches(&self, canonical: &Canonical) -> bool

Whether this scheme can compress the given canonical array.

fn expected_compression_ratio(
    &self,
    _data: &mut ArrayAndStats,
    _ctx: CompressorContext,
) -> CompressionEstimate

Cheaply estimate the compression ratio for this scheme on the given array.

This method should be fast and infallible. Any expensive or fallible work should be deferred to the compressor by returning CompressionEstimate::Sample or CompressionEstimate::Estimate.

The compressor will ask all schemes what their expected compression ratio is given the array and statistics. The scheme with the highest estimated ratio will then be applied to the entire array.

Note that the compressor also calls this method when compressing samples. A property that holds for a sample, such as every value being equal (the Constant case), may not hold for the entire array. Implementations should check ctx.is_sample to make sure the estimate they return is valid for the data they were actually given.
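
One plausible way to guard against sample-only properties is to refuse a definitive answer while sampling. In this sketch, `SketchContext`, `SketchEstimate`, and `constant_estimate` are illustrative names; only the `is_sample` flag comes from the text above, and whether it is a field or a method on the real context is not shown here.

```rust
// Simplified stand-ins for the estimate and context types.
#[derive(Debug, PartialEq)]
enum SketchEstimate {
    Skip,
    AlwaysUse,
}

struct SketchContext {
    is_sample: bool,
}

// A constant sample does not prove the full array is constant, so this
// sketch never claims a constant encoding while sampling.
fn constant_estimate(all_equal: bool, ctx: &SketchContext) -> SketchEstimate {
    if ctx.is_sample {
        return SketchEstimate::Skip;
    }
    if all_equal { SketchEstimate::AlwaysUse } else { SketchEstimate::Skip }
}
```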

The compressor guarantees that empty and all-null arrays are handled before this method is called. Implementations may assume the array has at least one valid element. However, a constant scheme should still be registered with the compressor to detect single-value arrays that are not all-null.

fn compress(
    &self,
    compressor: &CascadingCompressor,
    data: &mut ArrayAndStats,
    ctx: CompressorContext,
) -> VortexResult<ArrayRef>

Compress the array using this scheme.

§Errors

Returns an error if compression fails.

Provided Methods§

fn stats_options(&self) -> GenerateStatsOptions

Returns the stats generation options this scheme requires. The compressor merges all eligible schemes’ options before generating stats so that a single stats pass satisfies every scheme.
