Skip to main content

Scheme

Trait Scheme 

Source
pub trait Scheme:
    Debug
    + Send
    + Sync {
    // Required methods
    fn scheme_name(&self) -> &'static str;
    fn matches(&self, canonical: &Canonical) -> bool;
    fn expected_compression_ratio(
        &self,
        _data: &mut ArrayAndStats,
        _ctx: CompressorContext,
    ) -> CompressionEstimate;
    fn compress(
        &self,
        compressor: &CascadingCompressor,
        data: &mut ArrayAndStats,
        ctx: CompressorContext,
    ) -> Result<ArrayRef, VortexError>;

    // Provided methods
    fn stats_options(&self) -> GenerateStatsOptions { ... }
    fn num_children(&self) -> usize { ... }
    fn descendant_exclusions(&self) -> Vec<DescendantExclusion> { ... }
    fn ancestor_exclusions(&self) -> Vec<AncestorExclusion> { ... }
}
Expand description

A single compression encoding that the CascadingCompressor can select from.

The compressor evaluates every registered scheme whose matches returns true for a given array, picks the one with the highest expected_compression_ratio, and calls compress on the winner.

One of the key features of the compressor in this crate is that schemes may “cascade”. A scheme’s compress can call back into the compressor via CascadingCompressor::compress_child to compress child or transformed arrays, building up multiple encoding layers (e.g. frame-of-reference and then bit-packing).

§Scheme IDs

Every scheme has a globally unique name returned by scheme_name. The SchemeExt::id method (auto-implemented, cannot be overridden) wraps that name in an opaque SchemeId used for equality, hashing, and exclusion rules (see below).

§Cascading and children

Schemes that produce child arrays for further compression must declare num_children > 0. Each child should be identified by a stable index. Cascading schemes should use CascadingCompressor::compress_child to compress each child array, which handles cascade level / budget tracking and context management automatically.

No scheme may appear twice in a cascade (descendant) chain (enforced by the compressor). This keeps the search space a tree.

§Exclusion rules

Schemes declare exclusion rules to prevent incompatible scheme combinations in the cascade chain:

  • descendant_exclusions (push): “exclude scheme X from my child Y’s subtree.” Used when the declaring scheme knows about the excluded scheme.
  • ancestor_exclusions (pull): “exclude me if ancestor X’s child Y is above me.” Used when the declaring scheme knows about the ancestor.

We do this because different schemes will live in different crates, and we cannot know the dependency direction ahead of time.

§Implementing a scheme

expected_compression_ratio should return CompressionEstimate::Deferred(DeferredEstimate::Sample) when a cheap heuristic is not available, asking the compressor to estimate via sampling. Implementors should return an immediate CompressionEstimate::Verdict when possible.

Schemes that need statistics that may be expensive to compute should override stats_options to declare what they require. The compressor merges all eligible schemes’ options before generating stats, so each stat is always computed at most once for a given array.

Required Methods§

Source

fn scheme_name(&self) -> &'static str

The globally unique name for this scheme (e.g. "vortex.int.bitpacking").

Source

fn matches(&self, canonical: &Canonical) -> bool

Whether this scheme can compress the given canonical array.

Source

fn expected_compression_ratio( &self, _data: &mut ArrayAndStats, _ctx: CompressorContext, ) -> CompressionEstimate

Cheaply estimate the compression ratio for this scheme on the given array.

This method should be fast and infallible. Any expensive or fallible work should be deferred to the compressor by returning CompressionEstimate::Deferred(DeferredEstimate::Sample) or CompressionEstimate::Deferred(DeferredEstimate::Callback(...)).

The compressor will ask all schemes what their expected compression ratio is given the array and statistics. The scheme with the highest estimated ratio will then be applied to the entire array.

CompressionEstimate::Verdict means the scheme already knows the terminal crate::estimate::EstimateVerdict. CompressionEstimate::Deferred(DeferredEstimate::Sample) asks the compressor to sample. CompressionEstimate::Deferred(DeferredEstimate::Callback(...)) asks the compressor to run custom deferred work. Deferred callbacks must return a crate::estimate::EstimateVerdict directly, never another deferred request.

Note that the compressor will also use this method when compressing samples, so some statistics that might hold for the samples may not hold for the entire array (e.g., Constant). Implementations should check ctx.is_sample to make sure that they are returning the correct information.

The compressor guarantees that empty and all-null arrays are handled before this method is called. Implementations may assume the array has at least one valid element. However, a constant scheme should still be registered with the compressor to detect single-value arrays that are not all-null.

Source

fn compress( &self, compressor: &CascadingCompressor, data: &mut ArrayAndStats, ctx: CompressorContext, ) -> Result<ArrayRef, VortexError>

Compress the array using this scheme.

§Errors

Returns an error if compression fails.

Provided Methods§

Source

fn stats_options(&self) -> GenerateStatsOptions

Returns the stats generation options this scheme requires. The compressor merges all eligible schemes’ options before generating stats so that a single stats pass satisfies every scheme.

Source

fn num_children(&self) -> usize

The number of child arrays this scheme produces when cascading. Returns 0 for leaf schemes that produce a final encoded array.

Source

fn descendant_exclusions(&self) -> Vec<DescendantExclusion>

Schemes to exclude from specific children’s subtrees (push direction).

Each rule says: “when I cascade through child Y, do not use scheme X anywhere in that subtree.” Only meaningful when num_children > 0.

Source

fn ancestor_exclusions(&self) -> Vec<AncestorExclusion>

Ancestors that make this scheme ineligible (pull direction).

Each rule says: “if ancestor X cascaded through child Y somewhere above me in the chain, do not try me.”

Trait Implementations§

Source§

impl Hash for dyn Scheme

Source§

fn hash<H>(&self, state: &mut H)
where H: Hasher,

Feeds this value into the given Hasher. Read more
Source§

impl PartialEq for dyn Scheme

Source§

fn eq(&self, other: &(dyn Scheme + 'static)) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl Eq for dyn Scheme

Implementors§