Kernel

Trait Kernel 

pub trait Kernel {
    // Required method
    fn step(
        &mut self,
        ctx: &KernelContext,
        selected: BitView<'_>,
        out: &mut ViewMut<'_>,
    ) -> VortexResult<()>;

    // Provided method
    fn seek(&mut self, chunk_idx: usize) -> VortexResult<()> { ... }
}

A pipeline provides a push-based way to emit a stream of canonical data.

By passing multiple vector computations through the same pipeline, we can amortize the setup costs (such as DType validation, stats short-circuiting, etc.) and make better use of CPU caches by performing all operations while the data is hot.

By passing a mask into the step function, we give encodings visibility into the data that will be read by their parents. Some encodings may choose to decode all N elements, and then set the given selection mask on the output vector. Other encodings may choose to only unpack the selected elements.
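The two strategies described above can be sketched as follows. This is a minimal illustration, not the crate's API: `StepOutput`, `decode_all`, and `decode_selected` are hypothetical names, a plain `&[bool]` stands in for `BitView`, and a toy "base plus delta" encoding stands in for a real one.

```rust
/// Hypothetical output of one step: values plus an optional selection over them.
#[derive(Debug, PartialEq)]
struct StepOutput {
    values: Vec<u32>,
    /// `Some(mask)` means only positions where `mask[i]` is true are live.
    selection: Option<Vec<bool>>,
}

impl StepOutput {
    /// Resolve the output into the values the parent will actually read.
    fn selected_values(&self) -> Vec<u32> {
        match &self.selection {
            None => self.values.clone(),
            Some(mask) => self
                .values
                .iter()
                .zip(mask)
                .filter_map(|(&value, &keep)| keep.then_some(value))
                .collect(),
        }
    }
}

/// Strategy 1: decode all elements, then attach the selection mask to the output.
fn decode_all(deltas: &[u32], base: u32, selected: &[bool]) -> StepOutput {
    StepOutput {
        values: deltas.iter().map(|delta| base + delta).collect(),
        selection: Some(selected.to_vec()),
    }
}

/// Strategy 2: decode only the selected elements; the output is already compact.
fn decode_selected(deltas: &[u32], base: u32, selected: &[bool]) -> StepOutput {
    StepOutput {
        values: deltas
            .iter()
            .zip(selected)
            .filter_map(|(&delta, &keep)| keep.then_some(base + delta))
            .collect(),
        selection: None,
    }
}
```

Both strategies resolve to the same values for the parent. The first does extra decode work but keeps the loop branch-free; the second skips unselected elements at the cost of a data-dependent branch.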

We are also considering adding a defined parameter that indicates which elements are defined and will be interpreted by the parent. This differs from masking in that undefined elements must still occupy the correct position; only their value is irrelevant. This would allow, e.g., a validity encoding to tell its children that the values in certain positions are going to be masked out anyway, so they need not spend any expensive compute on them.
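To make the difference from masking concrete, here is a small sketch of what honouring such a defined parameter might look like. Since the parameter is only under consideration, everything here is an assumption: `expensive_decode` and `decode_with_defined` are hypothetical names, and a `&[bool]` stands in for whatever type the parameter would eventually take.

```rust
/// Hypothetical stand-in for an expensive per-element decode.
fn expensive_decode(raw: u32) -> u32 {
    raw.wrapping_mul(2_654_435_761).rotate_left(7)
}

/// Decode with a `defined` mask: every element keeps its position, but positions
/// marked undefined receive a placeholder instead of paying for the decode.
fn decode_with_defined(raw: &[u32], defined: &[bool]) -> Vec<u32> {
    raw.iter()
        .zip(defined)
        .map(|(&value, &is_defined)| {
            if is_defined {
                expensive_decode(value)
            } else {
                0 // arbitrary placeholder; the parent never interprets it
            }
        })
        .collect()
}
```

Unlike a selection, the output here always has the same length as the input, so positions line up with the parent's validity information.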

Required Methods


fn step(&mut self, ctx: &KernelContext, selected: BitView<'_>, out: &mut ViewMut<'_>) -> VortexResult<()>

Attempts to perform a single step of the pipeline, writing data to the output vector. Returns Poll::Done if the pipeline is complete, or Poll::Pending if buffers are required to continue.

The selected parameter defines which elements of the chunk should be exported, where None indicates that all elements are selected.

Provided Methods


fn seek(&mut self, chunk_idx: usize) -> VortexResult<()>

Seek the kernel to a specific chunk offset, i.e. the resulting row offset is chunk_idx * N, where N is the number of elements in a chunk.

Note this will be called on all kernels in a pipeline.

The reason for a separate seek function (vs. passing an offset directly to step) is that it allows the pipeline to optimize for sequential access patterns, which are common in many encodings. For example, a run-length encoding can efficiently seek to the start of a chunk without needing to perform a full binary search of the run ends in each step.
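The run-length example can be sketched as below. This is a simplified illustration under stated assumptions, not the trait as declared above: `RunEndKernel` and `CHUNK_LEN` are hypothetical, `step` takes a plain `Vec<u32>` instead of `ViewMut`, and the context, selection, and error handling are omitted. The point is only that the binary search runs once in `seek`, while each `step` advances a run cursor linearly.

```rust
/// Elements per chunk in this sketch (hypothetical value).
const CHUNK_LEN: usize = 4;

/// A toy run-length decoder: `ends[i]` is the exclusive end row of run `i`.
struct RunEndKernel {
    ends: Vec<usize>,
    values: Vec<u32>,
    /// Row offset of the next row to emit.
    row: usize,
    /// Index of the run containing `row`.
    run: usize,
}

impl RunEndKernel {
    fn new(ends: Vec<usize>, values: Vec<u32>) -> Self {
        Self { ends, values, row: 0, run: 0 }
    }

    /// Position the kernel at chunk `chunk_idx`; the row offset is `chunk_idx * CHUNK_LEN`.
    /// The binary search happens once here rather than in every step.
    fn seek(&mut self, chunk_idx: usize) {
        self.row = chunk_idx * CHUNK_LEN;
        // Index of the first run whose end is strictly greater than the target row.
        self.run = self.ends.partition_point(|&end| end <= self.row);
    }

    /// Emit up to `CHUNK_LEN` rows, moving the run cursor forward sequentially.
    fn step(&mut self, out: &mut Vec<u32>) {
        let total = self.ends.last().copied().unwrap_or(0);
        let stop = (self.row + CHUNK_LEN).min(total);
        while self.row < stop {
            if self.row >= self.ends[self.run] {
                // Current run is exhausted; move to the next one.
                self.run += 1;
                continue;
            }
            out.push(self.values[self.run]);
            self.row += 1;
        }
    }
}
```

After a single `seek`, consecutive `step` calls continue from where the previous one stopped, so sequential scans never repeat the search.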

Implementors