pub trait Kernel {
    // Required method
    fn step(
        &mut self,
        ctx: &KernelContext,
        selected: BitView<'_>,
        out: &mut ViewMut<'_>,
    ) -> VortexResult<()>;

    // Provided method
    fn seek(&mut self, chunk_idx: usize) -> VortexResult<()> { ... }
}
A pipeline provides a push-based way to emit a stream of canonical data.
By passing multiple vector computations through the same pipeline, we can amortize the setup costs (such as DType validation and stats short-circuiting) and make better use of CPU caches by performing all operations while the data is hot.
By passing a mask into the step function, we give encodings visibility into the data that will be read by their parents. Some encodings may choose to decode all N elements and then set the given selection mask on the output vector. Other encodings may choose to unpack only the selected elements.
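The two strategies above can be sketched with plain Rust types. This is a minimal illustration, not the Vortex API: `ForChunk`, `Output`, and both functions are hypothetical stand-ins, and a `&[bool]` slice plays the role of the selection mask.

```rust
/// Toy frame-of-reference chunk: value[i] = base + deltas[i].
/// Hypothetical stand-in for a real encoding; not a Vortex type.
pub struct ForChunk {
    pub base: u32,
    pub deltas: Vec<u8>,
}

/// Hypothetical output buffer: decoded values plus an optional selection
/// mask that the consumer still has to apply.
pub struct Output {
    pub values: Vec<u32>,
    pub selection: Option<Vec<bool>>,
}

impl Output {
    /// The values the parent will actually observe.
    pub fn selected_values(&self) -> Vec<u32> {
        match &self.selection {
            None => self.values.clone(),
            Some(mask) => self
                .values
                .iter()
                .zip(mask)
                .filter_map(|(&v, &keep)| keep.then_some(v))
                .collect(),
        }
    }
}

/// Strategy 1: decode every element and attach the mask to the output.
pub fn decode_all_and_mask(chunk: &ForChunk, selected: &[bool]) -> Output {
    let values = chunk
        .deltas
        .iter()
        .map(|&d| chunk.base + u32::from(d))
        .collect();
    Output { values, selection: Some(selected.to_vec()) }
}

/// Strategy 2: decode only the selected elements; no mask remains to apply.
pub fn decode_selected_only(chunk: &ForChunk, selected: &[bool]) -> Output {
    let values = chunk
        .deltas
        .iter()
        .zip(selected)
        .filter_map(|(&d, &keep)| keep.then(|| chunk.base + u32::from(d)))
        .collect();
    Output { values, selection: None }
}
```

Both strategies expose the same selected values to the parent. They differ in how much decoding work is done and in whether the output still carries a mask.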
We are also considering adding a defined parameter that indicates which elements are defined and will be interpreted by the parent. This differs from masking in that undefined elements must still occupy their correct positions; only their values are irrelevant. This would allow, e.g., a validity encoding to tell its children that the values in certain positions are going to be masked out anyway, so they need not do any expensive compute for them.
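Since the defined parameter is only under consideration, the following is purely a sketch of the idea with hypothetical names (`decode_with_defined`, a `&[bool]` slice as the defined mask). Unlike a selection mask, the output keeps one slot per input position; undefined slots are filled with an arbitrary placeholder and the expensive computation is skipped for them.

```rust
/// Hypothetical illustration of a `defined` mask: every position stays in
/// place, but `expensive` runs only for defined positions.
pub fn decode_with_defined(
    encoded: &[u32],
    defined: &[bool],
    expensive: impl Fn(u32) -> u64,
) -> Vec<u64> {
    encoded
        .iter()
        .zip(defined)
        .map(|(&e, &is_defined)| {
            if is_defined {
                expensive(e)
            } else {
                // Placeholder: the parent never interprets this value.
                0
            }
        })
        .collect()
}
```

The output length equals the input length, so downstream positions line up without any further bookkeeping.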
Required Methods
fn step(
    &mut self,
    ctx: &KernelContext,
    selected: BitView<'_>,
    out: &mut ViewMut<'_>,
) -> VortexResult<()>
Attempts to perform a single step of the pipeline, writing data to the output vector.
Returns Poll::Done if the pipeline is complete, or Poll::Pending if buffers are required to continue.
The selected parameter defines which elements of the chunk should be exported, where None indicates that all elements are selected.
Provided Methods
fn seek(&mut self, chunk_idx: usize) -> VortexResult<()>
Seek the kernel to a specific chunk offset, i.e. the resulting row offset is chunk_idx * N, where N is the number of elements in a chunk.
Note that this will be called on all kernels in a pipeline.
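The chunk-to-row mapping is a single multiplication. A minimal sketch follows, assuming a chunk size of 1024 purely for illustration (the actual value of N is not stated here); `chunk_row_offset` is a hypothetical helper, not part of the trait.

```rust
/// Assumed chunk size, for illustration only.
pub const N: usize = 1024;

/// Row offset of the first element in chunk `chunk_idx`,
/// or `None` if the multiplication would overflow `usize`.
pub fn chunk_row_offset(chunk_idx: usize) -> Option<usize> {
    chunk_idx.checked_mul(N)
}
```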
The reason for a separate seek function (vs passing an offset directly to step) is that it allows the pipeline to optimize for sequential access patterns, which are common in many encodings. For example, a run-length encoding can efficiently seek to the start of a chunk without needing to perform a full binary search of the ends in each step.
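The run-length example can be sketched as follows. This is a simplified model under stated assumptions, not the Vortex implementation: `RleKernel` is hypothetical, its `seek` and `step` use plain types instead of the trait's signatures, and the chunk size `N` is set to 4 to keep the example small. The binary search happens once in `seek`; each `step` then walks forward from a cached run cursor.

```rust
/// Tiny chunk size, chosen only to keep the example readable.
pub const N: usize = 4;

/// Hypothetical run-length kernel: run `i` holds `values[i]` and covers
/// rows up to (but not including) `ends[i]`.
pub struct RleKernel {
    values: Vec<u32>,
    ends: Vec<usize>, // exclusive end row of each run, non-decreasing
    row: usize,       // next row to emit
    run: usize,       // cursor: index of the run containing `row`
}

impl RleKernel {
    pub fn new(values: Vec<u32>, ends: Vec<usize>) -> Self {
        assert_eq!(values.len(), ends.len());
        Self { values, ends, row: 0, run: 0 }
    }

    /// Position the kernel at row `chunk_idx * N` with one binary search.
    pub fn seek(&mut self, chunk_idx: usize) {
        let row = chunk_idx * N;
        self.row = row;
        // Index of the first run whose end lies beyond `row`.
        self.run = self.ends.partition_point(|&end| end <= row);
    }

    /// Emit up to `N` values, advancing the run cursor linearly.
    pub fn step(&mut self, out: &mut Vec<u32>) {
        out.clear();
        let total = self.ends.last().copied().unwrap_or(0);
        let stop = (self.row + N).min(total);
        while self.row < stop {
            // Skip runs that end at or before the current row.
            while self.ends[self.run] <= self.row {
                self.run += 1;
            }
            let run_stop = self.ends[self.run].min(stop);
            let count = run_stop - self.row;
            out.extend(std::iter::repeat(self.values[self.run]).take(count));
            self.row = run_stop;
        }
    }
}
```

With sequential access, consecutive `step` calls never search at all; a search is needed only when the caller jumps to a different chunk via `seek`.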