Trait lance_encoding::decoder::LogicalPageScheduler

pub trait LogicalPageScheduler: Send + Sync + Debug {
    // Required methods
    fn schedule_ranges(
        &self,
        ranges: &[Range<u32>],
        scheduler: &Arc<dyn EncodingsIo>,
        sink: &UnboundedSender<Box<dyn LogicalPageDecoder>>
    ) -> Result<()>;
    fn schedule_take(
        &self,
        indices: &[u32],
        scheduler: &Arc<dyn EncodingsIo>,
        sink: &UnboundedSender<Box<dyn LogicalPageDecoder>>
    ) -> Result<()>;
    fn num_rows(&self) -> u32;
}

A scheduler for a field’s worth of data

Each page of incoming data maps to one LogicalPageScheduler instance. However, this page may map to many pages transitively. For example, one page of struct data may cover many pages of primitive child data. In fact, the entire file is treated as one page of SimpleStruct data.

The scheduler is responsible for calculating the necessary I/O. One schedule_ranges request could trigger multiple batches of I/O across multiple columns. The scheduler should emit decoders into the sink as quickly as possible.

As soon as a batch of data can be decoded, the scheduler should emit a decoder in the “unloaded” state. The decode stream will pull the decoder and start decoding.

The order in which decoders are emitted is important. Pages should be emitted in row-major order allowing decode of complete rows as quickly as possible.
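The ordering requirement can be illustrated with a small sketch. It is not the crate's implementation: `PageRef` and `emit_row_major` are hypothetical names, a `std::sync::mpsc::Sender` stands in for the `UnboundedSender` sink, and "row-major" is interpreted here as "sorted by first row, then by column", which is an assumption about the intended order.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for an emitted decoder: identifies a page by
/// its column and the first row it covers.
#[derive(Debug, Clone, PartialEq)]
struct PageRef {
    column: usize,
    first_row: u32,
}

/// Sends pages into the sink ordered by first row, then by column, so that
/// all columns for the earliest rows reach the consumer first.
/// (Assumed interpretation of "row-major order".)
fn emit_row_major(mut pages: Vec<PageRef>, sink: &Sender<PageRef>) -> Result<(), String> {
    pages.sort_by_key(|p| (p.first_row, p.column));
    for page in pages {
        // A send error means the receiving side has been dropped.
        sink.send(page).map_err(|e| e.to_string())?;
    }
    Ok(())
}
```

With one column split at rows 0 and 100 and another split at rows 0 and 50, this emits both row-0 pages first, then the row-50 page, then the row-100 page.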

The LogicalPageScheduler should be stateless and Send and Sync. This is because it might need to be shared. For example, a list page has a reference to the page schedulers for its items column. This is shared with the follow-up I/O task created when the offsets are loaded.
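A minimal sketch of why the `Send + Sync` bounds matter when a scheduler is shared with a follow-up task. Everything here is a simplified stand-in: `PageScheduler`, `ItemsScheduler`, `ListScheduler`, and `spawn_follow_up` are hypothetical names, and a plain thread replaces the real I/O task.

```rust
use std::fmt::Debug;
use std::sync::Arc;
use std::thread;

/// Simplified stand-in for LogicalPageScheduler (only num_rows is kept).
trait PageScheduler: Send + Sync + Debug {
    fn num_rows(&self) -> u32;
}

/// Hypothetical scheduler for the items column of a list.
#[derive(Debug)]
struct ItemsScheduler {
    rows: u32,
}

impl PageScheduler for ItemsScheduler {
    fn num_rows(&self) -> u32 {
        self.rows
    }
}

/// Hypothetical list scheduler holding a shared reference to its items scheduler.
#[derive(Debug)]
struct ListScheduler {
    items: Arc<dyn PageScheduler>,
}

impl ListScheduler {
    /// Clones the Arc and moves it into another thread. This compiles only
    /// because the trait requires Send + Sync; no mutable state is shared.
    fn spawn_follow_up(&self) -> thread::JoinHandle<u32> {
        let items = Arc::clone(&self.items);
        thread::spawn(move || items.num_rows())
    }
}
```

Because the scheduler holds no mutable state, the clone handed to the follow-up task and the original can be used concurrently without locking.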

See crate::decoder for more information

Required Methods

fn schedule_ranges( &self, ranges: &[Range<u32>], scheduler: &Arc<dyn EncodingsIo>, sink: &UnboundedSender<Box<dyn LogicalPageDecoder>> ) -> Result<()>

Schedules I/O for the requested portions of the page.

Note: ranges must be ordered and non-overlapping.

TODO: Support unordered or overlapping ranges in file scheduler
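Callers may want to check the ordering precondition before scheduling. The helper below is a hypothetical sketch and not part of lance_encoding; it assumes half-open ranges, so adjacent ranges such as `0..5` and `5..10` count as non-overlapping.

```rust
use std::ops::Range;

/// Hypothetical helper: returns true if every range is well-formed
/// (start <= end) and each range ends at or before the next one starts.
fn ranges_are_ordered_and_disjoint(ranges: &[Range<u32>]) -> bool {
    let well_formed = ranges.iter().all(|r| r.start <= r.end);
    let ordered = ranges.windows(2).all(|pair| pair[0].end <= pair[1].start);
    well_formed && ordered
}
```

An empty slice trivially satisfies the check, since there are no adjacent pairs to compare.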


fn schedule_take( &self, indices: &[u32], scheduler: &Arc<dyn EncodingsIo>, sink: &UnboundedSender<Box<dyn LogicalPageDecoder>> ) -> Result<()>

Schedules I/O for the requested rows (identified by row offsets from the start of the page).

TODO: implement this using schedule_ranges
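One way a take could be expressed in terms of ranges is to collapse runs of consecutive row offsets into half-open ranges. The sketch below is illustrative only: `indices_to_ranges` is a hypothetical helper, not the crate's implementation, and it assumes the indices are strictly increasing and each is less than `num_rows()` (so `idx + 1` cannot overflow).

```rust
use std::ops::Range;

/// Hypothetical helper: collapses strictly increasing row offsets into
/// ordered, non-overlapping half-open ranges.
fn indices_to_ranges(indices: &[u32]) -> Vec<Range<u32>> {
    let mut ranges: Vec<Range<u32>> = Vec::new();
    for &idx in indices {
        // Extend the current run when the index is contiguous with it.
        if let Some(last) = ranges.last_mut() {
            if last.end == idx {
                last.end = idx + 1;
                continue;
            }
        }
        // Otherwise start a new single-row range.
        ranges.push(idx..idx + 1);
    }
    ranges
}
```

Under the stated assumption, the output is already ordered and non-overlapping, which matches the precondition of schedule_ranges. Duplicate or unsorted indices would need separate handling.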


fn num_rows(&self) -> u32

The number of rows covered by this page

Implementors