Trait lance_encoding::decoder::LogicalPageScheduler
pub trait LogicalPageScheduler: Send + Sync + Debug {
    // Required methods
    fn schedule_ranges(
        &self,
        ranges: &[Range<u32>],
        scheduler: &Arc<dyn EncodingsIo>,
        sink: &UnboundedSender<Box<dyn LogicalPageDecoder>>
    ) -> Result<()>;
    fn schedule_take(
        &self,
        indices: &[u32],
        scheduler: &Arc<dyn EncodingsIo>,
        sink: &UnboundedSender<Box<dyn LogicalPageDecoder>>
    ) -> Result<()>;
    fn num_rows(&self) -> u32;
}
A scheduler for a field’s worth of data.
Each page of incoming data maps to one LogicalPageScheduler instance. However, this page may map to many pages transitively. For example, one page of struct data may cover many pages of primitive child data. In fact, the entire file is treated as one page of SimpleStruct data.
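To make the fan-out concrete, here is a minimal sketch, assuming a parent page whose rows are stored in consecutive child pages with known row counts. The helper `map_range_to_child_pages` is hypothetical (it is not part of lance_encoding): it maps a row range on the parent page to a page-local range on each child page that the range touches.

```rust
use std::ops::Range;

/// Hypothetical helper: given the row counts of consecutive child pages,
/// map a half-open range of parent rows onto (child page index, page-local
/// range) pairs. Child pages that the range does not touch are skipped.
fn map_range_to_child_pages(
    child_page_rows: &[u32],
    range: Range<u32>,
) -> Vec<(usize, Range<u32>)> {
    let mut out = Vec::new();
    // First row (in parent coordinates) of the current child page.
    let mut page_start = 0u32;
    for (page_idx, &rows) in child_page_rows.iter().enumerate() {
        let page_end = page_start + rows;
        // Intersect the requested range with this child page's rows.
        let start = range.start.max(page_start);
        let end = range.end.min(page_end);
        if start < end {
            // Translate to offsets relative to the start of the child page.
            out.push((page_idx, (start - page_start)..(end - page_start)));
        }
        // No later page can overlap once we've passed the end of the range.
        if page_end >= range.end {
            break;
        }
        page_start = page_end;
    }
    out
}
```

For child pages of 100, 50 and 200 rows, the parent range `80..170` touches all three: rows `80..100` of page 0, all of page 1, and rows `0..20` of page 2.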
The scheduler is responsible for calculating the necessary I/O. One schedule_ranges request could trigger multiple batches of I/O across multiple columns. The scheduler should emit decoders into the sink as quickly as possible.
As soon as a batch of data can be decoded, the scheduler should emit a decoder in the “unloaded” state. The decode stream will pull the decoder and start decoding.
The order in which decoders are emitted is important. Pages should be emitted in row-major order, allowing complete rows to be decoded as quickly as possible.
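The sketch below illustrates emitting decoders early and in row-major order. It is a toy under stated assumptions: `BatchingScheduler` and `UnloadedDecoder` are hypothetical, the sink is a `std::sync::mpsc::Sender` standing in for the real `UnboundedSender<Box<dyn LogicalPageDecoder>>`, and no actual I/O is scheduled. For each batch of rows it emits one decoder per column before moving on to the next batch, so complete rows become decodable as early as possible.

```rust
use std::ops::Range;
use std::sync::mpsc::{channel, Sender};

/// Hypothetical stand-in for a decoder in the "unloaded" state: it only
/// records which column and rows it will decode once its I/O completes.
#[derive(Debug, PartialEq)]
struct UnloadedDecoder {
    column: u32,
    rows: Range<u32>,
}

/// Hypothetical toy scheduler: splits each requested range into fixed-size
/// batches and, for every batch, emits one decoder per column.
#[derive(Debug)]
struct BatchingScheduler {
    num_rows: u32,
    num_columns: u32,
    batch_size: u32,
}

impl BatchingScheduler {
    fn schedule_ranges(
        &self,
        ranges: &[Range<u32>],
        sink: &Sender<UnloadedDecoder>,
    ) -> Result<(), String> {
        if self.batch_size == 0 {
            return Err("batch_size must be non-zero".to_string());
        }
        for range in ranges {
            if range.end > self.num_rows {
                return Err(format!("range {:?} exceeds {} rows", range, self.num_rows));
            }
            let mut start = range.start;
            while start < range.end {
                let end = start.saturating_add(self.batch_size).min(range.end);
                // Row-major: every column for this batch is emitted before
                // any column of the next batch.
                for column in 0..self.num_columns {
                    sink.send(UnloadedDecoder { column, rows: start..end })
                        .map_err(|_| "decode stream was dropped".to_string())?;
                }
                start = end;
            }
        }
        Ok(())
    }
}
```

With two columns and a batch size of 10, the range `0..15` yields decoders for column 0 rows `0..10`, column 1 rows `0..10`, column 0 rows `10..15`, then column 1 rows `10..15`.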
The LogicalPageScheduler should be stateless and Send and Sync. This is because it might need to be shared. For example, a list page has a reference to the page schedulers for its items column. This is shared with the follow-up I/O task created when the offsets are loaded.
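A minimal sketch of why the `Send + Sync` bounds matter, assuming a simplified trait `PageScheduler` and types `ListScheduler` and `ItemsScheduler` that are hypothetical stand-ins for the real list and items schedulers. The list scheduler holds its items scheduler behind an `Arc`, and the follow-up task (a plain thread here, simulating the task that runs once offsets are loaded) receives its own clone of that `Arc`. Without `Send + Sync` on the trait, `thread::spawn` would reject the closure.

```rust
use std::fmt::Debug;
use std::ops::Range;
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread;

/// Hypothetical simplified scheduler trait with the same marker bounds.
trait PageScheduler: Send + Sync + Debug {
    fn schedule_ranges(&self, ranges: &[Range<u32>], sink: &Sender<Range<u32>>);
}

/// Stateless scheduler for the items column: it just forwards the ranges.
#[derive(Debug)]
struct ItemsScheduler;

impl PageScheduler for ItemsScheduler {
    fn schedule_ranges(&self, ranges: &[Range<u32>], sink: &Sender<Range<u32>>) {
        for r in ranges {
            // A send error only means the receiver is gone; nothing to do.
            let _ = sink.send(r.clone());
        }
    }
}

/// List scheduler sharing a reference to its items scheduler.
#[derive(Debug)]
struct ListScheduler {
    items: Arc<dyn PageScheduler>,
}

impl ListScheduler {
    /// `offsets` simulates already-loaded offsets: list i covers items
    /// offsets[i]..offsets[i + 1]. The follow-up task runs on another
    /// thread, so it takes its own clone of the shared items scheduler.
    fn schedule_lists(
        &self,
        lists: Range<usize>,
        offsets: Vec<u32>,
        sink: Sender<Range<u32>>,
    ) -> thread::JoinHandle<()> {
        let items = Arc::clone(&self.items);
        thread::spawn(move || {
            let item_range = offsets[lists.start]..offsets[lists.end];
            items.schedule_ranges(&[item_range], &sink);
        })
    }
}
```

With offsets `[0, 2, 5, 9]`, scheduling lists `1..3` asks the items scheduler for items `2..9` from the spawned thread.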
See crate::decoder for more information.
Required Methods
fn schedule_ranges(
    &self,
    ranges: &[Range<u32>],
    scheduler: &Arc<dyn EncodingsIo>,
    sink: &UnboundedSender<Box<dyn LogicalPageDecoder>>
) -> Result<()>
Schedules I/O for the requested portions of the page.
Note: ranges must be ordered and non-overlapping.
TODO: Support unordered or overlapping ranges in the file scheduler.
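Callers must uphold this precondition themselves. A hypothetical helper for checking it (for example inside a `debug_assert!` before calling `schedule_ranges`) could look like the following; it is a sketch, not part of lance_encoding. Adjacent ranges such as `0..5` and `5..8` count as non-overlapping because the ranges are half-open.

```rust
use std::ops::Range;

/// Hypothetical check: every range is well-formed (start <= end), and each
/// range ends at or before the start of the next one. Together these imply
/// the ranges are sorted and pairwise disjoint.
fn ranges_are_ordered_and_disjoint(ranges: &[Range<u32>]) -> bool {
    ranges.iter().all(|r| r.start <= r.end)
        && ranges.windows(2).all(|w| w[0].end <= w[1].start)
}
```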
fn schedule_take(
    &self,
    indices: &[u32],
    scheduler: &Arc<dyn EncodingsIo>,
    sink: &UnboundedSender<Box<dyn LogicalPageDecoder>>
) -> Result<()>
Schedules I/O for the requested rows (identified by row offsets from the start of the page).
TODO: implement this using schedule_ranges.
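One possible way to address the TODO is sketched below, under the assumption that `indices` is sorted ascending and free of duplicates: coalesce runs of consecutive row offsets into half-open ranges. The result is ordered and non-overlapping, as `schedule_ranges` requires. `indices_to_ranges` is a hypothetical helper, not part of lance_encoding.

```rust
use std::ops::Range;

/// Hypothetical helper: converts sorted, de-duplicated row offsets into the
/// minimal list of contiguous half-open ranges.
fn indices_to_ranges(indices: &[u32]) -> Vec<Range<u32>> {
    let mut ranges: Vec<Range<u32>> = Vec::new();
    for &idx in indices {
        if let Some(last) = ranges.last_mut() {
            // The index directly follows the current run: extend it.
            if last.end == idx {
                last.end += 1;
                continue;
            }
        }
        // Otherwise start a new single-row range.
        ranges.push(idx..idx + 1);
    }
    ranges
}
```

For example, the offsets `[1, 2, 3, 7, 9, 10]` become `[1..4, 7..8, 9..11]`, which could then be passed to `schedule_ranges`. Unsorted or duplicated indices would need to be sorted and de-duplicated first (and the output reordered afterwards), which this sketch does not attempt.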