Trait lance_encoding::encoder::FieldEncoder
pub trait FieldEncoder: Send {
    // Required methods
    fn maybe_encode(&mut self, array: ArrayRef) -> Result<Vec<EncodeTask>>;
    fn flush(&mut self) -> Result<Vec<EncodeTask>>;
    fn finish(&mut self) -> BoxFuture<'_, Result<Vec<EncodedColumn>>>;
    fn num_columns(&self) -> u32;
}
Top-level encoding trait for encoding any Arrow array type into one or more pages.
A field encoder buffers and encodes a single input column, which may map to multiple output columns. For example, a list array or struct array will be encoded into multiple columns.
Fields may also be encoded at different speeds. For example, given a struct column with three fields (a boolean field, an int32 field, and a 4096-dimension tensor field), the tensor field is likely to emit encoded pages much more frequently than the boolean field.
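The difference in page cadence can be illustrated with a small std-only sketch. It does not use lance_encoding: `ChildBuffer` and `pages_per_child` are hypothetical names, and the one-byte boolean width, the `f32` tensor elements, and the page size passed in are assumptions chosen only for illustration.

```rust
/// Hypothetical per-child buffer (not a lance_encoding type).
struct ChildBuffer {
    name: &'static str,
    bytes_per_row: usize,
    buffered_bytes: usize,
    pages_emitted: usize,
}

impl ChildBuffer {
    fn new(name: &'static str, bytes_per_row: usize) -> Self {
        ChildBuffer { name, bytes_per_row, buffered_bytes: 0, pages_emitted: 0 }
    }

    /// Buffer `rows` rows, emitting one page each time `page_size` bytes accumulate.
    fn push_rows(&mut self, rows: usize, page_size: usize) {
        self.buffered_bytes += rows * self.bytes_per_row;
        while self.buffered_bytes >= page_size {
            self.buffered_bytes -= page_size;
            self.pages_emitted += 1;
        }
    }
}

/// Feed the same number of rows to each child of the example struct column
/// and report how many pages each child emitted.
fn pages_per_child(rows: usize, page_size: usize) -> Vec<(&'static str, usize)> {
    assert!(page_size > 0, "page_size must be non-zero");
    let mut children = vec![
        ChildBuffer::new("boolean", 1),       // assumed: 1 byte per value
        ChildBuffer::new("int32", 4),         // 4 bytes per value
        ChildBuffer::new("tensor", 4096 * 4), // assumed: 4096 f32 values per row
    ];
    for child in children.iter_mut() {
        child.push_rows(rows, page_size);
    }
    children.iter().map(|c| (c.name, c.pages_emitted)).collect()
}
```

Under these assumptions, `pages_per_child(1024, 1 << 20)` reports 16 pages for the tensor child and none for the other two, which are still buffering.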
Required Methods
fn maybe_encode(&mut self, array: ArrayRef) -> Result<Vec<EncodeTask>>
Buffer the data and, if there is enough data in the buffer to form a page, return an encoding task to encode the data.
This may return more than one task because a single column may be mapped to multiple output columns. For example, if encoding a struct column with three children then up to three tasks may be returned from each call to maybe_encode.
It may also return multiple tasks for a single column if the input array is larger than a single disk page.
It may also return an empty Vec if there is not yet enough data to encode any pages.
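The buffering behavior described above (no tasks, one task, or several tasks per call) might look roughly like the following sketch. It is not the lance_encoding implementation: `PageTask` and `BufferingEncoder` are hypothetical stand-ins, the input is a plain `&[i32]` rather than an `ArrayRef`, and pages are cut by value count rather than by bytes.

```rust
/// Hypothetical stand-in for an encode task: the values destined for one page.
#[derive(Debug, PartialEq)]
struct PageTask {
    column_index: u32,
    values: Vec<i32>,
}

/// Hypothetical single-column encoder that cuts a page every `page_len` values.
struct BufferingEncoder {
    column_index: u32,
    page_len: usize,
    buffer: Vec<i32>,
}

impl BufferingEncoder {
    fn new(column_index: u32, page_len: usize) -> Self {
        assert!(page_len > 0, "page_len must be non-zero");
        BufferingEncoder { column_index, page_len, buffer: Vec::new() }
    }

    /// Buffer `values` and return one task per full page now available.
    /// Returns an empty Vec while the buffer holds less than one page.
    fn maybe_encode(&mut self, values: &[i32]) -> Vec<PageTask> {
        self.buffer.extend_from_slice(values);
        let mut tasks = Vec::new();
        while self.buffer.len() >= self.page_len {
            // Keep the first `page_len` values as the page; the rest stays buffered.
            let rest = self.buffer.split_off(self.page_len);
            let page = std::mem::replace(&mut self.buffer, rest);
            tasks.push(PageTask { column_index: self.column_index, values: page });
        }
        tasks
    }
}
```

With a page length of 4, a first call with three values returns no tasks, and a second call with six more values returns two tasks and leaves one value buffered.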
fn flush(&mut self) -> Result<Vec<EncodeTask>>
Flush any remaining data from the buffers into encoding tasks.
This may be called intermittently throughout encoding, but it will always be called once at the end of encoding, just before finish is called.
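As a rough illustration of why intermittent flushes are harmless, the sketch below drains whatever is buffered into a task and returns nothing when the buffer is empty. `PageTask` and `PartialPageBuffer` are hypothetical names, not lance_encoding types, and the real trait method returns a `Result`.

```rust
/// Hypothetical stand-in for an encode task.
#[derive(Debug, PartialEq)]
struct PageTask {
    values: Vec<i32>,
}

/// Hypothetical buffer holding values that do not yet fill a page.
struct PartialPageBuffer {
    buffer: Vec<i32>,
}

impl PartialPageBuffer {
    fn new() -> Self {
        PartialPageBuffer { buffer: Vec::new() }
    }

    /// Append values without cutting any pages (page cutting is omitted here).
    fn push(&mut self, values: &[i32]) {
        self.buffer.extend_from_slice(values);
    }

    /// Turn any buffered values into a task; an empty buffer yields no tasks.
    fn flush(&mut self) -> Vec<PageTask> {
        if self.buffer.is_empty() {
            return Vec::new();
        }
        // `take` leaves an empty Vec behind, so a repeated flush is a no-op.
        vec![PageTask { values: std::mem::take(&mut self.buffer) }]
    }
}
```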
fn finish(&mut self) -> BoxFuture<'_, Result<Vec<EncodedColumn>>>
Finish encoding and return column metadata.
This is called only once, after all encode tasks have completed.
This returns a Vec because a single field may have created multiple columns.
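The shape of this return value can be sketched with the standard library alone. The `BoxFuture` alias below mirrors the one from the `futures` crate; `ColumnSummary`, `CountingEncoder`, and `poll_once` are hypothetical, the error type is simplified to `String`, and the future resolves immediately, whereas a real implementation may await other work.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Std-only equivalent of the `BoxFuture` alias from the `futures` crate.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical per-column summary; the real `EncodedColumn` differs.
#[derive(Debug, PartialEq)]
struct ColumnSummary {
    column_index: u32,
    pages_written: u64,
}

/// Hypothetical encoder tracking how many pages each output column produced.
struct CountingEncoder {
    pages_per_column: Vec<u64>,
}

impl CountingEncoder {
    /// Return one summary per output column, wrapped in an already-ready future.
    fn finish(&mut self) -> BoxFuture<'_, Result<Vec<ColumnSummary>, String>> {
        let summaries: Vec<ColumnSummary> = self
            .pages_per_column
            .iter()
            .enumerate()
            .map(|(index, &pages_written)| ColumnSummary {
                column_index: index as u32,
                pages_written,
            })
            .collect();
        let result: Result<Vec<ColumnSummary>, String> = Ok(summaries);
        Box::pin(std::future::ready(result))
    }
}

/// Poll a future once with a no-op waker; enough for futures that are
/// immediately ready. Returns None if the future is still pending.
fn poll_once<T>(mut fut: BoxFuture<'_, T>) -> Option<T> {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

In real code the future would be driven by an async runtime rather than polled by hand; `poll_once` only keeps the sketch free of external dependencies.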
fn num_columns(&self) -> u32
The number of output columns this encoding will create.
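For nested fields, the count could be computed recursively, as in the sketch below. `FieldShape` is a hypothetical type, and the rule that each leaf maps to exactly one column while a struct adds none of its own is an assumption; the actual encoders may allocate columns differently.

```rust
/// Hypothetical field shape (not a lance_encoding type).
enum FieldShape {
    Leaf,
    Struct(Vec<FieldShape>),
}

impl FieldShape {
    /// Assumption: each leaf maps to one output column and a struct
    /// contributes only the columns of its children.
    fn num_columns(&self) -> u32 {
        match self {
            FieldShape::Leaf => 1,
            FieldShape::Struct(children) => {
                children.iter().map(|child| child.num_columns()).sum()
            }
        }
    }
}
```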