Module encoder

The top-level encoding module for Lance files.

Lance files are encoded using a FieldEncodingStrategy, which chooses the encoder to use for each field.
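
As a rough illustration of a per-field strategy, the sketch below walks a field's type and returns a matching tree of encoder choices, recursing into struct children and list items. `DataKind`, `EncoderChoice`, `EncodingStrategy`, and `SimpleStrategy` are hypothetical names invented for this example. They do not reflect the signatures of the crate's `FieldEncodingStrategy` trait, which builds real encoders rather than descriptions of them.

```rust
// Hypothetical, simplified model: these types are NOT the lance_encoding API.
#[derive(Debug, Clone, PartialEq)]
enum DataKind {
    Int32,
    Utf8,
    Struct(Vec<DataKind>),
    List(Box<DataKind>),
}

/// A description of the encoder tree chosen for one field.
#[derive(Debug, PartialEq)]
enum EncoderChoice {
    Primitive,
    Struct(Vec<EncoderChoice>),
    List(Box<EncoderChoice>),
}

/// A strategy maps a field's type to an encoder (here, a description of one).
trait EncodingStrategy {
    fn choose(&self, kind: &DataKind) -> EncoderChoice;
}

struct SimpleStrategy;

impl EncodingStrategy for SimpleStrategy {
    fn choose(&self, kind: &DataKind) -> EncoderChoice {
        match kind {
            // Leaf types get a primitive encoder.
            DataKind::Int32 | DataKind::Utf8 => EncoderChoice::Primitive,
            // Nested types get a node whose children are chosen recursively.
            DataKind::Struct(children) => {
                EncoderChoice::Struct(children.iter().map(|c| self.choose(c)).collect())
            }
            DataKind::List(item) => EncoderChoice::List(Box::new(self.choose(item))),
        }
    }
}

fn main() {
    // A list<struct<int32, utf8>> field.
    let field = DataKind::List(Box::new(DataKind::Struct(vec![
        DataKind::Int32,
        DataKind::Utf8,
    ])));
    let choice = SimpleStrategy.choose(&field);
    println!("{:?}", choice); // prints List(Struct([Primitive, Primitive]))
}
```

The recursion is what turns one field into a tree of encoders: only the leaves are primitive, and each nested type contributes an inner node.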

The current strategy is the StructuralEncodingStrategy, which uses “structural” encoding. A tree of encoders is built up for each field. The struct and list encoders simply strip off the validity and offsets and collect them. The primitive leaf encoder then accumulates the validity, offsets, and values in an accumulation buffer. Once enough data has been collected, the primitive encoder uses either a miniblock encoding or a full-zip encoding to create a page of data from the accumulation buffer.
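
The accumulate-then-flush behavior of a primitive leaf encoder can be sketched as below. This is a simplified model under stated assumptions: it tracks only leaf-level validity and value bytes (no list offsets), emits a page once a hypothetical byte threshold is reached, and picks miniblock or full zip by comparing the average value size against a second hypothetical threshold, on the assumption that small values suit miniblock and large values suit full zip. `LeafAccumulator`, `Page`, and `PageLayout` are illustrative names only; the crate's actual selection logic may weigh other factors.

```rust
// Hypothetical sketch, NOT the lance_encoding API: names, thresholds, and the
// layout rule are illustrative assumptions.
#[derive(Debug, PartialEq)]
enum PageLayout {
    MiniBlock,
    FullZip,
}

#[derive(Debug)]
struct Page {
    layout: PageLayout,
    num_values: usize,
    validity: Vec<bool>,  // true = value present, false = null
    values: Vec<Vec<u8>>, // raw bytes of each value (empty for nulls)
}

struct LeafAccumulator {
    validity: Vec<bool>,
    values: Vec<Vec<u8>>,
    buffered_bytes: usize,
    flush_threshold_bytes: usize, // assumed: emit a page once this many value bytes are buffered
    large_value_bytes: usize,     // assumed: average value size at or above this selects full zip
}

impl LeafAccumulator {
    fn new(flush_threshold_bytes: usize, large_value_bytes: usize) -> Self {
        Self {
            validity: Vec::new(),
            values: Vec::new(),
            buffered_bytes: 0,
            flush_threshold_bytes,
            large_value_bytes,
        }
    }

    /// Buffers one value (`None` is a null) and returns a page if the threshold is reached.
    fn push(&mut self, value: Option<&[u8]>) -> Option<Page> {
        self.validity.push(value.is_some());
        let bytes = value.map(<[u8]>::to_vec).unwrap_or_default();
        self.buffered_bytes += bytes.len();
        self.values.push(bytes);
        if self.buffered_bytes >= self.flush_threshold_bytes {
            Some(self.flush())
        } else {
            None
        }
    }

    /// Drains the accumulation buffer into a page, choosing a layout from the average value size.
    fn flush(&mut self) -> Page {
        let num_values = self.values.len();
        let avg = if num_values == 0 { 0 } else { self.buffered_bytes / num_values };
        let layout = if avg >= self.large_value_bytes {
            PageLayout::FullZip
        } else {
            PageLayout::MiniBlock
        };
        self.buffered_bytes = 0;
        Page {
            layout,
            num_values,
            validity: std::mem::take(&mut self.validity),
            values: std::mem::take(&mut self.values),
        }
    }
}

fn main() {
    // Flush after 64 buffered bytes; averages of 16+ bytes per value count as "large".
    let mut acc = LeafAccumulator::new(64, 16);
    let mut pages = Vec::new();
    // Sixteen 4-byte values fill the 64-byte threshold.
    for i in 0u32..16 {
        if let Some(page) = acc.push(Some(&i.to_le_bytes()[..])) {
            pages.push(page);
        }
    }
    // One null followed by two 32-byte values.
    let big = [7u8; 32];
    for value in [None, Some(&big[..]), Some(&big[..])] {
        if let Some(page) = acc.push(value) {
            pages.push(page);
        }
    }
    for page in &pages {
        let nulls = page.validity.iter().filter(|v| !**v).count();
        println!("{:?} page: {} values, {} null, {} buffered", page.layout, page.num_values, nulls, page.values.len());
    }
}
```

Here the first page holds sixteen small values and is laid out as miniblock, while the second page averages over 16 bytes per value and is laid out as full zip.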

Structs

BatchEncoder
A batch encoder that encodes RecordBatch objects by delegating to field encoders for each top-level field in the batch.
ColumnIndexSequence
Keeps track of the current column index and makes a mapping from field id to column index
EncodedBatch
An encoded batch of data and a page table describing it
EncodedColumn
EncodedPage
An encoded page of data
EncodingOptions
Options that control the encoding process
OutOfLineBuffers
A tool to reserve space for buffers that are not in-line with the data
StructuralEncodingStrategy
An encoding strategy used for 2.1+ files
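
The bookkeeping that ColumnIndexSequence describes (a running column index plus a field id to column index mapping) can be sketched as follows. `ColumnIndexTracker` and its methods are hypothetical names for illustration; this is not the crate's implementation, and using a `HashMap` for the mapping is an assumption of the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical column-index tracker (not the crate's ColumnIndexSequence):
/// hands out column indices in order and records field id -> column index.
#[derive(Debug, Default)]
struct ColumnIndexTracker {
    current_index: u32,
    field_to_column: HashMap<u32, u32>,
}

impl ColumnIndexTracker {
    /// Assigns the next column index to `field_id` and remembers the mapping.
    fn next_column_index(&mut self, field_id: u32) -> u32 {
        let idx = self.current_index;
        self.current_index += 1;
        self.field_to_column.insert(field_id, idx);
        idx
    }

    /// Looks up the column previously assigned to `field_id`, if any.
    fn column_for(&self, field_id: u32) -> Option<u32> {
        self.field_to_column.get(&field_id).copied()
    }
}

fn main() {
    let mut columns = ColumnIndexTracker::default();
    // Field ids need not be contiguous; columns are numbered in request order.
    for field_id in [3, 7, 12] {
        let col = columns.next_column_index(field_id);
        println!("field {field_id} -> column {col}");
    }
    assert_eq!(columns.column_for(7), Some(1));
    assert_eq!(columns.column_for(5), None);
}
```

Keeping the counter and the mapping in one place means each encoder can claim a column as it is created, and the writer can later recover which column belongs to which field.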

Constants

MIN_PAGE_BUFFER_ALIGNMENT
The minimum alignment for a page buffer. Writers must respect this.
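
One way a writer can respect a minimum buffer alignment is to pad with zero bytes before each buffer so that every buffer starts at a multiple of the alignment. The sketch below assumes that reading of the requirement and uses a hypothetical `ALIGNMENT` of 8; the real value is whatever MIN_PAGE_BUFFER_ALIGNMENT is set to. `padding_for` and `write_aligned` are illustrative helpers, not crate functions.

```rust
/// Assumed alignment for this sketch; consult MIN_PAGE_BUFFER_ALIGNMENT for the real value.
const ALIGNMENT: u64 = 8;

/// Number of zero bytes needed to advance `pos` to the next multiple of `alignment`.
fn padding_for(pos: u64, alignment: u64) -> u64 {
    debug_assert!(alignment.is_power_of_two());
    (alignment - (pos % alignment)) % alignment
}

/// Appends each buffer to `out`, padding first so every buffer starts at an
/// aligned offset. Returns the (offset, length) of each written buffer.
fn write_aligned(out: &mut Vec<u8>, buffers: &[&[u8]], alignment: u64) -> Vec<(u64, u64)> {
    let mut positions = Vec::with_capacity(buffers.len());
    for buf in buffers {
        let pad = padding_for(out.len() as u64, alignment);
        out.resize(out.len() + pad as usize, 0);
        positions.push((out.len() as u64, buf.len() as u64));
        out.extend_from_slice(buf);
    }
    positions
}

fn main() {
    let mut out = vec![0xAAu8; 3]; // pretend 3 bytes were already written
    let a = [1u8; 5];
    let b = [2u8; 8];
    let positions = write_aligned(&mut out, &[&a[..], &b[..]], ALIGNMENT);
    println!("{positions:?}, total {} bytes", out.len()); // prints [(8, 5), (16, 8)], total 24 bytes
}
```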

Traits

FieldEncoder
Top-level encoding trait to encode any Arrow array type into one or more pages.
FieldEncodingStrategy
A trait to pick which kind of field encoding to use for a field

Functions

default_encoding_strategy
default_encoding_strategy_with_params
Create an encoding strategy with user-configured compression parameters
encode_batch
Helper method to encode a batch of data into memory

Type Aliases

EncodeTask
A task to create a page of data