Struct LiquidByteViewArray

pub struct LiquidByteViewArray<B: FsstBacking> { /* private fields */ }

An array that stores strings (or binary values) using the FSST format with compact offsets:

  • Dictionary keys (2 bytes each) stored in memory
  • Compact offsets with variable-size residuals (1, 2, or 4 bytes) stored in memory
  • Per-value prefix keys (a 7-byte prefix plus the value length) stored in memory
  • FSST buffer can be stored in memory or on disk
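The first and third bullets can be made concrete with a small std-only sketch. It dictionary-encodes values into 2-byte keys and derives a prefix key for each distinct value. This is not this crate's implementation. The following are all assumptions made for illustration:

- the names `dictionary_encode` and `PrefixKey`;
- zero-padding short prefixes;
- saturating the length byte at 255;
- computing one prefix key per distinct value.

```rust
use std::collections::HashMap;

/// Hypothetical prefix key: the first 7 bytes of a value (zero-padded when
/// shorter) plus a one-byte length that saturates at 255 (an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PrefixKey {
    prefix: [u8; 7],
    len: u8,
}

impl PrefixKey {
    fn new(value: &[u8]) -> Self {
        let mut prefix = [0u8; 7];
        let n = value.len().min(7);
        prefix[..n].copy_from_slice(&value[..n]);
        // Lengths of 255 or more are stored as 255, meaning "255 or longer".
        let len = value.len().min(u8::MAX as usize) as u8;
        PrefixKey { prefix, len }
    }
}

/// Dictionary-encodes `values` into one u16 key per row plus the distinct
/// values in order of first appearance.
/// Panics if there are more distinct values than a u16 key can address.
fn dictionary_encode(values: &[&[u8]]) -> (Vec<u16>, Vec<Vec<u8>>) {
    let mut index: HashMap<&[u8], u16> = HashMap::new();
    let mut dict: Vec<Vec<u8>> = Vec::new();
    let mut keys = Vec::with_capacity(values.len());
    for &v in values {
        let key = match index.get(v) {
            Some(&k) => k,
            None => {
                let k = u16::try_from(dict.len()).expect("too many distinct values for u16 keys");
                index.insert(v, k);
                dict.push(v.to_vec());
                k
            }
        };
        keys.push(key);
    }
    (keys, dict)
}

fn main() {
    let values: [&[u8]; 4] = [b"apple", b"banana", b"apple", b"cherry"];
    let (keys, dict) = dictionary_encode(&values);
    // One prefix key per distinct value (an assumption about granularity).
    let prefix_keys: Vec<PrefixKey> = dict.iter().map(|v| PrefixKey::new(v)).collect();
    println!("keys = {:?}", keys);
    println!("{} distinct values, first prefix key = {:?}", dict.len(), prefix_keys[0]);
}
```

Repeated values share a key, so each distinct value's bytes are stored only once.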

§Initialization

The recommended way to create a LiquidByteViewArray is to use the from_*_array constructors, which build the compact (offset + prefix key) representation directly from Arrow inputs:

let liquid_array = LiquidByteViewArray::from_string_array(&input, compressor);

Data access flow:

  1. Use dictionary key to index into compact offsets buffer
  2. Reconstruct the actual offset as a linear-regression prediction plus the stored residual
  3. Use prefix keys for quick comparisons to avoid decompression when possible
  4. Decompress bytes from FSST buffer to get the full value when needed
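Steps 1 and 2 can be sketched as follows:

- fit a line to a sequence of offsets;
- keep only each offset's residual from that line;
- store all residuals at the narrowest width (1, 2, or 4 bytes) that holds them.

The following are assumptions for illustration, and the crate's actual fitting and storage may differ:

- the least-squares fit and the rounding rule;
- the names `CompactOffsets` and `Residuals`.

Here the dictionary key from step 1 is simply the index `i` passed to `get`.

```rust
/// Residuals stored at the narrowest signed width that holds all of them.
#[derive(Debug)]
enum Residuals {
    W1(Vec<i8>),
    W2(Vec<i16>),
    W4(Vec<i32>),
}

/// Hypothetical compact offsets: a fitted line plus per-entry residuals.
#[derive(Debug)]
struct CompactOffsets {
    slope: f64,
    intercept: f64,
    residuals: Residuals,
}

impl CompactOffsets {
    /// Predicted offset for index `i`, rounded to the nearest integer.
    fn predict(&self, i: usize) -> i64 {
        (self.slope * i as f64 + self.intercept).round() as i64
    }

    fn encode(offsets: &[u32]) -> Self {
        let n = offsets.len();
        // Ordinary least-squares fit of offset against index.
        let (slope, intercept) = if n < 2 {
            (0.0, offsets.first().map_or(0.0, |&y| y as f64))
        } else {
            let mean_x = (n - 1) as f64 / 2.0;
            let mean_y = offsets.iter().map(|&y| y as f64).sum::<f64>() / n as f64;
            let (mut cov, mut var) = (0.0, 0.0);
            for (i, &y) in offsets.iter().enumerate() {
                let dx = i as f64 - mean_x;
                cov += dx * (y as f64 - mean_y);
                var += dx * dx;
            }
            let slope = cov / var;
            (slope, mean_y - slope * mean_x)
        };
        let mut out = CompactOffsets { slope, intercept, residuals: Residuals::W1(Vec::new()) };
        // Residual = actual offset minus the line's prediction.
        let raw: Vec<i64> = offsets
            .iter()
            .enumerate()
            .map(|(i, &y)| y as i64 - out.predict(i))
            .collect();
        let max_abs = raw.iter().map(|r| r.unsigned_abs()).max().unwrap_or(0);
        out.residuals = if max_abs <= i8::MAX as u64 {
            Residuals::W1(raw.iter().map(|&r| r as i8).collect())
        } else if max_abs <= i16::MAX as u64 {
            Residuals::W2(raw.iter().map(|&r| r as i16).collect())
        } else {
            Residuals::W4(raw.iter().map(|&r| i32::try_from(r).expect("residual exceeds 4 bytes")).collect())
        };
        out
    }

    /// Bytes used per stored residual.
    fn residual_width(&self) -> usize {
        match &self.residuals {
            Residuals::W1(_) => 1,
            Residuals::W2(_) => 2,
            Residuals::W4(_) => 4,
        }
    }

    /// Reconstructs offset `i` as prediction + residual.
    fn get(&self, i: usize) -> u32 {
        let r = match &self.residuals {
            Residuals::W1(v) => v[i] as i64,
            Residuals::W2(v) => v[i] as i64,
            Residuals::W4(v) => v[i] as i64,
        };
        u32::try_from(self.predict(i) + r).expect("reconstructed offset out of range")
    }
}

fn main() {
    let offsets = [0u32, 5, 10, 16, 20, 25];
    let compact = CompactOffsets::encode(&offsets);
    let restored: Vec<u32> = (0..offsets.len()).map(|i| compact.get(i)).collect();
    println!("residual width = {} byte(s), restored = {:?}", compact.residual_width(), restored);
}
```

Nearly linear offsets leave tiny residuals that fit in 1 byte. A large outlier forces a wider residual type for every entry.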

Implementations

impl LiquidByteViewArray<FsstArray>

pub fn compare_with(&self, needle: &[u8], op: &ByteViewOperator) -> BooleanArray

Compare every value with needle under op. Prefix keys are used to decide rows cheaply, falling back to Arrow operations when the prefix alone is not enough.

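The prefix optimization can be sketched in isolation. A 7-byte prefix plus a length often settles a comparison on its own, and the full value is decompressed only when it cannot. This is a simplified, std-only illustration, not the crate's logic. The following are all assumptions:

- the names `PrefixKey`, `prefix_cmp`, and `evaluate`;
- the `Op` enum, a stand-in for `ByteViewOperator`, whose variants are not listed on this page;
- the length byte saturating at 255.

```rust
use std::cell::Cell;
use std::cmp::Ordering;

/// Hypothetical prefix key: first 7 bytes (zero-padded) plus a length byte
/// that saturates at 255.
#[derive(Debug, Clone, Copy)]
struct PrefixKey {
    prefix: [u8; 7],
    len: u8,
}

impl PrefixKey {
    fn new(value: &[u8]) -> Self {
        let mut prefix = [0u8; 7];
        let n = value.len().min(7);
        prefix[..n].copy_from_slice(&value[..n]);
        PrefixKey { prefix, len: value.len().min(255) as u8 }
    }

    /// The bytes of the value that the key actually holds.
    fn known(&self) -> &[u8] {
        &self.prefix[..(self.len as usize).min(7)]
    }
}

/// Hypothetical stand-in for the comparison operator.
#[derive(Debug, Clone, Copy)]
enum Op {
    Eq,
    Lt,
}

/// Lexicographically compares the value behind `key` with `needle`, or
/// returns None when the prefix alone cannot decide.
fn prefix_cmp(key: &PrefixKey, needle: &[u8]) -> Option<Ordering> {
    let known = key.known();
    if key.len as usize <= 7 {
        // The prefix holds the whole value, so the comparison is exact.
        return Some(Ord::cmp(known, needle));
    }
    // The value is longer than 7 bytes; compare the bytes both sides have.
    let k = known.len().min(needle.len());
    match Ord::cmp(&known[..k], &needle[..k]) {
        // The needle is a prefix of the longer value, so the value is greater.
        Ordering::Equal if needle.len() <= 7 => Some(Ordering::Greater),
        // Both are longer than 7 bytes and share the first 7: undecided.
        Ordering::Equal => None,
        decided => Some(decided),
    }
}

/// Evaluates `op` for one value, calling `load_full` (standing in for FSST
/// decompression) only when the prefix key is not enough.
fn evaluate(key: &PrefixKey, needle: &[u8], op: Op, load_full: impl FnOnce() -> Vec<u8>) -> bool {
    let ord = match prefix_cmp(key, needle) {
        Some(ord) => ord,
        None => {
            // For equality, a differing (non-saturated) length still settles it.
            if matches!(op, Op::Eq) && key.len < u8::MAX && key.len as usize != needle.len() {
                return false;
            }
            Ord::cmp(load_full().as_slice(), needle)
        }
    };
    match op {
        Op::Eq => ord == Ordering::Equal,
        Op::Lt => ord == Ordering::Less,
    }
}

fn main() {
    let value: &[u8] = b"hello world";
    let key = PrefixKey::new(value);
    let loads = Cell::new(0);
    let load = || {
        loads.set(loads.get() + 1);
        value.to_vec()
    };
    println!("== \"hello\": {}", evaluate(&key, b"hello", Op::Eq, load));
    println!("< \"zzz\": {}", evaluate(&key, b"zzz", Op::Lt, load));
    println!("== \"hello world\": {}", evaluate(&key, b"hello world", Op::Eq, load));
    println!("full values loaded: {}", loads.get());
}
```

Only the last comparison needs the full value. The needle and the value are both longer than 7 bytes, share their first 7 bytes, and have the same length.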
impl<B: FsstBacking> LiquidByteViewArray<B>

pub fn prefix_compare_counts(&self, needle: &[u8], op: &Comparison) -> (usize, usize, usize)

Return (selected_rows, ambiguous_rows, unique_rows) based on a prefix-only comparison.

impl<B: FsstBacking> LiquidByteViewArray<B>

pub fn from_string_view_array(array: &StringViewArray, compressor: Arc<Compressor>) -> LiquidByteViewArray<FsstArray>

Create a LiquidByteViewArray from an Arrow StringViewArray

pub fn from_binary_view_array(array: &BinaryViewArray, compressor: Arc<Compressor>) -> LiquidByteViewArray<FsstArray>

Create a LiquidByteViewArray from an Arrow BinaryViewArray

pub fn from_string_array(array: &StringArray, compressor: Arc<Compressor>) -> LiquidByteViewArray<FsstArray>

Create a LiquidByteViewArray from an Arrow StringArray

pub fn from_binary_array(array: &BinaryArray, compressor: Arc<Compressor>) -> LiquidByteViewArray<FsstArray>

Create a LiquidByteViewArray from an Arrow BinaryArray

pub fn train_from_string_view(array: &StringViewArray) -> (Arc<Compressor>, LiquidByteViewArray<FsstArray>)

Train a compressor on an Arrow StringViewArray and return it together with the resulting LiquidByteViewArray

pub fn train_from_binary_view(array: &BinaryViewArray) -> (Arc<Compressor>, LiquidByteViewArray<FsstArray>)

Train a compressor on an Arrow BinaryViewArray and return it together with the resulting LiquidByteViewArray

pub fn train_from_arrow<T: ByteArrayType>(array: &GenericByteArray<T>) -> (Arc<Compressor>, LiquidByteViewArray<FsstArray>)

Train a compressor on an Arrow GenericByteArray and return it together with the resulting LiquidByteViewArray.

pub unsafe fn from_unique_dict_array(array: &DictionaryArray<UInt16Type>, compressor: Arc<Compressor>) -> LiquidByteViewArray<FsstArray>

Intended only for dictionaries produced by a trusted Parquet reader from a trusted Parquet file written by a trusted writer.

§Safety

The caller must ensure that the values in the dictionary are unique.
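The constructor relies on this precondition rather than checking it. A caller that cannot fully trust its input might therefore verify uniqueness first.

The helper below is a hypothetical, std-only sketch that works on the dictionary's values as byte slices. It is not part of this crate's API, and extracting those values from a DictionaryArray is left out.

```rust
use std::collections::HashSet;

/// Hypothetical check: returns true if no value appears more than once.
/// Runs in expected O(n) time using a hash set of borrowed slices.
fn dictionary_values_are_unique<'a, I>(values: I) -> bool
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut seen = HashSet::new();
    // `insert` returns false on the first duplicate, which also stops `all` early.
    values.into_iter().all(|v| seen.insert(v))
}

fn main() {
    let unique: [&[u8]; 3] = [b"a", b"b", b"c"];
    let duplicated: [&[u8]; 3] = [b"a", b"b", b"a"];
    println!("{}", dictionary_values_are_unique(unique));
    println!("{}", dictionary_values_are_unique(duplicated));
}
```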

pub fn train_from_arrow_dict(array: &DictionaryArray<UInt16Type>) -> (Arc<Compressor>, LiquidByteViewArray<FsstArray>)

Train a compressor on an Arrow DictionaryArray and return it together with the resulting LiquidByteViewArray.

pub fn train_compressor<'a, T: ArrayAccessor<Item = &'a str>>(array: ArrayIter<T>) -> Arc<Compressor>

Train a compressor from an iterator of strings

pub fn train_compressor_bytes<'a, T: ArrayAccessor<Item = &'a [u8]>>(array: ArrayIter<T>) -> Arc<Compressor>

Train a compressor from an iterator of byte arrays

impl LiquidByteViewArray<FsstArray>

pub fn from_bytes(bytes: Bytes, compressor: Arc<Compressor>) -> LiquidByteViewArray<FsstArray>

Deserialize a LiquidByteViewArray from bytes.

impl<B: FsstBacking> LiquidByteViewArray<B>

pub fn nulls(&self) -> Option<&NullBuffer>

Get the null buffer, if any

pub fn get_detailed_memory_usage(&self) -> ByteViewArrayMemoryUsage

Get a detailed breakdown of the array's memory usage

pub fn len(&self) -> usize

Get the number of values in the array

pub fn is_empty(&self) -> bool

Check whether the array is empty

impl LiquidByteViewArray<FsstArray>

pub fn to_dict_arrow(&self) -> DictionaryArray<UInt16Type>

Convert to an Arrow DictionaryArray

pub fn to_arrow_array(&self) -> ArrayRef

Convert to an Arrow array with the original data type

pub fn is_fsst_buffer_on_disk(&self) -> bool

Check whether the FSST buffer is currently stored on disk

impl LiquidByteViewArray<DiskBuffer>

pub fn is_fsst_buffer_on_disk(&self) -> bool

Check whether the FSST buffer is currently stored on disk

pub async fn to_dict_arrow(&self) -> DictionaryArray<UInt16Type>

Convert to an Arrow DictionaryArray

pub async fn to_arrow_array(&self) -> ArrayRef

Convert to an Arrow array with the original data type

Trait Implementations

impl<B: Clone + FsstBacking> Clone for LiquidByteViewArray<B>

fn clone(&self) -> LiquidByteViewArray<B>

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<B: FsstBacking> Debug for LiquidByteViewArray<B>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl LiquidArray for LiquidByteViewArray<FsstArray>

fn as_any(&self) -> &dyn Any

Get the underlying any type.

fn get_array_memory_size(&self) -> usize

Get the memory size of the Liquid array.

fn len(&self) -> usize

Get the length of the Liquid array.

fn to_arrow_array(&self) -> ArrayRef

Convert the Liquid array to an Arrow array.

fn to_best_arrow_array(&self) -> ArrayRef

Convert the Liquid array to an Arrow array, picking the best encoding for the result. The returned array may therefore not have the data type of the original Arrow array.

fn try_eval_predicate(&self, expr: &Arc<dyn PhysicalExpr>, filter: &BooleanBuffer) -> Option<BooleanArray>

Try to evaluate a predicate on the Liquid array with a filter. Returns None if the predicate is not supported.

fn to_bytes(&self) -> Vec<u8>

Serialize the Liquid array to a byte array.

fn original_arrow_data_type(&self) -> DataType

Get the original Arrow data type of the Liquid array.

fn data_type(&self) -> LiquidDataType

Get the logical data type of the Liquid array.

fn squeeze(&self, io: Arc<dyn SqueezeIoHandler>, squeeze_hint: Option<&CacheExpression>) -> Option<(LiquidSqueezedArrayRef, Bytes)>

Squeeze the Liquid array into a LiquidSqueezedArrayRef and a bytes::Bytes. Returns None if the Liquid array cannot be squeezed.

fn filter(&self, selection: &BooleanBuffer) -> ArrayRef

Filter the Liquid array with a boolean buffer and return an Arrow array.

fn is_empty(&self) -> bool

Check if the Liquid array is empty.

impl LiquidSqueezedArray for LiquidByteViewArray<DiskBuffer>

fn as_any(&self) -> &dyn Any

Get the underlying any type.

fn get_array_memory_size(&self) -> usize

Get the memory size of the Liquid array.

fn len(&self) -> usize

Get the length of the Liquid array.

fn is_empty(&self) -> bool

Check if the Liquid array is empty.

fn to_arrow_array<'life0, 'async_trait>(&'life0 self) -> Pin<Box<dyn Future<Output = ArrayRef> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait,

Convert the Liquid array to an Arrow array.

fn data_type(&self) -> LiquidDataType

Get the logical data type of the Liquid array.

fn filter<'life0, 'life1, 'async_trait>(&'life0 self, selection: &'life1 BooleanBuffer) -> Pin<Box<dyn Future<Output = ArrayRef> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Filter the Liquid array with a boolean buffer and return an Arrow array.

fn try_eval_predicate<'life0, 'life1, 'life2, 'async_trait>(&'life0 self, expr: &'life1 Arc<dyn PhysicalExpr>, filter: &'life2 BooleanBuffer) -> Pin<Box<dyn Future<Output = Option<BooleanArray>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait,

Try to evaluate a predicate on the Liquid array with a filter. Returns None if the predicate is not supported.

Note that the filter is a boolean buffer, not a boolean array, so the filter cannot be nullable. The returned boolean mask is nullable if the original array is nullable.

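The null-handling note for try_eval_predicate can be illustrated with a small std-only model:

- the filter is a plain bool per row, so it has no nulls of its own;
- a null input value yields a null (None) in the result.

Plain Vecs and slices stand in for Arrow's BooleanBuffer and BooleanArray, and `eval_with_filter` is hypothetical. In particular, returning one entry per selected row, rather than one per input row, is an assumption.

```rust
/// Hypothetical model: evaluates `pred` on the rows where `filter` is true.
/// `filter` is non-nullable (plain bools). A null input (None) produces a
/// null output, so the result is nullable exactly when the input has nulls.
/// Assumption: the output has one entry per selected row.
fn eval_with_filter<F>(values: &[Option<&str>], filter: &[bool], pred: F) -> Vec<Option<bool>>
where
    F: Fn(&str) -> bool,
{
    assert_eq!(values.len(), filter.len(), "filter length must match the array length");
    values
        .iter()
        .zip(filter)
        .filter(|pair| *pair.1)
        .map(|(value, _)| value.map(|s| pred(s)))
        .collect()
}

fn main() {
    let values = [Some("apple"), None, Some("banana"), Some("avocado")];
    let filter = [true, true, false, true];
    let mask = eval_with_filter(&values, &filter, |s| s.starts_with('a'));
    println!("{:?}", mask);
}
```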
fn original_arrow_data_type(&self) -> DataType

Get the original Arrow data type of the Liquid squeezed array.

fn to_best_arrow_array<'life0, 'async_trait>(&'life0 self) -> Pin<Box<dyn Future<Output = ArrayRef> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait,

Convert the Liquid array to an Arrow array, picking the best encoding for the result. The returned array may therefore not have the data type of the original Arrow array.

fn disk_backing(&self) -> SqueezedBacking

Describe how the squeezed array persists its backing bytes on disk.
Auto Trait Implementations

impl<B> Freeze for LiquidByteViewArray<B>
where B: Freeze,

impl<B> RefUnwindSafe for LiquidByteViewArray<B>
where B: RefUnwindSafe,

impl<B> Send for LiquidByteViewArray<B>
where B: Send,

impl<B> Sync for LiquidByteViewArray<B>
where B: Sync,

impl<B> Unpin for LiquidByteViewArray<B>
where B: Unpin,

impl<B> UnsafeUnpin for LiquidByteViewArray<B>
where B: UnsafeUnpin,

impl<B> UnwindSafe for LiquidByteViewArray<B>
where B: UnwindSafe,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> FromRef<T> for T
where T: Clone,

fn from_ref(input: &T) -> T

Converts to this type from a reference to the input type.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> IntoRequest<T> for T

fn into_request(self) -> Request<T>

Wrap the input message T in a tonic::Request

impl<L> LayerExt<L> for L

fn named_layer<S>(&self, service: S) -> Layered<<L as Layer<S>>::Service, S>
where L: Layer<S>,

Applies the layer to a service and wraps it in Layered.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,