Struct DictionaryEncoding 

pub struct DictionaryEncoding { /* private fields */ }

Stores repeated strings efficiently by referencing them with integer codes.

Each unique string appears once in the dictionary. Row values are stored as little-endian u32 codes that index into that dictionary. The code storage is a reference-counted bytes::Bytes buffer, so heap-owned and mmap-backed columns share the same type (revised D7).
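
As a rough illustration of the idea, the following std-only sketch builds a dictionary and a code per row from raw strings. The helper name build_dictionary and the use of Vec<String> are assumptions made for this example; they are not part of this crate's API, which takes Arc<[Arc<str>]> and byte-backed codes.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of this crate): builds a dictionary of
/// unique strings plus one u32 code per input row.
fn build_dictionary(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dictionary: Vec<String> = Vec::new();
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut codes: Vec<u32> = Vec::with_capacity(values.len());

    for &value in values {
        // Reuse the code of a string seen before; otherwise append the
        // string to the dictionary and hand out the next code.
        let code = *index.entry(value).or_insert_with(|| {
            dictionary.push(value.to_string());
            (dictionary.len() - 1) as u32
        });
        codes.push(code);
    }
    (dictionary, codes)
}
```

Five rows holding two distinct strings end up as a two-entry dictionary and five small integers, which is where the space saving comes from.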

Implementations

impl DictionaryEncoding

pub fn new(dictionary: Arc<[Arc<str>]>, codes: Vec<u32>) -> Self

Creates a new dictionary encoding from a dictionary and codes (legacy Vec<u32> input).

pub fn from_bytes_storage( dictionary: Arc<[Arc<str>]>, codes_bytes: Bytes, code_count: usize, ) -> Self

Constructs a dictionary encoding from pre-encoded bytes (Phase 3c entry point).

codes_bytes must contain exactly code_count * 4 bytes, holding code_count little-endian u32 values.
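
A minimal sketch of producing such a buffer with the standard library only. Both helper names are hypothetical. Wrapping the resulting Vec<u8> in a bytes::Bytes (for example with Bytes::from) is left out so the example has no dependencies, and how from_bytes_storage reacts to a length mismatch is not stated here, so the check below only mirrors the documented requirement.

```rust
/// Hypothetical helper: serializes codes as little-endian u32 bytes.
fn codes_to_le_bytes(codes: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(codes.len() * 4);
    for code in codes {
        // Each code contributes exactly four bytes, least significant first.
        out.extend_from_slice(&code.to_le_bytes());
    }
    out
}

/// Hypothetical check of the documented invariant:
/// the buffer must be exactly code_count * 4 bytes long.
fn has_expected_len(codes_bytes: &[u8], code_count: usize) -> bool {
    code_count.checked_mul(4) == Some(codes_bytes.len())
}
```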

pub fn with_nulls(self, null_bitmap: Vec<u64>) -> Self

Adds a null bitmap to this encoding (legacy Vec<u64> input).

pub fn with_null_bytes(self, null_bitmap: Bytes) -> Self

Adds a pre-encoded null bitmap (Phase 3c entry point).

pub fn len(&self) -> usize

Returns the number of values.

pub fn is_empty(&self) -> bool

Returns true if the encoding contains no values.

pub fn dictionary_size(&self) -> usize

Returns the number of unique strings in the dictionary.

pub fn dictionary(&self) -> &Arc<[Arc<str>]>

Returns the dictionary.

pub fn codes_bytes(&self) -> Bytes

Returns the encoded codes as raw LE u32 bytes (always materialised).

Phase 3b: codes storage is bytes::Bytes. Use Self::code_at for indexed access; this returns the raw byte storage for serializers that write the storage out directly.

pub fn as_codes_slice(&self) -> Option<&[u32]>

Returns a direct &[u32] slice when the codes live in RAM.

pub fn as_null_words_slice(&self) -> Option<&[u64]>

Returns a direct &[u64] view of the null bitmap when it lives in RAM. None when there is no null bitmap or the bitmap is mmap-backed.

pub fn code_count(&self) -> usize

Returns the number of u32 codes stored.

pub fn code_at(&self, idx: usize) -> Option<u32>

Returns the code at idx, or None if out of range.
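
To make the indexed read concrete, here is a sketch of decoding one code from raw little-endian storage. It is a stand-in written against a plain byte slice, under the assumption that the storage layout is the contiguous LE u32 array described above; it is not the crate's implementation.

```rust
/// Hypothetical stand-in for indexed access over raw LE u32 storage.
/// Returns None when `idx` does not address four complete bytes.
fn code_at_sketch(codes_bytes: &[u8], idx: usize) -> Option<u32> {
    // Checked arithmetic so a huge index cannot overflow the byte offset.
    let start = idx.checked_mul(4)?;
    let end = start.checked_add(4)?;
    let chunk = codes_bytes.get(start..end)?;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}
```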

pub fn codes(&self) -> Vec<u32>

Returns the codes as a materialized Vec<u32> (allocates).

Prefer Self::code_at or Self::code_count for reads. This exists for callers that need a contiguous slice and accept the allocation (e.g., legacy serialization paths).

pub fn is_null(&self, index: usize) -> bool

Returns whether the value at index is null.

pub fn get(&self, index: usize) -> Option<&str>

Returns the string value at the given index.

Returns None if the value is null.
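
The lookup can be pictured as a null check followed by a dictionary dereference. The sketch below uses plain slices and makes two assumptions that this page does not confirm: a set bit in the u64 bitmap marks a null row, and an out-of-range index yields None rather than panicking.

```rust
/// Assumption for this sketch: bit `index % 64` of word `index / 64`
/// is set when the row is null. Rows past the bitmap count as non-null.
fn is_null_sketch(null_words: &[u64], index: usize) -> bool {
    match null_words.get(index / 64) {
        Some(word) => (*word >> (index % 64)) & 1 == 1,
        None => false,
    }
}

/// Hypothetical stand-in for `get`: resolves a row to its string.
fn get_sketch<'a>(
    dictionary: &'a [String],
    codes: &[u32],
    null_words: &[u64],
    index: usize,
) -> Option<&'a str> {
    if is_null_sketch(null_words, index) {
        return None;
    }
    // Out-of-range rows and codes also produce None in this sketch.
    let code = *codes.get(index)?;
    dictionary.get(code as usize).map(String::as_str)
}
```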

pub fn get_code(&self, index: usize) -> Option<u32>

Returns the code at the given index.

pub fn iter(&self) -> impl Iterator<Item = Option<&str>>

Iterates over all values, yielding Option<&str>.

pub fn compression_ratio(&self) -> f64

Returns the compression ratio (original size / compressed size).
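
The exact size accounting is not spelled out on this page. As a worked example under assumed accounting, the sketch below treats the original size as the sum of each row's string bytes and the compressed size as the dictionary's string bytes plus four bytes per code. The function name and both definitions are assumptions for illustration.

```rust
/// Hypothetical ratio under assumed accounting:
///   original   = sum of the byte length of every row's string
///   compressed = dictionary string bytes + 4 bytes per code
/// Panics if a code is out of range for the dictionary.
fn compression_ratio_sketch(dictionary: &[&str], codes: &[u32]) -> f64 {
    let original: usize = codes
        .iter()
        .map(|&code| dictionary[code as usize].len())
        .sum();
    let dict_bytes: usize = dictionary.iter().map(|s| s.len()).sum();
    let compressed = dict_bytes + codes.len() * 4;
    if compressed == 0 {
        // Arbitrary choice for empty input in this sketch.
        return 1.0;
    }
    original as f64 / compressed as f64
}
```

With two 16-byte strings spread over eight rows, the original size is 128 bytes and the compressed size is 32 + 32 = 64 bytes, giving a ratio of 2.0 under these assumptions.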

pub fn encode(&self, value: &str) -> Option<u32>

Looks up value in the dictionary and returns its code, or None if the value is not present.
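
A sketch of the lookup as a linear scan over the dictionary. Whether the crate scans, hashes, or uses another index is not documented here, so the scan is an assumption chosen for brevity.

```rust
/// Hypothetical stand-in for `encode`: finds the position of `value`
/// in the dictionary and returns it as a code.
fn encode_sketch(dictionary: &[&str], value: &str) -> Option<u32> {
    dictionary
        .iter()
        .position(|&entry| entry == value)
        // Codes are u32, so this sketch assumes the dictionary is small
        // enough for the position to fit.
        .map(|pos| pos as u32)
}
```

Encoding the search value once and then comparing integer codes per row avoids a string comparison on every row.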

pub fn filter_by_code(&self, predicate: impl Fn(u32) -> bool) -> Vec<usize>

Returns the row offsets where the code matches predicate and the row is not null.

Branches once on the storage variants so the per-row body is a tight loop over native slices in the common in-memory case.
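
The "branch once, then loop" shape can be sketched as follows, with the presence of a null bitmap standing in for the storage variants. The function works on plain slices and assumes a set bit marks a null row; both are assumptions for this example, not the crate's internals.

```rust
/// Hypothetical stand-in for `filter_by_code` over in-memory slices.
/// Assumption: a set bit in `null_words` marks a null row.
fn filter_by_code_sketch(
    codes: &[u32],
    null_words: Option<&[u64]>,
    predicate: impl Fn(u32) -> bool,
) -> Vec<usize> {
    let mut rows = Vec::new();
    // Decide once which loop to run, so each loop body stays minimal.
    match null_words {
        None => {
            for (row, &code) in codes.iter().enumerate() {
                if predicate(code) {
                    rows.push(row);
                }
            }
        }
        Some(words) => {
            for (row, &code) in codes.iter().enumerate() {
                // Rows past the end of the bitmap count as non-null.
                let word = words.get(row / 64).copied().unwrap_or(0);
                let is_null = (word >> (row % 64)) & 1 == 1;
                if !is_null && predicate(code) {
                    rows.push(row);
                }
            }
        }
    }
    rows
}
```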

Trait Implementations

impl Clone for DictionaryEncoding

fn clone(&self) -> DictionaryEncoding

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for DictionaryEncoding

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
