Struct ColumnData 

```rust
pub struct ColumnData<C: ColumnCursor> {
    pub len: usize,
    pub slabs: SpanTree<Slab, C::SlabIndex>,
    /* private fields */
}
```

A compressed, mutable column of optional typed values.

ColumnData<C> stores a sequence of Option<C::Item> values using the encoding determined by cursor type C. Data is held internally in a SpanTree of Slabs; modifications replace individual slabs, leaving the rest untouched.

Common cursor types

UIntCursor, IntCursor, StrCursor, ByteCursor, BooleanCursor, DeltaCursor, RawCursor.

Example

```rust
use hexane::{ColumnData, UIntCursor};
use std::borrow::Cow;

let mut col: ColumnData<UIntCursor> = ColumnData::new();
col.splice(0, 0, [1u64, 2, 3]);
assert_eq!(col.get(1), Some(Some(Cow::Owned(2))));
assert_eq!(col.to_vec(), vec![Some(1), Some(2), Some(3)]);
```

Fields

len: usize

slabs: SpanTree<Slab, C::SlabIndex>

Implementations

impl<C: ColumnCursor> ColumnData<C>

pub fn byte_len(&self) -> usize

Total number of bytes used by all slabs (encoded, compressed size).

pub fn get(&self, index: usize) -> Option<Option<Cow<'_, C::Item>>>

Returns the value at index, or None if the index is out of bounds.

The inner Option is None for null entries and Some(value) otherwise. This is O(log n + B) where B is the number of encoded runs in the target slab. For multiple sequential reads prefer ColumnData::iter or ColumnData::iter_range.
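
To make the O(log n + B) cost concrete, here is a minimal sketch of such a lookup under a simplified model: each slab is a plain Vec of (count, value) runs, and a separate sorted array holds the cumulative item count at the end of each slab. The names Runs, ends, and get are hypothetical, not hexane's internals; the binary search stands in for the tree descent, and the loop over runs is the per-slab scan.

```rust
/// Hypothetical stand-in for one slab: a list of (count, value) runs,
/// where `None` marks a run of nulls.
type Runs = Vec<(usize, Option<u64>)>;

/// Sketch of a `get`-style lookup. `ends[i]` holds the total number of items
/// in slabs `0..=i`, so a binary search finds the target slab in O(log n);
/// the runs inside that slab are then scanned linearly (the O(B) term).
fn get(slabs: &[Runs], ends: &[usize], index: usize) -> Option<Option<u64>> {
    // Index of the first slab whose cumulative end lies past `index`.
    let slab = ends.partition_point(|&end| end <= index);
    if slab >= slabs.len() {
        return None; // out of bounds
    }
    let start = if slab == 0 { 0 } else { ends[slab - 1] };
    let mut offset = index - start;
    for &(count, value) in &slabs[slab] {
        if offset < count {
            return Some(value); // inner None means a null entry
        }
        offset -= count;
    }
    None
}
```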

pub fn get_acc_delta(&self, index1: usize, index2: usize) -> (Acc, Option<Cow<'_, C::Item>>)

Returns the change in accumulator between index1 and index2, together with the item at index2.

Panics if index1 > index2.

pub fn get_acc(&self, index: usize) -> Acc

Returns the cumulative Acc for all items before index (i.e. the sum of agg(item) for items 0..index).
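
The accumulator arithmetic can be sketched on a plain slice. For illustration only, this assumes that agg(item) is the item's own value with nulls contributing 0, and it models get_acc_delta as get_acc(index2) - get_acc(index1). Both are assumptions, and agg and acc_delta are hypothetical helpers.

```rust
/// Assumption for this sketch: an item's aggregate is its value,
/// and nulls contribute 0.
fn agg(item: Option<u64>) -> u64 {
    item.unwrap_or(0)
}

/// Cumulative accumulator before `index`: the sum of agg(item) over items 0..index.
fn get_acc(items: &[Option<u64>], index: usize) -> u64 {
    items[..index].iter().map(|&item| agg(item)).sum()
}

/// Hypothetical reading of `get_acc_delta`: the accumulator change from
/// `index1` to `index2`, i.e. the sum of agg(item) over index1..index2.
fn acc_delta(items: &[Option<u64>], index1: usize, index2: usize) -> u64 {
    assert!(index1 <= index2, "index1 must not exceed index2");
    get_acc(items, index2) - get_acc(items, index1)
}
```

A real column would presumably cache totals per slab instead of summing from the start each time; the sketch only pins down which items each accumulator covers.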

pub fn get_with_acc(&self, index: usize) -> Option<ColGroupItem<'_, <C as ColumnCursor>::Item>>

Returns the item at index together with the Acc value immediately before it, or None if the index is out of bounds.

pub fn is_empty(&self) -> bool

Returns true if every item in the column is null (None) or, for BooleanCursor, if every value is false.

An empty column (len() == 0) is also considered empty.
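
Because this is not the usual len() == 0 test, a short sketch of the documented rule may help. It works on decoded values in plain slices rather than on a ColumnData, and treating a null in a boolean column like false is an assumption of the sketch.

```rust
/// Sketch of the documented `is_empty` rule for nullable columns:
/// true when every item is null. `all` on an empty iterator is true,
/// so a zero-length column also counts as empty.
fn is_empty_nullable<T>(items: &[Option<T>]) -> bool {
    items.iter().all(|item| item.is_none())
}

/// Boolean variant: true when no item is `Some(true)`.
/// Treating a null like `false` is an assumption of this sketch.
fn is_empty_bool(items: &[Option<bool>]) -> bool {
    items.iter().all(|item| *item != Some(true))
}
```

Under this rule a column of three nulls has len() == 3, yet is_empty() returns true.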

pub fn dump(&self)

pub fn and_remap<F>(self, f: F) -> Self
where F: Fn(Option<Cow<'_, C::Item>>) -> Option<Cow<'_, C::Item>>,

Returns a new column with every item transformed by f.

Equivalent to consuming self and re-encoding all items through f. For an in-place version see ColumnData::remap.

pub fn remap<F>(&mut self, f: F)
where F: Fn(Option<Cow<'_, C::Item>>) -> Option<Cow<'_, C::Item>>,

Replaces the column with a re-encoded version where every item has been transformed by f. For a consuming version see ColumnData::and_remap.
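
A sketch of what remapping means for the encoding, using a hypothetical run-length encoder over plain Option<u64> items (encode and remap here are illustrative, not hexane functions). Every item, null or not, passes through f, and the result is re-encoded from scratch, so runs that become equal merge.

```rust
/// Hypothetical run-length encoder: merges equal neighbours into (count, value) runs.
fn encode(items: &[Option<u64>]) -> Vec<(usize, Option<u64>)> {
    let mut runs: Vec<(usize, Option<u64>)> = Vec::new();
    for &item in items {
        if let Some(last) = runs.last_mut() {
            if last.1 == item {
                last.0 += 1; // extend the current run
                continue;
            }
        }
        runs.push((1, item)); // start a new run
    }
    runs
}

/// Sketch of remap-style semantics: apply `f` to every decoded item
/// (nulls included), then re-encode the result from scratch.
fn remap<F>(items: &[Option<u64>], f: F) -> Vec<(usize, Option<u64>)>
where
    F: Fn(Option<u64>) -> Option<u64>,
{
    let mapped: Vec<Option<u64>> = items.iter().map(|&item| f(item)).collect();
    encode(&mapped)
}
```

For example, mapping [Some(1), Some(2), None] through |_| None yields the single run (3, None).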

pub fn save_to_unless_empty(&self, out: &mut Vec<u8>) -> Range<usize>

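Like save_to, but writes nothing when is_empty returns true, returning an empty range at the current end of out. Because is_empty is also true for a column whose items are all null (or, for BooleanCursor, all false), this can skip columns that save_to would still serialize.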

pub fn save_to(&self, out: &mut Vec<u8>) -> Range<usize>

Serializes the column by appending encoded bytes to out.

Returns the byte range written (out[range] is the serialized column data). The output is compatible with ColumnData::load. If the column is empty (zero items), nothing is written and an empty range is returned.
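
The append-and-return-range contract can be sketched with a placeholder encoding. The little-endian u64 bytes below are an assumption for illustration and are not hexane's wire format.

```rust
use std::ops::Range;

/// Sketch of the append-and-report contract: bytes are appended to `out`
/// and the written span is returned. The "encoding" (little-endian u64s)
/// is a placeholder, not hexane's column format.
fn save_to(values: &[u64], out: &mut Vec<u8>) -> Range<usize> {
    let start = out.len();
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    // Writing nothing yields an empty range at the current end of `out`.
    start..out.len()
}
```

Because the returned range is relative to the whole buffer, a caller can append several columns to one Vec<u8> and record where each one starts and ends.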

pub fn raw_reader(&self, advance: usize) -> RawReader<'_, C::SlabIndex>

impl<C: ColumnCursor> ColumnData<C>

pub fn run_iter(&self) -> impl Iterator<Item = Run<'_, C::Item>>

Iterates over the raw Runs in the column.

Each Run has a count and an optional value. This gives lower-level access to the RLE structure than iter(), which is useful for re-encoding or bulk inspection.
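
As a rough model of the difference between run-level and item-level iteration, the sketch below uses a hypothetical Run struct with count and value fields (hexane's own Run type may differ). Expanding the runs reproduces what an item iterator yields, while a query such as counting nulls can be answered by visiting each run once.

```rust
/// Hypothetical mirror of a run: `count` repetitions of `value` (`None` = null).
struct Run {
    count: usize,
    value: Option<u64>,
}

/// Expanding runs back into individual items is what an item-level
/// iterator effectively yields.
fn expand(runs: &[Run]) -> Vec<Option<u64>> {
    runs.iter()
        .flat_map(|run| std::iter::repeat(run.value).take(run.count))
        .collect()
}

/// A bulk query answered at run level: each run is visited once,
/// however many items it covers.
fn count_nulls(runs: &[Run]) -> usize {
    runs.iter()
        .filter(|run| run.value.is_none())
        .map(|run| run.count)
        .sum()
}
```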

pub fn to_vec(&self) -> Vec<C::Export>

Decodes all items into a Vec. Primarily useful for testing and debugging.

pub fn iter(&self) -> ColumnDataIter<'_, C>

Returns a forward iterator over all items in the column.

The iterator decodes one slab at a time, carrying state across items within each slab for amortized O(1) per-item cost after an O(log n) initial seek. For a sub-range use iter_range.

pub fn scope_to_value<B, R>(&self, value: Option<B>, range: R) -> Range<usize>
where B: Borrow<C::Item> + Copy + Debug, R: RangeBounds<usize>, C::Item: Ord,

Returns the contiguous index range where value appears within range.

Requires that the values in range are sorted. Uses B-tree binary search over slabs followed by a linear scan within the target slab. Returns an empty range at the found position if value is not present.

For repeated lookups on the same iterator use ColumnDataIter::seek_to_value.
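
The lookup can be modelled on a plain sorted slice with two binary searches via slice::partition_point: one finds the first item not less than value, the other the first item greater than it. This is a sketch rather than hexane's slab-level search, and it assumes nulls sort before all values, which is how Rust's Option orders None.

```rust
use std::ops::Range;

/// Sketch of scope_to_value over a plain slice whose `range` is sorted.
/// Two binary searches bound the block of items equal to `value`; if it is
/// absent, both land on the same insertion point and the range is empty.
/// Ordering nulls before all values (as `Option` does) is an assumption.
fn scope_to_value<T: Ord>(items: &[Option<T>], value: Option<&T>, range: Range<usize>) -> Range<usize> {
    let base = range.start;
    let sorted = &items[range];
    let start = base + sorted.partition_point(|item| item.as_ref() < value);
    let end = base + sorted.partition_point(|item| item.as_ref() <= value);
    start..end
}
```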

pub fn iter_range(&self, range: Range<usize>) -> ColumnDataIter<'_, C>

Returns an iterator over items in range, clamped to the column’s length.

pub fn new() -> Self

Creates a new, empty column.

pub fn save(&self) -> Vec<u8>

Serializes the column to a new Vec<u8>. See also save_to.

pub fn push<'b, M>(&mut self, value: M) -> Acc
where M: MaybePackable<'b, C::Item> + Clone, C::Item: 'b,

Appends a single value to the end of the column.

Returns the Acc value of the appended item. For bulk appends at the end, extend is more efficient.

pub fn extend<'b, M, I>(&mut self, values: I) -> Acc
where M: MaybePackable<'b, C::Item>, I: IntoIterator<Item = M>, C::Item: 'b,

Appends multiple values to the end of the column.

Returns the total Acc contributed by the appended values.

pub fn splice<'b, M, I>(&mut self, index: usize, del: usize, values: I) -> Acc
where M: MaybePackable<'b, C::Item>, I: IntoIterator<Item = M>, C::Item: 'b,

Removes del items starting at index and inserts values in their place.

This is the primary mutation method. It finds the slab containing index in O(log n), re-encodes the affected slab with the deletion/insertion applied, then replaces it in the B-tree. Unaffected slabs are not touched.

Returns the accumulated Acc of the inserted values.

Panics if index > self.len().
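
The observable semantics, though not the slab-level mechanics, can be modelled with Vec::splice. splice_model is a hypothetical helper, and its return value assumes agg(item) is the item's value with nulls counting as 0. Vec::splice panics when index + del exceeds the length; this page does not document how hexane treats that case.

```rust
/// Reference model of splice on a plain Vec (hexane instead re-encodes only
/// the affected slab): remove `del` items at `index`, insert `values` there,
/// and return the inserted values' total under an assumed agg (value, null = 0).
fn splice_model(items: &mut Vec<Option<u64>>, index: usize, del: usize, values: &[Option<u64>]) -> u64 {
    // Vec::splice panics if index > len or index + del > len.
    items.splice(index..index + del, values.iter().copied());
    values.iter().map(|&v| v.unwrap_or(0)).sum()
}
```

Calling splice_model on an empty Vec with index 0, del 0, and [Some(1), Some(2), Some(3)] produces the same items as the example at the top of this page.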

pub fn fill_if_empty(&mut self, len: usize) -> bool

If the column is currently empty, fills it with len null values and returns true. If the column already has items, returns false without modifying it.

pub fn init_empty(len: usize) -> Self

Creates a column of len null values.

pub fn load_unless_empty(data: &[u8], len: usize) -> Result<Self, PackError>

Deserializes data, or returns a column of len nulls if data is empty.

Returns PackError::InvalidLength if the decoded column has a different length than len.
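
The control flow described above can be sketched as follows. LoadError, the decode parameter, and the toy byte decoder used when trying it out are hypothetical stand-ins; the real function decodes hexane's column format and returns PackError.

```rust
/// Hypothetical error for this sketch; hexane reports PackError::InvalidLength.
#[derive(Debug, PartialEq)]
enum LoadError {
    InvalidLength { expected: usize, found: usize },
}

/// Sketch of load_unless_empty's control flow. `decode` stands in for the
/// real column decoder, which this sketch does not reproduce.
fn load_unless_empty<F>(data: &[u8], len: usize, decode: F) -> Result<Vec<Option<u64>>, LoadError>
where
    F: Fn(&[u8]) -> Vec<Option<u64>>,
{
    // Empty input is not an error: it stands for `len` nulls.
    if data.is_empty() {
        return Ok(vec![None; len]);
    }
    let items = decode(data);
    if items.len() != len {
        return Err(LoadError::InvalidLength { expected: len, found: items.len() });
    }
    Ok(items)
}
```

This appears to pair with save_to_unless_empty: a column skipped on save because it was all nulls comes back as len nulls, provided the caller stores len separately.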

pub fn load_with_unless_empty<F>(data: &[u8], len: usize, test: &F) -> Result<Self, PackError>
where F: Fn(Option<&C::Item>) -> Option<String>,

Like load_unless_empty but also validates each value with test. If test returns Some(msg), decoding fails with PackError::InvalidValue.

pub fn load(data: &[u8]) -> Result<Self, PackError>

Deserializes a column from bytes produced by save / save_to.

Returns a PackError if the bytes are malformed or use the wrong encoding.

pub fn load_with<F>(data: &[u8], test: &F) -> Result<Self, PackError>
where F: Fn(Option<&C::Item>) -> Option<String>,

Like load but validates each decoded value with test. If test returns Some(msg), decoding fails with PackError::InvalidValue.
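
A sketch of how a test callback with the documented shape Fn(Option<&T>) -> Option<String> can drive validation. InvalidValue, validate, and small_non_null are hypothetical names used only for this example; in hexane the failure surfaces as PackError::InvalidValue.

```rust
/// Hypothetical error for this sketch; hexane reports PackError::InvalidValue.
#[derive(Debug, PartialEq)]
struct InvalidValue(String);

/// Sketch of load_with-style validation: `test` sees each decoded item
/// (nulls as None) and returns Some(message) to reject it.
fn validate<T, F>(items: &[Option<T>], test: &F) -> Result<(), InvalidValue>
where
    F: Fn(Option<&T>) -> Option<String>,
{
    for item in items {
        if let Some(msg) = test(item.as_ref()) {
            return Err(InvalidValue(msg)); // the first failure aborts
        }
    }
    Ok(())
}

/// Example validator: rejects nulls and values above 100.
fn small_non_null(v: Option<&u64>) -> Option<String> {
    match v {
        None => Some("null not allowed".to_string()),
        Some(&x) if x > 100 => Some(format!("{x} exceeds 100")),
        Some(_) => None,
    }
}
```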

pub fn len(&self) -> usize

Returns the number of items in the column (including nulls).

pub fn acc(&self) -> Acc

Returns the total accumulated Acc for the entire column (sum of agg(item) for every non-null item).

impl<C: ColumnCursor> ColumnData<C>
where C::SlabIndex: HasMinMax,

pub fn find_by_range(&self, range: Range<usize>) -> impl Iterator<Item = usize> + '_

Returns an iterator over the indices of items whose value falls within range.

Uses slab-level min/max metadata to skip slabs that cannot contain matching values, making this efficient for sparse matches. Requires that the cursor type supports min/max tracking (HasMinMax).
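
A minimal sketch of min/max pruning, assuming a hypothetical ToySlab that stores decoded items alongside the (min, max) of its non-null values. It simplifies the real signature: it takes a Range<u64> and returns a Vec instead of a lazy iterator. A slab is scanned only if its [min, max] interval can overlap the half-open query range.

```rust
use std::ops::Range;

/// Hypothetical slab: decoded items plus the (min, max) of its non-null
/// values, or None when the slab holds only nulls.
struct ToySlab {
    items: Vec<Option<u64>>,
    bounds: Option<(u64, u64)>,
}

impl ToySlab {
    fn new(items: Vec<Option<u64>>) -> Self {
        let mut bounds: Option<(u64, u64)> = None;
        for &v in items.iter().flatten() {
            bounds = Some(match bounds {
                None => (v, v),
                Some((lo, hi)) => (lo.min(v), hi.max(v)),
            });
        }
        ToySlab { items, bounds }
    }
}

/// Collects indices of values inside `range`, skipping every slab whose
/// [min, max] cannot overlap it without looking at its items.
fn find_by_range(slabs: &[ToySlab], range: Range<u64>) -> Vec<usize> {
    let mut hits = Vec::new();
    let mut base = 0;
    for slab in slabs {
        let may_match = match slab.bounds {
            Some((lo, hi)) => lo < range.end && hi >= range.start,
            None => false, // all nulls: nothing can match
        };
        if may_match {
            for (i, item) in slab.items.iter().enumerate() {
                if let Some(x) = item {
                    if range.contains(x) {
                        hits.push(base + i);
                    }
                }
            }
        }
        base += slab.items.len();
    }
    hits
}
```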

pub fn find_by_value<A: Into<Agg>>(&self, agg: A) -> impl Iterator<Item = usize> + '_

Returns an iterator over the indices of items whose Agg value equals agg.

Uses slab-level min/max metadata to skip non-matching slabs. Requires that the cursor type supports min/max tracking (HasMinMax).

Trait Implementations

impl<C: Clone + ColumnCursor> Clone for ColumnData<C>
where C::SlabIndex: Clone,

fn clone(&self) -> ColumnData<C>

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Stable since Rust 1.0.0.

impl<C: Debug + ColumnCursor> Debug for ColumnData<C>
where C::SlabIndex: Debug,

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<C: ColumnCursor> Default for ColumnData<C>

fn default() -> Self

Returns the “default value” for a type.

impl<'a, C, M> From<Vec<M>> for ColumnData<C>
where C: ColumnCursor, M: MaybePackable<'a, C::Item>, C::Item: 'a,

fn from(i: Vec<M>) -> Self

Converts to this type from the input type.

impl<'a, C, M> FromIterator<M> for ColumnData<C>
where C: ColumnCursor, M: MaybePackable<'a, C::Item>, C::Item: 'a,

fn from_iter<I: IntoIterator<Item = M>>(iter: I) -> Self

Creates a value from an iterator.

impl<C: ColumnCursor> PartialEq for ColumnData<C>

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason. Stable since Rust 1.0.0.

Auto Trait Implementations

impl<C> Freeze for ColumnData<C>
where <C as ColumnCursor>::SlabIndex: Freeze,

impl<C> RefUnwindSafe for ColumnData<C>

impl<C> Send for ColumnData<C>
where C: Send, <C as ColumnCursor>::SlabIndex: Send,

impl<C> Sync for ColumnData<C>
where C: Sync, <C as ColumnCursor>::SlabIndex: Sync,

impl<C> Unpin for ColumnData<C>
where C: Unpin, <C as ColumnCursor>::SlabIndex: Unpin,

impl<C> UnsafeUnpin for ColumnData<C>

impl<C> UnwindSafe for ColumnData<C>

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.