pub struct VarBinViewArray { /* private fields */ }
A variable-length binary view array that stores strings and binary data efficiently.
This mirrors the Apache Arrow StringView/BinaryView array encoding and provides an optimized representation for variable-length data with excellent performance characteristics for both short and long strings.
§Data Layout
The array uses a hybrid storage approach with two main components:
- Views buffer: Array of 16-byte BinaryView entries (one per logical element)
- Data buffers: Shared backing storage for strings longer than 12 bytes
§View Structure
Commonly referred to as “German Strings”, each 16-byte view entry contains either:
- Inlined data: For strings ≤ 12 bytes, the entire string is stored directly in the view
- Reference data: For strings > 12 bytes, contains:
  - String length (4 bytes)
  - First 4 bytes of string as prefix (4 bytes)
  - Buffer index and offset (8 bytes total)
The following ASCII graphic is reproduced verbatim from the Arrow documentation:
                        ┌──────┬────────────────────────┐
                        │length│      string value      │
   Strings (len <= 12)  │      │    (padded with 0)     │
                        └──────┴────────────────────────┘
                         0    31                      127

                        ┌───────┬───────┬───────┬───────┐
                        │length │prefix │  buf  │offset │
   Strings (len > 12)   │       │       │ index │       │
                        └───────┴───────┴───────┴───────┘
                         0    31       63      95    127
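The layout above can be sketched in plain Rust without depending on this crate. The following is a minimal illustration, not the library's implementation: the function names `encode_view` and `is_inlined` are hypothetical, and the little-endian field order is an assumption taken from the Arrow diagram.

```rust
/// Maximum number of bytes that fit inline in a 16-byte view.
const MAX_INLINE: usize = 12;

/// Encode one value as a 16-byte view (hypothetical helper, little-endian
/// fields assumed). `buffer_index` and `offset` are only used when the value
/// is longer than 12 bytes; they say where the full value lives.
fn encode_view(value: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    assert!(value.len() <= u32::MAX as usize, "value too long for a view");
    let mut view = [0u8; 16];
    // Bits 0..32: the length of the value.
    view[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= MAX_INLINE {
        // Inlined: bytes 4..16 hold the value itself, padded with zeros.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Reference: 4-byte prefix, then buffer index, then offset.
        view[4..8].copy_from_slice(&value[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Returns true if the view stores its value inline.
fn is_inlined(view: &[u8; 16]) -> bool {
    let len = u32::from_le_bytes([view[0], view[1], view[2], view[3]]);
    len as usize <= MAX_INLINE
}
```

Keeping the length and a 4-byte prefix in every view means many comparisons can be decided from the views buffer alone, without touching the data buffers.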
§Examples
use vortex_array::arrays::VarBinViewArray;
use vortex_dtype::{DType, Nullability};
use vortex_array::IntoArray;
// Create from an Iterator<Item = &str>
let array = VarBinViewArray::from_iter_str([
"inlined",
"this string is outlined"
]);
assert_eq!(array.len(), 2);
// Access individual strings
let first = array.bytes_at(0);
assert_eq!(first.as_slice(), b"inlined"); // Short string, stored inline
let second = array.bytes_at(1);
assert_eq!(second.as_slice(), b"this string is outlined"); // Long string
Implementations§
impl VarBinViewArray
pub fn compact_buffers(&self) -> VortexResult<VarBinViewArray>
Returns a compacted copy of the input array, where all wasted space has been cleaned up. This operation can be very expensive, in the worst case copying all existing string data into a new allocation.
After slicing/taking operations, VarBinViewArrays can continue to hold references to buffers that are no longer visible. We detect when there is wasted space in any of the buffers and, if so, aggressively compact all visible outlined string data into a single new buffer.
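The idea behind compaction can be sketched with a toy model in plain Rust. This is an illustration only: `ToyView` and `compact` are hypothetical names, and unlike the real method this sketch always copies rather than first checking whether any space is wasted.

```rust
/// Toy stand-in for a view: either inlined bytes or a reference into a buffer.
#[derive(Debug, Clone, PartialEq)]
enum ToyView {
    Inlined(Vec<u8>),
    Ref { buffer: usize, offset: usize, len: usize },
}

/// Copy every value that is still referenced into one fresh buffer and
/// rewrite the views to point at it. Unreferenced bytes are left behind.
fn compact(views: &[ToyView], buffers: &[Vec<u8>]) -> (Vec<ToyView>, Vec<u8>) {
    let mut data = Vec::new();
    let new_views: Vec<ToyView> = views
        .iter()
        .map(|view| match view {
            // Inlined values carry their own bytes and need no rewriting.
            ToyView::Inlined(bytes) => ToyView::Inlined(bytes.clone()),
            ToyView::Ref { buffer, offset, len } => {
                let new_offset = data.len();
                data.extend_from_slice(&buffers[*buffer][*offset..*offset + *len]);
                // All outlined data now lives in the single new buffer 0.
                ToyView::Ref { buffer: 0, offset: new_offset, len: *len }
            }
        })
        .collect();
    (new_views, data)
}
```

In this model the worst case is visible directly: if every byte of every buffer is still referenced, `compact` copies all of it.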
impl VarBinViewArray
pub unsafe fn new_unchecked( views: Buffer<BinaryView>, buffers: Arc<[ByteBuffer]>, dtype: DType, validity: Validity, ) -> Self
Build a new VarBinViewArray from components without validation.
§Safety
This should only be used when you know for certain that all components are already validated, for example during array operations that preserve the invariants of the encoding.
See VarBinViewArray::try_new for a safe constructor that does validation.
pub fn new( views: Buffer<BinaryView>, buffers: Arc<[ByteBuffer]>, dtype: DType, validity: Validity, ) -> Self
pub fn try_new( views: Buffer<BinaryView>, buffers: Arc<[ByteBuffer]>, dtype: DType, validity: Validity, ) -> VortexResult<Self>
pub fn views(&self) -> &Buffer<BinaryView>
Access to the primitive views buffer.
A variable-sized binary view array contains a “view” child array with 16-byte entries, each holding either a pointer into one of the array’s owned buffers or an inlined copy of the string (if the string has 12 bytes or fewer).
pub fn bytes_at(&self, index: usize) -> ByteBuffer
Access the value bytes at a given index.
Will return a ByteBuffer containing the data without performing a copy.
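To see why a lookup needs no copy, here is a small std-only sketch of resolving a 16-byte view to a borrowed slice. It is not the crate's code: `resolve` is a hypothetical function, the buffers are modeled as `Vec<u8>`, and the little-endian field layout is assumed from the diagram in the type-level docs.

```rust
/// Resolve a 16-byte view to the bytes it denotes, borrowing from either the
/// view itself (inlined case) or one of the data buffers (reference case).
fn resolve<'a>(view: &'a [u8; 16], buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = u32::from_le_bytes([view[0], view[1], view[2], view[3]]) as usize;
    if len <= 12 {
        // Inlined: the value sits directly after the 4-byte length.
        &view[4..4 + len]
    } else {
        // Reference: skip the 4-byte prefix, then read buffer index and offset.
        let buffer = u32::from_le_bytes([view[8], view[9], view[10], view[11]]) as usize;
        let offset = u32::from_le_bytes([view[12], view[13], view[14], view[15]]) as usize;
        &buffers[buffer][offset..offset + len]
    }
}
```

Both branches return a slice into existing memory, which is the property the zero-copy claim above relies on.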
pub fn buffer(&self, idx: usize) -> &ByteBuffer
Access one of the backing data buffers.
§Panics
This method panics if the provided index is out of bounds for the set of buffers provided at construction time.
pub fn buffers(&self) -> &Arc<[ByteBuffer]>
Access the underlying raw data buffers, not including the views buffer.
pub fn from_iter<T: AsRef<[u8]>, I: IntoIterator<Item = Option<T>>>( iter: I, dtype: DType, ) -> Self
Build a VarBinViewArray with the given dtype from an iterator of optional byte values.
pub fn from_iter_str<T: AsRef<str>, I: IntoIterator<Item = T>>(iter: I) -> Self
pub fn from_iter_nullable_str<T: AsRef<str>, I: IntoIterator<Item = Option<T>>>( iter: I, ) -> Self
pub fn from_iter_bin<T: AsRef<[u8]>, I: IntoIterator<Item = T>>(iter: I) -> Self
pub fn from_iter_nullable_bin<T: AsRef<[u8]>, I: IntoIterator<Item = Option<T>>>( iter: I, ) -> Self
Methods from Deref<Target = dyn Array>§
pub fn display_values(&self) -> impl Display
Display logical values of the array.
For example, an i16 typed array containing the first five non-negative integers is displayed as: [0i16, 1i16, 2i16, 3i16, 4i16].
§Examples
let array = buffer![0_i16, 1, 2, 3, 4].into_array();
assert_eq!(
format!("{}", array.display_values()),
"[0i16, 1i16, 2i16, 3i16, 4i16]",
)
See also: Array::display_as, DisplayArrayAs, and DisplayOptions.
pub fn display_as(&self, options: DisplayOptions) -> impl Display
Display the array as specified by the options.
See DisplayOptions for examples.
pub fn display_tree(&self) -> impl Display
Display the tree of encodings of this array as an indented list.
While some metadata (such as length, bytes and validity-rate) are included, the logical values of the array are not displayed. To view the logical values see Array::display_as and DisplayOptions.
§Examples
let array = buffer![0_i16, 1, 2, 3, 4].into_array();
let expected = "root: vortex.primitive(i16, len=5) nbytes=10 B (100.00%)
metadata: EmptyMetadata
buffer (align=2): 10 B (100.00%)
";
assert_eq!(format!("{}", array.display_tree()), expected);
pub fn as_opt<V: VTable>(&self) -> Option<&V::Array>
Returns the array downcast to V::Array, the array type of the given VTable V.
pub fn is_constant(&self) -> bool
pub fn is_constant_opts(&self, cost: Cost) -> bool
pub fn as_constant(&self) -> Option<Scalar>
pub fn nbytes(&self) -> u64
Total size of the array in bytes, including all children and buffers.
pub fn to_array_iterator(&self) -> impl ArrayIterator + 'static
Create an ArrayIterator over the array.
pub fn serialize( &self, ctx: &ArrayContext, options: &SerializeOptions, ) -> VortexResult<Vec<ByteBuffer>>
Serialize the array into a sequence of byte buffers that should be written contiguously. This function returns a vec to avoid copying data buffers.
Optionally, padding can be included to guarantee buffer alignment and ensure zero-copy reads within the context of an external file or stream. In this case, the alignment of the first byte buffer should be respected when writing the buffers to the stream or file.
The format of this blob is a sequence of data buffers, possibly with prefixed padding, followed by a flatbuffer containing an fba::Array message, and ending with a little-endian u32 describing the length of the flatbuffer message.
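A reader of this format can locate the flatbuffer by working backwards from the end of the blob. The following std-only sketch shows that step under the assumption that all serialized buffers have been concatenated into one byte slice; `split_trailer` is a hypothetical helper, not part of this crate.

```rust
/// Split a concatenated blob into (data buffers with any padding, flatbuffer
/// message). Returns None if the blob is shorter than the lengths it declares.
fn split_trailer(blob: &[u8]) -> Option<(&[u8], &[u8])> {
    // The last 4 bytes are a little-endian u32: the flatbuffer length.
    let trailer_start = blob.len().checked_sub(4)?;
    let (rest, trailer) = blob.split_at(trailer_start);
    let fb_len =
        u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]) as usize;
    // The flatbuffer occupies the fb_len bytes just before the trailer.
    let fb_start = rest.len().checked_sub(fb_len)?;
    Some(rest.split_at(fb_start))
}
```

Putting the length at the end lets a reader find the metadata from the tail of a file or stream segment without scanning the data buffers first.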
pub fn to_array_stream(&self) -> impl ArrayStream + 'static
Create an ArrayStream over the array.
pub fn as_null_typed(&self) -> NullTyped<'_>
Downcasts the array for null-specific behavior.
pub fn as_bool_typed(&self) -> BoolTyped<'_>
Downcasts the array for bool-specific behavior.
pub fn as_primitive_typed(&self) -> PrimitiveTyped<'_>
Downcasts the array for primitive-specific behavior.
pub fn as_decimal_typed(&self) -> DecimalTyped<'_>
Downcasts the array for decimal-specific behavior.
pub fn as_utf8_typed(&self) -> Utf8Typed<'_>
Downcasts the array for utf8-specific behavior.
pub fn as_binary_typed(&self) -> BinaryTyped<'_>
Downcasts the array for binary-specific behavior.
pub fn as_struct_typed(&self) -> StructTyped<'_>
Downcasts the array for struct-specific behavior.
pub fn as_list_typed(&self) -> ListTyped<'_>
Downcasts the array for list-specific behavior.
pub fn as_extension_typed(&self) -> ExtensionTyped<'_>
Downcasts the array for extension-specific behavior.
Trait Implementations§
impl ArrayAccessor<[u8]> for VarBinViewArray
fn with_iterator<F: for<'a> FnOnce(&mut dyn Iterator<Item = Option<&'a [u8]>>) -> R, R>( &self, f: F, ) -> VortexResult<R>
impl AsRef<dyn Array> for VarBinViewArray
impl Clone for VarBinViewArray
fn clone(&self) -> VarBinViewArray
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for VarBinViewArray
impl Deref for VarBinViewArray
impl From<VarBinViewArray> for ArrayRef
fn from(value: VarBinViewArray) -> ArrayRef
impl<'a> FromIterator<Option<&'a [u8]>> for VarBinViewArray
impl<'a> FromIterator<Option<&'a str>> for VarBinViewArray
impl FromIterator<Option<String>> for VarBinViewArray
impl FromIterator<Option<Vec<u8>>> for VarBinViewArray
impl IntoArray for VarBinViewArray
fn into_array(self) -> ArrayRef
impl ValidityHelper for VarBinViewArray
Auto Trait Implementations§
impl !Freeze for VarBinViewArray
impl !RefUnwindSafe for VarBinViewArray
impl Send for VarBinViewArray
impl Sync for VarBinViewArray
impl Unpin for VarBinViewArray
impl !UnwindSafe for VarBinViewArray
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.