pub enum Canonical {
    Null(NullArray),
    Bool(BoolArray),
    Primitive(PrimitiveArray),
    Decimal(DecimalArray),
    Struct(StructArray),
    List(ListArray),
    VarBinView(VarBinViewArray),
    Extension(ExtensionArray),
}
An enum capturing the default uncompressed encodings for each Vortex type.
Any array can be decoded into canonical form via the to_canonical trait method. This is the simplest encoding for a type: it is not itself compressed, but it may contain compressed child arrays.
Canonical form is useful for doing type-specific compute where you need to know that all elements are laid out decompressed and contiguous in memory.
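As a minimal sketch of the idea, assuming nothing about Vortex's actual types, the Rust below models one compressed encoding (a hypothetical RunLengthArray) and a hypothetical ToCanonical trait whose to_canonical method decodes into a toy Canonical::Primitive holding a contiguous buffer. The names mirror the description above but are stand-ins, not the vortex-array API. The point is that type-specific compute such as sum is written once against the canonical layout, whatever the input encoding was.

```rust
/// Toy stand-in for a canonical primitive array: values decompressed and contiguous.
#[derive(Debug, Clone, PartialEq)]
struct PrimitiveArray(Vec<i64>);

/// Toy canonical enum with one variant; the real enum has one per logical type.
#[derive(Debug, PartialEq)]
enum Canonical {
    Primitive(PrimitiveArray),
}

/// Hypothetical trait: every encoding can decode itself into canonical form.
trait ToCanonical {
    fn to_canonical(&self) -> Canonical;
}

/// Hypothetical compressed encoding: run-length encoded values.
struct RunLengthArray {
    values: Vec<i64>,
    run_lengths: Vec<usize>,
}

impl ToCanonical for RunLengthArray {
    fn to_canonical(&self) -> Canonical {
        // Expand every (value, run length) pair into one contiguous buffer.
        let mut out: Vec<i64> = Vec::with_capacity(self.run_lengths.iter().sum::<usize>());
        for (&value, &n) in self.values.iter().zip(&self.run_lengths) {
            out.extend(std::iter::repeat(value).take(n));
        }
        Canonical::Primitive(PrimitiveArray(out))
    }
}

impl ToCanonical for PrimitiveArray {
    fn to_canonical(&self) -> Canonical {
        // Already canonical: nothing to decode, just hand back a copy.
        Canonical::Primitive(self.clone())
    }
}

/// Type-specific compute written once against the canonical layout.
fn sum(array: &dyn ToCanonical) -> i64 {
    match array.to_canonical() {
        Canonical::Primitive(PrimitiveArray(values)) => values.iter().sum::<i64>(),
    }
}

fn main() {
    let rle = RunLengthArray { values: vec![7, 1], run_lengths: vec![3, 2] };
    let plain = PrimitiveArray(vec![1, 2, 3]);
    println!("{} {}", sum(&rle), sum(&plain)); // prints "23 6"
}
```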
§Laziness
Canonical form is not recursive, so while a StructArray is the canonical format for any Struct type, individual column child arrays may still be compressed. This allows compute over Vortex arrays to push decoding as late as possible, and ideally many child arrays never need to be decoded into canonical form at all, depending on the computation.
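A toy model of this laziness, with every name hypothetical: a StructArray that is already canonical at the top level keeps its children compressed (here a hypothetical ConstantColumn), and a per-column computation decodes only the column it reads. The decode counter exists only to make that visible; it is not part of any real API.

```rust
use std::cell::Cell;

/// Hypothetical compressed child: one value repeated `len` times.
struct ConstantColumn {
    value: i64,
    len: usize,
    decode_count: Cell<usize>, // how many times this column was decoded
}

impl ConstantColumn {
    fn new(value: i64, len: usize) -> Self {
        ConstantColumn { value, len, decode_count: Cell::new(0) }
    }

    /// Decode into a contiguous, canonical buffer (the expensive step).
    fn decode(&self) -> Vec<i64> {
        self.decode_count.set(self.decode_count.get() + 1);
        vec![self.value; self.len]
    }
}

/// Toy struct array: canonical at the top level, children still compressed.
struct StructArray {
    names: Vec<&'static str>,
    fields: Vec<ConstantColumn>,
}

impl StructArray {
    /// Compute over one column decodes that column only.
    fn sum_field(&self, name: &str) -> Option<i64> {
        let idx = self.names.iter().position(|n| *n == name)?;
        Some(self.fields[idx].decode().iter().sum::<i64>())
    }
}

fn main() {
    let table = StructArray {
        names: vec!["a", "b"],
        fields: vec![ConstantColumn::new(2, 1_000), ConstantColumn::new(5, 1_000)],
    };
    println!("{:?}", table.sum_field("a")); // prints "Some(2000)"
    // Column "b" was never needed, so it was never decoded.
    let counts = (table.fields[0].decode_count.get(), table.fields[1].decode_count.get());
    println!("{:?}", counts); // prints "(1, 0)"
}
```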
§Arrow interoperability
All of the Vortex canonical encodings have an equivalent Arrow encoding that can be built zero-copy, and the corresponding Arrow array types can also be built directly.
The full list of canonical types and their equivalent Arrow array types is:
NullArray: arrow_array::NullArray
BoolArray: arrow_array::BooleanArray
PrimitiveArray: arrow_array::PrimitiveArray
DecimalArray: arrow_array::Decimal128Array and arrow_array::Decimal256Array
StructArray: arrow_array::StructArray
ListArray: arrow_array::ListArray
VarBinViewArray: arrow_array::GenericByteViewArray
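The zero-copy claim can be illustrated with a toy sketch in which both sides end up holding the same reference-counted allocation. PrimitiveArray and ArrowInt32Array here are hypothetical stand-ins, not the vortex-array or arrow-rs types, and the sketch ignores validity and other metadata that a real conversion would also carry; only the buffer-sharing idea is shown.

```rust
use std::sync::Arc;

/// Toy Vortex-side canonical primitive array (hypothetical layout).
struct PrimitiveArray {
    buffer: Arc<[i32]>,
}

/// Toy Arrow-side array; NOT the real arrow_array::PrimitiveArray.
struct ArrowInt32Array {
    values: Arc<[i32]>,
}

impl PrimitiveArray {
    /// Zero-copy export: the Arrow-side array takes over the same
    /// reference-counted allocation instead of copying elements.
    fn into_arrow(self) -> ArrowInt32Array {
        ArrowInt32Array { values: self.buffer }
    }
}

fn main() {
    let buffer: Arc<[i32]> = Arc::from(vec![1, 2, 3]);
    let vortex = PrimitiveArray { buffer: Arc::clone(&buffer) };
    let arrow = vortex.into_arrow();
    // Both handles point at the same memory: no element was copied.
    println!("{}", Arc::ptr_eq(&buffer, &arrow.values)); // prints "true"
}
```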
Vortex uses a logical type system, unlike Arrow, which uses physical encodings for its types. For example, there are at least six valid physical encodings for a Utf8 array, which can create ambiguity: if you receive an Arrow array, compress it using Vortex, and later decompress it to pass to a compute kernel, there are multiple suitable Arrow array variants that could hold the data.
To disambiguate, we choose a canonical physical encoding for every Vortex DType, which corresponds to an arrow-rs arrow_schema::DataType.
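A sketch of this disambiguation, using hypothetical toy enums rather than the real Vortex DType or arrow_schema::DataType: importing is many-to-one (Utf8, LargeUtf8 and Utf8View all become the logical Utf8 type), while exporting picks exactly one canonical physical type per logical type. In line with the views section of this page, the sketch exports Utf8 and Binary as view types, and Bool as Boolean per the table above.

```rust
/// Toy subset of Vortex logical types (hypothetical; the real DType has more).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    Bool,
    Utf8,
    Binary,
}

/// Toy subset of Arrow physical types (not arrow_schema::DataType).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowType {
    Boolean,
    Utf8,
    LargeUtf8,
    Utf8View,
    Binary,
    LargeBinary,
    BinaryView,
}

/// Import: several physical Arrow types collapse into one logical type.
fn to_logical(arrow: ArrowType) -> DType {
    match arrow {
        ArrowType::Boolean => DType::Bool,
        ArrowType::Utf8 | ArrowType::LargeUtf8 | ArrowType::Utf8View => DType::Utf8,
        ArrowType::Binary | ArrowType::LargeBinary | ArrowType::BinaryView => DType::Binary,
    }
}

/// Export: each logical type maps to exactly one canonical physical type.
fn canonical_arrow_type(dtype: DType) -> ArrowType {
    match dtype {
        DType::Bool => ArrowType::Boolean,
        DType::Utf8 => ArrowType::Utf8View, // views as the canonical string encoding
        DType::Binary => ArrowType::BinaryView,
    }
}

fn main() {
    // A LargeUtf8 input is not exported back as LargeUtf8.
    let exported = canonical_arrow_type(to_logical(ArrowType::LargeUtf8));
    println!("{:?}", exported); // prints "Utf8View"
}
```

The round trip in main shows the consequence described above: the physical type you started with is not preserved, but the logical type is, and the export is always the same.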
§Views support
Binary and String views, also known as “German strings”, are a better encoding format for nearly all use cases. Variable-length binary views are part of the Apache Arrow spec and are fully supported by the DataFusion query engine. We use them as our canonical string encoding for all Utf8 and Binary typed arrays in Vortex. They provide considerably faster filter execution than the core StringArray and BinaryArray types, because a filter only has to copy the fixed-size views and can keep sharing the original data buffers. The cost is that those buffers may then hold items no longer referenced by any view, which potentially need garbage collection to clear from memory.
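To make the trade-off concrete, here is a simplified sketch of the Arrow view layout: each view is 16 bytes, strings of up to 12 bytes are stored inline, and longer ones are stored as a 4-byte prefix plus a buffer index and offset into a data buffer. StringViewArray, filter and gc are hypothetical names, and the sketch uses a single data buffer. It shows why filtering is cheap (only views are copied, data buffers are shared) and why unreferenced bytes linger until the buffers are rewritten.

```rust
use std::sync::Arc;

/// Strings of at most this many bytes are stored inline in the view.
const INLINE_MAX: usize = 12;

/// One 16-byte view: 4-byte length, then either up to 12 inline bytes or a
/// 4-byte prefix, a 4-byte buffer index and a 4-byte offset (little-endian).
#[derive(Clone, Copy)]
struct View([u8; 16]);

fn read_u32(b: &[u8]) -> usize {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize
}

struct StringViewArray {
    views: Vec<View>,
    buffers: Vec<Arc<Vec<u8>>>, // shared data buffers referenced by long views
}

impl StringViewArray {
    /// Build views, copying long strings into one fresh data buffer.
    fn build(values: &[&[u8]]) -> Self {
        let mut data: Vec<u8> = Vec::new();
        let mut views: Vec<View> = Vec::with_capacity(values.len());
        for bytes in values {
            let mut v = [0u8; 16];
            v[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
            if bytes.len() <= INLINE_MAX {
                v[4..4 + bytes.len()].copy_from_slice(bytes);
            } else {
                v[4..8].copy_from_slice(&bytes[0..4]); // prefix for fast comparisons
                v[8..12].copy_from_slice(&0u32.to_le_bytes()); // buffer index
                v[12..16].copy_from_slice(&(data.len() as u32).to_le_bytes()); // offset
                data.extend_from_slice(bytes);
            }
            views.push(View(v));
        }
        StringViewArray { views, buffers: vec![Arc::new(data)] }
    }

    fn from_strs(values: &[&str]) -> Self {
        let bytes: Vec<&[u8]> = values.iter().map(|s| s.as_bytes()).collect();
        Self::build(&bytes)
    }

    fn value(&self, i: usize) -> &[u8] {
        let v = &self.views[i].0;
        let len = read_u32(&v[0..4]);
        if len <= INLINE_MAX {
            &v[4..4 + len]
        } else {
            let (buf, off) = (read_u32(&v[8..12]), read_u32(&v[12..16]));
            &self.buffers[buf][off..off + len]
        }
    }

    /// Filtering copies only the 16-byte views; data buffers are shared, not copied.
    fn filter(&self, keep: &[bool]) -> Self {
        let views: Vec<View> = self.views.iter().zip(keep).filter(|pair| *pair.1).map(|(v, _)| *v).collect();
        StringViewArray { views, buffers: self.buffers.clone() }
    }

    /// "Garbage collection": rebuild so the buffers hold only bytes still referenced.
    fn gc(&self) -> Self {
        let live: Vec<&[u8]> = (0..self.views.len()).map(|i| self.value(i)).collect();
        Self::build(&live)
    }

    fn data_bytes(&self) -> usize {
        self.buffers.iter().map(|b| b.len()).sum()
    }
}

fn main() {
    let arr = StringViewArray::from_strs(&["tiny", "a string longer than twelve bytes", "another long string value"]);
    let filtered = arr.filter(&[false, true, false]);
    let compacted = filtered.gc();
    println!("{} {} {}", arr.data_bytes(), filtered.data_bytes(), compacted.data_bytes()); // prints "58 58 33"
}
```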
Variants§
Null(NullArray)
Bool(BoolArray)
Primitive(PrimitiveArray)
Decimal(DecimalArray)
Struct(StructArray)
List(ListArray)
VarBinView(VarBinViewArray)
Extension(ExtensionArray)
Implementations§
impl Canonical
pub fn into_null(self) -> VortexResult<NullArray>
pub fn into_bool(self) -> VortexResult<BoolArray>
pub fn into_primitive(self) -> VortexResult<PrimitiveArray>
pub fn into_decimal(self) -> VortexResult<DecimalArray>
pub fn into_struct(self) -> VortexResult<StructArray>
pub fn into_list(self) -> VortexResult<ListArray>
pub fn into_varbinview(self) -> VortexResult<VarBinViewArray>
pub fn into_extension(self) -> VortexResult<ExtensionArray>
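The listing above gives only signatures. A plausible reading, which this page does not state and which the sketch below simply assumes, is that each into_* accessor returns the wrapped array when the enum holds that variant and an error otherwise. The toy has just two variants and substitutes a hypothetical Result<T, String> alias for VortexResult.

```rust
/// Hypothetical stand-in for VortexResult; the real error type is different.
type VortexResult<T> = Result<T, String>;

#[derive(Debug, PartialEq)]
struct BoolArray(Vec<bool>);

#[derive(Debug, PartialEq)]
struct PrimitiveArray(Vec<i64>);

/// Two-variant toy version of the Canonical enum.
enum Canonical {
    Bool(BoolArray),
    Primitive(PrimitiveArray),
}

impl Canonical {
    fn variant_name(&self) -> &'static str {
        match self {
            Canonical::Bool(_) => "Bool",
            Canonical::Primitive(_) => "Primitive",
        }
    }

    /// Unwrap the Bool variant, or fail naming the variant actually present.
    fn into_bool(self) -> VortexResult<BoolArray> {
        match self {
            Canonical::Bool(array) => Ok(array),
            other => Err(format!("expected Bool, found {}", other.variant_name())),
        }
    }

    /// Unwrap the Primitive variant, or fail naming the variant actually present.
    fn into_primitive(self) -> VortexResult<PrimitiveArray> {
        match self {
            Canonical::Primitive(array) => Ok(array),
            other => Err(format!("expected Primitive, found {}", other.variant_name())),
        }
    }
}

fn main() {
    let canonical = Canonical::Primitive(PrimitiveArray(vec![1, 2, 3]));
    match canonical.into_bool() {
        Ok(array) => println!("bools: {:?}", array),
        Err(e) => println!("error: {}", e), // prints "error: expected Bool, found Primitive"
    }
}
```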
Trait Implementations§
Auto Trait Implementations§
impl !Freeze for Canonical
impl !RefUnwindSafe for Canonical
impl Send for Canonical
impl Sync for Canonical
impl Unpin for Canonical
impl !UnwindSafe for Canonical
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.