§🌪️ Vortex
📚 Documentation | 📊 Performance Benchmarks
§Overview
Vortex is a next-generation columnar file format and toolkit designed for high-performance data analytics. It provides:
- ⚡️ Blazing Fast Performance
  - 100-200x faster random access reads than Apache Parquet
  - 2-10x faster scans with similar compression ratios and write throughput
  - Efficient support for wide tables with zero-copy/zero-parse metadata
- 🔧 Extensible Architecture
  - Modeled after Apache DataFusion’s extensible approach
  - Pluggable encoding system
  - Zero-copy compatibility with Apache Arrow
🚧 Development Status: This project is under active development. APIs and file formats may change, and some features are still being implemented.
§Key Features
§Core Capabilities
- ✨ Logical Types - Clean separation between logical schema and physical layout
- 🔄 Zero-Copy Arrow Integration - Seamless conversion to/from Apache Arrow arrays
- 🧩 Extensible Encodings - Pluggable physical layouts with built-in optimizations
- 📦 Cascading Compression - Support for nested encoding schemes
- 🚀 High-Performance Computing - Optimized compute kernels for encoded data
- 📊 Rich Statistics - Lazy-loaded summary statistics for optimization
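To make the idea of cascading compression concrete, here is a minimal standalone sketch in which a dictionary encoding produces integer codes and a second, run-length encoding is applied to those codes. It does not use the Vortex API; the functions `dict_encode`, `rle_encode`, and `decode` are hypothetical names chosen for illustration only.

```rust
use std::collections::HashMap;

/// Layer 1 (hypothetical): dictionary-encode strings into distinct values plus one code per row.
fn dict_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dict: Vec<String> = Vec::new();
    let mut index: HashMap<String, u32> = HashMap::new();
    let mut codes = Vec::with_capacity(values.len());
    for v in values {
        let code = match index.get(*v) {
            Some(&c) => c,
            None => {
                // First time we see this value: assign the next free code.
                let c = dict.len() as u32;
                dict.push(v.to_string());
                index.insert(v.to_string(), c);
                c
            }
        };
        codes.push(code);
    }
    (dict, codes)
}

/// Layer 2 (hypothetical): run-length encode the codes as (code, run length) pairs.
fn rle_encode(codes: &[u32]) -> Vec<(u32, u32)> {
    let mut runs: Vec<(u32, u32)> = Vec::new();
    for &c in codes {
        if let Some(last) = runs.last_mut() {
            if last.0 == c {
                last.1 += 1;
                continue;
            }
        }
        runs.push((c, 1));
    }
    runs
}

/// Decoding undoes the layers in reverse order: expand the runs, then look up the dictionary.
fn decode(dict: &[String], runs: &[(u32, u32)]) -> Vec<String> {
    runs.iter()
        .flat_map(|&(code, len)| {
            std::iter::repeat(dict[code as usize].clone()).take(len as usize)
        })
        .collect()
}
```

For the input `["a", "a", "a", "b", "b", "a"]`, the first layer yields a two-entry dictionary and six codes, and the second layer reduces those codes to three runs. Real encodings would choose layers based on statistics of the data rather than always applying both.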
§Technical Architecture
§Logical vs Physical Design
Vortex strictly separates logical and physical concerns:
- Logical Layer: Defines data types and schema
- Physical Layer: Handles encoding and storage implementation
- Built-in Encodings: Compatible with Apache Arrow’s memory format
- Extension Encodings: Optimized compression schemes (RLE, dictionary, etc.)
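The separation described above can be illustrated with a small standalone sketch: one logical type, several physical layouts, and a conversion back to a plain representation. The types `LogicalType` and `PhysicalLayout` are hypothetical and are not part of the Vortex API.

```rust
/// Logical layer: what the values mean, independent of storage.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalType {
    Int32,
}

/// Physical layer: three hypothetical layouts for the same logical column.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalLayout {
    Plain(Vec<i32>),
    Constant { value: i32, len: usize },
    RunLength(Vec<(i32, usize)>),
}

impl PhysicalLayout {
    /// Every layout in this sketch stores the same logical type.
    fn logical_type(&self) -> LogicalType {
        LogicalType::Int32
    }

    /// Number of logical rows, computed without materializing the values.
    fn len(&self) -> usize {
        match self {
            PhysicalLayout::Plain(v) => v.len(),
            PhysicalLayout::Constant { len, .. } => *len,
            PhysicalLayout::RunLength(runs) => runs.iter().map(|&(_, n)| n).sum(),
        }
    }

    /// Decode any layout into the plain, uncompressed representation.
    fn to_plain(&self) -> Vec<i32> {
        match self {
            PhysicalLayout::Plain(v) => v.clone(),
            PhysicalLayout::Constant { value, len } => vec![*value; *len],
            PhysicalLayout::RunLength(runs) => runs
                .iter()
                .flat_map(|&(v, n)| std::iter::repeat(v).take(n))
                .collect(),
        }
    }
}
```

A consumer that only cares about the logical values can call `to_plain` on any layout, while a writer remains free to pick whichever physical layout is smallest for the data at hand.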
§Quick Start
§Installation
§Rust Crate
All features are exported through the main vortex crate.
cargo add vortex
§Python Package
uv add vortex-array
§Command Line UI (vx)
For browsing the structure of Vortex files, you can use the vx command-line tool.
# Install latest release
cargo install vortex-tui --locked
# Or build from source
cargo install --path vortex-tui --locked
# Usage
vx browse <file>
§Development Setup
§Prerequisites (macOS)
# Optional but recommended dependencies
brew install flatbuffers protobuf # For .fbs and .proto files
brew install duckdb # For benchmarks
# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# or
brew install rustup
# Initialize submodules
git submodule update --init --recursive
# Setup dependencies with uv
uv sync --all-packages
§Performance Optimization
For optimal performance, use MiMalloc:
// Requires the mimalloc crate as a dependency.
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL_ALLOC: MiMalloc = MiMalloc;
§Project Information
§License
Licensed under the Apache License, Version 2.0
§Governance
Vortex is committed to remaining open-source, following governance models inspired by the Substrait project and Apache Software Foundation.
§Contributing
See CONTRIBUTING.md for guidelines.
§Acknowledgments 🏆
This project builds upon groundbreaking work from the academic and open-source communities:
§Key Research Papers
- BtrBlocks - Efficient columnar compression
- FastLanes - High-performance integer compression
- FSST - Fast random access string compression
- ALP - Adaptive lossless floating-point compression
- Procella - YouTube’s unified data system
- Cloud Object Storage Analytics - High-performance analytics
- ClickHouse - Fast analytics for everyone
§Open Source Inspiration
- Apache Arrow & Apache DataFusion
- parquet2 by Jorge Leitao
- DuckDB
- Velox & Nimble
Thanks to all contributors who have shared their knowledge and code with the community! 🚀
Re-exports§
pub use vortex_file as file;
pub use vortex_btrblocks as compressor;
pub use vortex_buffer as buffer;
pub use vortex_dtype as dtype;
pub use vortex_error as error;
pub use vortex_expr as expr;
pub use vortex_flatbuffers as flatbuffers;
pub use vortex_ipc as ipc;
pub use vortex_layout as layout;
pub use vortex_mask as mask;
pub use vortex_proto as proto;
pub use vortex_scalar as scalar;
Modules§
- accessor
- aliases
- Re-exports of third-party crates we use in the API.
- arcref
- arrays
- All the built-in encoding schemes and arrays.
- arrow
- Utilities to work with Arrow data and types.
- builders
- Builders for Vortex arrays.
- compress
- compute
- Compute kernels on top of Vortex Arrays.
- encodings
- iter
- Iterator over slices of an array, and related utilities.
- nbytes
- patches
- serde
- stats
- Traits and utilities to compute and access array statistics.
- stream
- validity
- Array validity and nullability behavior, used by arrays and compute functions.
- variants
- This module defines array traits for each Vortex DType.
- vtable
- This module contains the VTable definitions for a Vortex Array.
Macros§
- match_each_decimal_value
- match_each_decimal_value_type
- Macro to match over each decimal value type, binding the corresponding native type (from DecimalValueType)
- register_kernel
- Register a kernel for a compute function. See each compute function for the correct type of kernel to register.
- try_from_array_ref
Structs§
- EmptyMetadata
- Empty array metadata
- ProstMetadata
- A utility wrapper for Prost metadata serialization.
- VTableContext
- A collection of encodings that can be addressed by a u16 positional index. This is used to map array encodings and layout encodings when reading from a file.
- VTableRegistry
- A registry of encodings that can be used to construct a context for serde.
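The idea of addressing encodings by a u16 positional index can be sketched as follows. This is an illustrative standalone model, not the actual `VTableContext` API: the type `EncodingContext`, its methods, and the `example.*` identifiers are all hypothetical.

```rust
/// Hypothetical context: encoding identifiers stored in insertion order,
/// so each one can be referred to by its compact u16 position.
struct EncodingContext {
    ids: Vec<String>,
}

impl EncodingContext {
    fn new() -> Self {
        EncodingContext { ids: Vec::new() }
    }

    /// Writer side: return the position of `id`, registering it if unseen.
    fn index_of_or_insert(&mut self, id: &str) -> u16 {
        if let Some(pos) = self.ids.iter().position(|e| e.as_str() == id) {
            return pos as u16;
        }
        self.ids.push(id.to_string());
        (self.ids.len() - 1) as u16
    }

    /// Reader side: resolve a stored position back to the encoding identifier.
    fn lookup(&self, index: u16) -> Option<&str> {
        self.ids.get(index as usize).map(|s| s.as_str())
    }
}
```

Under this model a file would store only the small integer next to each array, plus the ordered list of identifiers once, and a reader would resolve positions through `lookup`. An unknown position yields `None`, which a reader could surface as an error.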
Enums§
- Canonical
- The set of canonical array encodings, also the set of encodings that can be transferred to Arrow with zero-copy.
Traits§
- Array
- The base trait for all Vortex arrays.
- ArrayBufferVisitor
- ArrayCanonicalImpl
- Implementation trait for canonicalization functions.
- ArrayChildVisitor
- ArrayExt
- ArrayImpl
- A trait used to encapsulate common implementation behaviour for a Vortex Array.
- ArrayStatistics
- Extension functions for arrays that provide statistics.
- ArrayStatisticsImpl
- ArrayValidityImpl
- Implementation trait for validity functions.
- ArrayVariants
- ArrayVariantsImpl
- Implementation trait for downcasting to type-specific traits.
- ArrayVisitor
- ArrayVisitorExt
- ArrayVisitorImpl
- DeserializeMetadata
- Encoding
- Marker trait for array encodings with their associated Array type.
- IntoArray
- Trait for converting a type into a Vortex ArrayRef.
- SerializeMetadata
- ToCanonical
- Trait for types that can be converted from an owned type into an owned array variant.
- TryFromArrayRef
- Trait for converting a type from a Vortex ArrayRef, returning an error if the conversion fails.
- TryIntoArray
- Trait for converting a type into a Vortex ArrayRef, returning an error if the conversion fails.
Type Aliases§
- ArrayContext
- A collection of array encodings.
- ArrayRef
- A reference counted pointer to a dynamic Array trait object.
- ArrayRegistry
- EncodingId
- EncodingId is a globally unique name of the array’s encoding.
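The pattern behind a reference counted pointer to a dynamic trait object can be sketched with the standard library alone. The names below (`Column`, `ColumnRef`, `IntoColumn`, `PlainInts`, `ConstantInt`) are hypothetical stand-ins chosen for illustration and do not reflect the actual definitions of `Array`, `ArrayRef`, or `IntoArray`.

```rust
use std::sync::Arc;

/// Hypothetical base trait implemented by every physical layout.
trait Column {
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Option<i32>;
    fn encoding_id(&self) -> &'static str;
}

/// A cheaply clonable, type-erased handle to any column.
type ColumnRef = Arc<dyn Column>;

struct PlainInts(Vec<i32>);

struct ConstantInt {
    value: i32,
    len: usize,
}

impl Column for PlainInts {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn get(&self, index: usize) -> Option<i32> {
        self.0.get(index).copied()
    }
    fn encoding_id(&self) -> &'static str {
        "example.plain"
    }
}

impl Column for ConstantInt {
    fn len(&self) -> usize {
        self.len
    }
    fn get(&self, index: usize) -> Option<i32> {
        if index < self.len { Some(self.value) } else { None }
    }
    fn encoding_id(&self) -> &'static str {
        "example.constant"
    }
}

/// Conversion from any concrete column into the type-erased handle.
trait IntoColumn {
    fn into_column(self) -> ColumnRef;
}

impl<T: Column + 'static> IntoColumn for T {
    fn into_column(self) -> ColumnRef {
        Arc::new(self)
    }
}

/// Code written against `ColumnRef` works for every layout.
fn total_len(columns: &[ColumnRef]) -> usize {
    columns.iter().map(|c| c.len()).sum()
}
```

Cloning a `ColumnRef` only bumps a reference count, so handles can be shared across threads and data structures without copying the underlying buffers.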