ElastiCube Library
A high-performance, embeddable OLAP cube builder and query library written in Rust with Python bindings.
Overview
ElastiCube Library provides fast, in-memory online analytical processing (OLAP) over multidimensional data without requiring pre-aggregation or external services. Built on Apache Arrow and DataFusion, it offers columnar storage and efficient query execution for analytical workloads.
Features
- Columnar Storage: Efficient column-by-column storage using Apache Arrow
- Dynamic Aggregations: Query raw data without pre-aggregation
- Multi-Source Data: Load from CSV, Parquet, JSON, and RecordBatch sources
- Data Updates: Append, delete, and update rows incrementally
- Calculated Fields: Define virtual dimensions and calculated measures using SQL expressions
- Query Optimization: Built-in caching and performance optimizations
- OLAP Operations: Slice, dice, drill-down, and roll-up operations
- Python Bindings: Full Python API with PyArrow, Pandas, and Polars integration
- Embeddable: Pure Rust library with no cloud dependencies
- Fast: Compiled Rust with parallel query execution via DataFusion
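To make "columnar storage" and "dynamic aggregations" concrete, here is a minimal, std-only Rust sketch. The `ColumnarTable` type and its fields are hypothetical and not part of elasticube-core; the sketch only shows the idea that each field lives in its own contiguous vector, and that a GROUP BY sum is computed from the raw rows at query time while reading only the columns the query needs.

```rust
use std::collections::HashMap;

/// Hypothetical column-oriented table (not an elasticube-core type):
/// each field is stored as its own contiguous Vec, similar in spirit
/// to how Arrow lays out a record batch.
pub struct ColumnarTable {
    pub region: Vec<String>, // dimension column
    pub revenue: Vec<f64>,   // measure column
}

impl ColumnarTable {
    /// SUM(revenue) GROUP BY region, computed on the fly from raw rows
    /// instead of being read from a pre-aggregated summary.
    pub fn sum_revenue_by_region(&self) -> HashMap<String, f64> {
        let mut totals: HashMap<String, f64> = HashMap::new();
        // Only the two columns this query needs are scanned; any other
        // columns in the table would never be touched.
        for (region, revenue) in self.region.iter().zip(self.revenue.iter()) {
            *totals.entry(region.clone()).or_insert(0.0) += *revenue;
        }
        totals
    }
}
```

Because nothing is pre-aggregated, any grouping can be answered from the same raw columns; the trade-off is that each query pays for the scan.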
Architecture
elasticube_library/
├── elasticube-core/              # Rust core library
│   ├── src/
│   │   ├── lib.rs                # Public API exports
│   │   ├── builder.rs            # ElastiCubeBuilder for cube construction
│   │   ├── cube/                 # Core cube implementation
│   │   │   ├── mod.rs            # ElastiCube, Dimension, Measure, etc.
│   │   │   ├── schema.rs         # Schema management
│   │   │   ├── hierarchy.rs      # Hierarchical dimensions
│   │   │   ├── calculated.rs     # Calculated measures & virtual dimensions
│   │   │   └── updates.rs        # Data update operations
│   │   ├── query.rs              # QueryBuilder and execution
│   │   ├── cache.rs              # Query result caching
│   │   ├── optimization.rs       # Performance optimizations
│   │   ├── storage.rs            # Data storage layer
│   │   └── sources.rs            # CSV, Parquet, JSON data sources
│   └── Cargo.toml
│
├── elasticube-py/                # Python bindings (PyO3)
│   ├── src/lib.rs                # Python API wrapper
│   └── Cargo.toml
│
└── examples/                     # Usage examples
    ├── query_demo.rs             # Comprehensive query examples
    ├── calculated_fields_demo.rs # Calculated fields demo
    ├── data_updates_demo.rs      # Data update operations
    └── python/                   # Python examples
        ├── query_demo.py
        ├── polars_demo.py
        └── visualization_demo.py
Quick Start
Rust
Add to your Cargo.toml:
[dependencies]
elasticube-core = { path = "elasticube-core" }
tokio = { version = "1", features = ["full"] }
arrow = "54"
Build and query a cube:
use elasticube_core::{…};
use arrow::datatypes::DataType;
use std::sync::Arc;

#[tokio::main]
async fn main() { … }
Key Types & Functions:
- ElastiCubeBuilder: Builder for constructing cubes (elasticube-core/src/builder.rs:19)
- ElastiCube::query(): Creates a query builder (elasticube-core/src/cube/mod.rs:133)
- QueryBuilder::execute(): Executes a query (elasticube-core/src/query.rs:225)
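The Rust snippets in this README chain builder calls with `?`, which suggests each builder step returns a `Result`. The sketch below illustrates that fallible-builder pattern in isolation. `SpecBuilder`, `CubeSpec`, and all signatures here are hypothetical stand-ins, not the real `ElastiCubeBuilder` API.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified cube definition (illustration only).
#[derive(Debug, Default)]
pub struct CubeSpec {
    pub name: String,
    pub dimensions: Vec<String>,
    pub measures: Vec<String>,
}

/// Hypothetical builder; method names echo the README, signatures are assumed.
#[derive(Debug, Default)]
pub struct SpecBuilder {
    spec: CubeSpec,
    seen: HashSet<String>,
}

impl SpecBuilder {
    pub fn new(name: &str) -> Self {
        SpecBuilder {
            spec: CubeSpec { name: name.to_string(), ..Default::default() },
            seen: HashSet::new(),
        }
    }

    /// Rejects a field name that is already used by a dimension or measure.
    fn register(&mut self, name: &str) -> Result<(), String> {
        if self.seen.insert(name.to_string()) {
            Ok(())
        } else {
            Err(format!("duplicate field: {name}"))
        }
    }

    /// Each step consumes and returns the builder inside a Result,
    /// so callers chain steps with `?`.
    pub fn add_dimension(mut self, name: &str) -> Result<Self, String> {
        self.register(name)?;
        self.spec.dimensions.push(name.to_string());
        Ok(self)
    }

    pub fn add_measure(mut self, name: &str) -> Result<Self, String> {
        self.register(name)?;
        self.spec.measures.push(name.to_string());
        Ok(self)
    }

    /// Whole-spec validation happens once, at the end of the chain.
    pub fn build(self) -> Result<CubeSpec, String> {
        if self.spec.measures.is_empty() {
            return Err("a cube needs at least one measure".to_string());
        }
        Ok(self.spec)
    }
}

/// Usage: every step that can fail is followed by `?`.
pub fn build_sales_spec() -> Result<CubeSpec, String> {
    let spec = SpecBuilder::new("sales")
        .add_dimension("region")?
        .add_measure("revenue")?
        .build()?;
    Ok(spec)
}
```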
Python
Install the Python package:
Use in Python:
# Build cube from CSV
…
# Query and convert to Pandas
…
Python Bindings:
- PyElastiCubeBuilder: Builds cubes from Python (elasticube-py/src/lib.rs:16)
- PyElastiCube: Python cube wrapper (elasticube-py/src/lib.rs:116)
- PyQueryBuilder: Query builder (elasticube-py/src/lib.rs:332)
Advanced Features
Calculated Fields
Define derived metrics and dimensions using SQL expressions:
let cube = ElastiCubeBuilder::new(…)
    .add_measure(…)?
    .add_measure(…)?
    // Calculated measure
    .add_calculated_measure(…)?
    // Virtual dimension
    .add_virtual_dimension(…)?
    .build()?;
See CalculatedMeasure and VirtualDimension in elasticube-core/src/cube/calculated.rs
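One way to think about the two kinds of calculated fields: a virtual dimension is evaluated per row and can then be grouped on like a stored dimension, while a calculated measure that combines other measures is typically evaluated after those measures are aggregated. The sketch below assumes those semantics (it is not taken from calculated.rs); the `Sale` schema, `order_size`, and the 10-unit threshold are made up for illustration.

```rust
use std::collections::BTreeMap;

/// One raw fact row (hypothetical schema).
pub struct Sale {
    pub region: &'static str,
    pub quantity: f64,
    pub revenue: f64,
}

/// Virtual dimension: derived per row from existing columns, then usable
/// for grouping. The 10-unit threshold is arbitrary.
pub fn order_size(s: &Sale) -> &'static str {
    if s.quantity >= 10.0 { "bulk" } else { "single" }
}

/// Calculated measure avg_price = SUM(revenue) / SUM(quantity), evaluated
/// after aggregation per group (not the mean of per-row ratios).
pub fn avg_price_by<F>(rows: &[Sale], key: F) -> BTreeMap<&'static str, f64>
where
    F: Fn(&Sale) -> &'static str,
{
    // Accumulate the base measures (revenue, quantity) per group first.
    let mut sums: BTreeMap<&'static str, (f64, f64)> = BTreeMap::new();
    for s in rows {
        let e = sums.entry(key(s)).or_insert((0.0, 0.0));
        e.0 += s.revenue;
        e.1 += s.quantity;
    }
    // Then apply the expression to the aggregated values.
    sums.into_iter().map(|(k, (rev, qty))| (k, rev / qty)).collect()
}
```

Grouping by `|s| s.region` uses a stored dimension; passing `order_size` groups by the virtual one. For the EU rows in the test below, the post-aggregation result is 60/11, whereas averaging per-row prices (5.0 and 10.0) would give 7.5.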
Data Updates
Incrementally update cube data without rebuilding:
// Append new rows
let new_batch = create_record_batch(…)?;
cube.append_rows(new_batch)?;
// Delete rows matching filter
cube.delete_rows(…).await?;
// Update specific rows
cube.update_rows(…).await?;
// Consolidate fragmented batches
cube.consolidate_batches()?;
See update methods in elasticube-core/src/cube/mod.rs:279-373
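Conceptually, incremental updates on batch-oriented storage look like the sketch below: an append adds a new batch, deletes and updates rewrite rows inside existing batches, and consolidation merges many small batches into one. `BatchStore` is a hypothetical in-memory stand-in that uses plain `Vec`s instead of Arrow `RecordBatch`es, and its synchronous methods ignore the `async` nature of the real `delete_rows`/`update_rows`.

```rust
/// Hypothetical batch store: data is a list of batches, each a Vec of
/// (id, value) rows. Not the elasticube-core storage layer.
#[derive(Debug, Default)]
pub struct BatchStore {
    pub batches: Vec<Vec<(u32, i64)>>,
}

impl BatchStore {
    /// Append: new rows become a new batch; existing batches are untouched.
    pub fn append_rows(&mut self, rows: Vec<(u32, i64)>) {
        if !rows.is_empty() {
            self.batches.push(rows);
        }
    }

    /// Delete: drop rows matching `pred`, discard emptied batches,
    /// and return how many rows were removed.
    pub fn delete_rows<P: Fn(&(u32, i64)) -> bool>(&mut self, pred: P) -> usize {
        let before = self.row_count();
        for batch in &mut self.batches {
            batch.retain(|row| !pred(row));
        }
        self.batches.retain(|b| !b.is_empty());
        before - self.row_count()
    }

    /// Update: rewrite the value of every matching row in place.
    pub fn update_rows<P, F>(&mut self, pred: P, f: F) -> usize
    where
        P: Fn(&(u32, i64)) -> bool,
        F: Fn(i64) -> i64,
    {
        let mut updated = 0;
        for row in self.batches.iter_mut().flatten() {
            if pred(&*row) {
                row.1 = f(row.1);
                updated += 1;
            }
        }
        updated
    }

    /// Consolidate: merge all batches into one contiguous batch so later
    /// scans do not pay per-batch overhead after many small appends.
    pub fn consolidate_batches(&mut self) {
        let merged: Vec<(u32, i64)> = self.batches.drain(..).flatten().collect();
        if !merged.is_empty() {
            self.batches.push(merged);
        }
    }

    pub fn row_count(&self) -> usize {
        self.batches.iter().map(Vec::len).sum()
    }
}
```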
OLAP Operations
// Slice: filter on one dimension
let result = cube.query()?
    .slice(…)
    .select(…)
    .execute()
    .await?;
// Dice: filter on multiple dimensions
let result = cube.query()?
    .dice(…)
    .select(…)
    .execute()
    .await?;
See elasticube-core/src/query.rs:75-103
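The difference between slice and dice is easiest to see on plain rows. This std-only sketch filters a hypothetical `Fact` table: a slice fixes one dimension to a single value, while a dice restricts several dimensions at once, each to a set of allowed values. It does not call `QueryBuilder::slice`/`dice`, whose exact signatures are not shown here.

```rust
/// Hypothetical fact row: three dimensions and one measure.
#[derive(Debug, Clone, PartialEq)]
pub struct Fact {
    pub region: &'static str,
    pub product: &'static str,
    pub year: u16,
    pub sales: f64,
}

/// Slice: fix ONE dimension (region) to a single value.
pub fn slice<'a>(facts: &'a [Fact], region: &str) -> Vec<&'a Fact> {
    facts.iter().filter(|f| f.region == region).collect()
}

/// Dice: restrict SEVERAL dimensions (region and year) at once.
pub fn dice<'a>(facts: &'a [Fact], regions: &[&str], years: &[u16]) -> Vec<&'a Fact> {
    facts
        .iter()
        .filter(|f| regions.contains(&f.region) && years.contains(&f.year))
        .collect()
}

/// Aggregate the measure over whatever sub-cube the filter produced.
pub fn total(facts: &[&Fact]) -> f64 {
    facts.iter().map(|f| f.sales).sum()
}
```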
Hierarchies
Define drill-down paths for dimensional analysis:
let cube = ElastiCubeBuilder::new(…)
    .add_dimension(…)?
    .add_dimension(…)?
    .add_dimension(…)?
    .add_hierarchy(…)?
    .build()?;
See elasticube-core/src/cube/hierarchy.rs
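Drill-down and roll-up along a hierarchy both amount to re-aggregating the finest-grain data at a finer or coarser level. The sketch below assumes a hypothetical year > quarter > month time hierarchy over monthly rows (months numbered 1 to 12); the `Level` enum and the member formatting are illustrative only and not the types in hierarchy.rs.

```rust
use std::collections::BTreeMap;

/// Hypothetical time hierarchy levels, coarsest first.
#[derive(Debug, Clone, Copy)]
pub enum Level {
    Year,
    Quarter,
    Month,
}

/// A fact row at the finest grain (month is 1..=12).
pub struct MonthlySales {
    pub year: u16,
    pub month: u8,
    pub sales: f64,
}

/// Maps a row to its member at the given level, e.g. (2024, 5) -> "2024-Q2".
pub fn member(row: &MonthlySales, level: Level) -> String {
    match level {
        Level::Year => format!("{}", row.year),
        Level::Quarter => format!("{}-Q{}", row.year, (row.month - 1) / 3 + 1),
        Level::Month => format!("{}-{:02}", row.year, row.month),
    }
}

/// Roll-up (Month -> Quarter -> Year) and drill-down (the reverse) are the
/// same operation: re-aggregate the raw rows by the chosen level's member.
pub fn aggregate_at(rows: &[MonthlySales], level: Level) -> BTreeMap<String, f64> {
    let mut out = BTreeMap::new();
    for r in rows {
        *out.entry(member(r, level)).or_insert(0.0) += r.sales;
    }
    out
}
```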
Performance Optimization
// Enable query caching
let cube = ElastiCubeBuilder::new(…)
    .with_cache_size(…)?
    .build()?;
// Get statistics
let stats = cube.statistics();
println!(…);
// Get cache stats
let cache_stats = cube.cache_stats();
println!(…);
See elasticube-core/src/cache.rs and elasticube-core/src/optimization.rs
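A query result cache is essentially a bounded map from query to result plus hit/miss counters. This sketch keys entries by query text and evicts the oldest inserted entry first (FIFO), a simplification chosen for brevity; it is not necessarily how `QueryCache` in cache.rs works, and `QueryCacheSketch`/`Stats` are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

/// Hit/miss counters (hypothetical; compare the README's cache stats).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Stats {
    pub hits: u64,
    pub misses: u64,
}

/// Bounded cache keyed by query text with FIFO eviction.
pub struct QueryCacheSketch<V> {
    capacity: usize,
    map: HashMap<String, V>,
    order: VecDeque<String>, // insertion order, oldest at the front
    pub stats: Stats,
}

impl<V: Clone> QueryCacheSketch<V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, map: HashMap::new(), order: VecDeque::new(), stats: Stats::default() }
    }

    /// Returns the cached result on a hit; on a miss, runs `compute`,
    /// evicts the oldest entry if full, and caches the new result.
    pub fn get_or_compute<F: FnOnce() -> V>(&mut self, query: &str, compute: F) -> V {
        if let Some(v) = self.map.get(query) {
            self.stats.hits += 1;
            return v.clone();
        }
        self.stats.misses += 1;
        let v = compute();
        if self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(query.to_string(), v.clone());
        self.order.push_back(query.to_string());
        v
    }
}
```

An LRU policy would additionally move an entry to the back of `order` on every hit; FIFO keeps the sketch short.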
Python Integration
Polars (High Performance)
# Zero-copy conversion to Polars DataFrame
…  # 642x faster than to_pandas()
# Leverage Polars for further analysis
…
Pandas
# Convert to Pandas DataFrame
…
# Use familiar Pandas API
…
Visualization
…
See examples/python/ for complete examples.
Examples
Rust Examples
Run with cargo run --example <name>:
- query_demo: Comprehensive query examples covering all features
- calculated_fields_demo: Virtual dimensions and calculated measures
- data_updates_demo: Append, delete, and update operations
Python Examples
Located in examples/python/:
- query_demo.py: Basic queries and aggregations
- polars_demo.py: High-performance Polars integration
- visualization_demo.py: Chart creation with Matplotlib
- serialization_demo.py: Save and load cubes
- elasticube_tutorial.ipynb: Interactive Jupyter notebook
Development
Build and Test
# Build Rust library
# Run all tests (84 tests)
# Run specific test module
# Build Python bindings
Project Structure
- Core Types: ElastiCube, Dimension, Measure, Hierarchy (elasticube-core/src/cube/)
- Builder Pattern: ElastiCubeBuilder (elasticube-core/src/builder.rs)
- Query Engine: QueryBuilder, QueryResult (elasticube-core/src/query.rs)
- Data Sources: CsvSource, ParquetSource, JsonSource (elasticube-core/src/sources.rs)
- Caching: QueryCache, CacheStats (elasticube-core/src/cache.rs)
- Optimization: CubeStatistics, OptimizationConfig (elasticube-core/src/optimization.rs)
Performance
- Apache Arrow: Columnar memory format for efficient data access
- DataFusion: SQL query optimizer and execution engine
- Parallel Execution: Multi-threaded query processing
- Query Caching: Automatic result caching for repeated queries
- Zero-Copy: Efficient data transfer between Rust and Python via PyArrow
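Parallel query engines such as DataFusion commonly split an aggregation into a partial aggregate per partition followed by a final merge. The std-only sketch below shows that two-phase pattern with scoped threads (`std::thread::scope`, stable since Rust 1.63); `parallel_group_sum` is a hypothetical helper illustrating the technique, not DataFusion's implementation.

```rust
use std::collections::HashMap;
use std::thread;

/// SUM(values) GROUP BY keys using up to `n_threads` workers.
/// Phase 1: each worker aggregates its own chunk into a private map (no locks).
/// Phase 2: the per-worker partial maps are merged into the final result.
pub fn parallel_group_sum(keys: &[u32], values: &[f64], n_threads: usize) -> HashMap<u32, f64> {
    assert_eq!(keys.len(), values.len(), "columns must have equal length");
    let n_threads = n_threads.max(1);
    let chunk = ((keys.len() + n_threads - 1) / n_threads).max(1);

    // Phase 1: partial aggregation per partition on scoped threads,
    // which may borrow the input slices without copying them.
    let partials: Vec<HashMap<u32, f64>> = thread::scope(|s| {
        let handles: Vec<_> = keys
            .chunks(chunk)
            .zip(values.chunks(chunk))
            .map(|(ks, vs)| {
                s.spawn(move || {
                    let mut local = HashMap::new();
                    for (k, v) in ks.iter().zip(vs) {
                        *local.entry(*k).or_insert(0.0) += *v;
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Phase 2: final aggregation merges the partial results.
    let mut total = HashMap::new();
    for part in partials {
        for (k, v) in part {
            *total.entry(k).or_insert(0.0) += v;
        }
    }
    total
}
```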
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.