Table of Contents
GPU-First Embedded Analytics with SIMD Fallback
GPU-first embedded analytics database with graceful degradation: GPU → SIMD → Scalar
Features
- Cost-based dispatch: GPU only when compute > 5x transfer time
- Morsel-based paging: Out-of-core execution (128MB chunks)
- JIT WGSL compiler: Kernel fusion for single-pass execution
- GPU kernels: SUM, MIN, MAX, COUNT, AVG, fused filter+sum
- SIMD fallback: Trueno integration (AVX-512/AVX2/SSE2)
- SQL interface: SELECT, WHERE, aggregations, ORDER BY, LIMIT
Installation
[]
= "0.3"
# Optional: GPU acceleration
= { = "0.3", = ["gpu"] }
Quick Start
use ;
use StorageEngine;
async
Performance
SIMD Aggregation (1M rows, AMD Threadripper 7960X):
| Operation | SIMD | Scalar | Speedup |
|---|---|---|---|
| SUM | 228µs | 634µs | 2.78x |
| MIN | 228µs | 1,048µs | 4.60x |
| MAX | 228µs | 257µs | 1.13x |
| AVG | 228µs | 634µs | 2.78x |
Architecture
┌─────────────────────────────────────────────┐
│ SQL Interface │
│ (QueryEngine / Parser) │
├─────────────────────────────────────────────┤
│ Query Executor │
│ (cost-based dispatch, morsel paging) │
├──────────┬──────────┬───────────────────────┤
│ GPU │ SIMD │ Scalar │
│ (WGSL) │ (Trueno) │ (fallback) │
├──────────┴──────────┴───────────────────────┤
│ Storage Engine │
│ (columnar, Parquet, morsel-based I/O) │
└─────────────────────────────────────────────┘
- Query Layer: SQL parser produces logical plans, cost-based optimizer selects GPU/SIMD/Scalar backend
- Execution Layer: Morsel-based paging (128MB chunks) enables out-of-core processing
- GPU Backend: JIT-compiled WGSL kernels with fused filter+aggregate passes
- SIMD Backend: Trueno-powered AVX-512/AVX2/SSE2 vectorized aggregations
- Storage Layer: Columnar storage with Parquet I/O and late materialization
API Reference
StorageEngine
Load and manage columnar data:
let storage = load_parquet?;
let storage = from_batches;
QueryEngine / QueryExecutor
Parse and execute SQL queries:
let engine = new;
let plan = engine.parse?;
let result = new.execute?;
GpuEngine
Direct GPU kernel access (requires gpu feature):
let gpu = new.await?;
let sum = gpu.sum_i32.await?;
let filtered = gpu.fused_filter_sum.await?;
TopK
Heap-based top-K selection for ORDER BY ... LIMIT queries:
let top = top_k_descending?;
Examples
Testing
Property-based tests cover storage invariants and top-K correctness via proptest.
Development
Contributing
Contributions are welcome! Please see the CONTRIBUTING.md guide for details.
MSRV
Minimum Supported Rust Version: 1.75
License
MIT