µWheel
µWheel is an Embeddable Aggregate Management System for Streams and Queries.
See more about its design here and try it out directly on the web.
Features
- Streaming window aggregation
- Built-in warehousing capabilities
- Wheel-based query optimizer + vectorized execution.
- Out-of-order support using
low watermarking. - High-throughput stream ingestion.
- User-defined aggregation.
- Low space footprint.
- Incremental checkpointing support.
- Compatible with
#[no_std](requiresalloc).
When should I use µWheel?
µWheel unifies the aggregate management for online streaming and offline analytical queries in a single system. µWheel is not a general purpose solution but a specialized system tailored for a pre-defined aggregation function.
µWheel is an excellent choice when:
- You know the aggregation function apriori.
- You need high-throughput ingestion of out-of-order streams.
- You need support for streaming window queries (e.g., Sliding/Tumbling).
- You need support for exploratory analysis of historical data.
- You need a lightweight and highly embeddable solution.
Example use cases:
- A mini stream processor (see example)
- A real-time OLAP index (e.g., Top-N) (see example)
- A compact and mergeable system for analytics at the edge (see example).
Pre-defined Aggregators
| Function | Description | Types | SIMD |
|---|---|---|---|
| SUM | Sum of all inputs | u16, u32, u64, i16, i32, i64, f32, f64 | ✓ |
| MIN | Minimum value of all inputs | u16, u32, u64, i32, i16, i64, f32, f64 | ✓ |
| MAX | Maximum value of all inputs | u16, u32, u64, i16, i32, i64, f32, f64 | ✓ |
| MINMAX | Minimum and Maximum value of all inputs | u16, u32, u64, i16, i32, i64, f32, f64 | ✗ |
| AVG | Arithmetic mean of all inputs | u16, u32, u64, i16, i32, i64, f32, f64 | ✗ |
| ALL | Pre-computed SUM, AVG, MIN, MAX, COUNT | f64 | ✗ |
| TOP N | Top N of all inputs | Aggregator with aggregate data that implements Ord |
✗ |
See a user-defined aggregator example here.
Feature Flags
std(enabled by default)- Enables features that rely on the standard library
sum(enabled by default)- Enables sum aggregation
avg(enabled by default)- Enables avg aggregation
min(enabled by default)- Enables min aggregation
max(enabled by default)- Enables max aggregation
min_max(enabled by default)- Enables min-max aggregation
all(enabled by default)- Enables all aggregation
top_n- Enables Top-N aggregation
simd(requiresnightly)- Enables support to speed up aggregation functions with SIMD operations
sync(implicitly enablesstd)- Enables a sync version of
ReaderWheelthat can be shared and queried across threads
- Enables a sync version of
profiler(implicitly enablesstd)- Enables recording of latencies for various operations
serde- Enables serde support
timer- Enables scheduling user-defined functions
Usage
For std support and compilation of built-in aggregators:
= "0.3.0"
For no_std support and minimal compile time:
= { = "0.3.0", = false }
Examples
The following code is from the hello world example.
use ;
// Initial start watermark 2023-11-09 00:00:00 (represented as milliseconds)
let mut watermark = 1699488000000;
// Create a Reader-Writer Wheel with U32 Sum Aggregation using the default configuration
let mut wheel: = new;
// Install a Sliding Window Aggregation Query (results are produced when we advance the wheel).
wheel.window;
// Simulate ingestion and fill the wheel with 1 hour of aggregates (3600 seconds).
for _ in 0..3600
// Explore historical data - The low watermark is now 2023-11-09 01:00:00
// query the wheel using different intervals
assert_eq!;
assert_eq!;
// combine range of 2023-11-09 00:00:00 and 2023-11-09 01:00:00
let range = new_unchecked;
assert_eq!;
// The following runs the the same combine range query as above.
assert_eq!;
See more examples here.
Acknowledgements
- µWheel borrows scripts from the egui crate.
- µWheel uses a modified Duration from the time crate.
- µWheel soft forks a Hierarchical Timing Wheel made by @Bathtor.
Contributing
See Contributing.
Community
If you find µWheel interesting and want to learn more, then join the Discord community!
Publications
- Max Meldrum, Paris Carbone (2024). µWheel: Aggregate Management for Streams and Queries (Best Paper Award). In DEBS '24. [PDF].
Blog Posts
- Introducing datafusion-uwheel, A Native DataFusion Optimizer for Time-based Analytics - August 2024
- Best Paper Award + 0.2.0 Release - July 2024
- Speeding up Temporal Aggregation in DataFusion by 60-60000x using µWheel - May 2024
Citing µWheel
@inproceedings{meldrum2024uwheel,
author = {Meldrum, Max and Carbone, Paris},
title = {μWheel: Aggregate Management for Streams and Queries},
booktitle = {Proceedings of the 18th ACM International Conference on Distributed and Event-Based Systems},
year = {2024},
pages = {54--65},
doi = {10.1145/3629104.3666031}
}
License
Licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.