# Datum
🪼 - Just "Stream" anything :]
Datum is a Rust stream processing library built around Ractor actors and a push-based stream abstraction. It targets compatibility with the shape and behavior of the Akka/Pekko Streams Typed API, adapted to Rust-native ownership, async, and benchmarking constraints.
## Install
Add Datum to your project from crates.io:
```sh
cargo add datum-core
```
Or in `Cargo.toml`:
```toml
[dependencies]
datum-core = "0.5"
```
> **Note:** The package is published as `datum-core` (the name `datum` is taken on crates.io), but
> the import path stays `use datum::…` — the crate's library name is `datum`.
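To depend on the Git repository directly instead (the example pins the `v0.5.0` tag; swap `tag` for `rev` or `branch` to track an unreleased commit):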
```toml
[dependencies]
datum-core = { git = "https://github.com/Aethergrids/Datum", tag = "v0.5.0" }
```
## Upstream References
- Ractor crate: `0.15.13`
- Optional Ractor cluster crate: `0.15.13`, enabled with `--features cluster`
- Akka source submodule: `third_party/akka`, tracking upstream `main` and pinned in this repository at commit `58f1f6db2e505e87f5dc115ee9476833872e7ae0`
- Latest stable Akka tag observed during setup: `v2.10.18`
## Feature highlights (v0.5.0)
- **Graph cycles** — `MergePreferred`/`Broadcast` feedback loops supported; unbuffered cycles surface
`EventLimitExceeded` instead of hanging.
- **StreamRefs** — `StreamRefs::source_ref()` / `sink_ref()`, Ractor-backed one-shot streaming handles
for actor/process boundary crossing.
- **IO adapters** — `StreamConverters::as_input_stream` / `as_output_stream` bridging Datum streams
to `std::io::Read` / `Write`.
- **Performance** — the typed graph executor now covers all major junction shapes and cyclic
feedback; nearly every benchmarked hot path meets or beats warmed Akka/Pekko (see benchmark tables
linked below for honest per-row numbers, including below-parity rows).
- `#![forbid(unsafe_code)]` on the `datum` crate itself; new safe deps `crossbeam-queue` and
`arc-swap` handle lock-free queues and hub snapshots without adding unsafe to the crate.
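To make the IO adapter idea concrete, here is a minimal, std-only sketch of how a sequence of byte chunks can be exposed as a blocking `std::io::Read`. It does not use Datum's API: `ChunkReader` and `read_all_chunks` are hypothetical names invented for this illustration, and a plain iterator stands in for the stream. The real `StreamConverters::as_input_stream` may work quite differently.

```rust
use std::io::{self, Read};

/// Hypothetical adapter (not a Datum type): exposes an iterator of byte
/// chunks as a blocking `std::io::Read`.
struct ChunkReader<I: Iterator<Item = Vec<u8>>> {
    chunks: I,
    /// The chunk currently being drained.
    current: Vec<u8>,
    /// Number of bytes of `current` already handed to the caller.
    pos: usize,
}

impl<I: Iterator<Item = Vec<u8>>> ChunkReader<I> {
    fn new(chunks: I) -> Self {
        Self { chunks, current: Vec::new(), pos: 0 }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Read for ChunkReader<I> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Pull chunks until one has unread bytes; empty chunks are skipped.
        while self.pos == self.current.len() {
            match self.chunks.next() {
                Some(chunk) => {
                    self.current = chunk;
                    self.pos = 0;
                }
                // Upstream completed: report end of input.
                None => return Ok(0),
            }
        }
        // Copy as much of the current chunk as fits into the caller's buffer.
        let n = buf.len().min(self.current.len() - self.pos);
        buf[..n].copy_from_slice(&self.current[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Drains every chunk through the `Read` interface into a `String`.
fn read_all_chunks(chunks: Vec<Vec<u8>>) -> io::Result<String> {
    let mut reader = ChunkReader::new(chunks.into_iter());
    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    let text = read_all_chunks(vec![b"hello, ".to_vec(), Vec::new(), b"stream".to_vec()])?;
    println!("{text}");
    Ok(())
}
```

The reader only asks for the next chunk once the previous one is fully consumed, which is a pull-side view of the same backpressure idea a stream-to-`Read` bridge has to honor.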
## Development
```sh
cargo test
cargo check --benches
cargo bench --bench push_baseline
cargo bench --bench source_flow
cargo bench --bench materialization
cargo bench --bench graph
cargo bench --bench actor_ask
benches/actor_ask_compare/run.sh
```
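A minimal end-to-end pipeline:
```rust
use datum::{Sink, Source};

// Shift 0..1000 to 1..=1000, keep the even values, and fold them into a sum.
let sum = Source::from_iter(0_u64..1_000)
    .map(|item| item + 1)
    .filter(|item| item % 2 == 0)
    .run_with(Sink::fold(0_u64, |acc, item| acc + item))
    .unwrap()
    .wait()
    .unwrap();
```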
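As a cross-check for the pipeline above, the same computation can be written with plain `std` iterators. This sketch does not touch Datum at all; `sum_even_successors` is a hypothetical helper name, and the expected value assumes Datum's `map`, `filter`, and `Sink::fold` behave like their iterator namesakes.

```rust
/// Sums the even values of `n + 1` for `n` in `0..upper`, using only std iterators.
fn sum_even_successors(upper: u64) -> u64 {
    (0..upper)
        .map(|item| item + 1)
        .filter(|item| item % 2 == 0)
        .fold(0, |acc, item| acc + item)
}

fn main() {
    // Even numbers in 1..=1000 are 2, 4, ..., 1000, which sum to 250_500.
    let sum = sum_even_successors(1_000);
    assert_eq!(sum, 250_500);
    println!("{sum}");
}
```

Under that assumption, comparing the stream result against this value is a quick sanity check when experimenting with operators.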
## Benchmarks
Datum is benchmarked head-to-head against warmed Akka/Pekko Streams across seven areas — Source/Flow,
materialization, graph/junctions, actor ask, dynamic streams, streaming IO, and substreams. As of
v0.5.0, Datum is at or above Akka on the large majority of benchmarked paths. Honest per-path numbers
(including remaining below-parity rows) are in the result tables under `roadmap/benchmarks/`. The
harness adds a `Datum CPU us/op` column on purpose: some wins come partly from busy-spinning while
Akka parks — a real cost the wall-clock number hides.
Summary of the current state (multipliers are Datum relative to Akka; above 1x means Datum is faster):
- **Fused linear path:** 13–44x Akka (typed plan; all common sink shapes).
- **Junctions:** 1–340x, with most shapes well above parity; typed kernels now cover concat, merge,
broadcast/zip, balance/merge, partition, prioritized merge, merge-preferred, merge-sorted,
merge-sequence, and merge-latest.
- **Graph cycles (typed feedback kernel):** ~29x Akka on the `MergePreferred`/`Broadcast` feedback shape.
- **BidiFlow:** 3.7–5x Akka.
- **Dynamic hubs:** MergeHub p16 13x, p4 2.1x (p1 below parity — remaining lever documented in
`roadmap/benchmarks/dynamic-streams.md`); BroadcastHub 1.3–2x; PartitionHub 1.4–2.7x.
- **Bounded queue:** ~parity (0.96x wall) with ~112x less allocation than Akka.
- **map_async:** ≥2x at p4/p32 (Tokio dispatch).
- **flat_map_merge:** 1.4–1.8x.
- **Streaming IO adapters:** ~7.4x on the round-trip scenario.
- **StreamRefs:** 0.24–0.26x (Ractor-bound; levers documented).
- **Actor ask:** ~1.3x at p1; allocation dominated by upstream Ractor boxing (~848 B/element, upstream-blocked).
- **Materialization:** at parity.
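One way to read the multipliers in the list above is as a ratio of per-operation times. The sketch below assumes a multiplier is Akka's time per operation divided by Datum's; that definition, the helper names `speedup` and `describe`, the 5% parity band, and the sample numbers are all illustrative assumptions rather than values taken from the result tables.

```rust
/// Hypothetical helper: how many times faster Datum is than Akka,
/// given wall-clock microseconds per operation for each.
fn speedup(akka_us_per_op: f64, datum_us_per_op: f64) -> f64 {
    akka_us_per_op / datum_us_per_op
}

/// Labels a ratio, treating anything within an assumed 5% band as parity.
fn describe(ratio: f64) -> &'static str {
    if ratio >= 1.05 {
        "above parity"
    } else if ratio >= 0.95 {
        "about parity"
    } else {
        "below parity"
    }
}

fn main() {
    // Made-up timings: Akka takes 10 us/op, Datum takes 5 us/op.
    let ratio = speedup(10.0, 5.0);
    println!("{ratio:.2}x: {}", describe(ratio));
}
```

A wall-clock ratio like this says nothing about CPU spent while waiting, which is why the tables carry the separate `Datum CPU us/op` column.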
Run a comparison (requires a JDK + `sbt`); each runner writes a rendered table under `target/<area>-comparison/`:
```sh
benches/source_flow_compare/run.sh
benches/materialization_compare/run.sh
benches/graph_compare/run.sh
benches/actor_ask_compare/run.sh
benches/dynamic_streams_compare/run.sh
benches/streaming_io_compare/run.sh
benches/substreams_compare/run.sh
```
Current result tables, the per-operator coverage matrix, and apples-to-apples caveats:
- `roadmap/benchmarks/source-flow.md`
- `roadmap/benchmarks/materialization.md`
- `roadmap/benchmarks/graph.md`
- `roadmap/benchmarks/actor-ask.md`
- `roadmap/benchmarks/dynamic-streams.md`
- `roadmap/benchmarks/streaming-io.md`
- `roadmap/benchmarks/substreams.md`
- `roadmap/M1-v0.1.0-foundation.md` — coverage matrix, optimization status & apples-to-apples caveats
- `roadmap/M4-v0.4.0-completeness-hardening.md` — M4 work packages and exit criteria
See [`roadmap/`](roadmap/README.md) for the full milestone roadmap. User documentation is the published VitePress site (Cloudflare Pages); `docs/` holds its source. `roadmap/` holds internal planning and benchmark records.