Datum

🪼 - Just "Stream" anything :]

Datum is a Rust stream processing library scaffolded around Ractor actors and a push-based stream abstraction. The compatibility target is Akka/Pekko Streams Typed API shape and behavior, with Rust-native ownership, async, and benchmarking constraints.

Install

Add Datum to your project from crates.io:

cargo add datum-core

Or in Cargo.toml:

[dependencies]
datum-core = "0.4"

Note: The package is published as datum-core (the name datum is taken on crates.io), but the import path stays use datum::… — the crate's library name is datum.

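To depend on the Git repository directly instead (the example pins the v0.4.0 tag; swap tag for rev or branch to track an unreleased commit):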

[dependencies]
datum-core = { git = "https://github.com/Aethergrids/Datum", tag = "v0.4.0" }

Upstream References

Ractor crate: 0.15.13
Optional Ractor cluster crate: 0.15.13, enabled with --features cluster
Akka source submodule: third_party/akka, tracking upstream main and pinned in this repository at commit 58f1f6db2e505e87f5dc115ee9476833872e7ae0
Latest stable Akka tag observed during setup: v2.10.18

Feature highlights (v0.4.0)

Graph cycles — MergePreferred/Broadcast feedback loops supported; unbuffered cycles surface EventLimitExceeded instead of hanging.
StreamRefs — StreamRefs::source_ref() / sink_ref(), Ractor-backed one-shot streaming handles for actor/process boundary crossing.
IO adapters — StreamConverters::as_input_stream / as_output_stream bridging Datum streams to std::io::Read / Write.
Performance — the typed graph executor now covers all major junction shapes and cyclic feedback; nearly every benchmarked hot path meets or beats warmed Akka/Pekko (see benchmark tables linked below for honest per-row numbers, including below-parity rows).
#![forbid(unsafe_code)] on the datum crate itself; new safe deps crossbeam-queue and arc-swap handle lock-free queues and hub snapshots without adding unsafe to the crate.
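
To make the IO adapter idea concrete, here is a minimal std-only sketch of bridging a sequence of byte chunks to std::io::Read. It is not Datum's implementation and does not use the Datum API: ChunkReader and read_all_chunks are hypothetical names invented for this illustration, and a plain iterator stands in for the stream.

```rust
use std::io::{self, Read};

/// Hypothetical adapter (not a Datum type): exposes an iterator of byte
/// chunks as a blocking `std::io::Read`.
struct ChunkReader<I: Iterator<Item = Vec<u8>>> {
    chunks: I,
    current: Vec<u8>, // chunk currently being drained
    pos: usize,       // read offset into `current`
}

impl<I: Iterator<Item = Vec<u8>>> ChunkReader<I> {
    fn new(chunks: I) -> Self {
        Self { chunks, current: Vec::new(), pos: 0 }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Read for ChunkReader<I> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Pull the next chunk once the current one is drained; empty chunks are skipped.
        while self.pos == self.current.len() {
            match self.chunks.next() {
                Some(chunk) => {
                    self.current = chunk;
                    self.pos = 0;
                }
                None => return Ok(0), // upstream finished: report EOF
            }
        }
        // Copy as much of the current chunk as fits into the caller's buffer.
        let n = buf.len().min(self.current.len() - self.pos);
        buf[..n].copy_from_slice(&self.current[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Hypothetical helper: drains all chunks through the reader into a String.
fn read_all_chunks(chunks: Vec<Vec<u8>>) -> io::Result<String> {
    let mut reader = ChunkReader::new(chunks.into_iter());
    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    let text = read_all_chunks(vec![b"hello ".to_vec(), Vec::new(), b"world".to_vec()])?;
    println!("{text}");
    Ok(())
}
```

The point of the sketch is the shape of the bridge: the reader side stays an ordinary std::io::Read, so existing blocking code can consume it unchanged, while chunk boundaries from the producing side are hidden from the caller.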

Development

cargo test
cargo check --benches
cargo bench --bench push_baseline
cargo bench --bench source_flow
cargo bench --bench materialization
cargo bench --bench graph
cargo bench --bench actor_ask
benches/actor_ask_compare/run.sh

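use datum::{Sink, Source};

// Minimal pipeline: emit 0..1000, add 1 to each item, keep the even values, and sum them.
let sum = Source::from_iter(0_u64..1_000)
    .map(|item| item + 1)
    .filter(|item| item % 2 == 0)
    // Attach a fold sink and run; both the run and the wait can fail, hence the two unwraps.
    .run_with(Sink::fold(0_u64, |acc, item| acc + item))
    .unwrap()
    .wait()
    .unwrap();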
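
As a cross-check for the pipeline above, the same computation can be written with standard-library iterators only. This sketch assumes Datum's map, filter, and fold follow the usual iterator semantics; reference_sum is a hypothetical helper name, not part of Datum.

```rust
/// Std-only reference for the pipeline: add 1, keep even values, sum them.
fn reference_sum(range: std::ops::Range<u64>) -> u64 {
    range
        .map(|item| item + 1)
        .filter(|item| item % 2 == 0)
        .fold(0_u64, |acc, item| acc + item)
}

fn main() {
    // 0..1000 becomes 1..=1000 after the map; the even values are 2, 4, ..., 1000,
    // so the sum is 2 * (500 * 501 / 2) = 250_500.
    assert_eq!(reference_sum(0..1_000), 250_500);
    println!("{}", reference_sum(0..1_000));
}
```

Under that assumption, the stream version should yield the same value, which makes this a convenient sanity test when experimenting with other operators.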

Benchmarks

Datum is benchmarked head-to-head against warmed Akka/Pekko Streams across seven areas — Source/Flow, materialization, graph/junctions, actor ask, dynamic streams, streaming IO, and substreams. As of v0.4.0, Datum is at or above Akka on the large majority of benchmarked paths. Honest per-path numbers (including remaining below-parity rows) are in the result tables under roadmap/benchmarks/. The harness adds a Datum CPU us/op column on purpose: some wins come partly from busy-spinning while Akka parks — a real cost the wall-clock number hides.

Summary of the current state, expressed as multiples of the warmed Akka/Pekko result:

Fused linear path: 13–44x Akka (typed plan; all common sink shapes).
Junctions: 1–340x, with most shapes well above parity; typed kernels now cover concat, merge, broadcast/zip, balance/merge, partition, prioritized merge, merge-preferred, merge-sorted, merge-sequence, and merge-latest.
Graph cycles (typed feedback kernel): ~29x Akka on the MergePreferred/Broadcast feedback shape.
BidiFlow: 3.7–5x Akka.
Dynamic hubs: MergeHub p16 13x, p4 2.1x (p1 below parity — remaining lever documented in roadmap/benchmarks/dynamic-streams.md); BroadcastHub 1.3–2x; PartitionHub 1.4–2.7x.
Bounded queue: ~parity (0.96x wall) with ~112x less allocation than Akka.
map_async: ≥2x at p4/p32 (Tokio dispatch).
flat_map_merge: 1.4–1.8x.
Streaming IO adapters: ~7.4x on the round-trip scenario.
StreamRefs: 0.24–0.26x (Ractor-bound; levers documented).
Actor ask: ~1.3x at p1; allocation dominated by upstream Ractor boxing (~848 B/element, upstream-blocked).
Materialization: at parity.

Run a comparison (requires a JDK + sbt); each runner writes a rendered table under target/<area>-comparison/:

benches/source_flow_compare/run.sh
benches/materialization_compare/run.sh
benches/graph_compare/run.sh
benches/actor_ask_compare/run.sh
benches/dynamic_streams_compare/run.sh
benches/streaming_io_compare/run.sh
benches/substreams_compare/run.sh

Current result tables, the per-operator coverage matrix, and apples-to-apples caveats:

roadmap/benchmarks/source-flow.md
roadmap/benchmarks/materialization.md
roadmap/benchmarks/graph.md
roadmap/benchmarks/actor-ask.md
roadmap/benchmarks/dynamic-streams.md
roadmap/benchmarks/streaming-io.md
roadmap/benchmarks/substreams.md
roadmap/M1-v0.1.0-foundation.md — coverage matrix, optimization status & apples-to-apples caveats
roadmap/M4-v0.4.0-completeness-hardening.md — M4 work packages and exit criteria