datum-core 0.6.0

Rust stream-processing library mirroring Akka/Pekko Streams Typed, built on Ractor actors
Documentation
# Datum

🪼 - Just "Stream" anything :]

Datum is a Rust stream processing library scaffolded around Ractor actors and a push-based stream abstraction. The compatibility target is Akka/Pekko Streams Typed API shape and behavior, with Rust-native ownership, async, and benchmarking constraints.

## Install

Add Datum to your project from crates.io:

```sh
cargo add datum-core
```

Or in `Cargo.toml`:

```toml
[dependencies]
datum-core = "0.5"
```

> **Note:** The package is published as `datum-core` (the name `datum` is taken on crates.io), but
> the import path stays `use datum::…` — the crate's library name is `datum`.

To track an unreleased commit instead:

```toml
[dependencies]
datum-core = { git = "https://github.com/Aethergrids/Datum", tag = "v0.5.0" }
```

## Upstream References

- Ractor crate: `0.15.13`
- Optional Ractor cluster crate: `0.15.13`, enabled with `--features cluster`
- Akka source submodule: `third_party/akka`, tracking upstream `main` and pinned in this repository at commit `58f1f6db2e505e87f5dc115ee9476833872e7ae0`
- Latest stable Akka tag observed during setup: `v2.10.18`

## Feature highlights (v0.5.0)

- **Graph cycles** — `MergePreferred`/`Broadcast` feedback loops supported; unbuffered cycles surface
  `EventLimitExceeded` instead of hanging.
- **StreamRefs** — `StreamRefs::source_ref()` / `sink_ref()`, Ractor-backed one-shot streaming handles
  for actor/process boundary crossing.
- **IO adapters** — `StreamConverters::as_input_stream` / `as_output_stream` bridging Datum streams
  to `std::io::Read` / `Write`.
- **Performance** — the typed graph executor now covers all major junction shapes and cyclic
  feedback; nearly every benchmarked hot path meets or beats warmed Akka/Pekko (see benchmark tables
  linked below for honest per-row numbers, including below-parity rows).
- `#![forbid(unsafe_code)]` on the `datum` crate itself; new safe deps `crossbeam-queue` and
  `arc-swap` handle lock-free queues and hub snapshots without adding unsafe to the crate.

## Development

```sh
cargo test
cargo check --benches
cargo bench --bench push_baseline
cargo bench --bench source_flow
cargo bench --bench materialization
cargo bench --bench graph
cargo bench --bench actor_ask
benches/actor_ask_compare/run.sh
```

```rust
use datum::{Sink, Source};

let sum = Source::from_iter(0_u64..1_000)
    .map(|item| item + 1)
    .filter(|item| item % 2 == 0)
    .run_with(Sink::fold(0_u64, |acc, item| acc + item))
    .unwrap()
    .wait()
    .unwrap();
```

## Benchmarks

Datum is benchmarked head-to-head against warmed Akka/Pekko Streams across seven areas — Source/Flow,
materialization, graph/junctions, actor ask, dynamic streams, streaming IO, and substreams. As of
v0.5.0, Datum is at or above Akka on the large majority of benchmarked paths. Honest per-path numbers
(including remaining below-parity rows) are in the result tables under `roadmap/benchmarks/`. The
harness adds a `Datum CPU us/op` column on purpose: some wins come partly from busy-spinning while
Akka parks — a real cost the wall-clock number hides.

Qualitative summary of the current state:

- **Fused linear path:** 13–44x Akka (typed plan; all common sink shapes).
- **Junctions:** 1–340x, with most shapes well above parity; typed kernels now cover concat, merge,
  broadcast/zip, balance/merge, partition, prioritized merge, merge-preferred, merge-sorted,
  merge-sequence, and merge-latest.
- **Graph cycles (typed feedback kernel):** ~29x Akka on the `MergePreferred`/`Broadcast` feedback shape.
- **BidiFlow:** 3.7–5x Akka.
- **Dynamic hubs:** MergeHub p16 13x, p4 2.1x (p1 below parity — remaining lever documented in
  `roadmap/benchmarks/dynamic-streams.md`); BroadcastHub 1.3–2x; PartitionHub 1.4–2.7x.
- **Bounded queue:** ~parity (0.96x wall) with ~112x less allocation than Akka.
- **map_async:** ≥2x at p4/p32 (Tokio dispatch).
- **flat_map_merge:** 1.4–1.8x.
- **Streaming IO adapters:** ~7.4x on the round-trip scenario.
- **StreamRefs:** 0.24–0.26x (Ractor-bound; levers documented).
- **Actor ask:** ~1.3x at p1; allocation dominated by upstream Ractor boxing (~848 B/element, upstream-blocked).
- **Materialization:** at parity.

Run a comparison (requires a JDK + `sbt`); each runner writes a rendered table under `target/<area>-comparison/`:

```sh
benches/source_flow_compare/run.sh
benches/materialization_compare/run.sh
benches/graph_compare/run.sh
benches/actor_ask_compare/run.sh
benches/dynamic_streams_compare/run.sh
benches/streaming_io_compare/run.sh
benches/substreams_compare/run.sh
```

Current result tables, the per-operator coverage matrix, and apples-to-apples caveats:

- `roadmap/benchmarks/source-flow.md`
- `roadmap/benchmarks/materialization.md`
- `roadmap/benchmarks/graph.md`
- `roadmap/benchmarks/actor-ask.md`
- `roadmap/benchmarks/dynamic-streams.md`
- `roadmap/benchmarks/streaming-io.md`
- `roadmap/benchmarks/substreams.md`
- `roadmap/M1-v0.1.0-foundation.md` — coverage matrix, optimization status & apples-to-apples caveats
- `roadmap/M4-v0.4.0-completeness-hardening.md` — M4 work packages and exit criteria

See [`roadmap/`](roadmap/README.md) for the full milestone roadmap. User documentation is the published VitePress site (Cloudflare Pages); `docs/` holds its source. `roadmap/` holds internal planning and benchmark records.