§mailrs-delivery-executor
Group-commit delivery executor on top of
mailrs-maildir 1.2’s
deliver_batch. Accumulates per-path delivery requests from
concurrent async tasks (SMTP / LMTP / IMAP APPEND sessions) and
flushes each path’s batch via a single fsync instead of
N per-message fsyncs.
Built on tokio::sync::mpsc + tokio::sync::oneshot. Each
calling session submits a delivery and awaits its own
oneshot::Receiver for individual confirmation.
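The submit-and-await pattern described above can be sketched with the standard library alone. This is an illustration only, not this crate's code: it swaps in `std::sync::mpsc` for both the tokio mpsc queue and the oneshot reply, and `Request`, `spawn_worker`, and `submit` are hypothetical names. The "delivery" is a stand-in that just hands out sequence numbers.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request type: a payload plus a private reply channel.
pub struct Request {
    pub path: String,
    pub body: Vec<u8>,
    pub reply: mpsc::Sender<Result<u64, String>>,
}

/// Spawns a worker that answers every request on that request's own channel.
/// The "delivery" is a stand-in: it rejects empty bodies and numbers the rest.
pub fn spawn_worker() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        let mut next_id = 0u64;
        for req in rx {
            let result = if req.body.is_empty() {
                Err(format!("empty message for {}", req.path))
            } else {
                next_id += 1;
                Ok(next_id)
            };
            // The caller may have gone away; a failed send is ignored here.
            let _ = req.reply.send(result);
        }
    });
    tx
}

/// Submits one message and blocks until its individual result arrives.
pub fn submit(tx: &mpsc::Sender<Request>, path: &str, body: &[u8]) -> Result<u64, String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    let req = Request { path: path.to_string(), body: body.to_vec(), reply: reply_tx };
    tx.send(req).map_err(|_| "executor stopped".to_string())?;
    reply_rx.recv().map_err(|_| "executor dropped the request".to_string())?
}
```

Because every request carries its own reply channel, one caller's failure never has to be fanned out to the others, even when their messages travel in the same batch.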
§Why
mailrs-maildir 1.2’s deliver_batch is 15.27× faster than
N separate deliver calls at batch size N=64 on APFS (criterion microbench).
But typical mail receivers have the wrong shape to use it
directly: each SMTP session delivers 1-N recipients, and N is
small. No caller is naturally going to hand a batch of 64 messages
to a single deliver_batch call.
This crate is the bridge. The executor task accumulates per-path
requests across concurrent sessions, groups them by destination,
and flushes each group through deliver_batch. At saturation,
batches naturally fill to max_batch=64 and the full microbench
speedup translates to real throughput.
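The grouping step can be pictured as a plain map from destination path to the messages queued for it. The sketch below is an assumption about the shape of that step, not the crate's internals; `group_by_path` is a hypothetical helper, and each resulting group is what would be handed to one `deliver_batch` call.

```rust
use std::collections::BTreeMap;

/// Groups `(path, body)` pairs by destination path, keeping arrival order
/// within each path so delivery stays first-come-first-served per path.
pub fn group_by_path(pending: Vec<(String, Vec<u8>)>) -> BTreeMap<String, Vec<Vec<u8>>> {
    let mut groups: BTreeMap<String, Vec<Vec<u8>>> = BTreeMap::new();
    for (path, body) in pending {
        // `or_default` creates the empty Vec the first time a path is seen.
        groups.entry(path).or_default().push(body);
    }
    groups
}
```

Three pending messages for two mailboxes collapse into two groups, so two batch flushes replace three per-message ones.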
§Quick start
use mailrs_delivery_executor::DeliveryExecutor;
use std::sync::Arc;
let executor = DeliveryExecutor::spawn();
// In your SMTP session handler:
let path = "/var/mail/example.com/alice".to_string();
let body = Arc::new(b"From: a@b\r\n\r\nhello\r\n".to_vec());
let id = executor.deliver(path, body).await?;
println!("delivered: {}", id.0);
§Tuning
| Knob | Default | Trade-off |
|---|---|---|
| max_batch | 64 | Matches the maildir 1.2 microbench sweet spot. Higher: marginally more throughput, more memory per batch. Lower: less batching benefit. |
| max_wait | 10 ms | Upper bound on added per-message latency. Lower (1-2ms): latency-sensitive workloads (transactional mail where SMTP 250 OK feeds an HTTP response). Higher: low-traffic but batch-amortizing. |
| max_concurrent_flushes | 2 (1.1+) | How many batches can have an fsync in flight simultaneously. =1 is strictly serial (1.0.0 behavior). =2 hides the fsync wait behind collection of the next batch: empirically +8% throughput and -41% P999 tail (APFS, M-series Mac, 32-conn bench). >2 typically doesn’t help on SSD because the disk serializes durable writes per mount; it just queues more fsyncs. |
use mailrs_delivery_executor::DeliveryExecutor;
use std::time::Duration;
let executor = DeliveryExecutor::with_config(/*max_batch=*/ 128, /*max_wait=*/ Duration::from_millis(5));
// Full tuning (1.1+): pipeline 3 flushes for very-high-load
// deployments where you've measured the disk handles parallel
// fsyncs (NVMe, RAID, network FS with concurrent commit).
let executor = DeliveryExecutor::with_full_config(
/*max_batch=*/ 64,
/*max_wait=*/ Duration::from_millis(10),
/*max_concurrent_flushes=*/ 3,
);
§What it costs
Per-message latency increases by up to max_wait. With
max_wait=10ms and a load of 32 concurrent connections, batches
fill in 1-5ms in practice. Under truly low load (single message
in flight), the executor waits the full max_wait before
flushing — that’s 10ms tail added to every delivery. The win
appears when load is high enough to fill the batch before the
timeout.
§What this crate does NOT do
- No SMTP / LMTP protocol — caller’s session driver parses incoming mail and passes raw bytes.
- No storage beyond Maildir — for IMAP-backed or Dovecot-style backends, write your own executor over those primitives. The pattern (per-path accumulate + batch flush) is portable; this crate is just the Maildir variant.
- No delivery scheduling — first-come-first-served per path. For priority queues use a different executor.
§Stone audit (v3 cycle, 2026-05-25)
| Axis | Status |
|---|---|
| doc | ✅ clean (cargo doc --no-deps -p mailrs-delivery-executor) |
| test | line cov: 96.1% (cargo llvm-cov -p mailrs-delivery-executor --summary-only) |
| bench | ✅ 1 file(s) criterion + ✅ 0 gate(s) perf_gate.rs |
| size | release rlib: 241 KB |
| fuzz | ❌ none |
| mem | dhat profile pending (v3.4 backlog) |
§Competitor comparisons (from PERFORMANCE.md)
- | SMTP receive throughput, post DeliveryExecutor (mailrs-delivery-executor 1.0 group-commit, 2026-05-24) | 999 msg/s mean across 3 × 30s × 32 conns (rounds: 1045 / 972 / 979). 3.4× vs the immediately-prior 291 msg/s baseline (same hardware, same bench). P50 32 ms (vs 105 ms baseline = 3.3× faster), P99 41 ms (vs 163 ms = 4.0× faster), P999 76 ms (vs 199 ms = 2.6× faster). All four UX axes (throughput, p50, p99, p999) improve simultaneously; no axis regresses. The win comes from group-commit: 32 concurrent SMTP sessions delivering to the same Maildir path now share a single fsync per batch (max_batch=64, max_wait=10ms) via mailrs-delivery-executor’s mpsc → Maildir::deliver_batch pipeline, instead of each session driving its own per-message fsync. | cargo build --profile release-debug -p mailrs-server --bench smtp_load && $CARGO_TARGET_DIR/release-debug/deps/smtp_load-* --duration 30 --conns 32 --warmup 5 |
- | SMTP receive throughput, post pipelined DeliveryExecutor (mailrs-delivery-executor 1.1, max_concurrent_flushes=2, 2026-05-24) | 1079 msg/s mean across 3 × 30s × 32 conns (rounds: 1074 / 1073 / 1089). +8% vs the 1.0 serial-flush 999 msg/s. P50 29 ms (-9%), P99 36 ms (-12%), P999 45 ms (-41%); tail latency is the headline win. Mechanism: while batch A’s fsync is in flight on a spawn_blocking thread, batch B starts collecting concurrently; a Semaphore-bounded pipeline of 2 in-flight flushes hides disk-wait behind batch-collection latency without queuing unbounded fsyncs. Cumulative since the perf-axis kickoff (#127): 291 → 1079 msg/s = 3.71× throughput, P999 199 → 45 ms = 4.4× faster tail. | Same reproduce command as the 1.0 row above; binary uses the new published mailrs-delivery-executor 1.1 default tuning. |
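The bounded pipeline in the 1.1 row can be illustrated without tokio. The sketch below is not the crate's implementation: it replaces the tokio `Semaphore` and `spawn_blocking` with a hand-rolled counting semaphore (`FlushSlots`, a hypothetical name) and plain threads, and a short sleep stands in for the fsync. `run_pipeline` reports the peak number of flushes that actually overlapped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built from Mutex + Condvar.
pub struct FlushSlots {
    free: Mutex<usize>,
    cv: Condvar,
}

impl FlushSlots {
    pub fn new(n: usize) -> Self {
        Self { free: Mutex::new(n), cv: Condvar::new() }
    }
    /// Blocks until a slot is free, then takes it.
    pub fn acquire(&self) {
        let mut free = self.free.lock().unwrap();
        while *free == 0 {
            free = self.cv.wait(free).unwrap();
        }
        *free -= 1;
    }
    /// Returns a slot and wakes one waiter.
    pub fn release(&self) {
        *self.free.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Runs `batches` simulated flushes with at most `limit` in flight and
/// returns the highest concurrency actually observed.
pub fn run_pipeline(batches: usize, limit: usize) -> usize {
    let slots = Arc::new(FlushSlots::new(limit));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();
    for _ in 0..batches {
        // The collector blocks here while `limit` flushes are already running.
        slots.acquire();
        let (slots, in_flight, peak) = (slots.clone(), in_flight.clone(), peak.clone());
        handles.push(thread::spawn(move || {
            let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
            peak.fetch_max(now, Ordering::SeqCst);
            thread::sleep(Duration::from_millis(5)); // stand-in for the fsync
            in_flight.fetch_sub(1, Ordering::SeqCst);
            slots.release();
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

With a limit of 1 the flushes run strictly one after another, matching the serial 1.0 behavior; with a limit of 2 the next batch can start while the previous one is still waiting on disk, and a third batch has to wait for a free slot.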
§License
Apache-2.0 OR MIT.
§Performance
Criterion benches: cargo bench -p mailrs-delivery-executor. Per-bench medians + regression budgets are documented in BUDGETS.md (this crate) and the workspace PERFORMANCE.md.
§Why this exists
mailrs-maildir 1.2 introduced deliver_batch, which is 15.27×
faster than per-message deliver at batch size N=64 on APFS. The
microbench at crates/storage-maildir/benches/deliver.rs
measured this directly. But the SMTP receive path is structured
as N independent sessions each delivering 1-N messages — no
caller is naturally going to hand a batch of 64 messages to a
single deliver_batch call.
This module is the bridge: a single executor task accumulates
per-path delivery requests from concurrent SMTP sessions,
either until the batch reaches max_batch OR a max_wait
timeout fires, then groups by destination path and calls
deliver_batch once per path. Each calling session awaits a
oneshot::Receiver for its individual result.
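The "max_batch OR max_wait" rule can be sketched as a collection loop over a channel. This is a blocking, standard-library approximation (`std::sync::mpsc::Receiver::recv_timeout` instead of tokio), and `collect_batch` is a hypothetical helper. It also assumes the deadline starts when the first message of a batch arrives, which the text above does not specify.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Blocks for the first item, then keeps collecting until either
/// `max_batch` items are held or `max_wait` has elapsed since the first item.
/// Returns an empty Vec once all senders are gone and the queue is drained.
pub fn collect_batch<T>(rx: &Receiver<T>, max_batch: usize, max_wait: Duration) -> Vec<T> {
    let mut batch = Vec::new();
    // Wait indefinitely for the first item; the deadline starts from here.
    match rx.recv() {
        Ok(item) => batch.push(item),
        Err(_) => return batch,
    }
    let deadline = Instant::now() + max_wait;
    while batch.len() < max_batch {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            // Timeout: flush what we have. Disconnected: no more senders.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    batch
}
```

Under load the loop exits through the size check and the timeout never fires; with a single message in flight it exits through the timeout, which is the worst-case added latency.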
§What it costs the caller
Per-message latency increases by up to max_wait. With
max_wait = 10ms and a typical load of 32 concurrent
connections, batches fill in 1-5ms in practice. Under low load
(single message in flight), the executor waits the full
max_wait before flushing a single-message batch — that’s
10ms latency added to every delivery in the worst case.
The win comes when load is high enough to fill the batch
before the timeout.
§Tuning
- max_batch = 64 matches the microbench sweet spot
- max_wait = 10ms is the standard SMTP-delivery latency tolerance: well below RFC 5321 timeouts and below human perception thresholds for delivery confirmation
- For latency-sensitive deployments (e.g. transactional mail
where delivery confirmation feeds an HTTP response), lower
max_wait to 1-2ms; throughput drops, latency stays bounded.
Structs§
- DeliveryExecutor - Handle held by SMTP sessions to submit deliveries. Clone-safe (internally Arc<mpsc::Sender>); every session task can hold its own clone.
Constants§
- DEFAULT_MAX_BATCH - Default batch size. N=64 matches the maildir-1.2 microbench crossover where batched fsync hits ~15× throughput vs per-message.
- DEFAULT_MAX_CONCURRENT_FLUSHES - Default in-flight flush concurrency. With N=2 the executor can start collecting batch B while batch A’s fsync is still in flight on a blocking thread, hiding the dir-fsync wait behind collection latency. Higher values don’t help on SSD/APFS because the disk serializes durable writes per mount; they just queue more fsyncs without parallelism. N=1 (no pipeline) is the conservative baseline and matches v1.0.0 behavior.
- DEFAULT_MAX_WAIT - Default flush deadline. 10ms is well below any SMTP timeout and below most users’ perception threshold for delivery confirmation latency.