fsys 1.1.0

Filesystem IO for Rust storage engines: journal substrate, io_uring, NVMe passthrough, atomic writes, cross-platform durability.

fsys is a foundation-tier filesystem IO crate for Rust storage engines, embedded databases, and durable services. It pairs an explicit durability model with a journal substrate, io_uring on Linux, NVMe passthrough, and atomic-replace writes — sitting one layer below your data structures and one layer above std::fs.

It is not trying to replace std::fs for ordinary application code.

 

Quickstart

use std::sync::Arc;
use fsys::{builder, JournalOptions};

fn main() -> fsys::Result<()> {
    // Build a handle once, share via Arc.
    let fs = Arc::new(builder().build()?);

    // Open an append-only journal: the WAL primitive.
    let log = fs.journal("/var/lib/myapp/log.wal")?;

    // Append many records without per-call fsync.
    let _ = log.append(b"txn 1: insert")?;
    let _ = log.append(b"txn 2: update")?;
    let lsn = log.append(b"txn 3: commit")?;

    // One fsync covers every prior append: group commit.
    log.sync_through(lsn)?;

    Ok(())
}

For one-shot file IO (atomic-replace, durable), fsys::quick::write / read skip the handle:

fsys::quick::write("/etc/myapp/config.toml", b"value = 42")?;
let data = fsys::quick::read("/etc/myapp/config.toml")?;

See examples/ (33 runnable patterns) and docs/EXAMPLES.md for the full catalogue.

 

At a glance

  • Five durability methods: Sync, Data, Mmap, Direct, and hardware-aware Auto. Every method is platform-honest: the actual primitive in use is observable via Handle::active_durability_primitive().
  • Journal substrate — open-once append-only log with atomic LSN reservation, group-commit fsync, and a CRC-32C-protected frame format. Three throughput tiers (sync, lock-free concurrent, native io_uring async on Linux). The HiveDB-class WAL primitive.
  • Atomic-replace writes — every write / write_copy / Batch::commit uses temp-file + atomic rename. The target is either entirely the old payload or entirely the new payload — never torn.
  • Linux io_uring on the hot path: Method::Direct and the journal Direct-IO path submit through io_uring with IORING_OP_WRITE_FIXED against pre-registered buffer slots. Falls back to O_DIRECT + pwrite + fdatasync cleanly when io_uring is unavailable.
  • NVMe passthrough flush — on Linux (NVME_IOCTL_IO_CMD) and Windows (IOCTL_STORAGE_PROTOCOL_COMMAND) when the hardware supports it. Transparent fallback to fdatasync / WRITE_THROUGH otherwise.
  • Cross-platform reflinks — macOS clonefile(2) + Windows FSCTL_DUPLICATE_EXTENTS_TO_FILE give APFS / ReFS instant copy-on-write semantics. Multi-GiB checkpoint clones drop from seconds to microseconds.
  • Optional async layer (async feature) — every sync method gets an _async sibling. On Linux + Method::Direct, async ops submit directly to the per-handle io_uring ring (no spawn_blocking thread-pool hop).
  • Hardware-aware tuning — PLP detection, NAWUN/NAWUPF probe (atomic-write unit), Builder::tune_for(Workload::Database) preset, runtime CPU-feature detection for hardware CRC-32C.
  • Capability cache + SPDK gating (1.1.0): fsys::capability::capabilities() probes the system once (50–200 ms), caches the result to disk, and returns sub-millisecond loads thereafter. Method::Spdk is wired through the public API; the kernel-bypass backend lives in the companion fsys-spdk crate.
  • Journal backend observability (1.1.0) — every JournalHandle exposes backend_kind() / backend_health() / backend_info() so ops teams can verify which IO path is live without ambiguity.
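To make the "CRC-32C-protected frame format" idea concrete, here is a minimal std-only sketch. The frame layout (length, checksum, payload) and the names encode_frame / decode_frames are assumptions made for illustration; they are not the fsys on-disk format or API. The sketch only shows how a per-frame checksum lets recovery stop cleanly at a torn tail.

```rust
/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
/// Software-only; real implementations typically use SSE4.2 / ARMv8 CRC instructions.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

fn read_u32_le(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Hypothetical frame layout (NOT the fsys format):
/// [payload_len: u32 LE][crc32c(payload): u32 LE][payload bytes]
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(8 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(&crc32c(payload).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Decode frames from the start of `log`, stopping at the first torn or
/// corrupt frame. Returns the intact payloads and the length of the valid prefix.
fn decode_frames(log: &[u8]) -> (Vec<&[u8]>, usize) {
    let mut payloads = Vec::new();
    let mut pos = 0;
    while log.len() - pos >= 8 {
        let len = read_u32_le(log, pos) as usize;
        let crc = read_u32_le(log, pos + 4);
        let start = pos + 8;
        let end = match start.checked_add(len) {
            Some(e) if e <= log.len() => e,
            _ => break, // header promises more bytes than exist: torn tail
        };
        if crc32c(&log[start..end]) != crc {
            break; // payload does not match its checksum
        }
        payloads.push(&log[start..end]);
        pos = end;
    }
    (payloads, pos)
}

fn main() {
    let mut log = encode_frame(b"txn 1: insert");
    let first_len = log.len();
    log.extend_from_slice(&encode_frame(b"txn 2: update"));
    log.truncate(log.len() - 3); // simulate a crash in the middle of the second frame
    let (payloads, valid_end) = decode_frames(&log);
    assert_eq!(payloads.len(), 1);
    assert_eq!(valid_end, first_len);
    println!("recovered {} frame(s), valid prefix = {} bytes", payloads.len(), valid_end);
}
```

On recovery, a log in this style would be truncated to the valid prefix before new appends resume.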

 

When to use fsys

| You need... | Use |
| --- | --- |
| A casual file read or write in a non-critical path | std::fs |
| Async file IO inside a tokio program, no durability requirements | tokio::fs (which routes through spawn_blocking) |
| A durable write that survives kill -9 | fsys (atomic-replace pattern) |
| A write-ahead log / WAL / journal | fsys::JournalHandle |
| Direct-IO on NVMe with explicit fsync control | fsys::Handle with Method::Direct |
| One Rust crate that handles Linux + macOS + Windows durability cleanly | fsys (per-platform fallback ladder, observable via Handle::active_durability_primitive()) |
| The lowest possible std::fs::write latency in the happy path | std::fs::write (skips fsync, doesn't survive crash) |

The "fair comparison" for durable writes is fsys::Sync versus std::fs plus a manual temp-file + sync_all + rename dance — the latter is what most application code gets wrong. fsys provides this as a single public API call.
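For reference, the manual temp-file + sync_all + rename dance looks roughly like this with std alone. This is a sketch of the general pattern, not the fsys implementation; the durable_replace name and the ".name.tmp" temp-file convention are illustrative assumptions.

```rust
use std::ffi::OsString;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hand-rolled durable replace using only std (hypothetical helper).
fn durable_replace(target: &Path, payload: &[u8]) -> io::Result<()> {
    let name = target.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no file name")
    })?;
    // The temp file lives in the same directory so the rename stays on one filesystem.
    let dir = match target.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let mut tmp_name = OsString::from(".");
    tmp_name.push(name);
    tmp_name.push(".tmp");
    let tmp = dir.join(tmp_name);

    // 1. Write the new payload to the temp file and flush it to stable storage.
    let mut file = File::create(&tmp)?;
    file.write_all(payload)?;
    file.sync_all()?;
    drop(file);

    // 2. Swap the temp file into place; readers see the old or the new payload.
    fs::rename(&tmp, target)?;

    // 3. On Unix, fsync the parent directory so the rename itself is durable.
    if cfg!(unix) {
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let target = std::env::temp_dir().join(format!("durable_demo_{}.toml", std::process::id()));
    durable_replace(&target, b"value = 41")?;
    durable_replace(&target, b"value = 42")?;
    println!("{}", String::from_utf8_lossy(&fs::read(&target)?));
    fs::remove_file(&target)
}
```

Commonly skipped steps are the sync_all before the rename and the directory fsync after it. The fixed temp name in this sketch would also collide under concurrent writers to the same target.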

 

Performance

Numbers below were captured on windows-ntfs-nvme (Windows 11 Pro, x86_64, local NVMe SSD; std::env::temp_dir() resolves to NTFS) with 100 timed iterations after 10 warmup iterations. Run-to-run noise is roughly ±5% on this host class. Full methodology, additional payload sizes, and Linux numbers live in docs/BENCH.md; reproduce locally with cargo bench.
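As a rough illustration of how median / p99 figures can be derived from 100 timed iterations after 10 warmup runs, here is a small std-only harness. The measure and percentile helpers and the nearest-rank percentile definition are assumptions for this sketch; docs/BENCH.md remains the authoritative description of the actual methodology.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile over an ascending-sorted slice; `pct` in 1..=100.
fn percentile(sorted: &[Duration], pct: usize) -> Duration {
    assert!(!sorted.is_empty() && (1..=100).contains(&pct));
    // ceil(pct * n / 100), computed in integers to avoid float rounding.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted[rank - 1]
}

/// Run `op` `warmup` times untimed, then `iters` times timed.
/// Returns (median, p99) of the timed samples.
fn measure<F: FnMut()>(mut op: F, warmup: usize, iters: usize) -> (Duration, Duration) {
    for _ in 0..warmup {
        op();
    }
    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort();
    (percentile(&samples, 50), percentile(&samples, 99))
}

fn main() {
    let path = std::env::temp_dir().join(format!("bench_demo_{}.bin", std::process::id()));
    let payload = vec![0u8; 4096];
    let (median, p99) = measure(
        || std::fs::write(&path, &payload).expect("write failed"),
        10,
        100,
    );
    println!("std::fs::write 4 KiB: median {:?}, p99 {:?}", median, p99);
    let _ = std::fs::remove_file(&path);
}
```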

Journal substrate vs atomic-replace

The headline result. Atomic-replace pays 5–7 syscalls per durable write; the journal opens once, appends without per-call fsync, and amortises durability across a sync_through call — the canonical WAL pattern.

| Payload | Atomic-replace | Journal (sync at end) | Speedup |
| --- | --- | --- | --- |
| 64 B | 634 ops/s | 462.9 K ops/s | 730× |
| 4 KiB | 891 ops/s | 189.3 K ops/s | 212× |

At an intermediate cadence (sync every 100 appends), the journal still delivers 109–255× the atomic-replace throughput. See docs/BENCH.md for the full table including per-append sync cadence.
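The shape of the amortisation can be sketched with std alone. MiniJournal below is a hypothetical, single-threaded stand-in (its LSN is simply the byte offset at the end of a record), not fsys::JournalHandle; it only shows why one flush can cover many appends.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical stand-in for a WAL handle.
struct MiniJournal {
    file: File,
    appended: u64, // bytes written so far
    durable: u64,  // bytes known to be flushed
    syncs: u32,    // number of flushes actually issued
}

impl MiniJournal {
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        let len = file.metadata()?.len();
        // Assumption for the sketch: whatever is already in the file is durable.
        Ok(Self { file, appended: len, durable: len, syncs: 0 })
    }

    /// Append without flushing; returns the LSN (end offset) of the record.
    fn append(&mut self, record: &[u8]) -> io::Result<u64> {
        self.file.write_all(record)?;
        self.appended += record.len() as u64;
        Ok(self.appended)
    }

    /// Make everything up to `lsn` durable. One flush covers every prior
    /// append, and an already-durable LSN costs nothing.
    fn sync_through(&mut self, lsn: u64) -> io::Result<()> {
        if lsn > self.durable {
            self.file.sync_data()?;
            self.durable = self.appended;
            self.syncs += 1;
        }
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("mini_journal_demo_{}.wal", std::process::id()));
    let _ = std::fs::remove_file(&path);
    let mut log = MiniJournal::open(&path)?;
    log.append(b"txn 1: insert")?;
    log.append(b"txn 2: update")?;
    let lsn = log.append(b"txn 3: commit")?;
    log.sync_through(lsn)?;
    println!("3 appends, {} flush(es)", log.syncs);
    std::fs::remove_file(&path)
}
```

Atomic-replace, by contrast, pays its full create / write / flush / rename sequence on every call, which is where the gap in the table comes from.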

Atomic-replace write vs std::fs::write

fsys::Auto pays a deterministic durability cost; std::fs::write defers that cost to OS scheduling and pays it at p99 instead.

| Payload | fsys::Auto median / p99 | std::fs::write median / p99 |
| --- | --- | --- |
| 4 KiB | 1.08 ms / 4.69 ms | 218.7 µs / 7.18 ms |
| 64 KiB | 1.23 ms / 5.50 ms | 4.48 ms / 5.47 ms |
| 1 MiB | 1.80 ms / 5.00 ms | 2.84 ms / 16.45 ms |

At 1 MiB, fsys::Auto is 3.3× faster than std::fs::write at p99 — durability paid up-front rather than at unpredictable points.

Read parity

The read path is essentially std::fs::read plus handle bookkeeping — no durability cost on reads.

| Payload | fsys::Auto median / p99 | std::fs::read median / p99 | tokio::fs::read median / p99 |
| --- | --- | --- | --- |
| 4 KiB | 25.0 / 89.4 µs | 23.7 / 77.1 µs | 35.8 / 152.8 µs |
| 64 KiB | 25.0 / 58.9 µs | 24.1 / 64.0 µs | 105.9 / 337.5 µs |
| 1 MiB | 182.5 / 482.3 µs | 189.0 / 327.4 µs | 250.7 / 585.8 µs |

For the payloads shown, tokio::fs::read is roughly 1.4–4.2× slower than fsys::Auto at the median, because tokio's own fs module routes through spawn_blocking. On Linux + Method::Direct + the async feature, fsys's native io_uring substrate bypasses that thread-pool hop entirely.

 

Installation

[dependencies]
fsys = "1.1"

With the async layer:

[dependencies]
fsys = { version = "1.1", features = ["async"] }

Cargo features

| Feature | Default | Pulls in | Purpose |
| --- | --- | --- | --- |
| async | off | tokio (rt, rt-multi-thread, sync, macros) | _async siblings for every sync method; async batch via tokio::sync::oneshot. |
| tracing | off | tracing | Structured spans + events on the write / read / journal hot paths. No-op when subscriber is absent. |
| spdk (1.1.0) | off | (companion crate fsys-spdk) | Gates Method::Spdk activation. With the feature off, the variant compiles but selecting it returns Error::FeatureNotEnabled. The actual SPDK backend implementation ships in the fsys-spdk companion crate. See docs/SPDK.md. |
| stress | off | (none) | Switches tests/stress.rs from a 60-second validation run to the full 1-hour soak. CI nightly enables this; dev iteration leaves it off. |
| fuzz | off | (none) | Compile-only flag for fuzz instrumentation. Actual targets live in fuzz/ (cargo-fuzz workspace). |
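The spdk row describes a variant that always compiles but is rejected at selection time when the feature is off. A minimal sketch of that gating pattern follows; the Method, BuildError, and validate names here are stand-ins invented for illustration, not the fsys types, and the sketch assumes it is built without a feature named "spdk".

```rust
/// Stand-in for a method enum whose variants always compile.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Method {
    Sync,
    Spdk,
}

/// Stand-in error type (hypothetical).
#[derive(Debug, PartialEq)]
enum BuildError {
    FeatureNotEnabled(&'static str),
}

/// Accept a method only if the Cargo feature that backs it was compiled in.
/// `cfg!` evaluates to a compile-time boolean, so both match arms always type-check.
fn validate(method: Method) -> Result<Method, BuildError> {
    match method {
        Method::Spdk if !cfg!(feature = "spdk") => Err(BuildError::FeatureNotEnabled("spdk")),
        other => Ok(other),
    }
}

fn main() {
    println!("Sync -> {:?}", validate(Method::Sync));
    println!("Spdk -> {:?}", validate(Method::Spdk));
}
```

Keeping the variant unconditional means downstream match statements do not change shape when the feature is toggled; only the runtime outcome does.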

Minimum supported Rust version

Rust 1.75. Through the 1.x line, MSRV bumps are allowed only in 1.x.0 minor releases, and the new MSRV must fall within the 12 most recent stable Rust versions at release time. Patch releases never bump MSRV. See docs/STABILITY-1.0.md for the full policy.

 

Highlights by release

The full per-version delta lives in CHANGELOG.md. Headline capabilities by release:

| Release | Headline |
| --- | --- |
| 1.1.0 | Capability cache + SPDK eligibility surface + JournalBackend trait + observability accessors. New Method::Spdk variant runtime-validated through Builder::build. Error::FeatureNotEnabled (FS-00022) + Error::SpdkUnavailable (FS-00023). 100% additive vs. 1.0.0; on-disk format unchanged. |
| 1.0.0 | First stable release. SemVer + on-disk-format guarantees apply for the 1.x line per docs/STABILITY-1.0.md. No source-logic changes vs. 0.9.8. |
| 0.9.8 | Final pre-1.0 polish: documentation refresh, examples expansion, canonical benchmarks, STABILITY-1.0.md commitment doc. |
| 0.9.7 | GroupCommit wake-stampede fix (atomic pending_followers, ~5× lock-hold reduction under 100+ followers); Builder::sqpoll(idle_ms) opt-in kernel-side submission polling; IORING_REGISTER_FILES restored on both rings; OOM-injection test infrastructure; LSN atomic-ordering tightened to Release. |
| 0.9.6 | Full-codebase audit (38 findings); journal-on-io_uring via IORING_OP_WRITE_FIXED; APFS clonefile(2) + ReFS FSCTL_DUPLICATE_EXTENTS_TO_FILE reflinks for copy_file; real OS-version probes; Lsn + BatchError field lockdown for pre-1.0 stability. |
| 0.9.5 | Dual-buffered Direct-mode log buffer (multi-core scalable journal appends); Handle::punch_hole + Handle::write_zeros cross-platform sparse-file primitives; IORING_REGISTER_FILES on both io_uring rings. |
| 0.9.4 | io_uring elite flags (COOP_TASKRUN / SINGLE_ISSUER / DEFER_TASKRUN); linked Write+Fsync via IOSQE_IO_LINK; NAWUN / NAWUPF probe and Handle::atomic_write_unit(); macOS SyncMode::Barrier for F_BARRIERFSYNC; Linux WriteLifetimeHint for multi-stream NVMe. |
| 0.9.3 | Builder::dispatcher_shards(N) for multi-core batch throughput; Batch::commit_grouped() amortises parent-directory fsync. |
| 0.9.2 | PLP detection (Handle::is_plp_protected / plp_status); FsysObserver trait + Builder::observer for telemetry; Builder::tune_for(Workload::Database); runtime CPU-feature detection for hardware CRC-32C. |
| 0.9.1 | Vectored JournalHandle::append_batch(&[&[u8]]) (~1.6× faster than append-in-loop on Windows NTFS, larger wins on Linux NVMe); hardware-accelerated CRC-32C (SSE4.2 / ARMv8 CRC); cache-padded hot atomics; group-commit window + max-batch tuning. |
| 0.9.0 | Journal substrate (three throughput tiers); Direct-IO journal opt-in; CRC-32C frame format with tail-truncation detection; per-method crash-safety integration tests. |

 

LICENSE

Licensed under either the Apache License, Version 2.0 [ LICENSE-APACHE ] or the MIT License [ LICENSE-MIT ], together referred to as the "License Agreement". You are permitted to use this software, its source code, documentation, concepts, and any associated contents within the limitations defined by the License Agreement.