fsys 0.9.8

Filesystem IO for Rust storage engines: journal substrate, io_uring, NVMe passthrough, atomic writes, cross-platform durability.

fsys is a foundation-tier filesystem IO crate for Rust storage engines, embedded databases, and durable services. It pairs an explicit durability model with a journal substrate, io_uring on Linux, NVMe passthrough, and atomic-replace writes — sitting one layer below your data structures and one layer above std::fs.

It is not trying to replace std::fs for ordinary application code.


Quickstart

use std::sync::Arc;
use fsys::{builder, JournalOptions};

fn main() -> fsys::Result<()> {
    // Build a handle once, share via Arc.
    let fs = Arc::new(builder().build()?);

    // Open an append-only journal: the WAL primitive.
    let log = fs.journal("/var/lib/myapp/log.wal")?;

    // Append many records without per-call fsync.
    let _ = log.append(b"txn 1: insert")?;
    let _ = log.append(b"txn 2: update")?;
    let lsn = log.append(b"txn 3: commit")?;

    // One fsync covers every prior append (group commit).
    log.sync_through(lsn)?;

    Ok(())
}

For one-shot file IO (atomic-replace, durable), fsys::quick::write / read skip the handle:

fsys::quick::write("/etc/myapp/config.toml", b"value = 42")?;
let data = fsys::quick::read("/etc/myapp/config.toml")?;

See examples/ (17 runnable patterns) and docs/EXAMPLES.md for the full catalogue.


At a glance

  • Five durability methods: Sync, Data, Mmap, Direct, and hardware-aware Auto. Every method is platform-honest: the actual primitive in use is observable via Handle::active_durability_primitive().
  • Journal substrate — open-once append-only log with atomic LSN reservation, group-commit fsync, and a CRC-32C-protected frame format. Three throughput tiers (sync, lock-free concurrent, native io_uring async on Linux). The HiveDB-class WAL primitive.
  • Atomic-replace writes — every write / write_copy / Batch::commit uses temp-file + atomic rename. The target is either entirely the old payload or entirely the new payload — never torn.
  • Linux io_uring on the hot path: Method::Direct and the journal Direct-IO path submit through io_uring with IORING_OP_WRITE_FIXED against pre-registered buffer slots. Falls back to O_DIRECT + pwrite + fdatasync cleanly when io_uring is unavailable.
  • NVMe passthrough flush — on Linux (NVME_IOCTL_IO_CMD) and Windows (IOCTL_STORAGE_PROTOCOL_COMMAND) when the hardware supports it. Transparent fallback to fdatasync / WRITE_THROUGH otherwise.
  • Cross-platform reflinks — macOS clonefile(2) + Windows FSCTL_DUPLICATE_EXTENTS_TO_FILE give APFS / ReFS instant copy-on-write semantics. Multi-GiB checkpoint clones drop from seconds to microseconds.
  • Optional async layer (async feature) — every sync method gets an _async sibling. On Linux + Method::Direct, async ops submit directly to the per-handle io_uring ring (no spawn_blocking thread-pool hop).
  • Hardware-aware tuning — PLP detection, NAWUN/NAWUPF probe (atomic-write unit), Builder::tune_for(Workload::Database) preset, runtime CPU-feature detection for hardware CRC-32C.
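
To make the "CRC-32C-protected frame format" idea concrete, here is a minimal std-only sketch. The frame layout (length, checksum, payload) and the function names are assumptions made for illustration; they are not fsys's on-disk format or API. The checksum is a plain bitwise CRC-32C, with no hardware acceleration.

```rust
/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // `mask` is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

fn read_u32_le(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Hypothetical frame layout: [len: u32 LE][crc32c(payload): u32 LE][payload].
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(8 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(&crc32c(payload).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Returns every intact payload from the start of `log`, stopping at the
/// first truncated or corrupt frame (a torn tail after a crash).
fn recover(log: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut pos = 0;
    while log.len() - pos >= 8 {
        let len = read_u32_le(&log[pos..]) as usize;
        let crc = read_u32_le(&log[pos + 4..]);
        let start = pos + 8;
        let end = match start.checked_add(len) {
            Some(end) if end <= log.len() => end,
            _ => break, // header promises more bytes than the file holds
        };
        let payload = &log[start..end];
        if crc32c(payload) != crc {
            break; // checksum mismatch: treat the rest as garbage
        }
        out.push(payload);
        pos = end;
    }
    out
}
```

Calling recover on a log whose last frame was cut short returns only the frames before it, which is the behaviour a WAL reader needs after a crash mid-append.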


When to use fsys

You need... | Use
A casual file read or write in a non-critical path | std::fs
Async file IO inside a tokio program, no durability requirements | tokio::fs (which routes through spawn_blocking)
A durable write that survives kill -9 | fsys (atomic-replace pattern)
A write-ahead log / WAL / journal | fsys::JournalHandle
Direct-IO on NVMe with explicit fsync control | fsys::Handle with Method::Direct
One Rust crate that handles Linux + macOS + Windows durability cleanly | fsys (per-platform fallback ladder, observable via Handle::active_durability_primitive())
The lowest possible write latency in the happy path | std::fs::write (skips fsync, does not survive a crash)

The "fair comparison" for durable writes is fsys::Sync versus std::fs plus a manual temp-file + sync_all + rename dance — the latter is what most application code gets wrong. fsys provides this as a single public API call.
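
For reference, the manual std::fs pattern mentioned above looks roughly like the following. This is a sketch of the general technique, not fsys's implementation. The function name atomic_replace and the ".tmp" suffix are assumptions, and a production version would also need a unique temp name and cleanup on failure.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `target` with `payload` so that a reader sees either the old
/// content or the new content, never a mixture of the two.
fn atomic_replace(target: &Path, payload: &[u8]) -> io::Result<()> {
    let dir = target
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));

    // Hypothetical naming scheme: "<file name>.tmp" in the same directory,
    // so the rename below never crosses a filesystem boundary.
    let mut tmp_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = dir.join(tmp_name);

    // 1. Write the full payload to the temp file and flush it to the device.
    let mut file = File::create(&tmp)?;
    file.write_all(payload)?;
    file.sync_all()?;
    drop(file);

    // 2. Swap the temp file into place (replaces an existing target).
    fs::rename(&tmp, target)?;

    // 3. On Unix, sync the parent directory so the rename itself is durable.
    #[cfg(unix)]
    {
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```

Steps 1 and 3 are the ones most often omitted: without the sync_all on the temp file the rename can become visible before the data is on disk, and without the directory sync the rename itself may be lost after a power failure.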


Performance

Numbers below were captured on windows-ntfs-nvme (Windows 11 Pro, x86_64, local NVMe SSD; std::env::temp_dir() resolves to NTFS) with 100 timed iterations after 10 warmup. Run-to-run noise is roughly ±5% on this host class. Full methodology, additional payload sizes, and Linux numbers live in docs/BENCH.md; reproduce locally with cargo bench.
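
As a rough illustration of the "warmup, then timed iterations, then median / p99" procedure, the sketch below uses only std. The names measure and percentile and the nearest-rank percentile definition are assumptions; the harness behind docs/BENCH.md may compute its statistics differently.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile over an ascending-sorted slice (`pct` in 1..=100).
fn percentile(sorted: &[Duration], pct: usize) -> Duration {
    assert!(!sorted.is_empty(), "need at least one sample");
    // Integer ceil(pct * n / 100), clamped to a valid 1-based rank.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Runs `op` `warmup` times untimed, then `iters` times timed.
/// Returns (median, p99) of the timed iterations.
fn measure<F: FnMut()>(warmup: usize, iters: usize, mut op: F) -> (Duration, Duration) {
    for _ in 0..warmup {
        op();
    }
    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort();
    (percentile(&samples, 50), percentile(&samples, 99))
}
```

With the settings quoted above this would be called as measure(10, 100, || { /* one write or read */ }). With only 100 samples, p99 is the second-slowest observation, so it is sensitive to a single outlier.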

Journal substrate vs atomic-replace

The headline result. Atomic-replace pays 5–7 syscalls per durable write; the journal opens once, appends without per-call fsync, and amortises durability across a sync_through call — the canonical WAL pattern.

Payload | Atomic-replace | Journal (sync at end) | Speedup
64 B | 634 ops/s | 462.9 K ops/s | 730×
4 KiB | 891 ops/s | 189.3 K ops/s | 212×

At an intermediate cadence (sync every 100 appends), the journal still delivers 109–255× the atomic-replace throughput. See docs/BENCH.md for the full table including per-append sync cadence.
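
The amortisation behind these numbers can be sketched with std alone. MiniJournal below is a hypothetical toy, not fsys's JournalHandle: its "LSN" is simply the byte offset just past a record, it writes raw bytes with no framing, and it holds one lock across the sync for simplicity.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Toy append-only log that is opened once and synced on demand.
struct MiniJournal {
    file: Mutex<(File, u64)>, // open handle + end offset of the last append
    durable: AtomicU64,       // highest LSN covered by a completed sync
    syncs: AtomicU64,         // how many sync calls were actually issued
}

impl MiniJournal {
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        let len = file.metadata()?.len();
        Ok(Self {
            file: Mutex::new((file, len)),
            durable: AtomicU64::new(0),
            syncs: AtomicU64::new(0),
        })
    }

    /// Appends without syncing; returns the LSN (end offset) of the record.
    fn append(&self, record: &[u8]) -> io::Result<u64> {
        let mut guard = self.file.lock().unwrap();
        guard.0.write_all(record)?;
        guard.1 += record.len() as u64;
        Ok(guard.1)
    }

    /// Makes everything up to `lsn` durable. A single sync covers every
    /// append written before it, so later callers can often skip the sync.
    fn sync_through(&self, lsn: u64) -> io::Result<()> {
        if self.durable.load(Ordering::Acquire) >= lsn {
            return Ok(()); // an earlier sync already covered this record
        }
        let guard = self.file.lock().unwrap();
        let written = guard.1;
        guard.0.sync_data()?;
        self.syncs.fetch_add(1, Ordering::Relaxed);
        self.durable.fetch_max(written, Ordering::Release);
        Ok(())
    }
}
```

Three appends followed by sync_through on the last LSN issue one sync; a subsequent sync_through on an earlier LSN returns immediately. Atomic-replace, by contrast, pays the open, write, sync, and rename sequence on every call.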

Atomic-replace write vs std::fs::write

fsys::Auto pays a deterministic durability cost; std::fs::write defers that cost to OS scheduling and pays it at p99 instead.

Payload | fsys::Auto median / p99 | std::fs::write median / p99
4 KiB | 1.08 ms / 4.69 ms | 218.7 µs / 7.18 ms
64 KiB | 1.23 ms / 5.50 ms | 4.48 ms / 5.47 ms
1 MiB | 1.80 ms / 5.00 ms | 2.84 ms / 16.45 ms

At 1 MiB, fsys::Auto is 3.3× faster than std::fs::write at p99 — durability paid up-front rather than at unpredictable points.

Read parity

The read path is essentially std::fs::read plus handle bookkeeping — no durability cost on reads.

Payload | fsys::Auto median / p99 | std::fs::read median / p99 | tokio::fs::read median / p99
4 KiB | 25.0 / 89.4 µs | 23.7 / 77.1 µs | 35.8 / 152.8 µs
64 KiB | 25.0 / 58.9 µs | 24.1 / 64.0 µs | 105.9 / 337.5 µs
1 MiB | 182.5 / 482.3 µs | 189.0 / 327.4 µs | 250.7 / 585.8 µs

At the median, tokio::fs::read is roughly 1.4–4.2× slower than fsys::Auto in the table above, because tokio's fs module routes through spawn_blocking. On Linux + Method::Direct + the async feature, fsys's native io_uring substrate bypasses that thread-pool hop entirely.


Installation

[dependencies]
fsys = "0.9.8"

With the async layer:

[dependencies]
fsys = { version = "0.9.8", features = ["async"] }

Cargo features

Feature | Default | Pulls in | Purpose
async | off | tokio (rt, rt-multi-thread, sync, macros) | _async siblings for every sync method; async batch via tokio::sync::oneshot.
tracing | off | tracing | Structured spans + events on the write / read / journal hot paths. No-op when subscriber is absent.
stress | off | (none) | Switches tests/stress.rs from a 60-second validation run to the full 1-hour soak. CI nightly enables this; dev iteration leaves it off.
fuzz | off | (none) | Compile-only flag for fuzz instrumentation. Actual targets live in fuzz/ (cargo-fuzz workspace).

Minimum supported Rust version

1.75. MSRV may be raised in any minor version before 1.0.0. After 1.0.0, MSRV bumps require a minor version bump.


Highlights by release

The full per-version delta lives in CHANGELOG.md. Headline capabilities by release:

Release Headline
0.9.7 GroupCommit wake-stampede fix (atomic pending_followers, ~5× lock-hold reduction under 100+ followers); Builder::sqpoll(idle_ms) opt-in kernel-side submission polling; IORING_REGISTER_FILES restored on both rings; OOM-injection test infrastructure; LSN atomic-ordering tightened to Release.
0.9.6 Full-codebase audit (38 findings); journal-on-io_uring via IORING_OP_WRITE_FIXED; APFS clonefile(2) + ReFS FSCTL_DUPLICATE_EXTENTS_TO_FILE reflinks for copy_file; real OS-version probes; Lsn + BatchError field lockdown for pre-1.0 stability.
0.9.5 Dual-buffered Direct-mode log buffer (multi-core scalable journal appends); Handle::punch_hole + Handle::write_zeros cross-platform sparse-file primitives; IORING_REGISTER_FILES on both io_uring rings.
0.9.4 io_uring elite flags (COOP_TASKRUN / SINGLE_ISSUER / DEFER_TASKRUN); linked Write+Fsync via IOSQE_IO_LINK; NAWUN / NAWUPF probe and Handle::atomic_write_unit(); macOS SyncMode::Barrier for F_BARRIERFSYNC; Linux WriteLifetimeHint for multi-stream NVMe.
0.9.3 Builder::dispatcher_shards(N) for multi-core batch throughput; Batch::commit_grouped() amortises parent-directory fsync.
0.9.2 PLP detection (Handle::is_plp_protected / plp_status); FsysObserver trait + Builder::observer for telemetry; Builder::tune_for(Workload::Database); runtime CPU-feature detection for hardware CRC-32C.
0.9.1 Vectored JournalHandle::append_batch(&[&[u8]]) (~1.6× faster than append-in-loop on Windows NTFS, larger wins on Linux NVMe); hardware-accelerated CRC-32C (SSE4.2 / ARMv8 CRC); cache-padded hot atomics; group-commit window + max-batch tuning.
0.9.0 Journal substrate (three throughput tiers); Direct-IO journal opt-in; CRC-32C frame format with tail-truncation detection; per-method crash-safety integration tests.


Documentation

LICENSE

Licensed under the Apache License, Version 2.0 [ LICENSE-APACHE ], or the MIT License [ LICENSE-MIT ] (together, the "License Agreement"). You are permitted to use this software, its source code, documentation, concepts, and any of the associated contents within the limitations defined by the License Agreement.