fsys is a foundation-tier filesystem IO crate for Rust storage engines, embedded databases, and durable services. It pairs an explicit durability model with a journal substrate, io_uring on Linux, NVMe passthrough, and atomic-replace writes — sitting one layer below your data structures and one layer above std::fs.
It is not trying to replace std::fs for ordinary application code.
Quickstart
use Arc;
use ;
For one-shot file IO (atomic-replace, durable), fsys::quick::write / read skip the handle:
write?;
let data = read?;
See examples/ (33 runnable patterns) and docs/EXAMPLES.md for the full catalogue.
At a glance
- Five durability methods: Sync, Data, Mmap, Direct, and hardware-aware Auto. Every method is platform-honest: the actual primitive in use is observable via Handle::active_durability_primitive().
- Journal substrate: open-once append-only log with atomic LSN reservation, group-commit fsync, and a CRC-32C-protected frame format. Three throughput tiers (sync, lock-free concurrent, native io_uring async on Linux). The HiveDB-class WAL primitive.
- Atomic-replace writes: every write / write_copy / Batch::commit uses temp-file + atomic rename. The target is either entirely the old payload or entirely the new payload, never torn.
- Linux io_uring on the hot path: Method::Direct and the journal Direct-IO path submit through io_uring with IORING_OP_WRITE_FIXED against pre-registered buffer slots. Falls back to O_DIRECT + pwrite + fdatasync cleanly when io_uring is unavailable.
- NVMe passthrough flush: on Linux (NVME_IOCTL_IO_CMD) and Windows (IOCTL_STORAGE_PROTOCOL_COMMAND) when the hardware supports it. Transparent fallback to fdatasync / WRITE_THROUGH otherwise.
- Cross-platform reflinks: macOS clonefile(2) + Windows FSCTL_DUPLICATE_EXTENTS_TO_FILE give APFS / ReFS instant copy-on-write semantics. Multi-GiB checkpoint clones drop from seconds to microseconds.
- Optional async layer (async feature): every sync method gets an _async sibling. On Linux + Method::Direct, async ops submit directly to the per-handle io_uring ring (no spawn_blocking thread-pool hop).
- Hardware-aware tuning: PLP detection, NAWUN/NAWUPF probe (atomic-write unit), Builder::tune_for(Workload::Database) preset, runtime CPU-feature detection for hardware CRC-32C.
- Capability cache + SPDK gating (1.1.0): fsys::capability::capabilities() probes the system once (50–200 ms), caches the result to disk, and returns sub-millisecond loads thereafter. Method::Spdk is wired through the public API; the kernel-bypass backend lives in the companion fsys-spdk crate.
- Journal backend observability (1.1.0): every JournalHandle exposes backend_kind() / backend_health() / backend_info() so ops teams can verify which IO path is live without ambiguity.
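To make "atomic LSN reservation" concrete, here is a minimal std-only sketch of the general idea: each writer claims a byte range in the log with a single atomic fetch-and-add, so concurrent appenders never overlap. The type LsnAllocator and the function reserve_concurrently are hypothetical names invented for this illustration; they are not part of the fsys API, and the crate's real implementation may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a journal's LSN allocator (not the fsys type).
struct LsnAllocator {
    next: AtomicU64,
}

impl LsnAllocator {
    fn new() -> Self {
        LsnAllocator { next: AtomicU64::new(0) }
    }

    /// Claim `len` bytes of log space and return the start offset.
    /// Uniqueness only needs atomicity; AcqRel is a conservative choice here.
    fn reserve(&self, len: u64) -> u64 {
        self.next.fetch_add(len, Ordering::AcqRel)
    }
}

/// Spawn `threads` writers that each reserve `per_thread` ranges of
/// `frame_len` bytes, then return every start offset in sorted order.
fn reserve_concurrently(threads: usize, per_thread: usize, frame_len: u64) -> Vec<u64> {
    let alloc = Arc::new(LsnAllocator::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let alloc = Arc::clone(&alloc);
            thread::spawn(move || {
                (0..per_thread)
                    .map(|_| alloc.reserve(frame_len))
                    .collect::<Vec<u64>>()
            })
        })
        .collect();

    let mut starts: Vec<u64> = handles
        .into_iter()
        .flat_map(|h| h.join().expect("writer thread panicked"))
        .collect();
    starts.sort_unstable();
    starts
}

fn main() {
    let starts = reserve_concurrently(4, 1000, 64);
    // Sorted start offsets are exactly one frame apart: no gaps, no overlap.
    assert!(starts.windows(2).all(|w| w[1] - w[0] == 64));
    println!("reserved {} non-overlapping ranges", starts.len());
}
```

Because the reservation is a single atomic instruction, writers can copy their frames into the reserved ranges in parallel and only coordinate again when durability is requested.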
When to use fsys
| You need... | Use |
|---|---|
| A casual file read or write in a non-critical path | std::fs |
| Async file IO inside a tokio program, no durability requirements | tokio::fs (which routes through spawn_blocking) |
| A durable write that survives kill -9 | fsys, atomic-replace pattern |
| A write-ahead log / WAL / journal | fsys::JournalHandle |
| Direct-IO on NVMe with explicit fsync control | fsys::Handle with Method::Direct |
| One Rust crate that handles Linux + macOS + Windows durability cleanly | fsys — per-platform fallback ladder, observable via Handle::active_durability_primitive() |
| The lowest possible std::fs::write latency in the happy path | std::fs::write (skips fsync, does not survive a crash) |
The "fair comparison" for durable writes is fsys::Sync versus std::fs plus a manual temp-file + sync_all + rename dance — the latter is what most application code gets wrong. fsys provides this as a single public API call.
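For reference, the manual temp-file + sync_all + rename sequence looks roughly like the following when written against std::fs alone. This is a sketch of the general pattern, not fsys's implementation: the helper name atomic_replace and the fixed ".tmp" suffix are assumptions made for the example, and a production version would need a unique temp name to tolerate concurrent writers.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Replace `target` with `payload` so that readers see either the old
/// contents or the new contents, never a partial write.
/// Hypothetical helper for illustration; not the fsys API.
fn atomic_replace(target: &Path, payload: &[u8]) -> io::Result<()> {
    // The temp file must live in the same directory so the rename stays
    // on one filesystem.
    let dir = target
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));
    let mut tmp_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp"); // assumption: single writer, so a fixed suffix is enough
    let tmp = dir.join(tmp_name);

    // 1. Write the full payload to the temp file and flush it to stable storage.
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(&tmp)?;
    file.write_all(payload)?;
    file.sync_all()?;
    drop(file);

    // 2. Swap the temp file into place.
    fs::rename(&tmp, target)?;

    // 3. On Unix, also sync the parent directory so the rename itself is durable.
    #[cfg(unix)]
    {
        fs::File::open(dir)?.sync_all()?;
    }

    Ok(())
}

fn main() -> io::Result<()> {
    let target = std::env::temp_dir().join("atomic_replace_demo.bin");
    atomic_replace(&target, b"version 1")?;
    atomic_replace(&target, b"version 2")?;
    assert_eq!(fs::read(&target)?, b"version 2".to_vec());
    fs::remove_file(&target)
}
```

The steps that application code most often omits are the sync_all before the rename and the directory sync after it; without them the rename can reach disk before the data does.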
Performance
Numbers below were captured on windows-ntfs-nvme (Windows 11 Pro, x86_64, local NVMe SSD; std::env::temp_dir() resolves to NTFS) with 100 timed iterations after 10 warmup. Run-to-run noise is roughly ±5% on this host class. Full methodology, additional payload sizes, and Linux numbers live in docs/BENCH.md; reproduce locally with cargo bench.
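As a rough illustration of how median and p99 figures can be derived from a run of 100 timed iterations after 10 warmup iterations, the sketch below uses only std::time. The helpers measure and percentile are hypothetical, and the nearest-rank percentile definition is an assumption; the actual harness and its statistics are described in docs/BENCH.md.

```rust
use std::time::{Duration, Instant};

/// Run `op` `warmup` times untimed, then `iters` times timed.
/// Returns the timed samples sorted ascending.
fn measure<F: FnMut()>(warmup: usize, iters: usize, mut op: F) -> Vec<Duration> {
    for _ in 0..warmup {
        op();
    }
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        op();
        samples.push(start.elapsed());
    }
    samples.sort_unstable();
    samples
}

/// Nearest-rank percentile over sorted samples (assumed definition).
/// `pct` is a whole-number percentage in 0..=100.
fn percentile(sorted: &[Duration], pct: usize) -> Duration {
    assert!(!sorted.is_empty(), "need at least one sample");
    // Ceiling division gives the 1-based nearest rank.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted[rank.max(1).min(sorted.len()) - 1]
}

fn main() {
    // Placeholder workload; a real benchmark would perform file IO here.
    let samples = measure(10, 100, || {
        std::hint::black_box((0..10_000u64).sum::<u64>());
    });
    println!(
        "median = {:?}, p99 = {:?}",
        percentile(&samples, 50),
        percentile(&samples, 99)
    );
}
```

With 100 samples, the nearest-rank p99 is the 99th smallest value, so a single outlier in the run does not move it but two outliers do.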
Journal substrate vs atomic-replace
The headline result. Atomic-replace pays 5–7 syscalls per durable write; the journal opens once, appends without per-call fsync, and amortises durability across a sync_through call — the canonical WAL pattern.
| Payload | Atomic-replace | Journal (sync at end) | Speedup |
|---|---|---|---|
| 64 B | 634 ops/s | 462.9 K ops/s | 730× |
| 4 KiB | 891 ops/s | 189.3 K ops/s | 212× |
At an intermediate cadence (sync every 100 appends), the journal still delivers 109–255× the atomic-replace throughput. See docs/BENCH.md for the full table including per-append sync cadence.
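The gap between the two columns comes from where the sync happens. The sketch below shows the general append-then-sync-once pattern with a checksummed frame, using only std. The frame layout (4-byte length, 4-byte CRC-32C, payload) and the names append, append_then_sync, and replay are assumptions made for this illustration; fsys's actual on-disk format and API are not reproduced here.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Append one frame: [len: u32 LE][crc32c: u32 LE][payload]. No sync here.
fn append(log: &mut File, payload: &[u8]) -> io::Result<()> {
    log.write_all(&(payload.len() as u32).to_le_bytes())?;
    log.write_all(&crc32c(payload).to_le_bytes())?;
    log.write_all(payload)
}

/// Open the log once, append every payload, and pay for durability once.
fn append_then_sync(path: &Path, payloads: &[&[u8]]) -> io::Result<()> {
    let mut log = OpenOptions::new().create(true).append(true).open(path)?;
    for payload in payloads {
        append(&mut log, payload)?;
    }
    log.sync_data()
}

fn read_u32(bytes: &[u8], pos: usize) -> u32 {
    u32::from_le_bytes([bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3]])
}

/// Return every intact frame, stopping at the first short or corrupt one
/// (which is how a torn tail after a crash would show up).
fn replay(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    let mut frames = Vec::new();
    let mut pos = 0;
    while bytes.len() - pos >= 8 {
        let len = read_u32(&bytes, pos) as usize;
        let crc = read_u32(&bytes, pos + 4);
        let start = pos + 8;
        if bytes.len() - start < len {
            break; // payload cut short
        }
        let payload = &bytes[start..start + len];
        if crc32c(payload) != crc {
            break; // checksum mismatch
        }
        frames.push(payload.to_vec());
        pos = start + len;
    }
    Ok(frames)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("journal_sketch_demo.log");
    let _ = std::fs::remove_file(&path);
    append_then_sync(&path, &[&b"alpha"[..], &b"beta"[..], &b"gamma"[..]])?;
    println!("recovered {} frames", replay(&path)?.len());
    std::fs::remove_file(&path)
}
```

Syncing every N appends instead of once at the end is the same loop with sync_data called inside it on a counter, which trades throughput for a smaller window of unsynced frames.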
Atomic-replace write vs std::fs::write
fsys::Auto pays a deterministic durability cost; std::fs::write defers that cost to OS scheduling and pays it at p99 instead.
| Payload | fsys::Auto median / p99 | std::fs::write median / p99 |
|---|---|---|
| 4 KiB | 1.08 ms / 4.69 ms | 218.7 µs / 7.18 ms |
| 64 KiB | 1.23 ms / 5.50 ms | 4.48 ms / 5.47 ms |
| 1 MiB | 1.80 ms / 5.00 ms | 2.84 ms / 16.45 ms |
At 1 MiB, fsys::Auto is 3.3× faster than std::fs::write at p99 — durability paid up-front rather than at unpredictable points.
Read parity
The read path is essentially std::fs::read plus handle bookkeeping — no durability cost on reads.
| Payload | fsys::Auto median / p99 | std::fs::read median / p99 | tokio::fs::read median / p99 |
|---|---|---|---|
| 4 KiB | 25.0 / 89.4 µs | 23.7 / 77.1 µs | 35.8 / 152.8 µs |
| 64 KiB | 25.0 / 58.9 µs | 24.1 / 64.0 µs | 105.9 / 337.5 µs |
| 1 MiB | 182.5 / 482.3 µs | 189.0 / 327.4 µs | 250.7 / 585.8 µs |
By median latency in the table above, tokio::fs::read is roughly 1.4–4.2× slower than fsys::Auto because tokio's own fs module routes through spawn_blocking. On Linux + Method::Direct + the async feature, fsys's native io_uring substrate bypasses that thread-pool hop entirely.
Installation
[dependencies]
fsys = "1.1"
With the async layer:
[dependencies]
fsys = { version = "1.1", features = ["async"] }
Cargo features
| Feature | Default | Pulls in | Purpose |
|---|---|---|---|
| async | off | tokio (rt, rt-multi-thread, sync, macros) | _async siblings for every sync method; async batch via tokio::sync::oneshot. |
| tracing | off | tracing | Structured spans + events on the write / read / journal hot paths. No-op when subscriber is absent. |
| spdk (1.1.0) | off | (companion crate fsys-spdk) | Gates Method::Spdk activation. With the feature off, the variant compiles but selecting it returns Error::FeatureNotEnabled. The actual SPDK backend implementation ships in the fsys-spdk companion crate. See docs/SPDK.md. |
| stress | off | (none) | Switches tests/stress.rs from a 60-second validation run to the full 1-hour soak. CI nightly enables this; dev iteration leaves it off. |
| fuzz | off | (none) | Compile-only flag for fuzz instrumentation. Actual targets live in fuzz/ (cargo-fuzz workspace). |
Minimum supported Rust version
1.75. Through the 1.x line, MSRV bumps are allowed only in 1.x.0 minor releases (within the 12 most recent stable Rust versions at release time). Patch releases never bump MSRV. See docs/STABILITY-1.0.md for the full policy.
Highlights by release
The full per-version delta lives in CHANGELOG.md. Headline capabilities by release:
| Release | Headline |
|---|---|
| 1.1.0 | Capability cache + SPDK eligibility surface + JournalBackend trait + observability accessors. New Method::Spdk variant runtime-validated through Builder::build. Error::FeatureNotEnabled (FS-00022) + Error::SpdkUnavailable (FS-00023). 100% additive vs. 1.0.0; on-disk format unchanged. |
| 1.0.0 | First stable release. SemVer + on-disk-format guarantees apply for the 1.x line per docs/STABILITY-1.0.md. No source-logic changes vs. 0.9.8. |
| 0.9.8 | Final pre-1.0 polish: documentation refresh, examples expansion, canonical benchmarks, STABILITY-1.0.md commitment doc. |
| 0.9.7 | GroupCommit wake-stampede fix (atomic pending_followers, ~5× lock-hold reduction under 100+ followers); Builder::sqpoll(idle_ms) opt-in kernel-side submission polling; IORING_REGISTER_FILES restored on both rings; OOM-injection test infrastructure; LSN atomic-ordering tightened to Release. |
| 0.9.6 | Full-codebase audit (38 findings); journal-on-io_uring via IORING_OP_WRITE_FIXED; APFS clonefile(2) + ReFS FSCTL_DUPLICATE_EXTENTS_TO_FILE reflinks for copy_file; real OS-version probes; Lsn + BatchError field lockdown for pre-1.0 stability. |
| 0.9.5 | Dual-buffered Direct-mode log buffer (multi-core scalable journal appends); Handle::punch_hole + Handle::write_zeros cross-platform sparse-file primitives; IORING_REGISTER_FILES on both io_uring rings. |
| 0.9.4 | io_uring elite flags (COOP_TASKRUN / SINGLE_ISSUER / DEFER_TASKRUN); linked Write+Fsync via IOSQE_IO_LINK; NAWUN / NAWUPF probe and Handle::atomic_write_unit(); macOS SyncMode::Barrier for F_BARRIERFSYNC; Linux WriteLifetimeHint for multi-stream NVMe. |
| 0.9.3 | Builder::dispatcher_shards(N) for multi-core batch throughput; Batch::commit_grouped() amortises parent-directory fsync. |
| 0.9.2 | PLP detection (Handle::is_plp_protected / plp_status); FsysObserver trait + Builder::observer for telemetry; Builder::tune_for(Workload::Database); runtime CPU-feature detection for hardware CRC-32C. |
| 0.9.1 | Vectored JournalHandle::append_batch(&[&[u8]]) (~1.6× faster than append-in-loop on Windows NTFS, larger wins on Linux NVMe); hardware-accelerated CRC-32C (SSE4.2 / ARMv8 CRC); cache-padded hot atomics; group-commit window + max-batch tuning. |
| 0.9.0 | Journal substrate (three throughput tiers); Direct-IO journal opt-in; CRC-32C frame format with tail-truncation detection; per-method crash-safety integration tests. |
Documentation
- API reference: https://docs.rs/fsys
- 33 runnable examples: docs/EXAMPLES.md, which catalogues every example in examples/ with a "when to use this pattern" guide.
- Architecture overview: docs/ARCHITECTURE.md
- Method matrix + Auto decision ladder: docs/METHODS.md
- Performance targets + tuning: docs/PERFORMANCE.md
- Crash-safety contract per method: docs/CRASH-SAFETY.md
- Per-platform behavior + capability requirements: docs/PLATFORM-NOTES.md
- SPDK setup guide (1.1.0): docs/SPDK.md, covering hardware requirements, system setup, capability probe, and per-SpdkSkipReason remediation steps.
- Benchmark methodology + results: docs/BENCH.md
- Public-API reference: docs/API.md
- Per-version migration deltas: CHANGELOG.md
LICENSE
Licensed under the Apache License, Version 2.0 (LICENSE-APACHE) or the MIT License (LICENSE-MIT), referred to here as the "License Agreement". You are permitted to use this software, its source code, documentation, concepts, and any associated contents within the limitations defined by the License Agreement.