FSYS (fsys-rs) is a low-level file and directory IO crate for Rust.
It is aimed at systems code that needs explicit control over durability,
predictable cross-platform behavior, and an API surface that stays close to
how storage software actually thinks about IO.
The crate sits between std::fs and a fully bespoke platform layer. It keeps
the operational model explicit: you choose a durability method, build a
long-lived Handle, and issue file or directory operations through that
handle. On supported platforms, fsys uses the best available primitive for
the selected method while keeping fallback behavior visible rather than
implicit.
That makes it a good fit for storage engines, embedded databases, local-first
applications, durable caches, append-heavy services, background workers, and
other programs where write semantics matter as much as raw throughput. It is
not trying to replace std::fs for ordinary application code.
FEATURES
- Journal substrate: open-once append-only log file with atomic LSN reservation, group-commit fsync, and a CRC-32C-protected self-identifying frame format. Intended for write-ahead-log workloads (database WAL, persistent queues, ledgers) where the atomic-replace primitive's per-call fsync cost is the bottleneck. Three throughput tiers are present: a cross-platform synchronous core, a lock-free concurrent append path, and a native io_uring asynchronous substrate on Linux. An opt-in Direct-IO mode (JournalOptions::direct(true)) routes appends through a sector-aligned in-memory log buffer (the architecture used by InnoDB's redo log and the WiredTiger journal), which trades the lock-free hot path for predictable tail latency and zero-copy device writes via O_DIRECT / F_NOCACHE / FILE_FLAG_NO_BUFFERING. 0.9.1 adds a vectored JournalHandle::append_batch(&[&[u8]]) that submits N records as a single framed-write syscall (~1.6× faster than append-in-loop on Windows page cache; larger wins expected on Linux + NVMe), hardware-accelerated CRC-32C with runtime CPU-feature dispatch (SSE4.2 / ARMv8 CRC), cache-padded hot atomics, stack-allocated frame encoding for small records, and a parking_lot Condvar leader/follower group-commit coordinator with two new tuning knobs (JournalOptions::group_commit_window, group_commit_max_batch) ported from emdb v0.8.5 (default Some(500 µs) / 8).
- Five real durability methods: Sync, Data, Mmap, Direct, and hardware-aware Auto. Every method is platform-honest: the actual primitive in use is observable via Handle::active_method() and Handle::active_durability_primitive().
- Cross-platform IO semantics: one API surface across Linux, macOS, and Windows, with platform-specific fallbacks documented rather than hidden.
- NVMe passthrough flush: on Linux (NVME_IOCTL_IO_CMD) and Windows (IOCTL_STORAGE_PROTOCOL_COMMAND) when the hardware supports it and the process has the privilege. Transparent fallback to fdatasync / WRITE_THROUGH otherwise.
- Linux io_uring path: Method::Direct on Linux routes through io_uring when available (kernel ≥ 5.1, no SECCOMP/AppArmor block), falling back to O_DIRECT + pwrite + fdatasync cleanly.
- Atomic replace-style writes: every public write API (write, write_copy, write_batch, Batch::commit) uses a temp-file + atomic rename pattern. The target file is either entirely the old payload or entirely the new payload, never torn.
- Crash-safety verified: per-method crash tests with three kill points (pre-syscall, mid-syscall, post-syscall) and the 100× pre-merge stability protocol.
- write_copy with metadata preservation: atomic-swap that preserves the target's existing mode (Unix), owner/group (Unix, when permitted), ACLs (Windows), and timestamps (all platforms).
- Root-scoped handles: bind a Handle to a base directory and reject paths that escape it.
- Full file and directory CRUD: write, read, append, positioned writes, range reads, truncate, rename, copy, metadata, sync, directory creation/removal, listing, recursive scan, glob find, and recursive count.
- Batch operations: grouped writes, deletes, and copies through write_batch, delete_batch, copy_batch, and the chainable Batch builder.
- Async layer with two substrates: gated behind the async Cargo feature. Every sync method gets an _async sibling. On Linux + Method::Direct, async ops submit directly to the per-handle io_uring ring (the native substrate, new in 0.7.0). Everywhere else, async ops route through tokio::task::spawn_blocking. Which substrate a handle uses is observable via Handle::async_substrate() -> AsyncSubstrate.
- Configurable group lane: tune batch window, batch size, queue depth, io_uring queue depth, and aligned-buffer-pool size per handle.
- Quick one-shot API: convenience helpers backed by a lazily initialized default handle for simple cases.
- Structured error reporting: 21 explicit error variants with stable FS-XXXXX codes for unsupported methods, alignment failures, atomic-replace failures, NVMe passthrough denial, async-runtime requirements, glob-pattern errors, batch failure position, handle poisoning, io_uring submit failure, and completion-driver liveness.
- Hardware-aware database surface (0.9.2): Handle::is_plp_protected() / Handle::plp_status() for safe per-commit fsync skip on confirmed-PLP enterprise NVMe (3–10× transaction-throughput lever); crate::observer::FsysObserver trait + Builder::observer for typed per-op telemetry (journal append / sync / handle write / read); runtime CPU-feature detection (replacing pre-0.9.2 compile-time cfg!(target_feature = ...) that lied on cross-target builds); Builder::tune_for(Workload::Database) for one-line storage-engine tuning (8 MiB buffer pool, 256-deep io_uring ring, 4096-deep batch queue).
- Pipeline throughput tier (0.9.3): Builder::dispatcher_shards(N) spawns N independent dispatcher threads per handle, each with its own bounded queue; batches are hash-routed by the first op's path, so within-batch order is preserved while concurrent submitters writing to different files scale near-linearly with shard count (was a one-core ceiling pre-0.9.3). Batch::commit_grouped() amortises parent-directory fsync across the entire batch (one syscall per unique parent directory instead of one per op) for bulk-load / SST-flush / checkpoint workloads where the batch is the durability unit.
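To make the idea of a CRC-32C-protected frame concrete, here is a minimal, self-contained sketch. The frame layout (length, checksum, payload) and the names crc32c, encode_frame, and decode_frame are assumptions made for this example; they are not fsys's on-disk format or API. The bitwise checksum loop stands in for the hardware-accelerated path mentioned in the feature list.

```rust
/// Bitwise CRC-32C (Castagnoli) using the reflected polynomial 0x82F63B78.
/// This is a slow software version for illustration only.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical frame layout (an assumption, not fsys's format):
/// [len: u32 LE][crc32c(payload): u32 LE][payload bytes]
/// Payloads are assumed to be smaller than 4 GiB.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(8 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(&crc32c(payload).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Returns the payload when the frame is complete and its checksum matches,
/// and None for a truncated or corrupted frame.
fn decode_frame(frame: &[u8]) -> Option<&[u8]> {
    if frame.len() < 8 {
        return None;
    }
    let len = u32::from_le_bytes([frame[0], frame[1], frame[2], frame[3]]) as usize;
    let crc = u32::from_le_bytes([frame[4], frame[5], frame[6], frame[7]]);
    let end = 8usize.checked_add(len)?;
    let payload = frame.get(8..end)?;
    if crc32c(payload) == crc {
        Some(payload)
    } else {
        None
    }
}
```

In a write-ahead log, a reader would typically stop replay at the first frame that fails this check, since a failed checksum at the tail usually indicates a write that was interrupted by a crash.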
Installation
[dependencies]
fsys = "0.9.3"
To opt into the async layer:
[dependencies]
fsys = { version = "0.9.0", features = ["async"] }
Cargo features
| Feature | Default | Pulls in | Purpose |
|---|---|---|---|
| async | off | tokio (rt, rt-multi-thread, sync, macros) | _async siblings for every sync method; async batch via tokio::sync::oneshot. |
| stress | off | (none) | Switches the soak tests in tests/stress.rs from a 60-second validation run to the full 1-hour soak duration. CI nightly enables this; dev iteration leaves it off. |
| fuzz | off | (none) | Compile-only flag for fuzz instrumentation. The actual fuzz targets live in fuzz/ (separate cargo-fuzz workspace). |
Minimum supported Rust version
1.75. The MSRV may be raised in any minor version before 1.0.0. After 1.0.0, MSRV bumps require a minor version bump.
Benchmark results
Numbers below were captured on windows-ntfs-nvme (Windows 11 Pro, x86_64, local NVMe SSD; std::env::temp_dir() resolves to NTFS) with 100 timed iterations after 10 warmup. Run-to-run noise is roughly ±5 % on this host class. The full methodology, additional payload sizes, and Linux numbers live in docs/BENCH.md; reproduce locally with cargo bench.
Journal substrate vs atomic-replace — the headline 0.9.0 result. Atomic-replace pays 5–7 syscalls per durable write; the journal opens once, appends without per-call fsync, and amortises durability across a sync_through call.
| Payload | Atomic-replace | Journal (sync at end) | Speedup |
|---|---|---|---|
| 64 B | 634 ops/s | 462.9 K ops/s | 730× |
| 4 KiB | 891 ops/s | 189.3 K ops/s | 212× |
The "sync at end" cadence is the canonical WAL pattern: append many records, fsync once at a transaction boundary. At an intermediate cadence (sync every 100 appends), the journal still delivers 109–255× the atomic-replace throughput. See docs/BENCH.md for the full table including the per-append sync cadence.
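The effect of sync cadence can be sketched with the standard library alone. The helper below is hypothetical (append_with_cadence is not an fsys function) and uses File::sync_data as the durability barrier. It only shows how the number of barriers shrinks as the sync interval grows; it says nothing about how the journal itself is implemented.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends every record through one long-lived file handle.
/// `sync_every == 0` means "sync once at the end"; any other value issues a
/// barrier after that many appends, plus a final one for leftover records.
/// Returns how many durability barriers were issued.
fn append_with_cadence(path: &Path, records: &[&[u8]], sync_every: usize) -> io::Result<usize> {
    let mut log = OpenOptions::new().create(true).append(true).open(path)?;
    let mut barriers = 0;
    let mut pending = 0;
    for record in records {
        log.write_all(record)?;
        pending += 1;
        if sync_every != 0 && pending == sync_every {
            log.sync_data()?; // durability barrier for this group of appends
            barriers += 1;
            pending = 0;
        }
    }
    if pending > 0 {
        log.sync_data()?; // final barrier, e.g. at a transaction boundary
        barriers += 1;
    }
    Ok(barriers)
}
```

With 250 records, a cadence of 100 issues three barriers and a cadence of 0 issues one, which is the amortisation the table above measures.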
Atomic-replace write vs std::fs::write: fsys gives up some median latency on small writes to buy a tighter tail. Small-write medians go to std::fs::write because it does not provide durability guarantees.
| Payload | fsys::Auto median / p99 | std::fs::write median / p99 |
|---|---|---|
| 4 KiB | 1.08 ms / 4.69 ms | 218.7 µs / 7.18 ms |
| 64 KiB | 1.23 ms / 5.50 ms | 4.48 ms / 5.47 ms |
| 1 MiB | 1.80 ms / 5.00 ms | 2.84 ms / 16.45 ms |
std::fs::write is ~5× faster than fsys::Auto at the 4 KiB median because it skips the fsync + atomic-rename cycle. At p99 the gap inverts: fsys::Auto is 3.3× faster than std::fs::write at 1 MiB because the durability cost is paid deterministically rather than deferred to OS scheduling. The fair comparison for durable writes is fsys::Sync versus std::fs plus a manual temp-file + sync_all + rename dance — the latter is what most application code gets wrong.
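For reference, one way to write that manual dance with the standard library is sketched below. The function names and the temp-file naming scheme are assumptions for illustration. This is not fsys's implementation, and it leaves out details such as metadata preservation.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replaces `target` so that readers see either the old or the new payload.
/// The temp name uses the process id, so concurrent calls for the same
/// target within one process would collide (acceptable for a sketch).
fn atomic_replace(target: &Path, payload: &[u8]) -> io::Result<()> {
    let dir = match target.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let mut tmp_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?
        .to_os_string();
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    // The temp file lives in the same directory so the rename stays on one filesystem.
    let tmp = dir.join(tmp_name);

    let outcome = write_and_swap(&tmp, target, payload).and_then(|()| sync_dir(dir));
    if outcome.is_err() {
        let _ = fs::remove_file(&tmp); // best-effort cleanup of the temp file
    }
    outcome
}

fn write_and_swap(tmp: &Path, target: &Path, payload: &[u8]) -> io::Result<()> {
    let mut file = File::create(tmp)?;
    file.write_all(payload)?;
    file.sync_all()?; // flush data and metadata before the rename publishes it
    fs::rename(tmp, target)
}

/// On Unix, fsync the parent directory so the rename itself is durable.
#[cfg(unix)]
fn sync_dir(dir: &Path) -> io::Result<()> {
    File::open(dir)?.sync_all()
}

/// Other platforms: no directory sync in this sketch.
#[cfg(not(unix))]
fn sync_dir(_dir: &Path) -> io::Result<()> {
    Ok(())
}
```

The steps most often skipped in application code are the sync_all before the rename and the directory sync after it; without them the rename can become visible before the data is on stable storage.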
Read parity — the read path is essentially std::fs::read plus handle bookkeeping.
| Payload | fsys::Auto median / p99 | std::fs::read median / p99 | tokio::fs::read median / p99 |
|---|---|---|---|
| 4 KiB | 25.0 / 89.4 µs | 23.7 / 77.1 µs | 35.8 / 152.8 µs |
| 64 KiB | 25.0 / 58.9 µs | 24.1 / 64.0 µs | 105.9 / 337.5 µs |
| 1 MiB | 182.5 / 482.3 µs | 189.0 / 327.4 µs | 250.7 / 585.8 µs |
tokio::fs::read (simulated via spawn_blocking, which is what tokio's own fs module does internally) is 1.5–4.4× slower because of the thread-pool hop. On Linux + Method::Direct + the async feature, fsys's native io_uring substrate bypasses that hop entirely — see docs/BENCH.md for the WSL2 measurement.
Documentation
- API reference: https://docs.rs/fsys
- Architecture overview: docs/ARCHITECTURE.md
- Runnable examples (16): docs/EXAMPLES.md, which catalogues every example in examples/ with a "when to use this pattern" guide.
- Method matrix and Auto decision ladder: docs/METHODS.md
- Performance targets and tuning: docs/PERFORMANCE.md
- Crash-safety contract per method: docs/CRASH-SAFETY.md
- Per-platform behavior + capability requirements: docs/PLATFORM-NOTES.md
- API stability + breaking-change policy: see Stability + breaking-change policy and API changes in 0.9.0 in docs/API.md. Per-version migration deltas live in CHANGELOG.md.
Coming Soon...
LICENSE
Licensed under the Apache License, Version 2.0 (LICENSE-APACHE) or the MIT License (LICENSE-MIT), referred to here as the "License Agreement". You are permitted to use this software, its source code, documentation, concepts, and any associated contents within the limitations defined by the License Agreement.