fsys 0.9.1

Adaptive file and directory IO for Rust — fast, hardware-aware, multi-strategy.

FSYS (fsys-rs) is a low-level file and directory IO crate for Rust. It is aimed at systems code that needs explicit control over durability, predictable cross-platform behavior, and an API surface that stays close to how storage software actually thinks about IO.

The crate sits between std::fs and a fully bespoke platform layer. It keeps the operational model explicit: you choose a durability method, build a long-lived Handle, and issue file or directory operations through that handle. On supported platforms, fsys uses the best available primitive for the selected method while keeping fallback behavior visible rather than implicit.

That makes it a good fit for storage engines, embedded databases, local-first applications, durable caches, append-heavy services, background workers, and other programs where write semantics matter as much as raw throughput. It is not trying to replace std::fs for ordinary application code.

 

FEATURES

  • Journal substrate — an open-once, append-only log file with atomic LSN reservation, group-commit fsync, and a self-identifying frame format protected by CRC-32C. It targets write-ahead-log workloads (database WALs, persistent queues, ledgers) where the per-call fsync cost of the atomic-replace primitive is the bottleneck. Three throughput tiers are available: a cross-platform synchronous core, a lock-free concurrent append path, and a native io_uring asynchronous substrate on Linux. An opt-in Direct-IO mode (JournalOptions::direct(true)) routes appends through a sector-aligned in-memory log buffer, the architecture used by InnoDB's redo log and the WiredTiger journal. It trades the lock-free hot path for predictable tail latency and zero-copy device writes via O_DIRECT / F_NOCACHE / FILE_FLAG_NO_BUFFERING. Version 0.9.1 adds a vectored JournalHandle::append_batch(&[&[u8]]) that submits N records as a single framed-write syscall (~1.6× faster than calling append in a loop on the Windows page cache; larger wins are expected on Linux + NVMe), hardware-accelerated CRC-32C with runtime CPU-feature dispatch (SSE4.2 / ARMv8 CRC), cache-padded hot atomics, and stack-allocated frame encoding for small records. It also adds a leader/follower group-commit coordinator built on parking_lot's Condvar, ported from emdb v0.8.5, with two new tuning knobs: JournalOptions::group_commit_window (default Some(500 µs)) and JournalOptions::group_commit_max_batch (default 8).
  • Five real durability methods — Sync, Data, Mmap, Direct, and hardware-aware Auto. Every method is platform-honest: the primitive actually in use is observable via Handle::active_method() and Handle::active_durability_primitive().
  • Cross-platform IO semantics — one API surface across Linux, macOS, and Windows, with platform-specific fallbacks documented rather than hidden.
  • NVMe passthrough flush — on Linux (NVME_IOCTL_IO_CMD) and Windows (IOCTL_STORAGE_PROTOCOL_COMMAND) when the hardware supports it and the process has the privilege. Transparent fallback to fdatasync / WRITE_THROUGH otherwise.
  • Linux io_uring path — Method::Direct on Linux routes through io_uring when it is available (kernel ≥ 5.1, not blocked by seccomp or AppArmor) and otherwise falls back cleanly to O_DIRECT + pwrite + fdatasync.
  • Atomic replace-style writes — every public write API (write, write_copy, write_batch, Batch::commit) uses a temp-file + atomic rename pattern. The target file is either entirely the old payload or entirely the new payload — never torn.
  • Crash-safety verified — per-method crash tests with three kill points (pre-syscall, mid-syscall, post-syscall) and the 100× pre-merge stability protocol.
  • write_copy with metadata preservation — atomic-swap that preserves the target's existing mode (Unix), owner/group (Unix, when permitted), ACLs (Windows), and timestamps (all platforms).
  • Root-scoped handles — bind a Handle to a base directory and reject paths that escape it.
  • Full file and directory CRUD — write, read, append, positioned writes, range reads, truncate, rename, copy, metadata, sync, directory creation/removal, listing, recursive scan, glob find, and recursive count.
  • Batch operations — grouped writes, deletes, and copies through write_batch, delete_batch, copy_batch, and the chainable Batch builder.
  • Async layer with two substrates — gated behind the async Cargo feature. Every sync method gets an _async sibling. On Linux + Method::Direct, async ops submit directly to the per-handle io_uring ring (the native substrate, new in 0.7.0). Everywhere else, async ops route through tokio::task::spawn_blocking. Which substrate a handle uses is observable via Handle::async_substrate() -> AsyncSubstrate.
  • Configurable group lane — tune batch window, batch size, queue depth, io_uring queue depth, and aligned-buffer-pool size per handle.
  • Quick one-shot API — convenience helpers backed by a lazily initialized default handle for simple cases.
  • Structured error reporting — 21 explicit error variants with stable FS-XXXXX codes for unsupported methods, alignment failures, atomic-replace failures, NVMe passthrough denial, async-runtime requirements, glob-pattern errors, batch failure position, handle poisoning, io_uring submit failure, and completion-driver liveness.
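The journal bullet above describes a CRC-32C-protected frame format. As an illustration of the general technique only (this is not fsys's actual on-disk layout, which is self-identifying and not documented here), the sketch below assumes a hypothetical frame of [u32 length][u32 CRC-32C][payload] with little-endian header fields, and uses a slow bitwise CRC-32C (reflected polynomial 0x82F63B78) in place of the hardware-accelerated version fsys dispatches to. encode_frame, decode_frame, and replay are hypothetical names. Replay stops at the first frame that is truncated or fails its checksum, which is one simple way to discard a torn tail after a crash.

```rust
/// Bitwise software CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
/// Slow but portable; stands in for a hardware-accelerated implementation.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is all ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical frame header: u32 payload length + u32 CRC-32C, little-endian.
const HEADER_LEN: usize = 8;

fn encode_frame(payload: &[u8], out: &mut Vec<u8>) {
    let len = u32::try_from(payload.len()).expect("payload exceeds u32::MAX bytes");
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Decodes the frame at the start of `buf`, returning (payload, bytes consumed),
/// or None if the frame is truncated or its checksum does not match.
fn decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    let header = buf.get(..HEADER_LEN)?;
    // Cannot fail: `header` is exactly HEADER_LEN (8) bytes long.
    let len = u32::from_le_bytes(header[0..4].try_into().unwrap()) as usize;
    let crc = u32::from_le_bytes(header[4..8].try_into().unwrap());
    let end = HEADER_LEN.checked_add(len)?;
    let payload = buf.get(HEADER_LEN..end)?;
    (crc32c(payload) == crc).then_some((payload, end))
}

/// Replays frames front to back, stopping at the first torn or corrupt frame.
fn replay(mut log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    while let Some((payload, used)) = decode_frame(log) {
        records.push(payload.to_vec());
        log = &log[used..];
    }
    records
}

fn main() {
    let mut log = Vec::new();
    let inputs: [&[u8]; 3] = [b"alpha", b"beta", b"gamma"];
    for rec in inputs {
        encode_frame(rec, &mut log);
    }
    // Simulate a crash that tore the tail: drop the last two bytes.
    log.truncate(log.len() - 2);
    let recovered = replay(&log);
    assert_eq!(recovered, vec![b"alpha".to_vec(), b"beta".to_vec()]);
    println!("recovered {} intact records", recovered.len());
}
```

The standard CRC-32C check value over the ASCII string 123456789 is 0xE3069283, which is a quick way to validate any CRC-32C implementation, software or hardware.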

 

Installation

[dependencies]
fsys = "0.9.1"

To opt into the async layer:

[dependencies]
fsys = { version = "0.9.1", features = ["async"] }

Cargo features

Feature | Default | Pulls in | Purpose
async | off | tokio (rt, rt-multi-thread, sync, macros) | _async siblings for every sync method; async batch via tokio::sync::oneshot.
stress | off | (none) | Switches the soak tests in tests/stress.rs from a 60-second validation run to the full 1-hour soak. CI nightly enables this; dev iteration leaves it off.
fuzz | off | (none) | Compile-only flag for fuzz instrumentation. The fuzz targets themselves live in fuzz/ (a separate cargo-fuzz workspace).

Minimum supported Rust version

Rust 1.75. The MSRV may be raised in any minor version before 1.0.0; after 1.0.0, raising the MSRV will require a minor version bump.

Benchmark results

Numbers below were captured on windows-ntfs-nvme (Windows 11 Pro, x86_64, local NVMe SSD; std::env::temp_dir() resolves to NTFS) with 100 timed iterations after 10 warmup iterations. Run-to-run noise on this host class is roughly ±5 %. The full methodology, additional payload sizes, and Linux numbers live in docs/BENCH.md; reproduce them locally with cargo bench.
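The methodology does not state which percentile definition the harness uses. Assuming the common nearest-rank definition, the sketch below shows what a median and p99 mean over 100 samples; nearest_rank and the sample values are hypothetical. With n = 100, nearest-rank p99 is the 99th of 100 sorted samples (the second-largest), so under this assumption each p99 figure is set by the two slowest iterations of a run: one slow outlier does not move it, two do.

```rust
/// Nearest-rank percentile over an ascending-sorted, non-empty slice.
/// `p` is an integer percentage in 1..=100.
fn nearest_rank(sorted: &[u64], p: usize) -> u64 {
    assert!(!sorted.is_empty() && (1..=100).contains(&p));
    // 1-based rank = ceil(p * n / 100), in integer arithmetic to avoid float rounding.
    let rank = (p * sorted.len() + 99) / 100;
    sorted[rank - 1]
}

fn main() {
    // 100 hypothetical latency samples in microseconds: 1, 2, ..., 100.
    let mut samples: Vec<u64> = (1..=100).collect();
    println!(
        "p50 = {} µs, p99 = {} µs",
        nearest_rank(&samples, 50),
        nearest_rank(&samples, 99)
    );

    // One slow outlier does not move p99 when n = 100...
    samples[99] = 10_000;
    assert_eq!(nearest_rank(&samples, 99), 99);
    // ...but two do, because p99 is the 99th of 100 sorted samples.
    samples[98] = 10_000;
    assert_eq!(nearest_rank(&samples, 99), 10_000);
    assert_eq!(nearest_rank(&samples, 50), 50);
}
```

Many harnesses instead interpolate between neighbouring samples, or define the median of an even-sized set as the mean of the two middle values; the numbers can differ slightly from nearest-rank.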

Journal substrate vs atomic-replace — the headline 0.9.0 result. Atomic-replace pays 5–7 syscalls per durable write; the journal opens once, appends without per-call fsync, and amortises durability across a sync_through call.

Payload | Atomic-replace | Journal (sync at end) | Speedup
64 B | 634 ops/s | 462.9 K ops/s | 730×
4 KiB | 891 ops/s | 189.3 K ops/s | 212×

The "sync at end" cadence is the canonical WAL pattern: append many records, fsync once at a transaction boundary. At an intermediate cadence (sync every 100 appends), the journal still delivers 109–255× the atomic-replace throughput. See docs/BENCH.md for the full table including the per-append sync cadence.
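The cadence trade-off can be illustrated with plain std::fs. The sketch below is not the fsys journal (it writes unframed bytes and has no LSN reservation or group commit); append_with_cadence is a hypothetical helper that only shows how the number of sync calls, the dominant cost, scales with the cadence. With 250 records, a per-append cadence issues 250 syncs, syncing every 100 appends issues three, and syncing only at the end issues one.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends `records` to `path`, calling sync_data once every `sync_every`
/// records plus once at the end for any unsynced remainder.
/// Returns how many sync calls were issued.
fn append_with_cadence(path: &Path, records: &[&[u8]], sync_every: usize) -> io::Result<usize> {
    assert!(sync_every > 0, "sync_every must be at least 1");
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    let (mut syncs, mut pending) = (0, 0);
    for &record in records {
        file.write_all(record)?; // lands in the OS page cache, not yet durable
        pending += 1;
        if pending == sync_every {
            file.sync_data()?; // durability point: flush file data to storage
            syncs += 1;
            pending = 0;
        }
    }
    if pending > 0 {
        file.sync_data()?; // final transaction boundary
        syncs += 1;
    }
    Ok(syncs)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("fsys_cadence_sketch.log");
    let record: &[u8] = b"record\n";
    let records = vec![record; 250];

    for (label, every) in [("per-append", 1), ("every 100", 100), ("at end", usize::MAX)] {
        let _ = std::fs::remove_file(&path); // start each run from an empty file
        let syncs = append_with_cadence(&path, &records, every)?;
        println!("sync {label}: {syncs} sync calls for {} records", records.len());
    }
    std::fs::remove_file(&path)?;
    Ok(())
}
```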

Atomic-replace write vs std::fs::write — fsys is built for predictable tail latency; on small writes the median favours std::fs::write because it provides no durability guarantees.

Payload | fsys::Auto median / p99 | std::fs::write median / p99
4 KiB | 1.08 ms / 4.69 ms | 218.7 µs / 7.18 ms
64 KiB | 1.23 ms / 5.50 ms | 4.48 ms / 5.47 ms
1 MiB | 1.80 ms / 5.00 ms | 2.84 ms / 16.45 ms

std::fs::write is ~5× faster than fsys::Auto at the 4 KiB median because it skips the fsync + atomic-rename cycle. At p99 the gap inverts: fsys::Auto is 3.3× faster than std::fs::write at 1 MiB because the durability cost is paid deterministically rather than deferred to OS scheduling. The fair comparison for durable writes is fsys::Sync versus std::fs plus a manual temp-file + sync_all + rename dance — the latter is what most application code gets wrong.
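For reference, the manual sequence described above looks roughly like this with only std::fs. It is a minimal sketch, not fsys's implementation: the .tmp suffix is a hypothetical naming choice (real code needs a unique temp name), the temp file is not cleaned up on error, and none of the metadata preservation that write_copy performs is attempted. The parent-directory sync is compiled only on Unix, where fsyncing the directory is the usual way to make the rename itself durable.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Durably replaces `target` with `data` via temp file + sync_all + rename.
/// On POSIX systems a concurrent reader sees either the old file or the new one.
fn atomic_replace(target: &Path, data: &[u8]) -> io::Result<()> {
    let name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?;
    // Same directory as the target, so the rename never crosses filesystems.
    // The ".tmp" suffix is a hypothetical choice; real code needs a unique name.
    let mut tmp_name = name.to_os_string();
    tmp_name.push(".tmp");
    let tmp = target.with_file_name(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    file.sync_all()?; // 1. new bytes are durable before they become visible
    drop(file);

    fs::rename(&tmp, target)?; // 2. swap in the new file, replacing any existing target

    // 3. On Unix, fsync the parent directory so the rename survives a crash.
    #[cfg(unix)]
    {
        let dir = target
            .parent()
            .filter(|p| !p.as_os_str().is_empty())
            .unwrap_or(Path::new("."));
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let target = std::env::temp_dir().join("fsys_atomic_sketch.txt");
    atomic_replace(&target, b"old payload")?;
    atomic_replace(&target, b"new payload")?;
    assert_eq!(fs::read(&target)?, b"new payload");
    fs::remove_file(&target)?;
    Ok(())
}
```

Skipping step 1 or step 3 is the typical mistake: without sync_all the rename can become durable before the data, and without the directory sync the rename itself can be lost.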

Read parity — the read path is essentially std::fs::read plus handle bookkeeping.

Payload | fsys::Auto median / p99 | std::fs::read median / p99 | tokio::fs::read median / p99
4 KiB | 25.0 / 89.4 µs | 23.7 / 77.1 µs | 35.8 / 152.8 µs
64 KiB | 25.0 / 58.9 µs | 24.1 / 64.0 µs | 105.9 / 337.5 µs
1 MiB | 182.5 / 482.3 µs | 189.0 / 327.4 µs | 250.7 / 585.8 µs

tokio::fs::read (simulated via spawn_blocking, which is what tokio's own fs module does internally) is roughly 1.3–4.4× slower than std::fs::read at the median because of the thread-pool hop. On Linux + Method::Direct + the async feature, fsys's native io_uring substrate bypasses that hop entirely; see docs/BENCH.md for the WSL2 measurement.

Documentation

Coming Soon...

LICENSE

Licensed under either the Apache License, Version 2.0 (LICENSE-APACHE) or the MIT License (LICENSE-MIT), together referred to as the "License Agreement". You may use this software, its source code, documentation, concepts, and any associated content within the limitations defined by the License Agreement.