jdb_fs 0.1.4

Async Direct I/O for database storage

jdb_fs : Async Direct I/O for Database Storage

High-performance async file I/O library with Direct I/O support, built on compio.

Features

  • Async Direct I/O bypassing OS page cache
  • Zero-copy I/O via BorrowedFd on Unix (no Arc overhead)
  • Page-aligned read/write with runtime alignment checks
  • WAL mode with O_DSYNC for durability
  • Cross-platform: Linux (io_uring + O_DIRECT), macOS (kqueue + F_NOCACHE), Windows (IOCP + NO_BUFFERING)
  • Space preallocation via fallocate/F_PREALLOCATE/SetFileInformationByHandle
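
Direct I/O generally requires the buffer address itself, not only the offset and length, to be aligned. The following is a minimal sketch of a page-aligned, zeroed buffer built on `std::alloc`. The `PageBuf` type and the fixed `PAGE_SIZE` of 4096 are assumptions made for illustration; this is not the `AlignedBuf` implementation from jdb_alloc.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;

/// Assumed page size for this sketch; real code should use the crate's PAGE_SIZE.
const PAGE_SIZE: usize = 4096;

/// Hypothetical page-aligned, zero-initialized buffer (not jdb_alloc's AlignedBuf).
pub struct PageBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl PageBuf {
    /// Allocates `len` zeroed bytes aligned to PAGE_SIZE.
    /// Returns None if `len` is zero, not a multiple of PAGE_SIZE, or allocation fails.
    pub fn zeroed(len: usize) -> Option<Self> {
        if len == 0 || len % PAGE_SIZE != 0 {
            return None;
        }
        let layout = Layout::from_size_align(len, PAGE_SIZE).ok()?;
        // SAFETY: the layout has a non-zero size.
        let ptr = NonNull::new(unsafe { alloc_zeroed(layout) })?;
        Some(Self { ptr, layout })
    }
}

impl Deref for PageBuf {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // SAFETY: ptr is valid for layout.size() zero-initialized bytes.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl DerefMut for PageBuf {
    fn deref_mut(&mut self) -> &mut [u8] {
        // SAFETY: same as above, and &mut self guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for PageBuf {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

Because the buffer dereferences to a byte slice, it can be filled with `buf[0..5].copy_from_slice(b"hello")` in the same way as the usage example in this README.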

Installation

[dependencies]
jdb_fs = "0.1"
jdb_alloc = "0.1"  # for AlignedBuf

Usage

Basic file operations:

use jdb_alloc::{AlignedBuf, PAGE_SIZE};
use jdb_fs::File;

async fn example() -> jdb_fs::Result<()> {
  // Create file
  let file = File::create("/tmp/test.dat").await?;

  // Write page-aligned data
  let mut buf = AlignedBuf::zeroed(PAGE_SIZE)?;
  buf[0..5].copy_from_slice(b"hello");
  file.write_at(buf, 0).await?;
  file.sync_data().await?;

  // Read back
  let buf = AlignedBuf::with_cap(PAGE_SIZE)?;
  let buf = file.read_at(buf, 0).await?;
  assert_eq!(&buf[0..5], b"hello");

  Ok(())
}
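
The example above writes exactly one page. For payloads of arbitrary length, the size has to be rounded up to a whole number of pages first. The sketch below shows one way to do that arithmetic; `align_up`, `pad_to_pages`, and the 4096-byte `PAGE_SIZE` are hypothetical names and values, not part of jdb_fs. It only pads the length: a `Vec` does not guarantee an aligned address, so real Direct I/O code would copy into an aligned buffer instead.

```rust
/// Assumed page size; jdb_fs re-exports the real value as PAGE_SIZE.
const PAGE_SIZE: usize = 4096;

/// Rounds `len` up to the next multiple of PAGE_SIZE.
/// Relies on PAGE_SIZE being a power of two.
fn align_up(len: usize) -> usize {
    (len + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}

/// Copies `payload` into a zero-padded Vec whose length is a whole number of pages.
/// Only the length is page-aligned here, not the address of the allocation.
fn pad_to_pages(payload: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; align_up(payload.len())];
    out[..payload.len()].copy_from_slice(payload);
    out
}
```

A reader of the padded data needs some way to recover the original length, for example a length prefix stored in the first bytes of the page.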

WAL mode with synchronous durability:

let wal = File::open_wal("/tmp/wal.log").await?;
// Writes are durable on return (O_DSYNC)
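
To make the durability guarantee concrete, here is a rough equivalent using only the standard library: each append is followed by an explicit `sync_data()`, which approximates what O_DSYNC does implicitly on every write. This is a buffered, blocking sketch for comparison only; `open_log` and `append_durable` are hypothetical helpers and do not reflect how jdb_fs implements `open_wal`.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Opens (or creates) a log file in append mode using ordinary std I/O.
fn open_log(path: &Path) -> io::Result<File> {
    OpenOptions::new().create(true).append(true).open(path)
}

/// Appends one record and returns only after the data has been flushed
/// to the storage device, emulating O_DSYNC with an explicit sync_data().
fn append_durable(log: &mut File, record: &[u8]) -> io::Result<()> {
    log.write_all(record)?;
    log.sync_data()
}
```

Opening with O_DSYNC folds the flush into the write itself, so a WAL writer does not need the second call shown here.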

Filesystem utilities:

// Directory operations
jdb_fs::mkdir("/tmp/data").await?;
jdb_fs::rename("/tmp/old.dat", "/tmp/new.dat").await?;
jdb_fs::remove("/tmp/unwanted.dat").await?;

// Directory listing (files only)
let files = jdb_fs::ls("/tmp/data").await?;

// File metadata
let size = jdb_fs::size("/tmp/file.dat").await?;
let exists = jdb_fs::exists("/tmp/file.dat");

// Directory sync for WAL durability
jdb_fs::sync_dir("/tmp/wal_dir").await?;
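
For readers unfamiliar with these operations, the sketch below shows what a files-only listing and a directory sync can look like with blocking standard-library calls. It is an assumption-laden illustration, not the jdb_fs implementation: `list_files` and this `sync_dir` are local helpers, and opening a directory as a `File` is expected to work on Unix but not necessarily on other platforms.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists regular files directly inside `dir`, skipping subdirectories.
fn list_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            files.push(entry.path());
        }
    }
    // read_dir order is unspecified, so sort for a stable result.
    files.sort();
    Ok(files)
}

/// Flushes directory metadata so that newly created or renamed entries
/// survive a crash. Intended for Unix, where a directory can be opened read-only.
fn sync_dir(dir: &Path) -> io::Result<()> {
    fs::File::open(dir)?.sync_all()
}
```

Syncing the directory matters for a WAL because syncing a newly created file does not by itself guarantee that its directory entry is durable.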

API Reference

File

Async file wrapper with Direct I/O.

| Method | Description |
|---|---|
| open(path) | Open read-only |
| create(path) | Create new file (truncate if exists) |
| open_rw(path) | Open read-write (create if not exists) |
| open_wal(path) | Open for WAL with O_DSYNC |
| read_at(buf, offset) | Read at offset (page-aligned) |
| write_at(buf, offset) | Write at offset (page-aligned) |
| size() | Get file size |
| sync_all() | Sync data and metadata |
| sync_data() | Sync data only |
| truncate(len) | Truncate file to length |
| preallocate(len) | Preallocate disk space |
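
The offset-based semantics of `read_at` and `write_at` can be tried out with the standard library's positional I/O on Unix. The sketch below is blocking and buffered, so it shows only the addressing model, not Direct I/O or the async API; `open_rw`, `write_page_at`, and `read_page_at` are hypothetical helpers defined here.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::FileExt;
use std::path::Path;

/// Opens a file for reading and writing, creating it if it does not exist.
fn open_rw(path: &Path) -> io::Result<File> {
    OpenOptions::new().read(true).write(true).create(true).open(path)
}

/// Writes the whole buffer at `offset` without moving any file cursor.
fn write_page_at(file: &File, buf: &[u8], offset: u64) -> io::Result<()> {
    file.write_all_at(buf, offset)
}

/// Fills the whole buffer starting at `offset`.
/// A short read surfaces as an UnexpectedEof error instead of partial data.
fn read_page_at(file: &File, buf: &mut [u8], offset: u64) -> io::Result<()> {
    file.read_exact_at(buf, offset)
}
```

Because every call carries its own offset, several pages can be read or written through one handle without any shared cursor state.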

Error

| Variant | Description |
|---|---|
| Io | System I/O error |
| Alloc | Memory allocation error |
| Alignment | Buffer/offset not page-aligned |
| ShortRead | Read fewer bytes than expected |
| ShortWrite | Wrote fewer bytes than expected |
| Join | spawn_blocking task failed |
| Overflow | File size exceeds i64 |
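
As an illustration of how such variants are typically produced, here is a small standalone sketch covering three of them. The `FsError` type, its payload fields, and the helper functions are assumptions made for this example; the crate's actual `Error` is defined with thiserror and its fields may differ.

```rust
use std::fmt;
use std::io;

/// Hypothetical mirror of a few variants from the table; fields are assumptions.
#[derive(Debug)]
pub enum FsError {
    Io(io::Error),
    ShortRead { expected: usize, actual: usize },
    Overflow(u64),
}

impl fmt::Display for FsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FsError::Io(e) => write!(f, "I/O error: {e}"),
            FsError::ShortRead { expected, actual } => {
                write!(f, "short read: expected {expected} bytes, got {actual}")
            }
            FsError::Overflow(size) => write!(f, "file size {size} exceeds i64::MAX"),
        }
    }
}

impl std::error::Error for FsError {}

impl From<io::Error> for FsError {
    fn from(e: io::Error) -> Self {
        FsError::Io(e)
    }
}

/// Turns a completed byte count into an error when the buffer was not filled.
pub fn check_read(expected: usize, actual: usize) -> Result<(), FsError> {
    if actual < expected {
        Err(FsError::ShortRead { expected, actual })
    } else {
        Ok(())
    }
}

/// Converts a u64 file size to i64, as signed-offset system APIs require.
pub fn size_as_i64(size: u64) -> Result<i64, FsError> {
    i64::try_from(size).map_err(|_| FsError::Overflow(size))
}
```

The `From<io::Error>` impl is what lets `?` convert system errors into the `Io` variant automatically.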

Filesystem Functions

| Function | Description |
|---|---|
| exists(path) | Check if path exists |
| mkdir(path) | Create directory recursively |
| ls(path) | List files in directory (no subdirs) |
| size(path) | Get file size without opening |
| rename(from, to) | Atomic rename |
| remove(path) | Remove file |
| sync_dir(path) | Sync directory metadata |
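
Atomic rename and directory sync are commonly combined to replace a file without ever exposing a half-written version. The sketch below shows that pattern with blocking standard-library calls; `replace_atomically` and the `.tmp` suffix are hypothetical choices for this example, and the directory sync is only attempted on Unix.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replaces `target` with `contents` so that readers observe either the old
/// file or the complete new one, never a partial write.
fn replace_atomically(target: &Path, contents: &[u8]) -> io::Result<()> {
    // Write the new contents to a sibling temporary file first.
    let tmp = target.with_extension("tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(contents)?;
    // Make the data durable before it becomes visible under the final name.
    f.sync_all()?;
    // rename is atomic on POSIX when both paths are on the same filesystem.
    fs::rename(&tmp, target)?;
    // Persist the directory entry itself (Unix only in this sketch).
    if cfg!(unix) {
        let dir = match target.parent() {
            Some(p) if !p.as_os_str().is_empty() => p,
            _ => Path::new("."),
        };
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```

With jdb_fs the same sequence would presumably be expressed through `File::create`, `sync_all`, `rename`, and `sync_dir`.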

Constants

  • PAGE_SIZE: System page size (re-exported from jdb_alloc)

Architecture

graph TD
  A[Application] --> B[File]
  B --> C{Platform}
  C -->|Linux| D[io_uring + O_DIRECT]
  C -->|macOS| E[kqueue + F_NOCACHE]
  C -->|Windows| F[IOCP + NO_BUFFERING]
  D --> G[compio runtime]
  E --> G
  F --> G
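
The platform branch in the diagram is resolved at compile time. As a purely illustrative sketch, the function below maps the compile target to the backend named in the diagram using `cfg!`; `backend` is a hypothetical helper, and the crate's real dispatch lives in its os/ modules.

```rust
/// Names the I/O backend that the diagram associates with the compile target.
/// Illustrative only; it performs no I/O and selects nothing at runtime.
fn backend() -> &'static str {
    if cfg!(target_os = "linux") {
        "io_uring + O_DIRECT"
    } else if cfg!(target_os = "macos") {
        "kqueue + F_NOCACHE"
    } else if cfg!(target_os = "windows") {
        "IOCP + NO_BUFFERING"
    } else {
        "unsupported"
    }
}
```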

Call flow for write_at:

  1. Check alignment (offset and len must both be multiples of PAGE_SIZE)
  2. Borrow raw fd via BorrowedFd (zero-copy)
  3. Submit WriteAt op to compio runtime
  4. io_uring/kqueue/IOCP completes the operation asynchronously
  5. Return buffer ownership to caller
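
Step 1 of the flow above can be sketched as a simple bitmask test. The `check_aligned` function, the `Misaligned` error type, and the 4096-byte `PAGE_SIZE` are assumptions for illustration; the crate reports this condition through its own `Alignment` error variant.

```rust
/// Assumed page size; must be a power of two for the mask test below.
const PAGE_SIZE: u64 = 4096;

/// Hypothetical error carrying the offending values.
#[derive(Debug, PartialEq, Eq)]
struct Misaligned {
    offset: u64,
    len: u64,
}

/// Rejects the request unless both `offset` and `len` are multiples of PAGE_SIZE.
fn check_aligned(offset: u64, len: u64) -> Result<(), Misaligned> {
    // For a power-of-two page size, the low bits are zero exactly when
    // the value is a multiple of the page size.
    let mask = PAGE_SIZE - 1;
    if offset & mask == 0 && len & mask == 0 {
        Ok(())
    } else {
        Err(Misaligned { offset, len })
    }
}
```

Checking before submission turns what would otherwise be an EINVAL from the kernel into a descriptive error at the call site.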

Directory Structure

jdb_fs/
├── src/
│   ├── lib.rs      # Public exports
│   ├── file.rs     # File struct and async methods
│   ├── error.rs    # Error types (thiserror)
│   ├── fs.rs       # Filesystem utilities
│   └── os/         # Platform-specific implementations
│       ├── mod.rs
│       ├── linux.rs   # O_DIRECT, fallocate
│       ├── macos.rs   # F_NOCACHE, F_PREALLOCATE
│       └── windows.rs # FILE_FLAG_NO_BUFFERING
├── tests/
│   └── main.rs     # Integration tests
└── Cargo.toml

Tech Stack

| Component | Technology |
|---|---|
| Async Runtime | compio |
| Linux I/O | io_uring |
| macOS I/O | kqueue |
| Windows I/O | IOCP |
| Error Handling | thiserror |
| Memory Alignment | jdb_alloc |

History

io_uring was written by Jens Axboe, the Linux block layer maintainer. It was merged in March 2019 and shipped in kernel 5.1, released in May 2019. Before io_uring, Linux native async I/O (AIO) required complex setup and had significant limitations. Axboe designed io_uring around submission and completion ring buffers shared between kernel and userspace, which sharply reduces per-operation syscall overhead in high-throughput scenarios.

Direct I/O (O_DIRECT) has been part of Linux since kernel 2.4. It bypasses the page cache, giving databases direct control over caching and more predictable I/O latency. Storage engines such as MySQL InnoDB and RocksDB can be configured to use Direct I/O for consistent performance, whereas PostgreSQL has traditionally relied on buffered I/O through the page cache.

The combination of io_uring and Direct I/O is among the most efficient storage paths available to database engines on Linux and can reach millions of IOPS on modern NVMe drives.


About

This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.

We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us:



