jdb_fs : Async Direct I/O for Database Storage
High-performance async file I/O library with Direct I/O support, built on compio.
Features
- Async Direct I/O bypassing OS page cache
- Zero-copy I/O via BorrowedFd on Unix (no Arc overhead)
- Page-aligned read/write with runtime alignment checks
- WAL mode with O_DSYNC for durability
- Cross-platform: Linux (io_uring + O_DIRECT), macOS (kqueue + F_NOCACHE), Windows (IOCP + NO_BUFFERING)
- Space preallocation via fallocate/F_PREALLOCATE/SetFileInformationByHandle
Installation
[dependencies]
jdb_fs = "0.1"
jdb_alloc = "0.1" # for AlignedBuf
Usage
Basic file operations:
use jdb_alloc::AlignedBuf;
use jdb_fs::File;
async
WAL mode with synchronous durability:
let wal = File::open_wal(path).await?;
// Writes are durable on return (O_DSYNC)
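As a point of comparison, O_DSYNC makes every write behave as if it were followed by a data sync. The sketch below shows the explicit equivalent using only the Rust standard library: write, then `sync_data` before returning. It is a synchronous illustration, not the jdb_fs implementation; `append_durable` is a hypothetical helper name.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: appends `record` to the file at `path` and returns
/// the new file length only after the data has been synced to the device.
/// This mirrors what O_DSYNC provides implicitly on every write.
pub fn append_durable(path: &Path, record: &[u8]) -> io::Result<u64> {
    let mut file: File = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(record)?;
    // Flush file data (not necessarily all metadata) before reporting success.
    file.sync_data()?;
    Ok(file.metadata()?.len())
}
```

With O_DSYNC the sync is folded into the write itself, so the caller cannot forget it and pays no extra syscall per record.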
Filesystem utilities:
// Directory operations
mkdir(path).await?;
rename(from, to).await?;
remove(path).await?;
// Directory listing (files only)
let files = ls(path).await?;
// File metadata
let size = size(path).await?;
let exists = exists(path);
// Directory sync for WAL durability
sync_dir(path).await?;
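Why a directory sync matters for durability: a rename only updates the directory entry, so the new name can be lost in a crash unless the directory itself is synced. The following is a minimal synchronous sketch of that pattern using only the standard library. It assumes a Unix-like system (opening a directory as a file does not work this way on Windows), and `publish_atomically` and this `sync_dir` are illustrative stand-ins, not the jdb_fs functions.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Stand-in for a directory sync: open the directory and fsync it (Unix assumption).
pub fn sync_dir(dir: &Path) -> io::Result<()> {
    File::open(dir)?.sync_all()
}

/// Hypothetical helper: write `data` to a temporary file, sync it,
/// rename it into place, then sync the directory so the rename is durable.
pub fn publish_atomically(dir: &Path, name: &str, data: &[u8]) -> io::Result<()> {
    let tmp = dir.join(format!("{}.tmp", name));
    let dst = dir.join(name);

    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    file.sync_all()?; // file contents and metadata reach the device first

    fs::rename(&tmp, &dst)?; // atomic replacement within one filesystem
    sync_dir(dir) // persist the directory entry
}
```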
API Reference
File
Async file wrapper with Direct I/O.
| Method | Description |
|---|---|
| open(path) | Open read-only |
| create(path) | Create new file (truncate if exists) |
| open_rw(path) | Open read-write (create if not exists) |
| open_wal(path) | Open for WAL with O_DSYNC |
| read_at(buf, offset) | Read at offset (page-aligned) |
| write_at(buf, offset) | Write at offset (page-aligned) |
| size() | Get file size |
| sync_all() | Sync data and metadata |
| sync_data() | Sync data only |
| truncate(len) | Truncate file to length |
| preallocate(len) | Preallocate disk space |
Error
| Variant | Description |
|---|---|
| Io | System I/O error |
| Alloc | Memory allocation error |
| Alignment | Buffer/offset not page-aligned |
| ShortRead | Read fewer bytes than expected |
| ShortWrite | Wrote fewer bytes than expected |
| Join | spawn_blocking task failed |
| Overflow | File size exceeds i64 |
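To illustrate how variants such as ShortRead might be produced and reported, here is a small self-contained sketch. The variant names come from the table above, but the fields, the `SketchError` type name, and the `check_read` helper are assumptions made for illustration. The crate itself uses thiserror, which this sketch replaces with hand-written trait impls to stay dependency-free.

```rust
use std::fmt;
use std::io;

/// Illustrative error type; the field layout is an assumption, not the crate's definition.
#[derive(Debug)]
pub enum SketchError {
    Io(io::Error),
    ShortRead { expected: usize, actual: usize },
    ShortWrite { expected: usize, actual: usize },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Io(e) => write!(f, "I/O error: {}", e),
            SketchError::ShortRead { expected, actual } => {
                write!(f, "short read: expected {} bytes, got {}", expected, actual)
            }
            SketchError::ShortWrite { expected, actual } => {
                write!(f, "short write: expected {} bytes, wrote {}", expected, actual)
            }
        }
    }
}

impl std::error::Error for SketchError {}

impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::Io(e)
    }
}

/// Hypothetical check run after a read completes: fewer bytes than requested is an error.
pub fn check_read(expected: usize, actual: usize) -> Result<(), SketchError> {
    if actual < expected {
        Err(SketchError::ShortRead { expected, actual })
    } else {
        Ok(())
    }
}
```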
Filesystem Functions
| Function | Description |
|---|---|
| exists(path) | Check if path exists |
| mkdir(path) | Create directory recursively |
| ls(path) | List files in directory (no subdirs) |
| size(path) | Get file size without opening |
| rename(from, to) | Atomic rename |
| remove(path) | Remove file |
| sync_dir(path) | Sync directory metadata |
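For readers who want to see what a files-only listing involves, the sketch below implements one synchronously with the standard library. `list_files` is a hypothetical name and makes no claim about how `ls` is implemented in jdb_fs; sorting the result is an extra step added here so the output is deterministic.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a files-only listing: returns the regular files
/// directly inside `dir`, skipping subdirectories, in sorted order.
pub fn list_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // file_type() does not follow symlinks, so symlinks are skipped too.
        if entry.file_type()?.is_file() {
            files.push(entry.path());
        }
    }
    files.sort();
    Ok(files)
}
```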
Constants
PAGE_SIZE: System page size (re-exported from jdb_alloc)
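Because reads and writes must be page-aligned, callers typically pad a record length up to the next multiple of the page size before allocating a buffer. A minimal sketch of that rounding follows; `SKETCH_PAGE_SIZE` is an assumed value of 4096 for illustration only (use the re-exported PAGE_SIZE in real code), and `round_up_to_page` is a hypothetical helper.

```rust
/// Assumed page size for this sketch only; the real value comes from PAGE_SIZE.
pub const SKETCH_PAGE_SIZE: usize = 4096;

/// Rounds `len` up to the next multiple of `page_size`.
/// `page_size` must be a power of two. Returns None if the result would overflow.
pub fn round_up_to_page(len: usize, page_size: usize) -> Option<usize> {
    debug_assert!(page_size.is_power_of_two());
    let mask = page_size - 1;
    len.checked_add(mask).map(|n| n & !mask)
}
```

For example, a 5000-byte record would need an 8192-byte buffer under the assumed 4096-byte page size.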
Architecture
graph TD
A[Application] --> B[File]
B --> C{Platform}
C -->|Linux| D[io_uring + O_DIRECT]
C -->|macOS| E[kqueue + F_NOCACHE]
C -->|Windows| F[IOCP + NO_BUFFERING]
D --> G[compio runtime]
E --> G
F --> G
Call flow for write_at:
- Check alignment (offset & len must be PAGE_SIZE aligned)
- Borrow raw fd via BorrowedFd (zero-copy)
- Submit WriteAt op to compio runtime
- io_uring/kqueue/IOCP completes async
- Return buffer ownership to caller
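The first step of the flow above, the alignment check, can be sketched as follows. This is an illustration under stated assumptions rather than the crate's code: `AlignmentError`, `check_aligned`, and the 4096-byte `ASSUMED_PAGE_SIZE` are hypothetical names and values, and the check covers only offset and length, not the buffer address.

```rust
/// Hypothetical error for this sketch; the crate's Alignment variant may differ.
#[derive(Debug, PartialEq, Eq)]
pub struct AlignmentError {
    pub offset: u64,
    pub len: usize,
}

/// Assumed page size for illustration; real code should use PAGE_SIZE.
pub const ASSUMED_PAGE_SIZE: usize = 4096;

/// Returns Ok(()) when both `offset` and `len` are multiples of `page_size`.
/// `page_size` must be a power of two for the bit mask to be valid.
pub fn check_aligned(offset: u64, len: usize, page_size: usize) -> Result<(), AlignmentError> {
    assert!(page_size.is_power_of_two());
    let mask = (page_size - 1) as u64;
    if offset & mask != 0 || (len as u64) & mask != 0 {
        return Err(AlignmentError { offset, len });
    }
    Ok(())
}
```

Rejecting misaligned requests up front gives a clear error instead of the EINVAL that an O_DIRECT write would otherwise typically return from the kernel.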
Directory Structure
jdb_fs/
├── src/
│ ├── lib.rs # Public exports
│ ├── file.rs # File struct and async methods
│ ├── error.rs # Error types (thiserror)
│ ├── fs.rs # Filesystem utilities
│ └── os/ # Platform-specific implementations
│ ├── mod.rs
│ ├── linux.rs # O_DIRECT, fallocate
│ ├── macos.rs # F_NOCACHE, F_PREALLOCATE
│ └── windows.rs # FILE_FLAG_NO_BUFFERING
├── tests/
│ └── main.rs # Integration tests
└── Cargo.toml
Tech Stack
| Component | Technology |
|---|---|
| Async Runtime | compio |
| Linux I/O | io_uring |
| macOS I/O | kqueue |
| Windows I/O | IOCP |
| Error Handling | thiserror |
| Memory Alignment | jdb_alloc |
History
io_uring was written by Jens Axboe, the Linux block layer maintainer. It was merged in March 2019 and shipped in Linux kernel 5.1 (May 2019). Before io_uring, Linux native async I/O (AIO) was awkward to set up and was only truly asynchronous for O_DIRECT file access. Axboe designed io_uring around ring buffers shared between kernel and userspace, which lets applications batch submissions and completions and sharply reduces syscall overhead in high-throughput workloads.
Direct I/O (O_DIRECT) has been part of Linux since the 2.4 kernel series. It bypasses the page cache, giving a database direct control over caching and more predictable I/O latency. Storage engines such as MySQL InnoDB and RocksDB can be configured to use Direct I/O for this reason; PostgreSQL, by contrast, has traditionally relied on buffered I/O.
Combining io_uring with Direct I/O is a leading design for database storage engines on Linux and can reach millions of IOPS on modern NVMe drives.
About
This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.
We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us: