jdb_fs 0.1.3

Async Direct I/O for database storage

<a id="en"></a>

# jdb_fs: Async Direct I/O for Database Storage

High-performance async file I/O library with Direct I/O support, built on compio.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [API Reference](#api-reference)
- [Architecture](#architecture)
- [Directory Structure](#directory-structure)
- [Tech Stack](#tech-stack)
- [History](#history)

## Features

- Async Direct I/O bypassing OS page cache
- Zero-copy I/O via BorrowedFd on Unix (no Arc overhead)
- Page-aligned read/write with runtime alignment checks
- WAL mode with O_DSYNC for durability
- Cross-platform: Linux (io_uring + O_DIRECT), macOS (kqueue + F_NOCACHE), Windows (IOCP + NO_BUFFERING)
- Space preallocation via fallocate/F_PREALLOCATE/SetFileInformationByHandle

## Installation

```toml
[dependencies]
jdb_fs = "0.1"
jdb_alloc = "0.1"  # for AlignedBuf
```
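Direct I/O generally requires that the buffer address itself, not just the length, be aligned. As a rough illustration of what an aligned buffer involves, here is a minimal sketch using only the standard library. `PageBuf` and the fixed 4096-byte `PAGE` are assumptions made for this example; this is not how `jdb_alloc::AlignedBuf` is necessarily implemented.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;

/// Assumed page size for this sketch; real code should query the OS.
pub const PAGE: usize = 4096;

/// A zero-filled heap buffer whose address is a multiple of `PAGE`.
/// Hypothetical type for illustration; not jdb_alloc's `AlignedBuf`.
pub struct PageBuf {
  ptr: NonNull<u8>,
  layout: Layout,
}

impl PageBuf {
  /// Returns `None` if `len` is zero, not a multiple of `PAGE`, or allocation fails.
  pub fn zeroed(len: usize) -> Option<Self> {
    if len == 0 || len % PAGE != 0 {
      return None;
    }
    let layout = Layout::from_size_align(len, PAGE).ok()?;
    // SAFETY: `layout` has a non-zero size, as `alloc_zeroed` requires.
    let ptr = NonNull::new(unsafe { alloc_zeroed(layout) })?;
    Some(Self { ptr, layout })
  }

  pub fn as_slice(&self) -> &[u8] {
    // SAFETY: `ptr` points to `layout.size()` zero-initialized bytes owned by `self`.
    unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
  }

  pub fn as_mut_slice(&mut self) -> &mut [u8] {
    // SAFETY: as above, and `&mut self` guarantees exclusive access.
    unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
  }
}

impl Drop for PageBuf {
  fn drop(&mut self) {
    // SAFETY: `ptr` was allocated with exactly this `layout`.
    unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
  }
}
```

In practice, prefer `AlignedBuf` from jdb_alloc; the sketch only shows why a plain `Vec<u8>` is not enough, since its allocation carries no page-alignment guarantee.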

## Usage

Basic file operations:

```rust
use jdb_alloc::{AlignedBuf, PAGE_SIZE};
use jdb_fs::File;

async fn example() -> jdb_fs::Result<()> {
  // Create file
  let file = File::create("/tmp/test.dat").await?;

  // Write page-aligned data
  let mut buf = AlignedBuf::zeroed(PAGE_SIZE)?;
  buf[0..5].copy_from_slice(b"hello");
  file.write_at(buf, 0).await?;
  file.sync_data().await?;

  // Read back
  let buf = AlignedBuf::with_cap(PAGE_SIZE)?;
  let buf = file.read_at(buf, 0).await?;
  assert_eq!(&buf[0..5], b"hello");

  Ok(())
}
```

WAL mode with synchronous durability:

```rust
let wal = File::open_wal("/tmp/wal.log").await?;
// Writes are durable on return (O_DSYNC)
```
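To make the durability contract concrete: a write to an O_DSYNC file behaves roughly as if every write were followed by a data sync. The sketch below approximates that with the standard library alone by calling `sync_data` after each append. It uses buffered `std::fs` rather than jdb_fs or Direct I/O, and `SyncLog` is a hypothetical name introduced for this example.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical append-only log for illustration; not part of jdb_fs.
pub struct SyncLog {
  file: File,
}

impl SyncLog {
  /// Opens (or creates) the log in append mode.
  pub fn open(path: &Path) -> io::Result<Self> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    Ok(Self { file })
  }

  /// Appends one record and returns only after the data has been synced.
  pub fn append(&mut self, record: &[u8]) -> io::Result<()> {
    self.file.write_all(record)?;
    // Sync file contents to stable storage before reporting success.
    self.file.sync_data()
  }
}
```

Opening the file with O_DSYNC, as `open_wal` is described as doing, folds the sync into the write itself and so saves one call per record.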

Filesystem utilities:

```rust
// Directory operations
jdb_fs::mkdir("/tmp/data").await?;
jdb_fs::rename("/tmp/old.dat", "/tmp/new.dat").await?;
jdb_fs::remove("/tmp/unwanted.dat").await?;

// Directory listing (files only)
let files = jdb_fs::ls("/tmp/data").await?;

// File metadata
let size = jdb_fs::size("/tmp/file.dat").await?;
let exists = jdb_fs::exists("/tmp/file.dat");

// Directory sync for WAL durability
jdb_fs::sync_dir("/tmp/wal_dir").await?;
```
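For readers wondering what "files only" means for a directory listing, a synchronous standard-library equivalent might look like the sketch below. `list_files` is a hypothetical helper; the return type and ordering of `jdb_fs::ls` are not specified here and may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the regular files directly inside `dir`, sorted by path.
/// Subdirectories are skipped and not descended into.
pub fn list_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
  let mut files = Vec::new();
  for entry in fs::read_dir(dir)? {
    let entry = entry?;
    // `file_type` does not follow symlinks, so symlinks are skipped as well.
    if entry.file_type()?.is_file() {
      files.push(entry.path());
    }
  }
  // `read_dir` yields entries in an unspecified order; sort for determinism.
  files.sort();
  Ok(files)
}
```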

## API Reference

### File

Async file wrapper with Direct I/O.

| Method | Description |
|--------|-------------|
| `open(path)` | Open read-only |
| `create(path)` | Create new file (truncate if exists) |
| `open_rw(path)` | Open read-write (create if not exists) |
| `open_wal(path)` | Open for WAL with O_DSYNC |
| `read_at(buf, offset)` | Read at offset (page-aligned) |
| `write_at(buf, offset)` | Write at offset (page-aligned) |
| `size()` | Get file size |
| `sync_all()` | Sync data and metadata |
| `sync_data()` | Sync data only |
| `truncate(len)` | Truncate file to length |
| `preallocate(len)` | Preallocate disk space |

### Error

| Variant | Description |
|---------|-------------|
| `Io` | System I/O error |
| `Alloc` | Memory allocation error |
| `Alignment` | Buffer/offset not page-aligned |
| `ShortRead` | Read fewer bytes than expected |
| `ShortWrite` | Wrote fewer bytes than expected |
| `Join` | spawn_blocking task failed |
| `Overflow` | File size exceeds i64 |

### Filesystem Functions

| Function | Description |
|----------|-------------|
| `exists(path)` | Check if path exists |
| `mkdir(path)` | Create directory recursively |
| `ls(path)` | List files in directory (no subdirs) |
| `size(path)` | Get file size without opening |
| `rename(from, to)` | Atomic rename |
| `remove(path)` | Remove file |
| `sync_dir(path)` | Sync directory metadata |
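
Atomic rename and directory sync are typically combined into a crash-safe "write to a temporary file, then rename" pattern. The sketch below shows that pattern with synchronous `std::fs` calls; `replace_file` and `fsync_dir` are hypothetical helpers, not jdb_fs functions, and the directory sync is a no-op on non-Unix targets.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Syncs the directory entry table. Opening a directory as a `File` works on Unix.
#[cfg(unix)]
fn fsync_dir(dir: &Path) -> io::Result<()> {
  File::open(dir)?.sync_all()
}

/// Placeholder for other platforms in this sketch.
#[cfg(not(unix))]
fn fsync_dir(_dir: &Path) -> io::Result<()> {
  Ok(())
}

/// Replaces `dir/name` with `data` so readers see either the old or the new contents.
pub fn replace_file(dir: &Path, name: &str, data: &[u8]) -> io::Result<()> {
  let tmp = dir.join(format!("{}.tmp", name));
  let dst = dir.join(name);

  // 1. Write the new contents to a temporary file and sync them to disk.
  let mut f = File::create(&tmp)?;
  f.write_all(data)?;
  f.sync_all()?;
  drop(f);

  // 2. Move the temporary file over the destination in one step.
  fs::rename(&tmp, &dst)?;

  // 3. Sync the directory so the rename itself survives a crash.
  fsync_dir(dir)
}
```

Skipping step 3 is a common source of lost files after power failure: the data blocks are on disk, but the directory entry pointing at them may not be.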

### Constants

- `PAGE_SIZE`: System page size (re-exported from jdb_alloc)

## Architecture

```mermaid
graph TD
  A[Application] --> B[File]
  B --> C{Platform}
  C -->|Linux| D[io_uring + O_DIRECT]
  C -->|macOS| E[kqueue + F_NOCACHE]
  C -->|Windows| F[IOCP + NO_BUFFERING]
  D --> G[compio runtime]
  E --> G
  F --> G
```

Call flow for `write_at`:

1. Check alignment (offset and length must both be multiples of PAGE_SIZE)
2. Borrow raw fd via BorrowedFd (zero-copy)
3. Submit WriteAt op to compio runtime
4. io_uring/kqueue/IOCP completes the operation asynchronously
5. Return buffer ownership to caller
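
The alignment check in step 1 can be sketched with plain integer arithmetic. The 4096-byte `PAGE` constant, the `Misaligned` type, and both function names are assumptions for this example; jdb_fs exposes the real page size as `PAGE_SIZE` and reports failures through its `Alignment` error variant.

```rust
/// Assumed page size for this sketch. It must be a power of two.
pub const PAGE: u64 = 4096;

/// Hypothetical error carrying the offending values.
#[derive(Debug, PartialEq, Eq)]
pub struct Misaligned {
  pub offset: u64,
  pub len: u64,
}

/// Accepts only offsets and lengths that are multiples of `PAGE`.
pub fn check_aligned(offset: u64, len: u64) -> Result<(), Misaligned> {
  // For a power-of-two PAGE, `x & (PAGE - 1)` equals `x % PAGE`.
  let mask = PAGE - 1;
  if offset & mask != 0 || len & mask != 0 {
    return Err(Misaligned { offset, len });
  }
  Ok(())
}

/// Rounds `n` up to the next multiple of `PAGE`, or `None` on overflow.
pub fn align_up(n: u64) -> Option<u64> {
  n.checked_add(PAGE - 1).map(|v| v & !(PAGE - 1))
}
```

`align_up` is handy on the caller's side: a 5-byte payload needs `align_up(5)` bytes of buffer, with the remainder left as zero padding.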

## Directory Structure

```
jdb_fs/
├── src/
│   ├── lib.rs      # Public exports
│   ├── file.rs     # File struct and async methods
│   ├── error.rs    # Error types (thiserror)
│   ├── fs.rs       # Filesystem utilities
│   └── os/         # Platform-specific implementations
│       ├── mod.rs
│       ├── linux.rs   # O_DIRECT, fallocate
│       ├── macos.rs   # F_NOCACHE, F_PREALLOCATE
│       └── windows.rs # FILE_FLAG_NO_BUFFERING
├── tests/
│   └── main.rs     # Integration tests
└── Cargo.toml
```
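A per-platform `os/` module like the one in the tree above is usually selected at compile time with `cfg` attributes. The following single-file sketch shows the idea; the module bodies are placeholders invented for this example and say nothing about the actual contents of `linux.rs`, `macos.rs`, or `windows.rs`.

```rust
// Exactly one of these modules is compiled in, depending on the target.
#[cfg(target_os = "linux")]
mod os {
  pub const BACKEND: &str = "io_uring + O_DIRECT";
}

#[cfg(target_os = "macos")]
mod os {
  pub const BACKEND: &str = "kqueue + F_NOCACHE";
}

#[cfg(windows)]
mod os {
  pub const BACKEND: &str = "IOCP + FILE_FLAG_NO_BUFFERING";
}

// Fallback so the sketch still compiles on other targets.
#[cfg(not(any(target_os = "linux", target_os = "macos", windows)))]
mod os {
  pub const BACKEND: &str = "unsupported";
}

/// Returns the name of the backend selected for the current target.
pub fn backend() -> &'static str {
  os::BACKEND
}
```

Because the choice is made at compile time, callers see one `os` interface and pay no runtime dispatch cost.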

## Tech Stack

| Component | Technology |
|-----------|------------|
| Async Runtime | compio |
| Linux I/O | io_uring |
| macOS I/O | kqueue |
| Windows I/O | IOCP |
| Error Handling | thiserror |
| Memory Alignment | jdb_alloc |

## History

io_uring was introduced in Linux kernel 5.1 (released May 2019) by Jens Axboe, the block layer maintainer. Before io_uring, Linux native async I/O (AIO) was awkward to set up and had significant limitations. io_uring uses submission and completion ring buffers shared between kernel and userspace, which lets an application batch many operations per syscall and sharply reduces syscall overhead in high-throughput scenarios.

Direct I/O (O_DIRECT) has been part of Linux since kernel 2.4. It bypasses the page cache, giving databases direct control over caching and more predictable I/O latency. Storage engines such as MySQL InnoDB and RocksDB offer Direct I/O modes for this reason; PostgreSQL, by contrast, has traditionally relied on buffered I/O through the OS page cache.

The combination of io_uring and Direct I/O is widely regarded as the state of the art for database storage engines on Linux, and it can sustain millions of IOPS on modern NVMe drives.

---

## About

This project is an open-source component of [js0.site ⋅ Refactoring the Internet Plan](https://js0.site).

We are redefining the development paradigm of the Internet in a componentized way. Follow us:

* [Google Group](https://groups.google.com/g/js0-site)
* [js0site.bsky.social](https://bsky.app/profile/js0site.bsky.social)
