jdb_trait 0.1.2

异步存储引擎数据库抽象层 / Async database abstraction layer for storage engines
[English](#en) | [中文](#zh)

---

<a id="en"></a>

# jdb_trait: Database Abstraction Layer for Async Storage Engines

## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Installation](#installation)
- [Core Concepts](#core-concepts)
- [API Reference](#api-reference)
- [Architecture](#architecture)
- [Tech Stack](#tech-stack)
- [Directory Structure](#directory-structure)
- [History](#history)

## Overview

jdb_trait defines async trait interfaces for building database storage engines. It provides abstractions for tables, sub-tables (partitions), schemas, queries, and row data with support for key-value separation.

## Features

- Async-first design with `Future`-based APIs
- Sub-table partitioning for horizontal scaling
- Schema versioning with TTL and depth control
- Flexible query expressions with AND/OR/NOT logic
- Key-value separation via `AsyncRow` trait
- Zero-copy string/binary types using `HipStr`/`HipByt`
- Type-safe value representation with `Val` enum

## Installation

```toml
[dependencies]
jdb_trait = "0.1"
```

## Core Concepts

### Engine → Table → SubTable

```
Engine
  └── Table (with Schema)
        └── SubTable (partition by SubTableKey)
              └── Row (Vec<Val>)
```

- `Engine`: Entry point for opening/creating tables
- `Table`: Manages schema and routes operations to sub-tables
- `SubTable`: Partition holding actual row data
- `Row`: Synchronous row data (`Vec<Val>`)
- `AsyncRow`: Async row accessor for key-value separation

### Query Flow

```mermaid
graph TD
  A[Query] --> B{sub_table_filter}
  B -->|match| C[SubTable]
  C --> D{val_filter}
  D -->|match| E[AsyncRow]
  E --> F[Row data]
```
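
The two-stage flow above can be sketched with std types only. Everything here is an assumption made for illustration: `MiniExpr`, `Partitions`, and `select` are hypothetical names, rows are plain `Vec<i64>` instead of `Vec<Val>`, and rows are returned directly rather than through an `AsyncRow`. The actual shape of `Expr` in jdb_trait may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical boolean expression over a row of `i64` columns.
pub enum MiniExpr {
  /// Column at the given index equals the given value.
  Eq(usize, i64),
  And(Box<MiniExpr>, Box<MiniExpr>),
  Or(Box<MiniExpr>, Box<MiniExpr>),
  Not(Box<MiniExpr>),
}

impl MiniExpr {
  /// Evaluates the expression against one row (or one sub-table key).
  pub fn eval(&self, row: &[i64]) -> bool {
    match self {
      MiniExpr::Eq(idx, v) => row.get(*idx) == Some(v),
      MiniExpr::And(a, b) => a.eval(row) && b.eval(row),
      MiniExpr::Or(a, b) => a.eval(row) || b.eval(row),
      MiniExpr::Not(e) => !e.eval(row),
    }
  }
}

/// Partitions indexed by sub-table key; each one holds plain rows.
pub type Partitions = BTreeMap<Vec<i64>, Vec<Vec<i64>>>;

pub fn select(
  parts: &Partitions,
  sub_table_filter: Option<&MiniExpr>,
  val_filter: Option<&MiniExpr>,
) -> Vec<Vec<i64>> {
  parts
    .iter()
    // Stage 1: prune whole partitions by evaluating the filter on the key.
    .filter(|(key, _)| sub_table_filter.map_or(true, |f| f.eval(key)))
    .flat_map(|(_, rows)| rows.iter())
    // Stage 2: keep only rows that pass the value filter.
    .filter(|row| val_filter.map_or(true, |f| f.eval(row)))
    .cloned()
    .collect()
}
```

In this sketch a missing filter (`None`) matches everything, so `select(&parts, None, None)` returns every row of every partition. Pruning by key first means non-matching partitions are never scanned.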

## API Reference

### Types

| Type | Description |
|------|-------------|
| `Id` | Record identifier (`u64`) |
| `Col` | Column name (`HipByt<'static>`) |
| `ColIdx` | Column index (`u16`) |
| `Row` | Synchronous row data (`Vec<Val>`) |
| `SubTableKey` | Partition routing key (`Row`) |

### Val

Atomic database value supporting multiple types:

```rust
pub enum Val {
  Bool(bool),
  I8(i8), I16(i16), I32(i32), I64(i64), I128(i128),
  U8(u8), U16(u16), U32(u32), U64(u64), U128(u128),
  F32(OrderedFloat<f32>), F64(OrderedFloat<f64>),
  Str(HipStr<'static>),
  Bin(HipByt<'static>),
}
```
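
A minimal sketch of how `From` conversions for such an enum might look, using std types only. `MiniVal` and `distinct` are hypothetical names; `String` and `Vec<u8>` stand in for `HipStr` and `HipByt`, and the float variants are left out because plain `f32`/`f64` implement neither `Eq` nor `Hash` (the gap that `OrderedFloat` fills).

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down stand-in for `Val` (std types only, no floats).
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum MiniVal {
  Bool(bool),
  I64(i64),
  U64(u64),
  Str(String),
  Bin(Vec<u8>),
}

// One `From` impl per source type keeps call sites short (`1u64.into()`).
impl From<bool> for MiniVal {
  fn from(v: bool) -> Self { MiniVal::Bool(v) }
}
impl From<i64> for MiniVal {
  fn from(v: i64) -> Self { MiniVal::I64(v) }
}
impl From<u64> for MiniVal {
  fn from(v: u64) -> Self { MiniVal::U64(v) }
}
impl From<&str> for MiniVal {
  fn from(v: &str) -> Self { MiniVal::Str(v.to_owned()) }
}
impl From<Vec<u8>> for MiniVal {
  fn from(v: Vec<u8>) -> Self { MiniVal::Bin(v) }
}

/// Deduplicates values; this compiles only because `MiniVal` is `Eq + Hash`.
pub fn distinct(vals: Vec<MiniVal>) -> HashSet<MiniVal> {
  vals.into_iter().collect()
}
```

With these impls a row can be written as `vec![1u64.into(), "alice".into(), true.into()]`. The derived ordering compares the variant first and the payload second, which is an artifact of this sketch and not a statement about how `Val` orders values.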

### Schema

Table schema with versioning:

```rust
pub struct Schema {
  pub name: HipByt<'static>,
  pub ver: SchemaVer,
  pub col_li: Vec<Field>,
  pub sub_table_key_li: Vec<Field>,
  pub index_li: Vec<Index>,
  pub max_depth: Option<usize>,
  pub ttl: Option<Duration>,
}
```

### Query & Expr

Query builder with filter expressions:

```rust
pub struct Query {
  pub sub_table_filter: Option<Expr>,
  pub val_filter: Option<Expr>,
  pub limit: Option<usize>,
  pub offset: Option<usize>,
  pub order: Order,
}
```
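
The paging fields can be illustrated with a small std-only sketch. The semantics are assumptions, not taken from the crate: `None` means "no limit" or "no offset", the offset is applied before the limit, and results are ordered by id. `MiniOrder`, `Page`, and `page` are hypothetical names.

```rust
/// Hypothetical stand-in for `Order`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MiniOrder {
  Asc,
  Desc,
}

/// The paging-related subset of a query.
pub struct Page {
  pub limit: Option<usize>,
  pub offset: Option<usize>,
  pub order: MiniOrder,
}

/// Sorts ids, then skips `offset` entries and keeps at most `limit` entries.
pub fn page(mut ids: Vec<u64>, p: &Page) -> Vec<u64> {
  ids.sort_unstable();
  if p.order == MiniOrder::Desc {
    ids.reverse();
  }
  ids
    .into_iter()
    .skip(p.offset.unwrap_or(0))
    .take(p.limit.unwrap_or(usize::MAX))
    .collect()
}
```

A real engine would more likely push the ordering and limit down into the storage scan instead of materializing and sorting all ids first; the sketch only pins down the intended result.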

Expression operators:

| Op | Description |
|----|-------------|
| `Eq(Val)` | Equality |
| `In(HashSet<Val>)` | Set membership |
| `Range(Val, Val)` | Half-open interval `[start, end)` |
| `RangeInclusive(Val, Val)` | Closed interval `[start, end]` |
| `RangeFrom(Val)` | `[start, +∞)` |
| `RangeTo(Val)` | `(-∞, end)` |
| `RangeToInclusive(Val)` | `(-∞, end]` |

### Traits

#### Engine

```rust
pub trait Engine: Sized + Send + Sync {
  type Error: Debug + Send + Sync;
  type Gen: IdGen;
  type Table: Table;

  fn id_gen(&self) -> &Self::Gen;
  fn open<F, Fut>(&self, name: &[u8], create: F)
    -> impl Future<Output = Result<Self::Table, Self::Error>> + Send;
}
```
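
The bounds on `F` and `Fut` are not shown above, so the following is only a guess at the open-or-create idea, written synchronously with std types. It assumes `create` is called only when the table does not exist yet; `MiniEngine` and `MiniSchema` are hypothetical, and the real `open` is async and returns a `Table`, not a schema.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal schema: a name and a version number.
#[derive(Debug, Clone, PartialEq)]
pub struct MiniSchema {
  pub name: Vec<u8>,
  pub ver: u32,
}

/// Hypothetical in-memory engine that tracks one schema per table name.
#[derive(Default)]
pub struct MiniEngine {
  tables: HashMap<Vec<u8>, MiniSchema>,
}

impl MiniEngine {
  /// Returns the stored schema for `name`; if there is none, calls `create`
  /// once and stores its result.
  pub fn open<F>(&mut self, name: &[u8], create: F) -> &MiniSchema
  where
    F: FnOnce() -> MiniSchema,
  {
    self.tables.entry(name.to_vec()).or_insert_with(create)
  }
}
```

Passing a closure instead of a ready-made schema means the caller pays for building the schema only on first creation; opening an existing table ignores the closure entirely.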

#### Table

```rust
pub trait Table: Sized + Send + Sync {
  type Error: Debug + Send + Sync;
  type SubTable: SubTable;
  type AsyncRow: AsyncRow;
  type Stream: Stream<Item = Result<AsyncItem<Self::AsyncRow>, Self::Error>> + Send;

  fn schema(&self) -> impl Future<Output = Schema> + Send;
  fn put(&self, key: &SubTableKey, data: &[Row])
    -> impl Future<Output = Result<Vec<Id>, Self::Error>> + Send;
  fn get(&self, key: &SubTableKey, id: Id)
    -> impl Future<Output = Result<Option<AsyncItem<Self::AsyncRow>>, Self::Error>> + Send;
  fn select(&self, q: &Query) -> impl Future<Output = Self::Stream> + Send;
  fn scan(&self, begin_id: u64, order: Order) -> impl Future<Output = Self::Stream> + Send;
  fn rm(&self, q: &Query) -> impl Future<Output = Result<u64, Self::Error>> + Send;
  // ...
}
```

#### SubTable

```rust
pub trait SubTable: Send + Sync {
  type Error: Debug + Send + Sync;
  type AsyncRow: AsyncRow;
  type Stream: Stream<Item = Result<(Id, Self::AsyncRow), Self::Error>> + Send;

  fn put(&self, data: &[Row])
    -> impl Future<Output = Result<Vec<Id>, Self::Error>> + Send;
  fn get(&self, id: Id)
    -> impl Future<Output = Result<Option<(Id, Self::AsyncRow)>, Self::Error>> + Send;
  fn select(&self, q: &Query) -> impl Future<Output = Self::Stream> + Send;
  fn key(&self) -> &SubTableKey;
  // ...
}
```

#### AsyncRow

```rust
pub trait AsyncRow: Send + Sync + Debug {
  type Error: Debug + Send + Sync;
  fn row(&self) -> impl Future<Output = Result<Row, Self::Error>> + Send;
}
```
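
A hedged sketch of implementing a trait of this shape, using std only (Rust 1.75 or later). `MiniAsyncRow` mirrors the signature above with `Vec<i64>` standing in for `Row`; `LogRow`, its in-memory "value log", and the tiny `block_on` executor are all hypothetical and exist only to make the example self-contained.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical mirror of `AsyncRow`, with `Vec<i64>` standing in for `Row`.
pub trait MiniAsyncRow: Send + Sync {
  type Error: std::fmt::Debug + Send + Sync;
  fn row(&self) -> impl Future<Output = Result<Vec<i64>, Self::Error>> + Send;
}

/// A row handle that stores only a position; the values live in a shared log.
pub struct LogRow {
  pub log: Arc<Vec<Vec<i64>>>,
  pub pos: usize,
}

impl MiniAsyncRow for LogRow {
  type Error = String;

  fn row(&self) -> impl Future<Output = Result<Vec<i64>, String>> + Send {
    // The values are fetched only when the returned future is polled.
    async move {
      self
        .log
        .get(self.pos)
        .cloned()
        .ok_or_else(|| format!("no value at position {}", self.pos))
    }
  }
}

/// Waker that does nothing; enough for futures that never actually suspend.
struct NoopWake;

impl Wake for NoopWake {
  fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for demonstration only.
pub fn block_on<F: Future>(fut: F) -> F::Output {
  let waker = Waker::from(Arc::new(NoopWake));
  let mut cx = Context::from_waker(&waker);
  let mut fut = pin!(fut);
  loop {
    if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
      return out;
    }
    std::thread::yield_now();
  }
}
```

Holding a `LogRow` is cheap because it carries a position, not the values, which is the point of key-value separation: filtering and ordering can work on handles, and `row()` is awaited only for the rows that are actually needed. In real code an async runtime would replace `block_on`.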

## Architecture

```mermaid
graph TD
  subgraph Traits
    Engine --> Table
    Table --> SubTable
    Table --> Schema
    SubTable --> AsyncRow
    AsyncRow --> Row
  end

  subgraph Data
    Row --> Val
    Query --> Expr
    Expr --> Op
  end

  subgraph Types
    Id
    Col
    ColIdx
    SubTableKey
  end
```

### Call Flow

1. `Engine::open()` creates or opens a `Table`
2. `Table` routes each operation to a `SubTable` by `SubTableKey`
3. `SubTable` executes CRUD operations
4. Query results return `AsyncRow` for lazy loading
5. `AsyncRow::row()` fetches actual `Row` data
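
Steps 2 and 3 can be sketched as a synchronous, in-memory router. This is an illustration under assumptions, not the crate's implementation: `MiniTable` and `MiniSubTable` are hypothetical, rows are `Vec<i64>`, sub-tables are created on first write, ids come from a simple counter instead of an `IdGen`, and the methods take `&mut self` where the real traits are async and take `&self`.

```rust
use std::collections::HashMap;

pub type Id = u64;
/// Stand-in for `Row` (`Vec<Val>` in the crate).
pub type Row = Vec<i64>;
pub type SubTableKey = Row;

/// Hypothetical partition: rows indexed by id.
#[derive(Default)]
pub struct MiniSubTable {
  rows: HashMap<Id, Row>,
}

/// Hypothetical table: owns the id counter and routes by sub-table key.
#[derive(Default)]
pub struct MiniTable {
  next_id: Id,
  subs: HashMap<SubTableKey, MiniSubTable>,
}

impl MiniTable {
  /// Routes `data` to the sub-table for `key`, creating it when missing,
  /// and returns the ids assigned to the new rows.
  pub fn put(&mut self, key: &SubTableKey, data: &[Row]) -> Vec<Id> {
    let sub = self.subs.entry(key.clone()).or_default();
    let mut ids = Vec::with_capacity(data.len());
    for row in data {
      self.next_id += 1;
      sub.rows.insert(self.next_id, row.clone());
      ids.push(self.next_id);
    }
    ids
  }

  /// Looks up one row; the key selects the partition, the id selects the row.
  pub fn get(&self, key: &SubTableKey, id: Id) -> Option<&Row> {
    self.subs.get(key)?.rows.get(&id)
  }
}
```

Because `get` needs both the key and the id, a lookup touches exactly one partition. An id that exists under a different key is reported as absent, which matches the `get(&self, key, id)` shape of the `Table` trait.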

## Tech Stack

| Dependency | Purpose |
|------------|---------|
| `futures-core` | Stream trait for async iteration |
| `hipstr` | Zero-copy string/binary types |
| `ordered-float` | Orderable float wrapper |
| `gxhash` | Fast hash for `HashSet<Val>` |

## Directory Structure

```
jdb_trait/
├── src/
│   ├── lib.rs        # Public exports, Engine, IdGen, AsyncItem
│   ├── val.rs        # Val enum with From impls
│   ├── row.rs        # Row type alias, AsyncRow trait
│   ├── expr.rs       # Expr, Op, Order
│   ├── query.rs      # Query struct
│   ├── schema.rs     # Schema, Field, Index
│   ├── sub_table.rs  # SubTable trait
│   └── table.rs      # Table trait
├── readme/
│   ├── en.md
│   └── zh.md
└── Cargo.toml
```

## History

The concept of database abstraction layers traces back to the 1970s when E.F. Codd proposed the relational model. The separation of logical and physical data representation became foundational to modern databases.

Key-value separation, central to `AsyncRow`, emerged from LSM-tree optimizations. WiscKey (2016) showed that storing values apart from keys in SSTable-based storage can significantly reduce write amplification and improve space efficiency for large values.

The async trait pattern in Rust has evolved significantly. Before Rust 1.75 (December 2023), async methods in traits required workarounds such as the `async-trait` crate. Native support for return-position `impl Trait` in trait methods enabled cleaner APIs like those in jdb_trait.

Sub-table partitioning reflects distributed database designs such as Google's Bigtable (2006) and Apache HBase, where row key ranges route data to specific tablets/regions for horizontal scaling.

---

## About

This project is an open-source component of [js0.site ⋅ Refactoring the Internet Plan](https://js0.site).

We are redefining the development paradigm of the Internet in a componentized way. Follow us:

* [Google Group](https://groups.google.com/g/js0-site)
* [js0site.bsky.social](https://bsky.app/profile/js0site.bsky.social)

---

<a id="zh"></a>

# jdb_trait: 异步存储引擎数据库抽象层

## 目录

- [概述](#概述)
- [特性](#特性)
- [安装](#安装)
- [核心概念](#核心概念)
- [API 参考](#api-参考)
- [架构](#架构)
- [技术栈](#技术栈)
- [目录结构](#目录结构)
- [历史](#历史)

## 概述

jdb_trait 定义异步 trait 接口,用于构建数据库存储引擎。提供表、子表(分区)、Schema、查询、行数据等抽象,支持键值分离。

## 特性

- 异步优先设计,基于 `Future` 的 API
- 子表分区,支持水平扩展
- Schema 版本控制,支持 TTL 和深度限制
- 灵活的查询表达式,支持 AND/OR/NOT 逻辑
- 通过 `AsyncRow` trait 实现键值分离
- 使用 `HipStr`/`HipByt` 实现零拷贝字符串/二进制
- 类型安全的 `Val` 枚举值表示

## 安装

```toml
[dependencies]
jdb_trait = "0.1"
```

## 核心概念

### Engine → Table → SubTable

```
Engine
  └── Table (含 Schema)
        └── SubTable (按 SubTableKey 分区)
              └── Row (Vec<Val>)
```

- `Engine`: 打开/创建表的入口
- `Table`: 管理 Schema,路由操作到子表
- `SubTable`: 存储实际行数据的分区
- `Row`: 同步行数据 (`Vec<Val>`)
- `AsyncRow`: 异步行访问器,用于键值分离

### 查询流程

```mermaid
graph TD
  A[Query] --> B{sub_table_filter}
  B -->|匹配| C[SubTable]
  C --> D{val_filter}
  D -->|匹配| E[AsyncRow]
  E --> F[Row 数据]
```

## API 参考

### 类型

| 类型 | 说明 |
|------|------|
| `Id` | 记录标识符 (`u64`) |
| `Col` | 列名 (`HipByt<'static>`) |
| `ColIdx` | 列索引 (`u16`) |
| `Row` | 同步行数据 (`Vec<Val>`) |
| `SubTableKey` | 分区路由键 (`Row`) |

### Val

支持多种类型的原子数据库值:

```rust
pub enum Val {
  Bool(bool),
  I8(i8), I16(i16), I32(i32), I64(i64), I128(i128),
  U8(u8), U16(u16), U32(u32), U64(u64), U128(u128),
  F32(OrderedFloat<f32>), F64(OrderedFloat<f64>),
  Str(HipStr<'static>),
  Bin(HipByt<'static>),
}
```

### Schema

带版本控制的表结构:

```rust
pub struct Schema {
  pub name: HipByt<'static>,
  pub ver: SchemaVer,
  pub col_li: Vec<Field>,
  pub sub_table_key_li: Vec<Field>,
  pub index_li: Vec<Index>,
  pub max_depth: Option<usize>,
  pub ttl: Option<Duration>,
}
```

### Query & Expr

查询构建器与过滤表达式:

```rust
pub struct Query {
  pub sub_table_filter: Option<Expr>,
  pub val_filter: Option<Expr>,
  pub limit: Option<usize>,
  pub offset: Option<usize>,
  pub order: Order,
}
```

表达式操作符:

| Op | 说明 |
|----|------|
| `Eq(Val)` | 相等 |
| `In(HashSet<Val>)` | 集合成员 |
| `Range(Val, Val)` | 半开区间 `[start, end)` |
| `RangeInclusive(Val, Val)` | 闭区间 `[start, end]` |
| `RangeFrom(Val)` | `[start, +∞)` |
| `RangeTo(Val)` | `(-∞, end)` |
| `RangeToInclusive(Val)` | `(-∞, end]` |

### Traits

#### Engine

```rust
pub trait Engine: Sized + Send + Sync {
  type Error: Debug + Send + Sync;
  type Gen: IdGen;
  type Table: Table;

  fn id_gen(&self) -> &Self::Gen;
  fn open<F, Fut>(&self, name: &[u8], create: F)
    -> impl Future<Output = Result<Self::Table, Self::Error>> + Send;
}
```

#### Table

```rust
pub trait Table: Sized + Send + Sync {
  type Error: Debug + Send + Sync;
  type SubTable: SubTable;
  type AsyncRow: AsyncRow;
  type Stream: Stream<Item = Result<AsyncItem<Self::AsyncRow>, Self::Error>> + Send;

  fn schema(&self) -> impl Future<Output = Schema> + Send;
  fn put(&self, key: &SubTableKey, data: &[Row])
    -> impl Future<Output = Result<Vec<Id>, Self::Error>> + Send;
  fn get(&self, key: &SubTableKey, id: Id)
    -> impl Future<Output = Result<Option<AsyncItem<Self::AsyncRow>>, Self::Error>> + Send;
  fn select(&self, q: &Query) -> impl Future<Output = Self::Stream> + Send;
  fn scan(&self, begin_id: u64, order: Order) -> impl Future<Output = Self::Stream> + Send;
  fn rm(&self, q: &Query) -> impl Future<Output = Result<u64, Self::Error>> + Send;
  // ...
}
```

#### SubTable

```rust
pub trait SubTable: Send + Sync {
  type Error: Debug + Send + Sync;
  type AsyncRow: AsyncRow;
  type Stream: Stream<Item = Result<(Id, Self::AsyncRow), Self::Error>> + Send;

  fn put(&self, data: &[Row])
    -> impl Future<Output = Result<Vec<Id>, Self::Error>> + Send;
  fn get(&self, id: Id)
    -> impl Future<Output = Result<Option<(Id, Self::AsyncRow)>, Self::Error>> + Send;
  fn select(&self, q: &Query) -> impl Future<Output = Self::Stream> + Send;
  fn key(&self) -> &SubTableKey;
  // ...
}
```

#### AsyncRow

```rust
pub trait AsyncRow: Send + Sync + Debug {
  type Error: Debug + Send + Sync;
  fn row(&self) -> impl Future<Output = Result<Row, Self::Error>> + Send;
}
```

## 架构

```mermaid
graph TD
  subgraph Traits
    Engine --> Table
    Table --> SubTable
    Table --> Schema
    SubTable --> AsyncRow
    AsyncRow --> Row
  end

  subgraph Data
    Row --> Val
    Query --> Expr
    Expr --> Op
  end

  subgraph Types
    Id
    Col
    ColIdx
    SubTableKey
  end
```

### 调用流程

1. `Engine::open()` 创建或打开 `Table`
2. `Table` 按 `SubTableKey` 路由到 `SubTable`
3. `SubTable` 执行 CRUD 操作
4. 查询结果返回 `AsyncRow` 实现延迟加载
5. `AsyncRow::row()` 获取实际 `Row` 数据

## 技术栈

| 依赖 | 用途 |
|------|------|
| `futures-core` | 异步迭代的 Stream trait |
| `hipstr` | 零拷贝字符串/二进制类型 |
| `ordered-float` | 可排序浮点数包装 |
| `gxhash` | `HashSet<Val>` 的快速哈希 |

## 目录结构

```
jdb_trait/
├── src/
│   ├── lib.rs        # 公开导出、Engine、IdGen、AsyncItem
│   ├── val.rs        # Val 枚举及 From 实现
│   ├── row.rs        # Row 类型别名、AsyncRow trait
│   ├── expr.rs       # Expr、Op、Order
│   ├── query.rs      # Query 结构体
│   ├── schema.rs     # Schema、Field、Index
│   ├── sub_table.rs  # SubTable trait
│   └── table.rs      # Table trait
├── readme/
│   ├── en.md
│   └── zh.md
└── Cargo.toml
```

## 历史

数据库抽象层概念可追溯至 1970 年代 E.F. Codd 提出的关系模型。逻辑与物理数据表示的分离成为现代数据库的基石。

键值分离是 `AsyncRow` 的核心思想,源于 LSM-tree 优化。WiscKey(2016)证明在基于 SSTable 的存储中分离键值,能显著改善大值场景下的写放大和空间效率。

Rust 的 async trait 模式经历重大演进。在 Rust 1.75(2023 年 12 月)之前,trait 中的异步方法需要 `async-trait` crate 等变通方案。原生支持 trait 方法中的 `impl Trait` 后,jdb_trait 这类更简洁的 API 成为可能。

子表分区反映了 Google Bigtable(2006)和 Apache HBase 等分布式数据库设计,通过行键前缀将数据路由到特定 tablet/region 实现水平扩展。

---

## 关于

本项目为 [js0.site ⋅ 重构互联网计划](https://js0.site) 的开源组件。

我们正在以组件化的方式重新定义互联网的开发范式,欢迎关注:

* [谷歌邮件列表](https://groups.google.com/g/js0-site)
* [js0site.bsky.social](https://bsky.app/profile/js0site.bsky.social)