graphitesql 0.0.4

A pure, safe, no_std Rust re-implementation of SQLite, compatible with the SQLite 3 file format.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
# graphitesql roadmap

This document is the plan for building **graphitesql**: a single-crate, pure,
safe, `no_std` Rust implementation of SQLite with byte-for-byte compatibility
with the SQLite 3 file format.

It is meant to be read top-to-bottom once, then used as a checklist. Each phase
lists its **deliverable**, its **done criterion**, and the **upstream SQLite
files** that serve as the specification for that phase (fetch them with
`reference/fetch.sh`).

---

## 1. Architecture

SQLite has a famously clean layered design. We mirror it, because the layering is
what makes the file format and the SQL semantics tractable to re-implement
independently. Data flows top-to-bottom on writes and bottom-to-top on reads:

```
            ┌──────────────────────────────────────────────┐
  SQL text  │  api          Connection / Statement / Row    │  public API
            ├──────────────────────────────────────────────┤
            │  sql::token   tokenizer                        │
            │  sql::parser  parser  ──►  sql::ast            │  front end
            ├──────────────────────────────────────────────┤
            │  planner      query planning (join/index)      │
            │  codegen      AST  ──►  VDBE bytecode          │  compiler
            ├──────────────────────────────────────────────┤
            │  vdbe         register virtual machine         │  execution
            │  func collate built-in functions, collations   │
            ├──────────────────────────────────────────────┤
            │  btree        table & index B-trees, cursors   │  data model
            ├──────────────────────────────────────────────┤
            │  pager        page cache, transactions,        │  storage
            │               rollback journal, WAL, locking   │
            ├──────────────────────────────────────────────┤
            │  format       on-disk byte layout (the spec)   │  format
            ├──────────────────────────────────────────────┤
            │  vfs          Vfs / File traits (mem, std, …)  │  OS boundary
            └──────────────────────────────────────────────┘
```

### Module ↔ upstream map

graphitesql is one crate; these are its modules and the SQLite files that
specify each. Reading the C file is the fastest way to get a phase right.

| graphitesql module | responsibility | upstream reference |
|--------------------|----------------|--------------------|
| `vfs`              | OS abstraction: open/read/write/sync/lock | `os_unix.c`, `os_win.c`, `os.c` |
| `format`           | byte layout of header, pages, cells, records, freelist | `fileformat2.html`, `btree.c` (comments), `btreeInt.h` |
| `pager`            | page cache, atomic commit, journal, WAL, locking | `pager.c`, `wal.c`, `pcache.c`, `pcache1.c` |
| `btree`            | table/index B-trees, cursors, balancing | `btree.c`, `btreeInt.h` |
| `value` / record   | storage classes, serial types, affinity | `vdbemem.c`, `vdbeaux.c` |
| `sql::token`       | tokenizer | `tokenize.c`, `keywordhash.h` |
| `sql::parser`/`ast`| grammar → parse tree | `parse.y`, `expr.c`, `resolve.c` |
| `codegen`          | parse tree → bytecode | `build.c`, `insert.c`, `update.c`, `delete.c`, `select.c` |
| `planner`          | join order, index selection, query flattening | `where.c`, `wherecode.c`, `whereexpr.c`, `analyze.c` |
| `vdbe`             | the register VM and its opcodes | `vdbe.c`, `vdbeaux.c`, `vdbeapi.c`, `opcodes.h` |
| `func` / `collate` | scalar/aggregate funcs, collations | `func.c`, `date.c`, `callback.c` |
| `schema`           | parse `sqlite_schema`, build the catalog | `build.c`, `prepare.c` |
| `api`              | `Connection`/`Statement` and (later) C-API shim | `main.c`, `vdbeapi.c`, `legacy.c` |

### Why a VDBE (and not a tree-walker)

The VDBE bytecode is **not** stored in the database file, so file compatibility
does not require it. We adopt it anyway because:

1. SQLite's observable semantics (evaluation order, type coercion, `NULL`
   handling, the exact rows a query returns) are defined operationally by what
   the VDBE does. Matching the model is the surest path to matching behavior.
2. It cleanly separates "what to compute" (codegen/planner) from "how to run it"
   (vdbe), which keeps each piece independently testable.
3. `EXPLAIN` output becomes directly comparable to SQLite's for differential
   testing.

We will not be bug-for-bug identical in bytecode, but we will be result-identical.

---

## 2. Design principles

- **`#![forbid(unsafe_code)]`, no exceptions.** Enforced in `Cargo.toml` lints.
  If a hot path seems to need `unsafe`, redesign or accept the safe cost.
- **`no_std` + `alloc` is the baseline.** `std` is an additive feature (real
  files, `std::error::Error`). Nothing core may depend on `std`.
- **Zero dependencies.** No crates in the default build. Optional dev/test-only
  dependencies (e.g. for property testing) are acceptable behind `cfg(test)`.
- **The VFS is the only I/O boundary.** All file access goes through the `Vfs`
  and `File` traits. This is what makes `:memory:`, std files, and wasm uniform.
- **Compatibility is verified, not assumed.** Every format phase ends with a
  differential test against the real `sqlite3` library/CLI (see §4).
- **Fail loud while young.** Unimplemented paths return `Error::Unsupported`
  rather than silently producing wrong results.
- **Document with the spec.** Each format module carries the relevant
  file-format table in its doc comment, so the code and the spec live together.

---

## 3. Phases

Phases are ordered so that each builds only on completed lower layers, and so
that something useful and testable exists early (read-only access to real
databases lands well before write support).

### Phase 0 — Foundation ✅ *(done)*

- **Deliverable:** crate scaffold; varints; value/serial-type model; database
  header parse/serialize; error type; CI-ready `cargo test`/`clippy`.
- **Done:** `no_std` builds; header round-trips a real `sqlite3` file
  byte-for-byte; all unit tests green.
- **Files:** `src/util/varint.rs`, `src/value.rs`, `src/format/header.rs`,
  `src/error.rs`.

### Phase 1 — VFS & raw paging *(read side)**(done)*

- **Deliverable:** `Vfs`/`File` traits; in-memory VFS; std-file VFS (feature
  `std`); a read-only pager that maps page numbers → byte slices and validates
  page size against the header.
- **Done:** opens the committed real-SQLite fixtures (`tests/fixtures/*.db`)
  through the std VFS + pager, re-derives the header, and reads every page;
  page 1's 100-byte header offset is handled via `Page::body_offset`.
- **Files:** `src/vfs/{mod,memory,std_file}.rs`, `src/pager/mod.rs`,
  `tests/read_fixtures.rs`.
- **Reference:** `os_unix.c` (the VFS contract), `pager.c` (page lifecycle).

### Phase 2 — B-tree reader ✅ *(done)*

- **Deliverable:** parse table & index, interior & leaf b-tree pages; cell
  parsing including overflow-page chains; a forward/seek cursor over a table
  b-tree keyed by rowid, and over an index b-tree keyed by record.
- **Done:** table scan of `nums` (2000 rows, interior pages) sums correctly;
  20 KB blob row reassembles across overflow pages; rowid `seek` (exact / past-
  end / before-first) works; index scan yields every entry. All verified
  against real fixtures. *(Index seek-by-key awaits record comparison in
  Phase 3.)*
- **Files:** `src/btree/{mod,page,cursor}.rs`; tests in `tests/read_fixtures.rs`.
- **Reference:** `btree.c`, `btreeInt.h`; `fileformat2.html` (B-tree pages,
  cell format, overflow).

### Phase 3 — Record decoding & the schema catalog ✅ *(done)*

- **Deliverable:** decode records (header of serial types + bodies) into
  `Value`s; read and parse `sqlite_schema` (table 1) into an in-memory catalog
  of tables/indexes with their root pages and SQL.
- **Done:** full record decode incl. all integer widths (sign-extended),
  REAL, TEXT (UTF-8/16), BLOB; `Schema::read` enumerates `sqlite_schema`,
  resolves a table name → root page; end-to-end test decodes every row of
  `basic.db` to typed values (incl. INTEGER-PRIMARY-KEY → NULL aliasing).
- **Files:** `src/format/record.rs`, `src/schema/mod.rs`.
- **Reference:** `fileformat2.html` (record format, schema table), `build.c`.

### Phase 4 — SQL front end (tokenizer + parser + AST) ✅ *(done)*

- **Deliverable:** tokenizer matching SQLite's lexical rules and keyword set; a
  recursive-descent parser producing an AST for the core grammar: `SELECT`
  (with `WHERE`/`GROUP BY`/`HAVING`/`ORDER BY`/`LIMIT`/joins), `INSERT`,
  `UPDATE`, `DELETE`, `CREATE TABLE/INDEX`, `DROP`, transactions, and `PRAGMA`.
- **Done:** tokenizer covers strings/blobs/quoted-idents/comments/params/numbers
  (UTF-8 safe); Pratt expression parser with SQLite precedence incl.
  `IS [NOT] NULL`, `IN`, `BETWEEN`, `LIKE`/`GLOB`, `CASE`, `CAST`; reserved-word
  checks reject `SELECT FROM`; ~20 parser/tokenizer tests. *(Subqueries & CTEs
  deferred to Phase 9.)*
- **Files:** `src/sql/{mod,token,ast,parser}.rs`.
- **Reference:** `tokenize.c`, `parse.y`, `expr.c`. (We hand-write the parser
  rather than port the Lemon grammar, but `parse.y` is the source of truth for
  precedence and accepted forms.)

### Phase 5 — Read-query execution engine ✅ *(done)*

- **Deliverable:** a `Connection` that parses, resolves names against the schema,
  scans b-trees, decodes records, and evaluates expressions to produce result
  rows; `NULL` three-valued logic, SQLite comparison order & numeric coercion,
  `LIKE`/`GLOB`, `CASE`/`CAST`, a core of scalar functions, and the aggregates
  `count`/`sum`/`avg`/`min`/`max`/`total`/`group_concat` with `GROUP BY`/`HAVING`,
  plus `WHERE`, `ORDER BY` (by position/alias/expr), `LIMIT`/`OFFSET`, `DISTINCT`.
- **Done:** `Connection::open(path).query(sql)` returns correct rows on the real
  fixtures, verified differentially against `sqlite3` (e.g. `GROUP BY id%3`
  counts 666/667/667). INTEGER-PRIMARY-KEY rowid aliasing handled.
- **Files:** `src/exec/{mod,eval,func}.rs`, `src/util/float.rs`,
  `tests/query.rs`.
- **Implementation note — executor vs. bytecode:** the roadmap originally
  specified a VDBE bytecode VM. graphitesql instead ships an **operational,
  iterator-style executor** with the *same observable semantics*. This was the
  pragmatic path to a correct, testable read engine; adopting a VDBE bytecode IR
  is now an internal refactor (it changes how queries are represented, not their
  results) and is tracked for a later pass. *(Single-table only; joins/subqueries
  in Phase 9.)*
- **Reference:** `vdbe.c`, `select.c`, `where.c`, `func.c`.

### Phase 6 — B-tree writer + pager transactions *(write side)**(core done)*

- **Deliverable:** insert cells with page splitting; the rollback journal with
  correct fsync ordering and crash-safe atomic commit; create-from-scratch.
- **Done:** `WritePager` buffers all mutations in an overlay and commits
  atomically through a hash-verified rollback journal (originals → sync → write
  → sync → clear); journal **recovery on open** replays an interrupted commit;
  `ROLLBACK` = drop overlay. The b-tree writer inserts rows (with overflow-page
  allocation) and splits leaves/interiors bottom-up, growing a new root in place.
  **Cross-engine gate met:** a database built by graphitesql (incl. 200- and
  1500-row tables) is opened by the real `sqlite3` CLI and
  `PRAGMA integrity_check` returns `ok`; values round-trip through SQLite's
  decoder (`tests/write_compat.rs`).
- **Files:** `src/pager/write.rs`, `src/btree/writer.rs`, `src/format/record.rs`
  (`encode_record`), `tests/write_compat.rs`.
- **Carried to Phase 7/9:** `DELETE`/`UPDATE` at the b-tree level (need freelist
  page reclamation to keep `integrity_check` happy), freelist management, page
  merging/rebalancing on delete, and the SQLite-format journal (ours is a
  private, recoverable format). Locking is single-connection (no-op).
- **Reference:** `btree.c` (balance), `pager.c` (journal, commit),
  `fileformat2.html` (freelist, rollback journal format).

### Phase 7 — DDL & DML through the `Connection` API ✅ *(core done)*

- **Deliverable:** a writable `Connection` (`execute`/`query`, file & `:memory:`)
  running `CREATE TABLE`, `INSERT`, `UPDATE`, `DELETE`, and `BEGIN`/`COMMIT`/
  `ROLLBACK`; schema mutation writes `sqlite_schema` and bumps the schema cookie;
  `INTEGER PRIMARY KEY` rowid aliasing; `DEFAULT` values; autocommit + explicit
  transactions with read-your-writes.
- **Done:** in-memory and on-disk CREATE/INSERT/UPDATE/DELETE round-trip;
  data persists across reopen; a database built entirely through graphitesql SQL
  (`CREATE TABLE` + 300 parameterized `INSERT`s) is opened by real `sqlite3` with
  `PRAGMA integrity_check = ok` (`tests/dml.rs`).
- **Files:** `src/exec/mod.rs` (`Connection::{open,open_readonly,create,
  open_memory,execute,execute_params}`), `src/btree/writer.rs` (`delete_table`),
  `src/btree/cursor.rs` (`TableCursor::last`).
- **Carried to Phase 9:** `CREATE INDEX` + index maintenance on write, `ALTER`,
  `DROP`, `WITHOUT ROWID`, `UNIQUE`/`CHECK`/`NOT NULL` enforcement,
  `INSERT … SELECT`, and freelist-backed reclamation for overflow-row delete.
- **Reference:** `build.c`, `insert.c`, `update.c`, `delete.c`, `alter.c`.

### Phase 8 — WAL mode ✅ *(read + write done)*

- **Deliverable:** parse and overlay the `-wal` file so WAL-mode databases read
  correctly, with full frame validation.
- **Done:** `WalReader` parses the WAL header and frames, validates the
  **exact SQLite checksum** (Fibonacci-style, endianness from the magic's LSB)
  and salts, honors frames up to the last valid commit, and overlays them on the
  main file as a `PageSource`. `Connection::open_readonly` auto-detects `-wal`.
  Verified by reading a real WAL-mode database whose table and rows exist
  **only** in an uncheckpointed `-wal` produced by SQLite (`tests/wal.rs`,
  `tests/fixtures/wal.db{,-wal}`).
- **Write side (done in Phase 9):** `PRAGMA journal_mode=WAL` switches modes;
  commits append frames (exact SQLite checksums/salts) to `-wal` via
  `WritePager`'s WAL runtime; `PRAGMA wal_checkpoint` writes frames back to the
  main file; reopening loads an uncheckpointed `-wal`. Real `sqlite3` reads
  graphitesql-written WAL databases with `integrity_check = ok`
  (`tests/wal_write.rs`). The single-writer model means the `-shm` wal-index and
  the multi-process WAL locking protocol are not needed (SQLite rebuilds `-shm`
  from the `-wal`).
- **Files:** `src/pager/wal.rs` (reader), `src/pager/write.rs` (writer);
  `Connection` backend enum in `src/exec/mod.rs`.
- **Reference:** `wal.c`, `fileformat2.html` (the WAL format).

### Phase 9 — Compatibility hardening & breadth 🔄 *(ongoing track)*

This is the open-ended breadth phase — it grows continuously toward full SQLite
coverage rather than having a single "done".

- **Landed so far:**
  - multi-table **`INNER`/`LEFT`/cross/comma joins** (nested-loop, with `ON`,
    aliases, aggregation over joins) — `tests/joins.rs`;
  - **freelist reclamation**`free_page`/`allocate_page` reuse pages;
    overflow-row `DELETE` returns pages to the freelist (`tests/write_compat.rs`);
  - **`CREATE INDEX` + index maintenance** — index b-trees (record keys, splits,
    overflow); incremental insert on `INSERT`, in-place rebuild on
    `DELETE`/`UPDATE`; verified by sqlite3 `integrity_check``tests/indexes.rs`;
  - **index-driven query planning** — single-table `WHERE col = const` (and
    composite leftmost prefixes) use an index prefix-seek + rowid fetch instead
    of a full scan; the differential suite runs with indexes present so the path
    is exercised across the whole corpus (issue #4);
  - **`DROP TABLE`/`DROP INDEX`** (frees b-trees, removes schema rows, drops a
    table's dependent indexes);
  - **`ALTER TABLE`** `ADD COLUMN` (with `DEFAULT` applied to existing rows on
    read), `RENAME TO` (repointing dependent indexes), and `RENAME COLUMN`
    (updating dependent index definitions) — `tests/alter.rs`;
  - a **differential test harness** (`tests/differential.rs`) that runs 1500+
    generated `SELECT`s through both graphitesql and the real `sqlite3` and
    asserts identical output — currently 1640/1640 match;
  - **`CREATE VIEW` / `DROP VIEW`** and querying views (the view's `SELECT` runs
    as the source) — `tests/views.rs`;
  - **constraint enforcement**`NOT NULL`, `CHECK` (column + table), and
    `UNIQUE`/`PRIMARY KEY` on `INSERT`/`UPDATE`, with `INSERT OR IGNORE`/
    `OR REPLACE` (and `REPLACE INTO`);
  - **subqueries** — scalar `(SELECT …)` and `expr [NOT] IN (SELECT …)`
    (uncorrelated) — `tests/subquery.rs`;
  - non-recursive **`WITH` / CTEs** used as a query source — `tests/cte.rs`;
  - **compound queries** `UNION` / `UNION ALL` / `INTERSECT` / `EXCEPT` with an
    overall `ORDER BY`/`LIMIT`;
  - **type affinity** (comparison + storage) and `substr()` window semantics;
  - the full **`CREATE TABLE` constraint grammar** parses (`CHECK`,
    `REFERENCES`, `FOREIGN KEY`, named `CONSTRAINT`, conflict clauses) so
    real-world schemas load — `tests/realworld.rs`;
  - `VACUUM` accepted; more scalar functions (`concat`, `sign`, `zeroblob`,
    `quote`, `unhex`, …);
  - an AST→SQL printer (`sql::print`) used to regenerate stored `CREATE` text;
  - queryable **`PRAGMA`s** (`table_info`, `page_size`, …) and the `graphitesql`
    **CLI shell**.
  - **date/time functions** (`date`, `time`, `datetime`, `julianday`,
    `unixepoch`, `strftime`) and **`printf`/`format`** — a dependency-free port
    of `date.c`'s Julian-day engine, verified differentially (`tests/datetime.rs`);
  - **`EXPLAIN QUERY PLAN`** (SCAN/SEARCH plan rows in SQLite's detail format)
    plus a **rowid equality fast-path** that seeks the table b-tree directly
    (`tests/explain.rs`);
  - **recursive `WITH RECURSIVE` CTEs** (anchor + fixed-point recursive term,
    `UNION`/`UNION ALL`), CTEs referencing earlier CTEs, and CTEs as a join
    source, via a materialized CTE environment (`tests/cte.rs`);
  - **correlated subqueries** (scalar, `IN (SELECT …)`) and `[NOT] EXISTS` — the
    subquery resolves outer columns through an outer-scope frame stack
    (`tests/subquery.rs`);
  - **window functions**`row_number`/`rank`/`dense_rank`/`ntile`,
    `lag`/`lead`/`first_value`/`last_value`/`nth_value`, and aggregate windows
    over `PARTITION BY`/`ORDER BY` with the default frame (`tests/window.rs`);
  - **`%!.15g` real formatting** matching SQLite's double-to-text exactly;
  - **foreign-key enforcement** behind `PRAGMA foreign_keys` — child-side
    existence checks plus parent-side referential actions (NO ACTION/RESTRICT,
    CASCADE, SET NULL, SET DEFAULT) on DELETE/UPDATE (`tests/foreign_keys.rs`);
  - **row triggers**`CREATE TRIGGER`/`DROP TRIGGER` for
    `BEFORE`/`AFTER` `INSERT`/`UPDATE`/`DELETE`, with `OLD`/`NEW` references and a
    `WHEN` guard, fired non-recursively (`tests/triggers.rs`);
  - **views & CTEs as join sources** — either side of a `JOIN` (`tests/views.rs`);
  - **derived tables**`FROM (SELECT …) [AS] alias` as a sole source or join
    operand (`tests/subquery.rs`);
  - **explicit window frames**`ROWS`/`RANGE`/`GROUPS BETWEEN … AND …` with the
    full bound set (`tests/window.rs`);
  - **`PRAGMA recursive_triggers`** — recursive trigger firing (bounded), off by
    default (`tests/triggers.rs`);
  - **automatic indexes for `UNIQUE`/non-rowid `PRIMARY KEY`** — the implicit
    `sqlite_autoindex_*` b-trees are now written and maintained, so SQLite
    accepts the files (`tests/indexes.rs`);
  - **`WITHOUT ROWID` tables** — PK-clustered index-b-tree storage (records
    held PK-first), CRUD + composite PKs + `INSERT OR REPLACE`; files pass real
    `sqlite3 integrity_check` (`tests/without_rowid.rs`). *(Subset: the PRIMARY
    KEY is the only key; additional `UNIQUE` constraints on a WITHOUT ROWID
    table aren't supported yet, but explicit secondary `CREATE INDEX` is.)*;
  - **`INSTEAD OF` triggers / writable views** — DML on a view fires its
    `INSTEAD OF` triggers with `OLD`/`NEW` (`tests/triggers.rs`);
  - **secondary indexes on `WITHOUT ROWID` tables**`CREATE INDEX` builds a
    b-tree keyed by (indexed cols, PK cols), maintained across DML
    (`tests/without_rowid.rs`);
  - **WAL write path**`PRAGMA journal_mode=WAL`, frame-appending commits,
    `wal_checkpoint`, and reopen-from-`-wal` (`tests/wal_write.rs`; see Phase 8);
  - **real `VACUUM` compaction** — rebuilds into a fresh, gap-free image,
    shrinking the file and preserving content (`tests/vacuum.rs`); b-tree cursors
    now also tolerate empty pages left by heavy deletes;
  - **`UNIQUE` constraints on `WITHOUT ROWID` tables** — the implied
    `sqlite_autoindex` (unique cols + PK cols, numbered by declaration position)
    is written, maintained, and enforced (`tests/without_rowid.rs`).
- **Deliverable (remaining):** eager b-tree page *merging* on delete — a pure
  space-reclamation optimization. `VACUUM` already reclaims the space, cursors
  traverse the empty pages correctly meanwhile, and SQLite itself only merges
  opportunistically, so a non-merging delete is valid SQLite-compatible behavior.
  This is the last open item, and it affects neither correctness nor file-format
  compatibility.
  (we enforce by scan, not via an index b-tree yet); plain `EXPLAIN` (VDBE
  bytecode); full type-affinity & collation edge cases; WAL *write* path; b-tree
  page merging.
- **Goal:** pass a curated slice of SQLite's own test assertions and a large
  differential corpus.
- **Reference:** `fkey.c`, `trigger.c`, `window.c`, `date.c`, `pragma.c`,
  `vacuum.c`, `where.c`, plus SQLite's `test/` TCL suite as an oracle.

### Phase 10 — Ecosystem & extensions *(post-1.0, behind features)*

- C-API shim crate (`libsqlite3`-compatible surface), virtual tables, FTS5,
  R-Tree, `sqlite3_session`, user-defined functions from Rust, an async VFS for
  wasm. Each is opt-in and out of the core compatibility promise.

---

## 4. File-format compatibility strategy

This is the project's whole reason to exist, so it gets first-class testing.

- **Golden-file tests.** Check small, deterministic databases produced by a
  pinned `sqlite3` into the test suite (as byte arrays or fixtures) and assert
  graphitesql parses them exactly. Phase 0 already does this for the header.
- **Round-trip tests.** Parse → re-serialize → assert identical bytes for every
  format structure (header done; pages, cells, records, journal, WAL to come).
- **Differential tests.** For each phase from 5 on, run the same SQL through
  both `sqlite3` (the C library) and graphitesql and diff the results, and diff
  the resulting database files where determinism allows.
- **`integrity_check` as a gate.** Any database graphitesql writes must pass
  `sqlite3`'s `PRAGMA integrity_check` and `PRAGMA foreign_key_check`.
- **Fuzzing.** Fuzz the readers with malformed pages (must return
  `Error::Corrupt`, never panic) and fuzz SQL parsing.
- **Crash-recovery tests.** A VFS that can inject failures/truncation at chosen
  fsync points, asserting recovery to a consistent state.

### Known sources of legitimate file divergence

Two SQLite-compatible writers can produce different bytes for the same logical
content. We document and accept these rather than chase them:

- free-page reuse order and exact balancing splits,
- `change_counter` / `version_valid_for` values,
- the embedded `SQLITE_VERSION_NUMBER` of the writer,
- unused/reserved bytes left from deletions.

Compatibility means *both engines can read each other's files and agree on
contents*, not that bytes are identical for independently-built databases.

---

## 5. Open decisions

These are deliberately unresolved; we'll settle them with input rather than by
default. Tracked here so they don't get lost.

1. ~~**License.**~~ *Resolved:* **public domain**, mirroring SQLite, with the
   SQLite blessing in place of a legal notice (SPDX `blessing`). See `LICENSE`.
2. **Parser: hand-written vs. generated.** Plan is a hand-written
   recursive-descent parser (no build-time codegen, friendlier errors). The
   alternative is porting the Lemon grammar. → *Leaning hand-written.*
3. **Concurrency model.** Single-connection first. Whether to support multiple
   connections / threads sharing a pager (and how locking maps onto the `Vfs`
   trait) is a Phase 6+ question. → *Deferred.*
4. **Big integers / decimal.** SQLite stores reals as f64; we match that. No
   extended numeric types planned.
5. **MSRV policy.** Pinned at 1.95 today; revisit before 1.0.

---

## 6. Immediate next steps (Phase 1 kickoff)

1. Define `Vfs` and `File` traits in `src/vfs/mod.rs` (open flags, read-at,
   write-at, truncate, sync, file size, lock/unlock).
2. Implement `src/vfs/memory.rs` (a `Vec<u8>`-backed file) — works on wasm and
   `no_std`.
3. Implement `src/vfs/std_file.rs` behind `feature = "std"`.
4. Implement a read-only `Pager` that, given a `File`, validates the header and
   hands out page byte-slices, accounting for page 1's 100-byte header offset.
5. Add the first end-to-end test: open `gtest.db` through the std VFS + pager and
   reconstruct the header — the same `DatabaseHeader` Phase 0 parses directly.