//! redb 2.x → 4.x corpus migration that PRESERVES data (no re-embedding).
//!
//! Why: the redb 2.6 → 4.x upgrade (#702 / #707) changed the on-disk file
//! format. redb 4.x cannot open a 2.x `index.redb` — the open returns
//! `DatabaseError::UpgradeRequired(_)`. The auto-recovery path
//! ([`crate::core::corpus_recovery`]) handles that today by moving the stale
//! file aside to `*.v2-incompatible` and creating a fresh EMPTY corpus, which
//! forces a full reindex. On a large corpus, that reindex is expensive precisely
//! because it RE-EMBEDS every chunk (an ONNX forward pass per chunk). The
//! chunk text, entity lists, knowledge-graph adjacency, file hashes and schema
//! version are all already in the old file — only the *container format*
//! changed, not the row payloads. This module copies every row out of the 2.x
//! file and into a new 4.x file verbatim, so an upgrade preserves the index and
//! skips re-embedding entirely.
//!
//! What: [`migrate_redb_corpus`] opens the source with the redb **2.6** engine
//! (aliased `redb2` in `Cargo.toml`), iterates every known table, and writes
//! each row into a staging redb **4.x** database. It preserves the stored
//! `_meta` `schema_version` byte-for-byte so the normal in-app migration chain
//! (M001…M00x) still runs afterwards against the correct starting version. The
//! original file is backed up (numbered, non-clobbering, via the existing
//! [`crate::core::corpus_recovery`] backup convention) before the verified
//! staging file is atomically renamed into place. The source is never destroyed
//! until the new file is fully written and row-count-verified.
//!
//! Test: `tests` builds a small redb 2.6 fixture with chunks / entities / KG /
//! `_meta` rows, migrates it, and asserts the resulting 4.x corpus opens via
//! [`crate::core::corpus::CorpusStore`] and contains the same rows with the
//! same schema version. A `#[ignore]`-gated test points at a real
//! `*.v2-incompatible` file on the developer's machine.
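A minimal sketch of the verbatim row copy. It assumes in-memory `BTreeMap`s as stand-ins for redb tables; the real module reads through the redb 2.6 engine and writes through redb 4.x, and `Table`, `Db` and `copy_all_tables` are hypothetical names. Because keys and values move as opaque bytes and are never decoded, the `_meta` `schema_version` value lands unchanged, which is what lets the M001…M00x chain start from the right version.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one redb table: raw key bytes -> raw value bytes.
pub type Table = BTreeMap<Vec<u8>, Vec<u8>>;

/// Hypothetical stand-in for a database: table name -> table rows.
pub type Db = BTreeMap<String, Table>;

/// Copy every catalogued table from `src` into `dst` without decoding any
/// key or value, and return the number of rows copied per table.
pub fn copy_all_tables(src: &Db, dst: &mut Db, catalogue: &[&str]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for &name in catalogue {
        // A table the old file never created is copied as an empty table.
        let rows = src.get(name).cloned().unwrap_or_default();
        counts.insert(name.to_string(), rows.len());
        dst.insert(name.to_string(), rows);
    }
    counts
}
```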
use ;
use ;
use crate;
// ── Outcome ─────────────────────────────────────────────────────────────────
/// Result of a migration attempt.
///
/// Why: the caller (CLI handler) needs to distinguish "nothing to do, already
/// 4.x" from "migrated N rows" to print the right operator message and choose
/// an exit status, without re-opening the file itself.
/// What: `AlreadyV4` when the source already opens with redb 4.x (no-op);
/// `Migrated` carrying per-table row counts, the backup path, and the total
/// rows copied.
/// Test: the round-trip test asserts `Migrated` with the expected row total;
/// the idempotency test asserts a second run returns `AlreadyV4`.
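The enum itself is not shown here. A sketch of the shape the doc comment describes might look like the following; the field names (`per_table`, `backup_path`, `total_rows`), the `u64` counts, and the `summary` helper are assumptions for illustration, not the module's actual definition.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical shape of the migration result described above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MigrationOutcome {
    /// The file already opens with redb 4.x; nothing was changed.
    AlreadyV4,
    /// Rows were copied into a fresh 4.x file.
    Migrated {
        /// Row count per table name (hypothetical field).
        per_table: BTreeMap<String, u64>,
        /// Where the original 2.x bytes now live (hypothetical field).
        backup_path: PathBuf,
        /// Sum of `per_table` (hypothetical field).
        total_rows: u64,
    },
}

impl MigrationOutcome {
    /// Hypothetical operator-facing line a CLI handler could print.
    pub fn summary(&self) -> String {
        match self {
            MigrationOutcome::AlreadyV4 => "corpus is already redb 4.x; nothing to do".to_string(),
            MigrationOutcome::Migrated { per_table, backup_path, total_rows } => format!(
                "migrated {total_rows} rows across {} tables; original kept at {}",
                per_table.len(),
                backup_path.display()
            ),
        }
    }
}
```

Carrying the backup path in the outcome means the CLI can tell the operator where the original file went without re-deriving the numbered-backup name.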
// ── Public entry point ──────────────────────────────────────────────────────
/// Migrate a redb 2.x corpus at `dest` (or its `*.v2-incompatible` backup)
/// into a redb 4.x corpus at `dest`, preserving every row.
///
/// Why: see module docs — lets an upgrade preserve the index instead of
/// recreating it empty and re-embedding every chunk.
/// What: resolves the actual 2.x source (the live path, else the
/// `*.v2-incompatible` sibling the auto-recovery may have moved aside). If the
/// destination already opens with redb 4.x, returns [`MigrationOutcome::AlreadyV4`]
/// (idempotent no-op). Otherwise opens the source read-only with redb 2.6,
/// copies all catalogued tables into a staging 4.x file, verifies per-table row
/// counts match, backs up the original via the
/// [`crate::core::corpus_recovery`] numbered-backup convention, and atomically
/// renames the staging file into `dest`.
/// Test: `tests::round_trip_v2_to_v4` and `tests::idempotent_on_v4`.
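The "verify per-table row counts match" step is the gate in front of the backup and rename. A sketch of that gate, assuming counts are collected into `BTreeMap<String, u64>` for both the source and the staging file (`count_mismatches` and `verify_before_rename` are hypothetical names):

```rust
use std::collections::BTreeMap;

/// Every table whose staged row count differs from the source count, as
/// (table, expected, found). A table missing from staging counts as 0 rows.
pub fn count_mismatches(
    source: &BTreeMap<String, u64>,
    staged: &BTreeMap<String, u64>,
) -> Vec<(String, u64, u64)> {
    let mut bad = Vec::new();
    for (table, &want) in source {
        let got = staged.get(table).copied().unwrap_or(0);
        if got != want {
            bad.push((table.clone(), want, got));
        }
    }
    bad
}

/// Only an exact match lets the caller go on to back up and rename.
pub fn verify_before_rename(
    source: &BTreeMap<String, u64>,
    staged: &BTreeMap<String, u64>,
) -> Result<(), String> {
    let bad = count_mismatches(source, staged);
    if bad.is_empty() {
        return Ok(());
    }
    let detail: Vec<String> = bad
        .iter()
        .map(|(t, want, got)| format!("{t}: {want} -> {got}"))
        .collect();
    Err(format!("row-count mismatch, keeping original: {}", detail.join(", ")))
}
```

Reporting every mismatching table, rather than stopping at the first, gives the operator the whole picture in one message.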
// ── Detection / source resolution ───────────────────────────────────────────
/// Report whether the file at `path` opens cleanly with the redb 4.x engine.
///
/// Why: the detection step must not treat a healthy 4.x corpus as a migration
/// candidate. A successful 4.x open is the definitive "already migrated" signal.
/// What: attempts `redb::Database::open(path)` and returns whether it succeeded.
/// Any open error (including `UpgradeRequired` for a 2.x file) returns `false`.
/// Test: `tests::idempotent_on_v4` relies on this returning `true` for a 4.x DB.
/// Resolve the actual 2.x source file for a destination path.
///
/// Why: the auto-recovery path may already have moved the stale 2.x file aside
/// to `index.redb.v2-incompatible` and created a fresh empty 4.x file at
/// `index.redb`. In that case the rows to preserve live in the backup sibling,
/// not at the canonical path. We must find whichever file actually holds the
/// 2.x data.
/// What: prefers `dest` if it exists and is genuinely a 2.x file (the redb 4.x
/// open fails with an old-format error); otherwise falls back to the first existing
/// `*.v2-incompatible[.N]` sibling that is a 2.x file. Errors if neither
/// candidate is a readable 2.x corpus.
/// Test: `tests::round_trip_v2_to_v4` (source == dest) and
/// `tests::round_trip_from_incompatible_sibling` (source == sibling).
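The resolution order can be sketched with the classifier injected as a closure, so the example does not depend on redb. This is an illustration under assumptions: the `resolve_source` signature, the `siblings` slice, and the `String` error type are hypothetical, and in the module `is_v2` would be the 4.x-engine classifier documented below.

```rust
use std::path::{Path, PathBuf};

/// Pick the file that actually holds the 2.x rows (hypothetical signature).
pub fn resolve_source(
    dest: &Path,
    siblings: &[PathBuf],
    is_v2: impl Fn(&Path) -> bool,
) -> Result<PathBuf, String> {
    // 1. The live path wins if it is itself the old-format file.
    if dest.exists() && is_v2(dest) {
        return Ok(dest.to_path_buf());
    }
    // 2. Otherwise take the first existing sibling that classifies as 2.x.
    for candidate in siblings {
        if candidate.exists() && is_v2(candidate) {
            return Ok(candidate.clone());
        }
    }
    Err(format!("no readable redb 2.x corpus found for {}", dest.display()))
}
```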
/// Report whether `path` is an old (pre-4.x) redb corpus the redb 2.x engine
/// should read.
///
/// Why: this is the load-bearing classifier, and it must NOT probe with
/// `redb2::Database::open` directly — redb 2.6 panics with an internal
/// `unreachable!()` when it tries to parse a redb 4.x file's region layout, so
/// using it as a probe on an arbitrary file would crash the process. Instead we
/// classify with the redb **4.x** engine, which returns clean `DatabaseError`s:
/// a 4.x file opens, and an older 2.x file fails with `UpgradeRequired` (or a
/// related incompatible-format error). Only a positive "incompatible old
/// format" classification means redb2 can safely read it.
/// What: opens `path` with redb 4.x. Returns `true` only when the open fails
/// with an incompatible/old-format error (reusing
/// [`crate::core::corpus_recovery::is_incompatible_corpus_format`]). A
/// successful 4.x open (already migrated), a missing file, or any transient
/// error returns `false`.
/// Test: covered by `resolve_source` round-trip tests and `idempotent_on_v4`
/// (which must NOT classify a 4.x file as 2.x and must not panic).
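The classification rule reduces to a match on the outcome of a redb 4.x open. Since redb is not available here, the sketch uses a hypothetical `OpenError` enum in place of redb's `DatabaseError`; the real code delegates to `is_incompatible_corpus_format`, which may accept more than one old-format error.

```rust
/// Hypothetical stand-in for the errors a redb 4.x open can report.
#[derive(Debug)]
pub enum OpenError {
    /// Older on-disk format (what a 2.x corpus produces).
    UpgradeRequired,
    /// The file does not exist.
    NotFound,
    /// Lock contention, permissions, or other I/O trouble.
    Transient(String),
}

/// Classify by the result of a 4.x open, never by a 2.x open (which can
/// panic on a 4.x file). Only a positive old-format signal returns `true`.
pub fn is_v2_corpus(open_with_v4: Result<(), OpenError>) -> bool {
    match open_with_v4 {
        Ok(()) => false,                         // already 4.x: not a candidate
        Err(OpenError::UpgradeRequired) => true, // old format: safe for redb2
        Err(OpenError::NotFound) => false,       // nothing to migrate
        Err(OpenError::Transient(_)) => false,   // unknown state: do not guess
    }
}
```

Treating transient errors as "not 2.x" is the conservative choice: a locked or unreadable file is left alone rather than handed to an engine that might panic on it.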
/// Enumerate the candidate `*.v2-incompatible[.N]` sibling paths for `dest`.
///
/// Why: the auto-recovery backup convention appends `.v2-incompatible` and then
/// `.1`, `.2`, … on repeated failures (see `corpus_recovery`). Source
/// resolution must consider all of them.
/// What: yields `<dest>.v2-incompatible` followed by `<dest>.v2-incompatible.1`
/// … up to a small bound (`64`), which far exceeds any realistic number of
/// failed boots.
/// Test: covered indirectly by `resolve_source`'s sibling round-trip test.
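A sketch of the sibling enumeration. The function name is hypothetical, and whether the bound of 64 is inclusive (`.1` through `.64`) is an assumption. Suffixes are appended to the raw `OsString` so that non-UTF-8 paths survive.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Assumed inclusive bound on numbered backups.
const MAX_SIBLINGS: usize = 64;

/// Yield `<dest>.v2-incompatible`, then `<dest>.v2-incompatible.1` … `.64`.
pub fn incompatible_siblings(dest: &Path) -> impl Iterator<Item = PathBuf> + '_ {
    (0..=MAX_SIBLINGS).map(move |n| {
        let mut name: OsString = dest.as_os_str().to_owned();
        name.push(".v2-incompatible");
        if n > 0 {
            name.push(format!(".{n}"));
        }
        PathBuf::from(name)
    })
}
```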
// ── Backup / staging helpers ────────────────────────────────────────────────
/// Suffix for the staging file the new 4.x corpus is built in before the
/// atomic rename into place.
///
/// Why: building the new corpus at a sibling temp path and renaming atomically
/// guarantees the canonical `index.redb` is only ever replaced by a fully
/// written, row-verified file — a crash mid-migration leaves the original
/// untouched and the half-written staging file is discarded on the next run.
/// What: the literal `".v4-migrating"` appended to the destination path.
/// Test: covered by `migrate_redb_corpus`'s round-trip test (the staging file
/// is renamed away on success and must not linger).
const STAGING_SUFFIX: &str = ".v4-migrating";
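The staging discipline described above can be sketched with plain `std::fs`. Here, `build_then_swap` and its `build` callback are hypothetical: `build` stands in for "copy all tables into a 4.x file". In the module, the backup step frees `dest` before the rename. The rename is atomic on POSIX filesystems when both paths are on the same filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Build the new file at `staging`, flush it, then move it over `dest`.
pub fn build_then_swap(
    staging: &Path,
    dest: &Path,
    build: impl FnOnce(&mut File) -> io::Result<()>,
) -> io::Result<()> {
    // A leftover from a crashed run is never trusted: start from scratch.
    match fs::remove_file(staging) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    let mut f = File::create(staging)?;
    build(&mut f)?;
    // Make the bytes durable before the rename makes them visible at `dest`.
    f.sync_all()?;
    drop(f);
    fs::rename(staging, dest)
}
```

If `build` or `sync_all` fails, `dest` is never touched. Only the staging file is left behind, and the next run discards it.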
/// Compute the staging file path for a destination corpus.
///
/// Why: a single deterministic staging path keeps the migration's temp file
/// next to the destination (same filesystem → atomic rename) and easy to find
/// if a run aborts.
/// What: appends [`STAGING_SUFFIX`] to `dest`.
/// Test: covered by the round-trip test (the staging file must not survive a
/// successful run).
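A sketch of the path computation, assuming the function is named `staging_path` as the doc comment implies. It extends the full `OsString` rather than calling `Path::with_extension`, which would replace `.redb` instead of appending to it. The result keeps the same parent directory, so the later rename stays on one filesystem.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

const STAGING_SUFFIX: &str = ".v4-migrating";

/// Append the staging suffix to the whole destination path.
pub fn staging_path(dest: &Path) -> PathBuf {
    let mut s: OsString = dest.as_os_str().to_owned();
    s.push(STAGING_SUFFIX);
    PathBuf::from(s)
}
```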
/// Ensure the original 2.x bytes are preserved, freeing `dest` for the rename.
///
/// Why: we must never lose the source data, and `std::fs::rename(staging,
/// dest)` requires `dest` to be replaceable. Two cases. (a) `source == dest`
/// (the live path is the 2.x file): move it aside to a numbered
/// `*.v2-incompatible` backup so `dest` is freed. (b) `source` is already a
/// `*.v2-incompatible` sibling (auto-recovery already moved it): the sibling IS
/// the backup; just remove the empty 4.x file the recovery created at `dest` so
/// the rename can land.
/// What: returns the path where the original bytes now live.
/// Test: `tests::round_trip_v2_to_v4` (case a) and
/// `tests::round_trip_from_incompatible_sibling` (case b).
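A sketch of both cases, with hypothetical names (`next_free_backup`, `preserve_original`). The numbered-backup naming mirrors the `*.v2-incompatible[.N]` convention above, but the exact search order in `corpus_recovery` is an assumption. The exists-then-rename check is not race-free, which is acceptable for a single-process CLI run.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// First `<dest>.v2-incompatible[.N]` that does not exist yet, so an older
/// backup is never overwritten.
fn next_free_backup(dest: &Path) -> io::Result<PathBuf> {
    for n in 0..=64 {
        let mut s: OsString = dest.as_os_str().to_owned();
        s.push(".v2-incompatible");
        if n > 0 {
            s.push(format!(".{n}"));
        }
        let candidate = PathBuf::from(s);
        if !candidate.exists() {
            return Ok(candidate);
        }
    }
    Err(io::Error::new(io::ErrorKind::AlreadyExists, "no free backup slot"))
}

/// Keep the 2.x bytes safe and free `dest`; return where the bytes now live.
pub fn preserve_original(source: &Path, dest: &Path) -> io::Result<PathBuf> {
    if source == dest {
        // Case (a): the live file is the 2.x corpus; move it aside.
        let backup = next_free_backup(dest)?;
        fs::rename(dest, &backup)?;
        Ok(backup)
    } else {
        // Case (b): the sibling already is the backup; drop the empty 4.x
        // file that auto-recovery created at `dest`, if it is still there.
        match fs::remove_file(dest) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
        Ok(source.to_path_buf())
    }
}
```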
/// Remove a file if it exists; a missing file is not an error.
///
/// Why: staging-file cleanup and the empty-recovery-file removal both want
/// idempotent "delete if present" semantics so re-runs are safe.
/// What: deletes `path`, swallowing `NotFound`, surfacing other I/O errors.
/// Test: exercised by the idempotency and round-trip tests.
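A sketch of the "delete if present" helper. The name `remove_if_exists` and the `bool` return value, which reports whether anything was removed, are assumptions. Matching on `ErrorKind::NotFound` after the call avoids the check-then-act race that an `exists()` pre-check would introduce.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Delete `path`; a file that is already gone counts as success.
/// Returns `true` if a file was actually removed.
pub fn remove_if_exists(path: &Path) -> io::Result<bool> {
    match fs::remove_file(path) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```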