rivet-cli 0.9.3

Rivet: PostgreSQL/MySQL/SQL Server → Parquet/CSV (local, S3, GCS, Azure). Crate name rivet-cli; binary rivet.
Documentation
# Rivet Consolidated Pain-Driven Roadmap

This document is the **single source of truth** for Rivet planning. It consolidates:

- the current product direction and validated user pains
- the **numbered** strategic epics (§5, P0–P3)
- **execution status**: lettered epics (A–O, M), phase completion, task ✅/⏳, and Definition of Done (§9)
- packaging, security, trust, release, and external-adoption work that used to live in a separate roadmap

The purpose is to keep Rivet focused on recurring problems in database-first extraction while tracking what is shipped vs open.

---

# 0. Why Rivet exists

Most data teams that extract from PostgreSQL or MySQL end up writing a script. It starts simple — a `COPY` or `SELECT *` piped to a file — and then it grows:

- someone adds retry logic after a production incident
- someone adds batching because the table hit 50M rows
- someone adds a cursor column so the script does not re-read everything
- someone adds a memory guard after OOM on a wide table
- someone adds schema tracking after a column rename broke the warehouse

Within a year the script is 2,000 lines, understood by one person, and quietly critical to the business.

**Rivet replaces that script.**

It is a single binary that extracts data from relational databases into Parquet or CSV files — locally, to S3, or to GCS — with the safety mechanisms that production workloads eventually demand: preflight checks, tuning profiles, retry with error classification, incremental cursors, chunked parallelism, validation, reconciliation, and a crash-recovery model.

Rivet is deliberately **not** a CDC tool, not an ELT platform, and not a SaaS connector marketplace. It solves one problem well: **getting data out of a fragile database safely, predictably, and repeatedly.**

---

# 1. Product focus

Rivet should remain focused on:

**a source-aware, self-hosted extraction engine for fragile database-first systems**

Rivet is for teams that need to extract data from operational databases into analytics-friendly files/storage **safely, predictably, and with clear operational visibility**.

## Rivet is not currently optimizing to become:
- a Kafka replacement
- a CDC platform first
- an all-in-one ELT platform
- a SaaS connector marketplace
- a universal warehouse merge/load orchestrator
- a distributed execution platform

---

# 2. Validated pain themes

## Pain A — Fragile source databases
Repeated user pain:
- bad extraction queries can hurt production or replicas
- missing indexes make extraction dangerous
- parallelism can overload weak sources
- DB connection limits are often lower than requested parallelism
- billion-row tables require more careful strategies than generic JDBC partitioning

## Pain B — Weak incremental foundations
Repeated user pain:
- clean delta keys are often missing
- teams re-read large windows to avoid data loss
- late-arriving data forces replay/overlap strategies
- date-based slicing is often needed, but tools assume timestamp/unix-style partitioning
- sparse keys and weak partition columns make chunking unreliable or expensive

## Pain C — Poor visibility into change
Repeated user pain:
- schema drift is noticed too late
- widening text/string/json payloads break downstream assumptions without schema changes
- uniqueness/cardinality assumptions change and break downstream merge logic
- users want stronger visibility into what changed and when

## Pain D — Operational unpredictability
Repeated user pain:
- users do not know if extraction is safe until they run it
- users do not know if source-side migration/index changes are required
- users do not know what happened after interrupted runs
- users do not know what was exported, retried, or partially published

## Pain E — Self-built pipelines become hard to maintain
Repeated user pain:
- custom scripts grow into fragile systems
- behavior is understood by only one or two people
- guarantees are unclear
- retries, memory pressure, and late-data logic become messy over time

## Pain F — Connectivity and deployment friction
Repeated user pain:
- SSH / jump host / bastion access is common
- driver/URL handling is brittle and frustrating
- local SQLite state is not enough for stateless containerized deployments
- install/tryout friction matters for adoption

## Pain G — Event-driven alternatives exist, but are not the default reality
Observed market boundary:
- Kafka/CDC ecosystems solve a different class of problems
- many teams still live in a database-first world
- snapshot, replay, and backfill remain critical even where streaming exists

---

# 3. Roadmap principles

1. **Trust first, expansion second**  
   Before adding big new surface area, Rivet should become more trustworthy and explainable.

2. **Safety before throughput**  
   Parallelism and performance matter, but not before source safety and predictable behavior.

3. **One problem layer at a time**  
   Extraction trust layer first, then extraction intelligence, then adoption polish, then expansion.

4. **Solve recurring pains, not every theoretical use case**  
   New work should map back to validated user pain.

---

# 4. Priority tiers

Priority legend:
- **P0** = immediate / next milestone
- **P1** = strong next layer after P0
- **P2** = adoption/polish after core trust is solid
- **P3** = later expansion after the product is trusted

---

# 5. Consolidated roadmap

## P0 — Trust, Safety, and Explainability

These are the most validated and strategically important problems.

---

## Epic 1 — Preflight Planner & Source Safety
**Priority:** P0  
**Status:** ✅ DONE — lettered epic B; `rivet check` with strategy output, verdicts, sparse/dense warnings, connection limit warnings, profile recommendations  
**Pain coverage:** Pain A, Pain B, Pain D

### Goal
Turn Rivet into a tool that helps users understand whether extraction is safe before they run it.

### Deliverables
- stronger `rivet check`
- selected extraction strategy output
- tuning profile recommendation
- sparse range warnings
- connection pressure / parallelism warnings
- migration/index-required guidance
- degraded vs unsafe distinction
- initial recommended parallelism output

### Why this matters
Users repeatedly describe not wanting to find out too late that:
- a table needs an index
- a query will seq scan
- a chosen parallelism level is unrealistic
- source limits make the chosen settings invalid

---

## Epic 2 — Auditability, Manifest & Reconciliation
**Priority:** P0  
**Status:** ✅ DONE — lettered epics D, F; run summary, file manifest, `rivet metrics`, `rivet state files`, `--reconcile`  
**Pain coverage:** Pain C, Pain D, Pain E

### Goal
Make every export inspectable, attributable, and trustworthy.

### Deliverables
- run summary
- file manifest
- per-file row counts
- rows read / written / validated accounting
- reconciliation summary
- optional source-vs-output count checks for bounded modes
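
As a rough illustration of the source-vs-output count check for bounded modes, the sketch below sums per-file row counts from a manifest and compares the total with a count taken at the source. The `FileEntry` and `Reconciliation` types are hypothetical and are not Rivet's actual manifest schema.

```rust
/// One manifest entry: an output file and the rows written to it.
/// Hypothetical shape for illustration only.
#[derive(Debug, Clone)]
pub struct FileEntry {
    pub path: String,
    pub rows: u64,
}

/// Outcome of comparing a source-side count with the manifest total.
#[derive(Debug, PartialEq, Eq)]
pub enum Reconciliation {
    Match { rows: u64 },
    Mismatch { source_rows: u64, written_rows: u64 },
}

/// Sum per-file row counts and compare with the count observed at the source.
pub fn reconcile(source_rows: u64, manifest: &[FileEntry]) -> Reconciliation {
    let written_rows: u64 = manifest.iter().map(|f| f.rows).sum();
    if written_rows == source_rows {
        Reconciliation::Match { rows: written_rows }
    } else {
        Reconciliation::Mismatch { source_rows, written_rows }
    }
}
```

A check like this is only meaningful when the source range is bounded; on a table that keeps receiving writes, the two counts can legitimately differ.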

### Why this matters
After extract-to-bucket works, users immediately ask:
- what exactly was exported?
- which files belong to which run?
- how many rows were written?
- can I trust this result?

---

## Epic 3 — Recovery & Interrupted Run Semantics
**Priority:** P0  
**Status:** ✅ DONE — lettered epics C, H; crash matrix, lifecycle docs, E2E recovery paths, `tests/recovery.rs` (10 tests across 5 failure boundaries)  
**Pain coverage:** Pain D, Pain E

### Goal
Make restarts, partial runs, retries, and reruns predictable.

### Deliverables
- crash matrix
- chunk/file/run status model
- rerun semantics documentation
- failure injection tests
- recovery integration tests
- duplicate semantics documentation

### Why this matters
Without explicit recovery semantics, any interrupted run becomes:
- a debugging session
- a manual cleanup problem
- a trust issue

---

## Epic 4 — Durable State Backend
**Priority:** P0  
**Status:** ✅ DONE — `StateConn` enum supports SQLite and PostgreSQL backends; activated via `RIVET_STATE_URL=postgresql://...`; auto-migration on connect; `StateRef` for parallel chunk workers; documented in `docs/reference/cli.md`; `docker-compose.yaml` includes dedicated `postgres-state` service  
**Pain coverage:** Pain D, Pain F

### Goal
Support both local/dev workflows and durable production/container deployments.

### Deliverables
- state backend abstraction
- SQLite backend for local/dev
- PostgreSQL backend for durable/prod/k8s use
- init/migration command
- docs for deployment modes
- clear guidance: SQLite for local/dev, external backend for durable deployment
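
The backend choice described here can be sketched as a small selection function. This is a minimal sketch, not Rivet's `StateConn` implementation: the `StateBackend` enum, the `choose_backend` function, and the acceptance of the `postgres://` alias are assumptions made for illustration.

```rust
/// Which state store to use. Illustrative only; Rivet's own `StateConn`
/// enum may be shaped differently.
#[derive(Debug, PartialEq, Eq)]
pub enum StateBackend {
    Sqlite { path: String },
    Postgres { url: String },
}

/// Pick a backend from the value of `RIVET_STATE_URL`, if one is set.
/// `default_sqlite_path` is an assumed fallback location for local/dev use.
pub fn choose_backend(
    state_url: Option<&str>,
    default_sqlite_path: &str,
) -> Result<StateBackend, String> {
    match state_url {
        // No URL configured: stay on the local SQLite file.
        None => Ok(StateBackend::Sqlite {
            path: default_sqlite_path.to_string(),
        }),
        Some(url) if url.starts_with("postgresql://") || url.starts_with("postgres://") => {
            Ok(StateBackend::Postgres { url: url.to_string() })
        }
        // Do not echo the URL back: it may contain credentials.
        Some(_) => Err("unsupported state URL scheme (expected postgresql://)".to_string()),
    }
}
```

A caller might pass `std::env::var("RIVET_STATE_URL").ok().as_deref()` as the first argument and a local file such as `.rivet_state.db` as the fallback.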

### Why this matters
SQLite is good for local and single-node workflows, but not enough for stateless workers in durable environments.

---

## P1 — Better Extraction Control and Change Visibility

These strengthen Rivet's core differentiation once the trust layer is solid.

---

## Epic 5 — Real Batch / Fetch / Write Control
**Priority:** P1  
**Status:** ✅ DONE — lettered epic M; `batch_size`, `batch_size_memory_mb`, `max_file_size`, `throttle_ms`, streaming uploads, per-export tuning overrides  
**Pain coverage:** Pain A, Pain B, Pain E

### Goal
Give users real control over how extraction interacts with:
- source load
- memory
- file generation
- runtime behavior

### Deliverables
- separate controls for:
  - logical chunk size
  - DB fetch size
  - writer flush size
  - parquet row group size
  - file rotation threshold
- docs explaining each
- planner hints for dangerous combinations

### Why this matters
Users repeatedly complain that existing tooling exposes “batch size” in unclear or misleading ways.

---

## Epic 6 — Date / Timestamp / Range Partition Intelligence
**Priority:** P1  
**Status:** ✅ DONE — `chunked` mode with `chunk_by_days`, date-native `>= / <` semantics, sparse range warnings, planner awareness  
**Pain coverage:** Pain A, Pain B, Pain D

### Goal
Improve support for large-table extraction by making partitioning more explicit and more realistic.

### Deliverables
- first-class support for date-based slicing
- timestamp-based slicing
- numeric range slicing
- explicit boundary semantics
- planner awareness of date-vs-timestamp tradeoffs
- docs/examples for large-table partition strategies
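
To make the boundary semantics concrete, here is a minimal sketch of half-open date windows: each window includes its lower bound and excludes its upper bound, so adjacent windows never overlap and never leave a gap. Dates are represented as plain day numbers to keep the example dependency-free; the function names are hypothetical, and only the `chunk_by_days` idea and the `>= / <` convention come from the text above.

```rust
/// Split the half-open day range [start, end) into windows of at most
/// `chunk_by_days` days. Days are plain day numbers (for example, days since
/// some epoch) so the sketch needs no date library.
pub fn day_windows(start: i64, end: i64, chunk_by_days: i64) -> Vec<(i64, i64)> {
    assert!(chunk_by_days > 0, "chunk_by_days must be positive");
    let mut windows = Vec::new();
    let mut lo = start;
    while lo < end {
        // The last window is clamped so it never runs past `end`.
        let hi = (lo + chunk_by_days).min(end);
        windows.push((lo, hi));
        lo = hi;
    }
    windows
}

/// Render the predicate for one window: lower bound inclusive, upper bound
/// exclusive. Real code should bind the bounds as parameters instead of
/// formatting them into the SQL text.
pub fn window_predicate(column: &str, lo: &str, hi: &str) -> String {
    format!("{column} >= '{lo}' AND {column} < '{hi}'")
}
```

With an inclusive upper bound (as in `BETWEEN`), a row that falls exactly on a boundary would be read by two neighbouring chunks; the half-open form avoids that.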

### Why this matters
Users repeatedly hit partitioning pain on 100M+ tables and often need date-based rather than unix-timestamp semantics.

---

## Epic 7 — Schema Drift Visibility & Policy
**Priority:** P1  
**Status:** ✅ DONE — column add/remove/type-change tracking in run summary + `on_schema_drift: warn|continue|fail` policy hook in YAML config  
**Pain coverage:** Pain C, Pain E

### Goal
Make structural source changes visible early and operationally understandable.

### Deliverables
- stronger schema drift warnings
- run summary schema flags
- documented behavior for add/remove/type-change
- future policy hooks: warn / fail / continue
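
The add/remove/type-change tracking and the warn / fail / continue policy can be sketched as a diff over two column maps followed by a policy decision. The types and function names below are hypothetical, and the sketch assumes that only `fail` aborts the run while `warn` and `continue` both proceed (differing only in whether a warning is emitted); Rivet's actual behavior may differ.

```rust
use std::collections::BTreeMap;

/// One structural change between two runs.
#[derive(Debug, PartialEq, Eq)]
pub enum Drift {
    Added(String),
    Removed(String),
    TypeChanged { column: String, from: String, to: String },
}

/// Assumed mapping of the three policy values named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DriftPolicy {
    Warn,
    Continue,
    Fail,
}

/// Compare column-name to type-name maps from the previous and current run.
pub fn diff_schema(
    prev: &BTreeMap<String, String>,
    cur: &BTreeMap<String, String>,
) -> Vec<Drift> {
    let mut out = Vec::new();
    for (name, old_ty) in prev {
        match cur.get(name) {
            None => out.push(Drift::Removed(name.clone())),
            Some(new_ty) if new_ty != old_ty => out.push(Drift::TypeChanged {
                column: name.clone(),
                from: old_ty.clone(),
                to: new_ty.clone(),
            }),
            Some(_) => {}
        }
    }
    for name in cur.keys() {
        if !prev.contains_key(name) {
            out.push(Drift::Added(name.clone()));
        }
    }
    out
}

/// Apply the policy: in this sketch only `Fail` turns drift into an error.
pub fn apply_policy(policy: DriftPolicy, drift: &[Drift]) -> Result<(), String> {
    if policy == DriftPolicy::Fail && !drift.is_empty() {
        return Err(format!("schema drift detected: {} change(s)", drift.len()));
    }
    Ok(())
}
```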

### Why this matters
Users often discover schema drift only after downstream damage is already done.

---

## Epic 8 — Data Shape Drift Detection
**Priority:** P1  
**Status:** ✅ DONE — per-column max byte length tracked across runs; `shape_drift_warn_factor` YAML config; warns when growth exceeds threshold  
**Pain coverage:** Pain C

### Goal
Detect meaningful data-shape changes even when schema itself has not changed.

### Deliverables
- optional observed max-length tracking for text/string/json columns
- compare current run vs previous runs
- width-growth warnings
- docs for structural schema drift vs shape drift
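
A width-growth check of this kind can be sketched as a comparison of per-column maximum byte lengths between two runs. The function below is a hypothetical illustration: it assumes a warning fires when the current maximum exceeds the previous maximum multiplied by the configured factor, which is one plausible reading of `shape_drift_warn_factor`.

```rust
use std::collections::BTreeMap;

/// Return (column, previous_max, current_max) for every column whose observed
/// max byte length grew beyond `warn_factor` times the previous value.
/// Columns with no previous observation are skipped: there is no baseline yet.
pub fn shape_drift(
    prev: &BTreeMap<String, u64>,
    cur: &BTreeMap<String, u64>,
    warn_factor: f64,
) -> Vec<(String, u64, u64)> {
    let mut warnings = Vec::new();
    for (col, &now) in cur {
        if let Some(&before) = prev.get(col) {
            if before > 0 && (now as f64) > (before as f64) * warn_factor {
                warnings.push((col.clone(), before, now));
            }
        }
    }
    warnings
}
```

Note that the column types are unchanged in such a case, which is why a structural schema diff alone would not flag it.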

### Why this matters
Widening text/string payloads are a real downstream failure source and are not captured by structural schema checks alone.

---

## Epic 9 — Data Contract Checks
**Priority:** P1  
**Status:** ✅ DONE — `quality:` YAML block; row count bounds, null ratio thresholds, uniqueness assertions; chunked aggregate quality gate  
**Pain coverage:** Pain C, Pain E

### Goal
Expose optional checks for downstream assumptions such as uniqueness and cardinality.

### Deliverables
- optional uniqueness probe for declared keys
- duplicate warnings
- docs for cost and usage
- foundation for future row-count bounds / anomaly checks
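
The three kinds of check mentioned for this epic (row count bounds, null ratio thresholds, uniqueness) reduce to small, independent functions. The sketch below is illustrative only: the function names are hypothetical, keys are simplified to `i64`, and the exact in-memory uniqueness set shown here is the costly part that the "docs for cost and usage" deliverable refers to.

```rust
use std::collections::HashSet;

/// Fraction of NULL values in a column sample; 0.0 for an empty sample.
pub fn null_ratio(values: &[Option<i64>]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let nulls = values.iter().filter(|v| v.is_none()).count();
    nulls as f64 / values.len() as f64
}

/// Number of rows whose key was already seen earlier in the input.
/// Memory grows with the number of distinct keys.
pub fn duplicate_count(keys: &[i64]) -> usize {
    let mut seen = HashSet::new();
    keys.iter().filter(|k| !seen.insert(**k)).count()
}

/// True when `rows` lies within the optional inclusive bounds.
pub fn row_count_in_bounds(rows: u64, min: Option<u64>, max: Option<u64>) -> bool {
    min.map_or(true, |m| rows >= m) && max.map_or(true, |m| rows <= m)
}
```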

### Why this matters
A lot of downstream breakage comes not from extraction failure, but from drifting source assumptions.

---

## P2 — UX, Packaging, and Adoption Improvements

These matter for adoption and polish, but should not outrank trust/safety work.

---

## Epic 10 — Config & Connection UX Improvements
**Priority:** P2  
**Pain coverage:** Pain F

### Goal
Reduce friction around configuration and connectivity.

### Deliverables
- query vs table auto-detection
- better structured connection config
- explicit driver override
- clearer connection validation and errors
- better docs/examples for connection parameters

### Why this matters
These are real annoyances, but they are secondary to safety and trust.

---

## Epic 11 — Installation & Packaging
**Priority:** P2  
**Status:** ✅ DONE for current distribution baseline — GitHub release binaries, Docker GHCR image, Homebrew tap, crates.io package  
**Pain coverage:** Pain F

### Goal
Make trying and adopting Rivet easier.

### Deliverables
- ✅ GitHub release binaries
- ✅ Homebrew tap support
- ✅ Docker image
- ✅ crates.io package (`rivet-cli`)
- ✅ improved local quickstart and pilot docs

### Why this matters
Install friction is real, but only worth optimizing after product direction is clear.

---

## Epic 12 — Deployment Modes Guidance
**Priority:** P2  
**Status:** ✅ Partial — pilot guide, production checklist, Docker/Compose docs, and state backend guidance exist; k8s/Helm remains future  
**Pain coverage:** Pain F

### Goal
Help users choose the right deployment model.

### Deliverables
- docs for local/dev vs durable/prod modes
- Docker/Compose examples
- ⏳ k8s/Helm guidance
- state backend recommendations

### Why this matters
Many reliability problems come from running the tool in a mode that does not match the environment.

---

## P3 — Later Expansion

These are valuable, but should come only after the trust layer is mature and pilot feedback confirms demand.

---

## Epic 13 — SSH / Jump Host Access
**Priority:** P3  
**Pain coverage:** Pain F

### Goal
Support or document constrained connectivity environments.

### Deliverables
- documented external tunnel workflow first
- possible later native SSH tunnel support

### Why this matters
Real pain, but infrastructure/security complexity is non-trivial.

---

## Epic 14 — Narrow Warehouse Load Layer
**Priority:** P3 → **P1 (in progress)**  
**Status:** ✅ Partial — type system, type report, strict mode, BigQuery compat layer, complex types (M1–M6 complete); **v0.7.8 Type Roundtrip Proof shipped** (Phase 1 of the type/verify/UX track — 4 independent reader validators + native Parquet UUID/JSON logical types + type-fidelity benchmark); load path (Parquet → BQ) = future  
**Pain coverage:** Pain C, Pain E

### Goal
Add a narrow, compatibility-aware path from extracted files into selected warehouse targets.

### Deliverables
- ✅ canonical type system (`src/types/`: `RivetType`, `TypeMapping`, `TypeFidelity`)
- ✅ target schema adapters (`ExportTarget::BigQuery`, Arrow→BQ mapping with NUMERIC/BIGNUMERIC/REPEATED/etc.)
- ✅ compatibility report (`rivet check --type-report --target bigquery`)
- ✅ strict/permissive mapping policy (`TypePolicy`, `--strict` flag)
- ✅ column type overrides (`columns:` YAML block)
- ✅ complex types: Enum, Interval, List (Postgres); Enum/SET, TIME, arrays (MySQL)
- ✅ **type roundtrip proof (v0.7.8)** — PG/MySQL → Parquet/CSV matrix tested through 4 independent readers (DuckDB, ClickHouse, pyarrow, BigQuery), 31 live tests in `tests/type_roundtrip/`, `make test-types-validators`. Documented in [`docs/type-mapping.md`](docs/type-mapping.md), [ADR-0014](docs/adr/0014-target-type-materialization.md), and per-tool benchmark reports under [`docs/bench/reports/REPORT_types*.md`](docs/bench/reports/)
- ✅ **native Parquet logical types (v0.7.8)** — UUID → `FixedSizeBinary(16) + LogicalType::Uuid` via `arrow.uuid` extension; JSON → `LogicalType::Json` via `arrow.json` extension. Downstream engines (DuckDB, ClickHouse 25.x+, pyarrow) recognise UUID/JSON natively without a cast; BigQuery autoload promotes UUID→BYTES exact (one documented gap: BQ does not lift `LogicalType::Json` to native `JSON` without an explicit `--schema=attrs:JSON`)
- ✅ **Driver fidelity fixes (v0.7.8)**: PG arrays preserve NULL elements; MySQL `ENUM`/`SET` flag detection so `rivet.logical_type=enum` survives; `native_type` distinguishes UNSIGNED, `TINYINT(1)`, `BIT(1)`, CHAR vs VARCHAR, BINARY vs VARBINARY
- ⏳ direct load path (write Parquet files directly into BigQuery) = future

### Why this matters
Useful, but this opens a new product layer and should follow extraction trust.

---

## Epic 15 — WAL/Binlog CDC to Files
**Priority:** P3  
**Pain coverage:** Pain B, Pain G

### Goal
Extend Rivet from snapshot extraction into logical change extraction after the trust layer is mature.

### Deliverables
- logical change extraction
- insert/update/delete event model
- durable LSN/binlog position state
- changelog file format
- snapshot + changelog workflow

### Why this matters
A natural evolution, but only after auditability, recovery, and data quality are solid.

---

## Epic 16 — Automatic Parallelism
**Priority:** P3  
**Pain coverage:** Pain A, Pain D

### Goal
Move from warning-based guidance to controlled automatic parallelism selection.

### Deliverables
- recommended parallelism first
- later auto-selected parallelism bounded by safety rules

### Why this matters
Parallelism is valuable, but automation should come after better planner guidance and safety semantics.

---

## Epic 17 — MySQL Source Parity (COM_STMT_FETCH)
**Priority:** P1  
**Pain coverage:** Pain A, Pain B

### Goal

Close the source-pressure gap between MySQL and PostgreSQL.

PostgreSQL issues a single `DECLARE cursor FOR SELECT …` and then repeats `FETCH N` — every round-trip returns a small batch, and the longest single SQL statement on a 2M-row wide table is **0.19s**.

MySQL uses chunked `SELECT … WHERE pk BETWEEN start AND end` queries, one per chunk.  The longest single statement on the same fixture is **~9s** — still 15–23× better than sling/dlt defaults, but far from PostgreSQL parity and potentially above a DBA's `max_execution_time`.

### Root cause

MySQL exposes server-side cursors via the binary protocol: `COM_STMT_EXECUTE` with flag `CURSOR_TYPE_READ_ONLY = 0x01`, followed by `COM_STMT_FETCH N`.  The `mysql` Rust crate v28 does not surface this flag in its public API — `exec_iter()` opens a fresh query per call with no persistent cursor.

### Deliverables (two-phase)

**Phase A — Adaptive time-feedback chunking** (1–2 weeks, no protocol changes)

- After each chunk, measure wall time. If the last chunk exceeded a configurable `chunk_max_statement_s` (default: 2s), halve `effective_bs` for the next chunk.
- Cap: do not shrink below `chunk_min_rows` (default: 1000) to avoid pathological overhead.
- Add `RIVET_MYSQL_CHUNK_TIME_FEEDBACK=1` env flag to enable the adaptive path explicitly during benchmarking.
- Expected result: longest single query on 2M-row table drops from ~9s to ~1–2s without protocol changes.
- Update benchmark tables in README to reflect new numbers once measured.
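
The Phase A rule can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the planned implementation: the `ChunkFeedback` struct and `next_batch_size` method are hypothetical names, while `chunk_max_statement_s`, `chunk_min_rows`, `effective_bs`, and their defaults come from the bullets above. The sketch only shrinks; whether the batch size should grow back after fast chunks is not specified here.

```rust
use std::time::Duration;

/// Tunables for adaptive time-feedback chunking.
pub struct ChunkFeedback {
    /// Longest acceptable wall time for one chunk (text default: 2s).
    pub chunk_max_statement: Duration,
    /// Floor below which the batch size is never shrunk (text default: 1000).
    pub chunk_min_rows: u64,
}

impl ChunkFeedback {
    /// Batch size to use for the next chunk, given the wall time of the last one.
    pub fn next_batch_size(&self, effective_bs: u64, last_chunk_elapsed: Duration) -> u64 {
        if last_chunk_elapsed > self.chunk_max_statement {
            // Halve, respect the floor, and never return more than the current
            // size (relevant when the current size is already below the floor).
            (effective_bs / 2).max(self.chunk_min_rows).min(effective_bs)
        } else {
            effective_bs
        }
    }
}
```

Starting from 100,000 rows with every chunk running slow, this rule reaches the 1,000-row floor after seven chunks, so the longest statement shrinks quickly without any protocol change.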

**Phase B — COM_STMT_FETCH server-side cursor** (4–8 weeks, protocol-level)

- Investigate `mysql` crate internals: determine whether `CURSOR_TYPE_READ_ONLY` can be set via an existing low-level API or requires a fork/PR.
- If the crate supports it: implement a single-cursor extraction path for MySQL that mirrors the PostgreSQL `DECLARE … FETCH N` loop.  Path: prepare statement → execute with `CURSOR_TYPE_READ_ONLY` → `FETCH chunk_size` in a loop → `COM_STMT_CLOSE`.
- If the crate does not support it: open an upstream PR to `blackbeam/rust-mysql-simple` exposing `CursorType` on `StatementParams`; track until merged; vendor if needed for near-term delivery.
- Known MySQL cursor limitations to test: `ORDER BY` in cursor context, temp-table creation on large result sets, `max_execution_time` interaction.
- Expected result: longest single query drops to sub-second on wide tables, matching PostgreSQL story.
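
The shape of the Phase B loop (execute once, fetch in fixed-size batches, close) can be sketched independently of any driver. Everything below is hypothetical: the `ServerCursor` trait is an assumed abstraction that the `mysql` crate does not provide today, `VecCursor` is an in-memory stand-in, and treating an empty batch as end-of-cursor is a simplification of how the server signals the last row.

```rust
/// Hypothetical abstraction over a server-side cursor.
pub trait ServerCursor {
    type Row;
    /// Fetch up to `n` rows; an empty batch means the cursor is exhausted.
    fn fetch(&mut self, n: usize) -> Result<Vec<Self::Row>, String>;
    /// Release server-side resources (COM_STMT_CLOSE in the MySQL protocol).
    fn close(self) -> Result<(), String>;
}

/// Drain a cursor in batches of `chunk_size`, handing each batch to `sink`.
/// Returns the total number of rows seen. On error the cursor is dropped
/// without an explicit close; a real implementation would handle that case.
pub fn drain_cursor<C, F>(mut cursor: C, chunk_size: usize, mut sink: F) -> Result<u64, String>
where
    C: ServerCursor,
    F: FnMut(Vec<C::Row>) -> Result<(), String>,
{
    let mut total = 0u64;
    loop {
        let batch = cursor.fetch(chunk_size)?;
        if batch.is_empty() {
            break;
        }
        total += batch.len() as u64;
        sink(batch)?;
    }
    cursor.close()?;
    Ok(total)
}

/// In-memory stand-in used to exercise the loop without a database.
pub struct VecCursor {
    rows: std::vec::IntoIter<u32>,
}

impl VecCursor {
    pub fn new(rows: Vec<u32>) -> Self {
        Self { rows: rows.into_iter() }
    }
}

impl ServerCursor for VecCursor {
    type Row = u32;
    fn fetch(&mut self, n: usize) -> Result<Vec<u32>, String> {
        Ok(self.rows.by_ref().take(n).collect())
    }
    fn close(self) -> Result<(), String> {
        Ok(())
    }
}
```

Because each `fetch` is a short round-trip, no single statement runs for the full duration of the extract, which is the property the PostgreSQL path already has.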

### Definition of done

- [ ] Phase A: adaptive chunk feedback implemented and unit-tested; `chunk_max_statement_s` config field documented.
- [ ] Phase A: new benchmark numbers measured and README benchmark tables updated.
- [ ] Phase B: `mysql` crate cursor path investigated; upstream PR opened or workaround documented.
- [ ] Phase B: integration test covering the `FETCH N` loop against a live MySQL fixture.
- [ ] README note "MySQL parity roadmap" replaced with concrete timing numbers.

### Why this matters

The current README is honest about the 9s MySQL number, but it creates an asymmetric product story: "sub-second on Postgres, 9s on MySQL."  Many teams run MySQL exclusively.  Closing this gap removes the biggest technical objection for MySQL-first users and makes the benchmark headline symmetric.

---

# 5.1 Packaging, Trust, and External Adoption Track

This track replaces the former standalone `rivet_packaging_trust_roadmap.md`.
Its purpose is not to add extraction surface area; it makes the existing product
credible to a team evaluating Rivet without help from the author.

## Product trust message

Rivet should be positioned as:

> Safe, resumable, observable database extraction under failure and resource constraints.

Not as:

> PostgreSQL/MySQL to Parquet/CSV exporter.

The stronger message is backed by explicit docs and code paths: plan/apply,
preflight diagnostics, tuning profiles, state, journal, manifest, reconciliation,
recovery semantics, quality gates, resource controls, and reliability coverage.

## Completed trust hygiene

| Item | Status | Evidence |
|---|---|---|
| Version consistency discipline | ✅ Active | release workflow, `rivet --version`, release tags, changelog/release docs |
| Security policy | ✅ Done | `SECURITY.md` documents access, artifacts, credentials, TLS, supply chain, reporting |
| Sensitive artifact hygiene | ✅ Done | README/security docs warn about `.rivet_state.db`, plans, journals, configs, outputs |
| README top positioning | ✅ Done | README leads with source-safe extraction, scope, non-goals, core promise |
| Fit / non-fit boundaries | ✅ Done | README and semantics explicitly exclude CDC, SaaS marketplace, k8s platform, loading |
| Execution semantics contract | ✅ Done | `docs/semantics.md` covers retry, crash, resume, repair, reconcile, non-guarantees |
| Reliability matrix | ✅ Done | `docs/reliability-matrix.md` separates PR CI, nightly, and manual coverage |
| Compatibility matrix | ✅ Done | `docs/reference/compatibility.md` covers PG 12-16 and MySQL 5.7/8.0 |
| Pilot kit | ✅ Done | `docs/pilot/README.md`, quickstarts, demo, walkthrough, production checklist |
| Benchmark methodology | ✅ Done | `docs/bench/` and best-practices methodology docs capture DB-side signals |

## Remaining trust work

| Work | Priority | Status | Definition of done |
|---|---|---|---|
| Release checksums (`SHA256SUMS`) | P1 | ✅ Done | `release.yml` publishes `SHA256SUMS.txt` per release; verification documented in README + SECURITY.md |
| Signed releases / attestations | P2 | ⏳ Open | Release artifacts can be cryptographically verified |
| SBOM | P2 | ⏳ Open | Release includes SBOM artifact and security docs mention it |
| 24h+ soak tests | P2 | ⏳ Open | Long-horizon extraction run documented in reliability matrix |
| Real-cloud destination release smoke | P2 | ⏳ Open | Manual release checklist records real S3/GCS smoke before publish |
| k8s/Helm deployment guidance | P3 | ⏳ Open | Docs explain supported pattern and non-goals without implying an operator |

## Demo/adoption priorities

The current GIFs and pilot docs cover the minimum adoption story:

1. `docs/gifs/basic.gif` — init, doctor, check, run, state.
2. `docs/gifs/plan-apply.gif` — sealed plan/apply with credential redaction.
3. `docs/gifs/reconcile-repair.gif` — chunked reconcile and targeted repair.
4. `docs/pilot/demo-quickstart.md` — scripted 14-table pilot fixture.

Next demos should focus on proof under pressure rather than happy-path export:

1. **Wide-table memory protection** — show RSS before/after batch caps and `work_mem`-aware FETCH sizing.
2. **Source-pressure run** — show export under concurrent OLTP writes with DB-side signals.
3. **Release verification** — once checksums/signing ship, document a clean install verification flow.

---

# 5.2 Production Feedback, v0.5 Stabilization, and Performance Track

This track consolidates the former `rivet_reddit_feedback_roadmap.md`,
`rivet_v0_5_stabilization_roadmap.md`, and `rivet_performance_roadmap.md`.
It keeps the production-viability feedback visible without fragmenting planning
across multiple roadmap files.

## Production feedback status

| Feedback item | Status | Current answer |
|---|---|---|
| pgBouncer / pooler safety | ✅ Done | Postgres transaction-pooler detection, `SET LOCAL` scoped inside guarded transactions, pgBouncer live tests, pool-safety coverage |
| MySQL pooler / multiplexer detection | ✅ Done | `MysqlProxyKind` (Direct, ProxySql, MaxScale, Multiplexed); 4-signal classifier (`PROXYSQL INTERNAL SESSION`, `@@version_comment`, `@@proxy_version`, `CONNECTION_ID()` drift) with 13 unit tests + ProxySQL live tests (`docker compose --profile pool up -d proxysql`) |
| MySQL live-test symmetry with PG | ✅ Done | 6 paired suites (`live_mysql_crash_recovery`, `live_mysql_chunked_recovery`, `live_mysql_resume`, `live_mysql_schema_drift`, `live_mysql_retry_and_faults`, `live_mysql_reconcile_repair`); 29 MySQL tests mirror the PG matrix 1:1 for crash points / chunked recovery / resume / schema drift / retry+toxiproxy / reconcile+repair |
| Session-state cleanup | ✅ Done | Postgres RAII transaction guard; MySQL session variables reset on all paths; seed FK checks cleaned up |
| OLTP load tests | ✅ Done / ongoing | `live_oltp_load` plus `live_content_load` exercise concurrent writes, checkpoint pressure, MVCC snapshot behavior |
| Adaptive runtime feedback | ✅ Partial | `tuning.adaptive` reacts to Postgres `checkpoints_req` and MySQL `Innodb_log_waits`; still no `pg_stat_io` / tablespace-level controller |
| MCP operational visibility | ✅ Done | `rivet-mcp --stdio` (separate binary) exposes read-only Postgres, MySQL, and pgBouncer diagnostics |
| README/product boundaries | ✅ Done | README states no CDC, no SaaS marketplace, no k8s platform, no loading/transformation |

Open edge: feedback-loop guarantees are intentionally bounded. Rivet is still a
batch SELECT extractor with adaptive guardrails, not CDC, Arrow Flight SQL, or a
DB-resource governor.
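
A minimal sketch of how a four-signal classifier like the one in the MySQL pooler row could be structured. Only `MysqlProxyKind` and its four variants come from the table; the `ProxySignals` struct, its field names, the string matches, and the precedence order are assumptions for illustration, not Rivet's actual implementation.

```rust
/// Proxy kinds named in the feedback table.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum MysqlProxyKind {
    Direct,
    ProxySql,
    MaxScale,
    Multiplexed,
}

/// Hypothetical container for the four probe results (field names are assumptions).
pub struct ProxySignals {
    /// Whether `PROXYSQL INTERNAL SESSION` was accepted by the server.
    pub proxysql_internal_session_ok: bool,
    /// Value of `@@version_comment`, if the query succeeded.
    pub version_comment: Option<String>,
    /// Value of `@@proxy_version`, if the variable exists.
    pub proxy_version: Option<String>,
    /// `CONNECTION_ID()` sampled on consecutive statements of one logical session.
    pub connection_ids: Vec<u64>,
}

/// Classify the endpoint. The precedence (named proxy first, then ID drift) is an assumption.
pub fn classify(s: &ProxySignals) -> MysqlProxyKind {
    let comment = s.version_comment.as_deref().unwrap_or("").to_ascii_lowercase();
    let proxy_version = s.proxy_version.as_deref().unwrap_or("").to_ascii_lowercase();

    if s.proxysql_internal_session_ok || comment.contains("proxysql") {
        return MysqlProxyKind::ProxySql;
    }
    if comment.contains("maxscale") || proxy_version.contains("maxscale") {
        return MysqlProxyKind::MaxScale;
    }
    // A changing connection id within one logical session suggests statement multiplexing.
    let drifted = s.connection_ids.windows(2).any(|w| w[0] != w[1]);
    if drifted {
        MysqlProxyKind::Multiplexed
    } else {
        MysqlProxyKind::Direct
    }
}
```

Keeping the classifier a pure function over already-collected signals is what makes it cheap to cover with unit tests, separately from the live ProxySQL suite.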

## v0.5.x stabilization status

| Area | Status | Notes |
|---|---|---|
| Batch memory policies | ✅ Done | `max_batch_memory_mb` with `warn`, `fail`, `auto_shrink`; tests and best-practices docs |
| Row group tuning | ✅ Done | `parquet.row_group_strategy`, target/max row group settings, docs and golden tests |
| Compression profiles | ✅ Done | `none`, `fast`, `balanced`, `compact`; docs explain CPU/size trade-offs |
| Quality memory control | ✅ Partial | typed hashing and `unique_max_entries`; future: approximate distinct / default caps |
| Resume semantics | ✅ Done | stricter `--resume`, recovery docs, chunk checkpoint tests |
| Resource estimates | ✅ Partial | `rivet plan` shows advisory estimates; future: optional sampling (`plan --sample N`) |
| Benchmark evidence | ✅ Partial | `docs/bench/` cross-tool harness and DB-side signals; future: release-gated perf budget |
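
As an illustration of the three batch memory policies in the table above, here is a hedged sketch of what a policy decision could look like. The policy names `warn`, `fail`, and `auto_shrink` and the `max_batch_memory_mb` setting come from the table; the `BatchDecision` type, the function name, and the proportional shrink rule are assumptions.

```rust
/// The three policies named for `max_batch_memory_mb`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BatchMemoryPolicy {
    Warn,
    Fail,
    AutoShrink,
}

/// Hypothetical outcome type; not taken from the Rivet source.
#[derive(Debug, PartialEq, Eq)]
pub enum BatchDecision {
    Proceed,
    ProceedWithWarning(String),
    Shrink { new_batch_size: usize },
    Abort(String),
}

/// Decide what to do with a batch of `batch_rows` rows occupying `batch_bytes`.
pub fn apply_policy(
    policy: BatchMemoryPolicy,
    batch_bytes: usize,
    batch_rows: usize,
    max_batch_memory_mb: usize,
) -> BatchDecision {
    let limit = max_batch_memory_mb * 1024 * 1024;
    if batch_bytes <= limit {
        return BatchDecision::Proceed;
    }
    let msg = format!("batch uses {batch_bytes} bytes, limit is {limit} bytes");
    match policy {
        BatchMemoryPolicy::Warn => BatchDecision::ProceedWithWarning(msg),
        BatchMemoryPolicy::Fail => BatchDecision::Abort(msg),
        BatchMemoryPolicy::AutoShrink => {
            // Assumed rule: scale the row count by limit / observed bytes, never below 1.
            // `batch_bytes > limit >= 0` here, so the division is safe.
            let scaled = (batch_rows as u128 * limit as u128 / batch_bytes as u128) as usize;
            BatchDecision::Shrink { new_batch_size: scaled.max(1) }
        }
    }
}
```

For example, a 10,000-row batch measured at 128 MiB against a 64 MiB limit would be shrunk to 5,000 rows under `auto_shrink` in this sketch.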

## Performance/resource-control status

| Area | Status | Current implementation / next step |
|---|---|---|
| Benchmark matrix | ✅ Partial | `docs/bench/harness/` captures wall/RSS/CPU/output and DB-side signals; still run manually before each release |
| Adaptive batch by row width | ✅ Done | `batch_size_memory_mb`, first-batch observation, Postgres `work_mem` cap, MySQL row buffer cap |
| Hard Arrow batch cap | ✅ Done | actual Arrow buffer accounting via `get_array_memory_size()` |
| Quality unique hot path | ✅ Done | typed xxHash3-64 hashing; cap support remains opt-in |
| Compression presets | ✅ Done | profile-to-codec mapping shipped |
| Parquet row group tuning | ✅ Done | auto/fixed rows/fixed memory modes |
| PostgreSQL state optimization | ✅ Done | chunk task/state paths use structured stores and transactional writes |
| Direct multipart upload | ⏳ Future | optional complexity; current S3/GCS paths use streaming writes |

Near-term resource-control priorities:

1. Add optional `rivet plan --sample N` to replace purely heuristic memory estimates when users can afford a small source probe.
2. Add a warning when `quality.unique_columns` is configured without `unique_max_entries`.
3. Promote the DB-signal benchmark harness into the release checklist so source-friendliness regressions are visible before publishing.
4. Keep `pg_stat_io` / tablespace / IO-concurrency adaptation as future DBA-grade work, not a v0.x guarantee.

---

# 6. Suggested execution order

## Phase 1 — Make Rivet trustworthy
1. Epic 1 — Preflight Planner & Source Safety
2. Epic 2 — Auditability, Manifest & Reconciliation
3. Epic 3 — Recovery & Interrupted Run Semantics
4. Epic 4 — Durable State Backend

## Phase 2 — Make Rivet more source-aware
5. Epic 5 — Real Batch / Fetch / Write Control
6. Epic 6 — Date / Timestamp / Range Partition Intelligence
7. Epic 7 — Schema Drift Visibility & Policy
8. Epic 8 — Data Shape Drift Detection
9. Epic 9 — Data Contract Checks

## Phase 3 — Improve adoption and usability
10. Epic 10 — Config & Connection UX Improvements
11. Epic 11 — Installation & Packaging
12. Epic 12 — Deployment Modes Guidance

## Phase 3.5 — Source-engine parity
17. Epic 17 — MySQL Source Parity (COM_STMT_FETCH)

## Phase 4 — Expand carefully
13. Epic 13 — SSH / Jump Host Access
14. Epic 14 — Narrow Warehouse Load Layer
15. Epic 15 — WAL/Binlog CDC to Files
16. Epic 16 — Automatic Parallelism

---

# 7. Strongest near-term niche

The best near-term niche remains:

**small-to-mid data teams extracting from PostgreSQL/MySQL and similar database-first operational sources into analytics-friendly files/storage, where current custom pipelines are painful and heavy ingestion platforms are not a good fit.**

---

# 8. Final summary

The strongest and most defensible direction for Rivet remains:

**safe, predictable, source-aware extraction from fragile database-first systems**

The roadmap should continue to optimize for:
- trust
- safety
- explainability
- auditability
- recovery
- drift visibility
- controllable performance

before broadening into:
- warehouse loading
- CDC
- broader platform ambitions

---

# 9. Execution status (lettered epics & phases)

This section merges the former `rivet_roadmap_v3.md` task tracker. **Strategic priorities** remain the numbered epics in §5; **delivery tracking** uses the lettered epics below.

## 9.1 Crosswalk: numbered (§5) ↔ lettered (§9)

| §5 epic (pain roadmap) | Lettered execution epics | Notes |
|------------------------|--------------------------|--------|
| Epic 1 — Preflight | **B** | Strategy output, profile recommendation, sparse warnings |
| Epic 2 — Auditability | **D**, **F** | Run summary, manifest, metrics, `--reconcile` |
| Epic 3 — Recovery | **C**, **H** | Lifecycle semantics, crash matrix, E2E recovery |
| Epic 4 — Durable state backend | `src/state/mod.rs` | `StateConn::Postgres` via `RIVET_STATE_URL`; auto-migration; `StateRef` for workers ✅ |
| Epic 5 — Batch/fetch/write control | **M**, tuning | `batch_size`, `max_file_size`, streaming; not full split of row-group vs fetch |
| Epic 6 — Partition intelligence | **B**, modes | `chunked`, `time_window`, dense surrogate guidance; `chunk_by_days` date-native partitioning ✅ |
| Epic 7 — Schema drift | **D** (tracking) | `on_schema_drift: warn\|continue\|fail` ✅ |
| Epic 8 — Shape drift | **N2** (Phase 3 table) | `export_shape` + `shape_drift_warn_factor` ✅ |
| Epic 9 — Data contracts | `quality:` YAML | Row bounds, null ratio, uniqueness; chunked aggregate rules |
| Epic 10 — Config UX | **E**, **J**, misplaced-field validation | Docs + errors |
| Epic 11 — Packaging | **L** | Release workflow, binaries, Docker GHCR, Homebrew, crates.io ✅ |
| Epic 12 — Deployment | docs, `docker-compose`, state backend | pilot/production checklist + durable state ✅; k8s/Helm = future |
| Epic 13 — SSH | **O** | External tunnel docs first; native SSH = future |
| Epic 14 — Warehouse load | `src/types/`, M1–M6 | Type system + type report + BQ compat ✅; direct load path = future |
| Epic 15 — CDC | **N** | WAL/binlog = future |
| Epic 16 — Auto-parallel | *(none)* | Auto-parallelism = future |
| Epic 17 — MySQL parity | *(none yet)* | Phase A: adaptive chunk timing; Phase B: COM_STMT_FETCH |

**Auth and connectivity (lettered A)** underpin all runs and map across Epics 1–2 and §7 (niche).

---

## 9.2 Current state (0.6.0)

Rivet core is **feature-complete for stable extraction with type safety, resource controls, and external trust documentation**. All Wave 1–3 stabilisation epics are shipped; Epic 14 type safety layer (M1–M6) complete; observability layer includes persistent RunJournal; packaging/trust docs now cover security, semantics, reliability, compatibility, pilots, and benchmarks. 0.6.0 adds the `table:` config shortcut, MySQL chunking/memory parity with Postgres, `work_mem`-aware `FETCH` capping, ProxySQL / MaxScale detection, the first published cross-tool benchmark harness (defaults + steelman), and ships the MCP server as a separate `rivet-mcp` binary.

### Extraction

- PostgreSQL and MySQL streaming extraction
- Parquet and CSV (zstd default; snappy, gzip, lz4, none)
- Local, S3, GCS, **stdout** (streaming uploads)
- Modes: `full`, `incremental`, `chunked`, `time_window`
- `max_file_size`, `--param` / `${VAR}`, `skip_empty`, shell completions
- `batch_size_memory_mb`, **jemalloc** (default feature), misplaced `tuning` validation
- `rivet check`, `rivet doctor`, SQLite state + migrations, metrics, file manifest, chunk checkpoints
- Schema tracking, typed retries (SQLSTATE / MySQL codes), `--validate`, `--reconcile`
- Tuning profiles, Slack, data **quality** checks, meta columns
- Date-native chunking (`chunk_by_days`), connection limit warnings, `rivet init`
- **Plan/Apply workflow** — sealed execution artifacts (`rivet plan` / `rivet apply`, ADR-0005)
- **Parallel exports** — `--parallel-exports` (threads) and `--parallel-export-processes` (OS processes) with live cards UI

### Type safety (M1–M6, Epic 14)

- Canonical type system: `RivetType`, `TypeMapping`, `TypeFidelity`, `ColumnOverrides`
- `TypePolicy` with Fail/Warn/Allow per fidelity level; `--strict` CLI gate
- `rivet check --type-report [--json] [--target bigquery]`
- BigQuery compatibility layer: NUMERIC, BIGNUMERIC, TIMESTAMP, DATETIME, REPEATED, overflow warnings
- Per-column overrides: `columns: { col: decimal(p,s) }` inline string syntax; `rivet init` auto-generates from `information_schema` precision/scale
- Complex types: Enum, Interval→Utf8 ISO 8601 (Postgres), List/Array (Postgres), MySQL TIME/TIME2, MySQL ENUM/SET
- Unsupported-column errors now report **all** unmappable columns in one message (not just the first)
- **Golden E2E trust proof** — [`tests/live_type_golden.rs`](tests/live_type_golden.rs): Postgres *and* MySQL; DB → `rivet run` → Parquet → Arrow read-back; decimal sums, timestamp `tz=None` vs `UTC`, binary, UUID/VARCHAR text, INTERVAL ISO 8601 values. Dedicated `test-type-golden` CI gate (Postgres + MySQL only, faster than full `e2e` job).

### Observability

- `RunJournal` domain model: 13 `RunEvent` variants (plan snapshot, files, retries, chunks, quality issues, schema changes, outcome)
- **Persistent** — serialized to `run_journal` SQLite table (migration v7) at end of every run; `store_journal` / `load_journal` / `recent_journals` APIs
- **`rivet journal --config <file> --export <name>`** — inspect last N runs: per-run header (status, duration, run_id), files/rows/bytes summary, retries, quality issues, schema changes, error first line
- `--run-id <id>` flag to inspect a specific run
- **MCP server** — read-only DB introspection tools for Postgres/MySQL/pgBouncer (`pg_stat_activity`, checkpoint pressure, table stats, locks, `pg_stat_statements` IO, processlist, key metrics)

### Architecture (stabilisation plan — all complete)

- `ResolvedRunPlan` planning layer; `RawConfig` / `ValidatedConfig` separation (Epics 1, 3)
- Centralized `PlanValidator` with compatibility diagnostics (Epic 2)
- State update invariants + `tests/invariants.rs` (Epic 4, ADR-0001)
- CLI-product boundary locked, module visibility hardened (Epic 5, ADR-0002)
- Domain state stores: `CursorStore`, `ManifestStore`, `MetricsStore`, `SchemaHistoryStore`, `JournalStore` (Epics 6, 10)
- `ExtractionStrategy` explicit types; Planning / Execution / Persistence layers (Epics 7, 8, ADR-0003)
- `DestinationCapabilities`, `WriteCommitProtocol` per backend (Epic 9, ADR-0004)
- Semantic release gates — `test-invariants`, `test-recovery`, `test-compatibility`, **`test-type-golden`** required before build (Epics 11–14/arch)

### Release & distribution

- **CI:** rustfmt, clippy, tests, release build, **E2E** (Docker Compose), cargo audit, semantic gates + `test-type-golden`
- **Release:** Linux x86_64/arm64, macOS arm64/Intel binaries; Docker GHCR; Homebrew tap
- **Published:** `rivet-cli` on crates.io
- **Trust docs:** `SECURITY.md`, `docs/semantics.md`, `docs/reliability-matrix.md`, `docs/reference/compatibility.md`, `docs/pilot/`, `docs/bench/`

---

## 9.3 Phase 1 — Pilot alpha stabilization ✅ COMPLETE

### Epic A — Auth and connectivity ✅

| Task | Status | Notes |
|------|--------|-------|
| A1. Credential precedence matrix | ✅ | README §Credential precedence |
| A2. GCS ADC support | ✅ | ADC / `credentials_file` |
| A3. GCS explicit JSON credentials | ✅ | Validated at load |
| A4. DB credential normalization | ✅ | URL vs structured, mutual exclusion |
| A5. Auth diagnostics | ✅ | `rivet doctor` |

### Epic B — Preflight and planner 2.0 ✅

| Task | Status | Notes |
|------|--------|-------|
| B1–B6 | ✅ | Strategy, profile, sparse/dense warnings, parallel hints, verdict suggestions |
| B-conn | ✅ | Connection limit warning: `parallel >= max_connections` warns with exact numbers; skipped gracefully if fetch fails |
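
The B-conn rule can be sketched as a small pure function. The condition `parallel >= max_connections` and the "skip gracefully if fetch fails" behaviour come from the row above; the function name, signature, and message wording are hypothetical.

```rust
/// Returns a warning when the configured parallelism could exhaust the server's connections.
/// `max_connections` is `None` when the server value could not be fetched; in that case the
/// check is skipped rather than failing preflight.
pub fn connection_limit_warning(parallel: u32, max_connections: Option<u32>) -> Option<String> {
    let max = max_connections?;
    if parallel >= max {
        Some(format!(
            "parallel ({parallel}) >= max_connections ({max}): workers may fail to connect"
        ))
    } else {
        None
    }
}
```

Modelling the failed fetch as `None` keeps the skip path explicit in the type instead of hiding it in error handling.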

### Epic C — Execution semantics ✅

| Task | Status | Notes |
|------|--------|-------|
| C1–C5 | ✅ | Lifecycle, cursor timing, duplicates, retries, validation scope — documented |

### Epic D — Observability and run summary ✅

| Task | Status | Notes |
|------|--------|-------|
| D1–D4 | ✅ | Summary, `rivet metrics`, `rivet state files`, aligned IDs |

### Epic E — Documentation ✅

| Task | Status | Notes |
|------|--------|-------|
| E1–E5 | ✅ | README, USER_GUIDE modes/profiles/auth/limitations |

### Epic M — Output and CLI

| Task | Status | Notes |
|------|--------|-------|
| M1–M7 | ✅ | Compression, skip empty, splits, memory batch, completions, stdout, params |
| M8. Per-column Parquet encoding | ⏳ | Not started |

### Bonus (not in original alpha list)

| Item | Notes |
|------|--------|
| Streaming cloud uploads | `std::io::copy` to S3/GCS/stdout |
| Misplaced tuning detection | Clear errors + hints |
| Versioned SQLite migrations | `schema_version` |
| `aws_profile` for S3 | Sets `AWS_PROFILE` for OpenDAL chain |
| Chunked quality gate | Row-count bounds after all chunks; warn on null/unique in chunked mode |
| QA live-test harness | `tests/common/mod.rs` + **~55** ignored live tests across **11** `tests/live_*.rs` binaries mapped in [docs/reference/testing.md](docs/reference/testing.md); full offline suite (**~1345** tests) runs on every PR via `cargo test`. Includes **golden type round-trip** ([`live_type_golden.rs`](tests/live_type_golden.rs)). |

---

## 9.4 Phase 2 — Pilot readiness and battle testing

### Epic F — Auditability and correctness

| Task | Status | Priority | Notes |
|------|--------|----------|-------|
| F1 | ✅ Partial | P1 | Metrics + reconcile; formal “audit mode” TBD |
| F2 | ✅ | P1 | `--reconcile` |
| F3 | ✅ | — | Per-file row counts in state |
| F4 | ✅ | P1 | MATCH/MISMATCH in summary |
| F5 | ✅ | P2 | Reconcile vs validate tradeoff table added to cli.md |

### Epic G — Real-world test harness

| Task | Status | Priority | Notes |
|------|--------|----------|-------|
| G1 | ✅ | P1 | MinIO + E2E |
| G2 | ✅ | P2 | Toxiproxy wired into `docker-compose.yaml`, registered via `tests/common/mod.rs::ensure_toxi_proxy`, exercised by `tests/live_retry_and_faults.rs` + `tests/live_chaos.rs`; cross-process flock guard prevents suite races |
| G3 | ✅ Partial | P1 | `seed` inserts; limited mutations |
| G4 | ✅ Partial | P1 | `dev/` configs; edge fixtures TBD |
| G5 | ✅ | P0 | E2E matrix in CI — `ci.yml::e2e` runs both `dev/e2e/run_e2e.sh` **and** `cargo test --release -- --ignored` (**~55** ignored live tests across **11** `tests/live_*.rs` binaries, including **Trust golden** parity on Postgres + MySQL in `live_type_golden.rs`). See [docs/reference/testing.md](docs/reference/testing.md). |

### Epic H — Crash and recovery

| Task | Status | Priority | Notes |
|------|--------|----------|-------|
| H1 | ✅ | P1 | `dev/CRASH_MATRIX.md` |
| H2 | ✅ | P2 | Env-var-driven fault-injection hook in `src/test_hook.rs` (`RIVET_TEST_PANIC_AT`); four fault points across the write cycle (`after_source_read`, `after_file_write`, `after_manifest_update`, `after_cursor_commit`); crash-point recovery matrix in `tests/live_crash_recovery.rs`. Zero overhead when env var is unset (one relaxed atomic load per call). |
| H3 | ✅ | P1 | E2E recovery paths |
| H4 | ✅ Partial | P1 | Link CRASH_MATRIX from USER_GUIDE if missing |
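
A hedged sketch of the env-var-driven hook pattern described in H2. The variable name `RIVET_TEST_PANIC_AT` and the four fault-point names come from the row above; the function names, the `ARMED` flag, and the one-time `init_from_env` call are assumptions about one way to get "one relaxed atomic load per call" on the fast path.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Fault points listed in H2.
pub const FAULT_POINTS: [&str; 4] = [
    "after_source_read",
    "after_file_write",
    "after_manifest_update",
    "after_cursor_commit",
];

/// Set once at startup; false means every hook call is a single relaxed load.
static ARMED: AtomicBool = AtomicBool::new(false);

/// Hypothetical startup step: arm the hook only if the env var is present.
pub fn init_from_env() {
    let present = std::env::var_os("RIVET_TEST_PANIC_AT").is_some();
    ARMED.store(present, Ordering::Relaxed);
}

/// Pure decision helper, kept separate so it can be unit-tested without touching the env.
pub fn should_fire(point: &str, configured: Option<&str>) -> bool {
    configured == Some(point)
}

/// Call at each fault point in the write cycle.
pub fn maybe_panic(point: &str) {
    if !ARMED.load(Ordering::Relaxed) {
        return; // fast path when the env var was unset at startup
    }
    let configured = std::env::var("RIVET_TEST_PANIC_AT").ok();
    if should_fire(point, configured.as_deref()) {
        panic!("test hook: injected panic at {point}");
    }
}
```

A recovery test would spawn the binary with `RIVET_TEST_PANIC_AT=after_file_write`, expect the crash, then re-run and assert on the recovered state.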

### Epic I — Performance envelope

| Task | Status | Priority | Notes |
|------|--------|----------|-------|
| I1 | ✅ Partial | P1 | Manual runs; standardized datasets TBD |
| I2 | ✅ | P1 | `cargo bench` + `dev/scripts/bench.sh` save/compare; column_scan + shape_tracking groups |
| I3 | ✅ | — | USER_GUIDE defaults |
| I4 | ✅ | — | Check warnings |
| I5 | ✅ | P2 | Capacity/memory planning section in tuning.md (peak RSS formula, table width rules) |

### Epic J — Product UX

| Task | Status | Priority | Notes |
|------|--------|----------|-------|
| J1 | ✅ | P1 | `docs/` + `examples/` |
| J2–J4 | ✅ | — | Errors, troubleshooting, `doctor` |

### Epic K — First pilot rollout

| Task | Status | Priority | Notes |
|------|--------|----------|-------|
| K1 | ✅ | — | Pilot tables exercised |
| K2 | ✅ Partial | P1 | Multi-day automation TBD |
| K3 | ⏳ | P2 | Feedback template |
| K4 | ⏳ | P2 | Findings doc |

---

## 9.5 Phase 3 — Release engineering and ecosystem

### Epic L — Release and distribution *(lettered; not Epic 4 durable state)*

| Task | Priority | Status | Notes |
|------|----------|--------|-------|
| L1 | P0 | ✅ | Release workflow + matrix (Linux + macOS); green on v0.2.0-beta.2 |
| L2 | P0 | ✅ | GitHub Release assets published at v0.2.0-beta.2 |
| L3 | P0 | ✅ | `cargo publish` — rivet-cli v0.2.0-beta.2 published to crates.io |
| L4 | P1 | ✅ | Docker image via Dockerfile + GHCR (ghcr.io/panchenkoai/rivet) |
| L5 | P2 | ✅ | Homebrew tap panchenkoai/homebrew-rivet; auto-updated on release |

### Epic N — Advanced features (post-pilot)

| Task | Priority |
|------|----------|
| N1–N6 | P2–P3 as in prior roadmap (encoding, shape drift, strict YAML, webhook, rate limits) |

### Epic O — Future vision

| Task | Priority |
|------|----------|
| O1–O10 | P3 — CDC, Iceberg/Delta, multi-source, encryption, Prometheus, plugins, UI, sources, Flight, serverless |

---

## 9.6 Next priorities (rolling)

Prioritize stabilization before distribution polish:

**Completed (v0.2.0-beta.2 → v0.7.8):**

1. ✅ **Green GitHub Release** — v0.2.0-beta.2 published (binaries, Docker, Homebrew tap).
2. ✅ **L3** — `cargo publish` — rivet-cli on crates.io.
3. ✅ **Connection limit warning** — `rivet check` warns when `parallel >= max_connections`.
4. ✅ **Date-native chunking** — `chunk_by_days` with `>= date AND < date` semantics, parallel support, E2E tested.
5. ✅ **Stabilisation plan Waves 1–3** — all 14 arch epics shipped (invariant/recovery/compatibility tests, semantic release gates).
6. ✅ **Parallel export processes** — `--parallel-export-processes` with live cards UI (v0.3.4).
7. ✅ **Cards UI for `--parallel-exports`** — unified cards renderer + compact summaries (v0.3.5).
8. ✅ **Type safety layer M1–M6** — `rivet check --type-report`, `TypePolicy`, BigQuery compat, complex types (Enum/Interval/List).
9. ✅ **Type Roundtrip Proof (v0.7.8)** — PG/MySQL → Parquet/CSV preserved through 4 independent reader validators (DuckDB, ClickHouse, pyarrow, BigQuery); native Parquet `LogicalType::Uuid` / `LogicalType::Json` via `arrow.uuid` / `arrow.json` extension types; 31 live tests; cross-tool fidelity benchmark. Three real driver bugs fixed along the way: PG arrays losing NULL elements, MySQL ENUM/SET misclassified as String, MySQL `native_type` collapsing UNSIGNED / `TINYINT(1)` / `BIT(1)` / CHAR / VARCHAR variants.
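
To make the `>= date AND < date` semantics of date-native chunking (item 4) concrete, here is a minimal sketch that splits a date range into half-open windows. The half-open semantics and the `chunk_by_days` name come from the list above; the function names, the tuple-based date type, and the `DATE '...'` literal format are assumptions. The day arithmetic is the standard civil-date conversion, written out to avoid an external crate.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let mp = (m + 9) % 12; // March = 0
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`.
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

fn iso(day: i64) -> String {
    let (y, m, d) = civil_from_days(day);
    format!("{y:04}-{m:02}-{d:02}")
}

/// Split [start, end_exclusive) into half-open windows of at most `chunk_by_days` days.
pub fn day_windows(
    start: (i64, i64, i64),
    end_exclusive: (i64, i64, i64),
    chunk_by_days: i64,
) -> Vec<(String, String)> {
    let step = chunk_by_days.max(1);
    let end = days_from_civil(end_exclusive.0, end_exclusive.1, end_exclusive.2);
    let mut lo = days_from_civil(start.0, start.1, start.2);
    let mut out = Vec::new();
    while lo < end {
        let hi = (lo + step).min(end);
        out.push((iso(lo), iso(hi)));
        lo = hi;
    }
    out
}

/// Render one window as a half-open predicate (literal syntax is an assumption).
pub fn window_predicate(column: &str, w: &(String, String)) -> String {
    format!("{column} >= DATE '{}' AND {column} < DATE '{}'", w.0, w.1)
}
```

Because every window's upper bound is the next window's lower bound and the upper comparison is strict, no row is read twice and none falls between windows, which is what makes the windows safe to run in parallel.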

**Remaining open items (P1 first):**

1. ✅ **Epic 7 schema drift policy** — `on_schema_drift: warn|continue|fail` YAML hook shipped.
2. ✅ **Epic 8 data shape drift** — `export_shape` SQLite table; `shape_drift_warn_factor` YAML config; warns on `N×` growth.
3. ✅ **F5 + I5** — reconcile/validate tradeoffs (cli.md); capacity/memory planning (tuning.md).
4. ✅ **I2** — `cargo bench` + `dev/scripts/bench.sh` save/compare harness; column_scan + shape_tracking groups.
5. ✅ **Epic 4 (§5)** — external/durable state backend: `RIVET_STATE_URL` PostgreSQL backend shipped.
6. ⏳ **Verify / Validation Layer (v0.7.9, next focus)** — new top-level `rivet verify` command answering *"are produced files + manifest + state + summary internally consistent?"*. Three depth levels (light file/size/schema-hash check → sample row read → full file scan), stable error codes `RIVET_VERIFY_*`, read-only (mutating fixes stay in `repair`). Catches missing/partial files, size mismatch, orphan output, manifest/state divergence, schema-hash mismatch. JSON output for automation. Builds on the Type Roundtrip Proof: type contract is now provable, next we need *artifact* consistency to be provable too. **Design open:** extend existing `--validate` (per ADR-0013 "no new flags") vs new top-level `rivet verify` command — pick before implementation.
7. ⏳ **Operator UX & Diagnostics (v0.8.0)** — structured diagnostics with stable codes (`RIVET_CONFIG_*`, `RIVET_SOURCE_*`, `RIVET_VERIFY_*`, …), severity (low / medium / high / blocking), actionable hints; `--json` everywhere; strategy explanation in `rivet plan` (why this chunk size, why this mode, what risk remains); `doctor` capability + blocker report.
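
The shape-drift rule in item 2 ("warns on `N×` growth") can be sketched as a ratio check against the previously recorded width. `shape_drift_warn_factor` is the documented config key; the function name, the use of maximum text width as the tracked quantity, and the "no baseline means no warning" rule are assumptions.

```rust
/// Warn when a column's observed max width grew by at least `warn_factor` times
/// compared with the previously stored value. Returns `None` when there is no baseline.
pub fn shape_drift_warning(
    column: &str,
    previous_max_len: u64,
    current_max_len: u64,
    warn_factor: f64,
) -> Option<String> {
    if previous_max_len == 0 || warn_factor <= 0.0 {
        return None; // first run or check disabled: nothing to compare against
    }
    let growth = current_max_len as f64 / previous_max_len as f64;
    if growth >= warn_factor {
        Some(format!(
            "column '{column}' width grew {growth:.1}x ({previous_max_len} -> {current_max_len})"
        ))
    } else {
        None
    }
}
```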

### 9.6.1 UX hardening backlog (v0.7.8 walk-through findings)

The fast-track + pilot blessed-path walk in the v0.7.8 session found three
P1-class bugs (already fixed and folded into item 9 above) plus a longer
list of P2/P3 friction worth addressing while polishing for v0.7.9 /
v0.8.0. Each line is one focused change; pick off in order or interleave
with verify-layer work as bandwidth allows.

**Fixed in v0.7.8 session** (evidence in [`src/preflight/{postgres,mysql,analysis}.rs`](src/preflight/), [`src/config/source.rs`](src/config/source.rs), [`src/pipeline/cli.rs`](src/pipeline/cli.rs), [`src/pipeline/chunked/{sequential,parallel}_checkpoint.rs`](src/pipeline/chunked/)):

- ✅ `rivet check` no longer reports "No index detected" for indexed `chunk_column` / `cursor_column` — catalog-based btree probe overrides the EXPLAIN-of-base-query heuristic; verdict thresholds relaxed so indexed > 10 M rows is ACCEPTABLE not DEGRADED.
- ✅ `WARN: source URL contains plaintext password` no longer fires when the user already chose `url_env:` / `url_file:` — only inline `url:` (the misconfig case) triggers it.
- ✅ `rivet state show` after chunked-only runs no longer says "No export state recorded yet" — distinguishes "never ran" from "ran chunked, look at metrics / state files" and prints the right next-step pointer.
- ✅ `summary.retries` now actually increments in chunked exports (sequential + parallel paths) — was silently stuck at 0, masking flaky-link runs that only worked because backoff covered for them. Visible in console summary card, `rivet metrics`, and `export_metrics.retries`.

**P2 — friction, not bugs** (tackle before v0.7.9 release if time permits):

- ⏳ **`check` verdict pessimism vs actual run.** UNSAFE / DEGRADED predicts high RSS on parallel reads against no-index tables, but the actual run frequently sits well under the predicted budget (e.g. 10 M-row chunked-parallel orders ran in 12 s at 117 MB RSS while check said UNSAFE). Either make the predictor use estimated row width + chunk batch size for a tighter bound, or downgrade UNSAFE to DEGRADED when the verdict can't show a concrete budget breach.
- ⏳ **`destination is not retry-safe` WARN spams local-destination runs.** `type: local` is the canonical dev/test destination, but each run prints a retry-safety WARN. Either suppress it for `local` (a partial artifact is one failed-rewrite file, not silent data loss) or have `rivet init` add the right opt-in so the warning does not fire on defaults.
- ✅ **TLS warning now surfaces in `doctor`/`check`** (v0.7.8) — preflight calls `warn_if_tls_disabled` ([src/preflight/doctor.rs](src/preflight/doctor.rs), [src/preflight/mod.rs](src/preflight/mod.rs)), so the missing-`tls:` warning fires before a real extract, not only in `run`.
- ⏳ **`rivet init --schema X` includes ad-hoc / test tables.** Schema-wide init dumps every relation in the schema (140 entries in our dev DB, most of them leftover test artifacts). Add an `--exclude '<glob>'` flag or a heuristic that skips tables whose names match common temp/test patterns (`tmp_*`, `*_temp`, tables with PID-shaped suffixes).

**P3 — polish & doc clarity** (good v0.8.0 fodder):

- ⏳ **`rivet init` does not explain *why* it picked a mode.** Generated YAML says `mode: chunked` but not "(auto-selected because rows estimate ≥ 500 K)". One-line inline comment in the rendered config would close the loop.
- ⏳ **`rivet journal --export X` does not show retry events.** Doc promises "per-run events / retries / quality issues"; today it only shows status + duration. Plumb per-chunk retry attempts into the structured journal so post-mortem doesn't require digging in stderr WARNs.
- ⏳ **100 files per 10 M-row chunked export.** Default `chunk_size: 100_000` × big table = many small files. `rivet init` could scale `chunk_size` logarithmically with the row estimate (e.g. ≥ 10 M → 1 M chunk_size) so default file counts stay reasonable.
- ✅ **`status: skipped` now names the cursor** (v0.7.8) — shows `(no new rows since cursor '<col>')` ([src/pipeline/single.rs](src/pipeline/single.rs)) so the operator doesn't have to guess.
- ✅ **Doc note added: re-running `chunked` re-extracts everything** — [`docs/modes/chunked.md`](docs/modes/chunked.md) § "Clean re-runs are NOT idempotent" spells out that `--resume` only skips completed chunks after a crash.
- ⏳ **Doc note: `time_window` re-runs duplicate output.** Rolling window mode does not persist "we did this window already", so frequent re-runs produce duplicate files. Worth a paragraph in `modes/time-window.md`.
- ⏳ **Retry / I3 (Write Before Cursor) at-least-once dupe scenario** not yet covered by tests. The contract documents the duplicate possibility (`ADR-0001 I3`), and toxiproxy-based retry testing showed counters work; a dedicated SIGKILL-between-write-and-commit recovery test would pin the actual dupe behavior end-to-end.
- ⏳ **Stale roadmap items inherited from earlier sessions:** "2–3 pilot tables repeated on a schedule" in §9.7 (organizational), SBOM / signed release attestations (also §9.7 unchecked). *(Release checksums shipped in v0.7.8 — now checked.)*
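
One possible shape for the `chunk_size` scaling idea in the "100 files per 10 M-row chunked export" item, as a sketch only. The 100,000 default and the "≥ 10 M → 1 M" example come from that item; the order-of-magnitude rule and the function names are assumptions, and nothing like this is shipped.

```rust
/// Default named in the backlog item.
pub const DEFAULT_CHUNK_SIZE: u64 = 100_000;

/// Assumed heuristic: one order of magnitude below the row estimate,
/// never smaller than the default.
pub fn suggested_chunk_size(row_estimate: u64) -> u64 {
    if row_estimate < 10 * DEFAULT_CHUNK_SIZE {
        return DEFAULT_CHUNK_SIZE;
    }
    // row_estimate >= 1_000_000 here, so ilog10() >= 6 and the subtraction cannot underflow.
    let magnitude = row_estimate.ilog10();
    10u64.pow(magnitude - 1)
}

/// Number of chunk files produced for a given row count (ceiling division).
pub fn file_count(rows: u64, chunk_size: u64) -> u64 {
    (rows + chunk_size - 1) / chunk_size
}
```

Under this rule a 10 M-row table gets a 1 M `chunk_size` and 10 files instead of 100, and file counts stay between 10 and 99 for any larger estimate.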

---

## 9.7 Definition of done — stable v0.5.x

- [x] Auth predictable and documented
- [x] `rivet check` actionable strategy and safety guidance
- [x] Execution semantics frozen and documented
- [x] Run summary + reconciliation (`--reconcile`)
- [x] Crash/recovery tested (matrix + E2E)
- [x] Local battle lab (MinIO + compose + E2E)
- [x] Docs for real scenarios (`docs/` canonical; `USER_GUIDE.md` deprecated, navigation consolidated)
- [x] Architecture stabilisation — Waves 1–3 complete (14 epics, invariant/recovery/compatibility tests, semantic release gates)
- [x] Plan/Apply workflow — sealed execution artifacts, ADR-0005
- [x] Parallel exports — `--parallel-exports` + `--parallel-export-processes` with live cards UI
- [x] Type safety layer — `--type-report`, TypePolicy, BigQuery compat, complex types (M1–M6)
- [x] **Type roundtrip proof (v0.7.8)** — PG/MySQL × Parquet/CSV validated through DuckDB + ClickHouse + pyarrow + BigQuery; native Parquet `LogicalType::Uuid` / `LogicalType::Json`; `make test-types-validators`; per-tool fidelity benchmark in [`docs/bench/reports/`](docs/bench/reports/)
- [x] Schema drift policy hooks — `on_schema_drift: warn|continue|fail` (Epic 7)
- [x] Data shape drift detection — string/text width tracking (Epic 8)
- [ ] 2–3 pilot tables repeated on a schedule *(organizational; optional automation K2)*
- [x] Cross-platform release binaries **published** (v0.3.5: Linux x86_64/arm64, macOS arm64/Intel, Docker GHCR, Homebrew tap)
- [x] E2E matrix in CI
- [x] Published to crates.io (rivet-cli)
- [x] Security policy and sensitive-artifact guidance
- [x] Execution semantics and known non-guarantees documented
- [x] Reliability and compatibility matrices published
- [x] Pilot and production-readiness docs published
- [x] Release checksums *(v0.7.8 — `SHA256SUMS.txt` published per release; verify with `sha256sum -c`)*
- [ ] Signed releases / attestations
- [ ] SBOM

---

# 10. Engineering Optimization Backlog (v0.7.8 release + code audit)

This section captures findings from a code-level audit conducted against the
v0.7.8 release (tag, release notes, artifacts) and the live source tree. Unlike
§5–§9 (feature epics), these are **hardening / optimization releases**: each one
closes a gap between a documented promise and what the engine actually
guarantees, or removes a structural risk. They are ordered by
**impact ÷ uniqueness**: OPT-1 was filed as foundational debt under the headline
claim (since narrowed to edge-case hardening after a source review), and
OPT-2 is the most differentiating new capability.

The audit also confirmed what is *already sound* and should not be re-litigated:
the I1–I8 invariant model + failure-point map (ADR-0001), SQLite `WAL` +
`busy_timeout` (`src/state/mod.rs:537-549`), the single-held-runtime async bridge
for OpenDAL destinations (`src/destination/gcs.rs:51-76`, no `block_on`-per-call),
and the two-engine separation with explicit revisit triggers (ADR-0010).

## 10.1 Findings summary

> **Validated against the source — see
> [`docs/planning/optimization-backlog-validation.md`](docs/planning/optimization-backlog-validation.md)**
> for per-item verdicts, file:line evidence, and implementation plans. A
> line-by-line pass corrected OPT-4 and OPT-5 substantially (a deterministic
> per-part content hash already exists; MySQL hard-refuses rather than silently
> degrading) and narrowed OPT-1/OPT-2. The validation doc is authoritative over
> the prose below.

| ID | Area | Priority | Status | One-line |
|---|---|---|---|---|
| OPT-1 | Memory safety | P2 | ✅ Done (core) | Per-value ceiling `RIVET_VALUE_TOO_LARGE` shipped (189d915, default 256 MB). Follow-ups: probe-batch warmup shrink, check-predictor feedback |
| OPT-2 | Adaptive concurrency | P1 | ✅ Done | Adaptive parallelism governor shipped (141bf33) — resizes worker count in `[min, parallel]` under source pressure (in-process engine). Follow-ups: subprocess engine (after OPT-6), richer `rivet-mcp` signals |
| OPT-3 | Type fidelity | P1 | ⏳ Open (validated) | Round-trip proof is enumerated-fixture; no proptest type test exists — confirmed |
| OPT-4 | MySQL parity | P1 | ✅ Done | MySQL keyset (seek) pagination shipped (40433a0) — single-column unique key, sequential, EXPLAIN-verified index range scan. Follow-ups: composite keys, parallel keyset, resume |
| OPT-5 | Dedup ergonomics | P2 | ⏳ Open (**corrected**) | Deterministic per-part `content_fingerprint` **already exists** (`manifest.rs:160`); gap = guarantee/expose + parquet byte-determinism |
| OPT-6 | Engine debt | P2 | ⏳ Open (validated) | Subprocess fan-out engine has **zero** crash tests; no signal handling — confirmed gap |
| OPT-7 | Doc/roadmap drift | P1 | ✅ Done | Checksums documented as shipped (SECURITY.md + README + §5.1/§9.7); 3 §9.6.1 items struck. *Verified 3 other claimed-shipped §9.6.1 items (local-retry WARN, init mode-comment, init chunk_size scaling) are NOT shipped — left open.* |

---

## OPT-1 — Memory-bound hardening (residual gaps on an existing adaptive cap)

**Priority: P2** — *corrected after reading the source.* An earlier draft of this
item claimed there was "no hard byte budget, only reactive sampling." **That was
wrong.** A row-width-adaptive byte-budget cap already exists on both engines and
genuinely backs the headline claim. This item is now narrow edge-case hardening,
not foundational debt.

**What already exists (do not re-build):**
- **MySQL** (`src/source/mysql/mod.rs:385-440`): a 500-row `PROBE_BATCH_SIZE`
  first batch measures real Arrow bytes/row (`SourceTuning::batch_memory_bytes`),
  then caps `effective_bs` so each flush fits `tuning.batch_size_memory_mb`
  (default `MYSQL_BATCH_TARGET_MB_DEFAULT = 64` MB), never exceeding the
  configured `batch_size`.
- **PostgreSQL** (`src/source/postgres/mod.rs:380-440`): a 500-row
  `PROBE_FETCH_SIZE` first FETCH, then `FETCH N` is capped under
  `work_mem × 0.7` (`pg_fetch_work_mem_bytes`) so the server-side cursor never
  spills to `pgsql_tmp/`. The in-code comment explicitly anticipates the
  "single huge FETCH triggers spill" case — that is *why* the probe exists.
- Plus a reactive RSS sampler (`src/resource.rs`) and `memory_threshold_mb`
  pause-gate (`src/pipeline/chunked/exec.rs:66,269,271`) as a backstop.
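
The probe-then-cap arithmetic described above can be sketched as follows. The 500-row probe, the 64 MB default target, the "never exceed the configured `batch_size`" rule, and the `work_mem × 0.7` factor come from the bullets; the function names and signatures are hypothetical and do not mirror the code at the cited locations.

```rust
/// Probe size named in the bullets above.
pub const PROBE_BATCH_SIZE: usize = 500;

/// Cap the batch size so one flush fits `target_mb`, never exceeding `configured_batch_size`.
pub fn effective_batch_size(
    probe_bytes: usize,
    probe_rows: usize,
    target_mb: usize,
    configured_batch_size: usize,
) -> usize {
    if probe_rows == 0 || probe_bytes == 0 {
        return configured_batch_size; // nothing measured, keep the configured value
    }
    let bytes_per_row = (probe_bytes / probe_rows).max(1);
    let budget = target_mb * 1024 * 1024;
    (budget / bytes_per_row).clamp(1, configured_batch_size.max(1))
}

/// Largest FETCH size whose rows fit under 70% of `work_mem` (assumed integer rounding).
pub fn pg_fetch_cap(work_mem_bytes: usize, bytes_per_row: usize) -> usize {
    let budget = work_mem_bytes / 10 * 7;
    (budget / bytes_per_row.max(1)).max(1)
}
```

With 4 KB rows and the 64 MB target, the memory cap is 16,384 rows, so a configured `batch_size` of 10,000 is left alone while 50,000 is reduced to 16,384.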

**Residual gaps (the actual scope of this item):**
1. **Probe-batch warmup is uncapped.** The first 500 rows are buffered before
   bytes/row is known. The code assumes "500 × ~4 KB ≈ 2 MB"; 500 genuinely
   huge rows (e.g. 500 × 1 MB ≈ 500 MB) overshoot the budget before the cap
   is computed.
2. **Single-outlier value.** The cap is based on *average* bytes/row in a batch.
   One 200 MB JSONB/`bytea` cell among normal rows still lands in a single
   batch; an average-based cap doesn't bound a lone giant value. There is no
   hard per-value ceiling with a typed error.
3. **Soft target × threads.** The cap is a per-batch *target* (~64 MB); with
   `parallel: N` workers the peak is roughly N × that target. It is not a hard
   process-level ceiling.
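
To put numbers on the three gaps, here is a small worked-arithmetic sketch. The function names are hypothetical and the figures are the illustrative ones from the list above, not measurements.

```rust
/// Gap 1: bytes buffered by the probe batch before any cap is known.
pub fn probe_warmup_bytes(probe_rows: u64, row_bytes: u64) -> u64 {
    probe_rows * row_bytes
}

/// Gap 2: real size of a batch that was sized from the *average* row width
/// but happens to contain one outlier value.
pub fn batch_bytes_with_outlier(rows: u64, avg_row_bytes: u64, outlier_bytes: u64) -> u64 {
    // (rows - 1) ordinary rows plus one oversized row.
    rows.saturating_sub(1) * avg_row_bytes + outlier_bytes
}

/// Gap 3: the per-batch target multiplied by the number of workers.
pub fn peak_target_bytes(per_batch_target_bytes: u64, parallel: u64) -> u64 {
    per_batch_target_bytes * parallel
}
```

For example, a batch sized for 4 KB rows against a 64 MB target (16,384 rows) that contains one 200 MB cell occupies about 264 MB, and eight workers at a 64 MB target add up to a 512 MB soft peak.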

**Proposed change.**
- A hard per-value size limit with a typed error (`RIVET_VALUE_TOO_LARGE`) so a
  single fat cell fails cleanly instead of risking OOM.
- Optionally shrink the probe batch when `avg_row_bytes` (already estimated in
  preflight) is large, so warmup respects the budget too.
- Feed the measured row width back into the `check` predictor (tightens the
  pessimistic UNSAFE/DEGRADED verdict, §9.6.1).
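
A minimal sketch of the first two bullets, assuming the check runs per cell before the value is appended to the in-memory batch. Everything except the `RIVET_VALUE_TOO_LARGE` code is hypothetical: the `ValueTooLarge` type, `check_value_size`, and `probe_rows_for` are illustrative names, not existing Rivet APIs.

```rust
use std::fmt;

/// Hypothetical typed error carrying the `RIVET_VALUE_TOO_LARGE` code.
#[derive(Debug, PartialEq, Eq)]
pub struct ValueTooLarge {
    pub column: String,
    pub value_bytes: usize,
    pub limit_bytes: usize,
}

impl fmt::Display for ValueTooLarge {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "RIVET_VALUE_TOO_LARGE: column `{}` holds a {}-byte value (limit {} bytes)",
            self.column, self.value_bytes, self.limit_bytes
        )
    }
}

impl std::error::Error for ValueTooLarge {}

/// Reject a single cell that exceeds the hard per-value ceiling.
pub fn check_value_size(
    column: &str,
    value: &[u8],
    limit_bytes: usize,
) -> Result<(), ValueTooLarge> {
    if value.len() > limit_bytes {
        return Err(ValueTooLarge {
            column: column.to_string(),
            value_bytes: value.len(),
            limit_bytes,
        });
    }
    Ok(())
}

/// Shrink the probe batch when preflight already estimates wide rows, so that
/// warmup stays inside the same byte budget. Never returns 0.
pub fn probe_rows_for(
    avg_row_bytes: usize,
    budget_bytes: usize,
    default_probe_rows: usize,
) -> usize {
    if avg_row_bytes == 0 {
        return default_probe_rows.max(1);
    }
    (budget_bytes / avg_row_bytes).clamp(1, default_probe_rows.max(1))
}
```

With a 64 MB budget this keeps the 500-row probe for 4 KB rows and drops it to 64 rows when preflight estimates 1 MB rows.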

**Definition of done.** A fixture with a deliberately fat value column
(e.g. 256 MB JSONB rows) either exports under the configured budget or fails
with `RIVET_VALUE_TOO_LARGE` — never OOMs; regression test in the soak matrix.

---

## OPT-2 — Adaptive concurrency governor (extend existing adaptation to parallelism)

**Priority: P1** — the differentiating item. *Scope corrected after reading the
source:* batch-size adaptation under source pressure **already exists** — this
item is the next step, not a greenfield build.

**What already exists (do not re-build):** `tuning.adaptive` already samples a
source-side pressure proxy every `ADAPTIVE_SAMPLE_INTERVAL` batches and resizes
the batch via `next_adaptive_batch_size` — PG via `pg_sample_checkpoints_req`
(`src/source/postgres/mod.rs`), MySQL via `mysql_sample_innodb_log_waits`
(`src/source/mysql/mod.rs:444-464`). So the control loop exists; it adjusts
*batch size* off *one* proxy signal.

**The actual gap (two dimensions the current loop doesn't cover):**
1. **It governs batch size, not parallelism / connection count.**
   The default `parallel` is **1** (`src/config/export.rs:319`, conservative),
   but each chunk worker opens its *own* source connection inside `s.spawn`
   (`src/pipeline/chunked/exec.rs:305`, gated by `Semaphore::new(parallel)` at
   `exec.rs:246`) — `Source: Send` not `Sync` (ADR-0011). So a user who dials
   `parallel: N` holds up to N connections against exactly the fragile,
   low-`max_connections` source the tool protects. Preflight only *warns*
   (`src/preflight/analysis.rs:164-182`); the count is static at runtime — the
   governor never backs it off. Extend the loop to adjust the semaphore permit
   count (and `throttle_ms`) within `[min, max]`, not just batch size.
2. **It reads one proxy signal, not the richer `rivet-mcp` set.** `rivet-mcp`
   already exposes `pg_stat_activity` (active / lock-wait / idle-in-txn),
   pgBouncer saturation, and MySQL `SHOW PROCESSLIST`. Feed those into the same
   governor so back-off reacts to real contention (lock waits, replication lag,
   active-query count), not just checkpoint/log-wait pressure. One pressure
   model shared between `rivet-mcp` and the run loop.

**Proposed change.**
- Promote the existing adaptive loop from "resize batch" to a governor that also
  resizes the active permit count + throttle.
- Reuse the `rivet-mcp` read-only query surface as the pressure source.
- Surface governor decisions in the run journal ("backed off parallel 16→8 at
  T+45s: lock_waits=12").
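
One possible shape for the governor's decision step is sketched below, using additive-increase / multiplicative-decrease as a stand-in policy. All types, field names, and thresholds are hypothetical. How the new target is applied to the semaphore and to `throttle_ms` is deliberately left out.

```rust
/// Hypothetical pressure snapshot; the fields mirror signals named above.
#[derive(Debug, Clone, Copy, Default)]
pub struct Pressure {
    pub lock_waits: u32,
    pub active_queries: u32,
}

/// Hypothetical bounds and thresholds for the governor.
#[derive(Debug, Clone, Copy)]
pub struct GovernorConfig {
    pub min_parallel: usize,
    pub max_parallel: usize,
    pub max_lock_waits: u32,
    pub max_active_queries: u32,
}

#[derive(Debug)]
pub struct Governor {
    cfg: GovernorConfig,
    parallel: usize,
}

impl Governor {
    pub fn new(cfg: GovernorConfig, initial: usize) -> Self {
        // Keep the starting point inside [min, max].
        let parallel = initial.max(cfg.min_parallel).min(cfg.max_parallel);
        Self { cfg, parallel }
    }

    pub fn parallel(&self) -> usize {
        self.parallel
    }

    /// One control step: halve the permit target under pressure, otherwise
    /// add one permit. Returns a journal line when the target changed.
    pub fn observe(&mut self, elapsed_secs: u64, p: Pressure) -> Option<String> {
        let before = self.parallel;
        let under_pressure = p.lock_waits > self.cfg.max_lock_waits
            || p.active_queries > self.cfg.max_active_queries;
        self.parallel = if under_pressure {
            (before / 2).max(self.cfg.min_parallel)
        } else {
            (before + 1).min(self.cfg.max_parallel)
        };
        if self.parallel == before {
            return None;
        }
        let verb = if self.parallel < before { "backed off" } else { "raised" };
        Some(format!(
            "{} parallel {}→{} at T+{}s: lock_waits={}",
            verb, before, self.parallel, elapsed_secs, p.lock_waits
        ))
    }
}
```

Halving on pressure and recovering one permit at a time biases the loop toward protecting the source: back-off is fast, ramp-up is slow.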

**Definition of done.** Under a synthetic concurrent-OLTP load fixture, an
adaptive run keeps a source-side pressure metric below a threshold that a
static `parallel=N` run breaches; decisions visible in `rivet journal`.

---

## OPT-3 — Property-based type round-trip (proof by construction)

**Priority: P1** — converts type rigor from "many tests" to "provable".

**Problem.** The four-reader round-trip matrix (`tests/type_roundtrip/`) is
excellent but proves only the **enumerated** types. `UNSIGNED BIGINT → Decimal128`
was found *by example* in 0.7.8 — a symptom that integer-width / precision
overflow is a *class* found one funeral at a time. The long tail
(`numeric(1000,…)`, arrays-of-composite, domains, ranges, `citext`/`hstore`/
PostGIS, MySQL `ZEROFILL`/`BIT`/`SET` edge cases) is where silent corruption
lives.

**Proposed change.** A property-based harness that generates random schemas +
values for the supported type universe, exports, reads back through ≥1
independent reader, and asserts value + metadata equality. Shrink failing cases
to a minimal reproducer.
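
The generate / round-trip / shrink loop can be illustrated with a dependency-free toy. This is a sketch under heavy assumptions: a real harness would more likely build on an existing property-testing crate and exercise an actual export plus an independent reader. Here the "export" is a stand-in integer conversion, and `roundtrip_wide`, `roundtrip_narrow`, `check_roundtrip`, and `shrink` are hypothetical names.

```rust
/// Tiny deterministic PRNG (xorshift64*), so the sketch needs no crates.
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        Self(seed.max(1)) // the state must be non-zero
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
}

/// Stand-in for "export then read back": an unsigned 64-bit value widened to
/// i128, as a scale-0 Decimal128 would hold it.
pub fn roundtrip_wide(v: u64) -> u64 {
    let stored = v as i128;
    stored as u64
}

/// Deliberately lossy variant: the value is squeezed through a signed
/// 64-bit column and saturates above i64::MAX.
pub fn roundtrip_narrow(v: u64) -> u64 {
    let stored = i64::try_from(v).unwrap_or(i64::MAX);
    stored as u64
}

/// Shrink a failing value by retrying smaller candidates that still fail.
fn shrink(roundtrip: fn(u64) -> u64, mut failing: u64) -> u64 {
    loop {
        let mut improved = false;
        let mut delta = failing / 2;
        while delta > 0 {
            let candidate = failing - delta;
            if roundtrip(candidate) != candidate {
                failing = candidate;
                improved = true;
                break;
            }
            delta /= 2;
        }
        if !improved {
            return failing;
        }
    }
}

/// Run `cases` random inputs; on a mismatch, return the shrunk reproducer.
pub fn check_roundtrip(roundtrip: fn(u64) -> u64, seed: u64, cases: usize) -> Result<(), u64> {
    let mut rng = XorShift::new(seed);
    for _ in 0..cases {
        let v = rng.next_u64();
        if roundtrip(v) != v {
            return Err(shrink(roundtrip, v));
        }
    }
    Ok(())
}
```

The narrow variant fails for any input above `i64::MAX`, and shrinking reduces whatever random value tripped it to the boundary value `1 << 63`, which is the kind of minimal fixture the harness would check in.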

**Definition of done.** `make test-types-property` runs N generated schemas in CI
(PR tier: small N; nightly: large N) and any discovered mismatch fails the gate
with a minimized fixture checked into `tests/type_roundtrip/`.

---

## OPT-4 — MySQL safety parity (or explicit degraded mode)

**Priority: P1** — the headline "source-safe" property is asymmetric across engines.

**Problem.** PostgreSQL gets `DECLARE CURSOR` + `work_mem`-aware `FETCH N`
(longest query: 0.19 s). MySQL has no widely supported server-side cursor in the
current client stack, so safety rests on PK-range chunking (longest query: 9 s),
which *requires*
a clean monotonic numeric PK. On a MySQL table with composite / UUID / no PK the
"don't hold a long query, don't OOM" guarantee silently degrades to a buffered
read or an expensive global sort. The weaker engine is also the more common
"fragile shared prod" case in the SMB segment.

**Proposed change.**
- Either: pursue a streaming read shape on MySQL that bounds memory without a
  clean PK (e.g. keyset pagination on the best available index).
- Or (cheaper first step): make the unsafe shape an explicit `opt-in` —
  `rivet check` / `rivet run` refuses the buffered/sort path on MySQL unless the
  operator acknowledges the degraded guarantee, instead of falling into it.
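
Both options can be sketched together as a preflight decision plus a keyset page query. The enums, the `allow_degraded` flag, and both function names are hypothetical. Whether MySQL actually uses an index for the row-constructor comparison should be confirmed with `EXPLAIN` on the target version.

```rust
/// Hypothetical classification of the key available for chunking.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyShape {
    MonotonicNumericPk,
    /// Composite or UUID key backed by an index.
    OtherIndexedKey,
    NoUsableKey,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReadPlan {
    PkRangeChunks,
    KeysetPagination,
    BufferedDegraded,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PreflightError {
    DegradedModeNotAcknowledged,
}

/// Refuse the buffered path unless the operator opted in explicitly.
pub fn plan_mysql_read(key: KeyShape, allow_degraded: bool) -> Result<ReadPlan, PreflightError> {
    match key {
        KeyShape::MonotonicNumericPk => Ok(ReadPlan::PkRangeChunks),
        KeyShape::OtherIndexedKey => Ok(ReadPlan::KeysetPagination),
        KeyShape::NoUsableKey if allow_degraded => Ok(ReadPlan::BufferedDegraded),
        KeyShape::NoUsableKey => Err(PreflightError::DegradedModeNotAcknowledged),
    }
}

/// Build one keyset page query over the given key columns. Later pages bind
/// the last key seen to the `?` placeholders. Returns `None` without columns.
pub fn keyset_page_sql(
    table: &str,
    key_cols: &[&str],
    limit: usize,
    first_page: bool,
) -> Option<String> {
    if key_cols.is_empty() {
        return None;
    }
    let cols: Vec<String> = key_cols.iter().map(|c| format!("`{}`", c)).collect();
    let order = cols.join(", ");
    let filter = if first_page {
        String::new()
    } else {
        let marks = vec!["?"; key_cols.len()].join(", ");
        format!(" WHERE ({}) > ({})", order, marks)
    };
    Some(format!(
        "SELECT * FROM `{}`{} ORDER BY {} LIMIT {}",
        table, filter, order, limit
    ))
}
```

Each page is a short, bounded query, so the "don't hold a long query" property no longer depends on a numeric PK, provided the chosen key columns are unique and indexed.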

**Definition of done.** A MySQL table with no usable PK either exports under a
bounded memory budget, or fails preflight with a clear degraded-mode
acknowledgement requirement — never silently buffers the whole table.

---

## OPT-5 — Deterministic dedup token in the manifest

**Priority: P2** — makes "boring" boring all the way downstream.

**Problem.** I3 (Write-Before-Cursor) is correct, but a crash between write and
cursor advance produces a duplicate file, and the manifest carries no run-scoped
idempotency key / content hash a downstream `MERGE` can dedup on deterministically.
"Boring extraction" that needs a hand-written dedup recipe
(`recipes/idempotent-warehouse-load.md`) isn't fully boring.

**Proposed change.** Stamp each manifest entry (and optionally the filename) with
a deterministic content hash + `(chunk_id, cursor_range)` so downstream dedup is
mechanical, not convention-based. Pairs naturally with the existing xxHash3 used
in quality gates.
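
A minimal sketch of such a key, assuming the hashed bytes are themselves deterministic across re-extractions (for example, a hash over logical row content rather than over file bytes that may embed writer metadata). FNV-1a is used only as a dependency-free stand-in for xxHash3; `ManifestEntry`, its fields, and `drop_duplicates` are hypothetical.

```rust
use std::collections::HashSet;

const FNV_OFFSET: u64 = 0xCBF2_9CE4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01B3;

/// FNV-1a (64-bit), a stand-in for the xxHash3 mentioned above.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h = FNV_OFFSET;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hypothetical manifest entry. Note there is no run id or timestamp in the
/// key material, otherwise two runs over the same window would not collide.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestEntry {
    pub chunk_id: u64,
    pub cursor_start: i64,
    pub cursor_end: i64,
    pub content_hash: u64,
}

impl ManifestEntry {
    pub fn new(chunk_id: u64, cursor_start: i64, cursor_end: i64, content: &[u8]) -> Self {
        Self { chunk_id, cursor_start, cursor_end, content_hash: fnv1a64(content) }
    }

    /// Deterministic key: (chunk_id, cursor_range) plus the content hash.
    pub fn dedup_key(&self) -> String {
        format!(
            "{}:{}..{}:{:016x}",
            self.chunk_id, self.cursor_start, self.cursor_end, self.content_hash
        )
    }
}

/// Consumer side: keep the first entry per key, drop later duplicates.
pub fn drop_duplicates(entries: Vec<ManifestEntry>) -> Vec<ManifestEntry> {
    let mut seen = HashSet::new();
    entries
        .into_iter()
        .filter(|e| seen.insert(e.dedup_key()))
        .collect()
}
```

Two entries written by different runs for the same chunk, cursor range, and content produce the same key, so the consumer can drop the later one without inspecting the files.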

**Definition of done.** Two runs that re-extract the same window after a simulated
SIGKILL produce files whose manifest dedup keys collide, so a consumer can drop
the duplicate by key alone. Closes the §9.6.1 open *"I3 at-least-once dupe
scenario not covered by tests"* with an end-to-end SIGKILL-between-write-and-commit
test.

---

## OPT-6 — Two-engine maintenance debt + crash-matrix symmetry

**Priority: P2** — accepted debt (ADR-0010), but it compounds.

**Problem.** ADR-0010 honestly lists the cost: two retry loops, two progress UIs,
two error-aggregation paths. Every cross-cutting feature (graceful shutdown,
tracing, OPT-2 governor, OPT-5 dedup token) must be built twice or it drifts. The
highest-risk corner is SIGTERM / parquet-footer finalization in the subprocess
engine — the classic "killed mid-run leaves a corrupt parquet" regression site.

**Proposed change.**
- Audit `dev/CRASH_MATRIX.md` for **symmetry**: every crash scenario should run
  against both the in-process thread engine (`src/pipeline/chunked/exec.rs`) and
  the subprocess fan-out engine (`src/pipeline/parallel_children.rs`).
- Add the OPT-2/OPT-5 cross-cutting features to a shared layer where possible so
  they are not implemented twice.

**Definition of done.** The crash matrix asserts no corrupt/footerless output
file under SIGTERM/SIGKILL for *both* engines, and the gap (if any) is documented
as a known non-guarantee.
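
A cheap "footerless" assertion for the crash matrix could look like the sketch below. It relies only on the Parquet layout (4-byte `PAR1` magic at both ends, with a 4-byte little-endian footer length before the trailing magic) and does not parse the metadata, so it detects truncation rather than every form of corruption. The function name is hypothetical, and encrypted-footer files are out of scope.

```rust
use std::io::{self, Read, Seek, SeekFrom};

const MAGIC: &[u8; 4] = b"PAR1";

/// Structural check: leading magic, trailing magic, and a footer length that
/// fits between them. Reads only the first 4 and last 8 bytes.
pub fn has_parquet_footer<R: Read + Seek>(mut r: R) -> io::Result<bool> {
    let len = r.seek(SeekFrom::End(0))?;
    if len < 12 {
        // Too short to hold both magics and the footer length.
        return Ok(false);
    }

    let mut head = [0u8; 4];
    r.seek(SeekFrom::Start(0))?;
    r.read_exact(&mut head)?;

    let mut tail = [0u8; 8];
    r.seek(SeekFrom::End(-8))?;
    r.read_exact(&mut tail)?;

    let footer_len = u64::from(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]));
    Ok(head == *MAGIC && tail[4..] == MAGIC[..] && footer_len <= len - 12)
}
```

Running the same helper over the output directory after each SIGTERM/SIGKILL scenario, for both engines, gives the symmetry check a single shared assertion.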

---

## OPT-7 — Doc / roadmap drift sync (cheap, do first)

**Priority: P1** — near-zero effort, removes a credibility leak, and prevents
re-planning work that is already done.

**Findings.**
- **Checksums already ship, docs say otherwise.** `release.yml:153` generates
  `SHA256SUMS.txt`, and the v0.7.8 release publishes it as an asset — yet
  `SECURITY.md:165-169` still tells users *"until checksums and signatures are
  published, verify by rebuilding from source… `git checkout v0.6.0`"* (also a
  stale tag), and §5.1 (line ~579) marks "Release checksums" as `⏳ Open / P1`.
  The v0.7.8 release notes don't mention the checksums at all. → Add verification
  instructions to README/SECURITY, mention checksums in release notes, flip
  §5.1 + §9.7 to done.
- **Several §9.6.1 ⏳ items were already fixed in 0.7.8 but not struck:** retry-safe
  WARN demoted to DEBUG for local (line ~996), TLS warning now fires from
  `doctor`/`check` (~997), `rivet init` explains its mode choice inline (~1002),
  `chunk_size` scaled to row estimate (~1004), `status: skipped (no new rows since
  cursor X)` (~1005). The 0.7.8 release notes' `polish(ux)` bullet is the evidence.
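  → Strike the items that verify as shipped so the backlog stops overstating
  remaining work. Per the OPT-7 status row above, a follow-up check found that
  local-retry WARN, init mode-comment, and init `chunk_size` scaling are not
  shipped; those stay open.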

**Definition of done.** §5.1, §9.6.1, and §9.7 reflect the actual shipped state of
v0.7.8; README/SECURITY document checksum verification against the published
`SHA256SUMS.txt`.

---

## 10.2 Suggested execution order

By impact ÷ effort (revised after source audit — the memory cap already exists,
so OPT-1 drops from the top spot):

1. **OPT-7** (doc/roadmap sync) — hours, removes a trust leak, unblocks accurate planning.
2. **OPT-3** (property-based types) — most autonomous; immediately strengthens the trust story.
3. **OPT-2** (adaptive governor → parallelism) — the differentiating bet; builds on the existing adaptive loop.
4. **OPT-4** (MySQL parity / explicit degrade) — closes the engine asymmetry.
5. **OPT-1** (memory residual: per-value cap + probe warmup) — edge-case hardening on an existing cap.
6. **OPT-5 / OPT-6** — ergonomics + debt paydown, interleave as bandwidth allows.