# Row-size RSS sweep — rivet vs ingestr
Generated by `harness/sweep.py` (driven by `../scenarios.yaml`). The point isn't a
single throughput number — it's the **shape of the memory curve** as the row count
grows, measured the same way on the same machine.
## Method
- Fixture: the wide 20-column `content_items`, sliced to each scale with `CREATE
TABLE … AS SELECT … LIMIT n`.
- Tools: **rivet 0.14.0** (release binary, `mode: full` → local Parquet/snappy) vs
**ingestr 1.0.43** (`postgres → parquet`, its default 100k-row Arrow batches).
- Per (scale, tool): **1 warmup run discarded, then the median of 3 measured runs.**
- Wall + peak RSS via `/usr/bin/time -l` (external, not the tool's self-report).
- Box: macOS arm64 (a head-to-head — same machine, not the published Linux bench).
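The measurement protocol above can be sketched roughly as follows. This is not the code in `harness/sweep.py`: the names `parse_time_l`, `summarize`, and `BYTES_PER_MB` are hypothetical, and the sketch assumes the macOS `/usr/bin/time -l` layout, which prints a `real` line in seconds and a `maximum resident set size` line in bytes.

```python
import re
import statistics

# Assumption: decimal megabytes. The harness may well use MiB instead.
BYTES_PER_MB = 1_000_000


def parse_time_l(stderr_text):
    """Return (wall_seconds, peak_rss_bytes) from `/usr/bin/time -l` stderr."""
    real = re.search(r"([\d.]+)\s+real", stderr_text)
    rss = re.search(r"(\d+)\s+maximum resident set size", stderr_text)
    if real is None or rss is None:
        raise ValueError("unrecognised /usr/bin/time -l output")
    return float(real.group(1)), int(rss.group(1))


def summarize(runs, warmup=1):
    """Discard the first `warmup` runs, then take the median wall and median RSS.

    `runs` is a list of (wall_seconds, peak_rss_bytes) tuples in run order.
    """
    measured = runs[warmup:]
    wall = statistics.median(w for w, _ in measured)
    rss_mb = statistics.median(r for _, r in measured) / BYTES_PER_MB
    return wall, rss_mb
```

With one warmup and three measured runs per (scale, tool), `summarize` receives four tuples and ignores the first. Taking the median rather than the mean keeps a single noisy run from moving the reported figure.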
## Result
| Rows | Tool | Wall (s) | Peak RSS (MB) | RSS vs rivet |
|---:|---|---:|---:|---:|
| 100,000 | rivet | 3.0 | **47** | — |
| 100,000 | ingestr | 3.1 | 882 | **19×** |
| 500,000 | rivet | 14.4 | **56** | — |
| 500,000 | ingestr | 10.3 | 1322 | **24×** |
| 1,000,000 | rivet | 30.1 | **70** | — |
| 1,000,000 | ingestr | 19.6 | 1261 | **18×** |
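The ratio column can be rechecked from the two numeric columns of the table (wall time, then peak RSS). The sketch below copies those values verbatim; the `RESULTS` dictionary and the `ratios` helper are illustrative names, not part of the harness.

```python
# (rows, tool) -> (wall time, peak RSS), copied from the table above.
RESULTS = {
    (100_000, "rivet"): (3.0, 47),
    (100_000, "ingestr"): (3.1, 882),
    (500_000, "rivet"): (14.4, 56),
    (500_000, "ingestr"): (10.3, 1322),
    (1_000_000, "rivet"): (30.1, 70),
    (1_000_000, "ingestr"): (19.6, 1261),
}


def ratios(results):
    """Per scale: (ingestr RSS / rivet RSS, rivet wall / ingestr wall)."""
    out = {}
    for rows in sorted({r for r, _ in results}):
        rivet_wall, rivet_rss = results[(rows, "rivet")]
        ingestr_wall, ingestr_rss = results[(rows, "ingestr")]
        out[rows] = (
            round(ingestr_rss / rivet_rss),       # memory gap, whole multiples
            round(rivet_wall / ingestr_wall, 2),  # how much slower rivet is
        )
    return out


for rows, (rss_x, wall_x) in ratios(RESULTS).items():
    print(f"{rows:>9,}  RSS {rss_x}x  wall {wall_x}x")
```

The memory ratios come out at 19, 24, and 18, matching the table. The wall ratios are about 0.97, 1.40, and 1.54.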
## Read it honestly
- **Memory: rivet uses ~18–24× less RAM at every scale.** Both curves are roughly
*flat* with row count — rivet because it works to a byte budget, ingestr because
its peak is one fixed 100k-row batch. So the gap is **structural and constant**,
not a small-data artifact: rivet wins ~20× at 100k *and* at 1M.
- **This sweep does NOT show ingestr "climbing"** — and it shouldn't. ingestr's RSS
scales with row **width × batch_rows**, not row count. The follow-up that shows it
*diverging* is a **width sweep** (narrow → wide fixtures), not this row-count one.
- **Wall: ingestr is ~1.4–1.5× faster at 500k / 1M** (native pgx + big Arrow batches),
and tied at 100k. We don't hide it; it's a different trade-off: rivet spends up to ~1.5× the
wall time to hold RAM ~20× lower and width-independent.
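The "width × batch_rows, not row count" point can be made concrete with a back-of-envelope model. This is only a sketch under loud assumptions: `peak_batch_bytes` is a hypothetical helper, the row widths are made-up illustrative values (not measured from `content_items`), and the model counts only the raw payload of one in-flight batch, so it does not predict actual RSS.

```python
def peak_batch_bytes(row_bytes, total_rows, batch_rows=100_000):
    """Raw payload of the largest in-flight batch, ignoring all runtime overhead."""
    return row_bytes * min(total_rows, batch_rows)


# Hypothetical row widths in bytes; illustrative only.
for row_bytes in (200, 1_000, 4_000):
    per_scale = [
        peak_batch_bytes(row_bytes, n) for n in (100_000, 500_000, 1_000_000)
    ]
    # Print the batch payload in MB for each row-count scale.
    print(row_bytes, [b // 1_000_000 for b in per_scale])
```

Each width prints the same figure at all three scales, which is why a row-count sweep looks flat for a fixed-batch reader. Changing the width moves the figure linearly, which is the effect a width sweep would expose.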
## Caveats
Measured on a single macOS box (not the Linux bench machine). rivet uses `mode: full` here for a
single-file, apples-to-apples comparison with ingestr; `mode: chunked` (byte budget) lands a
touch lower still. To extend the curve, re-run with `10_000_000` in `scenarios.yaml` on a box
with enough disk for it (~34 GB of wide rows).