drip-cli 0.1.1

Delta Read Interception Proxy — sends only file diffs to your LLM agent
# Benchmarks

This document collects measured numbers for DRIP: token savings on
realistic agent workloads, per-language compression rates, latency
budgets, and cost projections.

**All numbers are from real measurements** on production-grade source
files (roughly 500–850 lines each, 8 languages). Nothing is hand-tuned or
rounded favourably — if a workflow saves 31 %, the table prints 31 %.
Re-run any time:

```bash
cargo build --release
bash scripts/bench_multilang.sh                   # token savings + latency
bash scripts/verify_signatures.sh                 # signature preservation
python3 scripts/generate_benchmarks_md.py > BENCHMARKS.md
```

The fixtures live in `scripts/bench_fixtures/`; the JSON dumps in
`scripts/bench_output/`. Both are tracked in git so you can verify
the numbers without rerunning.

---
## Summary

Measured on 8 production-grade fixtures (roughly 500–850 lines each), 4
single-file agent workflows per language, 45 effective latency samples
(50 raw, 5 warmup discarded) per operation. Workflow rows show the **aggregate-by-tokens** ratio (the
sum-of-all-fixtures ratio that the cost projection in section 5
also uses); the per-fixture range is shown in parentheses.

| Metric                                                       | Value                                                            |
|--------------------------------------------------------------|------------------------------------------------------------------|
| Languages tested                                             | 8 of 8                                                            |
| First-read semantic compression                              | **42% simple average across fixtures** (35–62 % per fixture) — **41% aggregate by tokens** |
| Explore workflow (1 read + 1 same-session unchanged re-read) | **70% aggregate** (67–80 % per fixture)                                |
| Debug workflow (1 read + 4 same-session unchanged re-reads)  | **88% aggregate** (86–92 % per fixture)                                |
| Edit workflow (read + edit-cert + unchanged + edit-cert)     | **83% aggregate** (80–89 % per fixture)                                |
| Multi-edit workflow (3 edit cycles, 7 reads)                 | **48% aggregate** (47–51 % per fixture)                                |
| Glob hook (`.dripignore` on a synthetic noisy tree)         | **33% saved** (paths filtered, see section 3)                 |
| Grep hook (`.dripignore` on a synthetic noisy tree)         | **67% saved** (matches filtered, see section 3)               |
| Post-edit verification certificate (see section 4)         | **99% saved aggregate** (209,001 → 3,098 B across all 8 fixtures) |
| Latency tail (worst p99 across 45 samples per outcome)       | **59.6 ms**                                              |
| Memory (max RSS, 1 MB file)                                  | ~10 MB (constant)                                                |
| Signature / import / type preservation                       | **100 %** (all 8 fixtures, see section 7)                              |

DRIP's largest win is **avoiding repeated context reinjection of
files the agent has already seen**. First-read semantic compression
saves about **42%** on this fixture set; workflows that re-read
the same file save substantially more (Debug aggregates
**88%**) because subsequent reads return a minimal
unchanged sentinel rather than the full file content. The Edit row
measures DRIP's post-edit verification certificate path: when the
agent immediately re-reads a file after an edit, DRIP returns a
compact attestation (hash + touched ranges, ~390 B) instead of
reinjecting the full file. The Multi-edit row captures a broader
refactor loop where repeated edits + reads still cost more than a
pure unchanged sentinel, but each post-edit re-read still rides the
cert path. Both numbers are published as-measured rather than
massaged to inflate the headline.

---
## 1. Semantic compression on first reads

Reduction comes from signature-preserving elision: function bodies
are replaced with `{ ... }` while every signature, doc-comment,
import, and type/class declaration stays visible. The compressor is
conservative — when in doubt it keeps the body inline rather than
risk mangling output.

| Language   | Fixture                | Lines | Bytes   | Tokens (full → sent) | Reduction | Elided   | Hidden    |
|------------|------------------------|------:|--------:|---------------------:|----------:|---------:|----------:|
| Python     | `pricing_engine.py`   |   731 | 27,096 B | 6,774 → 2,575 | **62 %** | 15 fns | 426 lines |
| Rust       | `session_manager.rs`  |   582 | 22,915 B | 5,729 → 3,727 | **35 %** |  6 fns | 218 lines |
| TypeScript | `api_client.ts`       |   646 | 26,109 B | 6,528 → 4,057 | **38 %** |  6 fns | 278 lines |
| Java       | `UserRepository.java` |   744 | 30,106 B | 7,527 → 4,666 | **38 %** |  8 fns | 292 lines |
| Go         | `http_handler.go`     |   569 | 20,595 B | 5,149 → 2,822 | **45 %** |  9 fns | 285 lines |
| C++        | `json_parser.cpp`     |   691 | 23,984 B | 5,996 → 3,549 | **41 %** |  8 fns | 250 lines |
| C#         | `OrderService.cs`     |   659 | 27,728 B | 6,932 → 4,340 | **37 %** |  7 fns | 248 lines |
| Kotlin     | `DataRepository.kt`   |   840 | 30,260 B | 7,565 → 4,860 | **36 %** |  8 fns | 297 lines |

Variance across languages is real and expected: short bodies are kept
inline, dense docstring/JSDoc files compress harder, languages with
heavier ceremony (Java annotations, C# attributes) reduce less in
percentage terms because the headers are themselves verbose. None of
that is hidden — every value above is read directly from the
rendered first-read output that the agent would receive.

---
## 2. Token savings — four agent workflows

Each workflow runs independently in its own DRIP session, with the
same fixture file as the only file the agent touches. Token counts
come from DRIP's `bytes / 4` estimator (see section 10); the
percentages are derived from that estimator and should be read as
**trends on this fixture set**, not as a guarantee of any specific
tokenizer's exact savings.

### Workflow A — Explore (2 reads)

First read + 1 unchanged re-read **in the same DRIP session**. Tests the same-session unchanged path: DRIP recognises that the second read sees byte-identical content to the first and responds with a minimal unchanged sentinel — no file content is reinjected. Cross-session behaviour is *not* exercised here — both reads share `DRIP_SESSION_ID`.

| Language   | Reads | Without DRIP | With DRIP | Saved |
|------------|------:|-------------:|----------:|------:|
| Python     |     2 |       13,548 |     2,684 | **80 %** |
| Rust       |     2 |       11,458 |     3,836 | **67 %** |
| TypeScript |     2 |       13,056 |     4,163 | **68 %** |
| Java       |     2 |       15,054 |     4,776 | **68 %** |
| Go         |     2 |       10,298 |     2,929 | **72 %** |
| C++        |     2 |       11,992 |     3,656 | **70 %** |
| C#         |     2 |       13,864 |     4,447 | **68 %** |
| Kotlin     |     2 |       15,130 |     4,968 | **67 %** |
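
If you want to poke at this path by hand, the sketch below drives the
same Read hook twice in one session. The JSON payload shape is an
assumption modelled on Claude Code's PreToolUse hook stdin; the bench
script (`scripts/bench_multilang.sh`) is the authoritative driver.

```bash
# Manual replay of Workflow A (sketch — payload shape is an assumption,
# modelled on Claude Code's hook stdin JSON, not the bench's exact driver):
export DRIP_SESSION_ID=bench-explore-demo
payload='{"tool_name":"Read","tool_input":{"file_path":"scripts/bench_fixtures/pricing_engine.py"}}'
echo "$payload" | drip hook claude   # first read: elided rendering
echo "$payload" | drip hook claude   # second read: minimal unchanged sentinel
```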

### Workflow B — Debug (5 reads)

First read + 4 unchanged re-reads in the same session. Simulates the agent re-reading a single module while debugging.

| Language   | Reads | Without DRIP | With DRIP | Saved |
|------------|------:|-------------:|----------:|------:|
| Python     |     5 |       33,870 |     2,806 | **92 %** |
| Rust       |     5 |       28,645 |     3,958 | **86 %** |
| TypeScript |     5 |       32,640 |     4,282 | **87 %** |
| Java       |     5 |       37,635 |     4,901 | **87 %** |
| Go         |     5 |       25,745 |     3,051 | **88 %** |
| C++        |     5 |       29,980 |     3,778 | **87 %** |
| C#         |     5 |       34,660 |     4,569 | **87 %** |
| Kotlin     |     5 |       37,825 |     5,090 | **87 %** |

### Workflow C — Edit cycle (4 reads, 2 edits)

Read → edit (swap to v2, fire PostToolUse:Edit hook) → re-read (edit certificate) → re-read (unchanged) → revert (swap back to v1, fire hook again) → re-read (edit certificate). The cert path replaces what would otherwise be a native full-file shipment every time the post-edit re-read fires; `DRIP_CERT_DISABLE=1` reverts the workflow to the legacy passthrough path (each post-edit re-read then ships the full file natively, with `tokens_sent = tokens_full` accounted to match what the agent sees).

| Language   | Reads | Without DRIP | With DRIP | Saved |
|------------|------:|-------------:|----------:|------:|
| Python     |     4 |       27,500 |     2,938 | **89 %** |
| Rust       |     4 |       23,182 |     4,113 | **82 %** |
| TypeScript |     4 |       27,858 |     4,553 | **84 %** |
| Java       |     4 |       28,306 |     5,206 | **82 %** |
| Go         |     4 |       20,768 |     3,329 | **84 %** |
| C++        |     4 |       24,626 |     3,981 | **84 %** |
| C#         |     4 |       28,366 |     4,742 | **83 %** |
| Kotlin     |     4 |       27,742 |     5,545 | **80 %** |

### Workflow D — Multi-edit (7 reads, 3 edit cycles)

First read + 3 (edit + 2 re-reads) cycles. Simulates a refactor session where the agent reads, modifies, and re-reads the same file repeatedly.

| Language   | Reads | Without DRIP | With DRIP | Saved |
|------------|------:|-------------:|----------:|------:|
| Python     |     7 |       48,226 |    23,495 | **51 %** |
| Rust       |     7 |       40,635 |    21,374 | **47 %** |
| TypeScript |     7 |       49,188 |    25,576 | **48 %** |
| Java       |     7 |       49,085 |    25,642 | **48 %** |
| Go         |     7 |       36,387 |    18,633 | **49 %** |
| C++        |     7 |       43,256 |    22,371 | **48 %** |
| C#         |     7 |       49,800 |    25,966 | **48 %** |
| Kotlin     |     7 |       47,919 |    25,230 | **47 %** |

---
## 3. Non-read hooks: Glob, Grep

DRIP doesn't only intercept `Read`. Its `.dripignore`-aware Glob and
Grep hooks filter the agent's tool-call output. Both are measured
on a **synthetic noisy project tree** built by the bench (sources +
`.git/` + `target/` + `node_modules/` + `build/` + lock files).
Treat these as **representative scenarios** — Glob/Grep savings
depend on how much of your real repo matches the `.dripignore`
patterns.
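
As a concrete starting point, a `.dripignore` along these lines would
cover the noise baked into the synthetic tree. The gitignore-style
pattern syntax is an assumption here; check
`scripts/bench_non_read_hooks.sh` for the patterns the bench actually
installs.

```bash
# Illustrative .dripignore matching the synthetic tree's noise
# (pattern syntax assumed gitignore-style — verify against the bench script):
cat > .dripignore <<'EOF'
.git/
target/
node_modules/
build/
*.lock
EOF
```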

| Hook  | Scenario                                            | Without DRIP | With DRIP | Detail | Saved |
|-------|-----------------------------------------------------|-------------:|----------:|--------|------:|
| Glob  | `find -type f` over a synthetic project tree (sources + `.git/` + `target/` + `node_modules/` + `build/` + 3 lock files) | 2,427 B (28 paths) | 1,632 B (19 paths) | 9 paths dropped | **33%** |
| Grep  | `rg '\b(fn\|def\|func\|function\|public)\b'` over the same tree | 157,877 B (1,146 matches) | 51,801 B (346 matches) | 800 matches dropped | **67%** |

Each Glob and Grep row reflects a single tool-call result. Every DRIP
byte count is read live from the binary — no modeling.

**Latency** (45 effective samples per operation — 50 raw, 5 warmup
discarded; same methodology as section 6):

| Hook | Scenario | p50 (ms) | p95 (ms) | p99 (ms) |
|------|----------|---------:|---------:|---------:|
| Glob | filtered `find`             | 1.74 | 2.23 | 2.68 |
| Grep | filtered `rg` over the tree | 4.25 | 5.52 | 5.57 |

p50 / p95 were stable in local reruns; p99 has more variance and
is reported as a tail indicator, not a guarantee.

**Reading the numbers honestly.** Glob and Grep are real token-level
savings via `.dripignore` filtering. Their magnitude depends almost
entirely on how much of your repo lives under ignored paths
(`node_modules`, `target`, `.git`, lock files). The synthetic tree
above is loaded with that noise on purpose; an "all-source" repo
with a near-empty `.dripignore` would save much less.

Re-run with `bash scripts/bench_non_read_hooks.sh`. Raw output:
`scripts/bench_output/non_read_hooks.json`.

---
## 4. Post-edit certificates

When an agent edits a file and then immediately reads it back to verify
the change (a "must Read before Edit" pattern enforced by some tool
harnesses), DRIP returns a compact `[DRIP: edit verified | hash: …]`
certificate instead of letting the read fall through to a native
full-file shipment. The certificate carries the file hash, the touched
line ranges parsed from the diff hunks, and a refresh hint.

Without the certificate path, the post-edit re-read ships the entire
file contents (the harness ignores DRIP's deny-as-substitute responses
in this narrow window). With the certificate path on, the agent gets a
few hundred bytes of attestation instead. Disable it with
`DRIP_CERT_DISABLE=1`; tune the window with `DRIP_CERT_WINDOW_SECS`
(default 300 s).
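
A quick sketch of the two knobs, assuming the bench script forwards
environment variables to the hooks it drives:

```bash
# Re-run the cert bench with the certificate path off — each post-edit
# re-read then ships the full file natively (assumes the bench forwards env):
DRIP_CERT_DISABLE=1 bash scripts/bench_post_edit.sh

# Widen the post-edit window from the 300 s default to 10 minutes:
export DRIP_CERT_WINDOW_SECS=600
```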

| Language   | Fixture | Without DRIP | With cert | Saved |
|------------|---------|-------------:|----------:|------:|
| Python     | `pricing_engine.py` | 27,122 B | 389 B | **99%** |
| Rust       | `session_manager.rs` | 22,941 B | 391 B | **98%** |
| TypeScript | `api_client.ts` | 26,135 B | 381 B | **99%** |
| Java       | `UserRepository.java` | 30,132 B | 393 B | **99%** |
| Go         | `http_handler.go` | 20,621 B | 385 B | **98%** |
| C++        | `json_parser.cpp` | 24,010 B | 385 B | **98%** |
| C#         | `OrderService.cs` | 27,754 B | 385 B | **99%** |
| Kotlin     | `DataRepository.kt` | 30,286 B | 389 B | **99%** |
| **Aggregate** || 209,001 B | 3,098 B | **99%** |

The bench drives the actual hooks (`drip hook claude-post-edit` then
`drip hook claude`) so the numbers reflect what the agent really
sees. Re-run with `bash scripts/bench_post_edit.sh`. Raw output:
`scripts/bench_output/post_edit_cert.json`.

---
## 5. Cost projection

> **This is a linear projection for the measured file-read workload,
> not a prediction of total invoice savings.** In real sessions,
> tokens outside the measured file-read workload — system prompt,
> assistant output, test logs, lint output, build chatter, and other
> unmeasured tool output — dilute the percentage saved on the
> overall bill. Treat the figures below as a back-of-envelope sense
> of scale on file-read traffic only.

Extrapolating the **multi-edit** workflow (364,496 tokens
without DRIP, 188,287 tokens with DRIP — 48 % saved
across all 8 languages combined) to a solo developer
running 5 sessions / day, 22 work-days / month:

| Model                  | Price         | Without DRIP | With DRIP | Saved / month |
|------------------------|---------------|-------------:|----------:|--------------:|
| Claude Sonnet 4.6      | \$ 3.00/Mtok | \$ 120.28 | \$  62.13 | **\$  58.15** |
| Claude Opus 4.6        | \$15.00/Mtok | \$ 601.42 | \$ 310.67 | **\$ 290.74** |
| Claude Haiku 4.5       | \$ 1.00/Mtok | \$  40.09 | \$  20.71 | **\$  19.38** |
| GPT-5 (Codex)          | \$10.00/Mtok | \$ 400.95 | \$ 207.12 | **\$ 193.83** |
| Gemini 2.5 Pro         | \$ 2.50/Mtok | \$ 100.24 | \$  51.78 | **\$  48.46** |
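
Each row is pure arithmetic on the measured token totals; for example,
the Sonnet row reproduces as:

```bash
# 364,496 / 188,287 tokens per multi-edit session, 5 sessions/day × 22 days:
echo "scale=2; 364496 * 5 * 22 * 3.00 / 1000000" | bc   # → 120.28 (without DRIP)
echo "scale=2; 188287 * 5 * 22 * 3.00 / 1000000" | bc   # →  62.13 (with DRIP)
```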

To estimate your own case, point `drip meter --history` at a real
session — it surfaces dollar savings using whatever
`DRIP_PRICE_PER_MTOK` you configure, against your actual mix of
unchanged / delta / first-read traffic. The table above only
exists to give a back-of-envelope sense of scale.

---
## 6. Latency

End-to-end wall time including Rust process startup. The internal
DRIP work (DB lookup + diff) measures < 1 ms; the rest is the
roughly-flat ~5 ms cost of spawning the binary.

> **Sampling note.** Each cell is computed from **45 samples** (50 raw, first 5 discarded as warmup) per language × operation. p50 and p95 were stable in local reruns; p99 over 45 samples still has noticeable variance and should be read as a tail indicator, not a guarantee.

| Language   | First read (p50 / p95 / p99) | Unchanged (p50 / p95 / p99) | Delta (p50 / p95 / p99) |
|------------|-----------------------------:|----------------------------:|------------------------:|
| Python     |  5.37 /  6.72 /  7.25 |  5.08 /  5.84 /  6.27 |  5.41 /  5.96 /  9.75 |
| Rust       |  5.64 /  6.81 /  7.25 |  5.73 /  7.89 / 13.29 |  6.01 / 11.31 / 59.63 |
| TypeScript |  5.83 /  8.70 / 10.60 |  5.43 /  6.26 /  6.36 |  5.77 /  6.43 /  6.45 |
| Java       |  6.02 /  6.76 / 10.42 |  5.46 /  8.39 / 15.11 |  6.14 /  6.85 /  7.13 |
| Go         |  5.91 /  6.49 /  8.16 |  5.53 /  6.27 /  7.11 |  6.09 /  6.69 /  7.06 |
| C++        |  6.30 /  6.98 / 12.43 |  5.77 /  6.54 /  6.61 |  6.06 /  8.02 /  8.42 |
| C#         |  6.44 /  7.99 /  8.93 |  5.89 /  6.74 / 10.60 |  6.18 /  6.89 /  7.50 |
| Kotlin     |  7.03 / 10.47 / 23.58 |  5.91 /  6.65 /  8.32 |  6.88 /  7.82 /  8.28 |

All values in milliseconds. **Worst observed tail across every
language and operation: 59.63 ms** (out of 1080
samples total — 45 per outcome × 3 outcomes × 8 languages).
Medians cluster around 6–7 ms, p95s around 7–11 ms — the regime
that actually matters for the perceived hook latency. Numbers
were taken on macOS arm64 (Apple Silicon); we have not run an
equivalent multi-language bench on Linux x86_64, so cross-platform
deltas are not claimed here.
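
To grab a single sample by hand (payload shape as in the Workflow A
sketch in section 2 — an assumption; the bench's timing loop is the
source of truth):

```bash
payload='{"tool_name":"Read","tool_input":{"file_path":"scripts/bench_fixtures/http_handler.go"}}'
echo "$payload" | drip hook claude > /dev/null           # prime: first read
time (echo "$payload" | drip hook claude > /dev/null)    # sample: unchanged path
```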

---
## 7. Signature / import / type preservation — 100 %

For every fixture the verifier extracts signature lines, type
declarations (`class`/`struct`/`enum`/`interface`/`trait`/`record`),
and imports from the original file, then asserts each one appears
**verbatim** (modulo leading whitespace) in DRIP's rendered
first-read output.

| Language   | Signatures | Type decls | Imports | Result |
|------------|-----------:|-----------:|--------:|:------:|
| Python     | 20/20 | 10/10 |   9/9 | ✓ |
| Rust       | 19/19 |   8/8 | 12/12 | ✓ |
| TypeScript | 18/18 | 10/10 |   4/4 | ✓ |
| Java       | 52/52 |   4/4 | 54/54 | ✓ |
| Go         | 16/16 |   9/9 |   1/1 | ✓ |
| C++        | 36/36 |   8/8 | 14/14 | ✓ |
| C#         | 12/12 | 14/14 |   8/8 | ✓ |
| Kotlin     | 37/37 | 11/11 | 13/13 | ✓ |

**All languages pass: every function/method signature, every type declaration, and every import line in the original file appears verbatim in the rendered first-read output.**

**What this check does and doesn't claim.** It shows that the
agent's *structural surface* — every callable name, every type
declaration, every import — appears verbatim in the rendered
output. It does **not** claim that no information was lost:
function bodies are deliberately elided behind a `[DRIP-elided:
N lines, run drip refresh for full]` placeholder. If the agent
needs the elided code, it asks for it — `drip refresh <file>`
re-serves the full content on the next read. Run the verifier
yourself with `bash scripts/verify_signatures.sh`.
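
For a quick by-hand spot-check of one fixture (top-level declarations
only — the real verifier also handles indented methods and leading
whitespace):

```bash
# Assumes the documented <lang>_first_read.txt dump from section 9:
grep -E '^(def|class|import|from) ' scripts/bench_fixtures/pricing_engine.py |
  while IFS= read -r line; do
    grep -qF "$line" scripts/bench_output/python_first_read.txt \
      || echo "MISSING: $line"
  done
```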

---
## 8. Current benchmark scope

This document deliberately stays small and reproducible. Here is
what the numbers above actually cover, and — explicitly — what
they don't.

**What this benchmark measures.**
- Single-file, same-session agent workflows (Explore / Debug /
  Edit / Multi-edit) on 8 production-grade fixtures.
- First-read semantic compression (raw bytes → rendered bytes).
- Same-session unchanged re-reads, where DRIP returns a minimal
  unchanged sentinel and **does not reinject file contents**.
- Same-session delta size after a real edit (v1 ↔ v2 swaps).
- Hook + process latency, 45 samples per operation (50 raw, 5 warmup discarded).
- Verbatim preservation of every signature / type declaration /
  import line (section 7).
- Non-read hooks: Glob and Grep `.dripignore` filtering (section 3).
- Post-edit verification certificates, where an immediate read after an edit is replaced by a compact hash + range attestation (section 4).

**What it does *not* measure yet.**
- Real-world sessions: long-running, multi-file, multi-day workflows
  on a real agent (Claude Code / Codex / Gemini). DRIP is too
  young — we want to ship the tool first, then publish those
  numbers when they exist.
- Cross-session behaviour: the cross-session registry is exercised
  by integration tests but is *not* benchmarked here (every
  workflow uses a single, isolated `DRIP_SESSION_ID`).
- Compression of agent-side outputs (test runner output, build
  logs, lint output, other non-read tool results) — not covered by
  this benchmark.
- Multi-agent / parallel workflows.
- Task success rate (does the agent reach the goal faster, with
  fewer wrong turns?). This is the most interesting question to
  answer next; it requires running a real benchmark like SWE-bench
  with and without DRIP, which is on the roadmap and not done yet.
- Statistically robust latency (p99 from 1,000+ samples) — see
  the sampling note in section 6.

These aren't fatal gaps; they're the next benchmarks we want to
publish, in priority order. **None of the numbers in this file
should be read as a claim about any of the items in the second
list.**

---
## 9. Reproducibility

Everything in this file is generated from raw measurements committed
to the repo. To re-run from scratch:

```bash
git clone https://github.com/drip-cli/drip
cd drip
cargo build --release

# Token savings + latency  (~30 s on Apple Silicon)
bash scripts/bench_multilang.sh

# Signature preservation
bash scripts/verify_signatures.sh

# Non-read hooks: Glob / Grep filtering  (~15 s)
bash scripts/bench_non_read_hooks.sh

# Post-edit verification certificates  (~10 s)
bash scripts/bench_post_edit.sh

# Regenerate this file
python3 scripts/generate_benchmarks_md.py > BENCHMARKS.md
```

To rerun a single language:

```bash
LANGS=python bash scripts/bench_multilang.sh
LANGS=python bash scripts/verify_signatures.sh
```

Per-language raw output lives in `scripts/bench_output/<lang>.json`
(with `<lang>_first_read.txt` showing the literal bytes DRIP returns
on a first read).
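
The dumps are plain JSON, so they're easy to inspect before scripting
against them (the field layout isn't documented here — look before
you parse):

```bash
jq . scripts/bench_output/python.json | head -n 40
```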

---
## 10. Methodology notes

**Token estimator.** DRIP uses `bytes / 4` (rounded up) so the
benchmark is portable and doesn't pin to any one model's
tokenizer. The headline percentages should be read as **relative
trends on this fixture set**, not as guaranteed savings against a
specific tokenizer. Real BPE tokenizers (`cl100k_base`, GPT-5,
Claude, Gemini) vary in how they split punctuation, identifiers,
and whitespace, which can move the absolute numbers by
single-digit percentages in either direction. A future version
may add per-model tokenizer plug-ins for tighter accounting; we
haven't shipped that yet.
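
The estimator is small enough to check inline; the Python fixture's
full-file count from section 1 falls out directly:

```bash
bytes=27096                      # pricing_engine.py, section 1
echo $(( (bytes + 3) / 4 ))      # → 6774 tokens, matching the table
```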

**DB state.** Each workflow runs in a fresh, named DRIP session
(`DRIP_SESSION_ID=bench-...`) so cross-session registry effects
don't bleed between benchmarks. The session is reset between
workflows; latency samples use independent sessions for first-read
to avoid interference from cached baselines.

**Process startup overhead.** A non-trivial fraction of the measured
wall-time is Rust binary startup (~5 ms). The internal diff path is
~1 ms. Future work: optional daemon mode behind a feature flag.

**Hardware.** Latency was taken on macOS arm64 (Apple Silicon).
We have not yet run an equivalent multi-language bench on Linux
x86_64; the older single-language `scripts/bench.sh` tracked
within ±15 % on the same operations, but that's not a measurement
of *this* fixture set and shouldn't be quoted as one.

**Where DRIP doesn't help much.**
- *Tiny files* (< 50 lines): function bodies are too short to elide;
  a re-read still hits the unchanged-path savings, but first-read
  compression is near zero.
- *Files in constant churn*: every read produces a new diff. DRIP
  still wins over the no-cache counterfactual but the gap narrows.
- *Languages where bodies are dominated by signatures themselves*
  (heavy Java annotations, dense XML doc): the percentage reduction
  on first read is lower than for languages with longer bodies.

**Why first-read reduction varies by language.**
Even after compressor tuning, languages with verbose structural
surface — annotations, generics, doc comments, imports, or
compact bodies — have less removable body text as a percentage of
the file. *Rust*, *TypeScript*, *Java*, *C#*, and *Kotlin* all
cluster in the mid-30s % range because their *signatures* consume
a large fraction of the total bytes: even when bodies elide
cleanly the maximum achievable reduction is lower than for *Python*
or *Go* where docstring-heavy bodies dominate. Re-read savings
(workflows B–D) are unaffected — the unchanged path is independent
of first-read elision.

**Workload limits.** The 4 workflows here are micro-benchmarks
constructed to be reproducible in seconds. Real agent sessions
combine all four patterns and add Edit / Write / Glob / Grep
traffic that DRIP's other hooks intercept (`drip meter --history`
shows the live mix on your machine). We do **not** claim that
real-session numbers will be equal to or better than what's
tabulated here — that claim requires running DRIP on long-form
agent traces, which is on the roadmap (see section 8) and not done yet.

---

*Generated by `scripts/generate_benchmarks_md.py` from JSON dumps in
`scripts/bench_output/`. Last run: see git log.*