oxidelta 0.1.4

VCDIFF (RFC 3284) delta encoder/decoder — Rust reimplementation of xdelta3
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
# Xdelta Repository Analysis

## Scope and Structure

This workspace has two layers:

- `src/main.rs`: a minimal Rust placeholder (`Hello, world!`).
- `xdelta/xdelta3/*`: the real C implementation (library + CLI) of Xdelta3/VCDIFF.

The analysis below focuses on `xdelta/xdelta3`, especially:

- `xdelta/xdelta3/xdelta3.c` (core engine and encoder)
- `xdelta/xdelta3/xdelta3-decode.h` (decoder state machine)
- `xdelta/xdelta3/xdelta3-main.h` (CLI, file I/O, external compression, recode/merge tools)
- `xdelta/xdelta3/xdelta3-hash.h` (rolling/hashed matching primitives)
- `xdelta/xdelta3/xdelta3-{djw,fgk,lzma}.h` (secondary compressors)
- `xdelta/xdelta3/xdelta3-blkcache.h` (source block cache and getblk callback glue)

---

## High-Level Architecture

Xdelta3 is organized around a re-entrant stream object (`xd3_stream`) that drives two state machines:

- Encode state machine (`ENC_INIT -> ENC_INPUT -> ENC_SEARCH -> ENC_INSTR -> ENC_FLUSH -> ENC_POSTOUT -> ENC_POSTWIN`) in `xdelta/xdelta3/xdelta3.c:3071`.
- Decode state machine (`DEC_VCHEAD -> ... -> DEC_FINISH`) in `xdelta/xdelta3/xdelta3-decode.h`.

CLI (`xdelta3-main.h`) wraps these APIs and adds:

- File I/O abstraction across POSIX/stdio/Win32.
- Optional external (subprocess-based) decompression/recompression.
- VCDIFF tools (`printhdr`, `printhdrs`, `printdelta`, `recode`, `merge`).
- App-header defaults and command-line parsing.

Core flow:

1. Configure stream (`xd3_config_stream`) with matching/compression flags.
2. Feed input windows (`xd3_avail_input` + `xd3_encode_input` / `xd3_decode_input`).
3. Handle return codes (`XD3_INPUT`, `XD3_OUTPUT`, `XD3_GETSRCBLK`, `XD3_GOTHEADER`, `XD3_WINSTART`, `XD3_WINFINISH`).
4. Consume output, then close/free stream.

---

## Core Algorithms

## 1) Rolling Hash and Match Discovery

### Small checksum path (target-to-target copies)

- Implemented in `xdelta/xdelta3/xdelta3-hash.h`.
- Uses a 4-byte read (`UNALIGNED_READ32`) multiplied by an LCG constant (`xd3_hash_multiplier32`).
- Update function shifts by one byte (`xd3_small_cksum_update`); note `look` is currently hardcoded effectively to 4 (comment in file).
- Hash bucket index is computed by `xd3_checksum_hash`.
- Candidate chains are traversed in `xd3_smatch` (`xdelta/xdelta3/xdelta3.c:4170`) using `small_prev` linked history (bounded by `small_chain` or `small_lchain`).

### Large checksum path (target-to-source copies)

- Polynomial/Rabin-Karp style rolling checksum in `xdelta/xdelta3/xdelta3-hash.h`.
- `xd3_large*_cksum`: weighted sum using precomputed powers.
- `xd3_large*_cksum_update`: rolling update using multiplier, outgoing byte, incoming byte.
- Source hash table is built incrementally by `xd3_srcwin_move_point` (`xdelta/xdelta3/xdelta3.c:4331`), which the source comments explicitly call one of the most expensive paths.

### Match expansion

- Source matches are validated/extended by `xd3_source_match_setup` + `xd3_source_extend_match` (`xdelta/xdelta3/xdelta3.c:3688`, `xdelta/xdelta3/xdelta3.c:3889`).
- Expansion supports backward and forward growth across source blocks, potentially re-entering via `XD3_GETSRCBLK`.
- Target self-matches are handled by `xd3_smatch`.
- Run-length detection exists via `xd3_comprun` + `xd3_emit_run`.

### Matching strategy profiles

Template-generated matchers are defined in `xdelta/xdelta3/xdelta3-cfgs.h`:

- `fastest`, `faster`, `fast`, `default`, `slow`, and `soft`.
- Tunables include `LLOOK`, `LSTEP`, `SCHAIN`, `SLCHAIN`, `MAXLAZY`, `LONGENOUGH`.
- CLI compression level maps to these profiles (see CLI section).

---

## 2) Instruction Selection and Encoding

### IOPT buffer and 1.5-pass optimization

- Candidate COPY/RUN instructions are queued in an instruction optimization buffer (`iopt_used`/`iopt_free`).
- Overlap resolution and pruning happen in `xd3_iopt_flush_instructions` (`xdelta/xdelta3/xdelta3.c:2313`).
- `xd3_iopt_erase` removes covered instructions when backward source extension finds better ranges.
- ADD instructions are synthesized around unmatched gaps (`xd3_iopt_add` / `xd3_iopt_add_finalize`).

### Code-table encoding and double-instruction packing

- Default RFC3284 table is generated from `__rfc3284_code_table_desc` in `xdelta/xdelta3/xdelta3.c:813`.
- Instruction emission chooses single or packed double opcodes (`xd3_emit_single`, `xd3_emit_double`) based on `xd3_choose_instruction`.
- Header/data/inst/addr are emitted into separate output chains, then concatenated for output.

### Address compression

- Address mode selection (`SELF`, `HERE`, `NEAR`, `SAME`) is done in `xd3_encode_address` (`xdelta/xdelta3/xdelta3.c:1276`).
- It greedily selects the encoding that minimizes integer byte length for the address delta.
- `xd3_update_cache` maintains the NEAR and SAME caches.

---

## 3) Compression Layers

### Primary (VCDIFF) encoding

- ADD/COPY/RUN instruction stream plus data/address side sections.
- Windowed encoding to keep memory bounded for normal encode/decode.

### Secondary compression (optional, per section)

- Implemented through `xd3_sec_type` and `xd3_encode_secondary`/`xd3_decode_secondary`.
- Supported algorithms (compile-time dependent):
  - DJW static/semi-adaptive Huffman (`xdelta3-djw.h`)
  - FGK adaptive Huffman (`xdelta3-fgk.h`)
  - LZMA (`xdelta3-lzma.h`)
- Can be enabled per section (DATA/INST/ADDR) using flags.

### External compression (CLI convenience layer)

- Detects compressed inputs by magic numbers and forks decompressor pipelines in `xdelta3-main.h`.
- Recompression can be applied on output in decode flows.
- Supported external types table (`extcomp_types`):
  - `bzip2`, `gzip`, `compress/uncompress`, `xz`

---

## VCDIFF Format Implementation Details

## Header and Window Fields

Decoder in `xdelta/xdelta3/xdelta3-decode.h` parses:

- File magic/version:
  - bytes: `0xD6 0xC3 0xC4 0x00`
- Header indicator bits:
  - `VCD_SECONDARY`, `VCD_CODETABLE`, `VCD_APPHEADER`
- Optional secondary ID
- Optional code-table block metadata
- Optional application header
- Per-window fields:
  - window indicator (`VCD_SOURCE`, `VCD_TARGET`, `VCD_ADLER32`)
  - copy window len/offset
  - encoding len
  - target window len
  - delta indicator (`VCD_DATACOMP`, `VCD_INSTCOMP`, `VCD_ADDRCOMP`)
  - data/inst/addr section lengths
  - optional Adler-32

## Variable-length integer coding

- Size/offset integers use base-128 varint style helpers in `xdelta3-internal.h` (`xd3_decode_size`, `xd3_emit_size`, etc.).
- Overflow checks are enforced during parse (`USIZE_T_OVERFLOW`, `XOFF_T_OVERFLOW`).

## Address modes and cache

- Mode decode/encode uses RFC NEAR/SAME cache semantics (`xd3_decode_address`, `xd3_encode_address`).
- Cache state is reset per window (`xd3_init_cache`) and updated per COPY.

## Section decode strategy

- Decoder can avoid copies when all section bytes are already contiguous in `avail_in` (`xd3_decode_section` zero-copy branch).
- Otherwise it allocates section staging buffers (`copied1`) and optionally second buffers after secondary decompression (`copied2`).

## Checksums

- Window checksum algorithm is Adler-32 only.
- Encode: optional per-window checksum emit.
- Decode: optional checksum verification; can be disabled with `XD3_ADLER32_NOVER`/`-n`.

## Explicitly unsupported or partial spec areas

- `VCD_CODETABLE` decode path returns unimplemented (`"VCD_CODETABLE support was removed"`).
- `VCD_TARGET` output reconstruction is marked unimplemented in decode setup.
- Default table encode path is fully implemented; arbitrary custom-table decode is not available in this branch.

---

## CLI Commands, Flags, and Behaviors

Parser is in `xdelta/xdelta3/xdelta3-main.h` and is custom (not `getopt`), with optional-argument support for `-A` and `-S`.

## Commands

- `encode` (default when encoder compiled)
- `decode`
- `config`
- `test` (only if compiled with regression tests)
- `printhdr`
- `printhdrs`
- `printdelta`
- `recode`
- `merge`

## Standard flags

| Flag | Behavior |
|---|---|
| `-0..-9` | Compression level (maps to matcher profile; `-0` also forces `XD3_NOCOMPRESS`). |
| `-c` | Force stdout output. |
| `-d` | Decode mode (equivalent to `decode` command). |
| `-e` | Encode mode (equivalent to `encode` command). |
| `-f` | Overwrite output; also used in external-compression force/garbage handling paths. |
| `-F` | Pass `-f` to external compressor/decompressor subprocesses (`EXTERNAL_COMPRESSION` builds only). |
| `-h` | Help/usage output. |
| `-q` | Quiet mode (suppresses verbose output). |
| `-v` | Increase verbosity (repeatable). |
| `-V` | Version output. |

## Memory/window flags

| Flag | Behavior |
|---|---|
| `-B bytes` | Source window size (`option_srcwinsz`), bounded by `[XD3_MINSRCWINSZ, XD3_MAXSRCWINSZ]`. |
| `-W bytes` | Input window size (`winsize`), bounded by `[XD3_ALLOCSIZE, XD3_HARDMAXWINSIZE]`. |
| `-P size` | Duplicate-window tracking size (`sprevsz`), affects small-match chain memory. |
| `-I size` | IOPT instruction buffer entries (`0` means unlimited growth mode). |

## Compression and metadata flags

| Flag | Behavior |
|---|---|
| `-s source` | Source file path for delta mode. |
| `-S [lzma\|djw\|fgk\|none]` | Secondary compression select; no argument disables secondary. If `-S` is omitted entirely, CLI defaults to LZMA when compiled, otherwise none. `djwN` (`N=0..9`) tunes section application aggressiveness. |
| `-N` | Disable small-string matching/target-side compression (`XD3_NOCOMPRESS`). |
| `-D` | Disable external input decompression (CLI wrapper feature). |
| `-R` | Disable external output recompression (CLI wrapper feature). |
| `-n` | Disable checksum generation/verification behavior. |
| `-C a,b,c,d,e,f,g` | Soft matcher tuning values (7 integers), undocumented/advanced. |
| `-A [apphead]` | With argument: set app header string; with no arg: disable app header. |
| `-J` | No output writes; decode/compute/check only. |
| `-m file` | Additional merge input (used by `merge` command). |

## Parsing and positional behavior details

- Positional arguments: up to two (`input`, `output`).
- If `input` missing: stdin.
- If `output` missing: derived later (appheader/default/stdout flow).
- `-c` overrides positional output filename.
- `XDELTA` environment variable is prepended as extra args (space-split, no shell-like quote parser).
- Optional arg forms accepted for `-A`/`-S`: `-Afoo`, `-A foo`, `-A=foo`, and no-arg variants.

## Command-specific execution behavior

- `encode`: uses `xd3_encode_input`.
- `decode`: uses `xd3_decode_input`.
- `printhdr`/`printhdrs`/`printdelta`: decode-only introspection with `XD3_JUST_HDR`/`XD3_SKIP_WINDOW`/`XD3_SKIP_EMIT`.
- `recode`: decodes each window and re-emits with updated appheader/secondary/checksum settings.
- `merge`: loads/combines whole-delta instruction streams in memory, then re-encodes.

## Notes on CLI docs vs code

- `xdelta3.1` includes `-T` (alternate code table), but parser in this tree does not implement `-T`.
- Option string still contains legacy letters (`E`, `L`, `O`, `M`) without switch handlers.

---

## Compression Level Mapping

CLI level to matcher config (`xdelta3-main.h`):

- `0`: `XD3_NOCOMPRESS` + `XD3_SMATCH_FASTEST`
- `1`: `XD3_SMATCH_FASTEST`
- `2`: `XD3_SMATCH_FASTER`
- `3..5`: `XD3_SMATCH_FAST`
- `6`: `XD3_SMATCH_DEFAULT`
- `7..9`: `XD3_SMATCH_SLOW`

Template profiles are defined in `xdelta/xdelta3/xdelta3-cfgs.h`.

---

## Performance-Critical Paths

These paths dominate CPU or end-to-end latency:

1. Source checksum indexing and source-window movement:
   - `xd3_srcwin_move_point` (`xdelta3.c:4331`)
2. Small-match candidate scan:
   - `xd3_smatch` (`xdelta3.c:4170`)
3. Source match extension across blocks:
   - `xd3_source_extend_match` (`xdelta3.c:3889`)
4. IOPT overlap compaction and emission:
   - `xd3_iopt_flush_instructions` (`xdelta3.c:2313`)
5. Address mode selection per COPY:
   - `xd3_encode_address` (`xdelta3.c:1276`)
6. Decode instruction execution:
   - `xd3_decode_output_halfinst` (`xdelta3-decode.h:401`) (commented as decoder hotspot)
7. Secondary compressors (especially DJW/FGK):
   - `xdelta3-djw.h`, `xdelta3-fgk.h`
8. External compression process setup:
   - multiple `fork/pipe/dup/exec` stages in `xdelta3-main.h`

---

## Data Structures and Relationships

## Core runtime graph

`xd3_stream` is the center object (`xdelta3.h:868`) and owns references/state for:

- Source context: `xd3_source *src`
- Hash configs/tables: `large_hash`, `small_hash`, `large_table`, `small_table`, `small_prev`
- Address cache: `xd3_addr_cache acache`
- Encoder instruction queues: `iopt_used`, `iopt_free`, `iopt_alloc`, `iout`
- Output section lists: `enc_heads[HDR/DATA/INST/ADDR]`, `enc_tails[]`, `enc_free`
- Decoder sections/buffers: `data_sect`, `inst_sect`, `addr_sect`, `dec_buffer`, `dec_lastwin`
- Secondary compression streams: `sec_stream_d`, `sec_stream_i`, `sec_stream_a`
- Whole-delta state for merge/recode tools: `whole_target`

## Supporting structures

- `xd3_rinst`: pending instruction record in IOPT queue.
- `xd3_output`: paged output block (`base`, `next`, `avail`, `next_page`).
- `xd3_desect`: decoded section view + allocated backing.
- `xd3_hash_cfg`: rolling-hash parameters and precomputed powers.
- `xd3_addr_cache`: NEAR and SAME caches.
- `xd3_whole_state`: fully materialized instruction/add buffers for merge/recode.

## CLI-side data structures

- `main_file`: OS-abstracted file handle + metadata/compressor context.
- `main_blklru` (blkcache): source block cache entries sharing one large allocation.
- `main_extcomp`: external compressor/decompressor command metadata.

---

## Memory Management Strategies

## Library memory strategy

- Allocation API is pluggable via callbacks (`config.alloc`, `config.freef`).
- Default library allocator is `malloc/free`.
- Many structures are lazy-allocated:
  - Hash tables and checksum tables allocated on first real matching use.
  - Decoder section buffers allocated only when section copy is required.
- Output uses fixed-size pages (`XD3_ALLOCSIZE`) linked as chains.
- IOPT can be fixed-size or unlimited:
  - `iopt_size > 0`: fixed queue, flushes when full.
  - `iopt_size == 0`: dynamically grows by chunk allocations.

## CLI memory strategy

- `main_bufalloc/main_buffree` use `VirtualAlloc/VirtualFree` on Win32, `malloc/free` elsewhere.
- Source window buffer:
  - One allocation of `option_srcwinsz`.
  - Partitioned into up to `MAX_LRU_SIZE` blocks (32) for LRU/FIFO behavior.
- `main_cleanup` and `xd3_free_stream` aggressively free all owned buffers and reset state.

## Bounded-memory guarantees and exceptions

- Normal encode/decode are windowed and bounded by configured window sizes.
- Decoder hard-limits window size (`XD3_HARDMAXWINSIZE`) to reject malicious inputs.
- `merge`/`recode` whole-state tools intentionally buffer complete instruction streams in memory (not constant-space).

---

## Feature Checklist

## Encoding/decoding modes

- [x] CLI encode (`-e` / `encode`)
- [x] CLI decode (`-d` / `decode`)
- [x] Library non-blocking API (`xd3_encode_input` / `xd3_decode_input`)
- [x] In-memory helpers (`xd3_encode_memory`, `xd3_decode_memory`)
- [x] VCDIFF inspection tools (`printhdr`, `printhdrs`, `printdelta`)
- [x] Re-encode existing VCDIFF (`recode`)
- [x] Merge multiple VCDIFFs (`merge`, `-m`)
- [ ] `VCD_TARGET` decode support (explicitly unimplemented)
- [ ] Custom `VCD_CODETABLE` decode support (removed/unimplemented)

## Compression levels and methods

- [x] Levels `0..9` (matcher profile selection)
- [x] Small-string match compression
- [x] Source-copy delta compression
- [x] Target-copy compression
- [x] Run-length encoding (`RUN`)
- [x] Undocumented soft matcher tuning (`-C`, 7 fields)
- [x] Disable target-side compression (`-N` / `XD3_NOCOMPRESS`)

## Streaming capabilities

- [x] Re-entrant stream state machine
- [x] Incremental input via `xd3_avail_input`
- [x] Incremental output via `XD3_OUTPUT` + `xd3_consume_output`
- [x] Async source-block fetch option (`XD3_GETSRCBLK`)
- [x] CLI windowed processing with EOF flush

## Buffer management

- [x] Tunable input window (`-W`)
- [x] Tunable source window (`-B`)
- [x] Tunable IOPT queue (`-I`)
- [x] Tunable duplicate-chain memory (`-P`)
- [x] Paged output section buffers (`XD3_ALLOCSIZE`)
- [x] Source block LRU/FIFO cache (`MAX_LRU_SIZE`)
- [x] Zero-copy decoder section path when possible

## External compression support

- [x] Auto-detect compressed input by magic bytes
- [x] External decompression subprocess pipeline
- [x] External recompression on output
- [x] Supported external formats: `gzip`, `bzip2`, `compress`, `xz`
- [x] Flags `-D`, `-R`, `-F` for control
- [ ] Guaranteed bit-identical recompression output (not guaranteed)

## Secondary compression

- [x] DJW static/semi-adaptive Huffman
- [x] FGK adaptive Huffman
- [x] LZMA secondary
- [x] Per-section enable/disable (`DATA/INST/ADDR`)
- [x] Secondary section delta-indicator bits (`VCD_DATACOMP`, etc.)
- [ ] Standardized IANA secondary IDs (uses project-specific IDs)

## Checksum options

- [x] Adler-32 window checksum emit/verify
- [x] Disable checksum behavior via `-n` (`XD3_ADLER32_NOVER` on decode)
- [ ] MD5
- [ ] SHA-1
- [ ] SHA-256

---

## C-Code Bottlenecks and Optimization Opportunities

Priority is based on likely runtime impact in typical large-file delta workloads.

## High Priority

1. Source checksum indexing loop (`xd3_srcwin_move_point`)

- Why it hurts: heavy per-byte/per-step rolling checksum work + frequent source block fetch checks.
- Evidence: source comments mark it as one of the most expensive and critical paths (`xdelta3.c:4327`).
- Opportunities:
  - Specialize/inline `large_look` and `large_step` for active matcher profile.
  - SIMD/vectorized checksum kernels.
  - Better move-point policy (currently admits being arbitrary in comments).
  - Reduce hash insert pressure with adaptive sampling under low match rates.

2. Small-match candidate expansion (`xd3_smatch`)

- Why it hurts: byte-by-byte compare and linked-chain traversal in hot loop.
- Opportunities:
  - Replace byte-at-a-time compare with word/SIMD compare where safe.
  - Tighten candidate pruning using address-cost estimate earlier.
  - Consider secondary hash to cut false positives before full compare.

3. Source match extension (`xd3_source_extend_match`)

- Why it hurts: backward extension uses scalar loops and may cross many block boundaries with repeated `xd3_getblk`.
- Opportunities:
  - Chunked backward compare similar to existing forward chunk path.
  - Keep more contiguous source mapped to reduce `getblk` churn.
  - Rework TOOFARBACK handling to avoid repeated failed probes.

4. Decoder output execution (`xd3_decode_output_halfinst`)

- Why it hurts: called for every instruction and performs all copies/adds/runs.
- Opportunities:
  - Use `memmove` fast path for overlapping target copies (instead of manual byte loop).
  - Fuse checksum computation with output-copy loop when checksum verification is enabled.
  - Special-case very common instruction patterns for branch reduction.

## Medium Priority

5. IOPT overlap compaction (`xd3_iopt_flush_instructions`)

- Why it hurts: list-walk overlap resolution is branchy and can degrade with many candidates.
- Opportunities:
  - Convert active queue to contiguous array for cache locality at flush time.
  - Introduce scoring model that includes address cache cost, not only size overlap.
  - Consider bounded DP for dense-overlap regions.

6. Window hash-table reset cost

- Why it hurts: `small_table` memset each window can be expensive at large table sizes.
- Opportunities:
  - Generation-tagged table entries to avoid full clears.
  - Keep absolute offsets (comment already notes this as a possible improvement).

7. Secondary DJW/FGK runtime

- Why it hurts: deeply nested loops, many passes, bit-level encode/decode, adaptive/group rebuild cost.
- Opportunities:
  - Table-driven decode in DJW (already hinted in code comments).
  - Earlier inefficiency bailout.
  - Parallelize section compression (`DATA`, `INST`, `ADDR`) when latency matters.

8. External compression process overhead

- Why it hurts: `fork/pipe/exec` overhead dominates small and medium files.
- Opportunities:
  - Optional in-process codec libraries for gzip/bzip2/xz paths.
  - Reuse worker subprocess for batch operations.

## Low Priority / Correctness-Adjacent Findings

9. Secondary compression logging condition

- `main_set_secondary_flags` prints selected algorithm using bitwise-OR checks, which can misreport.
- Low runtime impact, but confusing observability.

10. CLI level-to-LZMA preset propagation

- LZMA secondary init reads `XD3_COMPLEVEL_*` flags, but CLI path primarily maps levels to matcher config and does not set those bits.
- Potential compression-ratio/speed tuning mismatch for `-S lzma`.

---

## Summary

The repository contains a mature and heavily optimized VCDIFF encoder/decoder with a rich CLI wrapper and optional secondary/external compression support. The core architecture is robust and re-entrant, with careful overflow checks and bounded window decoding, but there are known performance hotspots in match indexing/extension, decoder output execution, and secondary compressors. The biggest practical optimization gains are likely in `xd3_srcwin_move_point`, `xd3_smatch`, and `xd3_decode_output_halfinst`.