pub fn diff_correcting(r: &[u8], v: &[u8], opts: &DiffOptions) -> Vec<Command>Expand description
Correcting 1.5-Pass algorithm (Section 7, Figure 8) with fingerprint-based checkpointing (Section 8).
The hash table is auto-sized to max(q, 2 * num_seeds / p) so that checkpoint spacing m ≈ p, giving near-seed-granularity sampling. TABLE_SIZE acts as a floor for small files.
|C| = q (hash table capacity, auto-sized from input). |F| = next_prime(2 * num_R_seeds) (footprint universe, Section 8.1, pp. 347-348: “|F| ≈ 2L”). m = ceil(|F| / |C|) (checkpoint spacing, p. 348). k = checkpoint class (Eq. 3, p. 348).
A seed with fingerprint fp passes the checkpoint test iff (fp % |F|) % m == k. Its table index is (fp % |F|) / m (p. 348: “i = floor(f/m)”).
Step 1 (R pass): compute fingerprint at every R position, apply checkpoint filter, store first-found offset per slot. Steps 3-4 (V scan): compute fingerprint at every V position, apply checkpoint filter, look up matching R offset. Step 5: extend match both forwards and backwards (Section 7, p. 345). Step 6: encode with tail correction via lookback buffer (Section 5.1). Backward extension (Section 8.2, p. 349) recovers true match starts that fall between checkpoint positions.