zoomvtools 1.1.1

Video motion vector analysis utilities in pure Rust
# OVERLAPS KNOWLEDGE BASE

**Generated:** 2026-04-25

## OVERVIEW

Overlap windowing system for block-based motion compensation. Cosine-squared windows, 9 spatial variants per block.

## STRUCTURE

```
src/
├── overlaps.rs     # Module root
└── overlaps/
    ├── rust.rs     # Scalar overlap application kernels
    ├── avx2.rs     # AVX2 SIMD overlap kernels (70-75% faster on 32x32+ blocks)
    └── tests.rs    # Test generator macro
```

Module root: `src/overlaps.rs` (338 lines). Subdirectory contains `rust.rs`, `avx2.rs`, `tests.rs`.

## WHERE TO LOOK

| Area | File | Notes |
|------|------|-------|
| OverlapWindows struct | `src/overlaps.rs:17` | 9-window cosine-squared system |
| build_axis_windows | `src/overlaps.rs:112` | Generates window/first/last axis profiles |
| cosine_squared | `src/overlaps.rs:145` | Core window function: `cos²(π·value)` |
| quantize_window | `src/overlaps.rs:151` | Float→u16 quantization (scale 2048) |
| select_overlaps | `src/overlaps.rs:177` | Function pointer dispatch by block size |
| select_to_pixels | `src/overlaps.rs:332` | u16→u8 / u32→u16 conversion dispatch |
| AVX2 kernels | `src/overlaps/avx2.rs` | 64x64+ kernels gated behind `experimental` feature |
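
The window function and quantization step listed above can be sketched as follows. This is a minimal illustration, not the crate's code: the `f64` type, the function signatures, and round-to-nearest quantization are assumptions; only the `cos²(π·value)` formula and the 2048 scale come from this document.

```rust
use std::f64::consts::PI;

/// Fixed-point scale quoted in this document (`WINDOW_SCALE = 2048.0`).
const WINDOW_SCALE: f64 = 2048.0;

/// cos²(π·value), the window function named in the table.
/// Assumption: `f64` in and out; the crate may use a different float type.
fn cosine_squared(value: f64) -> f64 {
    let c = (PI * value).cos();
    c * c
}

/// Float -> u16 quantization at scale 2048.
/// Assumption: round-to-nearest; the crate may round or clamp differently.
fn quantize_window(weight: f64) -> u16 {
    (weight * WINDOW_SCALE).round() as u16
}
```

Under these assumptions, `quantize_window(cosine_squared(0.0))` gives the full weight 2048 and `quantize_window(cosine_squared(0.5))` gives 0. Because cos²(πx) + cos²(π(x + 0.5)) = 1, two weights taken half a unit apart sum to the full scale before rounding.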

## CONVENTIONS (OVERLAPS-SPECIFIC)

- `OverlapWindows::new()` precomputes all 9 windows at initialization: TL, TM, TR, ML, MM, MR, BL, BM, BR.
- Windows quantized to u16 with `WINDOW_SCALE = 2048.0` for fixed-point arithmetic.
- `select_overlaps<T>()` dispatches by `(size_of::<T>(), width, height)` match table — 27 block sizes for u8, 27 for u16.
- AVX2 variants: 70% faster (u8) / 75% faster (u16) than scalar on 32x32+ blocks.
- 64x64+ AVX2 kernels gated behind `#[cfg(feature = "experimental")]` — currently 10-40% slower than scalar.
- `select_to_pixels<T>()` has no AVX2 path — always scalar (u16→u8 or u32→u16).
- Used by `mv_block_fps.rs` for overlap blending in block-based FPS conversion.
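
The dispatch convention above can be sketched roughly like this. Everything here is hypothetical: the `Kernel` type, the kernel functions (which return a label instead of blending pixels), the `has_avx2` parameter, and the fall-back-to-scalar order are illustrative assumptions. Only the `(size_of::<T>(), width, height)` match key and the `experimental` gate on 64x64+ AVX2 kernels come from this document.

```rust
use std::mem::size_of;

/// Hypothetical kernel type; the real kernels take pixel and window buffers.
type Kernel = fn() -> &'static str;

fn scalar_8x8_u8() -> &'static str { "scalar 8x8 u8" }
fn scalar_8x8_u16() -> &'static str { "scalar 8x8 u16" }
fn scalar_32x32_u8() -> &'static str { "scalar 32x32 u8" }
fn scalar_64x64_u8() -> &'static str { "scalar 64x64 u8" }
fn avx2_32x32_u8() -> &'static str { "avx2 32x32 u8" }

// Compiled only when the `experimental` feature is enabled.
#[cfg(feature = "experimental")]
fn avx2_64x64_u8() -> &'static str { "avx2 64x64 u8" }

/// Picks a kernel by pixel size in bytes and block dimensions.
/// Assumption: SIMD is tried first, then the scalar table.
fn select_kernel<T>(width: usize, height: usize, has_avx2: bool) -> Option<Kernel> {
    let key = (size_of::<T>(), width, height);
    if has_avx2 {
        let simd = match key {
            (1, 32, 32) => Some(avx2_32x32_u8 as Kernel),
            // Without the feature this arm does not exist, so 64x64 stays scalar.
            #[cfg(feature = "experimental")]
            (1, 64, 64) => Some(avx2_64x64_u8 as Kernel),
            _ => None,
        };
        if simd.is_some() {
            return simd;
        }
    }
    match key {
        (1, 8, 8) => Some(scalar_8x8_u8 as Kernel),
        (1, 32, 32) => Some(scalar_32x32_u8 as Kernel),
        (1, 64, 64) => Some(scalar_64x64_u8 as Kernel),
        (2, 8, 8) => Some(scalar_8x8_u16 as Kernel),
        _ => None, // unsupported block size
    }
}
```

On x86 targets a caller could pass `is_x86_feature_detected!("avx2")` for `has_avx2`. With the feature disabled, `select_kernel::<u8>(64, 64, true)` resolves to the scalar kernel in this sketch, which mirrors the intent of the gate.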

## ANTI-PATTERNS

- Adding 64x64+ block sizes to AVX2 dispatch without the `experimental` gate (performance regression).
- Changing `WINDOW_SCALE` without updating quantization logic and test expectations.
- Assuming `select_to_pixels` has SIMD acceleration (scalar only).
- Mixing horizontal+vertical window generation (must stay separate for 9-window combinatorics).
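
To illustrate why the axes stay separate, here is a sketch of the 9-window combinatorics: three 1-D profiles per axis combine into 3 × 3 = 9 two-dimensional windows. The `AxisProfiles` struct, the function names, the outer-product combination, and the mapping of "top/left" to the `first` profile are all assumptions made for illustration; the source only states that `build_axis_windows` produces window/first/last profiles and that there are nine windows (TL through BR).

```rust
/// Hypothetical container for the three 1-D profiles of one axis.
struct AxisProfiles {
    first: Vec<f64>,
    window: Vec<f64>,
    last: Vec<f64>,
}

/// Outer product of a vertical and a horizontal profile, stored row-major.
fn combine(vertical: &[f64], horizontal: &[f64]) -> Vec<f64> {
    vertical
        .iter()
        .flat_map(|&v| horizontal.iter().map(move |&h| v * h))
        .collect()
}

/// Nine 2-D windows in the order TL, TM, TR, ML, MM, MR, BL, BM, BR.
/// Assumption: T/M/B map to vertical first/window/last, L/M/R to horizontal.
fn nine_windows(v: &AxisProfiles, h: &AxisProfiles) -> Vec<Vec<f64>> {
    let rows = [v.first.as_slice(), v.window.as_slice(), v.last.as_slice()];
    let cols = [h.first.as_slice(), h.window.as_slice(), h.last.as_slice()];
    let mut out = Vec::with_capacity(9);
    for &r in &rows {
        for &c in &cols {
            out.push(combine(r, c));
        }
    }
    out
}
```

Keeping each axis as its own set of profiles means a change to one axis (for example a different overlap size) only touches three 1-D vectors, and the nine 2-D windows follow mechanically.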