# REFINEMENT KNOWLEDGE BASE
**Generated:** 2026-04-17
## OVERVIEW
Sub-pixel interpolation filters for motion-search refinement. They produce the half-pixel (`pel2`) and quarter-pixel (`pel4`) samples.
## STRUCTURE
```
src/refine/
├── bicubic/ # 4-tap Catmull-Rom (rust.rs + avx2.rs + avx512.rs + tests.rs)
├── bilinear/ # 2-tap averaging; 3 kernels H/V/D (rust.rs + avx2.rs + avx512.rs + tests.rs)
├── wiener/ # 6-tap optimized; fast + exact u16 paths (rust.rs + avx2.rs + avx512.rs + tests.rs)
└── refine.rs # Module root, refine_ext_pel{2,4} orchestration
```
## WHERE TO LOOK
| Method | Location | Notes |
|--------|----------|-------|
| Bilinear | `bilinear/` | 2-tap averaging; fastest; 3 kernels: H/V/D |
| Bicubic | `bicubic/` | 4-tap Catmull-Rom; smoother than bilinear |
| Wiener | `wiener/` | 6-tap optimized; highest quality |
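
As a rough illustration of the first two rows, the scalar sketch below computes half-pixel samples with 2-tap averaging (H/V/D) and with a 4-tap filter. It is not the crate's code: the function names, the round-half-up rounding, and the `(-1, 9, 9, -1) / 16` taps (the Catmull-Rom weights at t = 0.5) are assumptions, and border handling is left out.

```rust
/// Horizontal half-pel sample between `row[x]` and `row[x + 1]` (2-tap average).
/// Round-half-up is an assumption of this sketch.
fn bilinear_h(row: &[u8], x: usize) -> u8 {
    ((row[x] as u16 + row[x + 1] as u16 + 1) >> 1) as u8
}

/// Vertical half-pel sample between rows `y` and `y + 1` of a plane.
fn bilinear_v(plane: &[u8], stride: usize, x: usize, y: usize) -> u8 {
    let a = plane[y * stride + x] as u16;
    let b = plane[(y + 1) * stride + x] as u16;
    ((a + b + 1) >> 1) as u8
}

/// Diagonal half-pel sample: rounded average of the four surrounding pixels.
fn bilinear_d(plane: &[u8], stride: usize, x: usize, y: usize) -> u8 {
    let p = |xx: usize, yy: usize| plane[yy * stride + xx] as u16;
    ((p(x, y) + p(x + 1, y) + p(x, y + 1) + p(x + 1, y + 1) + 2) >> 2) as u8
}

/// 4-tap half-pel sample between `row[x]` and `row[x + 1]`.
/// Taps (-1, 9, 9, -1) / 16 are the Catmull-Rom weights at t = 0.5 (assumed here).
/// Requires `x >= 1` and `x + 2 < row.len()`; the result is clamped to the u8 range.
fn bicubic_h(row: &[u8], x: usize) -> u8 {
    let t = |i: usize| row[i] as i32;
    let sum = -t(x - 1) + 9 * t(x) + 9 * t(x + 1) - t(x + 2);
    ((sum + 8) >> 4).clamp(0, 255) as u8
}
```

The 4-tap filter can overshoot the input range near sharp edges, which is why the sketch clamps; the 2-tap averages cannot.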
## CONVENTIONS (REFINE-SPECIFIC)
- Each method directory contains `rust.rs` + `avx2.rs` + `avx512.rs` + `tests.rs`; the AVX2 backend compiles behind the `avx2` Cargo feature and the AVX-512 backend behind `avx512`.
- Called from `MVPlane::refine()` as sub-pixel interpolation backend.
- `SubpelMethod` values: `Bilinear = 0`, `Bicubic = 1`, `Wiener = 2`. Never change numeric values.
- Output window counts fixed by pel factor: `pel2` = 3 windows, `pel4` = 15 windows.
- H/V/D passes must stay numerically consistent between scalar, AVX2, and AVX-512 paths.
- AVX-512 implementations mirror `src/sad/avx512.rs` dispatch: runtime picks `cpudetect::x86_64::is_x86_64_v4_compatible()` first, then `cpudetect::x86_64::is_x86_64_v3_compatible()`, then scalar.
- Refine widths are known only at runtime, so each AVX-512 kernel runs three in-function phases that share loop counters: an AVX-512 main loop, an AVX2 tail loop, then a scalar tail.
- Shared per-iteration AVX2 kernels live in `avx2.rs` as `#[target_family("x86_64_v3")] #[inline] pub(super) unsafe fn` helpers, reused by both AVX2 entries and AVX-512 Phase-2 tails. Trivial single-intrinsic cases stay inline at call sites.
- Current shared helpers: `apply_diagonal_bilinear_u{8,16}_avx2`, `apply_bicubic_kernel_{u8,u16_fast,u16_exact}_avx2`, `apply_wiener_kernel_{u8_fast,u16_fast,u16}`.
- All AVX-512 functions use `#[target_family("x86_64_v4")]`.
- Test generator macros cover `rust`, `avx2`, and `avx512` backends. Each `tests.rs` exposes `should_run(module)` and compiles SIMD backends behind matching Cargo features.
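
The dispatch order from the conventions above (v4 first, then v3, then scalar) can be sketched as follows. `Backend`, `select_backend`, and `detect` are hypothetical names for this example, and the `std::is_x86_feature_detected!` checks only approximate what `cpudetect::x86_64::is_x86_64_v4_compatible()` and `is_x86_64_v3_compatible()` might test; the real checks are not shown in this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Avx512,
    Avx2,
    Scalar,
}

/// Pick the widest usable backend: v4 first, then v3, then scalar.
/// `v4_ok` / `v3_ok` stand in for the crate's cpudetect results.
fn select_backend(v4_ok: bool, v3_ok: bool) -> Backend {
    if v4_ok {
        Backend::Avx512
    } else if v3_ok {
        Backend::Avx2
    } else {
        Backend::Scalar
    }
}

/// Approximate runtime detection using std only (assumed feature subsets).
#[cfg(target_arch = "x86_64")]
fn detect() -> Backend {
    let v3 = std::is_x86_feature_detected!("avx2")
        && std::is_x86_feature_detected!("fma")
        && std::is_x86_feature_detected!("bmi2");
    let v4 = v3
        && std::is_x86_feature_detected!("avx512f")
        && std::is_x86_feature_detected!("avx512bw")
        && std::is_x86_feature_detected!("avx512vl");
    select_backend(v4, v3)
}

/// Non-x86_64 targets always take the scalar path.
#[cfg(not(target_arch = "x86_64"))]
fn detect() -> Backend {
    Backend::Scalar
}
```

Keeping the ordering in a pure function such as `select_backend` makes the priority testable without depending on the host CPU.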
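
The three-phase loop shape used for runtime widths can be illustrated portably. In this sketch `kernel_chunk` is a plain-Rust stand-in for one SIMD iteration (no intrinsics are used), the lane counts assume `u8` pixels, and all names are hypothetical. One counter `x` is shared across the phases, so each phase resumes where the previous one stopped.

```rust
const AVX512_LANES: usize = 64; // u8 lanes in a 512-bit register
const AVX2_LANES: usize = 32; // u8 lanes in a 256-bit register

/// Scalar reference: dst[x] = rounded average of src[x] and src[x + 1].
fn bilinear_h_scalar(src: &[u8], dst: &mut [u8]) {
    for x in 0..dst.len() {
        dst[x] = ((src[x] as u16 + src[x + 1] as u16 + 1) >> 1) as u8;
    }
}

/// Stand-in for one SIMD iteration: handles `lanes` pixels starting at `x`.
fn kernel_chunk(src: &[u8], dst: &mut [u8], x: usize, lanes: usize) {
    for i in x..x + lanes {
        dst[i] = ((src[i] as u16 + src[i + 1] as u16 + 1) >> 1) as u8;
    }
}

/// Runs the three phases over `dst.len()` pixels with one shared counter.
/// Returns the iteration count of each phase: (wide, mid, tail).
fn bilinear_h_three_phase(src: &[u8], dst: &mut [u8]) -> (usize, usize, usize) {
    let width = dst.len();
    assert!(src.len() >= width + 1, "src needs one extra pixel on the right");
    let mut x = 0;
    let (mut wide, mut mid, mut tail) = (0, 0, 0);

    // Phase 1: widest chunks (AVX-512 main loop in the real kernels).
    while x + AVX512_LANES <= width {
        kernel_chunk(src, dst, x, AVX512_LANES);
        x += AVX512_LANES;
        wide += 1;
    }
    // Phase 2: narrower chunks (AVX2 tail loop), continuing from the same `x`.
    while x + AVX2_LANES <= width {
        kernel_chunk(src, dst, x, AVX2_LANES);
        x += AVX2_LANES;
        mid += 1;
    }
    // Phase 3: scalar tail for the remaining pixels.
    while x < width {
        kernel_chunk(src, dst, x, 1);
        x += 1;
        tail += 1;
    }
    (wide, mid, tail)
}
```

For a width of 100 this runs one wide iteration, one 32-lane iteration, and four scalar steps. Comparing the output against `bilinear_h_scalar` is the kind of check that keeps the paths numerically consistent.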
## ANTI-PATTERNS (REFINE)
| Anti-pattern | Severity | Instead |
|--------------|----------|---------|
| Changing `SubpelMethod` numeric values | FORBIDDEN | Keep wire values stable (`0/1/2`) |
| Generating incomplete window sets | FORBIDDEN | Always emit full set for `pel2`/`pel4` |
| Divergent scalar/SIMD coefficients | DENIED | Keep taps/rounding identical |
| Bypassing `MVPlane::refine()` dispatch | DISCOURAGED | Route through central refine path |
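
The first two rows can be made concrete with a small sketch. The enum below pins the documented values with explicit discriminants; the `#[repr(u8)]`, the `TryFrom<u8>` impl, and the helpers `window_count` and `phases` are assumptions for illustration, not the crate's definitions. The window counts follow from treating every sub-pixel phase `(dx, dy)` except `(0, 0)` as one window, which gives `pel * pel - 1`; the ordering shown is arbitrary.

```rust
use std::convert::TryFrom;

/// Explicit discriminants keep the wire values fixed even if variants are reordered.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubpelMethod {
    Bilinear = 0,
    Bicubic = 1,
    Wiener = 2,
}

impl TryFrom<u8> for SubpelMethod {
    type Error = u8;

    /// Decode a wire value; unknown values are returned as the error.
    fn try_from(v: u8) -> Result<Self, u8> {
        match v {
            0 => Ok(SubpelMethod::Bilinear),
            1 => Ok(SubpelMethod::Bicubic),
            2 => Ok(SubpelMethod::Wiener),
            other => Err(other),
        }
    }
}

/// Number of sub-pixel windows for a pel factor (`pel >= 1`).
fn window_count(pel: usize) -> usize {
    pel * pel - 1
}

/// All (dx, dy) phases except the integer position (0, 0), row by row.
fn phases(pel: usize) -> Vec<(usize, usize)> {
    (0..pel)
        .flat_map(|dy| (0..pel).map(move |dx| (dx, dy)))
        .filter(|&p| p != (0, 0))
        .collect()
}
```

A check such as `phases(pel).len() == window_count(pel)` is one way to catch an incomplete window set early.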