# SAD KERNELS KNOWLEDGE BASE
**Generated:** 2026-04-17
## OVERVIEW
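Sum of Absolute Differences (SAD) kernels for block matching. For two equally sized pixel blocks, SAD is the sum of |a - b| over all co-located pixels; it is the core cost metric for motion search.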
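The metric can be sketched in scalar Rust as a reference point for the kernels below. This is a minimal sketch, not the contents of `rust.rs`: the name `sad_scalar`, the element-based stride (pitch) parameters, and the `u32` return type are assumptions. The bound `T: Copy + Into<u32>` lets one function cover both `u8` and `u16` pixels.

```rust
/// Scalar SAD over a `w` x `h` block (hypothetical helper, not the crate's API).
/// `src_stride` and `dst_stride` are row pitches in elements, so rows may be
/// followed by padding that is never read.
fn sad_scalar<T: Copy + Into<u32>>(
    src: &[T],
    src_stride: usize,
    dst: &[T],
    dst_stride: usize,
    w: usize,
    h: usize,
) -> u32 {
    let mut sum = 0u32;
    for y in 0..h {
        let a = &src[y * src_stride..y * src_stride + w];
        let b = &dst[y * dst_stride..y * dst_stride + w];
        for (&p, &q) in a.iter().zip(b) {
            let (p, q): (u32, u32) = (p.into(), q.into());
            // abs_diff gives |p - q| without going through signed types.
            sum += p.abs_diff(q);
        }
    }
    sum
}

fn main() {
    // Two 2x2 u8 blocks with a pitch of 3; the third element of each row is padding.
    let src: [u8; 6] = [10, 20, 99, 30, 40, 99];
    let dst: [u8; 6] = [12, 15, 0, 30, 50, 0];
    // |10-12| + |20-15| + |30-30| + |40-50| = 2 + 5 + 0 + 10 = 17
    println!("{}", sad_scalar(&src, 3, &dst, 3, 2, 2));

    // The same function handles u16 (high bit depth) pixels: 1000 + 1000 + 1023 + 1023 = 4046
    let a: [u16; 4] = [1000, 0, 0, 1023];
    let b: [u16; 4] = [0, 1000, 1023, 0];
    println!("{}", sad_scalar(&a, 2, &b, 2, 2, 2));
}
```

A `u32` accumulator is wide enough for the largest block size listed in this document: a 128x128 block of `u16` pixels sums to at most 65535 × 16384, roughly 1.07 × 10^9.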
## STRUCTURE
```
src/sad/
├── rust.rs # Scalar implementation
├── avx2.rs # AVX2 SIMD implementation
├── avx512.rs # AVX-512 SIMD implementation with AVX2 fallback below thresholds
├── tests.rs # Test generator macro (get_sad_tests!)
└── sad.rs # Module root, dispatch
```
## WHERE TO LOOK
| Task | File | Notes |
|------|------|-------|
| Scalar SAD | `rust.rs` | Baseline reference implementation |
| AVX2 SAD | `avx2.rs` | SIMD-optimized; compiled with the `avx2` feature |
| AVX-512 SAD | `avx512.rs` | Compiled with the `avx512` feature; preferred when `cpudetect::x86_64::is_x86_64_v4_compatible()` returns true; falls back to AVX2 for narrow blocks |
| Test coverage | `tests.rs` | `get_sad_tests!` macro generates tests for all three backends (`rust`, `avx2`, `avx512`) |
| Dispatch | `sad.rs` | `#[cfg]` compile-time gating plus cpudetect-backed runtime selection |
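The dispatch arrangement can be illustrated with a cached function-pointer selector. This is a hedged sketch, not `sad.rs`: `cpudetect` is not reproduced, so std's `is_x86_feature_detected!` stands in for `is_x86_64_v4_compatible()` and `is_x86_64_v3_compatible()` using only a subset of each level's required features, and `sad_avx2`/`sad_avx512` are hypothetical placeholders that forward to the scalar kernel. The check order (v4 before v3) follows this document; the scalar fallback at the end is an assumption of the sketch.

```rust
use std::sync::OnceLock;

/// Common kernel signature for this sketch (hypothetical; the crate's real
/// signatures are not shown in this document).
type SadFn = fn(&[u8], usize, &[u8], usize, usize, usize) -> u32;

/// Scalar baseline, playing the role of `rust.rs`.
fn sad_rust(src: &[u8], ss: usize, dst: &[u8], ds: usize, w: usize, h: usize) -> u32 {
    (0..h)
        .map(|y| {
            src[y * ss..y * ss + w]
                .iter()
                .zip(&dst[y * ds..y * ds + w])
                .map(|(&a, &b)| u32::from(a.abs_diff(b)))
                .sum::<u32>()
        })
        .sum()
}

// Hypothetical placeholders for the avx2.rs / avx512.rs kernels. They forward
// to the scalar code so the sketch compiles and runs on any target.
fn sad_avx2(src: &[u8], ss: usize, dst: &[u8], ds: usize, w: usize, h: usize) -> u32 {
    sad_rust(src, ss, dst, ds, w, h)
}
fn sad_avx512(src: &[u8], ss: usize, dst: &[u8], ds: usize, w: usize, h: usize) -> u32 {
    sad_rust(src, ss, dst, ds, w, h)
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Rust,
    Avx2,
    Avx512,
}

/// Stand-in for the cpudetect checks. Each level is approximated by a subset
/// of its required CPU features, not the full x86-64-v3/v4 definition.
fn detect_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        // Checked first, mirroring is_x86_64_v4_compatible().
        if is_x86_feature_detected!("avx512f")
            && is_x86_feature_detected!("avx512bw")
            && is_x86_feature_detected!("avx512vl")
        {
            return Backend::Avx512;
        }
        // Checked second, mirroring is_x86_64_v3_compatible().
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return Backend::Avx2;
        }
    }
    Backend::Rust
}

/// Resolve the kernel once, then hand out the cached function pointer.
fn sad_u8() -> SadFn {
    static KERNEL: OnceLock<SadFn> = OnceLock::new();
    *KERNEL.get_or_init(|| {
        let f: SadFn = match detect_backend() {
            Backend::Avx512 => sad_avx512,
            Backend::Avx2 => sad_avx2,
            Backend::Rust => sad_rust,
        };
        f
    })
}

fn main() {
    let a = [1u8, 2, 3, 4];
    let b = [4u8, 3, 2, 1];
    // The selected backend depends on the host CPU; the SAD value does not.
    println!("{:?}: {}", detect_backend(), sad_u8()(&a, 2, &b, 2, 2, 2));
}
```

Caching the choice in a `OnceLock` keeps feature detection out of the hot path; whether the crate caches its selection is not stated here.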
## CONVENTIONS
- Follows the standard `rust.rs` + `avx2.rs` + `avx512.rs` + `tests.rs` module layout.
- `avx2` feature gates AVX2 backend compilation; `avx512` feature gates AVX-512 compilation and implies `avx2`.
- Dispatch combines `#[cfg]` compile-time gating with a runtime check that tries `cpudetect::x86_64::is_x86_64_v4_compatible()` first, then `cpudetect::x86_64::is_x86_64_v3_compatible()`.
- AVX-512 thresholds: `u8` blocks narrower than 32 pixels and `u16` blocks narrower than 16 pixels pass through to AVX2 via const-generic dispatch.
- Test generator `get_sad_tests!($module)` produces tests for `rust`, `avx2`, and `avx512` backends.
- Tested across sizes: 2x2 through 128x128, plus edge cases (non-square, odd dimensions).
- Both `u8` and `u16` pixel types covered.
- Pitch/padding scenarios tested systematically.
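Both AVX-512 thresholds correspond to rows narrower than 32 bytes (32 one-byte `u8` pixels or 16 two-byte `u16` pixels), i.e. less than one 256-bit AVX2 register, so the rule can be written as a single const-generic check. The sketch below assumes that formulation; `Kernel`, `AVX2_CUTOFF_BYTES`, `kernel_for`, and `avx512_entry` are hypothetical names, and `avx512.rs` may encode the thresholds per pixel type instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kernel {
    Avx2,
    Avx512,
}

/// Hypothetical cutoff: rows narrower than 32 bytes (one 256-bit register)
/// go to AVX2. This reproduces "u8 width < 32" and "u16 width < 16".
const AVX2_CUTOFF_BYTES: usize = 32;

/// Choose a kernel from the pixel type `T` and the const-generic width `W`.
const fn kernel_for<T, const W: usize>() -> Kernel {
    if W * std::mem::size_of::<T>() < AVX2_CUTOFF_BYTES {
        Kernel::Avx2
    } else {
        Kernel::Avx512
    }
}

/// Hypothetical AVX-512 entry point. The condition depends only on `T` and
/// `W`, so each monomorphized copy can keep just one arm after optimization.
fn avx512_entry<T, const W: usize>() -> &'static str {
    match kernel_for::<T, W>() {
        Kernel::Avx2 => "pass through to AVX2",
        Kernel::Avx512 => "run AVX-512 kernel",
    }
}

fn main() {
    println!("u8,  W=16: {}", avx512_entry::<u8, 16>()); // AVX2
    println!("u8,  W=32: {}", avx512_entry::<u8, 32>()); // AVX-512
    println!("u16, W=8:  {}", avx512_entry::<u16, 8>()); // AVX2
    println!("u16, W=16: {}", avx512_entry::<u16, 16>()); // AVX-512
}
```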
## ANTI-PATTERNS
- Scalar and SIMD paths producing different results (the `verify_asm!` macro checks for this).
- Missing edge case coverage for non-square blocks.
- Using direct indexing instead of `semisafe_get()` in test code.
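The anti-patterns above reduce to one property: every backend must match the scalar result for every block shape and pitch. A differential check of that kind can be sketched as follows. It is a stand-in for the crate's `get_sad_tests!`/`verify_asm!` machinery, not a reproduction: `sad_lanes` is a hypothetical portable kernel that mimics SIMD structure (8-lane chunks plus a scalar tail, where width-dependent bugs tend to hide), the LCG fill and size list are illustrative, and plain slice indexing is used because this sketch does not depend on the crate's `semisafe_get()`.

```rust
/// Scalar reference: plain per-pixel loop with element pitches.
fn sad_ref(src: &[u16], ss: usize, dst: &[u16], ds: usize, w: usize, h: usize) -> u32 {
    let mut sum = 0u32;
    for y in 0..h {
        for x in 0..w {
            sum += u32::from(src[y * ss + x].abs_diff(dst[y * ds + x]));
        }
    }
    sum
}

/// Hypothetical stand-in for a SIMD kernel: 8 accumulator lanes per row,
/// then a scalar tail for widths that are not a multiple of 8.
fn sad_lanes(src: &[u16], ss: usize, dst: &[u16], ds: usize, w: usize, h: usize) -> u32 {
    const LANES: usize = 8;
    let mut sum = 0u32;
    for y in 0..h {
        let mut a = src[y * ss..y * ss + w].chunks_exact(LANES);
        let mut b = dst[y * ds..y * ds + w].chunks_exact(LANES);
        let mut acc = [0u32; LANES];
        for (ca, cb) in (&mut a).zip(&mut b) {
            for i in 0..LANES {
                acc[i] += u32::from(ca[i].abs_diff(cb[i]));
            }
        }
        sum += acc.iter().sum::<u32>();
        for (&p, &q) in a.remainder().iter().zip(b.remainder()) {
            sum += u32::from(p.abs_diff(q));
        }
    }
    sum
}

/// Deterministic LCG fill with 10-bit samples (illustrative high bit depth).
fn fill(buf: &mut [u16], seed: &mut u32) {
    for v in buf.iter_mut() {
        *seed = (*seed).wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        *v = ((*seed >> 16) as u16) & 0x3FF;
    }
}

/// Compare both kernels over square, non-square, and odd sizes with several
/// paddings. Returns the number of mismatching cases.
fn run_all() -> usize {
    let sizes = [
        (2, 2), (4, 4), (8, 8), (16, 16), (32, 32), (64, 64), (128, 128),
        (16, 8), (8, 32), (7, 5), (33, 3), (127, 1),
    ];
    let mut seed = 1u32;
    let mut failures = 0;
    for &(w, h) in &sizes {
        for pad in [0usize, 1, 5] {
            // Different pitches for src and dst; padding gets random data too,
            // so a kernel that reads past `w` changes its sum.
            let (ss, ds) = (w + pad, w + 2 * pad + 1);
            let mut src = vec![0u16; ss * h];
            let mut dst = vec![0u16; ds * h];
            fill(&mut src, &mut seed);
            fill(&mut dst, &mut seed);
            if sad_ref(&src, ss, &dst, ds, w, h) != sad_lanes(&src, ss, &dst, ds, w, h) {
                eprintln!("mismatch: {w}x{h}, pad {pad}");
                failures += 1;
            }
        }
    }
    failures
}

fn main() {
    println!("{} mismatching cases", run_all());
}
```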