//! SIMD-accelerated filter kernels returning u64 bitmasks.
//!
//! Each kernel compares a column slice against a scalar and returns a
//! packed `Vec<u64>` where bit *i* is set iff the predicate holds for
//! element *i*. One u64 word covers 64 rows.
//!
//! Runtime CPU detection selects the fastest path:
//! - AVX-512 (512-bit, 16 u32 / 8 f64|i64 per op)
//! - AVX2 (256-bit, 8 u32 / 4 f64|i64 per op)
//! - Scalar fallback (plain loop that LLVM can usually auto-vectorize)
//!
//! Companion helpers: `popcount`, `bitmask_and`, `bitmask_or`, `bitmask_to_indices`.
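
// Illustrative sketch only, not this module's actual kernels: a scalar
// reference for the bitmask layout described in the module docs, where bit i
// of the packed output is set iff the predicate holds for element i.
// All `sketch_*` names are hypothetical. LSB-first bit order within each
// word and the avx512f -> avx2 -> scalar probe order are assumptions.
#[allow(dead_code)]
fn sketch_filter_gt_u32(values: &[u32], threshold: u32) -> Vec<u64> {
    // One u64 word covers 64 rows, so round the word count up.
    let mut mask = vec![0u64; (values.len() + 63) / 64];
    for (i, &v) in values.iter().enumerate() {
        if v > threshold {
            // Row i lives in word i / 64, at bit position i % 64.
            mask[i / 64] |= 1u64 << (i % 64);
        }
    }
    mask
}

// Counts the rows that passed the filter by summing the set bits per word.
#[allow(dead_code)]
fn sketch_popcount(mask: &[u64]) -> u64 {
    mask.iter().map(|w| u64::from(w.count_ones())).sum()
}

// Runtime CPU detection via the standard library macro, widest path first.
// Returns a label rather than dispatching, to keep the sketch self-contained.
#[allow(dead_code)]
fn sketch_select_path() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx512f") {
            return "avx512";
        }
        if std::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    "scalar"
}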
pub
pub
pub
pub
pub
pub
pub
pub use ;
pub use ;