# simd-lookup
High-performance SIMD utilities for fast table lookups, compression, and data processing.
## Features
- Cross-platform SIMD: Automatic dispatch to the optimal implementation (AVX-512, AVX2, NEON)
- Zero-cost abstractions: Thin wrappers over platform intrinsics via the `wide` crate
- ARM NEON optimized: Compress operations achieve up to 12 Gelem/s on Apple Silicon
## Core Modules
- `simd_compress` - Stream compaction (VCOMPRESS): pack selected elements by bitmask
- `simd_gather` - Parallel memory gather with SIMD indices
- `small_table` - 64-byte lookup tables using ARM TBL4 / AVX-512 VPERMB
- `wide_utils` - Shuffle, widen, split, and bitmask utilities for `wide` types
- `prefetch` - Cross-platform memory prefetch (L1/L2/L3 hints)
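The compress, gather, and small-table operations have simple scalar semantics, which can help when reading or testing the SIMD paths. The sketch below models them in plain Rust. The function names (`compress_scalar`, `gather_scalar`, `small_table_scalar`) are hypothetical and not part of `simd_lookup`; it assumes mask bit `i` selects lane `i`, which is consistent with the Quick Example, and that out-of-range small-table indices yield 0 (ARM TBL behavior).

```rust
// Scalar reference models of three core operations. All names here are
// hypothetical illustrations, not part of the simd_lookup API.

/// Stream compaction: copy lane `i` of `data` to the front of `out` when
/// bit `i` of `mask` is set, preserving order. Returns the number written.
pub fn compress_scalar(data: &[u32; 8], mask: u8, out: &mut [u32; 8]) -> usize {
    let mut count = 0;
    for (i, &value) in data.iter().enumerate() {
        if (mask >> i) & 1 == 1 {
            out[count] = value;
            count += 1;
        }
    }
    count
}

/// Gather: `out[i] = table[indices[i]]`. Panics on an out-of-bounds index.
pub fn gather_scalar(table: &[u8], indices: &[u32; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (o, &idx) in out.iter_mut().zip(indices.iter()) {
        *o = table[idx as usize];
    }
    out
}

/// 64-entry byte-table lookup. Indices >= 64 yield 0, as with ARM TBL;
/// AVX-512 VPERMB would instead use only the low 6 bits of each index.
pub fn small_table_scalar(table: &[u8; 64], indices: &[u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &idx) in out.iter_mut().zip(indices.iter()) {
        *o = table.get(idx as usize).copied().unwrap_or(0);
    }
    out
}
```

Calling `compress_scalar` with the Quick Example's data and the mask `0b1011_0010` writes the same four values, `[20, 50, 60, 80]`, and returns 4.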
## Quick Example
```rust
use simd_lookup::compress_store_u32x8;
use wide::u32x8;

// Compress: select elements where mask bits are set
let data = u32x8::from([10, 20, 30, 40, 50, 60, 70, 80]);
let mask = 0b10110010u8; // Select positions 1, 4, 5, 7
let mut output = [0u32; 8];
let count = compress_store_u32x8(data, mask, &mut output);
assert_eq!(count, 4);
assert_eq!(&output[..count], &[20, 50, 60, 80]);
```

## Platform Support
| Platform | Optimization Level |
|---|---|
| ARM aarch64 (Apple Silicon) | Full NEON optimization |
| x86-64 AVX-512 (Ice Lake+) | Native compress/gather |
| x86-64 AVX2 | Shuffle-based fallbacks |
| Other | Scalar fallbacks |
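One common way to choose among these tiers is runtime CPU feature detection with the standard library's detection macros. The sketch below is a hypothetical illustration: the `Backend` enum and `detect_backend` function are not part of `simd_lookup`, and the crate may select its code path differently (for example at compile time via `target_feature`).

```rust
/// Hypothetical backend tiers mirroring the platform table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Neon,
    Avx512,
    Avx2,
    Scalar,
}

/// Pick the best available tier at runtime (hypothetical helper).
pub fn detect_backend() -> Backend {
    #[cfg(target_arch = "aarch64")]
    {
        if is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    #[cfg(target_arch = "x86_64")]
    {
        // Only AVX-512F is checked to keep the sketch short; byte-granular
        // operations may also need extensions such as AVX-512 VBMI/VBMI2.
        if is_x86_feature_detected!("avx512f") {
            return Backend::Avx512;
        }
        if is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
    }
    // Any other target, or no supported SIMD extension: scalar fallback.
    Backend::Scalar
}
```

Detection is cheap after the first call because the standard library caches the results, but a real dispatcher would typically resolve the tier once and reuse it.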
## Re-exports
- `pub use eight_value_lookup::EightValueLookup;`
- `pub use entropy_map_lookup::EntropyMapBitpackedLookup;`
- `pub use entropy_map_lookup::EntropyMapLookup;`
- `pub use lookup_kernel::PipelinedSingleTableU32U8Lookup;`
- `pub use lookup_kernel::SimdCascadingTableU32U8Lookup;`
- `pub use lookup_kernel::SimdDualTableWithHashLookup;`
- `pub use simd_compress::compress_store_u32x8;`
- `pub use simd_compress::compress_store_u32x16;`
- `pub use simd_compress::compress_store_u8x16;`
- `pub use simd_compress::compress_u32x8;`
- `pub use simd_compress::compress_u32x16;`
- `pub use simd_compress::compress_u8x16;`
- `pub use wide_utils::FromBitmask;`
- `pub use wide_utils::SimdSplit;`
- `pub use wide_utils::WideUtilsExt;`
- `pub use simd_gather::gather_u32index_u8;`
- `pub use simd_gather::gather_masked_u32index_u8;`
- `pub use simd_gather::gather_u32index_u32;`
- `pub use simd_gather::gather_masked_u32index_u32;`
## Modules
- `bulk_vec_extender` - `BulkVecExtender` is a simple utility trait that lets you bulk-extend a `Vec` and get back a `&mut [T]` slice to write into. This is much faster than individual `push()` calls, each of which has to check length against capacity.
- `eight_value_lookup` - SIMD-accelerated lookup for finding positions in small `u32` tables
- `entropy_map_lookup` - Entropy-map based lookup tables using Perfect Hash Functions (PHFs)
- `lookup_kernel` - Arrow-style "lookup kernels", similar to the `arrow-select::take::take` kernel. Some kernels are "columnar style": they perform lookups in one table first and then cascade to another table. The cascading kernel uses SIMD extensively; the other kernels do scalar lookups, which are often as fast as SIMD gather. All of them let SIMD functions operate on the looked-up values.
- `prefetch` - Cross-platform prefetch intrinsics for x86 and ARM architectures.
- `simd_compress` - SIMD compress operations
- `simd_gather` - SIMD gather operations for efficient indexed memory access
- `small_table` - Efficient SIMD-enabled lookups in small tables of 64 or 64K entries. Lookups may also be 2-D.
- `wide_utils` - SIMD utilities and trait extensions for the `wide` crate
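The bulk-extend idea behind `bulk_vec_extender` can be sketched as follows. `BulkExtend` and `bulk_extend` are hypothetical names chosen for illustration; the crate's `BulkVecExtender` trait may have a different signature and may skip initializing the new elements.

```rust
/// Hypothetical trait in the spirit of `BulkVecExtender`.
pub trait BulkExtend<T> {
    /// Grow the vector by `n` elements and return the new tail as a slice.
    fn bulk_extend(&mut self, n: usize) -> &mut [T];
}

impl<T: Clone + Default> BulkExtend<T> for Vec<T> {
    fn bulk_extend(&mut self, n: usize) -> &mut [T] {
        let start = self.len();
        // One capacity check (and at most one reallocation) for the whole
        // batch, then a fill, instead of `n` separate `push()` calls.
        self.resize(start + n, T::default());
        &mut self[start..]
    }
}
```

Zero-filling via `resize` keeps the sketch free of `unsafe`. An implementation that avoids the fill would have to work through `Vec::spare_capacity_mut` or `set_len` and guarantee that every slot is written before it is read.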