Crate simd_lookup

§simd-lookup

High-performance SIMD utilities for fast table lookups, compression, and data processing.

§Features

Cross-platform SIMD: Automatic dispatch to optimal implementation (AVX-512, AVX2, NEON)
Zero-cost abstractions: Thin wrappers over platform intrinsics via the wide crate
ARM NEON optimized: Compress operations achieve up to 12 Gelem/s on Apple Silicon

§Core Modules

simd_compress — Stream compaction (VCOMPRESS): pack selected elements by bitmask
simd_gather — Parallel memory gather with SIMD indices
small_table — 64-byte lookup tables using ARM TBL4 / AVX-512 VPERMB
wide_utils — Shuffle, widen, split, and bitmask utilities for wide types
prefetch — Cross-platform memory prefetch (L1/L2/L3 hints)
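As a rough illustration of what a 64-byte table lookup computes, here is a scalar model. The names `lookup64_scalar` and `small_table_demo` are hypothetical and are not part of this crate's API. The sketch assumes every index is below 64. How the crate treats out-of-range indices is not stated here, and the underlying instructions differ on this point (ARM TBL yields zero, while VPERMB uses only the low 6 bits of each index).

```rust
/// Scalar model of a 64-entry byte table lookup.
/// Hypothetical helper for illustration; not the crate's implementation.
/// Panics if any index is 64 or larger.
fn lookup64_scalar(table: &[u8; 64], indices: &[u8]) -> Vec<u8> {
    indices.iter().map(|&i| table[usize::from(i)]).collect()
}

/// Builds a table where entry i holds 2 * i, then looks up a few indices.
fn small_table_demo() -> Vec<u8> {
    let mut table = [0u8; 64];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = (i as u8) * 2;
    }
    lookup64_scalar(&table, &[0, 1, 63, 10])
}
```

A SIMD version would perform all lookups of one vector in a single permute instruction, with the whole table held in registers.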

§Quick Example

use simd_lookup::compress_store_u32x8;
use wide::u32x8;

// Compress: select elements where mask bits are set
let data = u32x8::from([10, 20, 30, 40, 50, 60, 70, 80]);
let mask = 0b10110010u8; // Select positions 1, 4, 5, 7
let mut output = [0u32; 8];

let count = compress_store_u32x8(data, mask, &mut output);
assert_eq!(count, 4);
assert_eq!(&output[..count], &[20, 50, 60, 80]);
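For readers unfamiliar with stream compaction, the semantics of the example above can be modelled with a plain scalar loop. This is only a reference sketch: `compress_store_scalar` is a hypothetical name, and it works on plain arrays rather than `wide::u32x8`.

```rust
/// Scalar reference for an 8-lane compress-store (illustrative only).
/// Copies data[lane] to the front of `out` for every set bit `lane` of `mask`,
/// preserving lane order, and returns the number of elements written.
fn compress_store_scalar(data: [u32; 8], mask: u8, out: &mut [u32; 8]) -> usize {
    let mut count = 0;
    for lane in 0..8 {
        if mask & (1u8 << lane) != 0 {
            out[count] = data[lane];
            count += 1;
        }
    }
    count
}
```

Bit 0 of the mask corresponds to lane 0, so `0b10110010` selects lanes 1, 4, 5, and 7. Elements of `out` past the returned count are left unchanged by this sketch.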

§Platform Support

Platform	Optimization Level
ARM aarch64 (Apple Silicon)	Full NEON optimization
x86-64 AVX-512 (Ice Lake+)	Native compress/gather
x86-64 AVX2	Shuffle-based fallbacks
Other	Scalar fallbacks
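A minimal sketch of how a dispatcher could choose between the tiers in the table, using only the standard library. The function `select_path` is hypothetical, and the specific feature string checked (`avx512f`) is an assumption; the crate's actual detection logic and required feature bits are not described here.

```rust
/// Returns the name of the code path a dispatcher might pick on this machine.
/// Hypothetical helper; the crate's real dispatch may check different features.
fn select_path() -> &'static str {
    // NEON is part of the baseline on standard aarch64 targets.
    if cfg!(target_arch = "aarch64") {
        return "neon";
    }
    // Runtime CPU feature detection is only available on x86 targets.
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return "avx512";
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    "scalar"
}
```

The result depends on the machine running the code, so it is best used for logging or for selecting a function pointer once at startup.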

Re-exports§

pub use eight_value_lookup::EightValueLookup;
pub use entropy_map_lookup::EntropyMapBitpackedLookup;
pub use entropy_map_lookup::EntropyMapLookup;
pub use lookup_kernel::PipelinedSingleTableU32U8Lookup;
pub use lookup_kernel::SimdCascadingTableU32U8Lookup;
pub use lookup_kernel::SimdDualTableWithHashLookup;
pub use simd_compress::compress_store_u32x8;
pub use simd_compress::compress_store_u32x16;
pub use simd_compress::compress_store_u8x16;
pub use simd_compress::compress_u32x8;
pub use simd_compress::compress_u32x16;
pub use simd_compress::compress_u8x16;
pub use wide_utils::FromBitmask;
pub use wide_utils::SimdSplit;
pub use wide_utils::WideUtilsExt;
pub use simd_gather::gather_u32index_u8;
pub use simd_gather::gather_masked_u32index_u8;
pub use simd_gather::gather_u32index_u32;
pub use simd_gather::gather_masked_u32index_u32;
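The gather re-exports above load table elements at positions given by a vector of indices. The scalar model below shows the idea for `u32` indices into a `u8` table. The names, signatures, and the masked variant's fallback behavior are assumptions made for illustration and may not match the crate's functions.

```rust
/// Scalar model of an 8-lane gather: out[lane] = table[indices[lane]].
/// Hypothetical helper; panics if an index is out of bounds.
fn gather_scalar(table: &[u8], indices: &[u32; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (o, &i) in out.iter_mut().zip(indices.iter()) {
        *o = table[i as usize];
    }
    out
}

/// Masked variant: only lanes whose mask bit is set are loaded.
/// Other lanes keep the value from `fallback` (an assumed convention).
fn gather_masked_scalar(
    table: &[u8],
    indices: &[u32; 8],
    mask: u8,
    fallback: [u8; 8],
) -> [u8; 8] {
    let mut out = fallback;
    for lane in 0..8 {
        if (mask >> lane) & 1 == 1 {
            out[lane] = table[indices[lane] as usize];
        }
    }
    out
}
```

In this sketch, lanes that are masked off never touch the table, so their indices may be out of range without causing a panic.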

Modules§

bulk_vec_extender: BulkVecExtender is a simple utility trait that lets you bulk-extend a Vec and get back a &mut [T] slice to write into. This is much faster than individual push() calls, each of which has to check both bounds and capacity.
eight_value_lookup: SIMD-accelerated lookup for finding positions in small u32 tables
entropy_map_lookup: Entropy-map based lookup tables using Perfect Hash Functions (PHFs)
lookup_kernel: Arrow-style "lookup kernels", similar to the arrow-select::take::take kernel. Some kernels are "columnar style": they perform the lookups for one table first and then cascade to another table. The Cascading kernel uses SIMD extensively. The other kernels do scalar lookups, which are often as fast as SIMD GATHER, but all kernels allow SIMD functions to operate on the looked-up values.
prefetch: Cross-platform prefetch intrinsics for x86 and ARM architectures.
simd_compress: SIMD compress operations
simd_gather: SIMD gather operations for efficient indexed memory access
small_table: SIMD-enabled, efficient small-table lookups for tables of 64 entries or 64K entries. Lookups may also be 2-D.
wide_utils: SIMD utilities and trait extensions for the wide crate
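The bulk-extend idea described for bulk_vec_extender can be sketched with safe standard-library calls. The function `bulk_extend_zeroed` is a hypothetical stand-in, not the crate's trait method: it zero-fills the new region with `Vec::resize`, whereas the crate's implementation and signature are not shown here and may differ.

```rust
/// Grows `v` by `n` zeroed elements in one step and returns the new tail
/// as a mutable slice. Hypothetical helper illustrating the bulk-extend idea.
fn bulk_extend_zeroed(v: &mut Vec<u32>, n: usize) -> &mut [u32] {
    let start = v.len();
    // One capacity check and one length update for all n elements.
    v.resize(start + n, 0);
    &mut v[start..]
}
```

A caller can then fill the returned slice directly, for example as the output buffer of a compress or gather operation, instead of calling push() once per element.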
