Crate simd_lookup

§simd-lookup

High-performance SIMD utilities for fast table lookups, compression, and data processing.

§Features

  • Cross-platform SIMD: Automatic dispatch to optimal implementation (AVX-512, AVX2, NEON)
  • Zero-cost abstractions: Thin wrappers over platform intrinsics via the wide crate
  • ARM NEON optimized: Compress operations achieve up to 12 Gelem/s on Apple Silicon

§Core Modules

  • simd_compress — Stream compaction (VCOMPRESS): pack selected elements by bitmask
  • simd_gather — Parallel memory gather with SIMD indices
  • small_table — 64-byte lookup tables using ARM TBL4 / AVX-512 VPERMB
  • wide_utils — Shuffle, widen, split, and bitmask utilities for wide types
  • prefetch — Cross-platform memory prefetch (L1/L2/L3 hints)
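To make the gather semantics concrete, here is a minimal scalar sketch of what a gather over eight u32 indices into a byte table computes. The names gather_scalar and gather_masked_scalar, their signatures, and the fallback value for masked-off lanes are assumptions made for illustration; they are not the crate's API, and the crate's own handling of masked lanes and out-of-range indices may differ.

```rust
/// Scalar reference for an 8-lane gather: out[lane] = table[indices[lane]].
/// Hypothetical helper for illustration; panics on out-of-range indices.
fn gather_scalar(table: &[u8], indices: [u32; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (lane, &idx) in indices.iter().enumerate() {
        out[lane] = table[idx as usize];
    }
    out
}

/// Masked variant: only lanes whose mask bit is set are loaded.
/// Lanes with a clear bit keep `fallback` (an assumed convention),
/// and their indices are never dereferenced.
fn gather_masked_scalar(
    table: &[u8],
    indices: [u32; 8],
    mask: u8,
    fallback: u8,
) -> [u8; 8] {
    let mut out = [fallback; 8];
    for (lane, &idx) in indices.iter().enumerate() {
        if (mask >> lane) & 1 == 1 {
            out[lane] = table[idx as usize];
        }
    }
    out
}
```

A SIMD gather performs the same per-lane loads from a vector of indices; the scalar loop is mainly useful as a correctness reference when testing.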

§Quick Example

use simd_lookup::compress_store_u32x8;
use wide::u32x8;

// Compress: select elements where mask bits are set
let data = u32x8::from([10, 20, 30, 40, 50, 60, 70, 80]);
let mask = 0b10110010u8; // Select positions 1, 4, 5, 7
let mut output = [0u32; 8];

let count = compress_store_u32x8(data, mask, &mut output);
assert_eq!(count, 4);
assert_eq!(&output[..count], &[20, 50, 60, 80]);
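
The compress semantics in the example above can be checked against a plain scalar loop. The sketch below uses a hypothetical helper, compress_store_scalar, which is not part of the crate. It assumes, as the example implies, that bit i of the mask selects lane i and that selected elements are written contiguously in lane order.

```rust
/// Scalar reference for compress-store: copies every lane whose mask bit
/// is set into `output`, packed from index 0, and returns how many were
/// written. Hypothetical helper for illustration, not the crate's code.
fn compress_store_scalar(data: [u32; 8], mask: u8, output: &mut [u32]) -> usize {
    let mut count = 0;
    for (lane, &value) in data.iter().enumerate() {
        // Bit `lane` of the mask decides whether this lane is kept.
        if (mask >> lane) & 1 == 1 {
            output[count] = value;
            count += 1;
        }
    }
    count
}
```

Called with the same data and mask as the example, this loop writes 20, 50, 60, 80 and returns 4, which makes it a convenient oracle for testing a SIMD implementation.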

§Platform Support

Platform                    | Optimization Level
ARM aarch64 (Apple Silicon) | Full NEON optimization
x86-64 AVX-512 (Ice Lake+)  | Native compress/gather
x86-64 AVX2                 | Shuffle-based fallbacks
Other                       | Scalar fallbacks
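
One way to pick among tiers like these is CPU feature detection from the standard library. The sketch below is an assumption about how such a choice could be made, not a description of how this crate dispatches (it may well select implementations at compile time instead). SimdLevel and detect_simd_level are hypothetical names.

```rust
/// Hypothetical tiers mirroring the platform table.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdLevel {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Picks the best tier available on the running CPU.
/// Only the branch for the current target architecture is compiled.
fn detect_simd_level() -> SimdLevel {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return SimdLevel::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdLevel::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return SimdLevel::Neon;
        }
    }
    // Any other architecture, or an x86-64 CPU without AVX2.
    SimdLevel::Scalar
}
```

A caller would typically run the detection once, cache the result, and branch to the matching kernel.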

Re-exports§

pub use eight_value_lookup::EightValueLookup;
pub use entropy_map_lookup::EntropyMapBitpackedLookup;
pub use entropy_map_lookup::EntropyMapLookup;
pub use lookup_kernel::PipelinedSingleTableU32U8Lookup;
pub use lookup_kernel::SimdCascadingTableU32U8Lookup;
pub use lookup_kernel::SimdDualTableWithHashLookup;
pub use simd_compress::compress_store_u32x8;
pub use simd_compress::compress_store_u32x16;
pub use simd_compress::compress_store_u8x16;
pub use simd_compress::compress_u32x8;
pub use simd_compress::compress_u32x16;
pub use simd_compress::compress_u8x16;
pub use wide_utils::FromBitmask;
pub use wide_utils::SimdSplit;
pub use wide_utils::WideUtilsExt;
pub use simd_gather::gather_u32index_u8;
pub use simd_gather::gather_masked_u32index_u8;
pub use simd_gather::gather_u32index_u32;
pub use simd_gather::gather_masked_u32index_u32;

Modules§

bulk_vec_extender
BulkVecExtender is a simple utility trait that lets you bulk-extend a Vec and get back a &mut [T] slice to write into. This is much faster than individual push() calls, each of which has to check both bounds and capacity.
eight_value_lookup
SIMD-accelerated lookup for finding positions in small u32 tables
entropy_map_lookup
Entropy-map based lookup tables using Perfect Hash Functions (PHFs)
lookup_kernel
Arrow-style “lookup kernels”, similar to the arrow-select::take::take kernel. The “columnar style” kernels do table lookups against one table first and then cascade to another table; the Cascading kernel uses SIMD extensively. The other kernels do scalar lookups, which are often as fast as SIMD GATHER, but all of them allow SIMD functions to operate on the looked-up values.
prefetch
Cross-platform prefetch intrinsics for x86 and ARM architectures.
simd_compress
SIMD compress operations
simd_gather
SIMD gather operations for efficient indexed memory access
small_table
SIMD-enabled, efficient small-table lookups for tables of 64 entries or 64K entries. Lookups may also be 2-D.
wide_utils
SIMD utilities and trait extensions for the wide crate
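
The bulk-extend pattern described under bulk_vec_extender can be sketched with the standard library alone. The function below, bulk_extend, is a hypothetical stand-in and not the crate's BulkVecExtender trait. It fills the new region with default values to stay in safe Rust, whereas a performance-oriented implementation might avoid that initialization.

```rust
/// Grows `vec` by `additional` elements in one step and returns the newly
/// added tail as a mutable slice. Hypothetical helper for illustration.
fn bulk_extend<T: Default + Clone>(vec: &mut Vec<T>, additional: usize) -> &mut [T] {
    let old_len = vec.len();
    // One capacity check and one length update for the whole batch,
    // instead of one per push().
    vec.resize(old_len + additional, T::default());
    &mut vec[old_len..]
}
```

The caller writes results directly into the returned slice, for example with copy_from_slice or by passing it as the output buffer of a compress or gather routine.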