simdsieve 0.1.1

SIMD-accelerated byte pattern pre-filtering with AVX-512, AVX2, NEON, and scalar fallback
Documentation

simdsieve

Crates.io Docs.rs License: MIT

A SIMD-accelerated multi-pattern pre-filtering engine with zero-allocation streaming iteration. Scans byte haystacks for up to 16 patterns simultaneously using the widest SIMD instructions available.

What It Does

simdsieve is a first-pass candidate filter for substring search:

  1. Prefix Extraction: Takes the first 1-4 bytes of each pattern
  2. SIMD Filtering: Scans haystack in 32-128 byte blocks, finding positions where prefixes match using vectorized comparison
  3. Verification: Confirms full pattern matches before yielding
  4. Zero False Positives: Every yielded offset is a verified match start

Use this when you need to quickly filter large data streams for multiple patterns before applying more expensive analysis.

Architecture

Backend Instruction Set Block Size Throughput* Registers
AVX-512 AVX-512F + AVX-512BW 128 bytes >50 GB/s 512-bit
AVX2 AVX2 64 bytes >25 GB/s 256-bit
NEON AArch64 NEON 64 bytes >15 GB/s 128-bit
Scalar Portable Rust 64 bytes >2 GB/s 64-bit

* Single-byte pattern on modern x86_64. Multi-byte patterns scale linearly with prefix length.

Backend Selection

The optimal backend is auto-selected at construction time:

  1. AVX-512: If avx512f and avx512bw CPU features are detected
  2. AVX2: If avx2 is detected
  3. NEON: On aarch64 targets (guaranteed by the architecture)
  4. Scalar: Portable fallback for all other platforms

Dual-Pump Processing

AVX-512, AVX2, and NEON use "dual-pump" processing where each logical block is split into two halves processed together. This hides load latency by interleaving independent operations across both load ports.

Design Limits

  • 16 patterns maximum: Unrolled register layout saturates execution ports
  • 4-byte prefix: First 4 bytes used for SIMD filtering; full pattern for verification
  • ASCII case-insensitive: Only a-z folded; non-ASCII compared verbatim

Usage

Add to Cargo.toml:

[dependencies]
simdsieve = "0.1"

Basic exact matching:

use simdsieve::SimdSieve;

let haystack = b"The quick brown fox jumps over the lazy dog";
let patterns = &[b"fox"[..], b"dog"[..]];

let sieve = SimdSieve::new(haystack, patterns).unwrap();
let matches: Vec<usize> = sieve.collect();

assert_eq!(matches, vec![16, 40]); // Positions of "fox" and "dog"

Case-insensitive matching:

use simdsieve::SimdSieve;

let haystack = b"Hello World HELLO";
let patterns = &[b"hello"[..]];

let sieve = SimdSieve::new_case_insensitive(haystack, patterns).unwrap();
let matches: Vec<usize> = sieve.collect();

assert_eq!(matches, vec![0, 12]); // Matches "Hello" and "HELLO"

Density estimation (without full verification):

use simdsieve::SimdSieve;

let haystack = b"aaaaaa";
let patterns = &[b"a"[..]];

let count = SimdSieve::estimate_match_count(haystack, patterns, false);
assert_eq!(count, 6); // One prefix hit at each position

Iterator Behavior

SimdSieve implements Iterator<Item = usize> and FusedIterator:

  • Yields match offsets in ascending order
  • After returning None, all subsequent calls return None
  • Zero heap allocation during iteration
  • Thread-safe: Send and Sync (immutable iterator)

Error Handling

Construction can fail with SimdSieveError:

  • EmptyPatternSet: No patterns provided
  • PatternLimitExceeded(usize): More than 16 patterns provided
use simdsieve::{SimdSieve, SimdSieveError};

let result = SimdSieve::new(b"haystack", &[]);
assert!(matches!(result, Err(SimdSieveError::EmptyPatternSet)));

Platform Support

Platform Backend Notes
x86_64 AVX-512, AVX2, Scalar Runtime feature detection
aarch64 NEON, Scalar NEON guaranteed on AArch64
wasm32 Scalar No SIMD backend yet
other Scalar Portable fallback

Safety

simdsieve contains a small amount of unsafe code confined to platform-specific backend modules. Every unsafe block has been audited for soundness:

  • SIMD loads use unaligned load intrinsics (loadu), safe for any valid pointer alignment.
  • Block bounds are verified before each SIMD call; multi-byte prefixes require block_len >= block_size + max_prefix_len - 1 trailing bytes.
  • Target features are detected at construction; #[target_feature] gates prevent executing unsupported instructions.
  • No uninitialized memory is read; SIMD struct arrays are zero-initialized.
  • unsafe_op_in_unsafe_fn is forbidden at the crate level.

Performance Tuning

When to Use This Crate

Good for:

  • Filtering large streams (>1MB) for multiple patterns
  • Finding candidate positions for regex engines
  • Log analysis with multiple fixed-string patterns
  • Network packet inspection with signature matching

Not ideal for:

  • Single small haystacks (<1KB)
  • Patterns longer than 4 bytes that differ only after position 4
  • When you need capture groups or regex features

Prefetching

The engine automatically issues prefetch hints 512 bytes ahead. This value (8 cache lines) balances:

  • Enough lookahead to hide memory latency on modern CPUs
  • Not so much that we evict useful data from cache

Pattern Selection

For best performance:

  • Use patterns with distinct first bytes when possible
  • Place more frequent patterns earlier in the pattern list
  • For case-insensitive mode, use lowercase in patterns

Benchmarks

Run benchmarks with:

cargo bench

Example throughput on AMD EPYC 9R14 (AVX-512):

Haystack Patterns Throughput
1 MB 1 ~55 GB/s
1 MB 4 ~45 GB/s
1 MB 16 ~30 GB/s

License

Licensed under the MIT License. See LICENSE for details.

Contributing

Contributions welcome! Areas of interest:

  • WebAssembly SIMD backend
  • Additional benchmarks and fuzz targets
  • New verification strategies

Please ensure:

  • cargo test passes
  • cargo clippy -- -D warnings is clean
  • cargo fmt --check passes
  • New code has module-level and item-level documentation