SimdCascadingTableU32U8Lookup

Struct SimdCascadingTableU32U8Lookup 

Source
pub struct SimdCascadingTableU32U8Lookup<'a> { /* private fields */ }
Expand description

SIMD “Cascading” 2nd/3rd Table Lookup Kernel

This kernel is designed to “cascade” and build on top of the primary SingleTable kernel to efficiently look up secondary or additional tables. How does this work?

  • First call SimdSingleTableU32U8Lookup to look up the primary table, using the lookup_compress_into_nonzeroes() method. This returns compressed results and indices of the nonzero results.
  • Now feed these Vecs into this kernel, which uses compressed output to do a packed lookup into the second table. This is faster than having to filter all the results from the first kernel.
  • The lookup function is called for nonzero table1 results and looked up second table lookups, and should return results for all 16 values in the u8x16.
  • Then, this kernel will COMPRESS the results and again output nonzero results and indices, filtered from the input.

Basically, this kernel can be cascaded for additional tables.

The theory is that this cascading and packed lookup approach allows us to come closest to kernels where even with multiple tables, the runtime is roughly O(num_nonzero_lookups). UPDATE 12/2/2025: Intel Xeon results show that, even at huge (15M) tables, this results in a 40% speedup over the V2 kernel. The speedups increase for smaller table sizes - 4M shows over 50% increase, and even bigger for smaller tables - which shows that this design inherently scales well.

Implementations§

Source§

impl<'a> SimdCascadingTableU32U8Lookup<'a>

Source

pub fn new(lookup_table: &'a [u8]) -> Self

Source

pub fn cascading_lookup<F>( &self, values: &[u32], in_nonzero_results: &[u8], in_indices: &[u32], f: F, out_results: &mut Vec<u8>, out_indices: &mut Vec<u32>, )
where F: Fn(u8x16, u8x16) -> u8x16,

Given a slice of u32 values, looks up each one. Designed to work in cascading mode. One needs to pass in the nonzero_results and indices output from SimdSingleTableU32U8Lookup::lookup_compress_into_nonzeroes(), along with the values (which are the keys for the lookup table in this struct).

For this to be efficient, the length of values probably should be at least hundreds or thousands of values.

§Arguments
  • values - &u32 of indices to lookup. NOTE: these are ORIGINAL values, NOT filtered, thus its length should be the same length as the values fed into SimdSingleTableU32U8Lookup kernel. In other words, the length of values will probably be larger than in_nonzero_results.
  • in_nonzero_results - &u8 of nonzero results from SimdSingleTableU32U8Lookup::lookup_compress_into_nonzeroes()
  • in_indices - &u32 of indices from SimdSingleTableU32U8Lookup::lookup_compress_into_nonzeroes() These indices should be indices into the values array.
  • f - function to mix the results from nonzero_results and the looked up values from this lookup table. The results (u8x16) returned from this function, will be zero-compressed along with indices to generate more nonzero output.
  • out_results - &mut Vec to store the nonzero results from the lookup function f
  • out_indices - &mut Vec, basically same as input indices but with nonzeroes compressed out

Trait Implementations§

Source§

impl<'a> Clone for SimdCascadingTableU32U8Lookup<'a>

Source§

fn clone(&self) -> SimdCascadingTableU32U8Lookup<'a>

Returns a duplicate of the value. Read more
1.0.0§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<'a> Debug for SimdCascadingTableU32U8Lookup<'a>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

§

impl<T> Any for T
where T: 'static + ?Sized,

§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
§

impl<T> Borrow<T> for T
where T: ?Sized,

§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
§

impl<T> BorrowMut<T> for T
where T: ?Sized,

§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
§

impl<T> CloneToUninit for T
where T: Clone,

§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
§

impl<T> From<T> for T

§

fn from(t: T) -> T

Returns the argument unchanged.

§

impl<T, U> Into<U> for T
where U: From<T>,

§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.