pub struct SimdCascadingTableU32U8Lookup<'a> { /* private fields */ }
SIMD “Cascading” 2nd/3rd Table Lookup Kernel
This kernel is designed to “cascade” on top of the primary SingleTable kernel to efficiently look up secondary or additional tables. How does this work?
- First call SimdSingleTableU32U8Lookup to look up the primary table, using the lookup_compress_into_nonzeroes() method. This returns the compressed results and the indices of the nonzero results.
- Now feed these Vecs into this kernel, which uses the compressed output to do a packed lookup into the second table. This is faster than having to filter all the results from the first kernel.
- The lookup function is called with the nonzero table1 results and the corresponding second-table lookups, and should return results for all 16 values in the u8x16.
- Then, this kernel will COMPRESS the results and again output nonzero results and indices, filtered from the input.
Basically, this kernel can be cascaded for additional tables.
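The steps above can be sketched with a scalar model. This is not the crate's SIMD implementation: the helper names (`lookup_nonzeroes_scalar`, `cascade_scalar`, `cascade_demo`) and the per-element closure signature are assumptions made for illustration, and the real kernel works on 16 lanes at a time.

```rust
/// Hypothetical scalar stand-in for the first-stage kernel: look up every
/// value in `table1`, keep only nonzero results plus their positions in `values`.
fn lookup_nonzeroes_scalar(table1: &[u8], values: &[u32]) -> (Vec<u8>, Vec<u32>) {
    let mut results = Vec::new();
    let mut indices = Vec::new();
    for (i, &v) in values.iter().enumerate() {
        let r = table1[v as usize];
        if r != 0 {
            results.push(r);
            indices.push(i as u32);
        }
    }
    (results, indices)
}

/// Hypothetical scalar stand-in for the cascading stage: only positions that
/// survived the first stage are looked up in `table2`, mixed with `f`, and
/// compressed again so that only nonzero results remain.
fn cascade_scalar<F: Fn(u8, u8) -> u8>(
    table2: &[u8],
    values: &[u32],
    in_results: &[u8],
    in_indices: &[u32],
    f: F,
) -> (Vec<u8>, Vec<u32>) {
    let mut out_results = Vec::new();
    let mut out_indices = Vec::new();
    for (&r1, &idx) in in_results.iter().zip(in_indices) {
        // `idx` points back into the ORIGINAL, unfiltered `values`.
        let key = values[idx as usize];
        let mixed = f(r1, table2[key as usize]);
        if mixed != 0 {
            out_results.push(mixed);
            out_indices.push(idx);
        }
    }
    (out_results, out_indices)
}

/// Two-table cascade on toy data, mixing with bitwise AND.
fn cascade_demo() -> (Vec<u8>, Vec<u32>) {
    let table1 = [0u8, 1, 0, 3, 4, 0];
    let table2 = [7u8, 0, 9, 5, 0, 2];
    let values = [1u32, 2, 3, 4, 5, 3];
    let (r1, i1) = lookup_nonzeroes_scalar(&table1, &values);
    cascade_scalar(&table2, &values, &r1, &i1, |a, b| a & b)
}
```

The work done by the second stage is proportional to the number of survivors from the first stage, not to `values.len()`, which is the property the cascading design is after. A third table would take the outputs of `cascade_scalar` as its inputs in the same way.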
The theory is that this cascading, packed-lookup approach brings us closest to kernels whose runtime, even with multiple tables, is roughly O(num_nonzero_lookups). UPDATE 12/2/2025: Intel Xeon results show that, even with huge (15M) tables, this results in a 40% speedup over the V2 kernel. The speedup grows as tables get smaller (4M shows an increase of over 50%, and smaller tables show even more), which shows that this design inherently scales well.
Implementations§
impl<'a> SimdCascadingTableU32U8Lookup<'a>
pub fn new(lookup_table: &'a [u8]) -> Self
pub fn cascading_lookup<F>(
&self,
values: &[u32],
in_nonzero_results: &[u8],
in_indices: &[u32],
f: F,
out_results: &mut Vec<u8>,
out_indices: &mut Vec<u32>,
)
Given a slice of u32 values, looks up each one. Designed to work in cascading mode. One needs to pass in the nonzero_results and indices output from SimdSingleTableU32U8Lookup::lookup_compress_into_nonzeroes(), along with the values (which are the keys for the lookup table in this struct).
For this to be efficient, the length of values probably should be at least hundreds or thousands of values.
§Arguments
- values - &[u32] of indices to look up. NOTE: these are the ORIGINAL values, NOT filtered, so its length should equal the length of the values fed into the SimdSingleTableU32U8Lookup kernel. In other words, values will probably be longer than in_nonzero_results.
- in_nonzero_results - &[u8] of nonzero results from SimdSingleTableU32U8Lookup::lookup_compress_into_nonzeroes()
- in_indices - &[u32] of indices from SimdSingleTableU32U8Lookup::lookup_compress_into_nonzeroes(). These indices should be indices into the values array.
- f - function to mix the results from in_nonzero_results and the values looked up from this lookup table. The results (u8x16) returned from this function will be zero-compressed, along with the indices, to generate more nonzero output.
- out_results - &mut Vec<u8> to store the nonzero results from the lookup function f
- out_indices - &mut Vec<u32>, basically the same as the input indices, but keeping only those whose result from f is nonzero
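The argument contract can be modelled lane by lane. The sketch below is an assumption-heavy illustration, not the crate's code: the bounds on `F` are not shown in the signature above, so `[u8; 16]` stands in for `u8x16`, and `cascade_lanes` and `lanes_demo` are hypothetical names.

```rust
const LANES: usize = 16;

/// Hypothetical model of the batched contract: `f` sees 16 previous results
/// and 16 freshly looked-up bytes at a time, and returns 16 mixed bytes.
fn cascade_lanes<F>(
    table2: &[u8],
    values: &[u32],
    in_nonzero_results: &[u8],
    in_indices: &[u32],
    f: F,
    out_results: &mut Vec<u8>,
    out_indices: &mut Vec<u32>,
) where
    F: Fn([u8; LANES], [u8; LANES]) -> [u8; LANES],
{
    assert_eq!(in_nonzero_results.len(), in_indices.len());
    let chunks = in_nonzero_results.chunks(LANES).zip(in_indices.chunks(LANES));
    for (res_chunk, idx_chunk) in chunks {
        // Unused lanes in a short final chunk stay zero.
        let mut prev = [0u8; LANES];
        let mut looked_up = [0u8; LANES];
        for (lane, (&r, &idx)) in res_chunk.iter().zip(idx_chunk).enumerate() {
            prev[lane] = r;
            // in_indices index into the ORIGINAL values, which are the table keys.
            looked_up[lane] = table2[values[idx as usize] as usize];
        }
        let mixed = f(prev, looked_up);
        // Compress: keep only nonzero lanes that are backed by real input.
        for (lane, &idx) in idx_chunk.iter().enumerate() {
            if mixed[lane] != 0 {
                out_results.push(mixed[lane]);
                out_indices.push(idx);
            }
        }
    }
}

/// Toy run: 40 keys, 20 first-stage survivors (more than one 16-lane chunk).
fn lanes_demo() -> (Vec<u8>, Vec<u32>) {
    let table2: Vec<u8> = (0..40u32).map(|k| (k % 3) as u8).collect();
    let values: Vec<u32> = (0..40).collect();
    // Pretend the first stage kept every even position with result 0b11.
    let in_indices: Vec<u32> = (0..40u32).step_by(2).collect();
    let in_results = vec![3u8; in_indices.len()];
    let (mut out_results, mut out_indices) = (Vec::new(), Vec::new());
    cascade_lanes(
        &table2,
        &values,
        &in_results,
        &in_indices,
        |a: [u8; LANES], b: [u8; LANES]| std::array::from_fn(|i| a[i] & b[i]),
        &mut out_results,
        &mut out_indices,
    );
    (out_results, out_indices)
}
```

Note that `values` keeps its full length of 40 while the filtered inputs have only 20 entries, and that the output indices still refer to positions in `values`, so they can be fed into another cascading stage unchanged.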
Trait Implementations§
impl<'a> Clone for SimdCascadingTableU32U8Lookup<'a>
fn clone(&self) -> SimdCascadingTableU32U8Lookup<'a>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl<'a> Freeze for SimdCascadingTableU32U8Lookup<'a>
impl<'a> RefUnwindSafe for SimdCascadingTableU32U8Lookup<'a>
impl<'a> Send for SimdCascadingTableU32U8Lookup<'a>
impl<'a> Sync for SimdCascadingTableU32U8Lookup<'a>
impl<'a> Unpin for SimdCascadingTableU32U8Lookup<'a>
impl<'a> UnwindSafe for SimdCascadingTableU32U8Lookup<'a>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
T: Clone,
unsafe fn clone_to_uninit(&self, dest: *mut u8)
This is a nightly-only experimental API. (clone_to_uninit)