pub struct SimdSingleTableU32U8Lookup<'a> { /* private fields */ }
Single-table lookup kernel with a SIMD callback: it looks up u32 indices in a u8 lookup table. The user is responsible for generating the lookup table, so the same kernel serves different use cases, including CASE..WHEN and bitmasking/filtering.
The looked-up values can be processed with SIMD operations, but SIMD is not used for the lookups themselves, because SIMD offers no major advantage for lookups into huge tables with random indices. Values are nevertheless looked up 16 at a time for efficiency. The kernel is meant to be called on hundreds or thousands of values at a time, columnar style.
Note: for general-purpose expressions where the lookup table can hold any type, use arrow::compute::take() instead.
It performs a very efficient lookup for any value type, but you pay the memory write I/O cost of materializing
the output. The kernels here instead let the user operate directly on each looked-up u8x16.
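Because the caller builds the table, a typical pattern is to precompute one table entry per possible index, for example per code of a dictionary-encoded column. The sketch below is a hypothetical illustration (the dictionary, the CASE branches, and the helper names build_case_table and build_filter_table are assumptions, not part of this API): it builds a CASE..WHEN table and a filter table in which zero means "no match". Either Vec<u8> could then be borrowed as the &[u8] passed to new.

```rust
/// Hypothetical helper: table for
///   CASE WHEN name = 'apple' THEN 1 WHEN name = 'pear' THEN 2 ELSE 0 END
/// indexed by dictionary code (0 = no branch matched).
fn build_case_table(dictionary: &[&str]) -> Vec<u8> {
    dictionary
        .iter()
        .map(|&name| match name {
            "apple" => 1,
            "pear" => 2,
            _ => 0,
        })
        .collect()
}

/// Hypothetical helper: filter table, 1 where the predicate holds, else 0.
fn build_filter_table(dictionary: &[&str], pred: impl Fn(&str) -> bool) -> Vec<u8> {
    dictionary.iter().map(|&name| pred(name) as u8).collect()
}

fn main() {
    // Assumed dictionary of a dictionary-encoded string column.
    let dictionary = ["apple", "banana", "pear", "plum"];
    let case_table = build_case_table(&dictionary);
    let filter_table = build_filter_table(&dictionary, |s| s.starts_with('p'));

    // Scalar reference lookup of some row codes through the CASE table.
    let codes: [u32; 5] = [0, 2, 3, 1, 2];
    let cased: Vec<u8> = codes.iter().map(|&c| case_table[c as usize]).collect();
    // Prints: case: [1, 2, 0, 0, 2], filter: [0, 0, 1, 1]
    println!("case: {:?}, filter: {:?}", cased, filter_table);
}
```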
Implementations
impl<'a> SimdSingleTableU32U8Lookup<'a>
pub fn new(lookup_table: &'a [u8]) -> Self
pub fn lookup_func<F>(&self, values: &[u32], f: &mut F)
Given a slice of u32 values, looks up each one and calls the user-supplied function on each assembled u8x16 (16 looked-up values at a time).
The function receives (lookedup_values: u8x16, num_bytes: usize), where num_bytes is 16 for every chunk except possibly the last (remainder) chunk, where it may be smaller.
If the slice length is not a multiple of 16, the unused lanes of the last u8x16 are filled with zeroes. The function must therefore either treat zero as a no-op value or use num_bytes to ignore the padding lanes.
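The chunking and zero-padding contract can be modeled in plain scalar Rust. In this sketch a [u8; 16] array stands in for u8x16, and lookup_func_sketch and count_nonzero are hypothetical free functions, not the crate's implementation. count_nonzero shows a consumer that reduces each chunk without writing the looked-up bytes to memory, which is what distinguishes this style from a materializing take().

```rust
/// Stand-in for u8x16 (assumption: a plain 16-byte array, not a SIMD type).
type U8x16 = [u8; 16];

/// Hypothetical scalar model of lookup_func's contract: look up 16 values at a
/// time, zero-fill the unused lanes of the last chunk, and hand each chunk and
/// its count of valid lanes (num_bytes) to the callback.
fn lookup_func_sketch<F: FnMut(U8x16, usize)>(table: &[u8], values: &[u32], f: &mut F) {
    for chunk in values.chunks(16) {
        let mut looked_up: U8x16 = [0; 16]; // padding lanes stay zero
        for (lane, &v) in looked_up.iter_mut().zip(chunk) {
            *lane = table[v as usize];
        }
        f(looked_up, chunk.len());
    }
}

/// Example consumer: counts nonzero results without storing the looked-up bytes.
fn count_nonzero(table: &[u8], values: &[u32]) -> usize {
    let mut count = 0;
    lookup_func_sketch(table, values, &mut |chunk: U8x16, _num_bytes: usize| {
        // Zero padding never counts as nonzero, so num_bytes can be ignored here.
        count += chunk.iter().filter(|&&b| b != 0).count();
    });
    count
}

fn main() {
    let table = [0u8, 1, 0, 3];
    // 20 values: one full chunk of 16 plus a remainder chunk of 4.
    let values: Vec<u32> = (0..20).map(|i| i % 4).collect();
    println!("nonzero results: {}", count_nonzero(&table, &values));
}
```

A consumer that does need exact lengths, such as one copying results out, would use num_bytes to slice off the padding lanes of the final chunk.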
pub fn lookup_into_vec(&self, values: &[u32], buffer: &mut Vec<u8>)
Convenience function that looks up every value and appends the results, untransformed, to buffer. Existing contents are kept: the Vec is extended by exactly values.len() bytes rather than overwritten.
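The append semantics are easiest to see in a scalar model. lookup_into_vec_sketch below is hypothetical: it walks the input in 16-value chunks like the kernel, but appends only chunk.len() bytes per chunk, so no zero padding reaches the output and any existing contents of the buffer are preserved.

```rust
/// Hypothetical scalar model of lookup_into_vec: appends exactly one looked-up
/// byte per input value; existing contents of `buffer` are kept.
fn lookup_into_vec_sketch(table: &[u8], values: &[u32], buffer: &mut Vec<u8>) {
    buffer.reserve(values.len());
    for chunk in values.chunks(16) {
        // Only chunk.len() results are appended, so the short final chunk
        // contributes no padding bytes.
        buffer.extend(chunk.iter().map(|&v| table[v as usize]));
    }
}

fn main() {
    let table = [10u8, 20, 30];
    let mut buffer = vec![99u8]; // pre-existing contents are preserved
    lookup_into_vec_sketch(&table, &[2, 0, 1], &mut buffer);
    println!("{:?}", buffer); // [99, 30, 10, 20]
}
```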
pub fn lookup_into_u8x16_buffer(&self, values: &[u32], buffer: &mut [u8x16])
Variant of lookup_into_vec that writes into a mutable u8x16 buffer, for use in cascaded lookups.
pub fn lookup_extend_u8x16_vec(&self, values: &[u32], vec: &mut Vec<u8x16>)
Extends vec by as many u8x16 elements as are needed to hold the results, setting the new length up front, and then fills the new elements as lookup_into_u8x16_buffer does.
Safety
- The new length is set unsafely, without initializing the new elements. This is sound because every new element is overwritten by the lookup before it can be read.
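A safe model of the extend-then-fill pattern follows. It assumes one u8x16 per 16 input values, rounded up, with a [u8; 16] array standing in for u8x16; both functions are hypothetical stand-ins for the crate methods. The one deliberate difference: resize() zero-fills the new elements, which is always sound but costs an extra pass over memory that the documented unsafe length-setting avoids.

```rust
/// Stand-in for u8x16 (assumption: a plain 16-byte array, not a SIMD type).
type U8x16 = [u8; 16];

/// Hypothetical stand-in for lookup_into_u8x16_buffer: `buffer` must hold
/// exactly one chunk per 16 input values, rounded up.
fn lookup_into_u8x16_buffer_sketch(table: &[u8], values: &[u32], buffer: &mut [U8x16]) {
    assert_eq!(buffer.len(), (values.len() + 15) / 16);
    for (out, chunk) in buffer.iter_mut().zip(values.chunks(16)) {
        *out = [0; 16]; // lanes past the end of a short final chunk stay zero
        for (lane, &v) in out.iter_mut().zip(chunk) {
            *lane = table[v as usize];
        }
    }
}

/// Safe sketch of lookup_extend_u8x16_vec: grow `vec` by the number of chunks
/// needed, then fill only the newly added tail. Unlike the documented method,
/// this zero-initializes the new elements with resize() instead of setting the
/// length unsafely.
fn lookup_extend_u8x16_vec_sketch(table: &[u8], values: &[u32], vec: &mut Vec<U8x16>) {
    let start = vec.len();
    let needed = (values.len() + 15) / 16;
    vec.resize(start + needed, [0; 16]);
    lookup_into_u8x16_buffer_sketch(table, values, &mut vec[start..]);
}

fn main() {
    let table: Vec<u8> = (0..=255u8).collect(); // identity table: table[i] = i
    let values: Vec<u32> = (0..20).collect(); // 20 values -> 2 chunks
    let mut vec: Vec<U8x16> = vec![[1; 16]]; // one pre-existing chunk is kept
    lookup_extend_u8x16_vec_sketch(&table, &values, &mut vec);
    println!("{} chunks, last = {:?}", vec.len(), vec[2]);
}
```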
pub fn lookup_compress_into_nonzeroes(
&self,
values: &[u32],
nonzero_results: &mut Vec<u8>,
indices: &mut Vec<u32>,
base_index: u32,
)
Convenience function that performs the lookup, drops the zero results, and extends two Vecs:
- nonzero_results: Vec of the nonzero looked-up u8 results
- indices: Vec of the indices of the nonzero results
This method is intended for use with the cascading SIMD kernels, which extend a lookup across two or more tables by using the packed nonzero output of the first table to drive lookups into the second table.
Arguments
- values: &[u32] of indices to look up
- nonzero_results: &mut Vec<u8> to store the nonzero looked-up u8 results
- indices: &mut Vec<u32> to store the indices of the nonzero results
- base_index: base index value for the indices output
For example, to reuse empty Vecs as temporary buffers, pass base_index = 0 so that the output indices count from the start of values
(16-value chunk boundaries then fall at indices 0, 16, 32, and so on). Pass empty Vecs, and clear them
before every call.
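The observable behavior can be modeled with a plain scalar loop. This sketch assumes that the index recorded for values[i] is base_index + i, which the description of base_index suggests but does not state outright; lookup_compress_sketch is a hypothetical name, and the real method may use AVX-512 compression rather than a loop.

```rust
/// Hypothetical scalar model of lookup_compress_into_nonzeroes.
/// Assumption: the index recorded for values[i] is base_index + i.
fn lookup_compress_sketch(
    table: &[u8],
    values: &[u32],
    nonzero_results: &mut Vec<u8>,
    indices: &mut Vec<u32>,
    base_index: u32,
) {
    for (i, &v) in values.iter().enumerate() {
        let looked_up = table[v as usize];
        if looked_up != 0 {
            // Zero results are dropped; nonzero ones are packed together.
            nonzero_results.push(looked_up);
            indices.push(base_index + i as u32);
        }
    }
}

fn main() {
    let table = [0u8, 5, 0, 7];
    let values = [1u32, 0, 3, 2, 1];
    let mut results: Vec<u8> = Vec::new();
    let mut indices: Vec<u32> = Vec::new();
    // Reuse the Vecs as scratch buffers: clear before each call, base_index = 0.
    results.clear();
    indices.clear();
    lookup_compress_sketch(&table, &values, &mut results, &mut indices, 0);
    println!("{:?} {:?}", results, indices); // [5, 7, 5] [0, 2, 4]
}
```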
Performance and Architecture
This method is heavily optimized for Intel AVX-512, using the VCOMPRESS kernel in simd_compress.rs. With VCOMPRESS it runs nearly as fast as lookup_into_vec(), which does nothing but copy the results. On other platforms it falls back to a scalar approach, which may be much slower.
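The documentation does not say how the scalar fallback is written. One common way to keep scalar compaction cheap on random data is a branchless store-then-advance loop, sketched below as a swapped-in technique that is not necessarily what the crate does. compress_nonzero_branchless is a hypothetical name, and unlike the documented method it returns new Vecs instead of extending caller-owned ones.

```rust
/// Branchless scalar compaction (a common technique, not necessarily the
/// crate's fallback). Each result is written unconditionally to the next free
/// slot, and the write cursor only advances when the result is nonzero, so the
/// loop has no data-dependent branch.
fn compress_nonzero_branchless(
    table: &[u8],
    values: &[u32],
    base_index: u32,
) -> (Vec<u8>, Vec<u32>) {
    // Size the outputs for the worst case so the unconditional write always fits
    // (the cursor n never exceeds the loop position i).
    let mut results = vec![0u8; values.len()];
    let mut indices = vec![0u32; values.len()];
    let mut n = 0usize;
    for (i, &v) in values.iter().enumerate() {
        let looked_up = table[v as usize];
        results[n] = looked_up;
        indices[n] = base_index + i as u32;
        n += (looked_up != 0) as usize;
    }
    results.truncate(n);
    indices.truncate(n);
    (results, indices)
}

fn main() {
    let (results, indices) = compress_nonzero_branchless(&[0, 5, 0, 7], &[1, 0, 3, 2, 1], 0);
    println!("{:?} {:?}", results, indices); // [5, 7, 5] [0, 2, 4]
}
```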
Trait Implementations
impl<'a> Clone for SimdSingleTableU32U8Lookup<'a>
fn clone(&self) -> SimdSingleTableU32U8Lookup<'a>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations
impl<'a> Freeze for SimdSingleTableU32U8Lookup<'a>
impl<'a> RefUnwindSafe for SimdSingleTableU32U8Lookup<'a>
impl<'a> Send for SimdSingleTableU32U8Lookup<'a>
impl<'a> Sync for SimdSingleTableU32U8Lookup<'a>
impl<'a> Unpin for SimdSingleTableU32U8Lookup<'a>
impl<'a> UnwindSafe for SimdSingleTableU32U8Lookup<'a>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
unsafe fn clone_to_uninit(&self, dest: *mut u8)
This is a nightly-only experimental API (clone_to_uninit).