Struct simdeez::sse41::Sse41

pub struct Sse41;

Trait Implementations

Vector of i16s. Corresponds to __m128i when used with the Sse impl, __m256i when used with Avx2, or a single i16 when used with Scalar.
Vector of i32s. Corresponds to __m128i when used with the Sse impl, __m256i when used with Avx2, or a single i32 when used with Scalar.
Vector of f32s. Corresponds to __m128 when used with the Sse impl, __m256 when used with Avx2, or a single f32 when used with Scalar.
Vector of f64s. Corresponds to __m128d when used with the Sse impl, __m256d when used with Avx2, or a single f64 when used with Scalar.
Vector of i64s. Corresponds to __m128i when used with the Sse impl, __m256i when used with Avx2, or a single i64 when used with Scalar.
The width of the vector lane. Necessary for writing lane-width-agnostic code.
Note: SSE2 selects B only when all bits are 1, while SSE41 and AVX2 check only the high bit. For portability, make sure all bits are 1 when using blend. The results of comparison operations already satisfy this.
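The difference between the two selection rules can be modelled on a single 32-bit lane. This is only a scalar sketch, not the simdeez implementation: the function names `blend_bitwise` and `blend_high_bit` are hypothetical, and the and/andnot/or formulation is an assumption about how a bitwise blend is typically built on SSE2.

```rust
/// Scalar model of one 32-bit lane of a bitwise blend (assumed SSE2 style,
/// built from and / andnot / or): each mask bit picks between `a` and `b`.
fn blend_bitwise(a: u32, b: u32, mask: u32) -> u32 {
    (a & !mask) | (b & mask)
}

/// Scalar model of one 32-bit lane of a blendv-style select
/// (SSE4.1 / AVX2): only the high bit of the mask is inspected.
fn blend_high_bit(a: u32, b: u32, mask: u32) -> u32 {
    if mask & 0x8000_0000 != 0 {
        b
    } else {
        a
    }
}
```

With a mask of `0xFFFF_FFFF` or `0` both models agree. With a mask of `0x8000_0000` the high-bit model returns `b`, while the bitwise model returns a mix of bits from `a` and `b`, which is why an all-ones mask is the portable choice.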
Currently, Scalar can produce different results in some cases, depending on the current SSE rounding mode.
When using Sse2, fastceil uses a faster version of ceil that only works on floating point values small enough to fit in an i32. This is a big performance boost if you don’t need a complete ceil.
When using Sse2, fastfloor uses a faster version of floor that only works on floating point values small enough to fit in an i32. This is a big performance boost if you don’t need a complete floor.
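The i32 restriction follows from how such a shortcut is usually built: convert to an integer, convert back, and correct by one where needed. The sketch below is a hypothetical scalar model (`fast_floor` is not a simdeez function), assuming the fast path goes through an i32 conversion as the note suggests.

```rust
/// Hypothetical scalar model of a "fast floor": truncate through i32,
/// then subtract 1 where truncation rounded up (negative non-integers).
fn fast_floor(x: f32) -> f32 {
    // `as i32` truncates toward zero. Rust saturates out-of-range values,
    // whereas the SSE conversion instruction returns a sentinel instead;
    // either way the result is wrong outside the i32 range.
    let truncated = (x as i32) as f32;
    if truncated > x {
        truncated - 1.0
    } else {
        truncated
    }
}
```

For inputs such as `2.7` or `-2.7` this matches `f32::floor`. For an input such as `1.0e10`, which does not fit in an i32, it does not.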
Actual FMA instructions are used with Avx2; otherwise a separate mul and add replicate the operation, so you can always use FMA in your code and get the best performance in both cases.
Actual FMA instructions are used with Avx2; otherwise a separate mul and sub replicate the operation, so you can always use FMA in your code and get the best performance in both cases.
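One practical consequence is worth keeping in mind: a fused multiply-add rounds once, while a separate multiply followed by an add rounds twice, so the two paths can differ in the last bit. The scalar sketch below illustrates this with the standard library's `f32::mul_add`; the function names `fmadd_fused` and `fmadd_unfused` are hypothetical and are not part of simdeez.

```rust
/// Fused form: computes a * b + c with a single rounding step.
fn fmadd_fused(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// Fallback form: multiply, round, then add and round again.
fn fmadd_unfused(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}
```

As an example, take `a = b = 1.0 + 1.0 / 4096.0` and `c = -(1.0 + 1.0 / 2048.0)`. The exact product is 1 + 2^-11 + 2^-24, so the fused form returns 2^-24, while the two-step form first rounds the product to 1 + 2^-11 and returns 0. Code that must produce identical results across instruction sets should not rely on the extra precision.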
Adds all lanes together. Distinct from h_add which adds pairs.
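The distinction can be sketched on plain arrays. This is a scalar model under assumptions: the names `horizontal_add` and `pairwise_add` are hypothetical, four f32 lanes are assumed (the 128-bit case), and the pairwise layout follows the SSE3 `haddps` instruction rather than any signature confirmed by this page.

```rust
/// Adds every lane of the vector into a single scalar.
fn horizontal_add(lanes: [f32; 4]) -> f32 {
    lanes.iter().sum()
}

/// Pairwise model: sums adjacent pairs of `a`, then adjacent pairs of `b`
/// (the lane layout used by the SSE3 haddps instruction).
fn pairwise_add(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]
}
```

The first reduces a vector to one value; the second still returns a full vector of partial sums.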
Sse2 and Sse41 paths will simulate a gather by breaking out and doing scalar array accesses, because gather doesn’t exist until Avx2.
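A simulated gather amounts to one indexed read per lane. The sketch below shows the idea for four f32 lanes; `gather_scalar` is a hypothetical name, and the real simdeez code path may differ in detail (for example, in how it handles bounds).

```rust
/// Scalar model of a 4-lane gather: lane i of the result is base[indices[i]].
/// Panics on an out-of-range or negative index, which a real gather
/// instruction would not check.
fn gather_scalar(base: &[f32], indices: [i32; 4]) -> [f32; 4] {
    let mut out = [0.0_f32; 4];
    for (lane, &idx) in out.iter_mut().zip(indices.iter()) {
        *lane = base[idx as usize];
    }
    out
}
```

Because each lane is a separate scalar load, this path gives no speedup over an ordinary loop; the benefit is only that the same generic code compiles for every instruction set.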
Note: SSE2 and SSE4 load when mask[i] is nonzero, whereas AVX2 loads only when the high bit is set. For portability, make sure the high bit is set.
Note: SSE2 and SSE4 store when mask[i] is nonzero, whereas AVX2 stores only when the high bit is set. For portability, make sure the high bit is set.
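The two mask rules can be compared with a scalar model of a 4-lane masked load. The function names are hypothetical, and zero-filling the masked-off lanes is an assumption that matches the AVX2 instruction; the same reasoning applies to masked stores.

```rust
/// Model of the emulated path: a lane is loaded when its mask is nonzero.
/// Masked-off lanes are assumed to read as zero.
fn maskload_nonzero(src: [i32; 4], mask: [i32; 4]) -> [i32; 4] {
    let mut out = [0; 4];
    for i in 0..4 {
        if mask[i] != 0 {
            out[i] = src[i];
        }
    }
    out
}

/// Model of the AVX2 path: a lane is loaded only when the mask's high bit
/// is set, i.e. when the mask is negative as a signed integer.
fn maskload_high_bit(src: [i32; 4], mask: [i32; 4]) -> [i32; 4] {
    let mut out = [0; 4];
    for i in 0..4 {
        if mask[i] < 0 {
            out[i] = src[i];
        }
    }
    out
}
```

A mask lane of `1` is loaded by the first model but skipped by the second, while `-1` (all bits set) and `i32::MIN` (high bit only) are loaded by both.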
Mullo is implemented for Sse2 by combining other Sse2 operations.
Round is implemented for Sse2 by combining other Sse2 operations.
When using Sse2, fastround uses a faster version of round that only works on floating point values small enough to fit in an i32. This is a big performance boost if you don’t need a complete round.
amt must be a constant.
amt does not have to be a constant, but may be slower than the srai version.
amt does not have to be a constant, but may be slower than the srli version.
amt does not have to be a constant, but may be slower than the slli version.
amt must be a constant.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.
Immutably borrows from an owned value.
Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The type returned in the event of a conversion error.
Performs the conversion.