Crate safe_arch
A crate that safely exposes arch intrinsics via #[cfg()].

safe_arch lets you safely use CPU intrinsics: those things in the core::arch modules. It works purely via #[cfg()] and compile-time CPU feature declaration. If you want to check for a feature at runtime and then call an intrinsic or use a fallback path based on that, this crate is sadly not for you.
SIMD register types are "newtype'd" so that better trait impls can be given to them, but the inner value is a pub field, so feel free to just grab it out if you need to. Trait impls of the newtypes include: Default (zeroed), From/Into of appropriate data types, and appropriate operator overloading.
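As an illustration of the newtype pattern described above, here's a portable stand-in that uses a plain array instead of a core::arch type (the name M128 and these particular impls are illustrative assumptions, not the crate's actual definitions):

```rust
// A stand-in for how a SIMD newtype with a public inner field can work.
// The real crate wraps core::arch types like __m128; [f32; 4] is used
// here only so the sketch runs anywhere.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
#[repr(transparent)]
pub struct M128 {
    pub inner: [f32; 4], // pub field: grab the raw value whenever you need it
}

impl From<[f32; 4]> for M128 {
    fn from(arr: [f32; 4]) -> Self {
        Self { inner: arr }
    }
}

impl From<M128> for [f32; 4] {
    fn from(m: M128) -> Self {
        m.inner
    }
}

// Operator overloading of the kind described: lanewise addition.
impl core::ops::Add for M128 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = self.inner;
        for (o, r) in out.iter_mut().zip(rhs.inner.iter()) {
            *o += r;
        }
        Self { inner: out }
    }
}

fn main() {
    let zero = M128::default(); // Default is zeroed
    let a = M128::from([1.0, 2.0, 3.0, 4.0]);
    let sum = a + zero;
    let back: [f32; 4] = sum.into();
    println!("{:?}", back);
}
```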
- Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
- Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
  - Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
- Some intrinsics are not safe unless you're very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren't exposed here.
- Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn't really support you messing with that within a high level language, so those operations aren't exposed here. Use assembly or something if you want to do that.
Naming Conventions
The actual names for each intrinsic are generally a flaming dumpster of letters that only make sense after you've learned all the names. They're very bad for learning what things do. Accordingly, safe_arch uses very verbose naming that (hopefully) improves the new-user experience.
- Function names start with the primary "verb" of the operation, and then any adverbs go after that. This makes for slightly awkward English but helps the list of all the functions sort a little better.
  - Eg: add_i32_m128i and add_i16_saturating_m128i
- Function names end with the register type they're most associated with. I say "most" because while most operations only work with a single register type at a time, there are occasional operations that use more than one register type.
  - Eg: and_m128 (for m128) and and_m128d (for m128d)
- If a function operates on just the lowest data lane it generally has _s after the register type, because it's a "scalar" operation. The higher lanes are generally just copied forward, or taken from a secondary argument, or something. Details vary.
  - Eg: sqrt_m128 (all lanes) and sqrt_m128_s (low lane only)
Of course, people can't even always agree on what words mean. The common verb names for this crate, and their conventions, are as follows:
- load: Reads memory into a register (derefs a &Foo to Foo).
- store: Writes a register to memory (writes a Foo to a &mut Foo).
- set: Packs values into a register (works like [1, 2, 3, 4] to build an array).
- splat: Modifies either a "load" or a "set". The input is copied as many times as possible across the bits of the output register size (works like [1_i32; LEN] array building).
- extract: Gets an individual lane out of a SIMD register (works like reg[i]). The lane to get has to be a const value.
- insert: Duplicates a register and then replaces the value of a specific lane (works like let mut reg2 = reg; reg2[i] = new;). The lane to overwrite has to be a const value.
- cast: Changes data types while preserving the bit pattern (like how transmute would do it).
- convert: Changes data types while trying to stick close to the numeric value (which might change the bits, like how as would do it).
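To make the verb glossary concrete, here's a scalar model of several of these operations over [i32; 4], with plain Rust standing in for the SIMD register (the function names mirror the crate's conventions, but these are illustrative models, not the crate's real implementations):

```rust
// Scalar models of the verb conventions, using [i32; 4] as the "register".

// load: deref a &[i32; 4] into an owned value.
fn load(src: &[i32; 4]) -> [i32; 4] {
    *src
}

// store: write an owned value through a &mut.
fn store(dst: &mut [i32; 4], reg: [i32; 4]) {
    *dst = reg;
}

// set: pack individual values into a register.
fn set(a: i32, b: i32, c: i32, d: i32) -> [i32; 4] {
    [a, b, c, d]
}

// splat: copy one input across every lane.
fn splat(x: i32) -> [i32; 4] {
    [x; 4]
}

// extract: read one lane; the lane index is a const generic,
// mirroring the "has to be a const value" rule.
fn extract<const LANE: usize>(reg: [i32; 4]) -> i32 {
    reg[LANE]
}

// insert: copy the register, then overwrite one lane.
fn insert<const LANE: usize>(reg: [i32; 4], new: i32) -> [i32; 4] {
    let mut out = reg;
    out[LANE] = new;
    out
}

fn main() {
    let mut mem = [0; 4];
    let reg = set(1, 2, 3, 4);
    // Move lane 3's value into lane 0, then store and reload.
    store(&mut mem, insert::<0>(reg, extract::<3>(reg)));
    println!("{:?}", load(&mem));
}
```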
This crate is pre-1.0 and if you feel that an operation should have a better name to improve the crate's consistency please file an issue.
Current Support
- x86/x86_64 (Intel, AMD, etc)
  - 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
  - 256-bit: avx, avx2
  - Other: adx, aes, bmi1, bmi2, fma, lzcnt, pclmulqdq, popcnt, rdrand, rdseed
Compile Time CPU Target Features
At the time of me writing this, Rust enables the sse and sse2 CPU features by default for all i686 (x86) and x86_64 builds. Those CPU features are built into the design of x86_64, and you'd need a super old x86 CPU for it to not support at least sse and sse2, so they're a safe bet for the language to enable all the time. In fact, because the standard library is compiled with them enabled, simply trying to disable those features would actually cause ABI issues and fill your program with UB (link).
If you want additional CPU features available at compile time you'll have to enable them with an additional arg to rustc. For a feature named name you pass -C target-feature=+name, such as -C target-feature=+sse3 for sse3.
You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you're building a program you'll only run on your own system.
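For a whole project, the same flags can live in a Cargo config file instead of being passed by hand on every build (a minimal sketch; the avx2 choice here is just an example feature):

```toml
# .cargo/config.toml
[build]
rustflags = ["-C", "target-feature=+avx2"]

# Or, for a binary that will only run on the build machine:
# rustflags = ["-C", "target-cpu=native"]
```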
It's sometimes hard to know if your target platform will support a given feature set, but the Steam Hardware Survey is generally taken as a guide to what you can expect people to have available. If you click "Other Settings" it'll expand into a list of CPU target features and how common they are. These days, it seems that sse3 can be safely assumed, and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff above 128-bit isn't as common yet; give it another few years.
Please note that executing a program on a CPU that doesn't support the target features it was compiled for is Undefined Behavior.
Currently, Rust doesn't actually support an easy way for you to check that a feature enabled at compile time is actually available at runtime. There is the "feature_detected" family of macros, but if you enable a feature they will evaluate to a constant true instead of actually deferring the check for the feature to runtime. This means that, if you did want a check at the start of your program, to confirm that all the assumed features are present and error out when the assumptions don't hold, you can't use that macro. You gotta use CPUID and check manually. rip.
Hopefully we can make that process easier in a future version of this crate.
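A minimal sketch of the manual CPUID route on x86_64 (this checks the architecturally documented feature bits in CPUID leaf 1: SSE2 is EDX bit 26, SSE4.2 is ECX bit 20; it illustrates the idea and is not part of this crate's API):

```rust
// CPUID leaf 1 puts feature flags in ECX/EDX. The CPUID instruction
// itself is always available on x86_64, so calling it is sound there.

#[cfg(target_arch = "x86_64")]
fn cpu_has_sse2() -> bool {
    // SSE2 is EDX bit 26; always set on x86_64, useful as a sanity check.
    let info = unsafe { core::arch::x86_64::__cpuid(1) };
    info.edx & (1 << 26) != 0
}

#[cfg(target_arch = "x86_64")]
fn cpu_has_sse42() -> bool {
    // SSE4.2 is ECX bit 20.
    let info = unsafe { core::arch::x86_64::__cpuid(1) };
    info.ecx & (1 << 20) != 0
}

#[cfg(target_arch = "x86_64")]
fn main() {
    // The kind of start-of-program check described above: error out
    // if an assumed feature turns out to be missing.
    if !cpu_has_sse42() {
        eprintln!("assumed feature sse4.2 missing; bailing out");
        std::process::exit(1);
    }
    println!("sse4.2 available");
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```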
A Note On Working With Cfg
There are two main ways to use cfg:
- Via an attribute placed on an item, block, or expression: #[cfg(debug_assertions)] println!("hello");
- Via a macro used within an expression position: if cfg!(debug_assertions) { println!("hello"); }
The difference might seem small but it's actually very important:
- The attribute form will include code or not before deciding if all the items named and so forth really exist or not. This means that code that is configured via attribute can safely name things that don't always exist as long as the things they name do exist whenever that code is configured into the build.
- The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.
This crate uses cfg via the attribute, so the functions it exposes don't exist at all when the appropriate CPU target features aren't enabled. Accordingly, if you plan to call this crate or not depending on what features are enabled in the build, you'll also need to control your use of this crate via the cfg attribute, not the cfg macro.
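The difference can be seen with a feature every build has an answer for, debug_assertions (a self-contained sketch; nothing here is specific to this crate):

```rust
// Attribute form: this item simply does not exist in release builds,
// so only code that is itself cfg-gated may name it.
#[cfg(debug_assertions)]
fn debug_only_label() -> &'static str {
    "debug"
}

fn main() {
    // Attribute on a statement: compiled in only when the cfg holds,
    // so naming debug_only_label() here is always valid.
    #[cfg(debug_assertions)]
    println!("{}", debug_only_label());

    // Macro form: BOTH branches are compiled no matter what, then one is
    // removed as dead code. Naming debug_only_label() in the `true` branch
    // would fail to compile in release builds.
    if cfg!(debug_assertions) {
        println!("checks on");
    } else {
        println!("checks off");
    }
}
```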
Macros
aes_key_gen_assist_m128i | aes ? |
blend_i32_m128i | avx2 Blends the |
blend_imm_i16_m128i | Blends the |
blend_imm_i16_m256i | avx2 Blends the |
blend_imm_i32_m256i | avx2 Blends the |
blend_imm_m128d | Blends the lanes according to the immediate mask. |
blend_imm_m128 | Blends the lanes according to the immediate mask. |
blend_imm_m256d | avx Blends the |
blend_imm_m256 | avx Blends the |
byte_shl_u128_imm_m128i | Shifts all bits in the entire register left by a number of bytes. |
byte_shl_u128_imm_m256i | avx2 Shifts each |
byte_shr_u128_imm_m128i | Shifts all bits in the entire register right by a number of bytes. |
byte_shr_u128_imm_m256i | avx2 Shifts each |
cmp_op_mask_m128 | avx Compare |
cmp_op_mask_m128_s | avx Compare |
cmp_op_mask_m128d | avx Compare |
cmp_op_mask_m128d_s | avx Compare |
cmp_op_mask_m256 | avx Compare |
cmp_op_mask_m256d | avx Compare |
combined_byte_shr_imm_m128i | Counts |
combined_byte_shr_imm_m256i | Works like |
comparison_operator_translation | avx Turns a comparison operator token to the correct constant value. |
dot_product_m128d | Performs a dot product of two |
dot_product_m128 | Performs a dot product of two |
dot_product_m256 | avx This works like |
extract_f32_as_i32_bits_imm_m128 | Gets the |
extract_i16_as_i32_m128i | Gets an |
extract_i16_as_i32_m256i | avx2 Gets an |
extract_i32_from_m256i | avx Extracts an |
extract_i32_imm_m128i | Gets the |
extract_i64_from_m256i | avx Extracts an |
extract_i64_imm_m128i | Gets the |
extract_i8_as_i32_imm_m128i | Gets the |
extract_i8_as_i32_m256i | avx2 Gets an |
extract_m128_from_m256 | avx Extracts an |
extract_m128d_from_m256d | avx Extracts an |
extract_m128i_from_m256i | avx Extracts an |
extract_m128i_m256i | avx2 Gets an |
insert_f32_imm_m128 | Inserts a lane from |
insert_i16_from_i32_m128i | Inserts the low 16 bits of an |
insert_i16_to_m256i | avx Inserts an |
insert_i32_imm_m128i | Inserts a new value for the |
insert_i32_to_m256i | avx Inserts an |
insert_i64_imm_m128i | Inserts a new value for the |
insert_i64_to_m256i | avx Inserts an |
insert_i8_imm_m128i | Inserts a new value for the |
insert_i8_to_m256i | avx Inserts an |
insert_m128_to_m256 | avx Inserts an |
insert_m128d_to_m256d | avx Inserts an |
insert_m128i_to_m256i_slow_avx | avx Slowly inserts an |
insert_m128i_to_m256i | avx Inserts an |
mul_i64_carryless_m128i | pclmulqdq Performs a "carryless" multiplication of two |
multi_packed_sum_abs_diff_u8_m128i | Computes eight |
multi_packed_sum_abs_diff_u8_m256i | avx2 Computes eight |
permute_2x128_m256i | avx2 Permutes the lanes around. |
permute_f128_in_m256d | avx Permutes the lanes around. |
permute_f128_in_m256 | avx Permutes the lanes around. |
permute_i128_in_m256i | avx Permutes the lanes around. |
permute_i64_m256i | avx2 Permutes the lanes around. |
permute_m128d | avx Permutes the lanes around. |
permute_m128 | avx Permutes the lanes around. |
permute_m256 | avx Permutes the lanes around. |
permute_m256d | avx2 Permutes the lanes around. |
permute_within_m128d_m256d | avx Permutes the lanes around. |
round_m128d | Rounds each lane in the style specified. |
round_m128d_s | Rounds |
round_m128 | Rounds each lane in the style specified. |
round_m128_s | Rounds |
round_m256d | avx Rounds each lane in the style specified. |
round_m256 | avx Rounds each lane in the style specified. |
shl_i16_imm_m128i | Shifts all |
shl_i16_imm_m256i | avx2 Shifts all |
shl_i32_imm_m128i | Shifts all |
shl_i32_imm_m256i | avx2 Shifts all |
shl_i64_imm_m128i | Shifts both |
shl_i64_imm_m256i | avx2 Shifts all |
shr_i16_imm_m128i | Shifts all |
shr_i16_imm_m256i | avx2 Shifts all |
shr_i32_imm_m128i | Shifts all |
shr_i32_imm_m256i | avx2 Shifts all |
shr_u16_imm_m128i | Shifts all |
shr_u16_imm_m256i | avx2 Shifts all |
shr_u32_imm_m128i | Shifts all |
shr_u32_imm_m256i | avx2 Shifts all |
shr_u64_imm_m128i | Shifts both |
shr_u64_imm_m256i | avx2 Shifts all |
shuffle_i16_high_lanes_m128i | Shuffles the higher |
shuffle_i16_high_m256i | avx2 Shuffles the upper |
shuffle_i16_low_lanes_m128i | Shuffles the lower |
shuffle_i16_low_m256i | avx2 Shuffles the lower |
shuffle_i32_m128i | Shuffles the |
shuffle_i32_m256i | avx2 Shuffles the lanes around. |
shuffle_m128 | Shuffles the lanes around. |
shuffle_m128d | Shuffles the lanes around. |
shuffle_m256d | avx Shuffles the |
shuffle_m256 | avx Shuffles the |
string_search_for_index | sse4.2 Looks for |
string_search_for_mask | sse4.2 Looks for |
Structs
m128 | The data for a 128-bit SSE register of four |
m128d | The data for a 128-bit SSE register of two |
m128i | The data for a 128-bit SSE register of integer data. |
m256 | The data for a 256-bit AVX register of eight |
m256d | The data for a 256-bit AVX register of four |
m256i | The data for a 256-bit AVX register of integer data. |
Enums
Permute_2x128_m256i | Selects the output style of a |
Functions
abs_i16_m128i | ssse3 Lanewise absolute value with lanes as |
abs_i16_m256i | avx2 Absolute value of |
abs_i32_m128i | ssse3 Lanewise absolute value with lanes as |
abs_i32_m256i | avx2 Absolute value of |
abs_i8_m128i | ssse3 Lanewise absolute value with lanes as |
abs_i8_m256i | avx2 Absolute value of |
add_carry_u32 | adx Add two |
add_carry_u64 | adx Add two |
add_horizontal_i16_m128i | ssse3 Add horizontal pairs of |
add_horizontal_i16_m256i | avx2 Horizontal |
add_horizontal_i32_m128i | ssse3 Add horizontal pairs of |
add_horizontal_i32_m256i | avx2 Horizontal |
add_horizontal_m128d | sse3 Add each lane horizontally, pack the outputs as |
add_horizontal_m128 | sse3 Add each lane horizontally, pack the outputs as |
add_horizontal_m256d | avx Add adjacent |
add_horizontal_m256 | avx Add adjacent |
add_horizontal_saturating_i16_m128i | ssse3 Add horizontal pairs of |
add_horizontal_saturating_i16_m256i | avx2 Horizontal saturating |
add_i16_m128i | sse2 Lanewise |
add_i16_m256i | avx2 Lanewise |
add_i32_m128i | sse2 Lanewise |
add_i32_m256i | avx2 Lanewise |
add_i64_m128i | sse2 Lanewise |
add_i64_m256i | avx2 Lanewise |
add_i8_m128i | sse2 Lanewise |
add_i8_m256i | avx2 Lanewise |
add_m128 | sse Lanewise |
add_m128_s | sse Low lane |
add_m128d | sse2 Lanewise |
add_m128d_s | sse2 Lowest lane |
add_m256d | avx Lanewise |
add_m256 | avx Lanewise |
add_saturating_i16_m128i | sse2 Lanewise saturating |
add_saturating_i16_m256i | avx2 Lanewise saturating |
add_saturating_i8_m128i | sse2 Lanewise saturating |
add_saturating_i8_m256i | avx2 Lanewise saturating |
add_saturating_u16_m128i | sse2 Lanewise saturating |
add_saturating_u16_m256i | avx2 Lanewise saturating |
add_saturating_u8_m128i | sse2 Lanewise saturating |
add_saturating_u8_m256i | avx2 Lanewise saturating |
add_sub_m128d | sse3 Add the high lane and subtract the low lane. |
add_sub_m128 | sse3 Alternately, from the top, add a lane and then subtract a lane. |
add_sub_m256d | avx Alternately, from the top, add |
add_sub_m256 | avx Alternately, from the top, add |
aes_decrypt_last_m128i | aes Perform the last round of AES decryption flow on |
aes_decrypt_m128i | aes Perform one round of AES decryption flow on |
aes_encrypt_last_m128i | aes Perform the last round of AES encryption flow on |
aes_encrypt_m128i | aes Perform one round of AES encryption flow on |
aes_inv_mix_columns_m128i | aes Perform the InvMixColumns transform on |
and_m128 | sse Bitwise |
and_m128d | sse2 Bitwise |
and_m128i | sse2 Bitwise |
and_m256d | avx Bitwise |
and_m256 | avx Bitwise |
and_m256i | avx2 Bitwise |
andnot_m128 | sse Bitwise |
andnot_m128d | sse2 Bitwise |
andnot_m128i | sse2 Bitwise |
andnot_m256d | avx Bitwise |
andnot_m256 | avx Bitwise |
andnot_m256i | avx2 Bitwise |
andnot_u32 | bmi1 Bitwise |
andnot_u64 | bmi1 Bitwise |
average_u16_m128i | sse2 Lanewise average of the |
average_u16_m256i | avx2 Average |
average_u8_m128i | sse2 Lanewise average of the |
average_u8_m256i | avx2 Average |
bit_extract2_u32 | bmi1 Extract a span of bits from the |
bit_extract2_u64 | bmi1 Extract a span of bits from the |
bit_extract_u32 | bmi1 Extract a span of bits from the |
bit_extract_u64 | bmi1 Extract a span of bits from the |
bit_lowest_set_mask_u32 | bmi1 Gets the mask of all bits up to and including the lowest set bit in a |
bit_lowest_set_mask_u64 | bmi1 Gets the mask of all bits up to and including the lowest set bit in a |
bit_lowest_set_reset_u32 | bmi1 Resets (clears) the lowest set bit. |
bit_lowest_set_reset_u64 | bmi1 Resets (clears) the lowest set bit. |
bit_lowest_set_value_u32 | bmi1 Gets the value of the lowest set bit in a |
bit_lowest_set_value_u64 | bmi1 Gets the value of the lowest set bit in a |
bit_zero_high_index_u32 | bmi2 Zero out all high bits in a |
bit_zero_high_index_u64 | bmi2 Zero out all high bits in a |
blend_varying_i8_m128i | sse4.1 Blend the |
blend_varying_i8_m256i | avx2 Blend |
blend_varying_m128d | sse4.1 Blend the lanes according to a runtime varying mask. |
blend_varying_m128 | sse4.1 Blend the lanes according to a runtime varying mask. |
blend_varying_m256d | avx Blend the lanes according to a runtime varying mask. |
blend_varying_m256 | avx Blend the lanes according to a runtime varying mask. |
byte_swap_i32 | Swap the bytes of the given 32-bit value. |
byte_swap_i64 | Swap the bytes of the given 64-bit value. |
cast_from_m256_to_m256d | avx Bit-preserving cast from |
cast_from_m256_to_m256i | avx Bit-preserving cast from |
cast_from_m256d_to_m256 | avx Bit-preserving cast from |
cast_from_m256d_to_m256i | avx Bit-preserving cast from |
cast_from_m256i_to_m256d | avx Bit-preserving cast from |
cast_from_m256i_to_m256 | avx Bit-preserving cast from |
cast_to_m128_from_m128d | sse2 Bit-preserving cast to |
cast_to_m128_from_m128i | sse2 Bit-preserving cast to |
cast_to_m128d_from_m128 | sse2 Bit-preserving cast to |
cast_to_m128d_from_m128i | sse2 Bit-preserving cast to |
cast_to_m128i_from_m128d | sse2 Bit-preserving cast to |
cast_to_m128i_from_m128 | sse2 Bit-preserving cast to |
ceil_m128d | sse4.1 Round each lane to a whole number, towards positive infinity |
ceil_m128 | sse4.1 Round each lane to a whole number, towards positive infinity |
ceil_m128d_s | sse4.1 Round the low lane of |
ceil_m128_s | sse4.1 Round the low lane of |
ceil_m256d | avx Round |
ceil_m256 | avx Round |
cmp_eq_i32_m128_s | sse Low lane equality. |
cmp_eq_i32_m128d_s | sse2 Low lane |
cmp_eq_mask_i16_m128i | sse2 Lanewise |
cmp_eq_mask_i16_m256i | avx2 Compare |
cmp_eq_mask_i32_m128i | sse2 Lanewise |
cmp_eq_mask_i32_m256i | avx2 Compare |
cmp_eq_mask_i64_m128i | sse4.1 Lanewise |
cmp_eq_mask_i64_m256i | avx2 Compare |
cmp_eq_mask_i8_m128i | sse2 Lanewise |
cmp_eq_mask_i8_m256i | avx2 Compare |
cmp_eq_mask_m128 | sse Lanewise |
cmp_eq_mask_m128_s | sse Low lane |
cmp_eq_mask_m128d | sse2 Lanewise |
cmp_eq_mask_m128d_s | sse2 Low lane |
cmp_ge_i32_m128_s | sse Low lane greater than or equal to. |
cmp_ge_i32_m128d_s | sse2 Low lane |
cmp_ge_mask_m128 | sse Lanewise |
cmp_ge_mask_m128_s | sse Low lane |
cmp_ge_mask_m128d | sse2 Lanewise |
cmp_ge_mask_m128d_s | sse2 Low lane |
cmp_gt_i32_m128_s | sse Low lane greater than. |
cmp_gt_i32_m128d_s | sse2 Low lane |
cmp_gt_mask_i16_m128i | sse2 Lanewise |
cmp_gt_mask_i16_m256i | avx2 Compare |
cmp_gt_mask_i32_m128i | sse2 Lanewise |
cmp_gt_mask_i32_m256i | avx2 Compare |
cmp_gt_mask_i64_m128i | sse4.2 Lanewise |
cmp_gt_mask_i64_m256i | avx2 Compare |
cmp_gt_mask_i8_m128i | sse2 Lanewise |
cmp_gt_mask_i8_m256i | avx2 Compare |
cmp_gt_mask_m128 | sse Lanewise |
cmp_gt_mask_m128_s | sse Low lane |
cmp_gt_mask_m128d | sse2 Lanewise |
cmp_gt_mask_m128d_s | sse2 Low lane |
cmp_le_i32_m128_s | sse Low lane less than or equal to. |
cmp_le_i32_m128d_s | sse2 Low lane |
cmp_le_mask_m128 | sse Lanewise |
cmp_le_mask_m128_s | sse Low lane |
cmp_le_mask_m128d | sse2 Lanewise |
cmp_le_mask_m128d_s | sse2 Low lane |
cmp_lt_i32_m128_s | sse Low lane less than. |
cmp_lt_i32_m128d_s | sse2 Low lane |
cmp_lt_mask_i16_m128i | sse2 Lanewise |
cmp_lt_mask_i32_m128i | sse2 Lanewise |
cmp_lt_mask_i8_m128i | sse2 Lanewise |
cmp_lt_mask_m128 | sse Lanewise |
cmp_lt_mask_m128_s | sse Low lane |
cmp_lt_mask_m128d | sse2 Lanewise |
cmp_lt_mask_m128d_s | sse2 Low lane |
cmp_neq_i32_m128_s | sse Low lane not equal to. |
cmp_neq_i32_m128d_s | sse2 Low lane |
cmp_neq_mask_m128 | sse Lanewise |
cmp_neq_mask_m128_s | sse Low lane |
cmp_neq_mask_m128d | sse2 Lanewise |
cmp_neq_mask_m128d_s | sse2 Low lane |
cmp_nge_mask_m128 | sse Lanewise |
cmp_nge_mask_m128_s | sse Low lane |
cmp_nge_mask_m128d | sse2 Lanewise |
cmp_nge_mask_m128d_s | sse2 Low lane |
cmp_ngt_mask_m128 | sse Lanewise |
cmp_ngt_mask_m128_s | sse Low lane |
cmp_ngt_mask_m128d | sse2 Lanewise |
cmp_ngt_mask_m128d_s | sse2 Low lane |
cmp_nle_mask_m128 | sse Lanewise |
cmp_nle_mask_m128_s | sse Low lane |
cmp_nle_mask_m128d | sse2 Lanewise |
cmp_nle_mask_m128d_s | sse2 Low lane |
cmp_nlt_mask_m128 | sse Lanewise |
cmp_nlt_mask_m128_s | sse Low lane |
cmp_nlt_mask_m128d | sse2 Lanewise |
cmp_nlt_mask_m128d_s | sse2 Low lane |
cmp_ordinary_mask_m128 | sse Lanewise |
cmp_ordinary_mask_m128_s | sse Low lane |
cmp_ordinary_mask_m128d | sse2 Lanewise |
cmp_ordinary_mask_m128d_s | sse2 Low lane |
cmp_unord_mask_m128 | sse Lanewise |
cmp_unord_mask_m128_s | sse Low lane |
cmp_unord_mask_m128d | sse2 Lanewise |
cmp_unord_mask_m128d_s | sse2 Low lane |
convert_i16_lower2_to_i64_m128i | sse4.1 Convert the lower two |
convert_i16_lower4_to_i32_m128i | sse4.1 Convert the lower four |
convert_i16_m128i_lower4_m256i | avx2 Sign extend |
convert_i16_m128i_m256i | avx2 Sign extend |
convert_i32_lower2_to_i64_m128i | sse4.1 Convert the lower two |
convert_i32_m128i_m256i | avx2 Sign extend |
convert_i32_replace_m128_s | sse Convert |
convert_i32_replace_m128d_s | sse2 Convert |
convert_i64_replace_m128d_s | sse2 Convert |
convert_i8_lower2_to_i64_m128i | sse4.1 Convert the lower two |
convert_i8_lower4_to_i32_m128i | sse4.1 Convert the lower four |
convert_i8_lower8_to_i16_m128i | sse4.1 Convert the lower eight |
convert_i8_m128i_lower4_m256i | avx2 Sign extend the lower 4 |
convert_i8_m128i_lower8_m256i | avx2 Sign extend the lower 8 |
convert_i8_m128i_m256i | avx2 Sign extend |
convert_m128_s_replace_m128d_s | sse2 Converts the lower |
convert_m128d_s_replace_m128_s | sse2 Converts the low |
convert_to_f32_from_m256_s | avx Convert the lowest |
convert_to_f64_from_m256d_s | avx Convert the lowest |
convert_to_i32_from_m256i_s | avx Convert the lowest |
convert_to_i32_m128i_from_m256d | avx Convert |
convert_to_i32_m256i_from_m256 | avx Convert |
convert_to_m128_from_m128i | sse2 Rounds the four |
convert_to_m128_from_m128d | sse2 Rounds the two |
convert_to_m128_from_m256d | avx Convert |
convert_to_m128d_from_m128i | sse2 Rounds the lower two |
convert_to_m128d_from_m128 | sse2 Rounds the two |
convert_to_m128i_from_m128d | sse2 Rounds the two |
convert_to_m128i_from_m128 | sse2 Rounds the two |
convert_to_m128i_from_m256d | avx Convert |
convert_to_m256_from_i32_m256i | avx Convert |
convert_to_m256d_from_i32_m128i | avx Convert |
convert_to_m256d_from_m128 | avx Convert |
convert_to_m256i_from_m256 | avx Convert |
convert_u16_lower2_to_u64_m128i | sse4.1 Convert the lower two |
convert_u16_lower4_to_u32_m128i | sse4.1 Convert the lower four |
convert_u16_m128i_lower4_m256i | avx2 Zero extend lower 4 |
convert_u16_m128i_m256i | avx2 Zero extend |
convert_u32_lower2_to_u64_m128i | sse4.1 Convert the lower two |
convert_u32_m128i_m256i | avx2 Zero extend |
convert_u8_lower2_to_u64_m128i | sse4.1 Convert the lower two |
convert_u8_lower4_to_u32_m128i | sse4.1 Convert the lower four |
convert_u8_lower8_to_u16_m128i | sse4.1 Convert the lower eight |
convert_u8_m128i_lower4_m256i | avx2 Zero extend lower 4 |
convert_u8_m128i_lower8_m256i | avx2 Zero extend lower 8 |
convert_u8_m128i_m256i | avx2 Zero extend |
copy_i64_m128i_s | sse2 Copy the low |
copy_replace_low_f64_m128d | sse2 Copies the |
crc32_u8 | sse4.2 Accumulates the |
crc32_u16 | sse4.2 Accumulates the |
crc32_u32 | sse4.2 Accumulates the |
crc32_u64 | sse4.2 Accumulates the |
div_m128 | sse Lanewise |
div_m128_s | sse Low lane |
div_m128d | sse2 Lanewise |
div_m128d_s | sse2 Lowest lane |
div_m256d | avx Lanewise |
div_m256 | avx Lanewise |
duplicate_even_lanes_m128 | sse3 Duplicate the odd lanes to the even lanes. |
duplicate_even_lanes_m256 | avx Duplicate the even-indexed lanes to the odd lanes. |
duplicate_low_lane_m128d_s | sse3 Copy the low lane of the input to both lanes of the output. |
duplicate_odd_lanes_m128 | sse3 Duplicate the odd lanes to the even lanes. |
duplicate_odd_lanes_m256d | avx Duplicate the odd-indexed lanes to the even lanes. |
duplicate_odd_lanes_m256 | avx Duplicate the odd-indexed lanes to the even lanes. |
floor_m128d | sse4.1 Round each lane to a whole number, towards negative infinity |
floor_m128 | sse4.1 Round each lane to a whole number, towards negative infinity |
floor_m128d_s | sse4.1 Round the low lane of |
floor_m128_s | sse4.1 Round the low lane of |
floor_m256d | avx Round |
floor_m256 | avx Round |
fused_mul_add_m128 | fma Lanewise fused |
fused_mul_add_m128_s | fma Low lane fused |
fused_mul_add_m128d | fma Lanewise fused |
fused_mul_add_m128d_s | fma Low lane fused |
fused_mul_add_m256 | fma Lanewise fused |
fused_mul_add_m256d | fma Lanewise fused |
fused_mul_addsub_m128 | fma Lanewise fused |
fused_mul_addsub_m128d | fma Lanewise fused |
fused_mul_addsub_m256 | fma Lanewise fused |
fused_mul_addsub_m256d | fma Lanewise fused |
fused_mul_neg_add_m128 | fma Lanewise fused |
fused_mul_neg_add_m128_s | fma Low lane |
fused_mul_neg_add_m128d | fma Lanewise fused |
fused_mul_neg_add_m128d_s | fma Low lane |
fused_mul_neg_add_m256 | fma Lanewise fused |
fused_mul_neg_add_m256d | fma Lanewise fused |
fused_mul_neg_sub_m128 | fma Lanewise fused |
fused_mul_neg_sub_m128_s | fma Low lane fused |
fused_mul_neg_sub_m128d | fma Lanewise fused |
fused_mul_neg_sub_m128d_s | fma Low lane fused |
fused_mul_neg_sub_m256 | fma Lanewise fused |
fused_mul_neg_sub_m256d | fma Lanewise fused |
fused_mul_sub_m128 | fma Lanewise fused |
fused_mul_sub_m128_s | fma Low lane fused |
fused_mul_sub_m128d | fma Lanewise fused |
fused_mul_sub_m128d_s | fma Low lane fused |
fused_mul_sub_m256 | fma Lanewise fused |
fused_mul_sub_m256d | fma Lanewise fused |
fused_mul_subadd_m128 | fma Lanewise fused |
fused_mul_subadd_m128d | fma Lanewise fused |
fused_mul_subadd_m256 | fma Lanewise fused |
fused_mul_subadd_m256d | fma Lanewise fused |
get_f32_from_m128_s | sse Gets the low lane as an individual |
get_f64_from_m128d_s | sse2 Gets the lower lane as an |
get_i32_from_m128_s | sse Converts the low lane to |
get_i32_from_m128d_s | sse2 Converts the lower lane to an |
get_i32_from_m128i_s | sse2 Converts the lower lane to an |
get_i64_from_m128d_s | sse2 Converts the lower lane to an |
get_i64_from_m128i_s | sse2 Converts the lower lane to an |
leading_zero_count_u32 | lzcnt Count the leading zeroes in a |
leading_zero_count_u64 | lzcnt Count the leading zeroes in a |
load_f32_m128_s | sse Loads the |
load_f32_splat_m128 | sse Loads the |
load_f32_splat_m256 | avx Load an |
load_f64_m128d_s | sse2 Loads the reference into the low lane of the register. |
load_f64_splat_m128d | sse2 Loads the |
load_f64_splat_m256d | avx Load an |
load_i64_m128i_s | sse2 Loads the low |
load_m128 | sse Loads the reference into a register. |
load_m128d | sse2 Loads the reference into a register. |
load_m128i | sse2 Loads the reference into a register. |
load_m256d | avx Load data from memory into a register. |
load_m256 | avx Load data from memory into a register. |
load_m256i | avx Load data from memory into a register. |
load_m128_splat_m256 | avx Load an |
load_m128d_splat_m256d | avx Load an |
load_masked_i32_m128i | avx2 Loads the reference given and zeroes any |
load_masked_i32_m256i | avx2 Loads the reference given and zeroes any |
load_masked_i64_m128i | avx2 Loads the reference given and zeroes any |
load_masked_i64_m256i | avx2 Loads the reference given and zeroes any |
load_masked_m128d | avx Load data from memory into a register according to a mask. |
load_masked_m128 | avx Load data from memory into a register according to a mask. |
load_masked_m256d | avx Load data from memory into a register according to a mask. |
load_masked_m256 | avx Load data from memory into a register according to a mask. |
load_replace_high_m128d | sse2 Loads the reference into a register, replacing the high lane. |
load_replace_low_m128d | sse2 Loads the reference into a register, replacing the low lane. |
load_reverse_m128 | sse Loads the reference into a register with reversed order. |
load_reverse_m128d | sse2 Loads the reference into a register with reversed order. |
load_unaligned_hi_lo_m256d | avx Load data from memory into a register. |
load_unaligned_hi_lo_m256 | avx Load data from memory into a register. |
load_unaligned_hi_lo_m256i | avx Load data from memory into a register. |
load_unaligned_m128 | sse Loads the reference into a register. |
load_unaligned_m128d | sse2 Loads the reference into a register. |
load_unaligned_m128i | sse2 Loads the reference into a register. |
load_unaligned_m256d | avx Load data from memory into a register. |
load_unaligned_m256 | avx Load data from memory into a register. |
load_unaligned_m256i | avx Load data from memory into a register. |
max_i16_m128i | sse2 Lanewise |
max_i16_m256i | avx2 Lanewise |
max_i32_m128i | sse4.1 Lanewise |
max_i32_m256i | avx2 Lanewise |
max_i8_m128i | sse4.1 Lanewise |
max_i8_m256i | avx2 Lanewise |
max_m128 | sse Lanewise |
max_m128_s | sse Low lane |
max_m128d | sse2 Lanewise |
max_m128d_s | sse2 Low lane |
max_m256d | avx Lanewise |
max_m256 | avx Lanewise |
max_u16_m128i | sse4.1 Lanewise |
max_u16_m256i | avx2 Lanewise |
max_u32_m128i | sse4.1 Lanewise |
max_u32_m256i | avx2 Lanewise |
max_u8_m128i | sse2 Lanewise |
max_u8_m256i | avx2 Lanewise |
min_i16_m128i | sse2 Lanewise |
min_i16_m256i | avx2 Lanewise |
min_i32_m128i | sse4.1 Lanewise |
min_i32_m256i | avx2 Lanewise |
min_i8_m128i | sse4.1 Lanewise |
min_i8_m256i | avx2 Lanewise |
min_m128 | sse Lanewise |
min_m128_s | sse Low lane |
min_m128d | sse2 Lanewise |
min_m128d_s | sse2 Low lane |
min_m256d | avx Lanewise |
min_m256 | avx Lanewise |
min_position_u16_m128i | sse4.1 Min |
min_u16_m128i | sse4.1 Lanewise |
min_u16_m256i | avx2 Lanewise |
min_u32_m128i | sse4.1 Lanewise |
min_u32_m256i | avx2 Lanewise |
min_u8_m128i | sse2 Lanewise |
min_u8_m256i | avx2 Lanewise |
move_high_low_m128 | sse Move the high lanes of |
move_low_high_m128 | sse Move the low lanes of |
move_m128_s | sse Move the low lane of |
move_mask_i8_m128i | sse2 Gathers the |
move_mask_m128 | sse Gathers the sign bit of each lane. |
move_mask_m128d | sse2 Gathers the sign bit of each lane. |
move_mask_m256d | avx Collects the sign bit of each lane into a 4-bit value. |
move_mask_m256 | avx Collects the sign bit of each lane into a 4-bit value. |
move_mask_m256i | avx2 Create an |
mul_extended_u32 | bmi2 Multiply two |
mul_extended_u64 | bmi2 Multiply two |
mul_i16_horizontal_add_m128i | sse2 Multiply |
mul_i16_horizontal_add_m256i | avx2 Multiply |
mul_i16_keep_high_m128i | sse2 Lanewise |
mul_i16_keep_high_m256i | avx2 Multiply the |
mul_i16_keep_low_m128i | sse2 Lanewise |
mul_i16_keep_low_m256i | avx2 Multiply the |
mul_i16_scale_round_m128i | ssse3 Multiply |
mul_i16_scale_round_m256i | avx2 Multiply |
mul_i32_keep_low_m128i | sse4.1 Lanewise |
mul_i32_keep_low_m256i | avx2 Multiply the |
mul_i64_low_bits_m256i | avx2 Multiply the lower |
mul_i64_widen_low_bits_m128i | sse4.1 Multiplies the lower 32 bits (only) of each |
mul_m128 | sse Lanewise |
mul_m128_s | sse Low lane |
mul_m128d | sse2 Lanewise |
mul_m128d_s | sse2 Lowest lane |
mul_m256d | avx Lanewise |
mul_m256 | avx Lanewise |
mul_u16_keep_high_m128i | sse2 Lanewise |
mul_u16_keep_high_m256i | avx2 Multiply the |
mul_u64_low_bits_m256i | avx2 Multiply the lower |
mul_u64_widen_low_bits_m128i | sse2 Multiplies the lower 32 bits (only) of each |
mul_u8i8_add_horizontal_saturating_m128i | ssse3 This is dumb and weird. |
mul_u8i8_add_horizontal_saturating_m256i | avx2 This is dumb and weird. |
or_m128 | sse Bitwise `a | b`. |
or_m128d | sse2 Bitwise `a | b`. |
or_m128i | sse2 Bitwise `a | b`. |
or_m256d | avx Bitwise `a | b`. |
or_m256 | avx Bitwise `a | b`. |
or_m256i | avx2 Bitwise `a | b`. |
pack_i16_to_i8_m128i | sse2 Saturating convert `i16` to `i8`, and pack the values. |
pack_i16_to_i8_m256i | avx2 Saturating convert `i16` to `i8`, and pack the values. |
pack_i16_to_u8_m128i | sse2 Saturating convert `i16` to `u8`, and pack the values. |
pack_i16_to_u8_m256i | avx2 Saturating convert `i16` to `u8`, and pack the values. |
pack_i32_to_i16_m128i | sse2 Saturating convert `i32` to `i16`, and pack the values. |
pack_i32_to_i16_m256i | avx2 Saturating convert `i32` to `i16`, and pack the values. |
pack_i32_to_u16_m128i | sse4.1 Saturating convert `i32` to `u16`, and pack the values. |
pack_i32_to_u16_m256i | avx2 Saturating convert `i32` to `u16`, and pack the values. |
permute_i32_m256i | avx2 Permutes the 32-bit integer lanes. |
permute_m256 | avx2 Permutes the `f32` lanes. |
permute_varying_m128d | avx Permute with a runtime varying pattern. |
permute_varying_m128 | avx Permute with a runtime varying pattern. |
permute_varying_m256d | avx Permute with a runtime varying pattern. |
permute_varying_m256 | avx Permute with a runtime varying pattern. |
population_count_i32 | popcnt Count the number of bits set within an `i32`. |
population_count_i64 | popcnt Count the number of bits set within an `i64`. |
population_deposit_u32 | bmi2 Deposit contiguous low bits from a `u32` according to a mask. |
population_deposit_u64 | bmi2 Deposit contiguous low bits from a `u64` according to a mask. |
population_extract_u32 | bmi2 Extract bits from a `u32` according to a mask. |
population_extract_u64 | bmi2 Extract bits from a `u64` according to a mask. |
rdrand_u16 | rdrand Try to obtain a random `u16` from the hardware RNG. |
rdrand_u32 | rdrand Try to obtain a random `u32` from the hardware RNG. |
rdrand_u64 | rdrand Try to obtain a random `u64` from the hardware RNG. |
rdseed_u16 | rdseed Try to obtain a random `u16` from the hardware entropy source. |
rdseed_u32 | rdseed Try to obtain a random `u32` from the hardware entropy source. |
rdseed_u64 | rdseed Try to obtain a random `u64` from the hardware entropy source. |
read_timestamp_counter | Reads the CPU's timestamp counter value. |
read_timestamp_counter_p | Reads the CPU's timestamp counter value and stores the processor signature. |
reciprocal_m128 | sse Lanewise `1.0 / a` approximation. |
reciprocal_m128_s | sse Low lane `1.0 / a` approximation, other lanes unchanged. |
reciprocal_m256 | avx Reciprocal of `f32` lanes (approximate). |
reciprocal_sqrt_m128 | sse Lanewise `1.0 / sqrt(a)` approximation. |
reciprocal_sqrt_m128_s | sse Low lane `1.0 / sqrt(a)` approximation, other lanes unchanged. |
reciprocal_sqrt_m256 | avx Reciprocal of the square root of `f32` lanes (approximate). |
set_i16_m128i | sse2 Sets the args into an `m128i`, first arg is the high lane. |
set_i16_m256i | avx Sets the args into an `m256i`, first arg is the high lane. |
set_i32_m128i_s | sse2 Set an `i32` as the low 32-bit lane of an `m128i`, other lanes zeroed. |
set_i32_m128i | sse2 Sets the args into an `m128i`, first arg is the high lane. |
set_i32_m256i | avx Sets the args into an `m256i`, first arg is the high lane. |
set_i64_m128i_s | sse2 Set an `i64` as the low 64-bit lane of an `m128i`, other lane zeroed. |
set_i64_m128i | sse2 Sets the args into an `m128i`, first arg is the high lane. |
set_i8_m128i | sse2 Sets the args into an `m128i`, first arg is the high lane. |
set_i8_m256i | avx Sets the args into an `m256i`, first arg is the high lane. |
set_m128 | sse Sets the args into an `m128`, first arg is the high lane. |
set_m128_s | sse Sets the arg into the low lane of an `m128`, other lanes zeroed. |
set_m128d | sse2 Sets the args into an `m128d`, first arg is the high lane. |
set_m128d_s | sse2 Sets the args into the low lane of a `m128d`, high lane zeroed. |
set_m256d | avx Sets the args into an `m256d`, first arg is the high lane. |
set_m256 | avx Sets the args into an `m256`, first arg is the high lane. |
set_m128d_m256d | avx Set `m128d` args into an `m256d`. |
set_m128i_m256i | avx Set `m128i` args into an `m256i`. |
set_reversed_i16_m128i | sse2 Sets the args into an `m128i`, first arg is the low lane. |
set_reversed_i16_m256i | avx Sets the args into an `m256i`, first arg is the low lane. |
set_reversed_i32_m128i | sse2 Sets the args into an `m128i`, first arg is the low lane. |
set_reversed_i32_m256i | avx Sets the args into an `m256i`, first arg is the low lane. |
set_reversed_i8_m128i | sse2 Sets the args into an `m128i`, first arg is the low lane. |
set_reversed_i8_m256i | avx Sets the args into an `m256i`, first arg is the low lane. |
set_reversed_m128 | sse Sets the args into an `m128`, first arg is the low lane. |
set_reversed_m128d | sse2 Sets the args into an `m128d`, first arg is the low lane. |
set_reversed_m256d | avx Sets the args into an `m256d`, first arg is the low lane. |
set_reversed_m256 | avx Sets the args into an `m256`, first arg is the low lane. |
set_reversed_m128d_m256d | avx Set `m128d` args into an `m256d`, first arg is the low half. |
set_reversed_m128i_m256i | avx Set `m128i` args into an `m256i`, first arg is the low half. |
set_splat_i16_m128i | sse2 Splats the `i16` to all lanes of the `m128i`. |
set_splat_i16_m256i | avx Splat an `i16` across all lanes of an `m256i`. |
set_splat_i16_m128i_s_m256i | avx2 Sets the lowest `i16` lane of an `m128i` as all lanes of an `m256i`. |
set_splat_i32_m128i | sse2 Splats the `i32` to all lanes of the `m128i`. |
set_splat_i32_m256i | avx Splat an `i32` across all lanes of an `m256i`. |
set_splat_i32_m128i_s_m256i | avx2 Sets the lowest `i32` lane of an `m128i` as all lanes of an `m256i`. |
set_splat_i64_m128i | sse2 Splats the `i64` to both lanes of the `m128i`. |
set_splat_i64_m128i_s_m256i | avx2 Sets the lowest `i64` lane of an `m128i` as all lanes of an `m256i`. |
set_splat_i8_m128i | sse2 Splats the `i8` to all lanes of the `m128i`. |
set_splat_i8_m256i | avx Splat an `i8` across all lanes of an `m256i`. |
set_splat_i8_m128i_s_m256i | avx2 Sets the lowest `i8` lane of an `m128i` as all lanes of an `m256i`. |
set_splat_m128 | sse Splats the value to all lanes. |
set_splat_m128d | sse2 Splats the args into both lanes of the `m128d`. |
set_splat_m256d | avx Splat an `f64` across all lanes of an `m256d`. |
set_splat_m256 | avx Splat an `f32` across all lanes of an `m256`. |
set_splat_m128_s_m256 | avx2 Sets the lowest lane of an `m128` as all lanes of an `m256`. |
set_splat_m128d_s_m256d | avx2 Sets the lowest lane of an `m128d` as all lanes of an `m256d`. |
shl_i16_m128i | sse2 Shift each `i16` lane to the left by the lower `i64` lane of `count`. |
shl_i16_m256i | avx2 Lanewise shift left of `i16` lanes by the lower `i64` lane of `count`. |
shl_i32_each_m256i | avx2 Lanewise shift left; each `i32` lane shifted by its matching `count` lane. |
shl_i32_m128i | sse2 Shift each `i32` lane to the left by the lower `i64` lane of `count`. |
shl_i32_m256i | avx2 Lanewise shift left of `i32` lanes by the lower `i64` lane of `count`. |
shl_i64_each_m256i | avx2 Lanewise shift left; each `i64` lane shifted by its matching `count` lane. |
shl_i64_m128i | sse2 Shift each `i64` lane to the left by the lower `i64` lane of `count`. |
shl_i64_m256i | avx2 Lanewise shift left of `i64` lanes by the lower `i64` lane of `count`. |
shl_u32_each_m128i | avx2 Shift `u32` lanes to the left, each lane by its matching `count` lane. |
shl_u64_each_m128i | avx2 Shift `u64` lanes to the left, each lane by its matching `count` lane. |
shr_i16_m128i | sse2 Shift each `i16` lane to the right by the lower `i64` lane of `count`, sign extending. |
shr_i16_m256i | avx2 Lanewise arithmetic shift right of `i16` lanes by the lower `i64` lane of `count`. |
shr_i32_each_m128i | avx2 Shift `i32` lanes to the right, each lane by its matching `count` lane, sign extending. |
shr_i32_each_m256i | avx2 Lanewise arithmetic shift right; each `i32` lane shifted by its matching `count` lane. |
shr_i32_m128i | sse2 Shift each `i32` lane to the right by the lower `i64` lane of `count`, sign extending. |
shr_i32_m256i | avx2 Lanewise arithmetic shift right of `i32` lanes by the lower `i64` lane of `count`. |
shr_u16_m128i | sse2 Shift each `u16` lane to the right by the lower `i64` lane of `count`, zero filling. |
shr_u16_m256i | avx2 Lanewise logical shift right of `u16` lanes by the lower `i64` lane of `count`. |
shr_u32_each_m128i | avx2 Shift `u32` lanes to the right, each lane by its matching `count` lane, zero filling. |
shr_u32_each_m256i | avx2 Lanewise logical shift right; each `u32` lane shifted by its matching `count` lane. |
shr_u32_m128i | sse2 Shift each `u32` lane to the right by the lower `i64` lane of `count`, zero filling. |
shr_u32_m256i | avx2 Lanewise logical shift right of `u32` lanes by the lower `i64` lane of `count`. |
shr_u64_each_m128i | avx2 Shift `u64` lanes to the right, each lane by its matching `count` lane, zero filling. |
shr_u64_each_m256i | avx2 Lanewise logical shift right; each `u64` lane shifted by its matching `count` lane. |
shr_u64_m128i | sse2 Shift each `u64` lane to the right by the lower `i64` lane of `count`, zero filling. |
shr_u64_m256i | avx2 Lanewise logical shift right of `u64` lanes by the lower `i64` lane of `count`. |
shuffle_i8_m128i | ssse3 Shuffles the `i8` lanes of `a` using `v` as indices. |
shuffle_i8_m256i | avx2 Shuffle `i8` lanes within each 128-bit half using `v` as indices. |
sign_apply_i16_m128i | ssse3 Applies the sign of `i16` values in `b` to the values in `a`. |
sign_apply_i16_m256i | avx2 Lanewise `a * signum(b)` with lanes as `i16`. |
sign_apply_i32_m128i | ssse3 Applies the sign of `i32` values in `b` to the values in `a`. |
sign_apply_i32_m256i | avx2 Lanewise `a * signum(b)` with lanes as `i32`. |
sign_apply_i8_m128i | ssse3 Applies the sign of `i8` values in `b` to the values in `a`. |
sign_apply_i8_m256i | avx2 Lanewise `a * signum(b)` with lanes as `i8`. |
splat_i16_m128i_s_m128i | avx2 Splat the lowest 16-bit lane across the entire 128 bits. |
splat_i32_m128i_s_m128i | avx2 Splat the lowest 32-bit lane across the entire 128 bits. |
splat_i64_m128i_s_m128i | avx2 Splat the lowest 64-bit lane across the entire 128 bits. |
splat_i8_m128i_s_m128i | avx2 Splat the lowest 8-bit lane across the entire 128 bits. |
splat_m128_s_m128 | avx2 Splat the lowest `f32` lane across all four lanes. |
splat_m128d_s_m128d | avx2 Splat the lower `f64` lane across both lanes. |
splat_m128i_m256i | avx2 Splat the 128-bits across 256-bits. |
sqrt_m128 | sse Lanewise `sqrt(a)`. |
sqrt_m128_s | sse Low lane `sqrt(a)`, other lanes unchanged. |
sqrt_m128d | sse2 Lanewise `sqrt(a)`. |
sqrt_m128d_s | sse2 Low lane `sqrt(b)`, upper lane unchanged from `a`. |
sqrt_m256d | avx Lanewise `sqrt(a)`. |
sqrt_m256 | avx Lanewise `sqrt(a)`. |
store_high_m128d_s | sse2 Stores the high lane value to the reference given. |
store_i64_m128i_s | sse2 Stores the value to the reference given. |
store_m128 | sse Stores the value to the reference given. |
store_m128_s | sse Stores the low lane value to the reference given. |
store_m128d | sse2 Stores the value to the reference given. |
store_m128d_s | sse2 Stores the low lane value to the reference given. |
store_m128i | sse2 Stores the value to the reference given. |
store_m256d | avx Store data from a register into memory. |
store_m256 | avx Store data from a register into memory. |
store_m256i | avx Store data from a register into memory. |
store_masked_i32_m128i | avx2 Stores the `i32` masked lanes given to the reference. |
store_masked_i32_m256i | avx2 Stores the `i32` masked lanes given to the reference. |
store_masked_i64_m128i | avx2 Stores the `i64` masked lanes given to the reference. |
store_masked_i64_m256i | avx2 Stores the `i64` masked lanes given to the reference. |
store_masked_m128d | avx Store data from a register into memory according to a mask. |
store_masked_m128 | avx Store data from a register into memory according to a mask. |
store_masked_m256d | avx Store data from a register into memory according to a mask. |
store_masked_m256 | avx Store data from a register into memory according to a mask. |
store_reverse_m128 | sse Stores the value to the reference given in reverse order. |
store_reversed_m128d | sse2 Stores the value to the reference given in reverse order. |
store_splat_m128 | sse Stores the low lane value to all lanes of the reference given. |
store_splat_m128d | sse2 Stores the low lane value to all lanes of the reference given. |
store_unaligned_hi_lo_m256d | avx Store data from a register into memory. |
store_unaligned_hi_lo_m256 | avx Store data from a register into memory. |
store_unaligned_hi_lo_m256i | avx Store data from a register into memory. |
store_unaligned_m128 | sse Stores the value to the reference given. |
store_unaligned_m128d | sse2 Stores the value to the reference given. |
store_unaligned_m128i | sse2 Stores the value to the reference given. |
store_unaligned_m256d | avx Store data from a register into memory. |
store_unaligned_m256 | avx Store data from a register into memory. |
store_unaligned_m256i | avx Store data from a register into memory. |
sub_horizontal_i16_m128i | ssse3 Subtract horizontal pairs of `i16` values, pack the outputs as `a` then `b`. |
sub_horizontal_i16_m256i | avx2 Horizontal `a - b` with lanes as `i16`. |
sub_horizontal_i32_m128i | ssse3 Subtract horizontal pairs of `i32` values, pack the outputs as `a` then `b`. |
sub_horizontal_i32_m256i | avx2 Horizontal `a - b` with lanes as `i32`. |
sub_horizontal_m128d | sse3 Subtract each lane horizontally, pack the outputs as `a` then `b`. |
sub_horizontal_m128 | sse3 Subtract each lane horizontally, pack the outputs as `a` then `b`. |
sub_horizontal_m256d | avx Subtract adjacent `f64` lanes. |
sub_horizontal_m256 | avx Subtract adjacent `f32` lanes. |
sub_horizontal_saturating_i16_m128i | ssse3 Subtract horizontal pairs of `i16` values with saturation, pack the outputs as `a` then `b`. |
sub_horizontal_saturating_i16_m256i | avx2 Horizontal saturating `a - b` with lanes as `i16`. |
sub_i16_m128i | sse2 Lanewise `a - b` with lanes as `i16`. |
sub_i16_m256i | avx2 Lanewise `a - b` with lanes as `i16`. |
sub_i32_m128i | sse2 Lanewise `a - b` with lanes as `i32`. |
sub_i32_m256i | avx2 Lanewise `a - b` with lanes as `i32`. |
sub_i64_m128i | sse2 Lanewise `a - b` with lanes as `i64`. |
sub_i64_m256i | avx2 Lanewise `a - b` with lanes as `i64`. |
sub_i8_m128i | sse2 Lanewise `a - b` with lanes as `i8`. |
sub_i8_m256i | avx2 Lanewise `a - b` with lanes as `i8`. |
sub_m128 | sse Lanewise `a - b`. |
sub_m128_s | sse Low lane `a - b`, other lanes unchanged. |
sub_m128d | sse2 Lanewise `a - b`. |
sub_m128d_s | sse2 Lowest lane `a - b`, high lane unchanged. |
sub_m256d | avx Lanewise `a - b`. |
sub_m256 | avx Lanewise `a - b`. |
sub_saturating_i16_m128i | sse2 Lanewise saturating `a - b` with lanes as `i16`. |
sub_saturating_i16_m256i | avx2 Lanewise saturating `a - b` with lanes as `i16`. |
sub_saturating_i8_m128i | sse2 Lanewise saturating `a - b` with lanes as `i8`. |
sub_saturating_i8_m256i | avx2 Lanewise saturating `a - b` with lanes as `i8`. |
sub_saturating_u16_m128i | sse2 Lanewise saturating `a - b` with lanes as `u16`. |
sub_saturating_u16_m256i | avx2 Lanewise saturating `a - b` with lanes as `u16`. |
sub_saturating_u8_m128i | sse2 Lanewise saturating `a - b` with lanes as `u8`. |
sub_saturating_u8_m256i | avx2 Lanewise saturating `a - b` with lanes as `u8`. |
sum_of_u8_abs_diff_m128i | sse2 Compute "sum of `u8` absolute differences". |
sum_of_u8_abs_diff_m256i | avx2 Compute "sum of `u8` absolute differences". |
test_all_ones_m128i | sse4.1 Tests if all bits are 1. |
test_all_zeroes_m128i | sse4.1 Returns if all masked bits are 0, `(a & mask) as u128 == 0`. |
test_mixed_ones_and_zeroes_m128i | sse4.1 Returns if, among the masked bits, there are both 0s and 1s. |
trailing_zero_count_u32 | bmi1 Counts the number of trailing zero bits in a `u32`. |
trailing_zero_count_u64 | bmi1 Counts the number of trailing zero bits in a `u64`. |
transpose_four_m128 | sse Transpose four `m128` as if they were a 4x4 matrix of `f32` lanes. |
truncate_m128_to_m128i | sse2 Truncate the `f32` lanes to `i32` lanes. |
truncate_m128d_to_m128i | sse2 Truncate the `f64` lanes to the lower `i32` lanes, upper lanes zeroed. |
truncate_to_i32_m128d_s | sse2 Truncate the lower lane into an `i32`. |
truncate_to_i64_m128d_s | sse2 Truncate the lower lane into an `i64`. |
unpack_hi_m256d | avx Unpack and interleave the high lanes. |
unpack_hi_m256 | avx Unpack and interleave the high lanes. |
unpack_high_i16_m128i | sse2 Unpack and interleave high `i16` lanes of `a` and `b`. |
unpack_high_i16_m256i | avx2 Unpack and interleave high `i16` lanes of `a` and `b`. |
unpack_high_i32_m128i | sse2 Unpack and interleave high `i32` lanes of `a` and `b`. |
unpack_high_i32_m256i | avx2 Unpack and interleave high `i32` lanes of `a` and `b`. |
unpack_high_i64_m128i | sse2 Unpack and interleave high `i64` lanes of `a` and `b`. |
unpack_high_i64_m256i | avx2 Unpack and interleave high `i64` lanes of `a` and `b`. |
unpack_high_i8_m128i | sse2 Unpack and interleave high `i8` lanes of `a` and `b`. |
unpack_high_i8_m256i | avx2 Unpack and interleave high `i8` lanes of `a` and `b`. |
unpack_high_m128 | sse Unpack and interleave high lanes of `a` and `b`. |
unpack_high_m128d | sse2 Unpack and interleave high lanes of `a` and `b`. |
unpack_lo_m256d | avx Unpack and interleave the low lanes. |
unpack_lo_m256 | avx Unpack and interleave the low lanes. |
unpack_low_i16_m128i | sse2 Unpack and interleave low `i16` lanes of `a` and `b`. |
unpack_low_i16_m256i | avx2 Unpack and interleave low `i16` lanes of `a` and `b`. |
unpack_low_i32_m128i | sse2 Unpack and interleave low `i32` lanes of `a` and `b`. |
unpack_low_i32_m256i | avx2 Unpack and interleave low `i32` lanes of `a` and `b`. |
unpack_low_i64_m128i | sse2 Unpack and interleave low `i64` lanes of `a` and `b`. |
unpack_low_i64_m256i | avx2 Unpack and interleave low `i64` lanes of `a` and `b`. |
unpack_low_i8_m128i | sse2 Unpack and interleave low `i8` lanes of `a` and `b`. |
unpack_low_i8_m256i | avx2 Unpack and interleave low `i8` lanes of `a` and `b`. |
unpack_low_m128 | sse Unpack and interleave low lanes of `a` and `b`. |
unpack_low_m128d | sse2 Unpack and interleave low lanes of `a` and `b`. |
xor_m128 | sse Bitwise `a ^ b`. |
xor_m128d | sse2 Bitwise `a ^ b`. |
xor_m128i | sse2 Bitwise `a ^ b`. |
xor_m256d | avx Bitwise `a ^ b`. |
xor_m256 | avx Bitwise `a ^ b`. |
xor_m256i | avx2 Bitwise `a ^ b`. |
zero_extend_m128d | avx Zero extend an `m128d` to `m256d`. |
zero_extend_m128 | avx Zero extend an `m128` to `m256`. |
zero_extend_m128i | avx Zero extend an `m128i` to `m256i`. |
zeroed_m128 | sse All lanes zero. |
zeroed_m128i | sse2 All lanes zero. |
zeroed_m128d | sse2 Both lanes zero. |
zeroed_m256d | avx A zeroed `m256d`. |
zeroed_m256 | avx A zeroed `m256`. |
zeroed_m256i | avx A zeroed `m256i`. |
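Because the verbose names encode the element type and the operation directly, each entry's behavior can be read off lane by lane. As a sketch only, here are plain-Rust scalar models of what two of the entries above compute (the array types and loops are illustrative; the real functions operate on `m128i` registers, come from the `safe_arch` crate, and require the named CPU feature to be enabled at compile time):

```rust
/// Scalar model of `min_u32_m128i` (sse4.1): lanewise `min(a, b)`
/// with lanes as `u32`. An `m128i` holds four `u32` lanes.
fn min_u32_lanes(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = a[i].min(b[i]);
    }
    out
}

/// Scalar model of `sub_saturating_u8_m128i` (sse2): lanewise saturating
/// `a - b` with lanes as `u8` (results clamp at 0 instead of wrapping).
/// An `m128i` holds sixteen `u8` lanes.
fn sub_saturating_u8_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_sub(b[i]);
    }
    out
}

fn main() {
    // Each output lane is the smaller of the two matching input lanes.
    assert_eq!(
        min_u32_lanes([1, 200, 3, 400], [100, 2, 300, 4]),
        [1, 2, 3, 4]
    );
    // 5 - 9 saturates to 0 in every lane rather than wrapping around.
    assert_eq!(sub_saturating_u8_lanes([5; 16], [9; 16]), [0; 16]);
}
```

The SIMD versions compute the same per-lane results, just for all lanes in a single instruction.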