Crate safe_arch

A crate that safely exposes arch intrinsics via #[cfg()].

safe_arch lets you safely use CPU intrinsics, the functions found in the core::arch modules. It works purely via #[cfg()] and compile-time CPU feature declaration. If you want to check for a feature at runtime and then either call an intrinsic or use a fallback path based on the result, then this crate is sadly not for you.

SIMD register types are “newtype’d” so that better trait impls can be given to them, but the inner value is a pub field so feel free to just grab it out if you need to. Trait impls of the newtypes include: Default (zeroed), From/Into of appropriate data types, and appropriate operator overloading.
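As a rough illustration of the newtype approach, here is a minimal sketch of a hypothetical `M128` wrapper around `core::arch::x86_64::__m128` with a public inner field, a zeroed `Default`, `From` conversions to and from `[f32; 4]`, and an `Add` overload. It assumes an x86_64 target (where sse and sse2 are always enabled); the type and its impls are illustrative only and are not safe_arch's actual definitions.

```rust
use core::arch::x86_64::{__m128, _mm_add_ps, _mm_setzero_ps};
use core::ops::Add;

/// Hypothetical newtype over the raw SSE register type. The inner value is
/// `pub`, so callers can always pull the raw `__m128` back out.
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct M128(pub __m128);

impl Default for M128 {
    /// All lanes zeroed.
    fn default() -> Self {
        // SAFETY: sse is always enabled on x86_64 targets.
        M128(unsafe { _mm_setzero_ps() })
    }
}

impl From<[f32; 4]> for M128 {
    fn from(lanes: [f32; 4]) -> Self {
        // SAFETY: both types are 16 bytes of plain data; lanes[0] becomes lane 0.
        M128(unsafe { core::mem::transmute::<[f32; 4], __m128>(lanes) })
    }
}

impl From<M128> for [f32; 4] {
    fn from(m: M128) -> Self {
        // SAFETY: same size, and any bit pattern is a valid f32.
        unsafe { core::mem::transmute::<__m128, [f32; 4]>(m.0) }
    }
}

impl Add for M128 {
    type Output = M128;
    /// Lanewise a + b.
    fn add(self, rhs: M128) -> M128 {
        // SAFETY: sse is always enabled on x86_64 targets.
        M128(unsafe { _mm_add_ps(self.0, rhs.0) })
    }
}
```

With impls like these, `(M128::from(a) + M128::from(b)).into()` reads like ordinary arithmetic, while `.0` still gives direct access to the raw register for anything the wrapper doesn't cover.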

  • Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
  • Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
    • Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
  • Some intrinsics are not safe unless you’re very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren’t exposed here.
  • Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn’t really support you messing with that within a high level language, so those operations aren’t exposed here. Use assembly or something if you want to do that.
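To show how taking a reference can uphold an intrinsic's pointer requirements, here is a minimal sketch using a hypothetical 16-byte aligned storage type. `_mm_load_ps` requires a 16-byte aligned pointer to 16 valid bytes, and a `&Align16` guarantees both by construction. `Align16`, `load_aligned`, and `to_array` are illustrative names, not safe_arch's API, and an x86_64 target is assumed.

```rust
use core::arch::x86_64::{__m128, _mm_load_ps, _mm_storeu_ps};

/// Hypothetical storage type: any `&Align16` points at 16 readable bytes
/// that are 16-byte aligned, because the type itself forces that layout.
#[repr(C, align(16))]
pub struct Align16(pub [f32; 4]);

/// Safe wrapper around the aligned load; the reference type carries the proof.
pub fn load_aligned(src: &Align16) -> __m128 {
    // SAFETY: alignment and size are guaranteed by the Align16 type.
    unsafe { _mm_load_ps(src.0.as_ptr()) }
}

/// Copies the four lanes back out with an unaligned store.
pub fn to_array(v: __m128) -> [f32; 4] {
    let mut out = [0.0_f32; 4];
    // SAFETY: `out` has room for exactly four f32 values.
    unsafe { _mm_storeu_ps(out.as_mut_ptr(), v) };
    out
}
```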

§Naming Conventions

The safe_arch crate does not simply use the “official” names for each intrinsic, because the official names are generally poor. Instead, the operations have been given better names that hopefully make things easier to understand when you’re reading the code.

For a full explanation of the naming used, see the Naming Conventions page.

§Current Support

  • x86 / x86_64 (Intel, AMD, etc)
    • 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
    • 256-bit: avx, avx2
    • Other: adx, aes, bmi1, bmi2, fma, lzcnt, pclmulqdq, popcnt, rdrand, rdseed

§Compile Time CPU Target Features

At the time of writing, Rust enables the sse and sse2 CPU features by default for all i686 (x86) and x86_64 builds. Those CPU features are built into the design of x86_64, and you’d need a very old x86 CPU for it to lack sse and sse2, so they’re a safe bet for the language to enable all the time. In fact, because the standard library is compiled with them enabled, trying to disable those features would cause ABI issues and fill your program with UB (link).

If you want additional CPU features available at compile time, you’ll have to enable them with an additional argument to rustc. For a feature called name, you pass -C target-feature=+name; for example, -C target-feature=+sse3 enables sse3.

Alternatively, you can enable all target features of the current CPU with -C target-cpu=native. This is primarily useful if you’re building a program that you’ll only run on your own system.
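One way to confirm which features a given build actually enabled is to query them with cfg!, which expands to a constant true or false for the current build. The sketch below is a hypothetical helper, not part of safe_arch, and checks only the features listed under Current Support that have 128-bit or 256-bit SIMD relevance.

```rust
/// Hypothetical helper: reports, for several x86 features, whether each was
/// enabled at compile time. Each `cfg!` is a constant for this build.
pub fn compile_time_features() -> [(&'static str, bool); 8] {
    [
        ("sse", cfg!(target_feature = "sse")),
        ("sse2", cfg!(target_feature = "sse2")),
        ("sse3", cfg!(target_feature = "sse3")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("sse4.1", cfg!(target_feature = "sse4.1")),
        ("sse4.2", cfg!(target_feature = "sse4.2")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
    ]
}
```

Printing this list from a plain build and again from a build with -C target-feature=+sse3 (or -C target-cpu=native on a capable machine) should show the sse3 entry flip to true.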

It’s sometimes hard to know if your target platform will support a given feature set, but the Steam Hardware Survey is generally taken as a guide to what you can expect people to have available. If you click “Other Settings” it’ll expand into a list of CPU target features and how common they are. These days, it seems that sse3 can be safely assumed, and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The features above 128-bit aren’t as common yet; give it another few years.

Please note that executing a program on a CPU that doesn’t support the target features it was compiled for is Undefined Behavior.

Currently, Rust doesn’t offer an easy way to check that a feature enabled at compile time is actually available at runtime. There is the is_*_feature_detected! family of macros (such as is_x86_feature_detected!), but if a feature is enabled at compile time they evaluate to a constant true instead of deferring the check to runtime. This means that if you want a check at the start of your program that confirms all the assumed features are present and errors out when the assumptions don’t hold, you can’t use those macros. You have to use CPUID and check manually. Hopefully a future version of this crate can make that process easier.
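Here is a minimal sketch of that manual check, assuming an x86_64 target and covering only the 128-bit SSE features. It reads CPUID leaf 1 through the standard library’s core::arch::x86_64::__cpuid; the bit positions follow Intel’s CPUID documentation. The names cpu_has and check_assumed_features are hypothetical, not part of safe_arch. Checking avx or avx2 correctly also requires confirming OS support via XGETBV, which this sketch deliberately omits.

```rust
use core::arch::x86_64::__cpuid;

/// Checks one 128-bit feature against CPUID leaf 1 at runtime.
/// EDX bit 25 = sse, EDX bit 26 = sse2, ECX bit 0 = sse3, bit 9 = ssse3,
/// bit 19 = sse4.1, bit 20 = sse4.2. Returns None for unhandled names.
pub fn cpu_has(feature: &str) -> Option<bool> {
    // Every x86_64 CPU supports leaf 1, so the max-leaf query is skipped.
    // SAFETY: the cpuid instruction is always available on x86_64.
    let leaf1 = unsafe { __cpuid(1) };
    let bit = |reg: u32, n: u32| ((reg >> n) & 1) == 1;
    let present = match feature {
        "sse" => bit(leaf1.edx, 25),
        "sse2" => bit(leaf1.edx, 26),
        "sse3" => bit(leaf1.ecx, 0),
        "ssse3" => bit(leaf1.ecx, 9),
        "sse4.1" => bit(leaf1.ecx, 19),
        "sse4.2" => bit(leaf1.ecx, 20),
        _ => return None,
    };
    Some(present)
}

/// Startup check: returns the first feature that this build assumed at
/// compile time but the running CPU does not report.
pub fn check_assumed_features() -> Result<(), &'static str> {
    let assumed = [
        ("sse", cfg!(target_feature = "sse")),
        ("sse2", cfg!(target_feature = "sse2")),
        ("sse3", cfg!(target_feature = "sse3")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("sse4.1", cfg!(target_feature = "sse4.1")),
        ("sse4.2", cfg!(target_feature = "sse4.2")),
    ];
    for (name, enabled) in assumed {
        if enabled && cpu_has(name) != Some(true) {
            return Err(name);
        }
    }
    Ok(())
}
```

To be useful, a check like check_assumed_features has to run before any code that relies on the assumed features. Code compiled with those features enabled can use them anywhere, including in the startup path itself.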

§A Note On Working With Cfg

There are two main ways to use cfg:

  • Via an attribute placed on an item, block, or expression:
    • #[cfg(debug_assertions)] println!("hello");
  • Via a macro used within an expression position:
    • if cfg!(debug_assertions) { println!("hello"); }

The difference might seem small but it’s actually very important:

  • The attribute form includes or removes code before the compiler checks whether the items it names actually exist. This means that code configured via the attribute can safely name things that don’t always exist, as long as those things do exist whenever that code is configured into the build.
  • The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.

This crate uses cfg via the attribute, so the functions it exposes don’t exist at all when the appropriate CPU target features aren’t enabled. Accordingly, if you call into this crate only when certain features are enabled in the build, you’ll also need to gate those calls with the cfg attribute, not the cfg! macro.
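The pattern can be sketched with the standard library alone. Below, a hypothetical horizontal_sum has an SSE3 version that, like safe_arch’s sse3 functions, only exists when sse3 is enabled for the build, plus a scalar fallback selected by the opposite cfg attribute. The intrinsics come from core::arch::x86_64; the function name is illustrative.

```rust
// SIMD path: exists only when the build enables sse3 on x86_64.
#[cfg(all(target_arch = "x86_64", target_feature = "sse3"))]
pub fn horizontal_sum(v: [f32; 4]) -> f32 {
    use core::arch::x86_64::{_mm_cvtss_f32, _mm_hadd_ps, _mm_loadu_ps};
    // SAFETY: sse3 is enabled for this build, and `v` holds four f32 values.
    unsafe {
        let r = _mm_loadu_ps(v.as_ptr());
        let h = _mm_hadd_ps(r, r); // [v0+v1, v2+v3, v0+v1, v2+v3]
        let h = _mm_hadd_ps(h, h); // every lane now holds the total
        _mm_cvtss_f32(h)
    }
}

// Fallback path: compiled only when the SIMD path above is not.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse3")))]
pub fn horizontal_sum(v: [f32; 4]) -> f32 {
    v.iter().sum()
}

// By contrast, the macro form below would fail to compile on builds without
// sse3 (simd_only_fn and scalar_fn are hypothetical names): cfg! keeps both
// branches, so the SIMD-only name must resolve even when it doesn't exist.
//
// if cfg!(target_feature = "sse3") { simd_only_fn(v) } else { scalar_fn(v) }
```

Callers simply use horizontal_sum, and exactly one definition exists in any given build.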

Modules§

naming_conventions
An explanation of the crate’s naming conventions.

Macros§

cmp_op (avx)
Turns a comparison operator token to the correct constant value.
round_op (avx)
Turns a round operator token to the correct constant value.

Structs§

m128
The data for a 128-bit SSE register of four f32 lanes.
m256
The data for a 256-bit AVX register of eight f32 lanes.
m128d
The data for a 128-bit SSE register of two f64 values.
m128i
The data for a 128-bit SSE register of integer data.
m256d
The data for a 256-bit AVX register of four f64 values.
m256i
The data for a 256-bit AVX register of integer data.

Constants§

STR_CMP_BIT_MASK
Return the bitwise mask of matches.
STR_CMP_EQ_ANY
Matches when any haystack character equals any needle character, regardless of position.
STR_CMP_EQ_EACH
Matches when a character position in the needle is equal to the character at the same position in the haystack.
STR_CMP_EQ_ORDERED
Matches when the complete needle string is a substring somewhere in the haystack.
STR_CMP_FIRST_MATCH
Return the index of the first match found.
STR_CMP_I8
String segment elements are i8 values.
STR_CMP_I16
String segment elements are i16 values.
STR_CMP_LAST_MATCH
Return the index of the last match found.
STR_CMP_RANGES
Interprets consecutive pairs of characters in the needle as (low..=high) ranges to compare each haystack character to.
STR_CMP_U8
String segment elements are u8 values.
STR_CMP_U16
String segment elements are u16 values.
STR_CMP_UNIT_MASK
Return the lanewise mask of matches.
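To make a few of these mode descriptions concrete, here is a scalar model on u8 segments of up to 16 bytes: STR_CMP_EQ_ANY and STR_CMP_RANGES combined with STR_CMP_BIT_MASK, plus the index that STR_CMP_FIRST_MATCH would report. It is a sketch under simplifying assumptions (explicit slice lengths, no null-terminator handling, no other mode flags). The function names are hypothetical, and this is not how safe_arch or the hardware computes the result.

```rust
/// STR_CMP_EQ_ANY + STR_CMP_BIT_MASK model: bit i is set when haystack[i]
/// equals any needle byte, regardless of position.
pub fn eq_any_bit_mask(needle: &[u8], haystack: &[u8]) -> u16 {
    let mut mask = 0u16;
    for (i, h) in haystack.iter().take(16).enumerate() {
        if needle.iter().take(16).any(|n| n == h) {
            mask |= 1 << i;
        }
    }
    mask
}

/// STR_CMP_RANGES + STR_CMP_BIT_MASK model: needle bytes are read as
/// (low..=high) pairs; bit i is set when haystack[i] falls in any pair.
pub fn ranges_bit_mask(needle: &[u8], haystack: &[u8]) -> u16 {
    let mut mask = 0u16;
    for (i, h) in haystack.iter().take(16).enumerate() {
        if needle.chunks_exact(2).any(|p| (p[0]..=p[1]).contains(h)) {
            mask |= 1 << i;
        }
    }
    mask
}

/// STR_CMP_FIRST_MATCH model: index of the lowest set bit, or 16 (the
/// segment length for u8 data) when nothing matched.
pub fn first_match(mask: u16) -> u32 {
    mask.trailing_zeros()
}
```

For example, with the needle "aeiou" and the haystack "hello", the EQ_ANY mask has bits 1 and 4 set (for the 'e' and the 'o'), and the first match is at index 1.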

Functions§

abs_i8_m128i (ssse3)
Lanewise absolute value with lanes as i8.
abs_i8_m256i (avx2)
Absolute value of i8 lanes.
abs_i16_m128i (ssse3)
Lanewise absolute value with lanes as i16.
abs_i16_m256i (avx2)
Absolute value of i16 lanes.
abs_i32_m128i (ssse3)
Lanewise absolute value with lanes as i32.
abs_i32_m256i (avx2)
Absolute value of i32 lanes.
add_carry_u32 (adx)
Add two u32 with a carry value.
add_carry_u64 (adx)
Add two u64 with a carry value.
add_horizontal_i16_m128i (ssse3)
Add horizontal pairs of i16 values, pack the outputs as a then b.
add_horizontal_i16_m256i (avx2)
Horizontal a + b with lanes as i16.
add_horizontal_i32_m128i (ssse3)
Add horizontal pairs of i32 values, pack the outputs as a then b.
add_horizontal_i32_m256i (avx2)
Horizontal a + b with lanes as i32.
add_horizontal_m128 (sse3)
Add each lane horizontally, pack the outputs as a then b.
add_horizontal_m256 (avx)
Add adjacent f32 lanes.
add_horizontal_m128d (sse3)
Add each lane horizontally, pack the outputs as a then b.
add_horizontal_m256d (avx)
Add adjacent f64 lanes.
add_horizontal_saturating_i16_m128i (ssse3)
Add horizontal pairs of i16 values, saturating, pack the outputs as a then b.
add_horizontal_saturating_i16_m256i (avx2)
Horizontal saturating a + b with lanes as i16.
add_i8_m128i (sse2)
Lanewise a + b with lanes as i8.
add_i8_m256i (avx2)
Lanewise a + b with lanes as i8.
add_i16_m128i (sse2)
Lanewise a + b with lanes as i16.
add_i16_m256i (avx2)
Lanewise a + b with lanes as i16.
add_i32_m128i (sse2)
Lanewise a + b with lanes as i32.
add_i32_m256i (avx2)
Lanewise a + b with lanes as i32.
add_i64_m128i (sse2)
Lanewise a + b with lanes as i64.
add_i64_m256i (avx2)
Lanewise a + b with lanes as i64.
add_m128 (sse)
Lanewise a + b.
add_m256 (avx)
Lanewise a + b with f32 lanes.
add_m128_s (sse)
Low lane a + b, other lanes unchanged.
add_m128d (sse2)
Lanewise a + b.
add_m128d_s (sse2)
Lowest lane a + b, high lane unchanged.
add_m256d (avx)
Lanewise a + b with f64 lanes.
add_saturating_i8_m128i (sse2)
Lanewise saturating a + b with lanes as i8.
add_saturating_i8_m256i (avx2)
Lanewise saturating a + b with lanes as i8.
add_saturating_i16_m128i (sse2)
Lanewise saturating a + b with lanes as i16.
add_saturating_i16_m256i (avx2)
Lanewise saturating a + b with lanes as i16.
add_saturating_u8_m128i (sse2)
Lanewise saturating a + b with lanes as u8.
add_saturating_u8_m256i (avx2)
Lanewise saturating a + b with lanes as u8.
add_saturating_u16_m128i (sse2)
Lanewise saturating a + b with lanes as u16.
add_saturating_u16_m256i (avx2)
Lanewise saturating a + b with lanes as u16.
addsub_m128 (sse3)
Alternately, from the top, add a lane and then subtract a lane.
addsub_m256 (avx)
Alternately, from the top, add f32 then sub f32.
addsub_m128d (sse3)
Add the high lane and subtract the low lane.
addsub_m256d (avx)
Alternately, from the top, add f64 then sub f64.
aes_decrypt_last_m128i (aes)
Perform the last round of an AES decryption flow on a using the round_key.
aes_decrypt_m128i (aes)
Perform one round of an AES decryption flow on a using the round_key.
aes_encrypt_last_m128i (aes)
Perform the last round of an AES encryption flow on a using the round_key.
aes_encrypt_m128i (aes)
Perform one round of an AES encryption flow on a using the round_key.
aes_inv_mix_columns_m128i (aes)
Perform the InvMixColumns transform on a.
aes_key_gen_assist_m128i (aes)
Assist in expanding an AES cipher key.
average_u8_m128i (sse2)
Lanewise average of the u8 values.
average_u8_m256i (avx2)
Average u8 lanes.
average_u16_m128i (sse2)
Lanewise average of the u16 values.
average_u16_m256i (avx2)
Average u16 lanes.
bit_extract2_u32 (bmi1)
Extract a span of bits from the u32, control value style.
bit_extract2_u64 (bmi1)
Extract a span of bits from the u64, control value style.
bit_extract_u32 (bmi1)
Extract a span of bits from the u32, start and len style.
bit_extract_u64 (bmi1)
Extract a span of bits from the u64, start and len style.
bit_lowest_set_mask_u32 (bmi1)
Gets the mask of all bits up to and including the lowest set bit in a u32.
bit_lowest_set_mask_u64 (bmi1)
Gets the mask of all bits up to and including the lowest set bit in a u64.
bit_lowest_set_reset_u32 (bmi1)
Resets (clears) the lowest set bit.
bit_lowest_set_reset_u64 (bmi1)
Resets (clears) the lowest set bit.
bit_lowest_set_value_u32 (bmi1)
Gets the value of the lowest set bit in a u32.
bit_lowest_set_value_u64 (bmi1)
Gets the value of the lowest set bit in a u64.
bit_zero_high_index_u32 (bmi2)
Zero out all high bits in a u32 starting at the index given.
bit_zero_high_index_u64 (bmi2)
Zero out all high bits in a u64 starting at the index given.
bitand_m128 (sse)
Bitwise a & b.
bitand_m256 (avx)
Bitwise a & b.
bitand_m128d (sse2)
Bitwise a & b.
bitand_m128i (sse2)
Bitwise a & b.
bitand_m256d (avx)
Bitwise a & b.
bitand_m256i (avx2)
Bitwise a & b.
bitandnot_m128 (sse)
Bitwise (!a) & b.
bitandnot_m256 (avx)
Bitwise (!a) & b.
bitandnot_m128d (sse2)
Bitwise (!a) & b.
bitandnot_m128i (sse2)
Bitwise (!a) & b.
bitandnot_m256d (avx)
Bitwise (!a) & b.
bitandnot_m256i (avx2)
Bitwise (!a) & b.
bitandnot_u32 (bmi1)
Bitwise (!a) & b for u32.
bitandnot_u64 (bmi1)
Bitwise (!a) & b for u64.
bitor_m128 (sse)
Bitwise a | b.
bitor_m256 (avx)
Bitwise a | b.
bitor_m128d (sse2)
Bitwise a | b.
bitor_m128i (sse2)
Bitwise a | b.
bitor_m256d (avx)
Bitwise a | b.
bitor_m256i (avx2)
Bitwise a | b.
bitxor_m128 (sse)
Bitwise a ^ b.
bitxor_m256 (avx)
Bitwise a ^ b.
bitxor_m128d (sse2)
Bitwise a ^ b.
bitxor_m128i (sse2)
Bitwise a ^ b.
bitxor_m256d (avx)
Bitwise a ^ b.
bitxor_m256i (avx2)
Bitwise a ^ b.
blend_imm_i16_m128i (sse4.1)
Blends the i16 lanes according to the immediate mask.
blend_imm_i16_m256i (avx2)
Blends the i16 lanes according to the immediate value.
blend_imm_i32_m128i (avx2)
Blends the i32 lanes in a and b into a single value.
blend_imm_i32_m256i (avx2)
Blends the i32 lanes according to the immediate value.
blend_imm_m128 (sse4.1)
Blends the lanes according to the immediate mask.
blend_imm_m128d (sse4.1)
Blends the f64 lanes according to the immediate mask.
blend_m256 (avx)
Blends the f32 lanes according to the immediate mask.
blend_m256d (avx)
Blends the f64 lanes according to the immediate mask.
blend_varying_i8_m128i (sse4.1)
Blend the i8 lanes according to a runtime varying mask.
blend_varying_i8_m256i (avx2)
Blend i8 lanes according to a runtime varying mask.
blend_varying_m128 (sse4.1)
Blend the lanes according to a runtime varying mask.
blend_varying_m256 (avx)
Blend the lanes according to a runtime varying mask.
blend_varying_m128d (sse4.1)
Blend the lanes according to a runtime varying mask.
blend_varying_m256d (avx)
Blend the lanes according to a runtime varying mask.
byte_shl_imm_u128_m128i (sse2)
Shifts all bits in the entire register left by a number of bytes.
byte_shl_imm_u128_m256i (avx2)
Shifts each u128 lane left by a number of bytes.
byte_shr_imm_u128_m128i (sse2)
Shifts all bits in the entire register right by a number of bytes.
byte_shr_imm_u128_m256i (avx2)
Shifts each u128 lane right by a number of bytes.
byte_swap_i32
Swap the bytes of the given 32-bit value.
byte_swap_i64
Swap the bytes of the given 64-bit value.
cast_to_m128_from_m256 (avx)
Bit-preserving cast to m128 from m256.
cast_to_m128_from_m128d (sse2)
Bit-preserving cast to m128 from m128d.
cast_to_m128_from_m128i (sse2)
Bit-preserving cast to m128 from m128i.
cast_to_m128d_from_m128 (sse2)
Bit-preserving cast to m128d from m128.
cast_to_m128d_from_m128i (sse2)
Bit-preserving cast to m128d from m128i.
cast_to_m128d_from_m256d (avx)
Bit-preserving cast to m128d from m256d.
cast_to_m128i_from_m128 (sse2)
Bit-preserving cast to m128i from m128.
cast_to_m128i_from_m128d (sse2)
Bit-preserving cast to m128i from m128d.
cast_to_m128i_from_m256i (avx)
Bit-preserving cast to m128i from m256i.
cast_to_m256_from_m256d (avx)
Bit-preserving cast to m256 from m256d.
cast_to_m256_from_m256i (avx)
Bit-preserving cast to m256 from m256i.
cast_to_m256d_from_m256 (avx)
Bit-preserving cast to m256d from m256.
cast_to_m256d_from_m256i (avx)
Bit-preserving cast to m256d from m256i.
cast_to_m256i_from_m256 (avx)
Bit-preserving cast to m256i from m256.
cast_to_m256i_from_m256d (avx)
Bit-preserving cast to m256i from m256d.
ceil_m128 (sse4.1)
Round each lane to a whole number, towards positive infinity.
ceil_m256 (avx)
Round f32 lanes towards positive infinity.
ceil_m128_s (sse4.1)
Round the low lane of b toward positive infinity, other lanes a.
ceil_m128d (sse4.1)
Round each lane to a whole number, towards positive infinity.
ceil_m128d_s (sse4.1)
Round the low lane of b toward positive infinity, high lane is a.
ceil_m256d (avx)
Round f64 lanes towards positive infinity.
cmp_eq_i32_m128_s (sse)
Low lane equality.
cmp_eq_i32_m128d_s (sse2)
Low lane f64 equal to.
cmp_eq_mask_i8_m128i (sse2)
Lanewise a == b with lanes as i8.
cmp_eq_mask_i8_m256i (avx2)
Compare i8 lanes for equality, mask output.
cmp_eq_mask_i16_m128i (sse2)
Lanewise a == b with lanes as i16.
cmp_eq_mask_i16_m256i (avx2)
Compare i16 lanes for equality, mask output.
cmp_eq_mask_i32_m128i (sse2)
Lanewise a == b with lanes as i32.
cmp_eq_mask_i32_m256i (avx2)
Compare i32 lanes for equality, mask output.
cmp_eq_mask_i64_m128i (sse4.1)
Lanewise a == b with lanes as i64.
cmp_eq_mask_i64_m256i (avx2)
Compare i64 lanes for equality, mask output.
cmp_eq_mask_m128 (sse)
Lanewise a == b.
cmp_eq_mask_m128_s (sse)
Low lane a == b, other lanes unchanged.
cmp_eq_mask_m128d (sse2)
Lanewise a == b, mask output.
cmp_eq_mask_m128d_s (sse2)
Low lane a == b, other lanes unchanged.
cmp_ge_i32_m128_s (sse)
Low lane greater than or equal to.
cmp_ge_i32_m128d_s (sse2)
Low lane f64 greater than or equal to.
cmp_ge_mask_m128 (sse)
Lanewise a >= b.
cmp_ge_mask_m128_s (sse)
Low lane a >= b, other lanes unchanged.
cmp_ge_mask_m128d (sse2)
Lanewise a >= b.
cmp_ge_mask_m128d_s (sse2)
Low lane a >= b, other lanes unchanged.
cmp_gt_i32_m128_s (sse)
Low lane greater than.
cmp_gt_i32_m128d_s (sse2)
Low lane f64 greater than.
cmp_gt_mask_i8_m128i (sse2)
Lanewise a > b with lanes as i8.
cmp_gt_mask_i8_m256i (avx2)
Compare i8 lanes for a > b, mask output.
cmp_gt_mask_i16_m128i (sse2)
Lanewise a > b with lanes as i16.
cmp_gt_mask_i16_m256i (avx2)
Compare i16 lanes for a > b, mask output.
cmp_gt_mask_i32_m128i (sse2)
Lanewise a > b with lanes as i32.
cmp_gt_mask_i32_m256i (avx2)
Compare i32 lanes for a > b, mask output.
cmp_gt_mask_i64_m128i (sse4.2)
Lanewise a > b with lanes as i64.
cmp_gt_mask_i64_m256i (avx2)
Compare i64 lanes for a > b, mask output.
cmp_gt_mask_m128 (sse)
Lanewise a > b.
cmp_gt_mask_m128_s (sse)
Low lane a > b, other lanes unchanged.
cmp_gt_mask_m128d (sse2)
Lanewise a > b.
cmp_gt_mask_m128d_s (sse2)
Low lane a > b, other lanes unchanged.
cmp_le_i32_m128_s (sse)
Low lane less than or equal to.
cmp_le_i32_m128d_s (sse2)
Low lane f64 less than or equal to.
cmp_le_mask_m128 (sse)
Lanewise a <= b.
cmp_le_mask_m128_s (sse)
Low lane a <= b, other lanes unchanged.
cmp_le_mask_m128d (sse2)
Lanewise a <= b.
cmp_le_mask_m128d_s (sse2)
Low lane a <= b, other lanes unchanged.
cmp_lt_i32_m128_s (sse)
Low lane less than.
cmp_lt_i32_m128d_s (sse2)
Low lane f64 less than.
cmp_lt_mask_i8_m128i (sse2)
Lanewise a < b with lanes as i8.
cmp_lt_mask_i16_m128i (sse2)
Lanewise a < b with lanes as i16.
cmp_lt_mask_i32_m128i (sse2)
Lanewise a < b with lanes as i32.
cmp_lt_mask_m128 (sse)
Lanewise a < b.
cmp_lt_mask_m128_s (sse)
Low lane a < b, other lanes unchanged.
cmp_lt_mask_m128d (sse2)
Lanewise a < b.
cmp_lt_mask_m128d_s (sse2)
Low lane a < b, other lane unchanged.
cmp_neq_i32_m128_s (sse)
Low lane not equal to.
cmp_neq_i32_m128d_s (sse2)
Low lane f64 not equal to.
cmp_neq_mask_m128 (sse)
Lanewise a != b.
cmp_neq_mask_m128_s (sse)
Low lane a != b, other lanes unchanged.
cmp_neq_mask_m128d (sse2)
Lanewise a != b.
cmp_neq_mask_m128d_s (sse2)
Low lane a != b, other lane unchanged.
cmp_nge_mask_m128 (sse)
Lanewise !(a >= b).
cmp_nge_mask_m128_s (sse)
Low lane !(a >= b), other lanes unchanged.
cmp_nge_mask_m128d (sse2)
Lanewise !(a >= b).
cmp_nge_mask_m128d_s (sse2)
Low lane !(a >= b), other lane unchanged.
cmp_ngt_mask_m128 (sse)
Lanewise !(a > b).
cmp_ngt_mask_m128_s (sse)
Low lane !(a > b), other lanes unchanged.
cmp_ngt_mask_m128d (sse2)
Lanewise !(a > b).
cmp_ngt_mask_m128d_s (sse2)
Low lane !(a > b), other lane unchanged.
cmp_nle_mask_m128 (sse)
Lanewise !(a <= b).
cmp_nle_mask_m128_s (sse)
Low lane !(a <= b), other lanes unchanged.
cmp_nle_mask_m128d (sse2)
Lanewise !(a <= b).
cmp_nle_mask_m128d_s (sse2)
Low lane !(a <= b), other lane unchanged.
cmp_nlt_mask_m128 (sse)
Lanewise !(a < b).
cmp_nlt_mask_m128_s (sse)
Low lane !(a < b), other lanes unchanged.
cmp_nlt_mask_m128d (sse2)
Lanewise !(a < b).
cmp_nlt_mask_m128d_s (sse2)
Low lane !(a < b), other lane unchanged.
cmp_op_mask_m128 (avx)
Compare f32 lanes according to the operation specified, mask output.
cmp_op_mask_m256 (avx)
Compare f32 lanes according to the operation specified, mask output.
cmp_op_mask_m128_s (avx)
Compare f32 lanes according to the operation specified, mask output.
cmp_op_mask_m128d (avx)
Compare f64 lanes according to the operation specified, mask output.
cmp_op_mask_m128d_s (avx)
Compare f64 lanes according to the operation specified, mask output.
cmp_op_mask_m256d (avx)
Compare f64 lanes according to the operation specified, mask output.
cmp_ordered_mask_m128 (sse)
Lanewise (!a.is_nan()) & (!b.is_nan()).
cmp_ordered_mask_m128_s (sse)
Low lane (!a.is_nan()) & (!b.is_nan()), other lanes unchanged.
cmp_ordered_mask_m128d (sse2)
Lanewise (!a.is_nan()) & (!b.is_nan()).
cmp_ordered_mask_m128d_s (sse2)
Low lane (!a.is_nan()) & (!b.is_nan()), other lane unchanged.
cmp_unord_mask_m128 (sse)
Lanewise a.is_nan() | b.is_nan().
cmp_unord_mask_m128_s (sse)
Low lane a.is_nan() | b.is_nan(), other lanes unchanged.
cmp_unord_mask_m128d (sse2)
Lanewise a.is_nan() | b.is_nan().
cmp_unord_mask_m128d_s (sse2)
Low lane a.is_nan() | b.is_nan(), other lane unchanged.
combined_byte_shr_imm_m128i (ssse3)
Counts $a as the high bytes and $b as the low bytes then performs a byte shift to the right by the immediate value.
combined_byte_shr_imm_m256i (avx2)
Works like combined_byte_shr_imm_m128i, but twice as wide.
convert_i32_replace_m128_s (sse)
Convert i32 to f32 and replace the low lane of the input.
convert_i32_replace_m128d_s (sse2)
Convert i32 to f64 and replace the low lane of the input.
convert_i64_replace_m128_s (sse)
Convert i64 to f32 and replace the low lane of the input.
convert_i64_replace_m128d_s (sse2)
Convert i64 to f64 and replace the low lane of the input.
convert_m128_s_replace_m128d_s (sse2)
Converts the lower f32 to f64 and replaces the low lane of the input.
convert_m128d_s_replace_m128_s (sse2)
Converts the low f64 to f32 and replaces the low lane of the input.
convert_to_f32_from_m256_s (avx)
Convert the lowest f32 lane to a single f32.
convert_to_f64_from_m256d_s (avx)
Convert the lowest f64 lane to a single f64.
convert_to_i16_m128i_from_lower2_i16_m128i (sse4.1)
Convert the lower two i64 lanes to two i32 lanes.
convert_to_i16_m128i_from_lower8_i8_m128i (sse4.1)
Convert the lower eight i8 lanes to eight i16 lanes.
convert_to_i16_m256i_from_i8_m128i (avx2)
Convert i8 values to i16 values.
convert_to_i16_m256i_from_lower4_u8_m128i (avx2)
Convert lower 4 u8 values to i16 values.
convert_to_i16_m256i_from_lower8_u8_m128i (avx2)
Convert lower 8 u8 values to i16 values.
convert_to_i16_m256i_from_u8_m128i (avx2)
Convert u8 values to i16 values.
convert_to_i32_from_m256i_s (avx)
Convert the lowest i32 lane to a single i32.
convert_to_i32_m128i_from_lower4_i8_m128i (sse4.1)
Convert the lower four i8 lanes to four i32 lanes.
convert_to_i32_m128i_from_lower4_i16_m128i (sse4.1)
Convert the lower four i16 lanes to four i32 lanes.
convert_to_i32_m128i_from_m128 (sse2)
Rounds the f32 lanes to i32 lanes.
convert_to_i32_m128i_from_m128d (sse2)
Rounds the two f64 lanes to the low two i32 lanes.
convert_to_i32_m128i_from_m256d (avx)
Convert f64 lanes to be i32 lanes.
convert_to_i32_m256i_from_i16_m128i (avx2)
Convert i16 values to i32 values.
convert_to_i32_m256i_from_lower8_i8_m128i (avx2)
Convert the lower 8 i8 values to i32 values.
convert_to_i32_m256i_from_m256 (avx)
Convert f32 lanes to be i32 lanes.
convert_to_i32_m256i_from_u16_m128i (avx2)
Convert u16 values to i32 values.
convert_to_i64_m128i_from_lower2_i8_m128i (sse4.1)
Convert the lower two i8 lanes to two i64 lanes.
convert_to_i64_m128i_from_lower2_i32_m128i (sse4.1)
Convert the lower two i32 lanes to two i64 lanes.
convert_to_i64_m256i_from_i32_m128i (avx2)
Convert i32 values to i64 values.
convert_to_i64_m256i_from_lower4_i8_m128i (avx2)
Convert the lower 4 i8 values to i64 values.
convert_to_i64_m256i_from_lower4_i16_m128i (avx2)
Convert i16 values to i64 values.
convert_to_i64_m256i_from_lower4_u16_m128i (avx2)
Convert u16 values to i64 values.
convert_to_i64_m256i_from_u32_m128i (avx2)
Convert u32 values to i64 values.
convert_to_m128_from_i32_m128i (sse2)
Rounds the four i32 lanes to four f32 lanes.
convert_to_m128_from_m128d (sse2)
Rounds the two f64 lanes to the low two f32 lanes.
convert_to_m128_from_m256d (avx)
Convert f64 lanes to be f32 lanes.
convert_to_m128d_from_lower2_i32_m128i (sse2)
Rounds the lower two i32 lanes to two f64 lanes.
convert_to_m128d_from_lower2_m128 (sse2)
Converts the lower two f32 lanes to two f64 lanes.
convert_to_m256_from_i32_m256i (avx)
Convert i32 lanes to be f32 lanes.
convert_to_m256d_from_i32_m128i (avx)
Convert i32 lanes to be f64 lanes.
convert_to_m256d_from_m128 (avx)
Convert f32 lanes to be f64 lanes.
convert_to_u16_m128i_from_lower8_u8_m128i (sse4.1)
Convert the lower eight u8 lanes to eight u16 lanes.
convert_to_u32_m128i_from_lower4_u8_m128i (sse4.1)
Convert the lower four u8 lanes to four u32 lanes.
convert_to_u32_m128i_from_lower4_u16_m128i (sse4.1)
Convert the lower four u16 lanes to four u32 lanes.
convert_to_u64_m128i_from_lower2_u8_m128i (sse4.1)
Convert the lower two u8 lanes to two u64 lanes.
convert_to_u64_m128i_from_lower2_u16_m128i (sse4.1)
Convert the lower two u16 lanes to two u64 lanes.
convert_to_u64_m128i_from_lower2_u32_m128i (sse4.1)
Convert the lower two u32 lanes to two u64 lanes.
convert_truncate_to_i32_m128i_from_m256d (avx)
Convert f64 lanes to i32 lanes with truncation.
convert_truncate_to_i32_m256i_from_m256 (avx)
Convert f32 lanes to i32 lanes with truncation.
copy_i64_m128i_s (sse2)
Copy the low i64 lane to a new register, upper bits 0.
copy_replace_low_f64_m128d (sse2)
Copies the a value and replaces the low lane with the low b value.
crc32_u8 (sse4.2)
Accumulates the u8 into a running CRC32 value.
crc32_u16 (sse4.2)
Accumulates the u16 into a running CRC32 value.
crc32_u32 (sse4.2)
Accumulates the u32 into a running CRC32 value.
crc32_u64 (sse4.2)
Accumulates the u64 into a running CRC32 value.
div_m128 (sse)
Lanewise a / b.
div_m256 (avx)
Lanewise a / b with f32.
div_m128_s (sse)
Low lane a / b, other lanes unchanged.
div_m128d (sse2)
Lanewise a / b.
div_m128d_s (sse2)
Lowest lane a / b, high lane unchanged.
div_m256d (avx)
Lanewise a / b with f64.
dot_product_m128 (sse4.1)
Performs a dot product of two m128 registers.
dot_product_m256 (avx)
This works like dot_product_m128, but twice as wide.
dot_product_m128d (sse4.1)
Performs a dot product of two m128d registers.
duplicate_even_lanes_m128 (sse3)
Duplicate the even-indexed lanes to the odd lanes.
duplicate_even_lanes_m256 (avx)
Duplicate the even-indexed lanes to the odd lanes.
duplicate_low_lane_m128d_s (sse3)
Copy the low lane of the input to both lanes of the output.
duplicate_odd_lanes_m128 (sse3)
Duplicate the odd lanes to the even lanes.
duplicate_odd_lanes_m256 (avx)
Duplicate the odd-indexed lanes to the even lanes.
duplicate_odd_lanes_m256d (avx)
Duplicate the odd-indexed lanes to the even lanes.
extract_f32_as_i32_bits_imm_m128 (sse4.1)
Gets the f32 lane requested. Returns as an i32 bit pattern.
extract_i8_as_i32_imm_m128i (sse4.1)
Gets the i8 lane requested. Only the lowest 4 bits are considered.
extract_i8_as_i32_m256i (avx2)
Gets an i8 value out of an m256i, returns as i32.
extract_i16_as_i32_m128i (sse2)
Gets an i16 value out of an m128i, returns as i32.
extract_i16_as_i32_m256i (avx2)
Gets an i16 value out of an m256i, returns as i32.
extract_i32_from_m256i (avx)
Extracts an i32 lane from m256i.
extract_i32_imm_m128i (sse4.1)
Gets the i32 lane requested. Only the lowest 2 bits are considered.
extract_i64_from_m256i (avx)
Extracts an i64 lane from m256i.
extract_i64_imm_m128i (sse4.1)
Gets the i64 lane requested. Only the lowest bit is considered.
extract_m128_from_m256 (avx)
Extracts an m128 from m256.
extract_m128d_from_m256d (avx)
Extracts an m128d from m256d.
extract_m128i_from_m256i (avx)
Extracts an m128i from m256i.
extract_m128i_m256i (avx2)
Gets an m128i value out of an m256i.
floor_m128 (sse4.1)
Round each lane to a whole number, towards negative infinity.
floor_m256 (avx)
Round f32 lanes towards negative infinity.
floor_m128_s (sse4.1)
Round the low lane of b toward negative infinity, other lanes a.
floor_m128d (sse4.1)
Round each lane to a whole number, towards negative infinity.
floor_m128d_s (sse4.1)
Round the low lane of b toward negative infinity, high lane is a.
floor_m256d (avx)
Round f64 lanes towards negative infinity.
fused_mul_add_m128 (fma)
Lanewise fused (a * b) + c.
fused_mul_add_m256 (fma)
Lanewise fused (a * b) + c.
fused_mul_add_m128_s (fma)
Low lane fused (a * b) + c, other lanes unchanged.
fused_mul_add_m128d (fma)
Lanewise fused (a * b) + c.
fused_mul_add_m128d_s (fma)
Low lane fused (a * b) + c, other lanes unchanged.
fused_mul_add_m256d (fma)
Lanewise fused (a * b) + c.
fused_mul_addsub_m128 (fma)
Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes).
fused_mul_addsub_m256 (fma)
Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes).
fused_mul_addsub_m128d (fma)
Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes).
fused_mul_addsub_m256d (fma)
Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes).
fused_mul_neg_add_m128 (fma)
Lanewise fused -(a * b) + c.
fused_mul_neg_add_m256 (fma)
Lanewise fused -(a * b) + c.
fused_mul_neg_add_m128_s (fma)
Low lane fused -(a * b) + c, other lanes unchanged.
fused_mul_neg_add_m128d (fma)
Lanewise fused -(a * b) + c.
fused_mul_neg_add_m128d_s (fma)
Low lane fused -(a * b) + c, other lanes unchanged.
fused_mul_neg_add_m256d (fma)
Lanewise fused -(a * b) + c.
fused_mul_neg_sub_m128 (fma)
Lanewise fused -(a * b) - c.
fused_mul_neg_sub_m256 (fma)
Lanewise fused -(a * b) - c.
fused_mul_neg_sub_m128_s (fma)
Low lane fused -(a * b) - c, other lanes unchanged.
fused_mul_neg_sub_m128d (fma)
Lanewise fused -(a * b) - c.
fused_mul_neg_sub_m128d_s (fma)
Low lane fused -(a * b) - c, other lanes unchanged.
fused_mul_neg_sub_m256d (fma)
Lanewise fused -(a * b) - c.
fused_mul_sub_m128 (fma)
Lanewise fused (a * b) - c.
fused_mul_sub_m256 (fma)
Lanewise fused (a * b) - c.
fused_mul_sub_m128_s (fma)
Low lane fused (a * b) - c, other lanes unchanged.
fused_mul_sub_m128d (fma)
Lanewise fused (a * b) - c.
fused_mul_sub_m128d_s (fma)
Low lane fused (a * b) - c, other lanes unchanged.
fused_mul_sub_m256d (fma)
Lanewise fused (a * b) - c.
fused_mul_subadd_m128 (fma)
Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes).
fused_mul_subadd_m256 (fma)
Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes).
fused_mul_subadd_m128d (fma)
Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes).
fused_mul_subadd_m256d (fma)
Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes).
get_f32_from_m128_s (sse)
Gets the low lane as an individual f32 value.
get_f64_from_m128d_s (sse2)
Gets the lower lane as an f64 value.
get_i32_from_m128_s (sse)
Converts the low lane to i32 and extracts as an individual value.
get_i32_from_m128d_s (sse2)
Converts the lower lane to an i32 value.
get_i32_from_m128i_s (sse2)
Converts the lower lane to an i32 value.
get_i64_from_m128_s (sse)
Converts the low lane to i64 and extracts as an individual value.
get_i64_from_m128d_s (sse2)
Converts the lower lane to an i64 value.
get_i64_from_m128i_s (sse2)
Converts the lower lane to an i64 value.
insert_f32_imm_m128 (sse4.1)
Inserts a lane from $b into $a, optionally at a new position.
insert_i8_imm_m128i (sse4.1)
Inserts a new value for the i8 lane specified.
insert_i8_to_m256i (avx)
Inserts an i8 to m256i.
insert_i16_from_i32_m128i (sse2)
Inserts the low 16 bits of an i32 value into an m128i.
insert_i16_to_m256i (avx)
Inserts an i16 to m256i.
insert_i32_imm_m128i (sse4.1)
Inserts a new value for the i32 lane specified.
insert_i32_to_m256i (avx)
Inserts an i32 to m256i.
insert_i64_imm_m128i (sse4.1)
Inserts a new value for the i64 lane specified.
insert_i64_to_m256i (avx)
Inserts an i64 to m256i.
insert_m128_to_m256 (avx)
Inserts an m128 to m256.
insert_m128d_to_m256d (avx)
Inserts an m128d to m256d.
insert_m128i_to_m256i (avx2)
Inserts an m128i to an m256i at the high or low position.
insert_m128i_to_m256i_slow_avx (avx)
Slowly inserts an m128i to m256i.
leading_zero_count_u32lzcnt
Count the leading zeroes in a u32.
leading_zero_count_u64lzcnt
Count the leading zeroes in a u64.
load_f32_m128_ssse
Loads the f32 reference into the low lane of the register.
load_f32_splat_m128sse
Loads the f32 reference into all lanes of a register.
load_f32_splat_m256avx
Load an f32 and splat it to all lanes of an m256d
load_f64_m128d_ssse2
Loads the reference into the low lane of the register.
load_f64_splat_m128dsse2
Loads the f64 reference into all lanes of a register.
load_f64_splat_m256davx
Load an f64 and splat it to all lanes of an m256d
load_i64_m128i_ssse2
Loads the low i64 into a register.
load_m128sse
Loads the reference into a register.
load_m256avx
Load data from memory into a register.
load_m128_splat_m256avx
Load an m128 and splat it to the lower and upper half of an m256.
load_m128dsse2
Loads the reference into a register.
load_m128d_splat_m256davx
Load an m128d and splat it to the lower and upper half of an m256d.
load_m128isse2
Loads the reference into a register.
load_m256davx
Load data from memory into a register.
load_m256iavx
Load data from memory into a register.
load_masked_i32_m128iavx2
Loads the reference given and zeroes any i32 lanes not in the mask.
load_masked_i32_m256iavx2
Loads the reference given and zeroes any i32 lanes not in the mask.
load_masked_i64_m128iavx2
Loads the reference given and zeroes any i64 lanes not in the mask.
load_masked_i64_m256iavx2
Loads the reference given and zeroes any i64 lanes not in the mask.
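"In the mask" means the sign (high) bit of the matching mask lane is set, per Intel's description of the masked-load instructions. A scalar sketch of the four-lane i32 case; the function is a hypothetical model for illustration, not the crate's implementation:

```rust
/// Hypothetical scalar model of a masked i32 load over four lanes.
/// A lane is read from `src` when the sign bit of the matching `mask`
/// lane is set; otherwise that output lane is zero.
pub fn masked_load_i32x4_model(src: &[i32; 4], mask: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        // A negative i32 is exactly "sign bit set".
        if mask[i] < 0 {
            out[i] = src[i];
        }
    }
    out
}
```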
load_masked_m128avx
Load data from memory into a register according to a mask.
load_masked_m256avx
Load data from memory into a register according to a mask.
load_masked_m128davx
Load data from memory into a register according to a mask.
load_masked_m256davx
Load data from memory into a register according to a mask.
load_replace_high_m128dsse2
Loads the reference into a register, replacing the high lane.
load_replace_low_m128dsse2
Loads the reference into a register, replacing the low lane.
load_reverse_m128sse
Loads the reference into a register with reversed order.
load_reverse_m128dsse2
Loads the reference into a register with reversed order.
load_unaligned_hi_lo_m256avx
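Loads the high and low 128-bit halves of a register from two separate unaligned references.
load_unaligned_hi_lo_m256davx
Loads the high and low 128-bit halves of a register from two separate unaligned references.
load_unaligned_hi_lo_m256iavx
Loads the high and low 128-bit halves of a register from two separate unaligned references.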
load_unaligned_m128sse
Loads the reference into a register.
load_unaligned_m256avx
Load data from memory into a register.
load_unaligned_m128dsse2
Loads the reference into a register.
load_unaligned_m128isse2
Loads the reference into a register.
load_unaligned_m256davx
Load data from memory into a register.
load_unaligned_m256iavx
Load data from memory into a register.
max_i8_m128isse4.1
Lanewise max(a, b) with lanes as i8.
max_i8_m256iavx2
Lanewise max(a, b) with lanes as i8.
max_i16_m128isse2
Lanewise max(a, b) with lanes as i16.
max_i16_m256iavx2
Lanewise max(a, b) with lanes as i16.
max_i32_m128isse4.1
Lanewise max(a, b) with lanes as i32.
max_i32_m256iavx2
Lanewise max(a, b) with lanes as i32.
max_m128sse
Lanewise max(a, b).
max_m256avx
Lanewise max(a, b).
max_m128_ssse
Low lane max(a, b), other lanes unchanged.
max_m128dsse2
Lanewise max(a, b).
max_m128d_ssse2
Low lane max(a, b), other lanes unchanged.
max_m256davx
Lanewise max(a, b).
max_u8_m128isse2
Lanewise max(a, b) with lanes as u8.
max_u8_m256iavx2
Lanewise max(a, b) with lanes as u8.
max_u16_m128isse4.1
Lanewise max(a, b) with lanes as u16.
max_u16_m256iavx2
Lanewise max(a, b) with lanes as u16.
max_u32_m128isse4.1
Lanewise max(a, b) with lanes as u32.
max_u32_m256iavx2
Lanewise max(a, b) with lanes as u32.
min_i8_m128isse4.1
Lanewise min(a, b) with lanes as i8.
min_i8_m256iavx2
Lanewise min(a, b) with lanes as i8.
min_i16_m128isse2
Lanewise min(a, b) with lanes as i16.
min_i16_m256iavx2
Lanewise min(a, b) with lanes as i16.
min_i32_m128isse4.1
Lanewise min(a, b) with lanes as i32.
min_i32_m256iavx2
Lanewise min(a, b) with lanes as i32.
min_m128sse
Lanewise min(a, b).
min_m256avx
Lanewise min(a, b).
min_m128_ssse
Low lane min(a, b), other lanes unchanged.
min_m128dsse2
Lanewise min(a, b).
min_m128d_ssse2
Low lane min(a, b), other lanes unchanged.
min_m256davx
Lanewise min(a, b).
min_position_u16_m128isse4.1
Finds the minimum u16 lane: the value goes in lane 0, its index in lane 1, and the other lanes are zeroed.
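A scalar sketch of what `min_position_u16_m128i` computes, based on Intel's description of the underlying `phminposuw` instruction; the function name is hypothetical and lanes are modeled as a `[u16; 8]` with index 0 as the lowest lane:

```rust
/// Hypothetical scalar model of `min_position_u16_m128i`.
/// Output lane 0 holds the smallest u16, lane 1 holds the index of its
/// first occurrence, and the remaining six lanes are zero.
pub fn min_position_u16_model(a: [u16; 8]) -> [u16; 8] {
    let mut min_val = a[0];
    let mut min_idx = 0u16;
    for (i, &v) in a.iter().enumerate().skip(1) {
        // Strict `<` keeps the lowest index when values tie.
        if v < min_val {
            min_val = v;
            min_idx = i as u16;
        }
    }
    let mut out = [0u16; 8];
    out[0] = min_val;
    out[1] = min_idx;
    out
}
```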
min_u8_m128isse2
Lanewise min(a, b) with lanes as u8.
min_u8_m256iavx2
Lanewise min(a, b) with lanes as u8.
min_u16_m128isse4.1
Lanewise min(a, b) with lanes as u16.
min_u16_m256iavx2
Lanewise min(a, b) with lanes as u16.
min_u32_m128isse4.1
Lanewise min(a, b) with lanes as u32.
min_u32_m256iavx2
Lanewise min(a, b) with lanes as u32.
move_high_low_m128sse
Move the high lanes of b to the low lanes of a, other lanes unchanged.
move_low_high_m128sse
Move the low lanes of b to the high lanes of a, other lanes unchanged.
move_m128_ssse
Move the low lane of b to a, other lanes unchanged.
move_mask_i8_m128isse2
Gathers the i8 sign bit of each lane.
move_mask_i8_m256iavx2
Create an i32 mask of each sign bit in the i8 lanes.
move_mask_m128sse
Gathers the sign bit of each lane.
move_mask_m256avx
Collects the sign bit of each lane into an 8-bit value.
move_mask_m128dsse2
Gathers the sign bit of each lane.
move_mask_m256davx
Collects the sign bit of each lane into a 4-bit value.
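The move-mask family packs one sign bit per lane into an integer, with lane 0 in bit 0, so an m256 (eight f32 lanes) yields 8 bits and an m256d (four f64 lanes) yields 4. A scalar sketch of the m256 case; the function is a hypothetical model, and it returns `i32` because the underlying intrinsic does:

```rust
/// Hypothetical scalar model of `move_mask_m256`: bit i of the result is
/// the sign bit of f32 lane i. Note that -0.0 has its sign bit set.
pub fn move_mask_f32x8_model(lanes: [f32; 8]) -> i32 {
    let mut mask = 0i32;
    for (i, v) in lanes.iter().enumerate() {
        if v.is_sign_negative() {
            mask |= 1 << i;
        }
    }
    mask
}
```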
mul_32_m128isse4.1
Lanewise a * b with 32-bit lanes.
mul_extended_u32bmi2
Multiply two u32, outputting the low bits and storing the high bits in the reference.
mul_extended_u64bmi2
Multiply two u64, outputting the low bits and storing the high bits in the reference.
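The "extended" multiplies return the low half of the double-width product and write the high half through a reference. A portable sketch of the u32 case, widening to u64; the function name is hypothetical and the `&mut u32` out-parameter only mirrors the description above:

```rust
/// Hypothetical scalar model of `mul_extended_u32`: the full 64-bit
/// product is split into a returned low half and a high half written
/// through `hi`.
pub fn mul_extended_u32_model(a: u32, b: u32, hi: &mut u32) -> u32 {
    let wide = u64::from(a) * u64::from(b);
    *hi = (wide >> 32) as u32;
    wide as u32
}
```

The u64 version works the same way with `u128` as the wide type.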
mul_i16_horizontal_add_m128isse2
Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
mul_i16_horizontal_add_m256iavx2
Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
mul_i16_keep_high_m128isse2
Lanewise a * b with lanes as i16, keep the high bits of the i32 intermediates.
mul_i16_keep_high_m256iavx2
Multiply the i16 lanes and keep the high half of each 32-bit output.
mul_i16_keep_low_m128isse2
Lanewise a * b with lanes as i16, keep the low bits of the i32 intermediates.
mul_i16_keep_low_m256iavx2
Multiply the i16 lanes and keep the low half of each 32-bit output.
mul_i16_scale_round_m128issse3
Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
mul_i16_scale_round_m256iavx2
Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
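The scale-and-round steps above amount to a Q15 fixed-point multiply with rounding. A scalar sketch of one lane, following Intel's description of `pmulhrsw`; the function name is hypothetical:

```rust
/// Hypothetical scalar model of one lane of `mul_i16_scale_round_m128i`.
pub fn mul_i16_scale_round_lane(a: i16, b: i16) -> i16 {
    let product = i32::from(a) * i32::from(b);
    // Keep the high 18 bits, add 1 to round, then shift right by 1.
    let rounded = ((product >> 14) + 1) >> 1;
    // Truncate to 16 bits; -32768 * -32768 wraps back to -32768.
    rounded as i16
}
```

For example, 0.5 in Q15 is 16384, and 0.5 * 0.5 gives 8192 (0.25 in Q15).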
mul_i32_keep_low_m256iavx2
Multiply the i32 lanes and keep the low half of each 64-bit output.
mul_i64_carryless_m128ipclmulqdq
Performs a “carryless” multiplication of two i64 values.
mul_i64_low_bits_m256iavx2
Multiply the lower i32 within each i64 lane, i64 output.
mul_m128sse
Lanewise a * b.
mul_m256avx
Lanewise a * b with f32 lanes.
mul_m128_ssse
Low lane a * b, other lanes unchanged.
mul_m128dsse2
Lanewise a * b.
mul_m128d_ssse2
Lowest lane a * b, high lane unchanged.
mul_m256davx
Lanewise a * b with f64 lanes.
mul_u8i8_add_horizontal_saturating_m128issse3
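Multiplies the u8 lanes of a by the i8 lanes of b, then adds adjacent pairs of the i16 products with saturation.
mul_u8i8_add_horizontal_saturating_m256iavx2
Multiplies the u8 lanes of a by the i8 lanes of b, then adds adjacent pairs of the i16 products with saturation.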
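A scalar sketch of the 128-bit variant, based on Intel's description of `pmaddubsw`: the two inputs are read with different signedness, and only the pairwise sum saturates. The function name is hypothetical:

```rust
/// Hypothetical scalar model of `mul_u8i8_add_horizontal_saturating_m128i`.
/// Bytes of `a` are unsigned, bytes of `b` are signed; each adjacent pair
/// of products is summed and saturated to i16.
pub fn maddubs_model(a: [u8; 16], b: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        let lo = i32::from(a[2 * i]) * i32::from(b[2 * i]);
        let hi = i32::from(a[2 * i + 1]) * i32::from(b[2 * i + 1]);
        let sum = lo + hi;
        out[i] = sum.clamp(i32::from(i16::MIN), i32::from(i16::MAX)) as i16;
    }
    out
}
```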
mul_u16_keep_high_m128isse2
Lanewise a * b with lanes as u16, keep the high bits of the u32 intermediates.
mul_u16_keep_high_m256iavx2
Multiply the u16 lanes and keep the high half of each 32-bit output.
mul_u64_low_bits_m256iavx2
Multiply the lower u32 within each u64 lane, u64 output.
mul_widen_i32_odd_m128isse4.1
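Multiplies the odd i32 lanes (lanes 0 and 2, the low half of each 64-bit lane) and gives the widened (i64) results.
mul_widen_u32_odd_m128isse2
Multiplies the odd u32 lanes (lanes 0 and 2, the low half of each 64-bit lane) and gives the widened (u64) results.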
multi_packed_sum_abs_diff_u8_m128isse4.1
Computes eight u16 “sum of absolute difference” values according to the bytes selected.
multi_packed_sum_abs_diff_u8_m256iavx2
Computes eight u16 “sum of absolute difference” values according to the bytes selected.
pack_i16_to_i8_m128isse2
Saturating convert i16 to i8, and pack the values.
pack_i16_to_i8_m256iavx2
Saturating convert i16 to i8, and pack the values.
pack_i16_to_u8_m128isse2
Saturating convert i16 to u8, and pack the values.
pack_i16_to_u8_m256iavx2
Saturating convert i16 to u8, and pack the values.
pack_i32_to_i16_m128isse2
Saturating convert i32 to i16, and pack the values.
pack_i32_to_i16_m256iavx2
Saturating convert i32 to i16, and pack the values.
pack_i32_to_u16_m128isse4.1
Saturating convert i32 to u16, and pack the values.
pack_i32_to_u16_m256iavx2
Saturating convert i32 to u16, and pack the values.
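In the packing operations, each input lane is clamped to the output type's range, and the results from a fill the low half of the output, followed by those from b. A scalar sketch of the i16 to u8 case for 128 bits (`packuswb`); the function name is hypothetical:

```rust
/// Hypothetical scalar model of `pack_i16_to_u8_m128i`.
/// Output bytes 0..8 come from `a`, bytes 8..16 from `b`, each clamped to 0..=255.
pub fn pack_i16_to_u8_model(a: [i16; 8], b: [i16; 8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[i] = a[i].clamp(0, 255) as u8;
        out[i + 8] = b[i].clamp(0, 255) as u8;
    }
    out
}
```

The m256i versions apply this within each 128-bit half rather than across the whole register.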
permute2z_m256avx
Shuffle 128 bits of floating point data at a time from a and b using an immediate control value.
permute2z_m256davx
Shuffle 128 bits of floating point data at a time from a and b using an immediate control value.
permute2z_m256iavx
Slowly swizzle 128 bits of integer data from a and b using an immediate control value.
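The `permute2z` control byte holds two 4-bit selectors, one per output half. Per Intel's description of `vperm2f128`, the low two bits of each selector choose a source half (a low, a high, b low, b high), and bit 3 zeroes that half instead. A scalar sketch with each 256-bit value modeled as two halves, index 0 being the low half; the function name is hypothetical:

```rust
/// Hypothetical scalar model of the `permute2z_m256` control value.
pub fn permute2z_model(a: [[f32; 4]; 2], b: [[f32; 4]; 2], imm: u8) -> [[f32; 4]; 2] {
    let pick = |sel: u8| -> [f32; 4] {
        // Bit 3 of a selector zeroes the half (the "z" in the name).
        if sel & 0b1000 != 0 {
            return [0.0; 4];
        }
        match sel & 0b11 {
            0 => a[0],
            1 => a[1],
            2 => b[0],
            _ => b[1],
        }
    };
    // Low nibble picks the low output half, high nibble the high output half.
    [pick(imm & 0x0F), pick(imm >> 4)]
}
```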
permute_m128avx
Shuffle the f32 lanes from a using an immediate control value.
permute_m256avx
Shuffle the f32 lanes in a using an immediate control value.
permute_m128davx
Shuffle the f64 lanes in a using an immediate control value.
permute_m256davx
Shuffle the f64 lanes from a together using an immediate control value.
population_count_i32popcnt
Count the number of bits set within an i32.
population_count_i64popcnt
Count the number of bits set within an i64.
population_deposit_u32bmi2
Deposit contiguous low bits from a u32 according to a mask.
population_deposit_u64bmi2
Deposit contiguous low bits from a u64 according to a mask.
population_extract_u32bmi2
Extract bits from a u32 according to a mask.
population_extract_u64bmi2
Extract bits from a u64 according to a mask.
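Deposit (`pdep`) and extract (`pext`) are inverses over the bits selected by the mask. A scalar sketch of the u32 versions that walks the set bits of the mask from lowest to highest; both function names are hypothetical:

```rust
/// Hypothetical scalar model of `population_deposit_u32`: the low bits
/// of `a` are scattered, in order, to the positions of the set bits in `mask`.
pub fn pdep_u32_model(a: u32, mask: u32) -> u32 {
    let mut out = 0u32;
    let mut m = mask;
    let mut k = 0u32; // next source bit of `a`
    while m != 0 {
        let bit = m & m.wrapping_neg(); // lowest set bit of the remaining mask
        if (a >> k) & 1 != 0 {
            out |= bit;
        }
        k += 1;
        m &= m - 1; // clear that mask bit
    }
    out
}

/// Hypothetical scalar model of `population_extract_u32`: the bits of `a`
/// selected by `mask` are gathered, in order, into the low bits.
pub fn pext_u32_model(a: u32, mask: u32) -> u32 {
    let mut out = 0u32;
    let mut m = mask;
    let mut k = 0u32; // next destination bit
    while m != 0 {
        let bit = m & m.wrapping_neg();
        if a & bit != 0 {
            out |= 1 << k;
        }
        k += 1;
        m &= m - 1;
    }
    out
}
```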
prefetch_et0sse
Fetches the cache line containing addr into all levels of the cache hierarchy, anticipating write.
prefetch_et1sse
Fetches into L2 and higher, anticipating write.
prefetch_ntasse
Fetch data using the non-temporal access (NTA) hint. It may be a place closer than main memory but outside of the cache hierarchy. This is used to reduce access latency without polluting the cache.
prefetch_t0sse
Fetches the cache line containing addr into all levels of the cache hierarchy.
prefetch_t1sse
Fetches into L2 and higher.
prefetch_t2sse
Fetches into L3 and higher or an implementation-specific choice (e.g., L2 if there is no L3).
rdrand_u16rdrand
Try to obtain a random u16 from the hardware RNG.
rdrand_u32rdrand
Try to obtain a random u32 from the hardware RNG.
rdrand_u64rdrand
Try to obtain a random u64 from the hardware RNG.
rdseed_u16rdseed
Try to obtain a random u16 from the hardware RNG.
rdseed_u32rdseed
Try to obtain a random u32 from the hardware RNG.
rdseed_u64rdseed
Try to obtain a random u64 from the hardware RNG.
read_timestamp_counter
Reads the CPU’s timestamp counter value.
read_timestamp_counter_p
Reads the CPU’s timestamp counter value and stores the processor signature.
reciprocal_m128sse
Lanewise 1.0 / a approximation.
reciprocal_m256avx
Lanewise 1.0 / a approximation with f32 lanes.
reciprocal_m128_ssse
Low lane 1.0 / a approximation, other lanes unchanged.
reciprocal_sqrt_m128sse
Lanewise 1.0 / sqrt(a) approximation.
reciprocal_sqrt_m256avx
Lanewise 1.0 / sqrt(a) approximation with f32 lanes.
reciprocal_sqrt_m128_ssse
Low lane 1.0 / sqrt(a) approximation, other lanes unchanged.
round_m128sse4.1
Rounds each lane in the style specified.
round_m256avx
Rounds each lane in the style specified.
round_m128_ssse4.1
Rounds the low lane of b as specified, other lanes come from a.
round_m128dsse4.1
Rounds each lane in the style specified.
round_m128d_ssse4.1
Rounds the low lane of b as specified, keeps the high lane of a.
round_m256davx
Rounds each lane in the style specified.
search_explicit_str_for_indexsse4.2
Search for needle in haystack, with explicit string length, returning an index.
search_explicit_str_for_masksse4.2
Search for needle in haystack, with explicit string length, returning a mask.
search_implicit_str_for_indexsse4.2
Search for needle in haystack, with implicit string length, returning an index.
search_implicit_str_for_masksse4.2
Search for needle in haystack, with implicit string length, returning a mask.
set_i8_m128isse2
Sets the args into an m128i, first arg is the high lane.
set_i8_m256iavx
Sets the i8 args into an m256i, first arg is the high lane.
set_i16_m128isse2
Sets the args into an m128i, first arg is the high lane.
set_i16_m256iavx
Sets the i16 args into an m256i, first arg is the high lane.
set_i32_m128isse2
Sets the args into an m128i, first arg is the high lane.
set_i32_m128i_ssse2
Sets an i32 as the low 32-bit lane of an m128i, other lanes zeroed.
set_i32_m256iavx
Sets the i32 args into an m256i, first arg is the high lane.
set_i64_m128isse2
Sets the args into an m128i, first arg is the high lane.
set_i64_m128i_ssse2
Sets an i64 as the low 64-bit lane of an m128i, high lane zeroed.
set_i64_m256iavx
Sets the i64 args into an m256i, first arg is the high lane.
set_m128sse
Sets the args into an m128, first arg is the high lane.
set_m256avx
Sets the f32 args into an m256, first arg is the high lane.
set_m128_m256avx
Sets the m128 args into an m256, first arg is the high half.
set_m128_ssse
Sets the arg into the low lane of an m128, other lanes zeroed.
set_m128dsse2
Sets the args into an m128d, first arg is the high lane.
set_m128d_m256davx
Sets the m128d args into an m256d, first arg is the high half.
set_m128d_ssse2
Sets the arg into the low lane of an m128d, high lane zeroed.
set_m128i_m256iavx
Sets the m128i args into an m256i, first arg is the high half.
set_m256davx
Sets the f64 args into an m256d, first arg is the high lane.
set_reversed_i8_m128isse2
Sets the args into an m128i, first arg is the low lane.
set_reversed_i8_m256iavx
Sets the i8 args into an m256i, first arg is the low lane.
set_reversed_i16_m128isse2
Sets the args into an m128i, first arg is the low lane.
set_reversed_i16_m256iavx
Sets the i16 args into an m256i, first arg is the low lane.
set_reversed_i32_m128isse2
Sets the args into an m128i, first arg is the low lane.
set_reversed_i32_m256iavx
Sets the i32 args into an m256i, first arg is the low lane.
set_reversed_i64_m256iavx
Sets the i64 args into an m256i, first arg is the low lane.
set_reversed_m128sse
Sets the args into an m128, first arg is the low lane.
set_reversed_m256avx
Sets the f32 args into an m256, first arg is the low lane.
set_reversed_m128_m256avx
Sets the m128 args into an m256, first arg is the low half.
set_reversed_m128dsse2
Sets the args into an m128d, first arg is the low lane.
set_reversed_m128d_m256davx
Sets the m128d args into an m256d, first arg is the low half.
set_reversed_m128i_m256iavx
Sets the m128i args into an m256i, first arg is the low half.
set_reversed_m256davx
Sets the f64 args into an m256d, first arg is the low lane.
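The only difference between the `set_*` and `set_reversed_*` families is argument order. A sketch of the resulting lane layout for four i32 lanes, modeled as arrays where index 0 is the lowest lane (the same order the register has in memory); both functions are hypothetical illustrations, not crate APIs:

```rust
/// Hypothetical lane-order model of `set_i32_m128i(a, b, c, d)`:
/// the first argument lands in the highest lane.
pub fn set_i32_lanes(a: i32, b: i32, c: i32, d: i32) -> [i32; 4] {
    [d, c, b, a]
}

/// Hypothetical lane-order model of `set_reversed_i32_m128i(a, b, c, d)`:
/// the first argument lands in the lowest lane, matching array order.
pub fn set_reversed_i32_lanes(a: i32, b: i32, c: i32, d: i32) -> [i32; 4] {
    [a, b, c, d]
}
```

The reversed form is usually the less surprising one when the values come from a slice or array.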
set_splat_i8_m128isse2
Splats the i8 to all lanes of the m128i.
set_splat_i8_m128i_s_m256iavx2
Sets the lowest i8 lane of an m128i as all lanes of an m256i.
set_splat_i8_m256iavx
Splats an i8 arg to all lanes of an m256i.
set_splat_i16_m128isse2
Splats the i16 to all lanes of the m128i.
set_splat_i16_m128i_s_m256iavx2
Sets the lowest i16 lane of an m128i as all lanes of an m256i.
set_splat_i16_m256iavx
Splats an i16 arg to all lanes of an m256i.
set_splat_i32_m128isse2
Splats the i32 to all lanes of the m128i.
set_splat_i32_m128i_s_m256iavx2
Sets the lowest i32 lane of an m128i as all lanes of an m256i.
set_splat_i32_m256iavx
Splats an i32 arg to all lanes of an m256i.
set_splat_i64_m128isse2
Splats the i64 to both lanes of the m128i.
set_splat_i64_m128i_s_m256iavx2
Sets the lowest i64 lane of an m128i as all lanes of an m256i.
set_splat_i64_m256iavx
Splats an i64 arg to all lanes of an m256i.
set_splat_m128sse
Splats the value to all lanes.
set_splat_m256avx
Splats an f32 arg to all lanes of an m256.
set_splat_m128_s_m256avx2
Sets the lowest lane of an m128 as all lanes of an m256.
set_splat_m128dsse2
Splats the arg into both lanes of the m128d.
set_splat_m128d_s_m256davx2
Sets the lowest lane of an m128d as all lanes of an m256d.
set_splat_m256davx
Splats an f64 arg to all lanes of an m256d.
shl_all_u16_m128isse2
Shift all u16 lanes to the left by the count in the lower u64 lane.
shl_all_u16_m256iavx2
Lanewise u16 shift left by the lower u64 lane of count.
shl_all_u32_m128isse2
Shift all u32 lanes to the left by the count in the lower u64 lane.
shl_all_u32_m256iavx2
Shift all u32 lanes left by the lower u64 lane of count.
shl_all_u64_m128isse2
Shift all u64 lanes to the left by the count in the lower u64 lane.
shl_all_u64_m256iavx2
Shift all u64 lanes left by the lower u64 lane of count.
shl_each_u32_m128iavx2
Shift u32 values to the left by count bits.
shl_each_u32_m256iavx2
Lanewise u32 shift left by the matching i32 lane in count.
shl_each_u64_m128iavx2
Shift u64 values to the left by count bits.
shl_each_u64_m256iavx2
Lanewise u64 shift left by the matching u64 lane in count.
shl_imm_u16_m128isse2
Shifts all u16 lanes left by an immediate.
shl_imm_u16_m256iavx2
Shifts all u16 lanes left by an immediate.
shl_imm_u32_m128isse2
Shifts all u32 lanes left by an immediate.
shl_imm_u32_m256iavx2
Shifts all u32 lanes left by an immediate.
shl_imm_u64_m128isse2
Shifts both u64 lanes left by an immediate.
shl_imm_u64_m256iavx2
Shifts all u64 lanes left by an immediate.
shr_all_i16_m128isse2
Shift each i16 lane to the right by the count in the lower i64 lane.
shr_all_i16_m256iavx2
Lanewise i16 shift right by the lower i64 lane of count.
shr_all_i32_m128isse2
Shift each i32 lane to the right by the count in the lower i64 lane.
shr_all_i32_m256iavx2
Lanewise i32 shift right by the lower i64 lane of count.
shr_all_u16_m128isse2
Shift each u16 lane to the right by the count in the lower u64 lane.
shr_all_u16_m256iavx2
Lanewise u16 shift right by the lower u64 lane of count.
shr_all_u32_m128isse2
Shift each u32 lane to the right by the count in the lower u64 lane.
shr_all_u32_m256iavx2
Lanewise u32 shift right by the lower u64 lane of count.
shr_all_u64_m128isse2
Shift each u64 lane to the right by the count in the lower u64 lane.
shr_all_u64_m256iavx2
Lanewise u64 shift right by the lower u64 lane of count.
shr_each_i32_m128iavx2
Shift i32 values to the right by count bits.
shr_each_i32_m256iavx2
Lanewise i32 shift right by the matching i32 lane in count.
shr_each_u32_m128iavx2
Shift u32 values to the right by count bits.
shr_each_u32_m256iavx2
Lanewise u32 shift right by the matching u32 lane in count.
shr_each_u64_m128iavx2
Shift u64 values to the right by count bits.
shr_each_u64_m256iavx2
Lanewise u64 shift right by the matching i64 lane in count.
shr_imm_i16_m128isse2
Shifts all i16 lanes right by an immediate.
shr_imm_i16_m256iavx2
Shifts all i16 lanes right by an immediate.
shr_imm_i32_m128isse2
Shifts all i32 lanes right by an immediate.
shr_imm_i32_m256iavx2
Shifts all i32 lanes right by an immediate.
shr_imm_u16_m128isse2
Shifts all u16 lanes right by an immediate.
shr_imm_u16_m256iavx2
Shifts all u16 lanes right by an immediate.
shr_imm_u32_m128isse2
Shifts all u32 lanes right by an immediate.
shr_imm_u32_m256iavx2
Shifts all u32 lanes right by an immediate.
shr_imm_u64_m128isse2
Shifts both u64 lanes right by an immediate.
shr_imm_u64_m256iavx2
Shifts all u64 lanes right by an immediate.
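Right shifts on signed lanes are arithmetic (sign-filling), while those on unsigned lanes are logical (zero-filling). Per Intel's description of `psraw`/`psrlw`, counts of 16 or more sign-fill an i16 lane and clear a u16 lane instead of being masked as in scalar code. A scalar sketch of one lane of each; the function names are hypothetical:

```rust
/// Hypothetical model of one lane of `shr_imm_i16_m128i` (arithmetic).
pub fn shr_imm_i16_lane(x: i16, count: u32) -> i16 {
    // Any count of 15 or more leaves only copies of the sign bit.
    x >> count.min(15)
}

/// Hypothetical model of one lane of `shr_imm_u16_m128i` (logical).
pub fn shr_imm_u16_lane(x: u16, count: u32) -> u16 {
    // Oversized counts clear the lane entirely.
    if count >= 16 { 0 } else { x >> count }
}
```

The same bit pattern 0xFFF8 is -8 as an i16, so it shifts to -2 one way and to 0x3FFE the other.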
shuffle_abi_f32_all_m128sse
Shuffle the f32 lanes from a and b together using an immediate control value.
shuffle_abi_f64_all_m128dsse2
Shuffle the f64 lanes from a and b together using an immediate control value.
shuffle_abi_i128z_all_m256iavx2
Shuffle 128 bits of integer data from a and b using an immediate control value.
shuffle_ai_f32_all_m128isse2
Shuffle the i32 lanes in a using an immediate control value.
shuffle_ai_f64_all_m256davx2
Shuffle the f64 lanes from a using an immediate control value.
shuffle_ai_i16_h64all_m128isse2
Shuffle the high i16 lanes in a using an immediate control value.
shuffle_ai_i16_h64half_m256iavx2
Shuffle the high i16 lanes in a using an immediate control value.
shuffle_ai_i16_l64all_m128isse2
Shuffle the low i16 lanes in a using an immediate control value.
shuffle_ai_i16_l64half_m256iavx2
Shuffle the low i16 lanes in a using an immediate control value.
shuffle_ai_i32_half_m256iavx2
Shuffle the i32 lanes in a using an immediate control value.
shuffle_ai_i64_all_m256iavx2
Shuffle the i64 lanes in a using an immediate control value.
shuffle_av_f32_all_m128avx
Shuffle f32 values in a using i32 values in v.
shuffle_av_f32_half_m256avx
Shuffle f32 values in a using i32 values in v.
shuffle_av_f64_all_m128davx
Shuffle f64 lanes in a using bit 1 of the i64 lanes in v.
shuffle_av_f64_half_m256davx
Shuffle f64 lanes in a using bit 1 of the i64 lanes in v.
shuffle_av_i8z_all_m128issse3
Shuffle i8 lanes in a using i8 values in v.
shuffle_av_i8z_half_m256iavx2
Shuffle i8 lanes in a using i8 values in v.
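In the byte shuffle, each control byte in v either selects a byte of a by its low 4 bits or, when its high bit is set, zeroes the output byte (the "z" in the name). A scalar sketch of the 128-bit variant, following Intel's description of `pshufb`; the function name is hypothetical:

```rust
/// Hypothetical scalar model of `shuffle_av_i8z_all_m128i`.
pub fn pshufb_model(a: [i8; 16], v: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        let sel = v[i] as u8;
        // High bit set: zero this byte. Otherwise pick a[low 4 bits].
        out[i] = if sel & 0x80 != 0 { 0 } else { a[(sel & 0x0F) as usize] };
    }
    out
}
```

The m256i version ("half") applies this independently within each 128-bit half, so a control byte can never pull data across halves.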
shuffle_av_i32_all_m256avx2
Shuffle f32 lanes in a using i32 values in v.
shuffle_av_i32_all_m256iavx2
Shuffle i32 lanes in a using i32 values in v.
shuffle_m256avx
Shuffle the f32 lanes from a and b together using an immediate control value.
shuffle_m256davx
Shuffle the f64 lanes from a and b together using an immediate control value.
sign_apply_i8_m128issse3
Applies the sign of i8 values in b to the values in a.
sign_apply_i8_m256iavx2
Lanewise a * signum(b) with lanes as i8.
sign_apply_i16_m128issse3
Applies the sign of i16 values in b to the values in a.
sign_apply_i16_m256iavx2
Lanewise a * signum(b) with lanes as i16.
sign_apply_i32_m128issse3
Applies the sign of i32 values in b to the values in a.
sign_apply_i32_m256iavx2
Lanewise a * signum(b) with lanes as i32.
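The "a * signum(b)" description has one edge case worth spelling out: negating the minimum value wraps instead of saturating, per Intel's description of `psignw`. A scalar sketch of one i16 lane; the function name is hypothetical:

```rust
/// Hypothetical scalar model of one lane of `sign_apply_i16_m128i`:
/// negate `a` when `b` is negative, zero it when `b` is zero, else keep it.
pub fn sign_apply_i16_lane(a: i16, b: i16) -> i16 {
    match b.signum() {
        -1 => a.wrapping_neg(), // i16::MIN stays i16::MIN
        0 => 0,
        _ => a,
    }
}
```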
splat_i8_m128i_s_m128iavx2
Splat the lowest 8-bit lane across the entire 128 bits.
splat_i16_m128i_s_m128iavx2
Splat the lowest 16-bit lane across the entire 128 bits.
splat_i32_m128i_s_m128iavx2
Splat the lowest 32-bit lane across the entire 128 bits.
splat_i64_m128i_s_m128iavx2
Splat the lowest 64-bit lane across the entire 128 bits.
splat_m128_s_m128avx2
Splat the lowest f32 across all four lanes.
splat_m128d_s_m128davx2
Splat the lower f64 across both lanes of m128d.
splat_m128i_m256iavx2
Splat the 128 bits across all 256 bits.
sqrt_m128sse
Lanewise sqrt(a).
sqrt_m256avx
Lanewise sqrt on f32 lanes.
sqrt_m128_ssse
Low lane sqrt(a), other lanes unchanged.
sqrt_m128dsse2
Lanewise sqrt(a).
sqrt_m128d_ssse2
Low lane sqrt(b), upper lane is unchanged from a.
sqrt_m256davx
Lanewise sqrt on f64 lanes.
store_high_m128d_ssse2
Stores the high lane value to the reference given.
store_i64_m128i_ssse2
Stores the low i64 lane to the reference given.
store_m128sse
Stores the value to the reference given.
store_m256avx
Store data from a register into memory.
store_m128_ssse
Stores the low lane value to the reference given.
store_m128dsse2
Stores the value to the reference given.
store_m128d_ssse2
Stores the low lane value to the reference given.
store_m128isse2
Stores the value to the reference given.
store_m256davx
Store data from a register into memory.
store_m256iavx
Store data from a register into memory.
store_masked_i32_m128iavx2
Stores the i32 masked lanes given to the reference.
store_masked_i32_m256iavx2
Stores the i32 masked lanes given to the reference.
store_masked_i64_m128iavx2
Stores the i64 masked lanes given to the reference.
store_masked_i64_m256iavx2
Stores the i64 masked lanes given to the reference.
store_masked_m128avx
Store data from a register into memory according to a mask.
store_masked_m256avx
Store data from a register into memory according to a mask.
store_masked_m128davx
Store data from a register into memory according to a mask.
store_masked_m256davx
Store data from a register into memory according to a mask.
store_reverse_m128sse
Stores the value to the reference given in reverse order.
store_reversed_m128dsse2
Stores the value to the reference given in reverse order.
store_splat_m128sse
Stores the low lane value to all lanes of the reference given.
store_splat_m128dsse2
Stores the low lane value to all lanes of the reference given.
store_unaligned_hi_lo_m256avx
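Stores the high and low 128-bit halves of a register to two separate unaligned references.
store_unaligned_hi_lo_m256davx
Stores the high and low 128-bit halves of a register to two separate unaligned references.
store_unaligned_hi_lo_m256iavx
Stores the high and low 128-bit halves of a register to two separate unaligned references.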
store_unaligned_m128sse
Stores the value to the reference given.
store_unaligned_m256avx
Store data from a register into memory.
store_unaligned_m128dsse2
Stores the value to the reference given.
store_unaligned_m128isse2
Stores the value to the reference given.
store_unaligned_m256davx
Store data from a register into memory.
store_unaligned_m256iavx
Store data from a register into memory.
sub_horizontal_i16_m128issse3
Subtract horizontal pairs of i16 values, pack the outputs as a then b.
sub_horizontal_i16_m256iavx2
Horizontal a - b with lanes as i16.
sub_horizontal_i32_m128issse3
Subtract horizontal pairs of i32 values, pack the outputs as a then b.
sub_horizontal_i32_m256iavx2
Horizontal a - b with lanes as i32.
sub_horizontal_m128sse3
Subtract each lane horizontally, pack the outputs as a then b.
sub_horizontal_m256avx
Subtract adjacent f32 lanes.
sub_horizontal_m128dsse3
Subtract each lane horizontally, pack the outputs as a then b.
sub_horizontal_m256davx
Subtract adjacent f64 lanes.
sub_horizontal_saturating_i16_m128issse3
Subtract horizontal pairs of i16 values, saturating, pack the outputs as a then b.
sub_horizontal_saturating_i16_m256iavx2
Horizontal saturating a - b with lanes as i16.
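"Pack the outputs as a then b" means the pairwise differences from a fill the low lanes and those from b fill the high lanes, and each pair is computed as even lane minus odd lane. A scalar sketch of the i32 128-bit variant, following Intel's description of `phsubd`; the function name is hypothetical:

```rust
/// Hypothetical scalar model of `sub_horizontal_i32_m128i`.
/// The non-saturating form wraps on overflow.
pub fn sub_horizontal_i32_model(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    [
        a[0].wrapping_sub(a[1]),
        a[2].wrapping_sub(a[3]),
        b[0].wrapping_sub(b[1]),
        b[2].wrapping_sub(b[3]),
    ]
}
```

The m256i versions do this within each 128-bit half.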
sub_i8_m128isse2
Lanewise a - b with lanes as i8.
sub_i8_m256iavx2
Lanewise a - b with lanes as i8.
sub_i16_m128isse2
Lanewise a - b with lanes as i16.
sub_i16_m256iavx2
Lanewise a - b with lanes as i16.
sub_i32_m128isse2
Lanewise a - b with lanes as i32.
sub_i32_m256iavx2
Lanewise a - b with lanes as i32.
sub_i64_m128isse2
Lanewise a - b with lanes as i64.
sub_i64_m256iavx2
Lanewise a - b with lanes as i64.
sub_m128sse
Lanewise a - b.
sub_m256avx
Lanewise a - b with f32 lanes.
sub_m128_ssse
Low lane a - b, other lanes unchanged.
sub_m128dsse2
Lanewise a - b.
sub_m128d_ssse2
Lowest lane a - b, high lane unchanged.
sub_m256davx
Lanewise a - b with f64 lanes.
sub_saturating_i8_m128isse2
Lanewise saturating a - b with lanes as i8.
sub_saturating_i8_m256iavx2
Lanewise saturating a - b with lanes as i8.
sub_saturating_i16_m128isse2
Lanewise saturating a - b with lanes as i16.
sub_saturating_i16_m256iavx2
Lanewise saturating a - b with lanes as i16.
sub_saturating_u8_m128isse2
Lanewise saturating a - b with lanes as u8.
sub_saturating_u8_m256iavx2
Lanewise saturating a - b with lanes as u8.
sub_saturating_u16_m128isse2
Lanewise saturating a - b with lanes as u16.
sub_saturating_u16_m256iavx2
Lanewise saturating a - b with lanes as u16.
sum_of_u8_abs_diff_m128isse2
Compute “sum of u8 absolute differences”.
sum_of_u8_abs_diff_m256iavx2
Compute “sum of u8 absolute differences”.
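The "sum of u8 absolute differences" is computed per group of eight bytes, and each group's total lands in the matching 64-bit lane. A scalar sketch of the 128-bit variant, following Intel's description of `psadbw`; the function name is hypothetical:

```rust
/// Hypothetical scalar model of `sum_of_u8_abs_diff_m128i`: bytes 0..8
/// produce output lane 0 and bytes 8..16 produce output lane 1.
pub fn sad_u8_model(a: [u8; 16], b: [u8; 16]) -> [u64; 2] {
    let mut out = [0u64; 2];
    for (i, (x, y)) in a.iter().zip(b.iter()).enumerate() {
        out[i / 8] += u64::from(x.abs_diff(*y));
    }
    out
}
```

Each sum is at most 8 * 255 = 2040, so it always fits in the low 16 bits of its lane.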
test_all_ones_m128isse4.1
Tests if all bits are 1.
test_all_zeroes_m128isse4.1
Returns if all masked bits are 0, (a & mask) as u128 == 0
test_mixed_ones_and_zeroes_m128isse4.1
Returns if, among the masked bits, there’s both 0s and 1s
testc_m128avx
Computes (NOT a) AND b on the sign bit of each f32 lane, returns 1 if the result is zero, otherwise 0.
testc_m256avx
Computes (NOT a) AND b on the sign bit of each f32 lane, returns 1 if the result is zero, otherwise 0.
testc_m128davx
Computes (NOT a) AND b on the sign bit of each f64 lane, returns 1 if the result is zero, otherwise 0.
testc_m128isse4.1
Computes (NOT a) AND b over all 128 bits, returns 1 if the result is zero, otherwise 0.
testc_m256davx
Computes (NOT a) AND b on the sign bit of each f64 lane, returns 1 if the result is zero, otherwise 0.
testc_m256iavx
Computes (NOT a) AND b over all 256 bits, returns 1 if the result is zero, otherwise 0.
testz_m128avx
Computes a AND b on the sign bit of each f32 lane, returns 1 if the result is zero, otherwise 0.
testz_m256avx
Computes a AND b on the sign bit of each f32 lane, returns 1 if the result is zero, otherwise 0.
testz_m128davx
Computes a AND b on the sign bit of each f64 lane, returns 1 if the result is zero, otherwise 0.
testz_m128isse4.1
Computes a AND b over all 128 bits, returns 1 if the result is zero, otherwise 0.
testz_m256davx
Computes a AND b on the sign bit of each f64 lane, returns 1 if the result is zero, otherwise 0.
testz_m256iavx
Computes a AND b over all 256 bits, returns 1 if the result is zero, otherwise 0.
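For the integer forms, `testz` asks "do a and b share no set bits?" and `testc` asks "is every bit set in b also set in a?". A scalar sketch treating a 128-bit register as a `u128`; the function names are hypothetical, and the floating-point forms would apply the same logic to the lane sign bits only:

```rust
/// Hypothetical model of `testz_m128i`: 1 when `a & b` is all zero.
pub fn testz_u128(a: u128, b: u128) -> i32 {
    ((a & b) == 0) as i32
}

/// Hypothetical model of `testc_m128i`: 1 when `!a & b` is all zero,
/// i.e. every bit set in `b` is also set in `a`.
pub fn testc_u128(a: u128, b: u128) -> i32 {
    ((!a & b) == 0) as i32
}
```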
trailing_zero_count_u32bmi1
Counts the number of trailing zero bits in a u32.
trailing_zero_count_u64bmi1
Counts the number of trailing zero bits in a u64.
transpose_four_m128sse
Transpose four m128 as if they were a 4x4 matrix.
truncate_m128_to_m128isse2
Truncate the f32 lanes to i32 lanes.
truncate_m128d_to_m128isse2
Truncate the f64 lanes to the lower i32 lanes (upper i32 lanes 0).
truncate_to_i32_m128d_ssse2
Truncate the lower lane into an i32.
truncate_to_i64_m128d_ssse2
Truncate the lower lane into an i64.
unpack_hi_m256avx
Unpack and interleave the high lanes.
unpack_hi_m256davx
Unpack and interleave the high lanes.
unpack_high_i8_m128isse2
Unpack and interleave high i8 lanes of a and b.
unpack_high_i8_m256iavx2
Unpack and interleave high i8 lanes of a and b.
unpack_high_i16_m128isse2
Unpack and interleave high i16 lanes of a and b.
unpack_high_i16_m256iavx2
Unpack and interleave high i16 lanes of a and b.
unpack_high_i32_m128isse2
Unpack and interleave high i32 lanes of a and b.
unpack_high_i32_m256iavx2
Unpack and interleave high i32 lanes of a and b.
unpack_high_i64_m128isse2
Unpack and interleave high i64 lanes of a and b.
unpack_high_i64_m256iavx2
Unpack and interleave high i64 lanes of a and b.
unpack_high_m128sse
Unpack and interleave high lanes of a and b.
unpack_high_m128dsse2
Unpack and interleave high lanes of a and b.
unpack_lo_m256avx
Unpack and interleave the low lanes.
unpack_lo_m256davx
Unpack and interleave the low lanes.
unpack_low_i8_m128isse2
Unpack and interleave low i8 lanes of a and b.
unpack_low_i8_m256iavx2
Unpack and interleave low i8 lanes of a and b.
unpack_low_i16_m128isse2
Unpack and interleave low i16 lanes of a and b.
unpack_low_i16_m256iavx2
Unpack and interleave low i16 lanes of a and b.
unpack_low_i32_m128isse2
Unpack and interleave low i32 lanes of a and b.
unpack_low_i32_m256iavx2
Unpack and interleave low i32 lanes of a and b.
unpack_low_i64_m128isse2
Unpack and interleave low i64 lanes of a and b.
unpack_low_i64_m256iavx2
Unpack and interleave low i64 lanes of a and b.
unpack_low_m128sse
Unpack and interleave low lanes of a and b.
unpack_low_m128dsse2
Unpack and interleave low lanes of a and b.
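Unpacking interleaves lanes from a and b, with a's lane first in each pair. The "low" forms take the lower half of each input, and the "high" forms the upper half. A scalar sketch of the i32 128-bit variants; the function names are hypothetical:

```rust
/// Hypothetical model of `unpack_low_i32_m128i`.
pub fn unpack_low_i32_model(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    [a[0], b[0], a[1], b[1]]
}

/// Hypothetical model of `unpack_high_i32_m128i`.
pub fn unpack_high_i32_model(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    [a[2], b[2], a[3], b[3]]
}
```

The 256-bit versions interleave within each 128-bit half rather than across the full register.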
zero_extend_m128avx
Zero extend an m128 to an m256.
zero_extend_m128davx
Zero extend an m128d to an m256d.
zero_extend_m128iavx
Zero extend an m128i to an m256i.
zeroed_m128sse
All lanes zero.
zeroed_m256avx
A zeroed m256.
zeroed_m128dsse2
Both lanes zero.
zeroed_m128isse2
All lanes zero.
zeroed_m256davx
A zeroed m256d.
zeroed_m256iavx
A zeroed m256i.