Crate safe_arch


A crate that safely exposes arch intrinsics via #[cfg()].

safe_arch lets you safely use CPU intrinsics: the things in the core::arch modules. It works purely via #[cfg()] and compile-time CPU feature declaration. If you want to check for a feature at runtime and then call an intrinsic or use a fallback path based on that, then this crate is sadly not for you.

SIMD register types are “newtype’d” so that better trait impls can be given to them, but the inner value is a pub field so feel free to just grab it out if you need to. Trait impls of the newtypes include: Default (zeroed), From/Into of appropriate data types, and appropriate operator overloading.

- Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
- Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
  - Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
- Some intrinsics are not safe unless you’re very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren’t exposed here.
- Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn’t really support you messing with that within a high level language, so those operations aren’t exposed here. Use assembly or something if you want to do that.
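
To make the reference-taking pattern a bit more concrete, here is a minimal sketch written directly against core::arch rather than against this crate. It is not how safe_arch is implemented; Align16 and load_and_double are hypothetical names made up for the illustration, and the scalar fallback is there only so the snippet builds on any target.

```rust
/// Hypothetical 16-byte-aligned wrapper: the alignment is carried by the type,
/// so any `&Align16` is both valid and suitably aligned for an aligned SSE load.
#[repr(C, align(16))]
pub struct Align16(pub [f32; 4]);

/// Loads four `f32` lanes through a reference and adds them to themselves.
/// This version only exists when `sse` is enabled at compile time.
#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
pub fn load_and_double(a: &Align16) -> [f32; 4] {
    use core::arch::x86_64::{_mm_add_ps, _mm_load_ps, _mm_storeu_ps};
    let mut out = [0.0_f32; 4];
    // SAFETY: `sse` is statically enabled (see the cfg above), the reference
    // guarantees 16 readable bytes, and `Align16` guarantees the 16-byte
    // alignment that `_mm_load_ps` requires. The store is the unaligned form.
    unsafe {
        let v = _mm_load_ps(a.0.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(v, v));
    }
    out
}

/// Scalar fallback so the sketch still builds where `sse` is not enabled.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
pub fn load_and_double(a: &Align16) -> [f32; 4] {
    [a.0[0] * 2.0, a.0[1] * 2.0, a.0[2] * 2.0, a.0[3] * 2.0]
}
```

The caller never writes unsafe: the type of the reference carries the alignment and validity requirements that the raw pointer version leaves to the programmer.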

§Naming Conventions

The safe_arch crate does not simply use the “official” names for each intrinsic, because the official names are generally poor. Instead, the operations have been given better names that hopefully make things easier to understand when you’re reading the code.

For a full explanation of the naming used, see the Naming Conventions page.

§Current Support

x86 / x86_64 (Intel, AMD, etc)
- 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
- 256-bit: avx, avx2
- Other: adx, aes, bmi1, bmi2, fma, lzcnt, pclmulqdq, popcnt, rdrand, rdseed

§Compile Time CPU Target Features

At the time of me writing this, Rust enables the sse and sse2 CPU features by default for all i686 (x86) and x86_64 builds. Those CPU features are built into the design of x86_64, and you’d need a super old x86 CPU for it to not support at least sse and sse2, so they’re a safe bet for the language to enable all the time. In fact, because the standard library is compiled with them enabled, simply trying to disable those features would actually cause ABI issues and fill your program with UB (link).

If you want additional CPU features available at compile time you’ll have to enable them with an additional arg to rustc. For a feature named name you pass -C target-feature=+name, such as -C target-feature=+sse3 for sse3.

You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you’re building a program you’ll only run on your own system.
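
If you want to confirm which features a given set of flags actually turned on, one option is to ask cfg! from inside the program. The helper below is a hypothetical sketch (compiled_in_features is not part of this crate) and only looks at a handful of feature names.

```rust
/// Returns the names of a few x86 target features that were enabled when this
/// code was compiled. Every `cfg!` call is resolved at compile time to a
/// constant `true` or `false`; nothing here inspects the running CPU.
pub fn compiled_in_features() -> Vec<&'static str> {
    let candidates = [
        ("sse", cfg!(target_feature = "sse")),
        ("sse2", cfg!(target_feature = "sse2")),
        ("sse3", cfg!(target_feature = "sse3")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("sse4.1", cfg!(target_feature = "sse4.1")),
        ("sse4.2", cfg!(target_feature = "sse4.2")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
        ("fma", cfg!(target_feature = "fma")),
    ];
    candidates
        .iter()
        .filter(|(_, enabled)| *enabled)
        .map(|(name, _)| *name)
        .collect()
}
```

On a default x86_64 build the result should include sse and sse2; adding -C target-feature=+sse3 should add sse3 to it.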

It’s sometimes hard to know if your target platform will support a given feature set, but the Steam Hardware Survey is generally taken as a guide to what you can expect people to have available. If you click “Other Settings” it’ll expand into a list of CPU target features and how common they are. These days, it seems that sse3 can be safely assumed, and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff above 128-bit isn’t as common yet, give it another few years.

Please note that executing a program on a CPU that doesn’t support the target features it was compiled for is Undefined Behavior.

Currently, Rust doesn’t actually support an easy way for you to check that a feature enabled at compile time is actually available at runtime. There is the “feature_detected” family of macros, but if you enable a feature they will evaluate to a constant true instead of actually deferring the check for the feature to runtime. This means that, if you did want a check at the start of your program, to confirm that all the assumed features are present and error out when the assumptions don’t hold, you can’t use that macro. You gotta use CPUID and check manually. rip. Hopefully we can make that process easier in a future version of this crate.
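
As a rough idea of what the manual check can look like, here is a sketch that reads CPUID leaf 1 through core::arch and compares it against what was enabled at compile time. Leaf1, query_leaf1, and first_missing_feature are hypothetical names, not part of this crate. Only a few SSE-level bits are covered; avx and above are left out on purpose because they also need an OS support check that this sketch doesn’t attempt.

```rust
/// A few feature bits from CPUID leaf 1 (bit positions as given in the Intel
/// and AMD manuals: EDX bit 26, ECX bits 0, 9, 19, 20).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Leaf1 {
    pub sse2: bool,
    pub sse3: bool,
    pub ssse3: bool,
    pub sse4_1: bool,
    pub sse4_2: bool,
}

/// Asks the CPU directly which of the features above it reports.
#[cfg(target_arch = "x86_64")]
pub fn query_leaf1() -> Option<Leaf1> {
    use core::arch::x86_64::__cpuid;
    // The `allow` is only there in case a toolchain declares `__cpuid` as a safe fn.
    #[allow(unused_unsafe)]
    let r = unsafe { __cpuid(1) };
    let bit = |reg: u32, n: u32| (reg >> n) & 1 == 1;
    Some(Leaf1 {
        sse2: bit(r.edx, 26),
        sse3: bit(r.ecx, 0),
        ssse3: bit(r.ecx, 9),
        sse4_1: bit(r.ecx, 19),
        sse4_2: bit(r.ecx, 20),
    })
}

/// There is no CPUID to query on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn query_leaf1() -> Option<Leaf1> {
    None
}

/// Returns the first feature that was enabled at compile time but that the
/// running CPU does not report, or `None` if nothing is missing.
pub fn first_missing_feature() -> Option<&'static str> {
    let cpu = query_leaf1()?;
    let checks = [
        ("sse2", cfg!(target_feature = "sse2"), cpu.sse2),
        ("sse3", cfg!(target_feature = "sse3"), cpu.sse3),
        ("ssse3", cfg!(target_feature = "ssse3"), cpu.ssse3),
        ("sse4.1", cfg!(target_feature = "sse4.1"), cpu.sse4_1),
        ("sse4.2", cfg!(target_feature = "sse4.2"), cpu.sse4_2),
    ];
    checks
        .iter()
        .find(|(_, compiled, present)| *compiled && !*present)
        .map(|(name, _, _)| *name)
}
```

A program could call first_missing_feature at the top of main and exit with an error message if it returns Some. Keep in mind that the check itself has to run before any code that relies on the assumed features.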

§A Note On Working With Cfg

There are two main ways to use cfg:

- Via an attribute placed on an item, block, or expression:
  - #[cfg(debug_assertions)] println!("hello");
- Via a macro used within an expression position:
  - if cfg!(debug_assertions) { println!("hello"); }

The difference might seem small but it’s actually very important:

The attribute form includes or excludes the code before the compiler checks whether the items it names really exist. This means that code configured via attribute can safely name things that don’t always exist, as long as those things do exist whenever that code is configured into the build.
The macro form always includes the configured code; the macro then resolves to a constant true or false, and the compiler uses dead code elimination to cut out the path not taken.

This crate uses cfg via the attribute, so the functions it exposes don’t exist at all when the appropriate CPU target features aren’t enabled. Accordingly, if you plan to call this crate or not depending on what features are enabled in the build you’ll also need to control your use of this crate via cfg attribute, not cfg macro.
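
Here is a small sketch of the difference, written against core::arch directly so that it stands alone. The names add_i32x4 and sse2_enabled_at_compile_time are hypothetical and exist only for this illustration.

```rust
/// Attribute form: this definition only exists when `sse2` is enabled at
/// compile time, so it may freely name SSE2 intrinsics.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
pub fn add_i32x4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use core::arch::x86_64::{__m128i, _mm_add_epi32, _mm_loadu_si128, _mm_storeu_si128};
    let mut out = [0_i32; 4];
    // SAFETY: `sse2` is statically enabled, and the unaligned load/store
    // intrinsics only need 16 valid bytes, which each array provides.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_add_epi32(va, vb));
    }
    out
}

/// Attribute form again: the replacement that exists only when `sse2` is not
/// enabled. Exactly one of the two definitions survives in any given build.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
pub fn add_i32x4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    [
        a[0].wrapping_add(b[0]),
        a[1].wrapping_add(b[1]),
        a[2].wrapping_add(b[2]),
        a[3].wrapping_add(b[3]),
    ]
}

/// Macro form: this is just a constant `bool`. Code in both arms of an
/// `if cfg!(...)` is still compiled, so an arm that named an item which was
/// removed by a cfg attribute would be a compile error.
pub fn sse2_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "sse2")
}
```

Callers of add_i32x4 don’t need any cfg of their own here because both builds define the name. With this crate there is no fallback definition, which is why your call sites need the attribute too.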

Modules§

naming_conventions: An explanation of the crate’s naming conventions.

Macros§

cmp_op (avx): Turns a comparison operator token to the correct constant value.
round_op (avx): Turns a round operator token to the correct constant value.

Structs§

m128: The data for a 128-bit SSE register of four f32 lanes.
m256: The data for a 256-bit AVX register of eight f32 lanes.
m512: The data for a 512-bit AVX-512 register of sixteen f32 lanes.
m128d: The data for a 128-bit SSE register of two f64 values.
m128i: The data for a 128-bit SSE register of integer data.
m256d: The data for a 256-bit AVX register of four f64 values.
m256i: The data for a 256-bit AVX register of integer data.
m512d: The data for a 512-bit AVX-512 register of eight f64 values.
m512i: The data for a 512-bit AVX-512 register of integer data.

Constants§

STR_CMP_BIT_MASK: Return the bitwise mask of matches.
STR_CMP_EQ_ANY: Matches when any haystack character equals any needle character, regardless of position.
STR_CMP_EQ_EACH: Matches when a character position in the needle is equal to the character at the same position in the haystack.
STR_CMP_EQ_ORDERED: Matches when the complete needle string is a substring somewhere in the haystack.
STR_CMP_FIRST_MATCH: Return the index of the first match found.
STR_CMP_I8: String segment elements are i8 values.
STR_CMP_I16: String segment elements are i16 values.
STR_CMP_LAST_MATCH: Return the index of the last match found.
STR_CMP_RANGES: Interprets consecutive pairs of characters in the needle as (low..=high) ranges to compare each haystack character to.
STR_CMP_U8: String segment elements are u8 values.
STR_CMP_U16: String segment elements are u16 values.
STR_CMP_UNIT_MASK: Return the lanewise mask of matches.

Functions§

abs_i8_m128i (ssse3): Lanewise absolute value with lanes as i8.
abs_i8_m256i (avx2): Absolute value of i8 lanes.
abs_i16_m128i (ssse3): Lanewise absolute value with lanes as i16.
abs_i16_m256i (avx2): Absolute value of i16 lanes.
abs_i32_m128i (ssse3): Lanewise absolute value with lanes as i32.
abs_i32_m256i (avx2): Absolute value of i32 lanes.
add_carry_u32 (adx): Add two u32 with a carry value.
add_carry_u64 (adx): Add two u64 with a carry value.
add_horizontal_i16_m128i (ssse3): Add horizontal pairs of i16 values, pack the outputs as a then b.
add_horizontal_i16_m256i (avx2): Horizontal a + b with lanes as i16.
add_horizontal_i32_m128i (ssse3): Add horizontal pairs of i32 values, pack the outputs as a then b.
add_horizontal_i32_m256i (avx2): Horizontal a + b with lanes as i32.
add_horizontal_m128 (sse3): Add each lane horizontally, pack the outputs as a then b.
add_horizontal_m256 (avx): Add adjacent f32 lanes.
add_horizontal_m128d (sse3): Add each lane horizontally, pack the outputs as a then b.
add_horizontal_m256d (avx): Add adjacent f64 lanes.
add_horizontal_saturating_i16_m128i (ssse3): Add horizontal pairs of i16 values, saturating, pack the outputs as a then b.
add_horizontal_saturating_i16_m256i (avx2): Horizontal saturating a + b with lanes as i16.
add_i8_m128i (sse2): Lanewise a + b with lanes as i8.
add_i8_m256i (avx2): Lanewise a + b with lanes as i8.
add_i16_m128i (sse2): Lanewise a + b with lanes as i16.
add_i16_m256i (avx2): Lanewise a + b with lanes as i16.
add_i32_m128i (sse2): Lanewise a + b with lanes as i32.
add_i32_m256i (avx2): Lanewise a + b with lanes as i32.
add_i64_m128i (sse2): Lanewise a + b with lanes as i64.
add_i64_m256i (avx2): Lanewise a + b with lanes as i64.
add_m128 (sse): Lanewise a + b.
add_m256 (avx): Lanewise a + b with f32 lanes.
add_m128_s (sse): Low lane a + b, other lanes unchanged.
add_m128d (sse2): Lanewise a + b.
add_m128d_s (sse2): Lowest lane a + b, high lane unchanged.
add_m256d (avx): Lanewise a + b with f64 lanes.
add_saturating_i8_m128i (sse2): Lanewise saturating a + b with lanes as i8.
add_saturating_i8_m256i (avx2): Lanewise saturating a + b with lanes as i8.
add_saturating_i16_m128i (sse2): Lanewise saturating a + b with lanes as i16.
add_saturating_i16_m256i (avx2): Lanewise saturating a + b with lanes as i16.
add_saturating_u8_m128i (sse2): Lanewise saturating a + b with lanes as u8.
add_saturating_u8_m256i (avx2): Lanewise saturating a + b with lanes as u8.
add_saturating_u16_m128i (sse2): Lanewise saturating a + b with lanes as u16.
add_saturating_u16_m256i (avx2): Lanewise saturating a + b with lanes as u16.
addsub_m128 (sse3): Alternately, from the top, add a lane and then subtract a lane.
addsub_m256 (avx): Alternately, from the top, add f32 then sub f32.
addsub_m128d (sse3): Add the high lane and subtract the low lane.
addsub_m256d (avx): Alternately, from the top, add f64 then sub f64.
aes_decrypt_last_m128i (aes): Perform the last round of an AES decryption flow on a using the round_key.
aes_decrypt_m128i (aes): Perform one round of an AES decryption flow on a using the round_key.
aes_encrypt_last_m128i (aes): Perform the last round of an AES encryption flow on a using the round_key.
aes_encrypt_m128i (aes): Perform one round of an AES encryption flow on a using the round_key.
aes_inv_mix_columns_m128i (aes): Perform the InvMixColumns transform on a.
aes_key_gen_assist_m128i (aes): Assist in expanding an AES cipher key.
average_u8_m128i (sse2): Lanewise average of the u8 values.
average_u8_m256i (avx2): Average u8 lanes.
average_u16_m128i (sse2): Lanewise average of the u16 values.
average_u16_m256i (avx2): Average u16 lanes.
bit_extract2_u32 (bmi1): Extract a span of bits from the u32, control value style.
bit_extract2_u64 (bmi1): Extract a span of bits from the u64, control value style.
bit_extract_u32 (bmi1): Extract a span of bits from the u32, start and len style.
bit_extract_u64 (bmi1): Extract a span of bits from the u64, start and len style.
bit_lowest_set_mask_u32 (bmi1): Gets the mask of all bits up to and including the lowest set bit in a u32.
bit_lowest_set_mask_u64 (bmi1): Gets the mask of all bits up to and including the lowest set bit in a u64.
bit_lowest_set_reset_u32 (bmi1): Resets (clears) the lowest set bit.
bit_lowest_set_reset_u64 (bmi1): Resets (clears) the lowest set bit.
bit_lowest_set_value_u32 (bmi1): Gets the value of the lowest set bit in a u32.
bit_lowest_set_value_u64 (bmi1): Gets the value of the lowest set bit in a u64.
bit_zero_high_index_u32 (bmi2): Zero out all high bits in a u32 starting at the index given.
bit_zero_high_index_u64 (bmi2): Zero out all high bits in a u64 starting at the index given.
bitand_m128 (sse): Bitwise a & b.
bitand_m256 (avx): Bitwise a & b.
bitand_m128d (sse2): Bitwise a & b.
bitand_m128i (sse2): Bitwise a & b.
bitand_m256d (avx): Bitwise a & b.
bitand_m256i (avx2): Bitwise a & b.
bitandnot_m128 (sse): Bitwise (!a) & b.
bitandnot_m256 (avx): Bitwise (!a) & b.
bitandnot_m128d (sse2): Bitwise (!a) & b.
bitandnot_m128i (sse2): Bitwise (!a) & b.
bitandnot_m256d (avx): Bitwise (!a) & b.
bitandnot_m256i (avx2): Bitwise (!a) & b.
bitandnot_u32 (bmi1): Bitwise (!a) & b for u32.
bitandnot_u64 (bmi1): Bitwise (!a) & b for u64.
bitor_m128 (sse): Bitwise a | b.
bitor_m256 (avx): Bitwise a | b.
bitor_m128d (sse2): Bitwise a | b.
bitor_m128i (sse2): Bitwise a | b.
bitor_m256d (avx): Bitwise a | b.
bitor_m256i (avx2): Bitwise a | b.
bitxor_m128 (sse): Bitwise a ^ b.
bitxor_m256 (avx): Bitwise a ^ b.
bitxor_m128d (sse2): Bitwise a ^ b.
bitxor_m128i (sse2): Bitwise a ^ b.
bitxor_m256d (avx): Bitwise a ^ b.
bitxor_m256i (avx2): Bitwise a ^ b.
blend_imm_i16_m128i (sse4.1): Blends the i16 lanes according to the immediate mask.
blend_imm_i16_m256i (avx2): Blends the i16 lanes according to the immediate value.
blend_imm_i32_m128i (avx2): Blends the i32 lanes in a and b into a single value.
blend_imm_i32_m256i (avx2): Blends the i32 lanes according to the immediate value.
blend_imm_m128 (sse4.1): Blends the lanes according to the immediate mask.
blend_imm_m128d (sse4.1): Blends the lanes according to the immediate mask.
blend_m256 (avx): Blends the f32 lanes according to the immediate mask.
blend_m256d (avx): Blends the f64 lanes according to the immediate mask.
blend_varying_i8_m128i (sse4.1): Blend the i8 lanes according to a runtime varying mask.
blend_varying_i8_m256i (avx2): Blend i8 lanes according to a runtime varying mask.
blend_varying_m128 (sse4.1): Blend the lanes according to a runtime varying mask.
blend_varying_m256 (avx): Blend the lanes according to a runtime varying mask.
blend_varying_m128d (sse4.1): Blend the lanes according to a runtime varying mask.
blend_varying_m256d (avx): Blend the lanes according to a runtime varying mask.
byte_shl_imm_u128_m128i (sse2): Shifts all bits in the entire register left by a number of bytes.
byte_shl_imm_u128_m256i (avx2): Shifts each u128 lane left by a number of bytes.
byte_shr_imm_u128_m128i (sse2): Shifts all bits in the entire register right by a number of bytes.
byte_shr_imm_u128_m256i (avx2): Shifts each u128 lane right by a number of bytes.
byte_swap_i32: Swap the bytes of the given 32-bit value.
byte_swap_i64 (x86-64): Swap the bytes of the given 64-bit value.
cast_to_m128_from_m256 (avx): Bit-preserving cast to m128 from m256.
cast_to_m128_from_m128d (sse2): Bit-preserving cast to m128 from m128d.
cast_to_m128_from_m128i (sse2): Bit-preserving cast to m128 from m128i.
cast_to_m128d_from_m128 (sse2): Bit-preserving cast to m128d from m128.
cast_to_m128d_from_m128i (sse2): Bit-preserving cast to m128d from m128i.
cast_to_m128d_from_m256d (avx): Bit-preserving cast to m128d from m256d.
cast_to_m128i_from_m128 (sse2): Bit-preserving cast to m128i from m128.
cast_to_m128i_from_m128d (sse2): Bit-preserving cast to m128i from m128d.
cast_to_m128i_from_m256i (avx): Bit-preserving cast to m128i from m256i.
cast_to_m256_from_m256d (avx): Bit-preserving cast to m256 from m256d.
cast_to_m256_from_m256i (avx): Bit-preserving cast to m256 from m256i.
cast_to_m256d_from_m256 (avx): Bit-preserving cast to m256d from m256.
cast_to_m256d_from_m256i (avx): Bit-preserving cast to m256d from m256i.
cast_to_m256i_from_m256 (avx): Bit-preserving cast to m256i from m256.
cast_to_m256i_from_m256d (avx): Bit-preserving cast to m256i from m256d.
ceil_m128 (sse4.1): Round each lane to a whole number, towards positive infinity.
ceil_m256 (avx): Round f32 lanes towards positive infinity.
ceil_m128_s (sse4.1): Round the low lane of b toward positive infinity, other lanes a.
ceil_m128d (sse4.1): Round each lane to a whole number, towards positive infinity.
ceil_m128d_s (sse4.1): Round the low lane of b toward positive infinity, high lane is a.
ceil_m256d (avx): Round f64 lanes towards positive infinity.
cmp_eq_i32_m128_s (sse): Low lane equality.
cmp_eq_i32_m128d_s (sse2): Low lane f64 equal to.
cmp_eq_mask_i8_m128i (sse2): Lanewise a == b with lanes as i8.
cmp_eq_mask_i8_m256i (avx2): Compare i8 lanes for equality, mask output.
cmp_eq_mask_i16_m128i (sse2): Lanewise a == b with lanes as i16.
cmp_eq_mask_i16_m256i (avx2): Compare i16 lanes for equality, mask output.
cmp_eq_mask_i32_m128i (sse2): Lanewise a == b with lanes as i32.
cmp_eq_mask_i32_m256i (avx2): Compare i32 lanes for equality, mask output.
cmp_eq_mask_i64_m128i (sse4.1): Lanewise a == b with lanes as i64.
cmp_eq_mask_i64_m256i (avx2): Compare i64 lanes for equality, mask output.
cmp_eq_mask_m128 (sse): Lanewise a == b.
cmp_eq_mask_m128_s (sse): Low lane a == b, other lanes unchanged.
cmp_eq_mask_m128d (sse2): Lanewise a == b, mask output.
cmp_eq_mask_m128d_s (sse2): Low lane a == b, other lanes unchanged.
cmp_ge_i32_m128_s (sse): Low lane greater than or equal to.
cmp_ge_i32_m128d_s (sse2): Low lane f64 greater than or equal to.
cmp_ge_mask_m128 (sse): Lanewise a >= b.
cmp_ge_mask_m128_s (sse): Low lane a >= b, other lanes unchanged.
cmp_ge_mask_m128d (sse2): Lanewise a >= b.
cmp_ge_mask_m128d_s (sse2): Low lane a >= b, other lanes unchanged.
cmp_gt_i32_m128_s (sse): Low lane greater than.
cmp_gt_i32_m128d_s (sse2): Low lane f64 greater than.
cmp_gt_mask_i8_m128i (sse2): Lanewise a > b with lanes as i8.
cmp_gt_mask_i8_m256i (avx2): Compare i8 lanes for a > b, mask output.
cmp_gt_mask_i16_m128i (sse2): Lanewise a > b with lanes as i16.
cmp_gt_mask_i16_m256i (avx2): Compare i16 lanes for a > b, mask output.
cmp_gt_mask_i32_m128i (sse2): Lanewise a > b with lanes as i32.
cmp_gt_mask_i32_m256i (avx2): Compare i32 lanes for a > b, mask output.
cmp_gt_mask_i64_m128i (sse4.2): Lanewise a > b with lanes as i64.
cmp_gt_mask_i64_m256i (avx2): Compare i64 lanes for a > b, mask output.
cmp_gt_mask_m128 (sse): Lanewise a > b.
cmp_gt_mask_m128_s (sse): Low lane a > b, other lanes unchanged.
cmp_gt_mask_m128d (sse2): Lanewise a > b.
cmp_gt_mask_m128d_s (sse2): Low lane a > b, other lanes unchanged.
cmp_le_i32_m128_s (sse): Low lane less than or equal to.
cmp_le_i32_m128d_s (sse2): Low lane f64 less than or equal to.
cmp_le_mask_m128 (sse): Lanewise a <= b.
cmp_le_mask_m128_s (sse): Low lane a <= b, other lanes unchanged.
cmp_le_mask_m128d (sse2): Lanewise a <= b.
cmp_le_mask_m128d_s (sse2): Low lane a <= b, other lanes unchanged.
cmp_lt_i32_m128_s (sse): Low lane less than.
cmp_lt_i32_m128d_s (sse2): Low lane f64 less than.
cmp_lt_mask_i8_m128i (sse2): Lanewise a < b with lanes as i8.
cmp_lt_mask_i16_m128i (sse2): Lanewise a < b with lanes as i16.
cmp_lt_mask_i32_m128i (sse2): Lanewise a < b with lanes as i32.
cmp_lt_mask_m128 (sse): Lanewise a < b.
cmp_lt_mask_m128_s (sse): Low lane a < b, other lanes unchanged.
cmp_lt_mask_m128d (sse2): Lanewise a < b.
cmp_lt_mask_m128d_s (sse2): Low lane a < b, other lane unchanged.
cmp_neq_i32_m128_s (sse): Low lane not equal to.
cmp_neq_i32_m128d_s (sse2): Low lane f64 not equal to.
cmp_neq_mask_m128 (sse): Lanewise a != b.
cmp_neq_mask_m128_s (sse): Low lane a != b, other lanes unchanged.
cmp_neq_mask_m128d (sse2): Lanewise a != b.
cmp_neq_mask_m128d_s (sse2): Low lane a != b, other lane unchanged.
cmp_nge_mask_m128 (sse): Lanewise !(a >= b).
cmp_nge_mask_m128_s (sse): Low lane !(a >= b), other lanes unchanged.
cmp_nge_mask_m128d (sse2): Lanewise !(a >= b).
cmp_nge_mask_m128d_s (sse2): Low lane !(a >= b), other lane unchanged.
cmp_ngt_mask_m128 (sse): Lanewise !(a > b).
cmp_ngt_mask_m128_s (sse): Low lane !(a > b), other lanes unchanged.
cmp_ngt_mask_m128d (sse2): Lanewise !(a > b).
cmp_ngt_mask_m128d_s (sse2): Low lane !(a > b), other lane unchanged.
cmp_nle_mask_m128 (sse): Lanewise !(a <= b).
cmp_nle_mask_m128_s (sse): Low lane !(a <= b), other lanes unchanged.
cmp_nle_mask_m128d (sse2): Lanewise !(a <= b).
cmp_nle_mask_m128d_s (sse2): Low lane !(a <= b), other lane unchanged.
cmp_nlt_mask_m128 (sse): Lanewise !(a < b).
cmp_nlt_mask_m128_s (sse): Low lane !(a < b), other lanes unchanged.
cmp_nlt_mask_m128d (sse2): Lanewise !(a < b).
cmp_nlt_mask_m128d_s (sse2): Low lane !(a < b), other lane unchanged.
cmp_op_mask_m128 (avx): Compare f32 lanes according to the operation specified, mask output.
cmp_op_mask_m256 (avx): Compare f32 lanes according to the operation specified, mask output.
cmp_op_mask_m128_s (avx): Compare f32 lanes according to the operation specified, mask output.
cmp_op_mask_m128d (avx): Compare f64 lanes according to the operation specified, mask output.
cmp_op_mask_m128d_s (avx): Compare f64 lanes according to the operation specified, mask output.
cmp_op_mask_m256d (avx): Compare f64 lanes according to the operation specified, mask output.
cmp_ordered_mask_m128 (sse): Lanewise (!a.is_nan()) & (!b.is_nan()).
cmp_ordered_mask_m128_s (sse): Low lane (!a.is_nan()) & (!b.is_nan()), other lanes unchanged.
cmp_ordered_mask_m128d (sse2): Lanewise (!a.is_nan()) & (!b.is_nan()).
cmp_ordered_mask_m128d_s (sse2): Low lane (!a.is_nan()) & (!b.is_nan()), other lane unchanged.
cmp_unord_mask_m128 (sse): Lanewise a.is_nan() | b.is_nan().
cmp_unord_mask_m128_s (sse): Low lane a.is_nan() | b.is_nan(), other lanes unchanged.
cmp_unord_mask_m128d (sse2): Lanewise a.is_nan() | b.is_nan().
cmp_unord_mask_m128d_s (sse2): Low lane a.is_nan() | b.is_nan(), other lane unchanged.
combined_byte_shr_imm_m128i (ssse3): Counts a as the high bytes and b as the low bytes then performs a byte shift to the right by the immediate value.
combined_byte_shr_imm_m256i (avx2): Works like combined_byte_shr_imm_m128i, but twice as wide.
convert_i32_replace_m128_s (sse): Convert i32 to f32 and replace the low lane of the input.
convert_i32_replace_m128d_s (sse2): Convert i32 to f64 and replace the low lane of the input.
convert_i64_replace_m128_s (sse): Convert i64 to f32 and replace the low lane of the input.
convert_i64_replace_m128d_s (sse2): Convert i64 to f64 and replace the low lane of the input.
convert_m128_s_replace_m128d_s (sse2): Converts the lower f32 to f64 and replaces the low lane of the input.
convert_m128d_s_replace_m128_s (sse2): Converts the low f64 to f32 and replaces the low lane of the input.
convert_to_f32_from_m256_s (avx): Convert the lowest f32 lane to a single f32.
convert_to_f64_from_m256d_s (avx): Convert the lowest f64 lane to a single f64.
convert_to_i16_m128i_from_lower2_i16_m128i (sse4.1): Convert the lower two i64 lanes to two i32 lanes.
convert_to_i16_m128i_from_lower8_i8_m128i (sse4.1): Convert the lower eight i8 lanes to eight i16 lanes.
convert_to_i16_m256i_from_i8_m128i (avx2): Convert i8 values to i16 values.
convert_to_i16_m256i_from_lower4_u8_m128i (avx2): Convert lower 4 u8 values to i16 values.
convert_to_i16_m256i_from_lower8_u8_m128i (avx2): Convert lower 8 u8 values to i16 values.
convert_to_i16_m256i_from_u8_m128i (avx2): Convert u8 values to i16 values.
convert_to_i32_from_m256i_s (avx): Convert the lowest i32 lane to a single i32.
convert_to_i32_m128i_from_lower4_i8_m128i (sse4.1): Convert the lower four i8 lanes to four i32 lanes.
convert_to_i32_m128i_from_lower4_i16_m128i (sse4.1): Convert the lower four i16 lanes to four i32 lanes.
convert_to_i32_m128i_from_m128 (sse2): Rounds the f32 lanes to i32 lanes.
convert_to_i32_m128i_from_m128d (sse2): Rounds the two f64 lanes to the low two i32 lanes.
convert_to_i32_m128i_from_m256d (avx): Convert f64 lanes to be i32 lanes.
convert_to_i32_m256i_from_i16_m128i (avx2): Convert i16 values to i32 values.
convert_to_i32_m256i_from_lower8_i8_m128i (avx2): Convert the lower 8 i8 values to i32 values.
convert_to_i32_m256i_from_m256 (avx): Convert f32 lanes to be i32 lanes.
convert_to_i32_m256i_from_u16_m128i (avx2): Convert u16 values to i32 values.
convert_to_i64_m128i_from_lower2_i8_m128i (sse4.1): Convert the lower two i8 lanes to two i64 lanes.
convert_to_i64_m128i_from_lower2_i32_m128i (sse4.1): Convert the lower two i32 lanes to two i64 lanes.
convert_to_i64_m256i_from_i32_m128i (avx2): Convert i32 values to i64 values.
convert_to_i64_m256i_from_lower4_i8_m128i (avx2): Convert the lower 4 i8 values to i64 values.
convert_to_i64_m256i_from_lower4_i16_m128i (avx2): Convert i16 values to i64 values.
convert_to_i64_m256i_from_lower4_u16_m128i (avx2): Convert u16 values to i64 values.
convert_to_i64_m256i_from_u32_m128i (avx2): Convert u32 values to i64 values.
convert_to_m128_from_i32_m128i (sse2): Rounds the four i32 lanes to four f32 lanes.
convert_to_m128_from_m128d (sse2): Rounds the two f64 lanes to the low two f32 lanes.
convert_to_m128_from_m256d (avx): Convert f64 lanes to be f32 lanes.
convert_to_m128d_from_lower2_i32_m128i (sse2): Rounds the lower two i32 lanes to two f64 lanes.
convert_to_m128d_from_lower2_m128 (sse2): Converts the lower two f32 lanes to two f64 lanes.
convert_to_m256_from_i32_m256i (avx): Convert i32 lanes to be f32 lanes.
convert_to_m256d_from_i32_m128i (avx): Convert i32 lanes to be f64 lanes.
convert_to_m256d_from_m128 (avx): Convert f32 lanes to be f64 lanes.
convert_to_u16_m128i_from_lower8_u8_m128i (sse4.1): Convert the lower eight u8 lanes to eight u16 lanes.
convert_to_u32_m128i_from_lower4_u8_m128i (sse4.1): Convert the lower four u8 lanes to four u32 lanes.
convert_to_u32_m128i_from_lower4_u16_m128i (sse4.1): Convert the lower four u16 lanes to four u32 lanes.
convert_to_u64_m128i_from_lower2_u8_m128i (sse4.1): Convert the lower two u8 lanes to two u64 lanes.
convert_to_u64_m128i_from_lower2_u16_m128i (sse4.1): Convert the lower two u16 lanes to two u64 lanes.
convert_to_u64_m128i_from_lower2_u32_m128i (sse4.1): Convert the lower two u32 lanes to two u64 lanes.
convert_truncate_to_i32_m128i_from_m256d (avx): Convert f64 lanes to i32 lanes with truncation.
convert_truncate_to_i32_m256i_from_m256 (avx): Convert f32 lanes to i32 lanes with truncation.
copy_i64_m128i_s (sse2): Copy the low i64 lane to a new register, upper bits 0.
copy_replace_low_f64_m128d (sse2): Copies the a value and replaces the low lane with the low b value.
crc32_u8 (sse4.2): Accumulates the u8 into a running CRC32 value.
crc32_u16 (sse4.2): Accumulates the u16 into a running CRC32 value.
crc32_u32 (sse4.2): Accumulates the u32 into a running CRC32 value.
crc32_u64 (sse4.2): Accumulates the u64 into a running CRC32 value.
div_m128 (sse): Lanewise a / b.
div_m256 (avx): Lanewise a / b with f32.
div_m128_s (sse): Low lane a / b, other lanes unchanged.
div_m128d (sse2): Lanewise a / b.
div_m128d_s (sse2): Lowest lane a / b, high lane unchanged.
div_m256d (avx): Lanewise a / b with f64.
dot_product_m128 (sse4.1): Performs a dot product of two m128 registers.
dot_product_m256 (avx): This works like dot_product_m128, but twice as wide.
dot_product_m128d (sse4.1): Performs a dot product of two m128d registers.
duplicate_even_lanes_m128 (sse3): Duplicate the even lanes to the odd lanes.
duplicate_even_lanes_m256 (avx): Duplicate the even-indexed lanes to the odd lanes.
duplicate_low_lane_m128d_s (sse3): Copy the low lane of the input to both lanes of the output.
duplicate_odd_lanes_m128 (sse3): Duplicate the odd lanes to the even lanes.
duplicate_odd_lanes_m256 (avx): Duplicate the odd-indexed lanes to the even lanes.
duplicate_odd_lanes_m256d (avx): Duplicate the odd-indexed lanes to the even lanes.
extract_f32_as_i32_bits_imm_m128 (sse4.1): Gets the f32 lane requested. Returns as an i32 bit pattern.
extract_i8_as_i32_imm_m128i (sse4.1): Gets the i8 lane requested. Only the lowest 4 bits are considered.
extract_i8_as_i32_m256i (avx2): Gets an i8 value out of an m256i, returns as i32.
extract_i16_as_i32_m128i (sse2): Gets an i16 value out of an m128i, returns as i32.
extract_i16_as_i32_m256i (avx2): Gets an i16 value out of an m256i, returns as i32.
extract_i32_from_m256i (avx): Extracts an i32 lane from m256i.
extract_i32_imm_m128i (sse4.1): Gets the i32 lane requested. Only the lowest 2 bits are considered.
extract_i64_from_m256i (avx): Extracts an i64 lane from m256i.
extract_i64_imm_m128i (sse4.1): Gets the i64 lane requested. Only the lowest bit is considered.
extract_m128_from_m256 (avx): Extracts an m128 from m256.
extract_m128d_from_m256d (avx): Extracts an m128d from m256d.
extract_m128i_from_m256i (avx): Extracts an m128i from m256i.
extract_m128i_m256i (avx2): Gets an m128i value out of an m256i.
floor_m128 (sse4.1): Round each lane to a whole number, towards negative infinity.
floor_m256 (avx): Round f32 lanes towards negative infinity.
floor_m128_s (sse4.1): Round the low lane of b toward negative infinity, other lanes a.
floor_m128d (sse4.1): Round each lane to a whole number, towards negative infinity.
floor_m128d_s (sse4.1): Round the low lane of b toward negative infinity, high lane is a.
floor_m256d (avx): Round f64 lanes towards negative infinity.
fused_mul_add_m128 (fma): Lanewise fused (a * b) + c.
fused_mul_add_m256 (fma): Lanewise fused (a * b) + c.
fused_mul_add_m128_s (fma): Low lane fused (a * b) + c, other lanes unchanged.
fused_mul_add_m128d (fma): Lanewise fused (a * b) + c.
fused_mul_add_m128d_s (fma): Low lane fused (a * b) + c, other lanes unchanged.
fused_mul_add_m256d (fma): Lanewise fused (a * b) + c.
fused_mul_addsub_m128 (fma): Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes).
fused_mul_addsub_m256 (fma): Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes).
fused_mul_addsub_m128d (fma): Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes).
fused_mul_addsub_m256d (fma): Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes).
fused_mul_neg_add_m128 (fma): Lanewise fused -(a * b) + c.
fused_mul_neg_add_m256 (fma): Lanewise fused -(a * b) + c.
fused_mul_neg_add_m128_s (fma): Low lane -(a * b) + c, other lanes unchanged.
fused_mul_neg_add_m128d (fma): Lanewise fused -(a * b) + c.
fused_mul_neg_add_m128d_s (fma): Low lane -(a * b) + c, other lanes unchanged.
fused_mul_neg_add_m256d (fma): Lanewise fused -(a * b) + c.
fused_mul_neg_sub_m128 (fma): Lanewise fused -(a * b) - c.
fused_mul_neg_sub_m256 (fma): Lanewise fused -(a * b) - c.
fused_mul_neg_sub_m128_s (fma): Low lane fused -(a * b) - c, other lanes unchanged.
fused_mul_neg_sub_m128d (fma): Lanewise fused -(a * b) - c.
fused_mul_neg_sub_m128d_s (fma): Low lane fused -(a * b) - c, other lanes unchanged.
fused_mul_neg_sub_m256d (fma): Lanewise fused -(a * b) - c.
fused_mul_sub_m128 (fma): Lanewise fused (a * b) - c.
fused_mul_sub_m256 (fma): Lanewise fused (a * b) - c.
fused_mul_sub_m128_s (fma): Low lane fused (a * b) - c, other lanes unchanged.
fused_mul_sub_m128d (fma): Lanewise fused (a * b) - c.
fused_mul_sub_m128d_s (fma): Low lane fused (a * b) - c, other lanes unchanged.
fused_mul_sub_m256d (fma): Lanewise fused (a * b) - c.
fused_mul_subadd_m128 (fma): Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes).
fused_mul_subadd_m256 (fma): Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes).
fused_mul_subadd_m128d (fma): Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes).
fused_mul_subadd_m256d (fma): Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes).
get_f32_from_m128_ssse: Gets the low lane as an individual f32 value.
get_f64_from_m128d_ssse2: Gets the lower lane as an f64 value.
get_i32_from_m128_ssse: Converts the low lane to i32 and extracts as an individual value.
get_i32_from_m128d_ssse2: Converts the lower lane to an i32 value.
get_i32_from_m128i_ssse2: Converts the lower lane to an i32 value.
get_i64_from_m128_ssse: Converts the low lane to i64 and extracts as an individual value.
get_i64_from_m128d_ssse2: Converts the lower lane to an i64 value.
get_i64_from_m128i_ssse2: Converts the lower lane to an i64 value.
insert_f32_imm_m128sse4.1: Inserts a lane from $b into $a, optionally at a new position.
insert_i8_imm_m128isse4.1: Inserts a new value for the i64 lane specified.
insert_i8_to_m256iavx: Inserts an i8 to m256i
insert_i16_from_i32_m128isse2: Inserts the low 16 bits of an i32 value into an m128i.
insert_i16_to_m256iavx: Inserts an i16 to m256i
insert_i32_imm_m128isse4.1: Inserts a new value for the i32 lane specified.
insert_i32_to_m256iavx: Inserts an i32 to m256i
insert_i64_imm_m128isse4.1: Inserts a new value for the i64 lane specified.
insert_i64_to_m256iavx: Inserts an i64 to m256i
insert_m128_to_m256avx: Inserts an m128 to m256
insert_m128d_to_m256davx: Inserts an m128d to m256d
insert_m128i_to_m256iavx2: Inserts an m128i to an m256i at the high or low position.
insert_m128i_to_m256i_slow_avxavx: Slowly inserts an m128i to m256i.
leading_zero_count_u32lzcnt: Count the leading zeroes in a u32.
leading_zero_count_u64lzcnt: Count the leading zeroes in a u64.
load_f32_m128_ssse: Loads the f32 reference into the low lane of the register.
load_f32_splat_m128sse: Loads the f32 reference into all lanes of a register.
load_f32_splat_m256avx: Load an f32 and splat it to all lanes of an m256
load_f64_m128d_ssse2: Loads the reference into the low lane of the register.
load_f64_splat_m128dsse2: Loads the f64 reference into all lanes of a register.
load_f64_splat_m256davx: Load an f64 and splat it to all lanes of an m256d
load_i64_m128i_ssse2: Loads the low i64 into a register.
load_m128sse: Loads the reference into a register.
load_m256avx: Load data from memory into a register.
load_m128_splat_m256avx: Load an m128 and splat it to the lower and upper half of an m256
load_m128dsse2: Loads the reference into a register.
load_m128d_splat_m256davx: Load an m128d and splat it to the lower and upper half of an m256d
load_m128isse2: Loads the reference into a register.
load_m256davx: Load data from memory into a register.
load_m256iavx: Load data from memory into a register.
load_masked_i32_m128iavx2: Loads the reference given and zeroes any i32 lanes not in the mask.
load_masked_i32_m256iavx2: Loads the reference given and zeroes any i32 lanes not in the mask.
load_masked_i64_m128iavx2: Loads the reference given and zeroes any i64 lanes not in the mask.
load_masked_i64_m256iavx2: Loads the reference given and zeroes any i64 lanes not in the mask.
load_masked_m128avx: Load data from memory into a register according to a mask.
load_masked_m256avx: Load data from memory into a register according to a mask.
load_masked_m128davx: Load data from memory into a register according to a mask.
load_masked_m256davx: Load data from memory into a register according to a mask.
load_replace_high_m128dsse2: Loads the reference into a register, replacing the high lane.
load_replace_low_m128dsse2: Loads the reference into a register, replacing the low lane.
load_reverse_m128sse: Loads the reference into a register with reversed order.
load_reverse_m128dsse2: Loads the reference into a register with reversed order.
load_unaligned_hi_lo_m256avx: Load data from memory into a register.
load_unaligned_hi_lo_m256davx: Load data from memory into a register.
load_unaligned_hi_lo_m256iavx: Load data from memory into a register.
load_unaligned_m128sse: Loads the reference into a register.
load_unaligned_m256avx: Load data from memory into a register.
load_unaligned_m128dsse2: Loads the reference into a register.
load_unaligned_m128isse2: Loads the reference into a register.
load_unaligned_m256davx: Load data from memory into a register.
load_unaligned_m256iavx: Load data from memory into a register.
max_i8_m128isse4.1: Lanewise max(a, b) with lanes as i8.
max_i8_m256iavx2: Lanewise max(a, b) with lanes as i8.
max_i16_m128isse2: Lanewise max(a, b) with lanes as i16.
max_i16_m256iavx2: Lanewise max(a, b) with lanes as i16.
max_i32_m128isse4.1: Lanewise max(a, b) with lanes as i32.
max_i32_m256iavx2: Lanewise max(a, b) with lanes as i32.
max_m128sse: Lanewise max(a, b).
max_m256avx: Lanewise max(a, b).
max_m128_ssse: Low lane max(a, b), other lanes unchanged.
max_m128dsse2: Lanewise max(a, b).
max_m128d_ssse2: Low lane max(a, b), other lanes unchanged.
max_m256davx: Lanewise max(a, b).
max_u8_m128isse2: Lanewise max(a, b) with lanes as u8.
max_u8_m256iavx2: Lanewise max(a, b) with lanes as u8.
max_u16_m128isse4.1: Lanewise max(a, b) with lanes as u16.
max_u16_m256iavx2: Lanewise max(a, b) with lanes as u16.
max_u32_m128isse4.1: Lanewise max(a, b) with lanes as u32.
max_u32_m256iavx2: Lanewise max(a, b) with lanes as u32.
min_i8_m128isse4.1: Lanewise min(a, b) with lanes as i8.
min_i8_m256iavx2: Lanewise min(a, b) with lanes as i8.
min_i16_m128isse2: Lanewise min(a, b) with lanes as i16.
min_i16_m256iavx2: Lanewise min(a, b) with lanes as i16.
min_i32_m128isse4.1: Lanewise min(a, b) with lanes as i32.
min_i32_m256iavx2: Lanewise min(a, b) with lanes as i32.
min_m128sse: Lanewise min(a, b).
min_m256avx: Lanewise min(a, b).
min_m128_ssse: Low lane min(a, b), other lanes unchanged.
min_m128dsse2: Lanewise min(a, b).
min_m128d_ssse2: Low lane min(a, b), other lanes unchanged.
min_m256davx: Lanewise min(a, b).
min_position_u16_m128isse4.1: Min u16 value, position, and other lanes zeroed.
min_u8_m128isse2: Lanewise min(a, b) with lanes as u8.
min_u8_m256iavx2: Lanewise min(a, b) with lanes as u8.
min_u16_m128isse4.1: Lanewise min(a, b) with lanes as u16.
min_u16_m256iavx2: Lanewise min(a, b) with lanes as u16.
min_u32_m128isse4.1: Lanewise min(a, b) with lanes as u32.
min_u32_m256iavx2: Lanewise min(a, b) with lanes as u32.
move_high_low_m128sse: Move the high lanes of b to the low lanes of a, other lanes unchanged.
move_low_high_m128sse: Move the low lanes of b to the high lanes of a, other lanes unchanged.
move_m128_ssse: Move the low lane of b to a, other lanes unchanged.
move_mask_i8_m128isse2: Gathers the i8 sign bit of each lane.
move_mask_i8_m256iavx2: Create an i32 mask of each sign bit in the i8 lanes.
move_mask_m128sse: Gathers the sign bit of each lane.
move_mask_m256avx: Collects the sign bit of each lane into an 8-bit value.
move_mask_m128dsse2: Gathers the sign bit of each lane.
move_mask_m256davx: Collects the sign bit of each lane into a 4-bit value.
mul_32_m128isse4.1: Lanewise a * b with 32-bit lanes.
mul_extended_u32bmi2: Multiply two u32, outputting the low bits and storing the high bits in the reference.
mul_extended_u64bmi2: Multiply two u64, outputting the low bits and storing the high bits in the reference.
mul_i16_horizontal_add_m128isse2: Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
mul_i16_horizontal_add_m256iavx2: Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
mul_i16_keep_high_m128isse2: Lanewise a * b with lanes as i16, keep the high bits of the i32 intermediates.
mul_i16_keep_high_m256iavx2: Multiply the i16 lanes and keep the high half of each 32-bit output.
mul_i16_keep_low_m128isse2: Lanewise a * b with lanes as i16, keep the low bits of the i32 intermediates.
mul_i16_keep_low_m256iavx2: Multiply the i16 lanes and keep the low half of each 32-bit output.
mul_i16_scale_round_m128issse3: Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
mul_i16_scale_round_m256iavx2: Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
mul_i32_keep_low_m256iavx2: Multiply the i32 lanes and keep the low half of each 64-bit output.
mul_i64_carryless_m128ipclmulqdq: Performs a “carryless” multiplication of two i64 values.
mul_i64_low_bits_m256iavx2: Multiply the lower i32 within each i64 lane, i64 output.
mul_m128sse: Lanewise a * b.
mul_m256avx: Lanewise a * b with f32 lanes.
mul_m128_ssse: Low lane a * b, other lanes unchanged.
mul_m128dsse2: Lanewise a * b.
mul_m128d_ssse2: Lowest lane a * b, high lane unchanged.
mul_m256davx: Lanewise a * b with f64 lanes.
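mul_u8i8_add_horizontal_saturating_m128issse3: Multiplies u8 lanes by i8 lanes into i16 intermediates, then adds adjacent pairs with saturation.
mul_u8i8_add_horizontal_saturating_m256iavx2: Multiplies u8 lanes by i8 lanes into i16 intermediates, then adds adjacent pairs with saturation.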
mul_u16_keep_high_m128isse2: Lanewise a * b with lanes as u16, keep the high bits of the u32 intermediates.
mul_u16_keep_high_m256iavx2: Multiply the u16 lanes and keep the high half of each 32-bit output.
mul_u64_low_bits_m256iavx2: Multiply the lower u32 within each u64 lane, u64 output.
mul_widen_i32_odd_m128isse4.1: Multiplies the odd i32 lanes and gives the widened (i64) results.
mul_widen_u32_odd_m128isse2: Multiplies the odd u32 lanes and gives the widened (u64) results.
multi_packed_sum_abs_diff_u8_m128isse4.1: Computes eight u16 “sum of absolute difference” values according to the bytes selected.
multi_packed_sum_abs_diff_u8_m256iavx2: Computes eight u16 “sum of absolute difference” values according to the bytes selected.
pack_i16_to_i8_m128isse2: Saturating convert i16 to i8, and pack the values.
pack_i16_to_i8_m256iavx2: Saturating convert i16 to i8, and pack the values.
pack_i16_to_u8_m128isse2: Saturating convert i16 to u8, and pack the values.
pack_i16_to_u8_m256iavx2: Saturating convert i16 to u8, and pack the values.
pack_i32_to_i16_m128isse2: Saturating convert i32 to i16, and pack the values.
pack_i32_to_i16_m256iavx2: Saturating convert i32 to i16, and pack the values.
pack_i32_to_u16_m128isse4.1: Saturating convert i32 to u16, and pack the values.
pack_i32_to_u16_m256iavx2: Saturating convert i32 to u16, and pack the values.
permute2z_m256avx: Shuffle 128 bits of floating point data at a time from $a and $b using an immediate control value.
permute2z_m256davx: Shuffle 128 bits of floating point data at a time from a and b using an immediate control value.
permute2z_m256iavx: Slowly swizzle 128 bits of integer data from a and b using an immediate control value.
permute_m128avx: Shuffle the f32 lanes from a using an immediate control value.
permute_m256avx: Shuffle the f32 lanes in a using an immediate control value.
permute_m128davx: Shuffle the f64 lanes in a using an immediate control value.
permute_m256davx: Shuffle the f64 lanes from a together using an immediate control value.
population_count_i32popcnt: Count the number of bits set within an i32
population_count_i64popcnt: Count the number of bits set within an i64
population_deposit_u32bmi2: Deposit contiguous low bits from a u32 according to a mask.
population_deposit_u64bmi2: Deposit contiguous low bits from a u64 according to a mask.
population_extract_u32bmi2: Extract bits from a u32 according to a mask.
population_extract_u64bmi2: Extract bits from a u64 according to a mask.
prefetch_et0sse: Fetches the cache line containing addr into all levels of the cache hierarchy, anticipating write
prefetch_et1sse: Fetches into L2 and higher, anticipating write
prefetch_ntasse: Fetch data using the non-temporal access (NTA) hint. It may be a place closer than main memory but outside of the cache hierarchy. This is used to reduce access latency without polluting the cache.
prefetch_t0sse: Fetches the cache line containing addr into all levels of the cache hierarchy.
prefetch_t1sse: Fetches into L2 and higher.
prefetch_t2sse: Fetches into L3 and higher or an implementation-specific choice (e.g., L2 if there is no L3).
rdrand_u16rdrand: Try to obtain a random u16 from the hardware RNG.
rdrand_u32rdrand: Try to obtain a random u32 from the hardware RNG.
rdrand_u64rdrand: Try to obtain a random u64 from the hardware RNG.
rdseed_u16rdseed: Try to obtain a random u16 from the hardware RNG.
rdseed_u32rdseed: Try to obtain a random u32 from the hardware RNG.
rdseed_u64rdseed: Try to obtain a random u64 from the hardware RNG.
read_timestamp_counter: Reads the CPU’s timestamp counter value.
read_timestamp_counter_p: Reads the CPU’s timestamp counter value and stores the processor signature.
reciprocal_m128sse: Lanewise 1.0 / a approximation.
reciprocal_m256avx: Reciprocal of f32 lanes.
reciprocal_m128_ssse: Low lane 1.0 / a approximation, other lanes unchanged.
reciprocal_sqrt_m128sse: Lanewise 1.0 / sqrt(a) approximation.
reciprocal_sqrt_m256avx: Reciprocal square root of f32 lanes.
reciprocal_sqrt_m128_ssse: Low lane 1.0 / sqrt(a) approximation, other lanes unchanged.
round_m128sse4.1: Rounds each lane in the style specified.
round_m256avx: Rounds each lane in the style specified.
round_m128_ssse4.1: Rounds $b low as specified, other lanes use $a.
round_m128dsse4.1: Rounds each lane in the style specified.
round_m128d_ssse4.1: Rounds $b low as specified, keeps $a high.
round_m256davx: Rounds each lane in the style specified.
search_explicit_str_for_indexsse4.2: Search for needle in haystack, with explicit string length.
search_explicit_str_for_masksse4.2: Search for needle in haystack, with explicit string length.
search_implicit_str_for_indexsse4.2: Search for needle in haystack, with implicit string length.
search_implicit_str_for_masksse4.2: Search for needle in haystack, with implicit string length.
set_i8_m128isse2: Sets the args into an m128i, first arg is the high lane.
set_i8_m256iavx: Set i8 args into an m256i lane.
set_i16_m128isse2: Sets the args into an m128i, first arg is the high lane.
set_i16_m256iavx: Set i16 args into an m256i lane.
set_i32_m128isse2: Sets the args into an m128i, first arg is the high lane.
set_i32_m128i_ssse2: Set an i32 as the low 32-bit lane of an m128i, other lanes blank.
set_i32_m256iavx: Set i32 args into an m256i lane.
set_i64_m128isse2: Sets the args into an m128i, first arg is the high lane.
set_i64_m128i_ssse2: Set an i64 as the low 64-bit lane of an m128i, other lanes blank.
set_i64_m256iavx: Set i64 args into an m256i lane.
set_m128sse: Sets the args into an m128, first arg is the high lane.
set_m256avx: Set f32 args into an m256 lane.
set_m128_m256avx: Set m128 args into an m256.
set_m128_ssse: Sets the args into an m128, first arg is the high lane.
set_m128dsse2: Sets the args into an m128d, first arg is the high lane.
set_m128d_m256davx: Set m128d args into an m256d.
set_m128d_ssse2: Sets the args into the low lane of a m128d.
set_m128i_m256iavx: Set m128i args into an m256i.
set_m256davx: Set f64 args into an m256d lane.
set_reversed_i8_m128isse2: Sets the args into an m128i, first arg is the low lane.
set_reversed_i8_m256iavx: Set i8 args into an m256i lane.
set_reversed_i16_m128isse2: Sets the args into an m128i, first arg is the low lane.
set_reversed_i16_m256iavx: Set i16 args into an m256i lane.
set_reversed_i32_m128isse2: Sets the args into an m128i, first arg is the low lane.
set_reversed_i32_m256iavx: Set i32 args into an m256i lane.
set_reversed_i64_m256iavx: Set i64 args into an m256i lane.
set_reversed_m128sse: Sets the args into an m128, first arg is the low lane.
set_reversed_m256avx: Set f32 args into an m256 lane.
set_reversed_m128_m256avx: Set m128 args into an m256.
set_reversed_m128dsse2: Sets the args into an m128d, first arg is the low lane.
set_reversed_m128d_m256davx: Set m128d args into an m256d.
set_reversed_m128i_m256iavx: Set m128i args into an m256i.
set_reversed_m256davx: Set f64 args into an m256d lane.
set_splat_i8_m128isse2: Splats the i8 to all lanes of the m128i.
set_splat_i8_m128i_s_m256iavx2: Sets the lowest i8 lane of an m128i as all lanes of an m256i.
set_splat_i8_m256iavx: Splat an i8 arg into an m256i lane.
set_splat_i16_m128isse2: Splats the i16 to all lanes of the m128i.
set_splat_i16_m128i_s_m256iavx2: Sets the lowest i16 lane of an m128i as all lanes of an m256i.
set_splat_i16_m256iavx: Splat an i16 arg into an m256i lane.
set_splat_i32_m128isse2: Splats the i32 to all lanes of the m128i.
set_splat_i32_m128i_s_m256iavx2: Sets the lowest i32 lane of an m128i as all lanes of an m256i.
set_splat_i32_m256iavx: Splat an i32 arg into an m256i lane.
set_splat_i64_m128isse2: Splats the i64 to both lanes of the m128i.
set_splat_i64_m128i_s_m256iavx2: Sets the lowest i64 lane of an m128i as all lanes of an m256i.
set_splat_i64_m256iavx: Splat an i64 arg into an m256i lane.
set_splat_m128sse: Splats the value to all lanes.
set_splat_m256avx: Splat an f32 arg into an m256 lane.
set_splat_m128_s_m256avx2: Sets the lowest lane of an m128 as all lanes of an m256.
set_splat_m128dsse2: Splats the args into both lanes of the m128d.
set_splat_m128d_s_m256davx2: Sets the lowest lane of an m128d as all lanes of an m256d.
set_splat_m256davx: Splat an f64 arg into an m256d lane.
shl_all_u16_m128isse2: Shift all u16 lanes to the left by the count in the lower u64 lane.
shl_all_u16_m256iavx2: Lanewise u16 shift left by the lower u64 lane of count.
shl_all_u32_m128isse2: Shift all u32 lanes to the left by the count in the lower u64 lane.
shl_all_u32_m256iavx2: Shift all u32 lanes left by the lower u64 lane of count.
shl_all_u64_m128isse2: Shift all u64 lanes to the left by the count in the lower u64 lane.
shl_all_u64_m256iavx2: Shift all u64 lanes left by the lower u64 lane of count.
shl_each_u32_m128iavx2: Shift u32 values to the left by count bits.
shl_each_u32_m256iavx2: Lanewise u32 shift left by the matching i32 lane in count.
shl_each_u64_m128iavx2: Shift u64 values to the left by count bits.
shl_each_u64_m256iavx2: Lanewise u64 shift left by the matching u64 lane in count.
shl_imm_u16_m128isse2: Shifts all u16 lanes left by an immediate.
shl_imm_u16_m256iavx2: Shifts all u16 lanes left by an immediate.
shl_imm_u32_m128isse2: Shifts all u32 lanes left by an immediate.
shl_imm_u32_m256iavx2: Shifts all u32 lanes left by an immediate.
shl_imm_u64_m128isse2: Shifts both u64 lanes left by an immediate.
shl_imm_u64_m256iavx2: Shifts all u64 lanes left by an immediate.
shr_all_i16_m128isse2: Shift each i16 lane to the right by the count in the lower i64 lane.
shr_all_i16_m256iavx2: Lanewise i16 shift right by the lower i64 lane of count.
shr_all_i32_m128isse2: Shift each i32 lane to the right by the count in the lower i64 lane.
shr_all_i32_m256iavx2: Lanewise i32 shift right by the lower i64 lane of count.
shr_all_u16_m128isse2: Shift each u16 lane to the right by the count in the lower u64 lane.
shr_all_u16_m256iavx2: Lanewise u16 shift right by the lower u64 lane of count.
shr_all_u32_m128isse2: Shift each u32 lane to the right by the count in the lower u64 lane.
shr_all_u32_m256iavx2: Lanewise u32 shift right by the lower u64 lane of count.
shr_all_u64_m128isse2: Shift each u64 lane to the right by the count in the lower u64 lane.
shr_all_u64_m256iavx2: Lanewise u64 shift right by the lower u64 lane of count.
shr_each_i32_m128iavx2: Shift i32 values to the right by count bits.
shr_each_i32_m256iavx2: Lanewise i32 shift right by the matching i32 lane in count.
shr_each_u32_m128iavx2: Shift u32 values to the right by count bits.
shr_each_u32_m256iavx2: Lanewise u32 shift right by the matching u32 lane in count.
shr_each_u64_m128iavx2: Shift u64 values to the right by count bits.
shr_each_u64_m256iavx2: Lanewise u64 shift right by the matching i64 lane in count.
shr_imm_i16_m128isse2: Shifts all i16 lanes right by an immediate.
shr_imm_i16_m256iavx2: Shifts all i16 lanes right by an immediate.
shr_imm_i32_m128isse2: Shifts all i32 lanes right by an immediate.
shr_imm_i32_m256iavx2: Shifts all i32 lanes right by an immediate.
shr_imm_u16_m128isse2: Shifts all u16 lanes right by an immediate.
shr_imm_u16_m256iavx2: Shifts all u16 lanes right by an immediate.
shr_imm_u32_m128isse2: Shifts all u32 lanes right by an immediate.
shr_imm_u32_m256iavx2: Shifts all u32 lanes right by an immediate.
shr_imm_u64_m128isse2: Shifts both u64 lanes right by an immediate.
shr_imm_u64_m256iavx2: Shifts all u64 lanes right by an immediate.
shuffle_abi_f32_all_m128sse: Shuffle the f32 lanes from $a and $b together using an immediate control value.
shuffle_abi_f64_all_m128dsse2: Shuffle the f64 lanes from $a and $b together using an immediate control value.
shuffle_abi_i128z_all_m256iavx2: Shuffle 128 bits of integer data from $a and $b using an immediate control value.
shuffle_ai_f32_all_m128isse2: Shuffle the i32 lanes in $a using an immediate control value.
shuffle_ai_f64_all_m256davx2: Shuffle the f64 lanes from $a using an immediate control value.
shuffle_ai_i16_h64all_m128isse2: Shuffle the high i16 lanes in $a using an immediate control value.
shuffle_ai_i16_h64half_m256iavx2: Shuffle the high i16 lanes in $a using an immediate control value.
shuffle_ai_i16_l64all_m128isse2: Shuffle the low i16 lanes in $a using an immediate control value.
shuffle_ai_i16_l64half_m256iavx2: Shuffle the low i16 lanes in $a using an immediate control value.
shuffle_ai_i32_half_m256iavx2: Shuffle the i32 lanes in a using an immediate control value.
shuffle_ai_i64_all_m256iavx2: Shuffle the i64 lanes in $a using an immediate control value.
shuffle_av_f32_all_m128avx: Shuffle f32 values in a using i32 values in v.
shuffle_av_f32_all_m256avx2: Shuffle f32 lanes in a using i32 values in v.
shuffle_av_f32_half_m256avx: Shuffle f32 values in a using i32 values in v.
shuffle_av_f64_all_m128davx: Shuffle f64 lanes in a using bit 1 of the i64 lanes in v
shuffle_av_f64_half_m256davx: Shuffle f64 lanes in a using bit 1 of the i64 lanes in v.
shuffle_av_i8z_all_m128issse3: Shuffle i8 lanes in a using i8 values in v.
shuffle_av_i8z_half_m256iavx2: Shuffle i8 lanes in a using i8 values in v.
shuffle_av_i32_all_m256iavx2: Shuffle i32 lanes in a using i32 values in v.
shuffle_m256avx: Shuffle the f32 lanes from a and b together using an immediate control value.
shuffle_m256davx: Shuffle the f64 lanes from a and b together using an immediate control value.
sign_apply_i8_m128issse3: Applies the sign of i8 values in b to the values in a.
sign_apply_i8_m256iavx2: Lanewise a * signum(b) with lanes as i8
sign_apply_i16_m128issse3: Applies the sign of i16 values in b to the values in a.
sign_apply_i16_m256iavx2: Lanewise a * signum(b) with lanes as i16
sign_apply_i32_m128issse3: Applies the sign of i32 values in b to the values in a.
sign_apply_i32_m256iavx2: Lanewise a * signum(b) with lanes as i32
splat_i8_m128i_s_m128iavx2: Splat the lowest 8-bit lane across the entire 128 bits.
splat_i16_m128i_s_m128iavx2: Splat the lowest 16-bit lane across the entire 128 bits.
splat_i32_m128i_s_m128iavx2: Splat the lowest 32-bit lane across the entire 128 bits.
splat_i64_m128i_s_m128iavx2: Splat the lowest 64-bit lane across the entire 128 bits.
splat_m128_s_m128avx2: Splat the lowest f32 across all four lanes.
splat_m128d_s_m128davx2: Splat the lower f64 across both lanes of m128d.
splat_m128i_m256iavx2: Splat the 128-bits across 256-bits.
sqrt_m128sse: Lanewise sqrt(a).
sqrt_m256avx: Lanewise sqrt on f32 lanes.
sqrt_m128_ssse: Low lane sqrt(a), other lanes unchanged.
sqrt_m128dsse2: Lanewise sqrt(a).
sqrt_m128d_ssse2: Low lane sqrt(b), upper lane is unchanged from a.
sqrt_m256davx: Lanewise sqrt on f64 lanes.
store_high_m128d_ssse2: Stores the high lane value to the reference given.
store_i64_m128i_ssse2: Stores the value to the reference given.
store_m128sse: Stores the value to the reference given.
store_m256avx: Store data from a register into memory.
store_m128_ssse: Stores the low lane value to the reference given.
store_m128dsse2: Stores the value to the reference given.
store_m128d_ssse2: Stores the low lane value to the reference given.
store_m128isse2: Stores the value to the reference given.
store_m256davx: Store data from a register into memory.
store_m256iavx: Store data from a register into memory.
store_masked_i32_m128iavx2: Stores the i32 masked lanes given to the reference.
store_masked_i32_m256iavx2: Stores the i32 masked lanes given to the reference.
store_masked_i64_m128iavx2: Stores the i64 masked lanes given to the reference.
store_masked_i64_m256iavx2: Stores the i64 masked lanes given to the reference.
store_masked_m128avx: Store data from a register into memory according to a mask.
store_masked_m256avx: Store data from a register into memory according to a mask.
store_masked_m128davx: Store data from a register into memory according to a mask.
store_masked_m256davx: Store data from a register into memory according to a mask.
store_reverse_m128sse: Stores the value to the reference given in reverse order.
store_reversed_m128dsse2: Stores the value to the reference given in reverse order.
store_splat_m128sse: Stores the low lane value to all lanes of the reference given.
store_splat_m128dsse2: Stores the low lane value to all lanes of the reference given.
store_unaligned_hi_lo_m256avx: Store data from a register into memory.
store_unaligned_hi_lo_m256davx: Store data from a register into memory.
store_unaligned_hi_lo_m256iavx: Store data from a register into memory.
store_unaligned_m128sse: Stores the value to the reference given.
store_unaligned_m256avx: Store data from a register into memory.
store_unaligned_m128dsse2: Stores the value to the reference given.
store_unaligned_m128isse2: Stores the value to the reference given.
store_unaligned_m256davx: Store data from a register into memory.
store_unaligned_m256iavx: Store data from a register into memory.
sub_horizontal_i16_m128issse3: Subtract horizontal pairs of i16 values, pack the outputs as a then b.
sub_horizontal_i16_m256iavx2: Horizontal a - b with lanes as i16.
sub_horizontal_i32_m128issse3: Subtract horizontal pairs of i32 values, pack the outputs as a then b.
sub_horizontal_i32_m256iavx2: Horizontal a - b with lanes as i32.
sub_horizontal_m128sse3: Subtract each lane horizontally, pack the outputs as a then b.
sub_horizontal_m256avx: Subtract adjacent f32 lanes.
sub_horizontal_m128dsse3: Subtract each lane horizontally, pack the outputs as a then b.
sub_horizontal_m256davx: Subtract adjacent f64 lanes.
sub_horizontal_saturating_i16_m128issse3: Subtract horizontal pairs of i16 values, saturating, pack the outputs as a then b.
sub_horizontal_saturating_i16_m256iavx2: Horizontal saturating a - b with lanes as i16.
sub_i8_m128isse2: Lanewise a - b with lanes as i8.
sub_i8_m256iavx2: Lanewise a - b with lanes as i8.
sub_i16_m128isse2: Lanewise a - b with lanes as i16.
sub_i16_m256iavx2: Lanewise a - b with lanes as i16.
sub_i32_m128isse2: Lanewise a - b with lanes as i32.
sub_i32_m256iavx2: Lanewise a - b with lanes as i32.
sub_i64_m128isse2: Lanewise a - b with lanes as i64.
sub_i64_m256iavx2: Lanewise a - b with lanes as i64.
sub_m128sse: Lanewise a - b.
sub_m256avx: Lanewise a - b with f32 lanes.
sub_m128_ssse: Low lane a - b, other lanes unchanged.
sub_m128dsse2: Lanewise a - b.
sub_m128d_ssse2: Lowest lane a - b, high lane unchanged.
sub_m256davx: Lanewise a - b with f64 lanes.
sub_saturating_i8_m128isse2: Lanewise saturating a - b with lanes as i8.
sub_saturating_i8_m256iavx2: Lanewise saturating a - b with lanes as i8.
sub_saturating_i16_m128isse2: Lanewise saturating a - b with lanes as i16.
sub_saturating_i16_m256iavx2: Lanewise saturating a - b with lanes as i16.
sub_saturating_u8_m128isse2: Lanewise saturating a - b with lanes as u8.
sub_saturating_u8_m256iavx2: Lanewise saturating a - b with lanes as u8.
sub_saturating_u16_m128isse2: Lanewise saturating a - b with lanes as u16.
sub_saturating_u16_m256iavx2: Lanewise saturating a - b with lanes as u16.
sum_of_u8_abs_diff_m128isse2: Compute “sum of u8 absolute differences”.
sum_of_u8_abs_diff_m256iavx2: Compute “sum of u8 absolute differences”.
test_all_ones_m128isse4.1: Tests if all bits are 1.
test_all_zeroes_m128isse4.1: Returns if all masked bits are 0, (a & mask) as u128 == 0
test_mixed_ones_and_zeroes_m128isse4.1: Returns if, among the masked bits, there’s both 0s and 1s
testc_m128avx: Computes the bitwise NOT of a and then AND with b, returns 1 if the sign bit of every lane in the result is zero, otherwise 0.
testc_m256avx: Computes the bitwise NOT of a and then AND with b, returns 1 if the sign bit of every lane in the result is zero, otherwise 0.
testc_m128davx: Computes the bitwise NOT of a and then AND with b, returns 1 if the sign bit of every lane in the result is zero, otherwise 0.
testc_m128isse4.1: Computes the bitwise NOT of a and then AND with b, returns 1 if the result is zero, otherwise 0.
testc_m256davx: Computes the bitwise NOT of a and then AND with b, returns 1 if the sign bit of every lane in the result is zero, otherwise 0.
testc_m256iavx: Computes the bitwise NOT of a and then AND with b, returns 1 if the result is zero, otherwise 0.
testz_m128avx: Computes the bitwise AND of a and b, returns 1 if the sign bit of every lane in the result is zero, otherwise 0.
testz_m256avx: Computes the bitwise AND of a and b, returns 1 if the sign bit of every lane in the result is zero, otherwise 0.
testz_m128davx: Computes the bitwise AND of a and b, returns 1 if the sign bit of every lane in the result is zero, otherwise 0.
testz_m128isse4.1: Computes the bitwise AND of 128 bits in a and b, returns 1 if the result is zero, otherwise 0.
testz_m256davx: Computes the bitwise AND of a and b, returns 1 if the sign bit of every lane in the result is zero, otherwise 0.
testz_m256iavx: Computes the bitwise AND of 256 bits in a and b, returns 1 if the result is zero, otherwise 0.
trailing_zero_count_u32bmi1: Counts the number of trailing zero bits in a u32.
trailing_zero_count_u64bmi1: Counts the number of trailing zero bits in a u64.
transpose_four_m128sse: Transpose four m128 as if they were a 4x4 matrix.
truncate_m128_to_m128isse2: Truncate the f32 lanes to i32 lanes.
truncate_m128d_to_m128isse2: Truncate the f64 lanes to the lower i32 lanes (upper i32 lanes 0).
truncate_to_i32_m128d_ssse2: Truncate the lower lane into an i32.
truncate_to_i64_m128d_ssse2: Truncate the lower lane into an i64.
unpack_hi_m256avx: Unpack and interleave the high lanes.
unpack_hi_m256davx: Unpack and interleave the high lanes.
unpack_high_i8_m128isse2: Unpack and interleave high i8 lanes of a and b.
unpack_high_i8_m256iavx2: Unpack and interleave high i8 lanes of a and b.
unpack_high_i16_m128isse2: Unpack and interleave high i16 lanes of a and b.
unpack_high_i16_m256iavx2: Unpack and interleave high i16 lanes of a and b.
unpack_high_i32_m128isse2: Unpack and interleave high i32 lanes of a and b.
unpack_high_i32_m256iavx2: Unpack and interleave high i32 lanes of a and b.
unpack_high_i64_m128isse2: Unpack and interleave high i64 lanes of a and b.
unpack_high_i64_m256iavx2: Unpack and interleave high i64 lanes of a and b.
unpack_high_m128sse: Unpack and interleave high lanes of a and b.
unpack_high_m128dsse2: Unpack and interleave high lanes of a and b.
unpack_lo_m256avx: Unpack and interleave the low lanes.
unpack_lo_m256davx: Unpack and interleave the low lanes.
unpack_low_i8_m128isse2: Unpack and interleave low i8 lanes of a and b.
unpack_low_i8_m256iavx2: Unpack and interleave low i8 lanes of a and b.
unpack_low_i16_m128isse2: Unpack and interleave low i16 lanes of a and b.
unpack_low_i16_m256iavx2: Unpack and interleave low i16 lanes of a and b.
unpack_low_i32_m128isse2: Unpack and interleave low i32 lanes of a and b.
unpack_low_i32_m256iavx2: Unpack and interleave low i32 lanes of a and b.
unpack_low_i64_m128isse2: Unpack and interleave low i64 lanes of a and b.
unpack_low_i64_m256iavx2: Unpack and interleave low i64 lanes of a and b.
unpack_low_m128sse: Unpack and interleave low lanes of a and b.
unpack_low_m128dsse2: Unpack and interleave low lanes of a and b.
zero_extend_m128avx: Zero extend an m128 to m256
zero_extend_m128davx: Zero extend an m128d to m256d
zero_extend_m128iavx: Zero extend an m128i to m256i
zeroed_m128sse: All lanes zero.
zeroed_m256avx: A zeroed m256
zeroed_m128dsse2: Both lanes zero.
zeroed_m128isse2: All lanes zero.
zeroed_m256davx: A zeroed m256d
zeroed_m256iavx: A zeroed m256i
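
Several of the one-line descriptions above compress a fair amount of per-lane behavior. As a rough aid, here is a scalar sketch of three of them: gathering sign bits (the move_mask family), saturating narrowing (the pack family), and keeping the high half of a widened product (the mul_i16_keep_high family). The helper names are hypothetical and are not part of safe_arch; they model a single lane, or four f32 lanes, in plain Rust and make no claim about how the crate or the hardware implements the operation.

```rust
// Scalar reference models only. These helper names are made up for
// illustration and do not exist in safe_arch.

/// Models "Gathers the sign bit of each lane" for four f32 lanes:
/// bit i of the result is the sign bit of lane i.
fn move_mask_model(lanes: [f32; 4]) -> i32 {
    let mut mask = 0;
    for (i, lane) in lanes.iter().enumerate() {
        // Bit 31 of the IEEE-754 encoding is the sign bit.
        let sign = (lane.to_bits() >> 31) as i32;
        mask |= sign << i;
    }
    mask
}

/// Models "Saturating convert i16 to i8" for a single lane:
/// out-of-range values clamp to i8::MIN or i8::MAX instead of wrapping.
fn saturate_i16_to_i8(x: i16) -> i8 {
    x.clamp(i8::MIN as i16, i8::MAX as i16) as i8
}

/// Models "keep the high bits of the i32 intermediates" for one pair of
/// i16 lanes: widen, multiply, then keep the upper 16 bits.
fn mul_i16_keep_high_model(a: i16, b: i16) -> i16 {
    ((a as i32 * b as i32) >> 16) as i16
}
```

Under this model, lanes [-1.0, 2.0, -3.0, 4.0] give a mask of 0b0101, an input of 300 saturates to 127, and 0x4000 * 4 keeps a high half of 1.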