Crate safe_arch
A crate that safely exposes arch intrinsics via `#[cfg()]`.
`safe_arch` lets you safely use CPU intrinsics, the things in the
`core::arch` modules. It works purely via `#[cfg()]` and compile-time CPU
feature declaration. If you want to check for a feature at runtime and then
call an intrinsic or use a fallback path based on that, then this crate is
sadly not for you.
SIMD register types are "newtype'd" so that better trait impls can be given
to them, but the inner value is a `pub` field, so feel free to just grab it
out if you need to. Trait impls of the newtypes include: `Default` (zeroed),
`From`/`Into` of appropriate data types, and appropriate operator
overloading.
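As a sketch of that newtype pattern (an illustrative stand-in type, not the crate's actual definition, which wraps the `core::arch` register types rather than an array):

```rust
use core::ops::Add;

// A safe_arch-style newtype: the inner value is a pub field, so you can
// always reach the raw data, while the wrapper carries nicer trait impls.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct M128iDemo(pub [i32; 4]);

// From/Into of an appropriate data type.
impl From<[i32; 4]> for M128iDemo {
    fn from(arr: [i32; 4]) -> Self {
        Self(arr)
    }
}
impl From<M128iDemo> for [i32; 4] {
    fn from(m: M128iDemo) -> [i32; 4] {
        m.0
    }
}

// Appropriate operator overloading: lanewise addition.
impl Add for M128iDemo {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = [0; 4];
        for i in 0..4 {
            out[i] = self.0[i].wrapping_add(rhs.0[i]);
        }
        Self(out)
    }
}
```

The real types carry the same shape: a `pub` inner field, a zeroed `Default`, `From`/`Into` conversions, and operator impls that forward to the intrinsics.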
- Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
- Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the `safe_arch` function takes a reference of an appropriate type to uphold safety.
  - Try the bytemuck crate (and turn on the `bytemuck` feature of this crate) if you want help safely casting between reference types.
- Some intrinsics are not safe unless you're very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren't exposed here.
- Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn't really support you messing with that within a high level language, so those operations aren't exposed here. Use assembly or something if you want to do that.
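The reference-taking style described in the second bullet can be sketched like this (a hypothetical wrapper, not one of the crate's actual signatures):

```rust
// Stand-in for a raw intrinsic: the caller must guarantee that `p` is
// valid for reading four i32 values.
unsafe fn load_four_i32_ptr(p: *const i32) -> [i32; 4] {
    [*p, *p.add(1), *p.add(2), *p.add(3)]
}

// The safe_arch style: taking `&[i32; 4]` makes the alignment and
// validity obligations part of the type, so the unsafe block inside is
// sound by construction.
pub fn load_four_i32(src: &[i32; 4]) -> [i32; 4] {
    unsafe { load_four_i32_ptr(src.as_ptr()) }
}
```

Going the other way, turning a `&[u8]` slice into a `&[i32; 4]` reference safely, is exactly the sort of cast bytemuck helps with.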
Naming Conventions
The actual names for each intrinsic are generally a flaming dumpster of
letters that only make sense after you've learned all the names. They're
very bad for learning what things do. Accordingly, `safe_arch` uses very
verbose naming that (hopefully) improves the new-user experience.
- Function names start with the primary "verb" of the operation, and then any adverbs go after that. This makes for slightly awkward English but helps the list of all the functions sort a little better.
  - Eg: `add_i32_m128i` and `add_i16_saturating_m128i`
- Function names end with the register type they're most associated with.
  - Eg: `and_m128` (for `m128`) and `and_m128d` (for `m128d`)
- If a function operates on just the lowest data lane it generally has `_s` after the register type, because it's a "scalar" operation. The higher lanes are generally just copied forward, or taken from a secondary argument, or something. Details vary.
  - Eg: `sqrt_m128` (all lanes) and `sqrt_m128_s` (low lane only)
Of course, people can't even always agree on what words mean. The common verb names for this crate, and their conventions, are as follows:
- `load`: Reads memory into a register (derefs a `&Foo` to a `Foo`).
- `store`: Writes a register to memory (writes a `Foo` to a `&mut Foo`).
- `set`: Packs values into a register (works like `[1, 2, 3, 4]` to build an array).
- `splat`: Copies a value as many times as possible across the bits of a register (works like `[1_i32; LEN]` array building).
- `extract`: Gets an individual lane out of a SIMD register (works like array access). The lane to get has to be a const value.
- `insert`: Duplicates a register and then replaces the value of a specific lane (works like `let mut a2 = a.clone(); a2[i] = new;`). The lane to overwrite has to be a const value.
- `cast`: Changes data types while preserving the bit pattern (like how `transmute` would do it).
- `convert`: Changes data types while trying to preserve the numeric value (which might change the bits, like how `as` would do it).
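The verb conventions above can be sketched over a plain four-lane array stand-in (illustrative helpers, not the crate's actual functions):

```rust
type Lanes = [i32; 4];

// `set`: pack individual values into a register-like value.
fn set(a: i32, b: i32, c: i32, d: i32) -> Lanes {
    [a, b, c, d]
}

// `splat`: copy one value across every lane.
fn splat(v: i32) -> Lanes {
    [v; 4]
}

// `extract`: read one lane; as in the real crate, the lane is const.
fn extract<const LANE: usize>(r: Lanes) -> i32 {
    r[LANE]
}

// `insert`: duplicate the register, replacing one const-indexed lane.
fn insert<const LANE: usize>(r: Lanes, new: i32) -> Lanes {
    let mut out = r;
    out[LANE] = new;
    out
}
```

In the crate itself the const lane requirement is why `extract`/`insert` style operations appear among the macros: the lane index has to be an immediate at the machine-code level.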
This crate is pre-1.0 and if you feel that an operation should have a better name to improve the crate's consistency please file an issue.
Current Support
- Intel and AMD (`x86`/`x86_64`)
  - 128-bit: `sse`, `sse2`, `sse3`, `ssse3`, `sse4.1`, `sse4.2`
  - 256-bit: `avx`
  - Other: `adx`, `aes`, `bmi1`, `bmi2`, `lzcnt`, `pclmulqdq`, `popcnt`, `rdrand`, `rdseed`
Compile Time CPU Target Features
At the time of writing, Rust enables the `sse` and `sse2` CPU features by
default for all `i686` (x86) and `x86_64` builds. Those CPU features are
built into the design of `x86_64`, and you'd need a very old `x86` CPU for
it to not support at least `sse` and `sse2`, so they're a safe bet for the
language to enable all the time. In fact, because the standard library is
compiled with them enabled, simply trying to disable those features would
actually cause ABI issues and fill your program with UB (link).
If you want additional CPU features available at compile time you'll have to
enable them with an additional arg to `rustc`. For a feature named `name`
you pass `-C target-feature=+name`, such as `-C target-feature=+sse3` for
`sse3`.
You can alternately enable all target features of the current CPU with
`-C target-cpu=native`. This is primarily of use if you're building a
program you'll only run on your own system.
It's sometimes hard to know if your target platform will support a given
feature set, but the Steam Hardware Survey is generally taken as a guide to
what you can expect people to have available. If you click "Other Settings"
it'll expand into a list of CPU target features and how common they are.
These days, it seems that `sse3` can be safely assumed, and `ssse3`,
`sse4.1`, and `sse4.2` are pretty safe bets as well. The stuff above 128-bit
isn't as common yet; give it another few years.
Please note that executing a program on a CPU that doesn't support the target features it was compiled for is Undefined Behavior.
Currently, Rust doesn't actually support an easy way for you to check that a
feature enabled at compile time is actually available at runtime. There is
the `feature_detected` family of macros, but if you enable a feature they
will evaluate to a constant `true` instead of actually deferring the check
for the feature to runtime. This means that, if you did want a check at the
start of your program, to confirm that all the assumed features are present
and error out when the assumptions don't hold, you can't use that macro. You
gotta use CPUID and check manually. rip. Hopefully we can make that process
easier in a future version of this crate.
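A manual CPUID check of the kind described can be sketched with the `__cpuid` intrinsic from `core::arch`, which is always available on `x86_64` regardless of target features (this is an assumed-correct sketch of one feature bit, SSE2 in CPUID leaf 1, EDX bit 26, not a full feature-verification routine):

```rust
// Runtime check for SSE2 via raw CPUID, bypassing the compile-time
// constant-folding that the feature_detected macros are subject to.
#[cfg(target_arch = "x86_64")]
fn sse2_supported_at_runtime() -> bool {
    // CPUID leaf 1 reports SSE2 in EDX bit 26. `__cpuid` itself does not
    // depend on any optional target feature on x86_64.
    let r = unsafe { core::arch::x86_64::__cpuid(1) };
    (r.edx >> 26) & 1 == 1
}

// On non-x86_64 targets there is no CPUID; report unsupported.
#[cfg(not(target_arch = "x86_64"))]
fn sse2_supported_at_runtime() -> bool {
    false
}
```

A real startup check would loop over every feature the build assumed, looking up each one's CPUID leaf and bit, and error out on any mismatch.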
A Note On Working With Cfg
There are two main ways to use `cfg`:
- Via an attribute placed on an item, block, or expression:
  `#[cfg(debug_assertions)] println!("hello");`
- Via a macro used within an expression position:
  `if cfg!(debug_assertions) { println!("hello"); }`
The difference might seem small, but it's actually very important:
- The attribute form includes or excludes the code before checking whether all the items it names really exist. This means that code configured via attribute can safely name things that don't always exist, as long as the things it names do exist whenever that code is configured into the build.
- The macro form will include the configured code no matter what, and then the macro resolves to a constant `true` or `false` and the compiler uses dead code elimination to cut out the path not taken.
This crate uses `cfg` via the attribute, so the functions it exposes don't
exist at all when the appropriate CPU target features aren't enabled.
Accordingly, if you plan to call this crate or not depending on what
features are enabled in the build, you'll also need to control your use of
this crate via the cfg attribute, not the cfg macro.
Macros
aes_key_gen_assist_m128i | aes ? |
blend_immediate_i16_m128i | Blends the |
blend_immediate_m128d | Blends the lanes according to the immediate mask. |
blend_immediate_m128 | Blends the lanes according to the immediate mask. |
blend_immediate_m256d | avx Blends the |
blend_immediate_m256 | avx Blends the |
byte_shift_left_u128_immediate_m128i | Shifts all bits in the entire register left by a number of bytes. |
byte_shift_right_u128_immediate_m128i | Shifts all bits in the entire register right by a number of bytes. |
cmp_op_mask_m128 | avx Compare |
cmp_op_mask_m128_s | avx Compare |
cmp_op_mask_m128d | avx Compare |
cmp_op_mask_m128d_s | avx Compare |
cmp_op_mask_m256 | avx Compare |
cmp_op_mask_m256d | avx Compare |
combined_byte_shift_right_immediate_m128i | Counts |
comparison_operator_translation | avx Turns a comparison operator token to the correct constant value. |
dot_product_m128d | Performs a dot product of two |
dot_product_m128 | Performs a dot product of two |
dot_product_m256 | avx This works like |
extract_f32_as_i32_bits_immediate_m128 | Gets the |
extract_i16_as_i32_m128i | Gets an |
extract_i32_from_m256i | avx Extracts an |
extract_i32_immediate_m128i | Gets the |
extract_i64_from_m256i | avx Extracts an |
extract_i64_immediate_m128i | Gets the |
extract_i8_as_i32_immediate_m128i | Gets the |
extract_m128_from_m256 | avx Extracts an |
extract_m128d_from_m256d | avx Extracts an |
extract_m128i_from_m256i | avx Extracts an |
insert_f32_immediate_m128 | Inserts a lane from |
insert_i16_from_i32_m128i | Inserts the low 16 bits of an |
insert_i16_to_m256i | avx Inserts an |
insert_i32_immediate_m128i | Inserts a new value for the |
insert_i32_to_m256i | avx Inserts an |
insert_i64_immediate_m128i | Inserts a new value for the |
insert_i64_to_m256i | avx Inserts an |
insert_i8_immediate_m128i | Inserts a new value for the |
insert_i8_to_m256i | avx Inserts an |
insert_m128_to_m256 | avx Inserts an |
insert_m128d_to_m256d | avx Inserts an |
insert_m128i_to_m256i | avx Inserts an |
mul_i64_carryless_m128i | pclmulqdq Performs a "carryless" multiplication of two |
multi_packed_sum_abs_diff_u8_m128i | Computes eight |
permute_f128_in_m256d | avx Permutes the lanes around. |
permute_f128_in_m256 | avx Permutes the lanes around. |
permute_i128_in_m256i | avx Permutes the lanes around. |
permute_m128d | avx Permutes the lanes around. |
permute_m128 | avx Permutes the lanes around. |
permute_m256d | avx Permutes the lanes around. |
permute_m256 | avx Permutes the lanes around. |
round_m128d | Rounds each lane in the style specified. |
round_m128d_s | Rounds |
round_m128 | Rounds each lane in the style specified. |
round_m128_s | Rounds |
round_m256d | avx Rounds each lane in the style specified. |
round_m256 | avx Rounds each lane in the style specified. |
shift_left_i16_immediate_m128i | Shifts all |
shift_left_i32_immediate_m128i | Shifts all |
shift_left_i64_immediate_m128i | Shifts both |
shift_right_i16_immediate_m128i | Shifts all |
shift_right_i32_immediate_m128i | Shifts all |
shift_right_u16_immediate_m128i | Shifts all |
shift_right_u32_immediate_m128i | Shifts all |
shift_right_u64_immediate_m128i | Shifts both |
shuffle_i16_high_lanes_m128i | Shuffles the higher |
shuffle_i16_low_lanes_m128i | Shuffles the lower |
shuffle_i32_m128i | Shuffles the |
shuffle_m128 | Shuffles the lanes around. |
shuffle_m128d | Shuffles the lanes around. |
shuffle_m256d | avx Shuffles the |
shuffle_m256 | avx Shuffles the |
string_search_for_index | sse4.2 Looks for |
string_search_for_mask | sse4.2 Looks for |
Structs
m128 | The data for a 128-bit SSE register of four `f32` lanes. |
m128d | The data for a 128-bit SSE register of two `f64` lanes. |
m128i | The data for a 128-bit SSE register of integer data. |
m256 | The data for a 256-bit AVX register of eight `f32` lanes. |
m256d | The data for a 256-bit AVX register of four `f64` lanes. |
m256i | The data for a 256-bit AVX register of integer data. |
Functions
abs_i16_m128i | Lanewise absolute value with lanes as |
abs_i32_m128i | Lanewise absolute value with lanes as |
abs_i8_m128i | Lanewise absolute value with lanes as |
add_carry_u32 | adx Add two |
add_carry_u64 | adx Add two |
add_horizontal_i16_m128i | Add horizontal pairs of |
add_horizontal_i32_m128i | Add horizontal pairs of |
add_horizontal_m128d | Add each lane horizontally, pack the outputs as |
add_horizontal_m128 | Add each lane horizontally, pack the outputs as |
add_horizontal_m256d | avx Add adjacent |
add_horizontal_m256 | avx Add adjacent |
add_horizontal_saturating_i16_m128i | Add horizontal pairs of |
add_i16_m128i | Lanewise |
add_i32_m128i | Lanewise |
add_i64_m128i | Lanewise |
add_i8_m128i | Lanewise |
add_m128 | sse Lanewise |
add_m128_s | Low lane |
add_m128d | Lanewise |
add_m128d_s | Lowest lane |
add_m256d | avx Lanewise |
add_m256 | avx Lanewise |
add_saturating_i16_m128i | Lanewise saturating |
add_saturating_i8_m128i | Lanewise saturating |
add_saturating_u16_m128i | Lanewise saturating |
add_saturating_u8_m128i | Lanewise saturating |
add_sub_m128d | Add the high lane and subtract the low lane. |
add_sub_m128 | Alternately, from the top, add a lane and then subtract a lane. |
add_sub_m256d | avx Alternately, from the top, add |
add_sub_m256 | avx Alternately, from the top, add |
aes_decrypt_last_m128i | aes Perform the last round of AES decryption flow on |
aes_decrypt_m128i | aes Perform one round of AES decryption flow on |
aes_encrypt_last_m128i | aes Perform the last round of AES encryption flow on |
aes_encrypt_m128i | aes Perform one round of AES encryption flow on |
aes_inv_mix_columns_m128i | aes Perform the InvMixColumns transform on |
and_m128 | Bitwise |
and_m128d | Bitwise |
and_m128i | Bitwise |
and_m256d | avx Bitwise |
and_m256 | avx Bitwise |
andnot_m128 | Bitwise |
andnot_m128d | Bitwise |
andnot_m128i | Bitwise |
andnot_m256d | avx Bitwise |
andnot_m256 | avx Bitwise |
andnot_u32 | bmi1 Bitwise |
andnot_u64 | bmi1 Bitwise |
average_u16_m128i | Lanewise average of the |
average_u8_m128i | Lanewise average of the |
bit_extract2_u32 | bmi1 Extract a span of bits from the |
bit_extract2_u64 | bmi1 Extract a span of bits from the |
bit_extract_u32 | bmi1 Extract a span of bits from the |
bit_extract_u64 | bmi1 Extract a span of bits from the |
bit_lowest_set_mask_u32 | bmi1 Gets the mask of all bits up to and including the lowest set bit in a |
bit_lowest_set_mask_u64 | bmi1 Gets the mask of all bits up to and including the lowest set bit in a |
bit_lowest_set_reset_u32 | bmi1 Resets (clears) the lowest set bit. |
bit_lowest_set_reset_u64 | bmi1 Resets (clears) the lowest set bit. |
bit_lowest_set_value_u32 | bmi1 Gets the value of the lowest set bit in a |
bit_lowest_set_value_u64 | bmi1 Gets the value of the lowest set bit in a |
bit_zero_high_index_u32 | bmi2 Zero out all high bits in a |
bit_zero_high_index_u64 | bmi2 Zero out all high bits in a |
blend_varying_i8_m128i | Blend the |
blend_varying_m128d | Blend the lanes according to a runtime varying mask. |
blend_varying_m128 | Blend the lanes according to a runtime varying mask. |
blend_varying_m256d | avx Blend the lanes according to a runtime varying mask. |
blend_varying_m256 | avx Blend the lanes according to a runtime varying mask. |
cast_from_m256_to_m256d | avx Bit-preserving cast from |
cast_from_m256_to_m256i | avx Bit-preserving cast from |
cast_from_m256d_to_m256 | avx Bit-preserving cast from |
cast_from_m256d_to_m256i | avx Bit-preserving cast from |
cast_from_m256i_to_m256d | avx Bit-preserving cast from |
cast_from_m256i_to_m256 | avx Bit-preserving cast from |
cast_to_m128_from_m128d | Bit-preserving cast to |
cast_to_m128_from_m128i | Bit-preserving cast to |
cast_to_m128d_from_m128 | Bit-preserving cast to |
cast_to_m128d_from_m128i | Bit-preserving cast to |
cast_to_m128i_from_m128d | Bit-preserving cast to |
cast_to_m128i_from_m128 | Bit-preserving cast to |
ceil_m128d | Round each lane to a whole number, towards positive infinity |
ceil_m128 | Round each lane to a whole number, towards positive infinity |
ceil_m128d_s | Round the low lane of |
ceil_m128_s | Round the low lane of |
ceil_m256d | avx Round |
ceil_m256 | avx Round |
cmp_eq_i32_m128_s | Low lane equality. |
cmp_eq_i32_m128d_s | Low lane |
cmp_eq_mask_i16_m128i | Lanewise |
cmp_eq_mask_i32_m128i | Lanewise |
cmp_eq_mask_i64_m128i | Lanewise |
cmp_eq_mask_i8_m128i | Lanewise |
cmp_eq_mask_m128 | Lanewise |
cmp_eq_mask_m128_s | Low lane |
cmp_eq_mask_m128d | Lanewise |
cmp_eq_mask_m128d_s | Low lane |
cmp_ge_i32_m128_s | Low lane greater than or equal to. |
cmp_ge_i32_m128d_s | Low lane |
cmp_ge_mask_m128 | Lanewise |
cmp_ge_mask_m128_s | Low lane |
cmp_ge_mask_m128d | Lanewise |
cmp_ge_mask_m128d_s | Low lane |
cmp_gt_i32_m128_s | Low lane greater than. |
cmp_gt_i32_m128d_s | Low lane |
cmp_gt_mask_i16_m128i | Lanewise |
cmp_gt_mask_i32_m128i | Lanewise |
cmp_gt_mask_i64_m128i | sse4.2 Lanewise |
cmp_gt_mask_i8_m128i | Lanewise |
cmp_gt_mask_m128 | Lanewise |
cmp_gt_mask_m128_s | Low lane |
cmp_gt_mask_m128d | Lanewise |
cmp_gt_mask_m128d_s | Low lane |
cmp_le_i32_m128_s | Low lane less than or equal to. |
cmp_le_i32_m128d_s | Low lane |
cmp_le_mask_m128 | Lanewise |
cmp_le_mask_m128_s | Low lane |
cmp_le_mask_m128d | Lanewise |
cmp_le_mask_m128d_s | Low lane |
cmp_lt_i32_m128_s | Low lane less than. |
cmp_lt_i32_m128d_s | Low lane |
cmp_lt_mask_i16_m128i | Lanewise |
cmp_lt_mask_i32_m128i | Lanewise |
cmp_lt_mask_i8_m128i | Lanewise |
cmp_lt_mask_m128 | Lanewise |
cmp_lt_mask_m128_s | Low lane |
cmp_lt_mask_m128d | Lanewise |
cmp_lt_mask_m128d_s | Low lane |
cmp_neq_i32_m128_s | Low lane not equal to. |
cmp_neq_i32_m128d_s | Low lane |
cmp_neq_mask_m128 | Lanewise |
cmp_neq_mask_m128_s | Low lane |
cmp_neq_mask_m128d | Lanewise |
cmp_neq_mask_m128d_s | Low lane |
cmp_nge_mask_m128 | Lanewise |
cmp_nge_mask_m128_s | Low lane |
cmp_nge_mask_m128d | Lanewise |
cmp_nge_mask_m128d_s | Low lane |
cmp_ngt_mask_m128 | Lanewise |
cmp_ngt_mask_m128_s | Low lane |
cmp_ngt_mask_m128d | Lanewise |
cmp_ngt_mask_m128d_s | Low lane |
cmp_nle_mask_m128 | Lanewise |
cmp_nle_mask_m128_s | Low lane |
cmp_nle_mask_m128d | Lanewise |
cmp_nle_mask_m128d_s | Low lane |
cmp_nlt_mask_m128 | Lanewise |
cmp_nlt_mask_m128_s | Low lane |
cmp_nlt_mask_m128d | Lanewise |
cmp_nlt_mask_m128d_s | Low lane |
cmp_ord_mask_m128 | Lanewise |
cmp_ord_mask_m128_s | Low lane |
cmp_ord_mask_m128d | Lanewise |
cmp_ord_mask_m128d_s | Low lane |
cmp_unord_mask_m128 | Lanewise |
cmp_unord_mask_m128_s | Low lane |
cmp_unord_mask_m128d | Lanewise |
cmp_unord_mask_m128d_s | Low lane |
convert_i16_lower2_to_i64_m128i | Convert the lower two |
convert_i16_lower4_to_i32_m128i | Convert the lower four |
convert_i32_lower2_to_i64_m128i | Convert the lower two |
convert_i32_replace_m128_s | Convert |
convert_i32_replace_m128d_s | Convert |
convert_i64_replace_m128d_s | Convert |
convert_i8_lower2_to_i64_m128i | Convert the lower two |
convert_i8_lower4_to_i32_m128i | Convert the lower four |
convert_i8_lower8_to_i16_m128i | Convert the lower eight |
convert_m128_s_replace_m128d_s | Converts the lower |
convert_m128d_s_replace_m128_s | Converts the low |
convert_to_f32_from_m256_s | avx Convert the lowest |
convert_to_f64_from_m256d_s | avx Convert the lowest |
convert_to_i32_from_m256i_s | avx Convert the lowest |
convert_to_i32_m128i_from_m256d | avx Convert |
convert_to_i32_m256i_from_m256 | avx Convert |
convert_to_m128_from_m128i | Rounds the four |
convert_to_m128_from_m128d | Rounds the two |
convert_to_m128_from_m256d | avx Convert |
convert_to_m128d_from_m128i | Rounds the lower two |
convert_to_m128d_from_m128 | Rounds the two |
convert_to_m128i_from_m128d | Rounds the two |
convert_to_m128i_from_m128 | Rounds the two |
convert_to_m128i_from_m256d | avx Convert |
convert_to_m256_from_i32_m256i | avx Convert |
convert_to_m256d_from_i32_m128i | avx Convert |
convert_to_m256d_from_m128 | avx Convert |
convert_to_m256i_from_m256 | avx Convert |
convert_u16_lower2_to_u64_m128i | Convert the lower two |
convert_u16_lower4_to_u32_m128i | Convert the lower four |
convert_u32_lower2_to_u64_m128i | Convert the lower two |
convert_u8_lower2_to_u64_m128i | Convert the lower two |
convert_u8_lower4_to_u32_m128i | Convert the lower four |
convert_u8_lower8_to_u16_m128i | Convert the lower eight |
copy_i64_m128i_s | Copy the low |
copy_replace_low_f64_m128d | Copies the |
crc32_u8 | sse4.2 Accumulates the |
crc32_u16 | sse4.2 Accumulates the |
crc32_u32 | sse4.2 Accumulates the |
crc32_u64 | sse4.2 Accumulates the |
div_m128 | Lanewise |
div_m128_s | Low lane |
div_m128d | Lanewise |
div_m128d_s | Lowest lane |
div_m256d | avx Lanewise |
div_m256 | avx Lanewise |
duplicate_even_lanes_m128 | Duplicate the even lanes to the odd lanes. |
duplicate_even_lanes_m256 | avx Duplicate the even-indexed lanes to the odd lanes. |
duplicate_low_lane_m128d_s | Copy the low lane of the input to both lanes of the output. |
duplicate_odd_lanes_m128 | Duplicate the odd lanes to the even lanes. |
duplicate_odd_lanes_m256d | avx Duplicate the odd-indexed lanes to the even lanes. |
duplicate_odd_lanes_m256 | avx Duplicate the odd-indexed lanes to the even lanes. |
floor_m128d | Round each lane to a whole number, towards negative infinity |
floor_m128 | Round each lane to a whole number, towards negative infinity |
floor_m128d_s | Round the low lane of |
floor_m128_s | Round the low lane of |
floor_m256d | avx Round |
floor_m256 | avx Round |
get_f32_from_m128_s | Gets the low lane as an individual |
get_f64_from_m128d_s | Gets the lower lane as an |
get_i32_from_m128_s | Converts the low lane to |
get_i32_from_m128d_s | Converts the lower lane to an |
get_i32_from_m128i_s | Converts the lower lane to an |
get_i64_from_m128d_s | Converts the lower lane to an |
get_i64_from_m128i_s | Converts the lower lane to an |
leading_zero_count_u32 | lzcnt Count the leading zeroes in a |
leading_zero_count_u64 | lzcnt Count the leading zeroes in a |
load_f32_m128_s | Loads the |
load_f32_splat_m128 | Loads the |
load_f32_splat_m256 | avx Load an |
load_f64_m128d_s | Loads the reference into the low lane of the register. |
load_f64_splat_m128d | Loads the |
load_f64_splat_m256d | avx Load an |
load_i64_m128i_s | Loads the low |
load_m128 | Loads the reference into a register. |
load_m128d | Loads the reference into a register. |
load_m128i | Loads the reference into a register. |
load_m256d | avx Load data from memory into a register. |
load_m256 | avx Load data from memory into a register. |
load_m256i | avx Load data from memory into a register. |
load_m128_splat_m256 | avx Load an |
load_m128d_splat_m256d | avx Load an |
load_masked_m128d | avx Load data from memory into a register according to a mask. |
load_masked_m128 | avx Load data from memory into a register according to a mask. |
load_masked_m256d | avx Load data from memory into a register according to a mask. |
load_masked_m256 | avx Load data from memory into a register according to a mask. |
load_replace_high_m128d | Loads the reference into a register, replacing the high lane. |
load_replace_low_m128d | Loads the reference into a register, replacing the low lane. |
load_reverse_m128 | Loads the reference into a register with reversed order. |
load_reverse_m128d | Loads the reference into a register with reversed order. |
load_unaligned_hi_lo_m256d | avx Load data from memory into a register. |
load_unaligned_hi_lo_m256 | avx Load data from memory into a register. |
load_unaligned_hi_lo_m256i | avx Load data from memory into a register. |
load_unaligned_m128 | Loads the reference into a register. |
load_unaligned_m128d | Loads the reference into a register. |
load_unaligned_m128i | Loads the reference into a register. |
load_unaligned_m256d | avx Load data from memory into a register. |
load_unaligned_m256 | avx Load data from memory into a register. |
load_unaligned_m256i | avx Load data from memory into a register. |
max_i16_m128i | Lanewise |
max_i32_m128i | Lanewise |
max_i8_m128i | Lanewise |
max_m128 | Lanewise |
max_m128_s | Low lane |
max_m128d | Lanewise |
max_m128d_s | Low lane |
max_m256d | avx Lanewise |
max_m256 | avx Lanewise |
max_u16_m128i | Lanewise |
max_u32_m128i | Lanewise |
max_u8_m128i | Lanewise |
min_i16_m128i | Lanewise |
min_i32_m128i | Lanewise |
min_i8_m128i | Lanewise |
min_m128 | Lanewise |
min_m128_s | Low lane |
min_m128d | Lanewise |
min_m128d_s | Low lane |
min_m256d | avx Lanewise |
min_m256 | avx Lanewise |
min_position_u16_m128i | Min |
min_u16_m128i | Lanewise |
min_u32_m128i | Lanewise |
min_u8_m128i | Lanewise |
move_high_low_m128 | Move the high lanes of |
move_low_high_m128 | Move the low lanes of |
move_m128_s | Move the low lane of |
move_mask_i8_m128i | Gathers the |
move_mask_m128 | Gathers the sign bit of each lane. |
move_mask_m128d | Gathers the sign bit of each lane. |
move_mask_m256d | avx Collects the sign bit of each lane into a 4-bit value. |
move_mask_m256 | avx Collects the sign bit of each lane into a 4-bit value. |
mul_extended_u32 | bmi2 Multiply two |
mul_extended_u64 | bmi2 Multiply two |
mul_i16_horizontal_add_m128i | Multiply |
mul_i16_keep_high_m128i | Lanewise |
mul_i16_keep_low_m128i | Lanewise |
mul_i16_scale_round_m128i | Multiply |
mul_i32_keep_low_m128i | Lanewise |
mul_i64_widen_low_bits_m128i | Multiplies the lower 32 bits (only) of each |
mul_m128 | Lanewise |
mul_m128_s | Low lane |
mul_m128d | Lanewise |
mul_m128d_s | Lowest lane |
mul_m256d | avx Lanewise |
mul_m256 | avx Lanewise |
mul_u16_keep_high_m128i | Lanewise |
mul_u64_widen_low_bits_m128i | Multiplies the lower 32 bits (only) of each |
mul_u8i8_add_horizontal_saturating_m128i | This is dumb and weird. |
or_m128 | Bitwise |
or_m128d | Bitwise |
or_m128i | Bitwise |
or_m256d | avx Bitwise |
or_m256 | avx Bitwise |
pack_i16_to_i8_m128i | Saturating convert |
pack_i16_to_u8_m128i | Saturating convert |
pack_i32_to_i16_m128i | Saturating convert |
pack_i32_to_u16_m128i | Saturating convert |
permute_varying_m128d | avx Permute with a runtime varying pattern. |
permute_varying_m128 | avx Permute with a runtime varying pattern. |
permute_varying_m256d | avx Permute with a runtime varying pattern. |
permute_varying_m256 | avx Permute with a runtime varying pattern. |
population_count_i32 | popcnt Count the number of bits set within an |
population_count_i64 | popcnt Count the number of bits set within an |
population_deposit_u32 | bmi2 Deposit contiguous low bits from a |
population_deposit_u64 | bmi2 Deposit contiguous low bits from a |
population_extract_u32 | bmi2 Extract bits from a |
population_extract_u64 | bmi2 Extract bits from a |
rdrand_u16 | rdrand Try to obtain a random |
rdrand_u32 | rdrand Try to obtain a random |
rdrand_u64 | rdrand Try to obtain a random |
rdseed_u16 | rdseed Try to obtain a random |
rdseed_u32 | rdseed Try to obtain a random |
rdseed_u64 | rdseed Try to obtain a random |
reciprocal_m128 | Lanewise |
reciprocal_m128_s | Low lane |
reciprocal_m256 | avx Reciprocal of |
reciprocal_sqrt_m128 | Lanewise |
reciprocal_sqrt_m128_s | Low lane |
reciprocal_sqrt_m256 | avx Reciprocal of |
set_i16_m128i | Sets the args into an |
set_i16_m256i | avx Set |
set_i32_m128i_s | Set an |
set_i32_m128i | Sets the args into an |
set_i32_m256i | avx Set |
set_i64_m128i_s | Set an |
set_i64_m128i | Sets the args into an |
set_i8_m128i | Sets the args into an |
set_i8_m256i | avx Set |
set_m128 | Sets the args into an |
set_m128_s | Sets the args into an |
set_m128d | Sets the args into an |
set_m128d_s | Sets the args into the low lane of a |
set_m256d | avx Set |
set_m256 | avx Set |
set_m128d_m256d | avx Set |
set_m128i_m256i | avx Set |
set_reversed_i16_m128i | Sets the args into an |
set_reversed_i16_m256i | avx Set |
set_reversed_i32_m128i | Sets the args into an |
set_reversed_i32_m256i | avx Set |
set_reversed_i8_m128i | Sets the args into an |
set_reversed_i8_m256i | avx Set |
set_reversed_m128 | Sets the args into an |
set_reversed_m128d | Sets the args into an |
set_reversed_m256d | avx Set |
set_reversed_m256 | avx Set |
set_reversed_m128d_m256d | avx Set |
set_reversed_m128i_m256i | avx Set |
set_splat_i16_m256i | avx Splat an |
set_splat_i32_m256i | avx Splat an |
set_splat_i8_m256i | avx Splat an |
set_splat_m256 | avx Splat an |
shift_left_i16_m128i | Shift each |
shift_left_i32_m128i | Shift each |
shift_left_i64_m128i | Shift each |
shift_right_i16_m128i | Shift each |
shift_right_i32_m128i | Shift each |
shift_right_u16_m128i | Shift each |
shift_right_u32_m128i | Shift each |
shift_right_u64_m128i | Shift each |
shuffle_i8_m128i | Shuffles the |
sign_apply_i16_m128i | Applies the sign of |
sign_apply_i32_m128i | Applies the sign of |
sign_apply_i8_m128i | Applies the sign of |
splat_i16_m128i | Splats the |
splat_i32_m128i | Splats the |
splat_i64_m128i | Splats the |
splat_i8_m128i | Splats the |
splat_m128 | Splats the value to all lanes. |
splat_m128d | Splats the args into both lanes of the |
sqrt_m128 | Lanewise |
sqrt_m128_s | Low lane |
sqrt_m128d | Lanewise |
sqrt_m128d_s | Low lane |
sqrt_m256d | avx Lanewise |
sqrt_m256 | avx Lanewise |
store_high_m128d_s | Stores the high lane value to the reference given. |
store_i64_m128i_s | Stores the value to the reference given. |
store_m128 | Stores the value to the reference given. |
store_m128_s | Stores the low lane value to the reference given. |
store_m128d | Stores the value to the reference given. |
store_m128d_s | Stores the low lane value to the reference given. |
store_m128i | Stores the value to the reference given. |
store_m256d | avx Store data from a register into memory. |
store_m256 | avx Store data from a register into memory. |
store_m256i | avx Store data from a register into memory. |
store_masked_m128d | avx Store data from a register into memory according to a mask. |
store_masked_m128 | avx Store data from a register into memory according to a mask. |
store_masked_m256d | avx Store data from a register into memory according to a mask. |
store_masked_m256 | avx Store data from a register into memory according to a mask. |
store_reverse_m128 | Stores the value to the reference given in reverse order. |
store_reversed_m128d | Stores the value to the reference given. |
store_splat_m128 | Stores the low lane value to all lanes of the reference given. |
store_splat_m128d | Stores the low lane value to all lanes of the reference given. |
store_unaligned_hi_lo_m256d | avx Store data from a register into memory. |
store_unaligned_hi_lo_m256 | avx Store data from a register into memory. |
store_unaligned_hi_lo_m256i | avx Store data from a register into memory. |
store_unaligned_m128 | Stores the value to the reference given. |
store_unaligned_m128d | Stores the value to the reference given. |
store_unaligned_m128i | Stores the value to the reference given. |
store_unaligned_m256d | avx Store data from a register into memory. |
store_unaligned_m256 | avx Store data from a register into memory. |
store_unaligned_m256i | avx Store data from a register into memory. |
sub_horizontal_i16_m128i | Subtract horizontal pairs of |
sub_horizontal_i32_m128i | Subtract horizontal pairs of |
sub_horizontal_m128d | Subtract each lane horizontally, pack the outputs as |
sub_horizontal_m128 | Subtract each lane horizontally, pack the outputs as |
sub_horizontal_m256d | avx Subtract adjacent |
sub_horizontal_m256 | avx Subtract adjacent |
sub_horizontal_saturating_i16_m128i | Subtract horizontal pairs of |
sub_i16_m128i | Lanewise |
sub_i32_m128i | Lanewise |
sub_i64_m128i | Lanewise |
sub_i8_m128i | Lanewise |
sub_m128 | Lanewise |
sub_m128_s | Low lane |
sub_m128d | Lanewise |
sub_m128d_s | Lowest lane |
sub_m256d | avx Lanewise |
sub_m256 | avx Lanewise |
sub_saturating_i16_m128i | Lanewise saturating |
sub_saturating_i8_m128i | Lanewise saturating |
sub_saturating_u16_m128i | Lanewise saturating |
sub_saturating_u8_m128i | Lanewise saturating |
sum_of_u8_abs_diff_m128i | Compute "sum of |
test_all_ones_m128i | Tests if all bits are 1. |
test_all_zeroes_m128i | Returns if all masked bits are 0, |
test_mixed_ones_and_zeroes_m128i | Returns if, among the masked bits, there's both 0s and 1s |
trailing_zero_count_u32 | bmi1 Counts the number of trailing zero bits in a |
trailing_zero_count_u64 | bmi1 Counts the number of trailing zero bits in a |
transpose_four_m128 | Transpose four |
truncate_m128_to_m128i | Truncate the |
truncate_m128d_to_m128i | Truncate the |
truncate_to_i32_m128d_s | Truncate the lower lane into an |
truncate_to_i64_m128d_s | Truncate the lower lane into an |
unpack_hi_m256d | avx Unpack and interleave the high lanes. |
unpack_hi_m256 | avx Unpack and interleave the high lanes. |
unpack_high_i16_m128i | Unpack and interleave high |
unpack_high_i32_m128i | Unpack and interleave high |
unpack_high_i64_m128i | Unpack and interleave high |
unpack_high_i8_m128i | Unpack and interleave high |
unpack_high_m128 | Unpack and interleave high lanes of |
unpack_high_m128d | Unpack and interleave high lanes of |
unpack_lo_m256d | avx Unpack and interleave the low lanes. |
unpack_lo_m256 | avx Unpack and interleave the low lanes. |
unpack_low_i16_m128i | Unpack and interleave low |
unpack_low_i32_m128i | Unpack and interleave low |
unpack_low_i64_m128i | Unpack and interleave low |
unpack_low_i8_m128i | Unpack and interleave low |
unpack_low_m128 | Unpack and interleave low lanes of |
unpack_low_m128d | Unpack and interleave low lanes of |
xor_m128 | Bitwise |
xor_m128d | Bitwise |
xor_m128i | Bitwise |
xor_m256d | avx Bitwise |
xor_m256 | avx Bitwise |
zero_extend_m128d | avx Zero extend an |
zero_extend_m128 | avx Zero extend an |
zero_extend_m128i | avx Zero extend an |
zeroed_m128 | All lanes zero. |
zeroed_m128i | All lanes zero. |
zeroed_m128d | Both lanes zero. |
zeroed_m256d | avx A zeroed |
zeroed_m256 | avx A zeroed |
zeroed_m256i | avx A zeroed |