Crate safe_arch

A crate that safely exposes arch intrinsics via #[cfg()].

safe_arch lets you safely use CPU intrinsics (those things in the core::arch modules). It works purely via #[cfg()] and compile time CPU feature declaration. If you want to check for a feature at runtime and then call an intrinsic or use a fallback path based on that, then this crate is sadly not for you.

SIMD register types are "newtype'd" so that better trait impls can be given to them, but the inner value is a pub field so feel free to just grab it out if you need to. Trait impls of the newtypes include: Default (zeroed), From/Into of appropriate data types, and appropriate operator overloading.

  • Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
  • Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
    • Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
  • Some intrinsics are not safe unless you're very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren't exposed here.
  • Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn't really support you messing with that within a high level language, so those operations aren't exposed here. Use assembly or something if you want to do that.
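
As a rough illustration of the reference-taking pattern from the list above, here's a sketch that hides a raw-pointer load from core::arch behind a function that accepts a reference. The name load_low_lane_sketch is hypothetical and is not a safe_arch function. The second definition is only there so the sketch builds on targets without sse.

```rust
/// Hypothetical wrapper (not part of safe_arch): reads four f32 values
/// through a reference and returns the lowest lane.
#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
pub fn load_low_lane_sketch(data: &[f32; 4]) -> f32 {
    use core::arch::x86_64::{_mm_cvtss_f32, _mm_loadu_ps};
    // SAFETY: `sse` is enabled at compile time (see the cfg above), and the
    // reference guarantees 16 readable bytes. `_mm_loadu_ps` has no
    // alignment requirement, so any `&[f32; 4]` is acceptable.
    unsafe { _mm_cvtss_f32(_mm_loadu_ps(data.as_ptr())) }
}

/// Portable stand-in used when the cfg above does not apply.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
pub fn load_low_lane_sketch(data: &[f32; 4]) -> f32 {
    data[0]
}
```

The point is that the reference type carries the validity guarantee, so the caller never has to write unsafe.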

Naming Conventions

The actual names for each intrinsic are generally a flaming dumpster of letters that only make sense after you've learned all the names. They're very bad for learning what things do. Accordingly, safe_arch uses very verbose naming that (hopefully) improves the new-user experience.

  • Function names start with the primary "verb" of the operation, and then any adverbs go after that. This makes for slightly awkward English but helps the list of all the functions sort a little better.
    • Eg: add_i32_m128i and add_i16_saturating_m128i
  • Function names end with the register type they're most associated with.
    • Eg: and_m128 (for m128) and and_m128d (for m128d)
  • If a function operates on just the lowest data lane it generally has _s after the register type, because it's a "scalar" operation. The higher lanes are generally just copied forward, or taken from a secondary argument, or something. Details vary.
    • Eg: sqrt_m128 (all lanes) and sqrt_m128_s (low lane only)
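
To make the "all lanes" versus "_s" distinction concrete, here's a plain-array model (index 0 standing in for the lowest lane). These functions are illustrative stand-ins, not the crate's API, and as noted above the handling of the higher lanes varies between intrinsics.

```rust
/// Models an "all lanes" operation in the style of sqrt_m128.
pub fn sqrt_all_lanes_model(a: [f32; 4]) -> [f32; 4] {
    [a[0].sqrt(), a[1].sqrt(), a[2].sqrt(), a[3].sqrt()]
}

/// Models a "_s" (scalar) operation in the style of sqrt_m128_s:
/// only lane 0 is computed, the higher lanes are copied forward.
pub fn sqrt_low_lane_model(a: [f32; 4]) -> [f32; 4] {
    [a[0].sqrt(), a[1], a[2], a[3]]
}
```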

Of course, people can't even always agree on what words mean. The common verb names for this crate, and their conventions, are as follows:

  • load: Reads memory into a register (deref &Foo to Foo).
  • store: Writes a register to memory (writes Foo to a &mut Foo).
  • set: Packs values into a register (works like [1, 2, 3, 4] to build an array).
  • splat: Copy a value as many times as possible across the bits of a register (works like [1_i32; LEN] array building).
  • extract: Get an individual lane out of a SIMD register (works like array access). The lane to get has to be a const value.
  • insert: Duplicate a register and then replace the value of a specific lane (works like let mut a2 = a.clone(); a2[i] = new;). The lane to overwrite has to be a const value.
  • cast: change data types while preserving the bit pattern (like how transmute would do it).
  • convert: change data types while trying to preserve the numeric value (which might change the bits, like how as would do it).
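
The array analogies in the list above can be written out directly. This is only a sketch on plain arrays with made-up names (every *_model function is hypothetical). It uses const generics for the lane index as a stand-in for "has to be a const value", which is an assumption about presentation rather than how the crate exposes it. Note also that `as` truncates toward zero, while some real conversion intrinsics round differently.

```rust
/// load: deref a reference into a value.
pub fn load_model(src: &[i32; 4]) -> [i32; 4] { *src }

/// store: write a value through a mutable reference.
pub fn store_model(dst: &mut [i32; 4], r: [i32; 4]) { *dst = r; }

/// set: pack individual values into lanes.
pub fn set_model(a: i32, b: i32, c: i32, d: i32) -> [i32; 4] { [a, b, c, d] }

/// splat: copy one value into every lane.
pub fn splat_model(v: i32) -> [i32; 4] { [v; 4] }

/// extract: read one lane, chosen at compile time.
pub fn extract_model<const LANE: usize>(r: [i32; 4]) -> i32 { r[LANE] }

/// insert: copy the register, then overwrite one lane.
pub fn insert_model<const LANE: usize>(r: [i32; 4], new: i32) -> [i32; 4] {
    let mut out = r;
    out[LANE] = new;
    out
}

/// cast: keep the bit pattern, change the type.
pub fn cast_model(r: [f32; 4]) -> [u32; 4] { r.map(f32::to_bits) }

/// convert: keep the numeric value (as far as possible), change the bits.
pub fn convert_model(r: [f32; 4]) -> [i32; 4] { r.map(|x| x as i32) }
```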

This crate is pre-1.0 and if you feel that an operation should have a better name to improve the crate's consistency please file an issue.

Current Support

  • Intel (x86 / x86_64)
    • 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
    • 256-bit: avx
    • Other: adx, aes, bmi1, bmi2, lzcnt, pclmulqdq, popcnt, rdrand, rdseed

Compile Time CPU Target Features

At the time of me writing this, Rust enables the sse and sse2 CPU features by default for all i686 (x86) and x86_64 builds. Those CPU features are built into the design of x86_64, and you'd need a super old x86 CPU for it to not support at least sse and sse2, so they're a safe bet for the language to enable all the time. In fact, because the standard library is compiled with them enabled, simply trying to disable those features would actually cause ABI issues and fill your program with UB (link).

If you want additional CPU features available at compile time you'll have to enable them with an additional arg to rustc. For a feature named name you pass -C target-feature=+name, such as -C target-feature=+sse3 for sse3.

You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you're building a program you'll only run on your own system.
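
If you're not sure which features a given set of flags actually turned on, a small helper like the one below can report it. This is a sketch with a hypothetical name (compiled_in_features), and it only looks at a handful of features. It uses the cfg! macro, so it reports what the build was compiled with, not what the CPU running it supports.

```rust
/// Hypothetical helper: lists which of a few x86 target features were
/// enabled when this code was compiled.
pub fn compiled_in_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if cfg!(target_feature = "sse2") { found.push("sse2"); }
    if cfg!(target_feature = "sse3") { found.push("sse3"); }
    if cfg!(target_feature = "ssse3") { found.push("ssse3"); }
    if cfg!(target_feature = "sse4.1") { found.push("sse4.1"); }
    if cfg!(target_feature = "sse4.2") { found.push("sse4.2"); }
    if cfg!(target_feature = "avx") { found.push("avx"); }
    found
}
```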

It's sometimes hard to know if your target platform will support a given feature set, but the Steam Hardware Survey is generally taken as a guide to what you can expect people to have available. If you click "Other Settings" it'll expand into a list of CPU target features and how common they are. These days, it seems that sse3 can be safely assumed, and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff above 128-bit isn't as common yet, give it another few years.

Please note that executing a program on a CPU that doesn't support the target features it was compiled for is Undefined Behavior.

Currently, Rust doesn't actually support an easy way for you to check that a feature enabled at compile time is actually available at runtime. There is the "feature_detected" family of macros, but if you enable a feature they will evaluate to a constant true instead of actually deferring the check for the feature to runtime. This means that, if you did want a check at the start of your program, to confirm that all the assumed features are present and error out when the assumptions don't hold, you can't use that macro. You gotta use CPUID and check manually. rip. Hopefully we can make that process easier in a future version of this crate.
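
For the curious, a manual CPUID check might look roughly like the sketch below. The function name cpuid_reports is hypothetical and not part of this crate. It reads CPUID leaf 1 and tests the documented feature bits for the 128-bit feature sets only. avx is deliberately left out because it also needs an operating system support check, which this sketch doesn't attempt.

```rust
/// Hypothetical helper (not part of safe_arch): asks the CPU directly,
/// via CPUID leaf 1, whether a feature is present. Returns None for
/// feature names this sketch doesn't know about.
#[cfg(target_arch = "x86_64")]
pub fn cpuid_reports(feature: &str) -> Option<bool> {
    use core::arch::x86_64::__cpuid;
    // Assumption: the CPUID instruction itself is always present on x86_64.
    #[allow(unused_unsafe)]
    let leaf1 = unsafe { __cpuid(1) };
    match feature {
        "sse" => Some((leaf1.edx & (1 << 25)) != 0),
        "sse2" => Some((leaf1.edx & (1 << 26)) != 0),
        "sse3" => Some((leaf1.ecx & (1 << 0)) != 0),
        "ssse3" => Some((leaf1.ecx & (1 << 9)) != 0),
        "sse4.1" => Some((leaf1.ecx & (1 << 19)) != 0),
        "sse4.2" => Some((leaf1.ecx & (1 << 20)) != 0),
        _ => None,
    }
}

/// On other architectures there is nothing to ask.
#[cfg(not(target_arch = "x86_64"))]
pub fn cpuid_reports(_feature: &str) -> Option<bool> {
    None
}
```

A program could call something like this at startup for each feature it was compiled with and exit with an error if any come back Some(false).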

A Note On Working With Cfg

There are two main ways to use cfg:

  • Via an attribute placed on an item, block, or expression:
    • #[cfg(debug_assertions)] println!("hello");
  • Via a macro used within an expression position:
    • if cfg!(debug_assertions) { println!("hello"); }

The difference might seem small but it's actually very important:

  • The attribute form includes or excludes the code before the compiler checks whether the items it names really exist. This means that code configured via attribute can safely name things that don't always exist, as long as those things do exist whenever that code is configured into the build.
  • The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.

This crate uses cfg via the attribute, so the functions it exposes don't exist at all when the appropriate CPU target features aren't enabled. Accordingly, if you plan to call this crate or not depending on what features are enabled in the build you'll also need to control your use of this crate via cfg attribute, not cfg macro.
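
Here's a sketch of what "control your use via cfg attribute" tends to look like in practice: two definitions of the same function, one per configuration. To stay self-contained it calls core::arch directly rather than this crate, and add4_sketch is a hypothetical name. Writing this with if cfg!(...) instead wouldn't work on a target without the intrinsics, because both branches would still have to compile.

```rust
/// SIMD path: only exists when `sse` is enabled for an x86_64 build.
#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
pub fn add4_sketch(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use core::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0_f32; 4];
    // SAFETY: `sse` is statically enabled by the cfg above, each pointer
    // comes from an array of four f32, and the unaligned load/store
    // intrinsics have no alignment requirement.
    unsafe {
        let sum = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        _mm_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Fallback path: configured in whenever the SIMD path is configured out.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
pub fn add4_sketch(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```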

Macros

aes_key_gen_assist_m128iaes

Assists in expanding an AES cipher key, using the immediate round constant.

blend_immediate_i16_m128i

Blends the i16 lanes according to the immediate mask.

blend_immediate_m128d

Blends the lanes according to the immediate mask.

blend_immediate_m128

Blends the lanes according to the immediate mask.

blend_immediate_m256davx

Blends the f64 lanes according to the immediate mask.

blend_immediate_m256avx

Blends the f32 lanes according to the immediate mask.

byte_shift_left_u128_immediate_m128i

Shifts all bits in the entire register left by a number of bytes.

byte_shift_right_u128_immediate_m128i

Shifts all bits in the entire register right by a number of bytes.

cmp_op_mask_m128avx

Compare f32 lanes according to the operation specified, mask output.

cmp_op_mask_m128_savx

Compare the low f32 lane according to the operation specified, mask output.

cmp_op_mask_m128davx

Compare f64 lanes according to the operation specified, mask output.

cmp_op_mask_m128d_savx

Compare the low f64 lane according to the operation specified, mask output.

cmp_op_mask_m256avx

Compare f32 lanes according to the operation specified, mask output.

cmp_op_mask_m256davx

Compare f64 lanes according to the operation specified, mask output.

combined_byte_shift_right_immediate_m128i

Counts $a as the high bytes and $b as the low bytes then performs a byte shift to the right by the immediate value.

comparison_operator_translationavx

Turns a comparison operator token to the correct constant value.

dot_product_m128d

Performs a dot product of two m128d registers.

dot_product_m128

Performs a dot product of two m128 registers.

dot_product_m256avx

This works like dot_product_m128, but twice as wide.

extract_f32_as_i32_bits_immediate_m128

Gets the f32 lane requested. Returns as an i32 bit pattern.

extract_i16_as_i32_m128i

Gets an i16 value out of an m128i, returns as i32.

extract_i32_from_m256iavx

Extracts an i32 lane from m256i

extract_i32_immediate_m128i

Gets the i32 lane requested. Only the lowest 2 bits are considered.

extract_i64_from_m256iavx

Extracts an i64 lane from m256i

extract_i64_immediate_m128i

Gets the i64 lane requested. Only the lowest bit is considered.

extract_i8_as_i32_immediate_m128i

Gets the i8 lane requested. Only the lowest 4 bits are considered.

extract_m128_from_m256avx

Extracts an m128 from m256

extract_m128d_from_m256davx

Extracts an m128d from m256d

extract_m128i_from_m256iavx

Extracts an m128i from m256i

insert_f32_immediate_m128

Inserts a lane from $b into $a, optionally at a new position.

insert_i16_from_i32_m128i

Inserts the low 16 bits of an i32 value into an m128i.

insert_i16_to_m256iavx

Inserts an i16 to m256i

insert_i32_immediate_m128i

Inserts a new value for the i32 lane specified.

insert_i32_to_m256iavx

Inserts an i32 to m256i

insert_i64_immediate_m128i

Inserts a new value for the i64 lane specified.

insert_i64_to_m256iavx

Inserts an i64 to m256i

insert_i8_immediate_m128i

Inserts a new value for the i8 lane specified.

insert_i8_to_m256iavx

Inserts an i8 to m256i

insert_m128_to_m256avx

Inserts an m128 to m256

insert_m128d_to_m256davx

Inserts an m128d to m256d

insert_m128i_to_m256iavx

Inserts an m128i to m256i

mul_i64_carryless_m128ipclmulqdq

Performs a "carryless" multiplication of two i64 values.

multi_packed_sum_abs_diff_u8_m128i

Computes eight u16 "sum of absolute difference" values according to the bytes selected.

permute_f128_in_m256davx

Permutes the lanes around.

permute_f128_in_m256avx

Permutes the lanes around.

permute_i128_in_m256iavx

Permutes the lanes around.

permute_m128davx

Permutes the lanes around.

permute_m128avx

Permutes the lanes around.

permute_m256davx

Permutes the lanes around.

permute_m256avx

Permutes the lanes around.

round_m128d

Rounds each lane in the style specified.

round_m128d_s

Rounds $b low as specified, keeps $a high.

round_m128

Rounds each lane in the style specified.

round_m128_s

Rounds $b low as specified, other lanes use $a.

round_m256davx

Rounds each lane in the style specified.

round_m256avx

Rounds each lane in the style specified.

shift_left_i16_immediate_m128i

Shifts all i16 lanes left by an immediate.

shift_left_i32_immediate_m128i

Shifts all i32 lanes left by an immediate.

shift_left_i64_immediate_m128i

Shifts both i64 lanes left by an immediate.

shift_right_i16_immediate_m128i

Shifts all i16 lanes right by an immediate.

shift_right_i32_immediate_m128i

Shifts all i32 lanes right by an immediate.

shift_right_u16_immediate_m128i

Shifts all u16 lanes right by an immediate.

shift_right_u32_immediate_m128i

Shifts all u32 lanes right by an immediate.

shift_right_u64_immediate_m128i

Shifts both u64 lanes right by an immediate.

shuffle_i16_high_lanes_m128i

Shuffles the higher i16 lanes, low lanes unaffected.

shuffle_i16_low_lanes_m128i

Shuffles the lower i16 lanes, high lanes unaffected.

shuffle_i32_m128i

Shuffles the i32 lanes around.

shuffle_m128

Shuffles the lanes around.

shuffle_m128d

Shuffles the lanes around.

shuffle_m256davx

Shuffles the f64 lanes around.

shuffle_m256avx

Shuffles the f32 lanes around.

string_search_for_indexsse4.1

Looks for $needle in $haystack and gives the index of either the first or last match.

string_search_for_masksse4.1

Looks for $needle in $haystack and gives the mask of where the matches were.

Structs

m128

The data for a 128-bit SSE register of four f32 lanes.

m128d

The data for a 128-bit SSE register of two f64 values.

m128i

The data for a 128-bit SSE register of integer data.

m256

The data for a 256-bit AVX register of eight f32 lanes.

m256d

The data for a 256-bit AVX register of four f64 values.

m256i

The data for a 256-bit AVX register of integer data.

Functions

abs_i16_m128i

Lanewise absolute value with lanes as i16.

abs_i32_m128i

Lanewise absolute value with lanes as i32.

abs_i8_m128i

Lanewise absolute value with lanes as i8.

add_carry_u32adx

Add two u32 with a carry value.

add_carry_u64adx

Add two u64 with a carry value.
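
What "add with a carry value" means can be modeled in plain Rust as below. This is a sketch of the arithmetic only. The name and the tuple return shape are assumptions for illustration and need not match the crate's signature.

```rust
/// Hypothetical model: returns (wrapped sum, carry out) for a + b + carry_in.
/// Any non-zero `carry_in` counts as a carry of 1.
pub fn add_carry_u32_model(carry_in: u8, a: u32, b: u32) -> (u32, u8) {
    let (partial, carry_a) = a.overflowing_add(b);
    let (sum, carry_b) = partial.overflowing_add((carry_in != 0) as u32);
    // At most one of the two additions can overflow.
    (sum, (carry_a | carry_b) as u8)
}
```

Chaining the carry out of one call into the carry in of the next is how multi-word addition gets built.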

add_horizontal_i16_m128i

Add horizontal pairs of i16 values, pack the outputs as a then b.

add_horizontal_i32_m128i

Add horizontal pairs of i32 values, pack the outputs as a then b.

add_horizontal_m128d

Add each lane horizontally, pack the outputs as a then b.

add_horizontal_m128

Add each lane horizontally, pack the outputs as a then b.

add_horizontal_m256davx

Add adjacent f64 lanes.

add_horizontal_m256avx

Add adjacent f32 lanes.

add_horizontal_saturating_i16_m128i

Add horizontal pairs of i16 values, saturating, pack the outputs as a then b.
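
"Pack the outputs as a then b" is easier to see on arrays. Below is a plain-array model of the four-lane f32 case, with index 0 standing in for the lowest lane. The function is a hypothetical stand-in for illustration, not the crate's implementation.

```rust
/// Models a horizontal add of two four-lane registers: adjacent pairs
/// from `a` fill the low two lanes, adjacent pairs from `b` the high two.
pub fn add_horizontal_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]
}
```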

add_i16_m128i

Lanewise a + b with lanes as i16.

add_i32_m128i

Lanewise a + b with lanes as i32.

add_i64_m128i

Lanewise a + b with lanes as i64.

add_i8_m128i

Lanewise a + b with lanes as i8.

add_m128sse

Lanewise a + b.

add_m128_s

Low lane a + b, other lanes unchanged.

add_m128d

Lanewise a + b.

add_m128d_s

Lowest lane a + b, high lane unchanged.

add_m256davx

Lanewise a + b with f64 lanes.

add_m256avx

Lanewise a + b with f32 lanes.

add_saturating_i16_m128i

Lanewise saturating a + b with lanes as i16.

add_saturating_i8_m128i

Lanewise saturating a + b with lanes as i8.

add_saturating_u16_m128i

Lanewise saturating a + b with lanes as u16.

add_saturating_u8_m128i

Lanewise saturating a + b with lanes as u8.
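
The difference between the plain and saturating adds only shows up on overflow. Here's a plain-array model of both for i8 lanes. The function names are hypothetical, and the arrays stand in for the sixteen i8 lanes of an m128i.

```rust
/// Models a lanewise saturating add: results clamp to the i8 range.
pub fn add_saturating_i8_model(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    core::array::from_fn(|i| a[i].saturating_add(b[i]))
}

/// Models a lanewise plain add: results wrap around on overflow.
pub fn add_wrapping_i8_model(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    core::array::from_fn(|i| a[i].wrapping_add(b[i]))
}
```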

add_sub_m128d

Add the high lane and subtract the low lane.

add_sub_m128

Alternately, from the top, add a lane and then subtract a lane.

add_sub_m256davx

Alternately, from the top, add f64 then sub f64.

add_sub_m256avx

Alternately, from the top, add f32 then sub f32.

aes_decrypt_last_m128iaes

Perform the last round of AES decryption flow on a using the round_key.

aes_decrypt_m128iaes

Perform one round of AES decryption flow on a using the round_key.

aes_encrypt_last_m128iaes

Perform the last round of AES encryption flow on a using the round_key.

aes_encrypt_m128iaes

Perform one round of AES encryption flow on a using the round_key.

aes_inv_mix_columns_m128iaes

Perform the InvMixColumns transform on a.

and_m128

Bitwise a & b.

and_m128d

Bitwise a & b.

and_m128i

Bitwise a & b.

and_m256davx

Bitwise a & b.

and_m256avx

Bitwise a & b.

andnot_m128

Bitwise (!a) & b.

andnot_m128d

Bitwise (!a) & b.

andnot_m128i

Bitwise (!a) & b.

andnot_m256davx

Bitwise (!a) & b.

andnot_m256avx

Bitwise (!a) & b.

andnot_u32bmi1

Bitwise (!a) & b, u32

andnot_u64bmi1

Bitwise (!a) & b, u64

average_u16_m128i

Lanewise average of the u16 values.

average_u8_m128i

Lanewise average of the u8 values.

bit_extract2_u32bmi1

Extract a span of bits from the u32, control value style.

bit_extract2_u64bmi1

Extract a span of bits from the u64, control value style.

bit_extract_u32bmi1

Extract a span of bits from the u32, start and len style.

bit_extract_u64bmi1

Extract a span of bits from the u64, start and len style.

bit_lowest_set_mask_u32bmi1

Gets the mask of all bits up to and including the lowest set bit in a u32.

bit_lowest_set_mask_u64bmi1

Gets the mask of all bits up to and including the lowest set bit in a u64.

bit_lowest_set_reset_u32bmi1

Resets (clears) the lowest set bit.

bit_lowest_set_reset_u64bmi1

Resets (clears) the lowest set bit.

bit_lowest_set_value_u32bmi1

Gets the value of the lowest set bit in a u32.

bit_lowest_set_value_u64bmi1

Gets the value of the lowest set bit in a u64.
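
The three "lowest set bit" operations correspond to well known bit tricks, sketched below on u32 in plain Rust. These are models of the results with hypothetical names, not the crate's functions.

```rust
/// Mask of all bits up to and including the lowest set bit.
pub fn lowest_set_mask_model(x: u32) -> u32 {
    x ^ x.wrapping_sub(1)
}

/// Clears the lowest set bit.
pub fn lowest_set_reset_model(x: u32) -> u32 {
    x & x.wrapping_sub(1)
}

/// Isolates the lowest set bit.
pub fn lowest_set_value_model(x: u32) -> u32 {
    x & x.wrapping_neg()
}
```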

bit_zero_high_index_u32bmi2

Zero out all high bits in a u32 starting at the index given.

bit_zero_high_index_u64bmi2

Zero out all high bits in a u64 starting at the index given.

blend_varying_i8_m128i

Blend the i8 lanes according to a runtime varying mask.

blend_varying_m128d

Blend the lanes according to a runtime varying mask.

blend_varying_m128

Blend the lanes according to a runtime varying mask.

blend_varying_m256davx

Blend the lanes according to a runtime varying mask.

blend_varying_m256avx

Blend the lanes according to a runtime varying mask.

cast_from_m256_to_m256davx

Bit-preserving cast from m256 to m256d.

cast_from_m256_to_m256iavx

Bit-preserving cast from m256 to m256i.

cast_from_m256d_to_m256avx

Bit-preserving cast from m256d to m256.

cast_from_m256d_to_m256iavx

Bit-preserving cast from m256d to m256i.

cast_from_m256i_to_m256davx

Bit-preserving cast from m256i to m256d.

cast_from_m256i_to_m256avx

Bit-preserving cast from m256i to m256.

cast_to_m128_from_m128d

Bit-preserving cast to m128 from m128d

cast_to_m128_from_m128i

Bit-preserving cast to m128 from m128i

cast_to_m128d_from_m128

Bit-preserving cast to m128d from m128

cast_to_m128d_from_m128i

Bit-preserving cast to m128d from m128i

cast_to_m128i_from_m128d

Bit-preserving cast to m128i from m128d

cast_to_m128i_from_m128

Bit-preserving cast to m128i from m128

ceil_m128d

Round each lane to a whole number, towards positive infinity

ceil_m128

Round each lane to a whole number, towards positive infinity

ceil_m128d_s

Round the low lane of b toward positive infinity, high lane is a.

ceil_m128_s

Round the low lane of b toward positive infinity, other lanes a.

ceil_m256davx

Round f64 lanes towards positive infinity.

ceil_m256avx

Round f32 lanes towards positive infinity.

cmp_eq_i32_m128_s

Low lane equality.

cmp_eq_i32_m128d_s

Low lane f64 equal to.

cmp_eq_mask_i16_m128i

Lanewise a == b with lanes as i16.

cmp_eq_mask_i32_m128i

Lanewise a == b with lanes as i32.

cmp_eq_mask_i64_m128i

Lanewise a == b with lanes as i64.

cmp_eq_mask_i8_m128i

Lanewise a == b with lanes as i8.

cmp_eq_mask_m128

Lanewise a == b.

cmp_eq_mask_m128_s

Low lane a == b, other lanes unchanged.

cmp_eq_mask_m128d

Lanewise a == b, mask output.

cmp_eq_mask_m128d_s

Low lane a == b, other lanes unchanged.

cmp_ge_i32_m128_s

Low lane greater than or equal to.

cmp_ge_i32_m128d_s

Low lane f64 greater than or equal to.

cmp_ge_mask_m128

Lanewise a >= b.

cmp_ge_mask_m128_s

Low lane a >= b, other lanes unchanged.

cmp_ge_mask_m128d

Lanewise a >= b.

cmp_ge_mask_m128d_s

Low lane a >= b, other lanes unchanged.

cmp_gt_i32_m128_s

Low lane greater than.

cmp_gt_i32_m128d_s

Low lane f64 greater than.

cmp_gt_mask_i16_m128i

Lanewise a > b with lanes as i16.

cmp_gt_mask_i32_m128i

Lanewise a > b with lanes as i32.

cmp_gt_mask_i64_m128isse4.1

Lanewise a > b with lanes as i64.

cmp_gt_mask_i8_m128i

Lanewise a > b with lanes as i8.

cmp_gt_mask_m128

Lanewise a > b.

cmp_gt_mask_m128_s

Low lane a > b, other lanes unchanged.

cmp_gt_mask_m128d

Lanewise a > b.

cmp_gt_mask_m128d_s

Low lane a > b, other lanes unchanged.

cmp_le_i32_m128_s

Low lane less than or equal to.

cmp_le_i32_m128d_s

Low lane f64 less than or equal to.

cmp_le_mask_m128

Lanewise a <= b.

cmp_le_mask_m128_s

Low lane a <= b, other lanes unchanged.

cmp_le_mask_m128d

Lanewise a <= b.

cmp_le_mask_m128d_s

Low lane a <= b, other lanes unchanged.

cmp_lt_i32_m128_s

Low lane less than.

cmp_lt_i32_m128d_s

Low lane f64 less than.

cmp_lt_mask_i16_m128i

Lanewise a < b with lanes as i16.

cmp_lt_mask_i32_m128i

Lanewise a < b with lanes as i32.

cmp_lt_mask_i8_m128i

Lanewise a < b with lanes as i8.

cmp_lt_mask_m128

Lanewise a < b.

cmp_lt_mask_m128_s

Low lane a < b, other lanes unchanged.

cmp_lt_mask_m128d

Lanewise a < b.

cmp_lt_mask_m128d_s

Low lane a < b, other lane unchanged.

cmp_neq_i32_m128_s

Low lane not equal to.

cmp_neq_i32_m128d_s

Low lane f64 not equal to.

cmp_neq_mask_m128

Lanewise a != b.

cmp_neq_mask_m128_s

Low lane a != b, other lanes unchanged.

cmp_neq_mask_m128d

Lanewise a != b.

cmp_neq_mask_m128d_s

Low lane a != b, other lane unchanged.

cmp_nge_mask_m128

Lanewise !(a >= b).

cmp_nge_mask_m128_s

Low lane !(a >= b), other lanes unchanged.

cmp_nge_mask_m128d

Lanewise !(a >= b).

cmp_nge_mask_m128d_s

Low lane !(a >= b), other lane unchanged.

cmp_ngt_mask_m128

Lanewise !(a > b).

cmp_ngt_mask_m128_s

Low lane !(a > b), other lanes unchanged.

cmp_ngt_mask_m128d

Lanewise !(a > b).

cmp_ngt_mask_m128d_s

Low lane !(a > b), other lane unchanged.

cmp_nle_mask_m128

Lanewise !(a <= b).

cmp_nle_mask_m128_s

Low lane !(a <= b), other lanes unchanged.

cmp_nle_mask_m128d

Lanewise !(a <= b).

cmp_nle_mask_m128d_s

Low lane !(a <= b), other lane unchanged.

cmp_nlt_mask_m128

Lanewise !(a < b).

cmp_nlt_mask_m128_s

Low lane !(a < b), other lanes unchanged.

cmp_nlt_mask_m128d

Lanewise !(a < b).

cmp_nlt_mask_m128d_s

Low lane !(a < b), other lane unchanged.

cmp_ord_mask_m128

Lanewise (!a.is_nan()) & (!b.is_nan()).

cmp_ord_mask_m128_s

Low lane (!a.is_nan()) & (!b.is_nan()), other lanes unchanged.

cmp_ord_mask_m128d

Lanewise (!a.is_nan()) & (!b.is_nan()).

cmp_ord_mask_m128d_s

Low lane (!a.is_nan()) & (!b.is_nan()), other lane unchanged.

cmp_unord_mask_m128

Lanewise a.is_nan() | b.is_nan().

cmp_unord_mask_m128_s

Low lane a.is_nan() | b.is_nan(), other lanes unchanged.

cmp_unord_mask_m128d

Lanewise a.is_nan() | b.is_nan().

cmp_unord_mask_m128d_s

Low lane a.is_nan() | b.is_nan(), other lane unchanged.
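
"Mask output" means each lane becomes all 1 bits when the comparison holds and all 0 bits otherwise. The plain-array sketch below models that as u32 bit patterns and also shows why the negated forms exist: with a NaN involved, !(a >= b) and a < b give different answers. All names here are hypothetical.

```rust
/// Shared helper for the models below: all-ones where `pred` holds.
fn mask_lanes(a: [f32; 4], b: [f32; 4], pred: fn(f32, f32) -> bool) -> [u32; 4] {
    core::array::from_fn(|i| if pred(a[i], b[i]) { u32::MAX } else { 0 })
}

/// Lanewise a < b.
pub fn cmp_lt_mask_model(a: [f32; 4], b: [f32; 4]) -> [u32; 4] {
    mask_lanes(a, b, |x, y| x < y)
}

/// Lanewise !(a >= b). True whenever either input is NaN.
pub fn cmp_nge_mask_model(a: [f32; 4], b: [f32; 4]) -> [u32; 4] {
    mask_lanes(a, b, |x, y| !(x >= y))
}

/// Lanewise a.is_nan() | b.is_nan().
pub fn cmp_unord_mask_model(a: [f32; 4], b: [f32; 4]) -> [u32; 4] {
    mask_lanes(a, b, |x, y| x.is_nan() | y.is_nan())
}
```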

convert_i16_lower2_to_i64_m128i

Convert the lower two i16 lanes to two i64 lanes.

convert_i16_lower4_to_i32_m128i

Convert the lower four i16 lanes to four i32 lanes.

convert_i32_lower2_to_i64_m128i

Convert the lower two i32 lanes to two i64 lanes.

convert_i32_replace_m128_s

Convert i32 to f32 and replace the low lane of the input.

convert_i32_replace_m128d_s

Convert i32 to f64 and replace the low lane of the input.

convert_i64_replace_m128d_s

Convert i64 to f64 and replace the low lane of the input.

convert_i8_lower2_to_i64_m128i

Convert the lower two i8 lanes to two i64 lanes.

convert_i8_lower4_to_i32_m128i

Convert the lower four i8 lanes to four i32 lanes.

convert_i8_lower8_to_i16_m128i

Convert the lower eight i8 lanes to eight i16 lanes.

convert_m128_s_replace_m128d_s

Converts the lower f32 to f64 and replace the low lane of the input

convert_m128d_s_replace_m128_s

Converts the low f64 to f32 and replaces the low lane of the input.

convert_to_f32_from_m256_savx

Convert the lowest f32 lane to a single f32.

convert_to_f64_from_m256d_savx

Convert the lowest f64 lane to a single f64.

convert_to_i32_from_m256i_savx

Convert the lowest i32 lane to a single i32.

convert_to_i32_m128i_from_m256davx

Convert f64 lanes to i32 lanes.

convert_to_i32_m256i_from_m256avx

Convert f32 lanes to i32 lanes.

convert_to_m128_from_m128i

Rounds the four i32 lanes to four f32 lanes.

convert_to_m128_from_m128d

Rounds the two f64 lanes to the low two f32 lanes.

convert_to_m128_from_m256davx

Convert f64 lanes to be f32 lanes.

convert_to_m128d_from_m128i

Rounds the lower two i32 lanes to two f64 lanes.

convert_to_m128d_from_m128

Converts the lower two f32 lanes to two f64 lanes.

convert_to_m128i_from_m128d

Rounds the two f64 lanes to the low two i32 lanes.

convert_to_m128i_from_m128

Rounds the four f32 lanes to four i32 lanes.

convert_to_m128i_from_m256davx

Convert f64 lanes to be i32 lanes.

convert_to_m256_from_i32_m256iavx

Convert i32 lanes to be f32 lanes.

convert_to_m256d_from_i32_m128iavx

Convert i32 lanes to be f64 lanes.

convert_to_m256d_from_m128avx

Convert f32 lanes to be f64 lanes.

convert_to_m256i_from_m256avx

Convert f32 lanes to be i32 lanes.

convert_u16_lower2_to_u64_m128i

Convert the lower two u16 lanes to two u64 lanes.

convert_u16_lower4_to_u32_m128i

Convert the lower four u16 lanes to four u32 lanes.

convert_u32_lower2_to_u64_m128i

Convert the lower two u32 lanes to two u64 lanes.

convert_u8_lower2_to_u64_m128i

Convert the lower two u8 lanes to two u64 lanes.

convert_u8_lower4_to_u32_m128i

Convert the lower four u8 lanes to four u32 lanes.

convert_u8_lower8_to_u16_m128i

Convert the lower eight u8 lanes to eight u16 lanes.

copy_i64_m128i_s

Copy the low i64 lane to a new register, upper bits 0.

copy_replace_low_f64_m128d

Copies the a value and replaces the low lane with the low b value.

crc32_u8sse4.1

Accumulates the u8 into a running CRC32 value.

crc32_u16sse4.1

Accumulates the u16 into a running CRC32 value.

crc32_u32sse4.1

Accumulates the u32 into a running CRC32 value.

crc32_u64sse4.1

Accumulates the u64 into a running CRC32 value.

div_m128

Lanewise a / b.

div_m128_s

Low lane a / b, other lanes unchanged.

div_m128d

Lanewise a / b.

div_m128d_s

Lowest lane a / b, high lane unchanged.

div_m256davx

Lanewise a / b with f64.

div_m256avx

Lanewise a / b with f32.

duplicate_even_lanes_m128

Duplicate the even lanes to the odd lanes.

duplicate_even_lanes_m256avx

Duplicate the even-indexed lanes to the odd lanes.

duplicate_low_lane_m128d_s

Copy the low lane of the input to both lanes of the output.

duplicate_odd_lanes_m128

Duplicate the odd lanes to the even lanes.

duplicate_odd_lanes_m256davx

Duplicate the odd-indexed lanes to the even lanes.

duplicate_odd_lanes_m256avx

Duplicate the odd-indexed lanes to the even lanes.

floor_m128d

Round each lane to a whole number, towards negative infinity

floor_m128

Round each lane to a whole number, towards negative infinity

floor_m128d_s

Round the low lane of b toward negative infinity, high lane is a.

floor_m128_s

Round the low lane of b toward negative infinity, other lanes a.

floor_m256davx

Round f64 lanes towards negative infinity.

floor_m256avx

Round f32 lanes towards negative infinity.

get_f32_from_m128_s

Gets the low lane as an individual f32 value.

get_f64_from_m128d_s

Gets the lower lane as an f64 value.

get_i32_from_m128_s

Converts the low lane to i32 and extracts as an individual value.

get_i32_from_m128d_s

Converts the lower lane to an i32 value.

get_i32_from_m128i_s

Converts the lower lane to an i32 value.

get_i64_from_m128d_s

Converts the lower lane to an i64 value.

get_i64_from_m128i_s

Converts the lower lane to an i64 value.

leading_zero_count_u32lzcnt

Count the leading zeroes in a u32.

leading_zero_count_u64lzcnt

Count the leading zeroes in a u64.

load_f32_m128_s

Loads the f32 reference into the low lane of the register.

load_f32_splat_m128

Loads the f32 reference into all lanes of a register.

load_f32_splat_m256avx

Load an f32 and splat it to all lanes of an m256

load_f64_m128d_s

Loads the reference into the low lane of the register.

load_f64_splat_m128d

Loads the f64 reference into all lanes of a register.

load_f64_splat_m256davx

Load an f64 and splat it to all lanes of an m256d

load_i64_m128i_s

Loads the low i64 into a register.

load_m128

Loads the reference into a register.

load_m128d

Loads the reference into a register.

load_m128i

Loads the reference into a register.

load_m256davx

Load data from memory into a register.

load_m256avx

Load data from memory into a register.

load_m256iavx

Load data from memory into a register.

load_m128_splat_m256avx

Load an m128 and splat it to the lower and upper half of an m256

load_m128d_splat_m256davx

Load an m128d and splat it to the lower and upper half of an m256d

load_masked_m128davx

Load data from memory into a register according to a mask.

load_masked_m128avx

Load data from memory into a register according to a mask.

load_masked_m256davx

Load data from memory into a register according to a mask.

load_masked_m256avx

Load data from memory into a register according to a mask.

load_replace_high_m128d

Loads the reference into a register, replacing the high lane.

load_replace_low_m128d

Loads the reference into a register, replacing the low lane.

load_reverse_m128

Loads the reference into a register with reversed order.

load_reverse_m128d

Loads the reference into a register with reversed order.

load_unaligned_hi_lo_m256davx

Load data from memory into a register.

load_unaligned_hi_lo_m256avx

Load data from memory into a register.

load_unaligned_hi_lo_m256iavx

Load data from memory into a register.

load_unaligned_m128

Loads the reference into a register.

load_unaligned_m128d

Loads the reference into a register.

load_unaligned_m128i

Loads the reference into a register.

load_unaligned_m256davx

Load data from memory into a register.

load_unaligned_m256avx

Load data from memory into a register.

load_unaligned_m256iavx

Load data from memory into a register.

max_i16_m128i

Lanewise max(a, b) with lanes as i16.

max_i32_m128i

Lanewise max(a, b) with lanes as i32.

max_i8_m128i

Lanewise max(a, b) with lanes as i8.

max_m128

Lanewise max(a, b).

max_m128_s

Low lane max(a, b), other lanes unchanged.

max_m128d

Lanewise max(a, b).

max_m128d_s

Low lane max(a, b), other lanes unchanged.

max_m256davx

Lanewise max(a, b).

max_m256avx

Lanewise max(a, b).

max_u16_m128i

Lanewise max(a, b) with lanes as u16.

max_u32_m128i

Lanewise max(a, b) with lanes as u32.

max_u8_m128i

Lanewise max(a, b) with lanes as u8.

min_i16_m128i

Lanewise min(a, b) with lanes as i16.

min_i32_m128i

Lanewise min(a, b) with lanes as i32.

min_i8_m128i

Lanewise min(a, b) with lanes as i8.

min_m128

Lanewise min(a, b).

min_m128_s

Low lane min(a, b), other lanes unchanged.

min_m128d

Lanewise min(a, b).

min_m128d_s

Low lane min(a, b), other lanes unchanged.

min_m256davx

Lanewise min(a, b).

min_m256avx

Lanewise min(a, b).

min_position_u16_m128i

Min u16 value, position, and other lanes zeroed.

min_u16_m128i

Lanewise min(a, b) with lanes as u16.

min_u32_m128i

Lanewise min(a, b) with lanes as u32.

min_u8_m128i

Lanewise min(a, b) with lanes as u8.

move_high_low_m128

Move the high lanes of b to the low lanes of a, other lanes unchanged.

move_low_high_m128

Move the low lanes of b to the high lanes of a, other lanes unchanged.

move_m128_s

Move the low lane of b to a, other lanes unchanged.

move_mask_i8_m128i

Gathers the i8 sign bit of each lane.

move_mask_m128

Gathers the sign bit of each lane.

move_mask_m128d

Gathers the sign bit of each lane.

move_mask_m256davx

Collects the sign bit of each lane into a 4-bit value.

move_mask_m256avx

Collects the sign bit of each lane into an 8-bit value.
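
A move mask gathers one bit per lane into an ordinary integer, with lane 0 landing in bit 0. Below is a plain-array model of the four-lane f32 case. The name and the i32 return type are assumptions made for illustration.

```rust
/// Models a move mask over four f32 lanes: bit `i` of the output is the
/// sign bit of lane `i`. Note that -0.0 has its sign bit set.
pub fn move_mask_model(a: [f32; 4]) -> i32 {
    let mut out = 0;
    for (i, lane) in a.iter().enumerate() {
        if lane.is_sign_negative() {
            out |= 1 << i;
        }
    }
    out
}
```

Combined with a comparison that produces all-ones or all-zeros lanes, this is a common way to turn a lanewise result into something a normal if can branch on.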

mul_extended_u32bmi2

Multiply two u32, outputting the low bits and storing the high bits in the reference.

mul_extended_u64bmi2

Multiply two u64, outputting the low bits and storing the high bits in the reference.

mul_i16_horizontal_add_m128i

Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.

mul_i16_keep_high_m128i

Lanewise a * b with lanes as i16, keep the high bits of the i32 intermediates.

mul_i16_keep_low_m128i

Lanewise a * b with lanes as i16, keep the low bits of the i32 intermediates.

mul_i16_scale_round_m128i

Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
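
The i16 multiply variants differ only in which part of the i32 intermediate they keep. Here's a single-lane model of each in plain Rust, following the descriptions above. The names are hypothetical, and a real call does this for every lane at once.

```rust
/// Keeps the low 16 bits of the i32 product.
pub fn mul_i16_keep_low_model(a: i16, b: i16) -> i16 {
    ((a as i32) * (b as i32)) as i16
}

/// Keeps the high 16 bits of the i32 product.
pub fn mul_i16_keep_high_model(a: i16, b: i16) -> i16 {
    (((a as i32) * (b as i32)) >> 16) as i16
}

/// Keeps the high 18 bits, rounds by adding 1, then shifts right by 1.
pub fn mul_i16_scale_round_model(a: i16, b: i16) -> i16 {
    (((((a as i32) * (b as i32)) >> 14) + 1) >> 1) as i16
}
```

For example 300 * 400 is 120000 (0x1D4C0), so the high half is 1, the low half is 0xD4C0 reinterpreted as i16, and the scaled and rounded result is 4 (120000 / 32768 is about 3.66).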

mul_i32_keep_low_m128i

Lanewise a * b with lanes as i32, keep the low bits of the i64 intermediates.

mul_i64_widen_low_bits_m128i

Multiplies the lower 32 bits (only) of each i64 lane into 64-bit i64 values.

mul_m128

Lanewise a * b.

mul_m128_s

Low lane a * b, other lanes unchanged.

mul_m128d

Lanewise a * b.

mul_m128d_s

Lowest lane a * b, high lane unchanged.

mul_m256d avx

Lanewise a * b with f64 lanes.

mul_m256 avx

Lanewise a * b with f32 lanes.

mul_u16_keep_high_m128i

Lanewise a * b with lanes as u16, keep the high bits of the u32 intermediates.

mul_u64_widen_low_bits_m128i

Multiplies the lower 32 bits (only) of each u64 lane into 64-bit u64 values.

mul_u8i8_add_horizontal_saturating_m128i

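This is dumb and weird: multiply each u8 lane of a by the matching i8 lane of b into i16 intermediates, then horizontally add adjacent pairs with saturation.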
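Going by the function name (u8 times i8, horizontal add, saturating), the operation can be modeled in scalar Rust roughly as below. The helper `mul_u8i8_add_horizontal_saturating_model` is hypothetical and works on plain arrays rather than an m128i.

```rust
/// Scalar model (hypothetical helper, not the safe_arch function).
/// Each u8 lane of `a` is multiplied by the matching i8 lane of `b`,
/// then adjacent products are added and saturated to the i16 range.
fn mul_u8i8_add_horizontal_saturating_model(a: [u8; 16], b: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        let even = (a[2 * i] as i32) * (b[2 * i] as i32);
        let odd = (a[2 * i + 1] as i32) * (b[2 * i + 1] as i32);
        out[i] = (even + odd).clamp(i16::MIN as i32, i16::MAX as i32) as i16;
    }
    out
}
```

With every lane of `a` at 255 and every lane of `b` at 127, each pair sums to 64770, which saturates to `i16::MAX`.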

or_m128

Bitwise a | b.

or_m128d

Bitwise a | b.

or_m128i

Bitwise a | b.

or_m256d avx

Bitwise a | b.

or_m256 avx

Bitwise a | b.

pack_i16_to_i8_m128i

Saturating convert i16 to i8, and pack the values.

pack_i16_to_u8_m128i

Saturating convert i16 to u8, and pack the values.

pack_i32_to_i16_m128i

Saturating convert i32 to i16, and pack the values.

pack_i32_to_u16_m128i

Saturating convert i32 to u16, and pack the values.
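A scalar sketch of the pack family, using the i16 to i8 case. The helper `pack_i16_to_i8_model` is hypothetical; it assumes the lanes converted from `a` fill the low half of the output and the lanes from `b` fill the high half, which is how the underlying pack intrinsic orders its result.

```rust
/// Scalar model of a saturating pack from i16 to i8 (hypothetical helper,
/// not the safe_arch function). Values outside the i8 range clamp to
/// i8::MIN or i8::MAX instead of wrapping.
fn pack_i16_to_i8_model(a: [i16; 8], b: [i16; 8]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..8 {
        out[i] = a[i].clamp(i8::MIN as i16, i8::MAX as i16) as i8;
        out[i + 8] = b[i].clamp(i8::MIN as i16, i8::MAX as i16) as i8;
    }
    out
}
```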

permute_varying_m128d avx

Permute with a runtime varying pattern.

permute_varying_m128 avx

Permute with a runtime varying pattern.

permute_varying_m256d avx

Permute with a runtime varying pattern.

permute_varying_m256 avx

Permute with a runtime varying pattern.

population_count_i32 popcnt

Count the number of bits set within an i32.

population_count_i64 popcnt

Count the number of bits set within an i64.

population_deposit_u32 bmi2

Deposit contiguous low bits from a u32 according to a mask.

population_deposit_u64 bmi2

Deposit contiguous low bits from a u64 according to a mask.

population_extract_u32 bmi2

Extract bits from a u32 according to a mask.

population_extract_u64 bmi2

Extract bits from a u64 according to a mask.
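Deposit and extract are inverses of each other over the bits selected by the mask. The two helpers below are hypothetical scalar models written for illustration, not safe_arch functions: deposit scatters the low bits of `a` to the positions where `mask` is set, and extract gathers the bits at those positions back into the low bits.

```rust
/// Scalar model of bit deposit (hypothetical helper). The k-th low bit of
/// `a` is written to the position of the k-th set bit of `mask`.
fn population_deposit_model(a: u32, mask: u32) -> u32 {
    let mut out = 0u32;
    let mut k = 0u32; // index of the next low bit of `a` to consume
    for bit in 0..32u32 {
        if mask & (1u32 << bit) != 0 {
            if a & (1u32 << k) != 0 {
                out |= 1u32 << bit;
            }
            k += 1;
        }
    }
    out
}

/// Scalar model of bit extract (hypothetical helper). The bit of `a` at the
/// k-th set position of `mask` becomes the k-th low bit of the output.
fn population_extract_model(a: u32, mask: u32) -> u32 {
    let mut out = 0u32;
    let mut k = 0u32; // index of the next low output bit to fill
    for bit in 0..32u32 {
        if mask & (1u32 << bit) != 0 {
            if a & (1u32 << bit) != 0 {
                out |= 1u32 << k;
            }
            k += 1;
        }
    }
    out
}
```

With `mask = 0b1101_0000`, depositing `0b101` gives `0b1001_0000`, and extracting from `0b1001_0000` with the same mask gives back `0b101`.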

rdrand_u16 rdrand

Try to obtain a random u16 from the hardware RNG.

rdrand_u32 rdrand

Try to obtain a random u32 from the hardware RNG.

rdrand_u64 rdrand

Try to obtain a random u64 from the hardware RNG.

rdseed_u16 rdseed

Try to obtain a random u16 from the hardware RNG.

rdseed_u32 rdseed

Try to obtain a random u32 from the hardware RNG.

rdseed_u64 rdseed

Try to obtain a random u64 from the hardware RNG.

reciprocal_m128

Lanewise 1.0 / a approximation.

reciprocal_m128_s

Low lane 1.0 / a approximation, other lanes unchanged.

reciprocal_m256 avx

Reciprocal of f32 lanes.

reciprocal_sqrt_m128

Lanewise 1.0 / sqrt(a) approximation.

reciprocal_sqrt_m128_s

Low lane 1.0 / sqrt(a) approximation, other lanes unchanged.

reciprocal_sqrt_m256 avx

Reciprocal square root of f32 lanes.

set_i16_m128i

Sets the args into an m128i, first arg is the high lane.

set_i16_m256i avx

Set i16 args into the lanes of an m256i.

set_i32_m128i_s

Set an i32 as the low 32-bit lane of an m128i, other lanes blank.

set_i32_m128i

Sets the args into an m128i, first arg is the high lane.

set_i32_m256i avx

Set i32 args into the lanes of an m256i.

set_i64_m128i_s

Set an i64 as the low 64-bit lane of an m128i, other lanes blank.

set_i64_m128i

Sets the args into an m128i, first arg is the high lane.

set_i8_m128i

Sets the args into an m128i, first arg is the high lane.

set_i8_m256i avx

Set i8 args into the lanes of an m256i.

set_m128

Sets the args into an m128, first arg is the high lane.

set_m128_s

Sets the arg into the low lane of an m128.

set_m128d

Sets the args into an m128d, first arg is the high lane.

set_m128d_s

Sets the arg into the low lane of an m128d.

set_m256d avx

Set f64 args into the lanes of an m256d.

set_m256 avx

Set f32 args into the lanes of an m256.

set_m128d_m256d avx

Set m128d args into an m256d.

set_m128i_m256i avx

Set m128i args into an m256i.

set_reversed_i16_m128i

Sets the args into an m128i, first arg is the low lane.

set_reversed_i16_m256i avx

Set i16 args into the lanes of an m256i.

set_reversed_i32_m128i

Sets the args into an m128i, first arg is the low lane.

set_reversed_i32_m256i avx

Set i32 args into the lanes of an m256i.

set_reversed_i8_m128i

Sets the args into an m128i, first arg is the low lane.

set_reversed_i8_m256i avx

Set i8 args into the lanes of an m256i.

set_reversed_m128

Sets the args into an m128, first arg is the low lane.
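The difference between the set and set_reversed families is only the argument order. The sketch below models it with plain arrays, where index 0 is the low lane. Both helpers are hypothetical and the parameter names are made up for illustration.

```rust
/// Model of the "set" ordering (hypothetical helper): the first argument
/// ends up in the high lane, the last argument in the low lane.
fn set_model(e3: f32, e2: f32, e1: f32, e0: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}

/// Model of the "set_reversed" ordering (hypothetical helper): the first
/// argument ends up in the low lane, matching array order in memory.
fn set_reversed_model(e0: f32, e1: f32, e2: f32, e3: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}
```

So `set_model(1.0, 2.0, 3.0, 4.0)` and `set_reversed_model(4.0, 3.0, 2.0, 1.0)` describe the same register contents.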

set_reversed_m128d

Sets the args into an m128d, first arg is the low lane.

set_reversed_m256d avx

Set f64 args into the lanes of an m256d.

set_reversed_m256 avx

Set f32 args into the lanes of an m256.

set_reversed_m128d_m256d avx

Set m128d args into an m256d.

set_reversed_m128i_m256i avx

Set m128i args into an m256i.

set_splat_i16_m256i avx

Splat an i16 arg into all lanes of an m256i.

set_splat_i32_m256i avx

Splat an i32 arg into all lanes of an m256i.

set_splat_i8_m256i avx

Splat an i8 arg into all lanes of an m256i.

set_splat_m256 avx

Splat an f32 arg into all lanes of an m256.

shift_left_i16_m128i

Shift each i16 lane to the left by the count in the lower i64 lane.

shift_left_i32_m128i

Shift each i32 lane to the left by the count in the lower i64 lane.

shift_left_i64_m128i

Shift each i64 lane to the left by the count in the lower i64 lane.

shift_right_i16_m128i

Shift each i16 lane to the right by the count in the lower i64 lane.

shift_right_i32_m128i

Shift each i32 lane to the right by the count in the lower i64 lane.

shift_right_u16_m128i

Shift each u16 lane to the right by the count in the lower i64 lane.

shift_right_u32_m128i

Shift each u32 lane to the right by the count in the lower i64 lane.

shift_right_u64_m128i

Shift each u64 lane to the right by the count in the lower i64 lane.
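One detail worth knowing about the shift family is what happens when the count is at least the lane width, since a plain Rust `<<` or `>>` would overflow there. The scalar sketch below models the i32 cases under the assumption that these functions follow the underlying SSE2 shift instructions: left shifts produce zero and arithmetic right shifts fill every bit with the sign. The helper names are hypothetical.

```rust
/// Scalar model of a lanewise i32 left shift by a shared count
/// (hypothetical helper). Counts above 31 clear every lane.
fn shift_left_i32_model(a: [i32; 4], count: u64) -> [i32; 4] {
    if count > 31 {
        [0; 4]
    } else {
        a.map(|x| x << count)
    }
}

/// Scalar model of a lanewise i32 arithmetic right shift by a shared count
/// (hypothetical helper). Counts above 31 behave like a shift by 31, so
/// each lane becomes 0 or -1 depending on its sign.
fn shift_right_i32_model(a: [i32; 4], count: u64) -> [i32; 4] {
    let c = count.min(31);
    a.map(|x| x >> c)
}
```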

shuffle_i8_m128i

Shuffles the i8 lanes according to the pattern in b.

sign_apply_i16_m128i

Applies the sign of i16 values in b to the values in a.

sign_apply_i32_m128i

Applies the sign of i32 values in b to the values in a.

sign_apply_i8_m128i

Applies the sign of i8 values in b to the values in a.

splat_i16_m128i

Splats the i16 to all lanes of the m128i.

splat_i32_m128i

Splats the i32 to all lanes of the m128i.

splat_i64_m128i

Splats the i64 to both lanes of the m128i.

splat_i8_m128i

Splats the i8 to all lanes of the m128i.

splat_m128

Splats the value to all lanes.

splat_m128d

Splats the args into both lanes of the m128d.

sqrt_m128

Lanewise sqrt(a).

sqrt_m128_s

Low lane sqrt(a), other lanes unchanged.

sqrt_m128d

Lanewise sqrt(a).

sqrt_m128d_s

Low lane sqrt(b), upper lane is unchanged from a.

sqrt_m256d avx

Lanewise sqrt on f64 lanes.

sqrt_m256 avx

Lanewise sqrt on f32 lanes.

store_high_m128d_s

Stores the high lane value to the reference given.

store_i64_m128i_s

Stores the value to the reference given.

store_m128

Stores the value to the reference given.

store_m128_s

Stores the low lane value to the reference given.

store_m128d

Stores the value to the reference given.

store_m128d_s

Stores the low lane value to the reference given.

store_m128i

Stores the value to the reference given.

store_m256d avx

Store data from a register into memory.

store_m256 avx

Store data from a register into memory.

store_m256i avx

Store data from a register into memory.

store_masked_m128d avx

Store data from a register into memory according to a mask.

store_masked_m128 avx

Store data from a register into memory according to a mask.

store_masked_m256d avx

Store data from a register into memory according to a mask.

store_masked_m256 avx

Store data from a register into memory according to a mask.

store_reverse_m128

Stores the value to the reference given in reverse order.

store_reversed_m128d

Stores the value to the reference given in reverse order.

store_splat_m128

Stores the low lane value to all lanes of the reference given.

store_splat_m128d

Stores the low lane value to all lanes of the reference given.

store_unaligned_hi_lo_m256d avx

Store data from a register into memory.

store_unaligned_hi_lo_m256 avx

Store data from a register into memory.

store_unaligned_hi_lo_m256i avx

Store data from a register into memory.

store_unaligned_m128

Stores the value to the reference given.

store_unaligned_m128d

Stores the value to the reference given.

store_unaligned_m128i

Stores the value to the reference given.

store_unaligned_m256d avx

Store data from a register into memory.

store_unaligned_m256 avx

Store data from a register into memory.

store_unaligned_m256i avx

Store data from a register into memory.

sub_horizontal_i16_m128i

Subtract horizontal pairs of i16 values, pack the outputs as a then b.

sub_horizontal_i32_m128i

Subtract horizontal pairs of i32 values, pack the outputs as a then b.

sub_horizontal_m128d

Subtract each lane horizontally, pack the outputs as a then b.

sub_horizontal_m128

Subtract each lane horizontally, pack the outputs as a then b.
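"Pack the outputs as a then b" means the differences computed from `a` land in the low lanes and those from `b` in the high lanes. A scalar sketch for the four-lane f32 case, with index 0 as the low lane; `sub_horizontal_model` is a hypothetical helper, not the safe_arch function.

```rust
/// Scalar model of a horizontal subtract over f32 lanes (hypothetical
/// helper). Each adjacent pair is reduced to `even - odd`; results from
/// `a` come first, then results from `b`.
fn sub_horizontal_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]]
}
```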

sub_horizontal_m256d avx

Subtract adjacent f64 lanes.

sub_horizontal_m256 avx

Subtract adjacent f32 lanes.

sub_horizontal_saturating_i16_m128i

Subtract horizontal pairs of i16 values, saturating, pack the outputs as a then b.

sub_i16_m128i

Lanewise a - b with lanes as i16.

sub_i32_m128i

Lanewise a - b with lanes as i32.

sub_i64_m128i

Lanewise a - b with lanes as i64.

sub_i8_m128i

Lanewise a - b with lanes as i8.

sub_m128

Lanewise a - b.

sub_m128_s

Low lane a - b, other lanes unchanged.

sub_m128d

Lanewise a - b.

sub_m128d_s

Lowest lane a - b, high lane unchanged.

sub_m256d avx

Lanewise a - b with f64 lanes.

sub_m256 avx

Lanewise a - b with f32 lanes.

sub_saturating_i16_m128i

Lanewise saturating a - b with lanes as i16.

sub_saturating_i8_m128i

Lanewise saturating a - b with lanes as i8.

sub_saturating_u16_m128i

Lanewise saturating a - b with lanes as u16.

sub_saturating_u8_m128i

Lanewise saturating a - b with lanes as u8.

sum_of_u8_abs_diff_m128i

Compute "sum of u8 absolute differences".
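In scalar terms, "sum of u8 absolute differences" can be sketched as below. The helper `sum_of_u8_abs_diff_model` is hypothetical; it assumes the behavior of the underlying SSE2 instruction, where the sixteen u8 lanes are split into two groups of eight and each group's sum lands in one 64-bit lane of the output.

```rust
/// Scalar model of the sum of absolute differences (hypothetical helper,
/// not the safe_arch function). Lanes 0..8 feed the low u64 output lane
/// and lanes 8..16 feed the high one.
fn sum_of_u8_abs_diff_model(a: [u8; 16], b: [u8; 16]) -> [u64; 2] {
    let mut out = [0u64; 2];
    for i in 0..16 {
        out[i / 8] += a[i].abs_diff(b[i]) as u64;
    }
    out
}
```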

test_all_ones_m128i

Tests if all bits are 1.

test_all_zeroes_m128i

Returns whether all masked bits are 0: (a & mask) as u128 == 0.

test_mixed_ones_and_zeroes_m128i

Returns whether the masked bits contain both 0s and 1s.
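The three test functions can be modeled on a plain `u128`, which makes the conditions explicit. These helpers are hypothetical sketches under that assumption, not the safe_arch functions themselves.

```rust
/// True when every bit of `a` is 1 (hypothetical scalar model).
fn test_all_ones_model(a: u128) -> bool {
    a == u128::MAX
}

/// True when none of the bits selected by `mask` are set in `a`
/// (hypothetical scalar model).
fn test_all_zeroes_model(a: u128, mask: u128) -> bool {
    (a & mask) == 0
}

/// True when the bits selected by `mask` include at least one 1 and at
/// least one 0 in `a` (hypothetical scalar model).
fn test_mixed_ones_and_zeroes_model(a: u128, mask: u128) -> bool {
    (a & mask) != 0 && (!a & mask) != 0
}
```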

trailing_zero_count_u32 bmi1

Counts the number of trailing zero bits in a u32.

trailing_zero_count_u64 bmi1

Counts the number of trailing zero bits in a u64.

transpose_four_m128

Transpose four m128 as if they were a 4x4 matrix.
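A scalar picture of the 4x4 transpose, treating each m128 as one row of four f32 values. The helper `transpose_four_model` is hypothetical and operates on nested arrays; the real function's signature may differ.

```rust
/// Scalar model of a 4x4 transpose (hypothetical helper, not the safe_arch
/// function). After the call, `rows[i][j]` holds what `rows[j][i]` held
/// before it.
fn transpose_four_model(rows: &mut [[f32; 4]; 4]) {
    for i in 0..4 {
        for j in (i + 1)..4 {
            let tmp = rows[i][j];
            rows[i][j] = rows[j][i];
            rows[j][i] = tmp;
        }
    }
}
```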

truncate_m128_to_m128i

Truncate the f32 lanes to i32 lanes.

truncate_m128d_to_m128i

Truncate the f64 lanes to the lower i32 lanes (upper i32 lanes 0).

truncate_to_i32_m128d_s

Truncate the lower lane into an i32.

truncate_to_i64_m128d_s

Truncate the lower lane into an i64.

unpack_hi_m256d avx

Unpack and interleave the high lanes.

unpack_hi_m256 avx

Unpack and interleave the high lanes.

unpack_high_i16_m128i

Unpack and interleave high i16 lanes of a and b.

unpack_high_i32_m128i

Unpack and interleave high i32 lanes of a and b.

unpack_high_i64_m128i

Unpack and interleave high i64 lanes of a and b.

unpack_high_i8_m128i

Unpack and interleave high i8 lanes of a and b.

unpack_high_m128

Unpack and interleave high lanes of a and b.

unpack_high_m128d

Unpack and interleave high lanes of a and b.

unpack_lo_m256d avx

Unpack and interleave the low lanes.

unpack_lo_m256 avx

Unpack and interleave the low lanes.

unpack_low_i16_m128i

Unpack and interleave low i16 lanes of a and b.

unpack_low_i32_m128i

Unpack and interleave low i32 lanes of a and b.

unpack_low_i64_m128i

Unpack and interleave low i64 lanes of a and b.

unpack_low_i8_m128i

Unpack and interleave low i8 lanes of a and b.

unpack_low_m128

Unpack and interleave low lanes of a and b.

unpack_low_m128d

Unpack and interleave low lanes of a and b.
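"Unpack and interleave" alternates lanes from `a` and `b`, starting with `a`. A scalar sketch for the four-lane f32 case, with index 0 as the low lane; both helpers are hypothetical models and not the safe_arch functions.

```rust
/// Scalar model of unpacking the low lanes (hypothetical helper): the two
/// low lanes of `a` and `b` are interleaved as a0, b0, a1, b1.
fn unpack_low_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0], b[0], a[1], b[1]]
}

/// Scalar model of unpacking the high lanes (hypothetical helper): the two
/// high lanes of `a` and `b` are interleaved as a2, b2, a3, b3.
fn unpack_high_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[2], b[2], a[3], b[3]]
}
```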

xor_m128

Bitwise a ^ b.

xor_m128d

Bitwise a ^ b.

xor_m128i

Bitwise a ^ b.

xor_m256d avx

Bitwise a ^ b.

xor_m256 avx

Bitwise a ^ b.

zero_extend_m128d avx

Zero extend an m128d to m256d.

zero_extend_m128 avx

Zero extend an m128 to m256.

zero_extend_m128i avx

Zero extend an m128i to m256i.

zeroed_m128

All lanes zero.

zeroed_m128i

All lanes zero.

zeroed_m128d

Both lanes zero.

zeroed_m256d avx

A zeroed m256d.

zeroed_m256 avx

A zeroed m256.

zeroed_m256i avx

A zeroed m256i.