Crate safe_arch
A crate that safely exposes arch intrinsics via `#[cfg()]`.
`safe_arch` lets you safely use CPU intrinsics, the things in the
`core::arch` modules. It works purely via `#[cfg()]` and compile-time CPU
feature declaration. If you want to check for a feature at runtime and then
call an intrinsic or use a fallback path based on that, then this crate is
sadly not for you.
SIMD register types are "newtype'd" so that better trait impls can be given
to them, but the inner value is a `pub` field, so feel free to just grab it
out if you need to. Trait impls of the newtypes include: `Default` (zeroed),
`From`/`Into` of appropriate data types, and appropriate operator
overloading.
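As a sketch of that newtype pattern (an illustrative stand-in type, not the crate's actual definition, which wraps the `core::arch` register types rather than an array):

```rust
use core::ops::Add;

// A safe_arch-style newtype: the inner value is a pub field, so you can
// always reach the raw data, while the wrapper carries nicer trait impls.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct M128iDemo(pub [i32; 4]);

// From/Into of an appropriate data type.
impl From<[i32; 4]> for M128iDemo {
    fn from(arr: [i32; 4]) -> Self {
        Self(arr)
    }
}
impl From<M128iDemo> for [i32; 4] {
    fn from(m: M128iDemo) -> [i32; 4] {
        m.0
    }
}

// Appropriate operator overloading: lanewise addition.
impl Add for M128iDemo {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = [0; 4];
        for i in 0..4 {
            out[i] = self.0[i].wrapping_add(rhs.0[i]);
        }
        Self(out)
    }
}
```

The real types carry the same shape: a `pub` inner field, a zeroed `Default`, `From`/`Into` conversions, and operator impls that forward to the intrinsics.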
- Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
- Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the `safe_arch` function takes a reference of an appropriate type to uphold safety.
  - Try the bytemuck crate (and turn on the `bytemuck` feature of this crate) if you want help safely casting between reference types.
- Some intrinsics are not safe unless you're very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren't exposed here.
- Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn't really support you messing with that within a high level language, so those operations aren't exposed here. Use assembly or something if you want to do that.
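The reference-taking style described in the second bullet can be sketched like this (a hypothetical wrapper, not one of the crate's actual signatures):

```rust
// Stand-in for a raw intrinsic: the caller must guarantee that `p` is
// valid for reading four i32 values.
unsafe fn load_four_i32_ptr(p: *const i32) -> [i32; 4] {
    [*p, *p.add(1), *p.add(2), *p.add(3)]
}

// The safe_arch style: taking `&[i32; 4]` makes the alignment and
// validity obligations part of the type, so the unsafe block inside is
// sound by construction.
pub fn load_four_i32(src: &[i32; 4]) -> [i32; 4] {
    unsafe { load_four_i32_ptr(src.as_ptr()) }
}
```

Going the other way, turning a `&[u8]` slice into a `&[i32; 4]` reference safely, is exactly the sort of cast bytemuck helps with.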
Naming Conventions
The actual names for each intrinsic are generally a flaming dumpster of
letters that only make sense after you've learned all the names. They're
very bad for learning what things do. Accordingly, `safe_arch` uses very
verbose naming that (hopefully) improves the new-user experience.
- Function names start with the primary "verb" of the operation, and then any adverbs go after that. This makes for slightly awkward English but helps the list of all the functions sort a little better.
  - Eg: `add_i32_m128i` and `add_i16_saturating_m128i`
- Function names end with the register type they're most associated with.
  - Eg: `and_m128` (for `m128`) and `and_m128d` (for `m128d`)
- If a function operates on just the lowest data lane it generally has `_s` after the register type, because it's a "scalar" operation. The higher lanes are generally just copied forward, or taken from a secondary argument, or something. Details vary.
  - Eg: `sqrt_m128` (all lanes) and `sqrt_m128_s` (low lane only)
Of course, people can't even always agree on what words mean. The common verb names for this crate, and their conventions, are as follows:
- `load`: Reads memory into a register (derefs a `&Foo` to a `Foo`).
- `store`: Writes a register to memory (writes a `Foo` to a `&mut Foo`).
- `set`: Packs values into a register (works like `[1, 2, 3, 4]` to build an array).
- `splat`: Copies a value as many times as possible across the bits of a register (works like `[1_i32; LEN]` array building).
- `extract`: Gets an individual lane out of a SIMD register (works like array access). The lane to get has to be a const value.
- `insert`: Duplicates a register and then replaces the value of a specific lane (works like `let mut a2 = a.clone(); a2[i] = new;`). The lane to overwrite has to be a const value.
- `cast`: Changes data types while preserving the bit pattern (like how `transmute` would do it).
- `convert`: Changes data types while trying to preserve the numeric value (which might change the bits, like how `as` would do it).
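The verb conventions above can be sketched over a plain four-lane array stand-in (illustrative helpers, not the crate's actual functions):

```rust
type Lanes = [i32; 4];

// `set`: pack individual values into a register-like value.
fn set(a: i32, b: i32, c: i32, d: i32) -> Lanes {
    [a, b, c, d]
}

// `splat`: copy one value across every lane.
fn splat(v: i32) -> Lanes {
    [v; 4]
}

// `extract`: read one lane; as in the real crate, the lane is const.
fn extract<const LANE: usize>(r: Lanes) -> i32 {
    r[LANE]
}

// `insert`: duplicate the register, replacing one const-indexed lane.
fn insert<const LANE: usize>(r: Lanes, new: i32) -> Lanes {
    let mut out = r;
    out[LANE] = new;
    out
}
```

In the crate itself the const lane requirement is why `extract`/`insert` style operations appear among the macros: the lane index has to be an immediate at the machine-code level.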
This crate is pre-1.0 and if you feel that an operation should have a better name to improve the crate's consistency please file an issue.
Current Support
- Intel and AMD (`x86`/`x86_64`)
  - 128-bit: `sse`, `sse2`, `sse3`, `ssse3`, `sse4.1`, `sse4.2`
  - 256-bit: `avx`
  - Other: `adx`, `aes`, `bmi1`, `bmi2`, `lzcnt`, `pclmulqdq`, `popcnt`, `rdrand`, `rdseed`
Compile Time CPU Target Features
At the time of writing, Rust enables the `sse` and `sse2` CPU features by
default for all `i686` (x86) and `x86_64` builds. Those CPU features are
built into the design of `x86_64`, and you'd need a very old `x86` CPU for
it to not support at least `sse` and `sse2`, so they're a safe bet for the
language to enable all the time. In fact, because the standard library is
compiled with them enabled, simply trying to disable those features would
actually cause ABI issues and fill your program with UB (link).
If you want additional CPU features available at compile time you'll have to
enable them with an additional arg to `rustc`. For a feature named `name`
you pass `-C target-feature=+name`, such as `-C target-feature=+sse3` for
`sse3`.
You can alternately enable all target features of the current CPU with
`-C target-cpu=native`. This is primarily of use if you're building a
program you'll only run on your own system.
It's sometimes hard to know if your target platform will support a given
feature set, but the Steam Hardware Survey is generally taken as a guide to
what you can expect people to have available. If you click "Other Settings"
it'll expand into a list of CPU target features and how common they are.
These days, it seems that `sse3` can be safely assumed, and `ssse3`,
`sse4.1`, and `sse4.2` are pretty safe bets as well. The stuff above 128-bit
isn't as common yet; give it another few years.
Please note that executing a program on a CPU that doesn't support the target features it was compiled for is Undefined Behavior.
Currently, Rust doesn't actually support an easy way for you to check that a
feature enabled at compile time is actually available at runtime. There is
the `feature_detected` family of macros, but if you enable a feature they
will evaluate to a constant `true` instead of actually deferring the check
for the feature to runtime. This means that, if you did want a check at the
start of your program, to confirm that all the assumed features are present
and error out when the assumptions don't hold, you can't use that macro. You
gotta use CPUID and check manually. rip. Hopefully we can make that process
easier in a future version of this crate.
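A manual CPUID check of the kind described can be sketched with the `__cpuid` intrinsic from `core::arch`, which is always available on `x86_64` regardless of target features (this is an assumed-correct sketch of one feature bit, SSE2 in CPUID leaf 1, EDX bit 26, not a full feature-verification routine):

```rust
// Runtime check for SSE2 via raw CPUID, bypassing the compile-time
// constant-folding that the feature_detected macros are subject to.
#[cfg(target_arch = "x86_64")]
fn sse2_supported_at_runtime() -> bool {
    // CPUID leaf 1 reports SSE2 in EDX bit 26. `__cpuid` itself does not
    // depend on any optional target feature on x86_64.
    let r = unsafe { core::arch::x86_64::__cpuid(1) };
    (r.edx >> 26) & 1 == 1
}

// On non-x86_64 targets there is no CPUID; report unsupported.
#[cfg(not(target_arch = "x86_64"))]
fn sse2_supported_at_runtime() -> bool {
    false
}
```

A real startup check would loop over every feature the build assumed, looking up each one's CPUID leaf and bit, and error out on any mismatch.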
A Note On Working With Cfg
There are two main ways to use `cfg`:
- Via an attribute placed on an item, block, or expression:
  `#[cfg(debug_assertions)] println!("hello");`
- Via a macro used within an expression position:
  `if cfg!(debug_assertions) { println!("hello"); }`
The difference might seem small, but it's actually very important:
- The attribute form includes or excludes the code before checking whether all the items it names really exist. This means that code configured via attribute can safely name things that don't always exist, as long as the things it names do exist whenever that code is configured into the build.
- The macro form will include the configured code no matter what, and then the macro resolves to a constant `true` or `false` and the compiler uses dead code elimination to cut out the path not taken.
This crate uses `cfg` via the attribute, so the functions it exposes don't
exist at all when the appropriate CPU target features aren't enabled.
Accordingly, if you plan to call this crate or not depending on what
features are enabled in the build, you'll also need to control your use of
this crate via the cfg attribute, not the cfg macro.
Macros
aes_key_gen_assist_m128i | aes ? |
blend_immediate_i16_m128i | Blends the |
blend_immediate_m128d | Blends the lanes according to the immediate mask. |
blend_immediate_m128 | Blends the lanes according to the immediate mask. |
blend_immediate_m256d | avx Blends the |
blend_immediate_m256 | avx Blends the |
byte_shift_left_u128_immediate_m128i | Shifts all bits in the entire register left by a number of bytes. |
byte_shift_right_u128_immediate_m128i | Shifts all bits in the entire register right by a number of bytes. |
cmp_op_mask_m128 | avx Compare |
cmp_op_mask_m128_s | avx Compare |
cmp_op_mask_m128d | avx Compare |
cmp_op_mask_m128d_s | avx Compare |
cmp_op_mask_m256 | avx Compare |
cmp_op_mask_m256d | avx Compare |
combined_byte_shift_right_immediate_m128i | Counts |
comparison_operator_translation | avx Turns a comparison operator token to the correct constant value. |
dot_product_m128d | Performs a dot product of two |
dot_product_m128 | Performs a dot product of two |
dot_product_m256 | avx This works like |
extract_f32_as_i32_bits_immediate_m128 | Gets the |
extract_i16_as_i32_m128i | Gets an |
extract_i32_from_m256i | avx Extracts an |
extract_i32_immediate_m128i | Gets the |
extract_i64_from_m256i | avx Extracts an |
extract_i64_immediate_m128i | Gets the |
extract_i8_as_i32_immediate_m128i | Gets the |
extract_m128_from_m256 | avx Extracts an |
extract_m128d_from_m256d | avx Extracts an |
extract_m128i_from_m256i | avx Extracts an |
insert_f32_immediate_m128 | Inserts a lane from |
insert_i16_from_i32_m128i | Inserts the low 16 bits of an |
insert_i16_to_m256i | avx Inserts an |
insert_i32_immediate_m128i | Inserts a new value for the |
insert_i32_to_m256i | avx Inserts an |
insert_i64_immediate_m128i | Inserts a new value for the |
insert_i64_to_m256i | avx Inserts an |
insert_i8_immediate_m128i | Inserts a new value for the |
insert_i8_to_m256i | avx Inserts an |
insert_m128_to_m256 | avx Inserts an |
insert_m128d_to_m256d | avx Inserts an |
insert_m128i_to_m256i | avx Inserts an |
mul_i64_carryless_m128i | pclmulqdq Performs a "carryless" multiplication of two |
multi_packed_sum_abs_diff_u8_m128i | Computes eight |
permute_f128_in_m256d | avx Permutes the lanes around. |
permute_f128_in_m256 | avx Permutes the lanes around. |
permute_i128_in_m256i | avx Permutes the lanes around. |
permute_m128d | avx Permutes the lanes around. |
permute_m128 | avx Permutes the lanes around. |
permute_m256d | avx Permutes the lanes around. |
permute_m256 | avx Permutes the lanes around. |
round_m128d | Rounds each lane in the style specified. |
round_m128d_s | Rounds |
round_m128 | Rounds each lane in the style specified. |
round_m128_s | Rounds |
round_m256d | avx Rounds each lane in the style specified. |
round_m256 | avx Rounds each lane in the style specified. |
shift_left_i16_immediate_m128i | Shifts all |
shift_left_i32_immediate_m128i | Shifts all |
shift_left_i64_immediate_m128i | Shifts both |
shift_right_i16_immediate_m128i | Shifts all |
shift_right_i32_immediate_m128i | Shifts all |
shift_right_u16_immediate_m128i | Shifts all |
shift_right_u32_immediate_m128i | Shifts all |
shift_right_u64_immediate_m128i | Shifts both |
shuffle_i16_high_lanes_m128i | Shuffles the higher |
shuffle_i16_low_lanes_m128i | Shuffles the lower |
shuffle_i32_m128i | Shuffles the |
shuffle_m128 | Shuffles the lanes around. |
shuffle_m128d | Shuffles the lanes around. |
shuffle_m256d | avx Shuffles the |
shuffle_m256 | avx Shuffles the |
string_search_for_index | sse4.2 Looks for |
string_search_for_mask | sse4.2 Looks for |
Structs
m128 | The data for a 128-bit SSE register of four `f32` lanes. |
m128d | The data for a 128-bit SSE register of two `f64` lanes. |
m128i | The data for a 128-bit SSE register of integer data. |
m256 | The data for a 256-bit AVX register of eight `f32` lanes. |
m256d | The data for a 256-bit AVX register of four `f64` lanes. |
m256i | The data for a 256-bit AVX register of integer data. |
Functions
abs_i16_m128i | Lanewise absolute value with lanes as |
abs_i32_m128i | Lanewise absolute value with lanes as |
abs_i8_m128i | Lanewise absolute value with lanes as |
add_carry_u32 | adx Add two |
add_carry_u64 | adx Add two |
add_horizontal_i16_m128i | Add horizontal pairs of |
add_horizontal_i32_m128i | Add horizontal pairs of |
add_horizontal_m128d | Add each lane horizontally, pack the outputs as |
add_horizontal_m128 | Add each lane horizontally, pack the outputs as |
add_horizontal_m256d | avx Add adjacent |
add_horizontal_m256 | avx Add adjacent |
add_horizontal_saturating_i16_m128i | Add horizontal pairs of |
add_i16_m128i | Lanewise |
add_i32_m128i | Lanewise |
add_i64_m128i | Lanewise |
add_i8_m128i | Lanewise |
add_m128 | sse Lanewise |
add_m128_s | Low lane |
add_m128d | Lanewise |
add_m128d_s | Lowest lane |
add_m256d | avx Lanewise |
add_m256 | avx Lanewise |
add_saturating_i16_m128i | Lanewise saturating |
add_saturating_i8_m128i | Lanewise saturating |
add_saturating_u16_m128i | Lanewise saturating |
add_saturating_u8_m128i | Lanewise saturating |
add_sub_m128d | Add the high lane and subtract the low lane. |
add_sub_m128 | Alternately, from the top, add a lane and then subtract a lane. |
add_sub_m256d | avx Alternately, from the top, add |
add_sub_m256 | avx Alternately, from the top, add |
aes_decrypt_last_m128i | aes Perform the last round of AES decryption flow on |
aes_decrypt_m128i | aes Perform one round of AES decryption flow on |
aes_encrypt_last_m128i | aes Perform the last round of AES encryption flow on |
aes_encrypt_m128i | aes Perform one round of AES encryption flow on |
aes_inv_mix_columns_m128i | aes Perform the InvMixColumns transform on |
and_m128 | Bitwise |
and_m128d | Bitwise |
and_m128i | Bitwise |
and_m256d | avx Bitwise |
and_m256 | avx Bitwise |
andnot_m128 | Bitwise |
andnot_m128d | Bitwise |
andnot_m128i | Bitwise |
andnot_m256d | avx Bitwise |
andnot_m256 | avx Bitwise |
andnot_u32 | bmi1 Bitwise |
andnot_u64 | bmi1 Bitwise |
average_u16_m128i | Lanewise average of the |
average_u8_m128i | Lanewise average of the |
bit_extract2_u32 | bmi1 Extract a span of bits from the |
bit_extract2_u64 | bmi1 Extract a span of bits from the |
bit_extract_u32 | bmi1 Extract a span of bits from the |
bit_extract_u64 | bmi1 Extract a span of bits from the |
bit_lowest_set_mask_u32 | bmi1 Gets the mask of all bits up to and including the lowest set bit in a |
bit_lowest_set_mask_u64 | bmi1 Gets the mask of all bits up to and including the lowest set bit in a |
bit_lowest_set_reset_u32 | bmi1 Resets (clears) the lowest set bit. |
bit_lowest_set_reset_u64 | bmi1 Resets (clears) the lowest set bit. |
bit_lowest_set_value_u32 | bmi1 Gets the value of the lowest set bit in a |
bit_lowest_set_value_u64 | bmi1 Gets the value of the lowest set bit in a |
bit_zero_high_index_u32 | bmi2 Zero out all high bits in a |
bit_zero_high_index_u64 | bmi2 Zero out all high bits in a |
blend_varying_i8_m128i | Blend the |
blend_varying_m128d | Blend the lanes according to a runtime varying mask. |
blend_varying_m128 | Blend the lanes according to a runtime varying mask. |
blend_varying_m256d | avx Blend the lanes according to a runtime varying mask. |
blend_varying_m256 | avx Blend the lanes according to a runtime varying mask. |
cast_from_m256_to_m256d | avx Bit-preserving cast from |
cast_from_m256_to_m256i | avx Bit-preserving cast from |
cast_from_m256d_to_m256 | avx Bit-preserving cast from |
cast_from_m256d_to_m256i | avx Bit-preserving cast from |
cast_from_m256i_to_m256d | avx Bit-preserving cast from |
cast_from_m256i_to_m256 | avx Bit-preserving cast from |
cast_to_m128_from_m128d | Bit-preserving cast to |
cast_to_m128_from_m128i | Bit-preserving cast to |
cast_to_m128d_from_m128 | Bit-preserving cast to |
cast_to_m128d_from_m128i | Bit-preserving cast to |
cast_to_m128i_from_m128d | Bit-preserving cast to |
cast_to_m128i_from_m128 | Bit-preserving cast to |
ceil_m128d | Round each lane to a whole number, towards positive infinity |
ceil_m128 | Round each lane to a whole number, towards positive infinity |
ceil_m128d_s | Round the low lane of |
ceil_m128_s | Round the low lane of |
ceil_m256d | avx Round |
ceil_m256 | avx Round |
cmp_eq_i32_m128_s | Low lane equality. |
cmp_eq_i32_m128d_s | Low lane |
cmp_eq_mask_i16_m128i | Lanewise |
cmp_eq_mask_i32_m128i | Lanewise |
cmp_eq_mask_i64_m128i | Lanewise |
cmp_eq_mask_i8_m128i | Lanewise |
cmp_eq_mask_m128 | Lanewise |
cmp_eq_mask_m128_s | Low lane |
cmp_eq_mask_m128d | Lanewise |
cmp_eq_mask_m128d_s | Low lane |
cmp_ge_i32_m128_s | Low lane greater than or equal to. |
cmp_ge_i32_m128d_s | Low lane |
cmp_ge_mask_m128 | Lanewise |
cmp_ge_mask_m128_s | Low lane |
cmp_ge_mask_m128d | Lanewise |
cmp_ge_mask_m128d_s | Low lane |
cmp_gt_i32_m128_s | Low lane greater than. |
cmp_gt_i32_m128d_s | Low lane |
cmp_gt_mask_i16_m128i | Lanewise |
cmp_gt_mask_i32_m128i | Lanewise |
cmp_gt_mask_i64_m128i | sse4.2 Lanewise |
cmp_gt_mask_i8_m128i | Lanewise |
cmp_gt_mask_m128 | Lanewise |
cmp_gt_mask_m128_s | Low lane |
cmp_gt_mask_m128d | Lanewise |
cmp_gt_mask_m128d_s | Low lane |
cmp_le_i32_m128_s | Low lane less than or equal to. |
cmp_le_i32_m128d_s | Low lane |
cmp_le_mask_m128 | Lanewise |
cmp_le_mask_m128_s | Low lane |
cmp_le_mask_m128d | Lanewise |
cmp_le_mask_m128d_s | Low lane |
cmp_lt_i32_m128_s | Low lane less than. |
cmp_lt_i32_m128d_s | Low lane |
cmp_lt_mask_i16_m128i | Lanewise |
cmp_lt_mask_i32_m128i | Lanewise |
cmp_lt_mask_i8_m128i | Lanewise |
cmp_lt_mask_m128 | Lanewise |
cmp_lt_mask_m128_s | Low lane |
cmp_lt_mask_m128d | Lanewise |
cmp_lt_mask_m128d_s | Low lane |
cmp_neq_i32_m128_s | Low lane not equal to. |
cmp_neq_i32_m128d_s | Low lane |
cmp_neq_mask_m128 | Lanewise |
cmp_neq_mask_m128_s | Low lane |
cmp_neq_mask_m128d | Lanewise |
cmp_neq_mask_m128d_s | Low lane |
cmp_nge_mask_m128 | Lanewise |
cmp_nge_mask_m128_s | Low lane |
cmp_nge_mask_m128d | Lanewise |
cmp_nge_mask_m128d_s | Low lane |
cmp_ngt_mask_m128 | Lanewise |
cmp_ngt_mask_m128_s | Low lane |
cmp_ngt_mask_m128d | Lanewise |
cmp_ngt_mask_m128d_s | Low lane |
cmp_nle_mask_m128 | Lanewise |
cmp_nle_mask_m128_s | Low lane |
cmp_nle_mask_m128d | Lanewise |
cmp_nle_mask_m128d_s | Low lane |
cmp_nlt_mask_m128 | Lanewise |
cmp_nlt_mask_m128_s | Low lane |
cmp_nlt_mask_m128d | Lanewise |
cmp_nlt_mask_m128d_s | Low lane |
cmp_ord_mask_m128 | Lanewise |
cmp_ord_mask_m128_s | Low lane |
cmp_ord_mask_m128d | Lanewise |
cmp_ord_mask_m128d_s | Low lane |
cmp_unord_mask_m128 | Lanewise |
cmp_unord_mask_m128_s | Low lane |
cmp_unord_mask_m128d | Lanewise |
cmp_unord_mask_m128d_s | Low lane |
convert_i16_lower2_to_i64_m128i | Convert the lower two |
convert_i16_lower4_to_i32_m128i | Convert the lower four |
convert_i32_lower2_to_i64_m128i | Convert the lower two |
convert_i32_replace_m128_s | Convert |
convert_i32_replace_m128d_s | Convert |
convert_i64_replace_m128d_s | Convert |
convert_i8_lower2_to_i64_m128i | Convert the lower two |
convert_i8_lower4_to_i32_m128i | Convert the lower four |
convert_i8_lower8_to_i16_m128i | Convert the lower eight |
convert_m128_s_replace_m128d_s | Converts the lower |
convert_m128d_s_replace_m128_s | Converts the low |
convert_to_f32_from_m256_s | avx Convert the lowest |
convert_to_f64_from_m256d_s | avx Convert the lowest |
convert_to_i32_from_m256i_s | avx Convert the lowest |
convert_to_i32_m128i_from_m256d | avx Convert |
convert_to_i32_m256i_from_m256 | avx Convert |
convert_to_m128_from_m128i | Rounds the four |
convert_to_m128_from_m128d | Rounds the two |
convert_to_m128_from_m256d | avx Convert |
convert_to_m128d_from_m128i | Rounds the lower two |
convert_to_m128d_from_m128 | Rounds the two |
convert_to_m128i_from_m128d | Rounds the two |
convert_to_m128i_from_m128 | Rounds the two |
convert_to_m128i_from_m256d | avx Convert |
convert_to_m256_from_i32_m256i | avx Convert |
convert_to_m256d_from_i32_m128i | avx Convert |
convert_to_m256d_from_m128 | avx Convert |
convert_to_m256i_from_m256 | avx Convert |
convert_u16_lower2_to_u64_m128i | Convert the lower two |
convert_u16_lower4_to_u32_m128i | Convert the lower four |
convert_u32_lower2_to_u64_m128i | Convert the lower two |
convert_u8_lower2_to_u64_m128i | Convert the lower two |
convert_u8_lower4_to_u32_m128i | Convert the lower four |
convert_u8_lower8_to_u16_m128i | Convert the lower eight |
copy_i64_m128i_s | Copy the low |
copy_replace_low_f64_m128d | Copies the |
crc32_u8 | sse4.2 Accumulates the |
crc32_u16 | sse4.2 Accumulates the |
crc32_u32 | sse4.2 Accumulates the |
crc32_u64 | sse4.2 Accumulates the |
div_m128 | Lanewise |
div_m128_s | Low lane |
div_m128d | Lanewise |
div_m128d_s | Lowest lane |
div_m256d | avx Lanewise |
div_m256 | avx Lanewise |
duplicate_even_lanes_m128 | Duplicate the even lanes to the odd lanes. |
duplicate_even_lanes_m256 | avx Duplicate the even-indexed lanes to the odd lanes. |
duplicate_low_lane_m128d_s | Copy the low lane of the input to both lanes of the output. |
duplicate_odd_lanes_m128 | Duplicate the odd lanes to the even lanes. |
duplicate_odd_lanes_m256d | avx Duplicate the odd-indexed lanes to the even lanes. |
duplicate_odd_lanes_m256 | avx Duplicate the odd-indexed lanes to the even lanes. |
floor_m128d | Round each lane to a whole number, towards negative infinity |
floor_m128 | Round each lane to a whole number, towards negative infinity |
floor_m128d_s | Round the low lane of |
floor_m128_s | Round the low lane of |
floor_m256d | avx Round |
floor_m256 | avx Round |
get_f32_from_m128_s | Gets the low lane as an individual |
get_f64_from_m128d_s | Gets the lower lane as an |
get_i32_from_m128_s | Converts the low lane to |
get_i32_from_m128d_s | Converts the lower lane to an |
get_i32_from_m128i_s | Converts the lower lane to an |
get_i64_from_m128d_s | Converts the lower lane to an |
get_i64_from_m128i_s | Converts the lower lane to an |
leading_zero_count_u32 | lzcnt Count the leading zeroes in a |
leading_zero_count_u64 | lzcnt Count the leading zeroes in a |
load_f32_m128_s | Loads the |
load_f32_splat_m128 | Loads the |
load_f32_splat_m256 | avx Load an |
load_f64_m128d_s | Loads the reference into the low lane of the register. |
load_f64_splat_m128d | Loads the |
load_f64_splat_m256d | avx Load an |
load_i64_m128i_s | Loads the low |
load_m128 | Loads the reference into a register. |
load_m128d | Loads the reference into a register. |
load_m128i | Loads the reference into a register. |
load_m256d | avx Load data from memory into a register. |
load_m256 | avx Load data from memory into a register. |
load_m256i | avx Load data from memory into a register. |
load_m128_splat_m256 | avx Load an |
load_m128d_splat_m256d | avx Load an |
load_masked_m128d | avx Load data from memory into a register according to a mask. |
load_masked_m128 | avx Load data from memory into a register according to a mask. |
load_masked_m256d | avx Load data from memory into a register according to a mask. |
load_masked_m256 | avx Load data from memory into a register according to a mask. |
load_replace_high_m128d | Loads the reference into a register, replacing the high lane. |
load_replace_low_m128d | Loads the reference into a register, replacing the low lane. |
load_reverse_m128 | Loads the reference into a register with reversed order. |
load_reverse_m128d | Loads the reference into a register with reversed order. |
load_unaligned_hi_lo_m256d | avx Load data from memory into a register. |
load_unaligned_hi_lo_m256 | avx Load data from memory into a register. |
load_unaligned_hi_lo_m256i | avx Load data from memory into a register. |
load_unaligned_m128 | Loads the reference into a register. |
load_unaligned_m128d | Loads the reference into a register. |
load_unaligned_m128i | Loads the reference into a register. |
load_unaligned_m256d | avx Load data from memory into a register. |
load_unaligned_m256 | avx Load data from memory into a register. |
load_unaligned_m256i | avx Load data from memory into a register. |
max_i16_m128i | Lanewise |
max_i32_m128i | Lanewise |
max_i8_m128i | Lanewise |
max_m128 | Lanewise |
max_m128_s | Low lane |
max_m128d | Lanewise |
max_m128d_s | Low lane |
max_m256d | avx Lanewise |
max_m256 | avx Lanewise |
max_u16_m128i | Lanewise |
max_u32_m128i | Lanewise |
max_u8_m128i | Lanewise |
min_i16_m128i | Lanewise |
min_i32_m128i | Lanewise |
min_i8_m128i | Lanewise |
min_m128 | Lanewise |
min_m128_s | Low lane |
min_m128d | Lanewise |
min_m128d_s | Low lane |
min_m256d | avx Lanewise |
min_m256 | avx Lanewise |
min_position_u16_m128i | Min |
min_u16_m128i | Lanewise |
min_u32_m128i | Lanewise |
min_u8_m128i | Lanewise |
move_high_low_m128 | Move the high lanes of |
move_low_high_m128 | Move the low lanes of |
move_m128_s | Move the low lane of |
move_mask_i8_m128i | Gathers the |
move_mask_m128 | Gathers the sign bit of each lane. |
move_mask_m128d | Gathers the sign bit of each lane. |
move_mask_m256d | avx Collects the sign bit of each lane into a 4-bit value. |
move_mask_m256 | avx Collects the sign bit of each lane into a 4-bit value. |
mul_extended_u32 | bmi2 Multiply two |
mul_extended_u64 | bmi2 Multiply two |
mul_i16_horizontal_add_m128i | Multiply |
mul_i16_keep_high_m128i | Lanewise |
mul_i16_keep_low_m128i | Lanewise |
mul_i16_scale_round_m128i | Multiply |
mul_i32_keep_low_m128i | Lanewise |
mul_i64_widen_low_bits_m128i | Multiplies the lower 32 bits (only) of each |
mul_m128 | Lanewise |
mul_m128_s | Low lane |
mul_m128d | Lanewise |
mul_m128d_s | Lowest lane |
mul_m256d | avx Lanewise |
mul_m256 | avx Lanewise |
mul_u16_keep_high_m128i | Lanewise |
mul_u64_widen_low_bits_m128i | Multiplies the lower 32 bits (only) of each |
mul_u8i8_add_horizontal_saturating_m128i | This is dumb and weird. |
or_m128 | Bitwise |
or_m128d | Bitwise |
or_m128i | Bitwise |
or_m256d | avx Bitwise |
or_m256 | avx Bitwise |
pack_i16_to_i8_m128i | Saturating convert |
pack_i16_to_u8_m128i | Saturating convert |
pack_i32_to_i16_m128i | Saturating convert |
pack_i32_to_u16_m128i | Saturating convert |
permute_varying_m128d | avx Permute with a runtime varying pattern. |
permute_varying_m128 | avx Permute with a runtime varying pattern. |
permute_varying_m256d | avx Permute with a runtime varying pattern. |
permute_varying_m256 | avx Permute with a runtime varying pattern. |
population_count_i32 | popcnt Count the number of bits set within an |
population_count_i64 | popcnt Count the number of bits set within an |
population_deposit_u32 | bmi2 Deposit contiguous low bits from a |
population_deposit_u64 | bmi2 Deposit contiguous low bits from a |
population_extract_u32 | bmi2 Extract bits from a |
population_extract_u64 | bmi2 Extract bits from a |
rdrand_u16 | rdrand Try to obtain a random |
rdrand_u32 | rdrand Try to obtain a random |
rdrand_u64 | rdrand Try to obtain a random |
rdseed_u16 | rdseed Try to obtain a random |
rdseed_u32 | rdseed Try to obtain a random |
rdseed_u64 | rdseed Try to obtain a random |
reciprocal_m128 | Lanewise |
reciprocal_m128_s | Low lane |
reciprocal_m256 | avx Reciprocal of |
reciprocal_sqrt_m128 | Lanewise |
reciprocal_sqrt_m128_s | Low lane |
reciprocal_sqrt_m256 | avx Reciprocal of |
set_i16_m128i | Sets the args into an |
set_i16_m256i | avx Set |
set_i32_m128i_s | Set an |
set_i32_m128i | Sets the args into an |
set_i32_m256i | avx Set |
set_i64_m128i_s | Set an |
set_i64_m128i | Sets the args into an |
set_i8_m128i | Sets the args into an |
set_i8_m256i | avx Set |
set_m128 | Sets the args into an |
set_m128_s | Sets the args into an |
set_m128d | Sets the args into an |
set_m128d_s | Sets the args into the low lane of a |
set_m256d | avx Set |
set_m256 | avx Set |
set_m128d_m256d | avx Set |
set_m128i_m256i | avx Set |
set_reversed_i16_m128i | Sets the args into an |
set_reversed_i16_m256i | avx Set |
set_reversed_i32_m128i | Sets the args into an |
set_reversed_i32_m256i | avx Set |
set_reversed_i8_m128i | Sets the args into an |
set_reversed_i8_m256i | avx Set |
set_reversed_m128 | Sets the args into an |
set_reversed_m128d | Sets the args into an |
set_reversed_m256d | avx Set |
set_reversed_m256 | avx Set |
set_reversed_m128d_m256d | avx Set |
set_reversed_m128i_m256i | avx Set |
set_splat_i16_m256i | avx Splat an |
set_splat_i32_m256i | avx Splat an |
set_splat_i8_m256i | avx Splat an |
set_splat_m256 | avx Splat an |
shift_left_i16_m128i | Shift each |
shift_left_i32_m128i | Shift each |
shift_left_i64_m128i | Shift each |
shift_right_i16_m128i | Shift each |
shift_right_i32_m128i | Shift each |
shift_right_u16_m128i | Shift each |
shift_right_u32_m128i | Shift each |
shift_right_u64_m128i | Shift each |
shuffle_i8_m128i | Shuffles the |
sign_apply_i16_m128i | Applies the sign of |
sign_apply_i32_m128i | Applies the sign of |
sign_apply_i8_m128i | Applies the sign of |
splat_i16_m128i | Splats the |
splat_i32_m128i | Splats the |
splat_i64_m128i | Splats the |
splat_i8_m128i | Splats the |
splat_m128 | Splats the value to all lanes. |
splat_m128d | Splats the args into both lanes of the |
sqrt_m128 | Lanewise |
sqrt_m128_s | Low lane |
sqrt_m128d | Lanewise |
sqrt_m128d_s | Low lane |
sqrt_m256d | avx Lanewise |
sqrt_m256 | avx Lanewise |
store_high_m128d_s | Stores the high lane value to the reference given. |
store_i64_m128i_s | Stores the value to the reference given. |
store_m128 | Stores the value to the reference given. |
store_m128_s | Stores the low lane value to the reference given. |
store_m128d | Stores the value to the reference given. |
store_m128d_s | Stores the low lane value to the reference given. |
store_m128i | Stores the value to the reference given. |
store_m256d | avx Store data from a register into memory. |
store_m256 | avx Store data from a register into memory. |
store_m256i | avx Store data from a register into memory. |
store_masked_m128d | avx Store data from a register into memory according to a mask. |
store_masked_m128 | avx Store data from a register into memory according to a mask. |
store_masked_m256d | avx Store data from a register into memory according to a mask. |
store_masked_m256 | avx Store data from a register into memory according to a mask. |
store_reverse_m128 | Stores the value to the reference given in reverse order. |
store_reversed_m128d | Stores the value to the reference given. |
store_splat_m128 | Stores the low lane value to all lanes of the reference given. |
store_splat_m128d | Stores the low lane value to all lanes of the reference given. |
store_unaligned_hi_lo_m256d | avx Store data from a register into memory. |
store_unaligned_hi_lo_m256 | avx Store data from a register into memory. |
store_unaligned_hi_lo_m256i | avx Store data from a register into memory. |
store_unaligned_m128 | Stores the value to the reference given. |
store_unaligned_m128d | Stores the value to the reference given. |
store_unaligned_m128i | Stores the value to the reference given. |
store_unaligned_m256d | avx Store data from a register into memory. |
store_unaligned_m256 | avx Store data from a register into memory. |
store_unaligned_m256i | avx Store data from a register into memory. |
sub_horizontal_i16_m128i | Subtract horizontal pairs of |
sub_horizontal_i32_m128i | Subtract horizontal pairs of |
sub_horizontal_m128d | Subtract each lane horizontally, pack the outputs as |
sub_horizontal_m128 | Subtract each lane horizontally, pack the outputs as |
sub_horizontal_m256d | avx Subtract adjacent |
sub_horizontal_m256 | avx Subtract adjacent |
sub_horizontal_saturating_i16_m128i | Subtract horizontal pairs of |
sub_i16_m128i | Lanewise |
sub_i32_m128i | Lanewise |
sub_i64_m128i | Lanewise |
sub_i8_m128i | Lanewise |
sub_m128 | Lanewise |
sub_m128_s | Low lane |
sub_m128d | Lanewise |
sub_m128d_s | Lowest lane |
sub_m256d | avx Lanewise |
sub_m256 | avx Lanewise |
sub_saturating_i16_m128i | Lanewise saturating |
sub_saturating_i8_m128i | Lanewise saturating |
sub_saturating_u16_m128i | Lanewise saturating |
sub_saturating_u8_m128i | Lanewise saturating |
sum_of_u8_abs_diff_m128i | Compute "sum of |
test_all_ones_m128i | Tests if all bits are 1. |
test_all_zeroes_m128i | Returns if all masked bits are 0, |
test_mixed_ones_and_zeroes_m128i | Returns if, among the masked bits, there's both 0s and 1s |
trailing_zero_count_u32 | bmi1 Counts the number of trailing zero bits in a |
trailing_zero_count_u64 | bmi1 Counts the number of trailing zero bits in a |
transpose_four_m128 | Transpose four |
truncate_m128_to_m128i | Truncate the |
truncate_m128d_to_m128i | Truncate the |
truncate_to_i32_m128d_s | Truncate the lower lane into an |
truncate_to_i64_m128d_s | Truncate the lower lane into an |
unpack_hi_m256d | avx Unpack and interleave the high lanes. |
unpack_hi_m256 | avx Unpack and interleave the high lanes. |
unpack_high_i16_m128i | Unpack and interleave high |
unpack_high_i32_m128i | Unpack and interleave high |
unpack_high_i64_m128i | Unpack and interleave high |
unpack_high_i8_m128i | Unpack and interleave high |
unpack_high_m128 | Unpack and interleave high lanes of |
unpack_high_m128d | Unpack and interleave high lanes of |
unpack_lo_m256d | avx Unpack and interleave the low lanes. |
unpack_lo_m256 | avx Unpack and interleave the low lanes. |
unpack_low_i16_m128i | Unpack and interleave low |
unpack_low_i32_m128i | Unpack and interleave low |
unpack_low_i64_m128i | Unpack and interleave low |
unpack_low_i8_m128i | Unpack and interleave low |
unpack_low_m128 | Unpack and interleave low lanes of |
unpack_low_m128d | Unpack and interleave low lanes of |
xor_m128 | Bitwise |
xor_m128d | Bitwise |
xor_m128i | Bitwise |
xor_m256d | avx Bitwise |
xor_m256 | avx Bitwise |
zero_extend_m128d | avx Zero extend an |
zero_extend_m128 | avx Zero extend an |
zero_extend_m128i | avx Zero extend an |
zeroed_m128 | All lanes zero. |
zeroed_m128i | All lanes zero. |
zeroed_m128d | Both lanes zero. |
zeroed_m256d | avx A zeroed |
zeroed_m256 | avx A zeroed |
zeroed_m256i | avx A zeroed |