Module stdsimd::vendor
Platform-dependent vendor intrinsics.
Constants
Functions
_MM_GET_EXCEPTION_MASK⚠ | |
_MM_GET_EXCEPTION_STATE⚠ | |
_MM_GET_FLUSH_ZERO_MODE⚠ | |
_MM_GET_ROUNDING_MODE⚠ | |
_MM_SET_EXCEPTION_MASK⚠ | |
_MM_SET_EXCEPTION_STATE⚠ | |
_MM_SET_FLUSH_ZERO_MODE⚠ | |
_MM_SET_ROUNDING_MODE⚠ | |
_MM_TRANSPOSE4_PS⚠ |
Transpose the 4x4 matrix formed by 4 rows of f32x4 in place. |
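As a rough illustration of the in-place transpose described above, the sketch below loads four rows, transposes them, and stores them back. It assumes the intrinsics are reachable through `std::arch::x86_64`, as in current Rust, rather than through the `stdsimd::vendor` path this page documents. `transpose4` is a hypothetical helper name, and a scalar fallback is included for other targets.

```rust
/// Hypothetical helper: transpose a row-major 4x4 matrix of f32.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
fn transpose4(m: [[f32; 4]; 4]) -> [[f32; 4]; 4] {
    // Assumption: module path as exposed by current Rust, not `stdsimd::vendor`.
    use std::arch::x86_64::*;
    let mut out = [[0.0f32; 4]; 4];
    // SSE is part of the x86_64 baseline, so no runtime feature check is done.
    unsafe {
        // Unaligned loads: plain arrays are not guaranteed to be 16-byte aligned.
        let mut r0 = _mm_loadu_ps(m[0].as_ptr());
        let mut r1 = _mm_loadu_ps(m[1].as_ptr());
        let mut r2 = _mm_loadu_ps(m[2].as_ptr());
        let mut r3 = _mm_loadu_ps(m[3].as_ptr());
        // The four rows are rewritten in place with the transposed rows.
        _MM_TRANSPOSE4_PS(&mut r0, &mut r1, &mut r2, &mut r3);
        _mm_storeu_ps(out[0].as_mut_ptr(), r0);
        _mm_storeu_ps(out[1].as_mut_ptr(), r1);
        _mm_storeu_ps(out[2].as_mut_ptr(), r2);
        _mm_storeu_ps(out[3].as_mut_ptr(), r3);
    }
    out
}

/// Scalar fallback for targets without these intrinsics.
#[cfg(not(target_arch = "x86_64"))]
fn transpose4(m: [[f32; 4]; 4]) -> [[f32; 4]; 4] {
    let mut out = [[0.0f32; 4]; 4];
    for r in 0..4 {
        for c in 0..4 {
            out[c][r] = m[r][c];
        }
    }
    out
}
```

Calling `transpose4` on a matrix whose rows are `[1, 2, 3, 4]`, `[5, 6, 7, 8]`, and so on returns rows made of the original columns.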
_andn_u32⚠ |
Bitwise logical |
_andn_u64⚠ |
Bitwise logical |
_bextr2_u32⚠ |
Extracts bits of |
_bextr_u32⚠ |
Extracts bits in range [ |
_blcfill_u32⚠ |
Clears all bits below the least significant zero bit of |
_blci_u32⚠ |
Sets all bits of |
_blcic_u32⚠ |
Sets the least significant zero bit of |
_blcmsk_u32⚠ |
Sets the least significant zero bit of |
_blcs_u32⚠ |
Sets the least significant zero bit of |
_blsfill_u32⚠ |
Sets all bits of |
_blsi_u32⚠ |
Extract the lowest set bit, with all other bits zeroed. |
_blsic_u32⚠ |
Clears the least significant set bit and sets all other bits. |
_blsmsk_u32⚠ |
Get mask up to lowest set bit. |
_blsr_u32⚠ |
Resets the lowest set bit of |
_bzhi_u32⚠ |
Zero higher bits of |
_lzcnt_u32⚠ |
Counts the leading most significant zero bits. |
_lzcnt_u64⚠ |
Counts the leading most significant zero bits. |
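The lowest-set-bit operations listed above (`_blsi_u32`, `_blsmsk_u32`, `_blsr_u32`) correspond to short integer formulas. The sketch below pairs those formulas with the intrinsics behind a runtime check for the `bmi1` feature. It assumes the `std::arch::x86_64` module path of current Rust; all helper names are hypothetical.

```rust
/// Portable formula: isolate the lowest set bit.
fn blsi(x: u32) -> u32 {
    x & x.wrapping_neg()
}

/// Portable formula: mask up to and including the lowest set bit.
fn blsmsk(x: u32) -> u32 {
    x ^ x.wrapping_sub(1)
}

/// Portable formula: reset (clear) the lowest set bit.
fn blsr(x: u32) -> u32 {
    x & x.wrapping_sub(1)
}

/// Same three operations through the BMI1 intrinsics.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "bmi1")]
#[allow(unused_unsafe)]
unsafe fn lowest_set_bit_ops_bmi1(x: u32) -> (u32, u32, u32) {
    // Assumption: module path as exposed by current Rust, not `stdsimd::vendor`.
    use std::arch::x86_64::*;
    unsafe { (_blsi_u32(x), _blsmsk_u32(x), _blsr_u32(x)) }
}

/// Hypothetical dispatcher: returns (blsi, blsmsk, blsr) of `x`.
fn lowest_set_bit_ops(x: u32) -> (u32, u32, u32) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("bmi1") {
            // Safe to call: the CPU was just checked for BMI1 support.
            return unsafe { lowest_set_bit_ops_bmi1(x) };
        }
    }
    (blsi(x), blsmsk(x), blsr(x))
}
```

For `x = 0b1011_0000` the three results are `0b0001_0000`, `0b0001_1111`, and `0b1010_0000`, whichever path is taken.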
_mm256_abs_epi8⚠ |
Computes the absolute values of packed 8-bit integers in |
_mm256_abs_epi16⚠ |
Computes the absolute values of packed 16-bit integers in |
_mm256_abs_epi32⚠ |
Computes the absolute values of packed 32-bit integers in |
_mm256_add_epi8⚠ |
Add packed 8-bit integers in |
_mm256_add_epi16⚠ |
Add packed 16-bit integers in |
_mm256_add_epi32⚠ |
Add packed 32-bit integers in |
_mm256_add_epi64⚠ |
Add packed 64-bit integers in |
_mm256_add_pd⚠ |
Add packed double-precision (64-bit) floating-point elements
in |
_mm256_add_ps⚠ |
Add packed single-precision (32-bit) floating-point elements in |
_mm256_adds_epi8⚠ |
Add packed 8-bit integers in |
_mm256_adds_epi16⚠ |
Add packed 16-bit integers in |
_mm256_adds_epu8⚠ |
Add packed unsigned 8-bit integers in |
_mm256_adds_epu16⚠ |
Add packed unsigned 16-bit integers in |
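The `add` and `adds` families differ in overflow handling: `_mm256_add_epi8` wraps around, while `_mm256_adds_epu8` saturates at the unsigned maximum. The following sketch contrasts the two on 32 bytes. It assumes the `std::arch::x86_64` module path of current Rust and runtime detection of `avx2`; the helper names are hypothetical.

```rust
/// Scalar reference: (wrapping sums, unsigned saturating sums) per byte.
fn add_u8_scalar(a: &[u8; 32], b: &[u8; 32]) -> ([u8; 32], [u8; 32]) {
    let mut wrap = [0u8; 32];
    let mut sat = [0u8; 32];
    for i in 0..32 {
        wrap[i] = a[i].wrapping_add(b[i]);
        sat[i] = a[i].saturating_add(b[i]);
    }
    (wrap, sat)
}

/// AVX2 version using `_mm256_add_epi8` and `_mm256_adds_epu8`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
#[allow(unused_unsafe)]
unsafe fn add_u8_avx2(a: &[u8; 32], b: &[u8; 32]) -> ([u8; 32], [u8; 32]) {
    // Assumption: module path as exposed by current Rust, not `stdsimd::vendor`.
    use std::arch::x86_64::*;
    let mut wrap = [0u8; 32];
    let mut sat = [0u8; 32];
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        _mm256_storeu_si256(wrap.as_mut_ptr() as *mut __m256i, _mm256_add_epi8(va, vb));
        _mm256_storeu_si256(sat.as_mut_ptr() as *mut __m256i, _mm256_adds_epu8(va, vb));
    }
    (wrap, sat)
}

/// Hypothetical dispatcher: uses AVX2 when the CPU reports it.
fn add_u8(a: &[u8; 32], b: &[u8; 32]) -> ([u8; 32], [u8; 32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { add_u8_avx2(a, b) };
        }
    }
    add_u8_scalar(a, b)
}
```

Adding 200 and 100 in every byte gives 44 in the wrapping result and 255 in the saturating one.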
_mm256_addsub_pd⚠ |
Alternately add and subtract packed double-precision (64-bit)
floating-point elements in |
_mm256_addsub_ps⚠ |
Alternately add and subtract packed single-precision (32-bit)
floating-point elements in |
_mm256_alignr_epi8⚠ |
Concatenate pairs of 16-byte blocks in |
_mm256_and_pd⚠ |
Compute the bitwise AND of packed double-precision (64-bit)
floating-point elements
in |
_mm256_and_ps⚠ |
Compute the bitwise AND of packed single-precision (32-bit) floating-point
elements in |
_mm256_and_si256⚠ |
Compute the bitwise AND of 256 bits (representing integer data)
in |
_mm256_andnot_pd⚠ |
Compute the bitwise NOT of packed double-precision (64-bit) floating-point
elements in |
_mm256_andnot_ps⚠ |
Compute the bitwise NOT of packed single-precision (32-bit) floating-point
elements in |
_mm256_andnot_si256⚠ |
Compute the bitwise NOT of 256 bits (representing integer data)
in |
_mm256_avg_epu8⚠ |
Average packed unsigned 8-bit integers in |
_mm256_avg_epu16⚠ |
Average packed unsigned 16-bit integers in |
_mm256_blend_epi16⚠ |
Blend packed 16-bit integers from |
_mm256_blend_epi32⚠ |
Blend packed 32-bit integers from |
_mm256_blend_pd⚠ |
Blend packed double-precision (64-bit) floating-point elements from
|
_mm256_blendv_epi8⚠ |
Blend packed 8-bit integers from |
_mm256_blendv_pd⚠ |
Blend packed double-precision (64-bit) floating-point elements from
|
_mm256_blendv_ps⚠ |
Blend packed single-precision (32-bit) floating-point elements from
|
_mm256_broadcast_pd⚠ |
Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector. |
_mm256_broadcast_ps⚠ |
Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector. |
_mm256_broadcast_sd⚠ |
Broadcast a double-precision (64-bit) floating-point element from memory to all elements of the returned vector. |
_mm256_broadcast_ss⚠ |
Broadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector. |
_mm256_broadcastb_epi8⚠ |
Broadcast the low packed 8-bit integer from |
_mm256_broadcastd_epi32⚠ |
Broadcast the low packed 32-bit integer from |
_mm256_broadcastq_epi64⚠ |
Broadcast the low packed 64-bit integer from |
_mm256_broadcastsd_pd⚠ |
Broadcast the low double-precision (64-bit) floating-point element
from |
_mm256_broadcastsi128_si256⚠ |
Broadcast 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value. |
_mm256_broadcastss_ps⚠ |
Broadcast the low single-precision (32-bit) floating-point element
from |
_mm256_broadcastw_epi16⚠ |
Broadcast the low packed 16-bit integer from a to all elements of the 256-bit returned value. |
_mm256_castpd128_pd256⚠ |
Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. |
_mm256_castpd256_pd128⚠ |
Casts vector of type __m256d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. |
_mm256_castpd_ps⚠ |
Casts vector of type __m256d to type __m256. |
_mm256_castpd_si256⚠ |
Casts vector of type __m256d to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. |
_mm256_castps128_ps256⚠ |
Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. |
_mm256_castps256_ps128⚠ |
Casts vector of type __m256 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. |
_mm256_castps_pd⚠ |
Casts vector of type __m256 to type __m256d. |
_mm256_castps_si256⚠ |
Casts vector of type __m256 to type __m256i. |
_mm256_castsi128_si256⚠ |
Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. |
_mm256_castsi256_pd⚠ |
Casts vector of type __m256i to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. |
_mm256_castsi256_ps⚠ |
Casts vector of type __m256i to type __m256. |
_mm256_castsi256_si128⚠ |
Casts vector of type __m256i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. |
_mm256_ceil_pd⚠ |
Round packed double-precision (64-bit) floating-point elements in |
_mm256_ceil_ps⚠ |
Round packed single-precision (32-bit) floating-point elements in |
_mm256_cmp_pd⚠ |
Compare packed double-precision (64-bit) floating-point
elements in |
_mm256_cmp_ps⚠ |
Compare packed single-precision (32-bit) floating-point
elements in |
_mm256_cmpeq_epi8⚠ |
Compare packed 8-bit integers in |
_mm256_cmpeq_epi16⚠ |
Compare packed 16-bit integers in |
_mm256_cmpeq_epi32⚠ |
Compare packed 32-bit integers in |
_mm256_cmpeq_epi64⚠ |
Compare packed 64-bit integers in |
_mm256_cmpgt_epi8⚠ |
Compare packed 8-bit integers in |
_mm256_cmpgt_epi16⚠ |
Compare packed 16-bit integers in |
_mm256_cmpgt_epi32⚠ |
Compare packed 32-bit integers in |
_mm256_cmpgt_epi64⚠ |
Compare packed 64-bit integers in |
_mm256_cvtepi16_epi32⚠ |
Sign-extend 16-bit integers to 32-bit integers. |
_mm256_cvtepi16_epi64⚠ |
Sign-extend 16-bit integers to 64-bit integers. |
_mm256_cvtepi32_epi64⚠ |
Sign-extend 32-bit integers to 64-bit integers. |
_mm256_cvtepi32_pd⚠ |
Convert packed 32-bit integers in |
_mm256_cvtepi32_ps⚠ |
Convert packed 32-bit integers in |
_mm256_cvtepi8_epi16⚠ |
Sign-extend 8-bit integers to 16-bit integers. |
_mm256_cvtepi8_epi32⚠ |
Sign-extend 8-bit integers to 32-bit integers. |
_mm256_cvtepi8_epi64⚠ |
Sign-extend 8-bit integers to 64-bit integers. |
_mm256_cvtpd_epi32⚠ |
Convert packed double-precision (64-bit) floating-point elements in |
_mm256_cvtpd_ps⚠ |
Convert packed double-precision (64-bit) floating-point elements in |
_mm256_cvtps_epi32⚠ |
Convert packed single-precision (32-bit) floating-point elements in |
_mm256_cvtps_pd⚠ |
Convert packed single-precision (32-bit) floating-point elements in |
_mm256_cvttpd_epi32⚠ |
Convert packed double-precision (64-bit) floating-point elements in |
_mm256_cvttps_epi32⚠ |
Convert packed single-precision (32-bit) floating-point elements in |
_mm256_div_pd⚠ |
Compute the division of each of the 4 packed 64-bit floating-point elements
in |
_mm256_div_ps⚠ |
Compute the division of each of the 8 packed 32-bit floating-point elements
in |
_mm256_dp_ps⚠ |
Conditionally multiply the packed single-precision (32-bit) floating-point
elements in |
_mm256_extract_epi8⚠ |
Extract an 8-bit integer from |
_mm256_extract_epi16⚠ |
Extract a 16-bit integer from |
_mm256_extract_epi32⚠ |
Extract a 32-bit integer from |
_mm256_extract_epi64⚠ |
Extract a 64-bit integer from |
_mm256_extractf128_pd⚠ |
Extract 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from |
_mm256_extractf128_ps⚠ |
Extract 128 bits (composed of 4 packed single-precision (32-bit)
floating-point elements) from |
_mm256_extractf128_si256⚠ |
Extract 128 bits (composed of integer data) from |
_mm256_floor_pd⚠ |
Round packed double-precision (64-bit) floating-point elements in |
_mm256_floor_ps⚠ |
Round packed single-precision (32-bit) floating-point elements in |
_mm256_hadd_epi16⚠ |
Horizontally add adjacent pairs of 16-bit integers in |
_mm256_hadd_epi32⚠ |
Horizontally add adjacent pairs of 32-bit integers in |
_mm256_hadd_pd⚠ |
Horizontal addition of adjacent pairs in the two packed vectors
of 4 64-bit floating-point elements |
_mm256_hadd_ps⚠ |
Horizontal addition of adjacent pairs in the two packed vectors
of 8 32-bit floating-point elements |
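Horizontal addition on 256-bit vectors works within each 128-bit lane, which is easy to get wrong. The sketch below spells out the pairing for `_mm256_hadd_ps` as a scalar model and checks it against the intrinsic when `avx` is detected. It assumes the `std::arch::x86_64` module path of current Rust; the helper names are hypothetical.

```rust
/// Scalar model of the per-lane pairing performed by `_mm256_hadd_ps`.
fn hadd_ps_model(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    [
        // Low 128-bit lane: pairs from a, then pairs from b.
        a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3],
        // High 128-bit lane: same pattern on the upper halves.
        a[4] + a[5], a[6] + a[7], b[4] + b[5], b[6] + b[7],
    ]
}

/// AVX version using the intrinsic directly.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
#[allow(unused_unsafe)]
unsafe fn hadd_ps_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    // Assumption: module path as exposed by current Rust, not `stdsimd::vendor`.
    use std::arch::x86_64::*;
    let mut out = [0.0f32; 8];
    unsafe {
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        _mm256_storeu_ps(out.as_mut_ptr(), _mm256_hadd_ps(va, vb));
    }
    out
}

/// Hypothetical dispatcher: uses AVX when the CPU reports it.
fn hadd_ps(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { hadd_ps_avx(a, b) };
        }
    }
    hadd_ps_model(a, b)
}
```

Note that the result is not a plain concatenation of the sums from `a` followed by the sums from `b`; the two inputs interleave per 128-bit lane.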
_mm256_hadds_epi16⚠ |
Horizontally add adjacent pairs of 16-bit integers in |
_mm256_hsub_epi16⚠ |
Horizontally subtract adjacent pairs of 16-bit integers in |
_mm256_hsub_epi32⚠ |
Horizontally subtract adjacent pairs of 32-bit integers in |
_mm256_hsub_pd⚠ |
Horizontal subtraction of adjacent pairs in the two packed vectors
of 4 64-bit floating-point elements |
_mm256_hsub_ps⚠ |
Horizontal subtraction of adjacent pairs in the two packed vectors
of 8 32-bit floating-point elements |
_mm256_hsubs_epi16⚠ |
Horizontally subtract adjacent pairs of 16-bit integers in |
_mm256_insert_epi8⚠ |
Copy |
_mm256_insert_epi16⚠ |
Copy |
_mm256_insert_epi32⚠ |
Copy |
_mm256_insert_epi64⚠ |
Copy |
_mm256_insertf128_pd⚠ |
Copy |
_mm256_insertf128_ps⚠ |
Copy |
_mm256_insertf128_si256⚠ |
Copy |
_mm256_lddqu_si256⚠ |
Load 256 bits of integer data from unaligned memory into result.
This intrinsic may perform better than |
_mm256_loadu2_m128⚠ |
Load two 128-bit values (composed of 4 packed single-precision (32-bit)
floating-point elements) from memory, and combine them into a 256-bit
value.
|
_mm256_loadu2_m128d⚠ |
Load two 128-bit values (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory, and combine them into a 256-bit
value.
|
_mm256_loadu2_m128i⚠ |
Load two 128-bit values (composed of integer data) from memory, and combine
them into a 256-bit value.
|
_mm256_loadu_pd⚠ |
Load 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from memory into result.
|
_mm256_loadu_ps⚠ |
Load 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from memory into result.
|
_mm256_loadu_si256⚠ |
Load 256 bits of integer data from memory into result.
|
_mm256_madd_epi16⚠ |
Multiply packed signed 16-bit integers in |
_mm256_maddubs_epi16⚠ |
Vertically multiply each unsigned 8-bit integer from |
_mm256_maskload_epi32⚠ |
Load packed 32-bit integers from memory pointed by |
_mm256_maskload_epi64⚠ |
Load packed 64-bit integers from memory pointed by |
_mm256_maskload_pd⚠ |
Load packed double-precision (64-bit) floating-point elements from memory
into result using |
_mm256_maskload_ps⚠ |
Load packed single-precision (32-bit) floating-point elements from memory
into result using |
_mm256_maskstore_epi32⚠ |
Store packed 32-bit integers from |
_mm256_maskstore_epi64⚠ |
Store packed 64-bit integers from |
_mm256_maskstore_pd⚠ |
Store packed double-precision (64-bit) floating-point elements from |
_mm256_maskstore_ps⚠ |
Store packed single-precision (32-bit) floating-point elements from |
_mm256_max_epi8⚠ |
Compare packed 8-bit integers in |
_mm256_max_epi16⚠ |
Compare packed 16-bit integers in |
_mm256_max_epi32⚠ |
Compare packed 32-bit integers in |
_mm256_max_epu8⚠ |
Compare packed unsigned 8-bit integers in |
_mm256_max_epu16⚠ |
Compare packed unsigned 16-bit integers in |
_mm256_max_epu32⚠ |
Compare packed unsigned 32-bit integers in |
_mm256_max_pd⚠ |
Compare packed double-precision (64-bit) floating-point elements
in |
_mm256_max_ps⚠ |
Compare packed single-precision (32-bit) floating-point elements in |
_mm256_min_epi8⚠ |
Compare packed 8-bit integers in |
_mm256_min_epi16⚠ |
Compare packed 16-bit integers in |
_mm256_min_epi32⚠ |
Compare packed 32-bit integers in |
_mm256_min_epu8⚠ |
Compare packed unsigned 8-bit integers in |
_mm256_min_epu16⚠ |
Compare packed unsigned 16-bit integers in |
_mm256_min_epu32⚠ |
Compare packed unsigned 32-bit integers in |
_mm256_min_pd⚠ |
Compare packed double-precision (64-bit) floating-point elements
in |
_mm256_min_ps⚠ |
Compare packed single-precision (32-bit) floating-point elements in |
_mm256_movedup_pd⚠ |
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and return the results. |
_mm256_movehdup_ps⚠ |
Duplicate odd-indexed single-precision (32-bit) floating-point elements
from |
_mm256_moveldup_ps⚠ |
Duplicate even-indexed single-precision (32-bit) floating-point elements
from |
_mm256_movemask_epi8⚠ |
Create mask from the most significant bit of each 8-bit element in |
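A common use of the byte movemask is to turn a vector comparison into a bitmask that ordinary integer code can scan. The sketch below combines `_mm256_cmpeq_epi8` with `_mm256_movemask_epi8` to mark which of 32 bytes equal a given value. It assumes the `std::arch::x86_64` module path of current Rust and runtime detection of `avx2`; the helper names are hypothetical.

```rust
/// Scalar reference: bit i of the result is set when hay[i] == needle.
fn match_mask_scalar(hay: &[u8; 32], needle: u8) -> u32 {
    let mut mask = 0u32;
    for (i, &byte) in hay.iter().enumerate() {
        if byte == needle {
            mask |= 1u32 << i;
        }
    }
    mask
}

/// AVX2 version: compare all 32 bytes at once, then collect the sign bits.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
#[allow(unused_unsafe)]
unsafe fn match_mask_avx2(hay: &[u8; 32], needle: u8) -> u32 {
    // Assumption: module path as exposed by current Rust, not `stdsimd::vendor`.
    use std::arch::x86_64::*;
    unsafe {
        let v = _mm256_loadu_si256(hay.as_ptr() as *const __m256i);
        let n = _mm256_set1_epi8(needle as i8);
        // Equal bytes become 0xFF, so their most significant bit is set.
        let eq = _mm256_cmpeq_epi8(v, n);
        _mm256_movemask_epi8(eq) as u32
    }
}

/// Hypothetical dispatcher: uses AVX2 when the CPU reports it.
fn match_mask(hay: &[u8; 32], needle: u8) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { match_mask_avx2(hay, needle) };
        }
    }
    match_mask_scalar(hay, needle)
}
```

The position of the first match can then be read off with `mask.trailing_zeros()` when the mask is non-zero.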
_mm256_movemask_pd⚠ |
Set each bit of the returned mask based on the most significant bit of the
corresponding packed double-precision (64-bit) floating-point element in
|
_mm256_movemask_ps⚠ |
Set each bit of the returned mask based on the most significant bit of the
corresponding packed single-precision (32-bit) floating-point element in
|
_mm256_mpsadbw_epu8⚠ |
Compute the sum of absolute differences (SADs) of quadruplets of unsigned
8-bit integers in |
_mm256_mul_epi32⚠ |
Multiply the low 32-bit integers from each packed 64-bit element in
|
_mm256_mul_epu32⚠ |
Multiply the low unsigned 32-bit integers from each packed 64-bit
element in |
_mm256_mul_pd⚠ |
Multiply packed double-precision (64-bit) floating-point elements
in |
_mm256_mul_ps⚠ |
Multiply packed single-precision (32-bit) floating-point elements in |
_mm256_mulhi_epi16⚠ |
Multiply the packed 16-bit integers in |
_mm256_mulhi_epu16⚠ |
Multiply the packed unsigned 16-bit integers in |
_mm256_mulhrs_epi16⚠ |
Multiply packed 16-bit integers in |
_mm256_mullo_epi16⚠ |
Multiply the packed 16-bit integers in |
_mm256_mullo_epi32⚠ |
Multiply the packed 32-bit integers in |
_mm256_or_pd⚠ |
Compute the bitwise OR of packed double-precision (64-bit) floating-point
elements
in |
_mm256_or_ps⚠ |
Compute the bitwise OR of packed single-precision (32-bit) floating-point
elements in |
_mm256_or_si256⚠ |
Compute the bitwise OR of 256 bits (representing integer data) in |
_mm256_packs_epi16⚠ |
Convert packed 16-bit integers from |
_mm256_packs_epi32⚠ |
Convert packed 32-bit integers from |
_mm256_packus_epi16⚠ |
Convert packed 16-bit integers from |
_mm256_packus_epi32⚠ |
Convert packed 32-bit integers from |
_mm256_permute2f128_pd⚠ |
Shuffle 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) selected by |
_mm256_permute2f128_ps⚠ |
Shuffle 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) selected by |
_mm256_permute2f128_si256⚠ |
Shuffle 256 bits (composed of integer data) selected by |
_mm256_permute4x64_epi64⚠ |
Permutes 64-bit integers from |
_mm256_permute_pd⚠ |
Shuffle double-precision (64-bit) floating-point elements in |
_mm256_permute_ps⚠ |
Shuffle single-precision (32-bit) floating-point elements in |
_mm256_permutevar8x32_epi32⚠ |
Permutes packed 32-bit integers from |
_mm256_permutevar_pd⚠ | |
_mm256_permutevar_ps⚠ |
Shuffle single-precision (32-bit) floating-point elements in |
_mm256_rcp_ps⚠ |
Compute the approximate reciprocal of packed single-precision (32-bit)
floating-point elements in |
_mm256_round_pd⚠ |
Round packed double-precision (64-bit) floating-point elements in |
_mm256_round_ps⚠ |
Round packed single-precision (32-bit) floating-point elements in |
_mm256_rsqrt_ps⚠ |
Compute the approximate reciprocal square root of packed single-precision
(32-bit) floating-point elements in |
_mm256_sad_epu8⚠ |
Compute the absolute differences of packed unsigned 8-bit integers in |
_mm256_set1_epi8⚠ |
Broadcast 8-bit integer |
_mm256_set1_epi16⚠ |
Broadcast 16-bit integer |
_mm256_set1_epi32⚠ |
Broadcast 32-bit integer |
_mm256_set1_epi64x⚠ |
Broadcast 64-bit integer |
_mm256_set1_pd⚠ |
Broadcast double-precision (64-bit) floating-point value |
_mm256_set1_ps⚠ |
Broadcast single-precision (32-bit) floating-point value |
_mm256_set_epi8⚠ |
Set packed 8-bit integers in returned vector with the supplied values. |
_mm256_set_epi16⚠ |
Set packed 16-bit integers in returned vector with the supplied values. |
_mm256_set_epi32⚠ |
Set packed 32-bit integers in returned vector with the supplied values. |
_mm256_set_epi64x⚠ |
Set packed 64-bit integers in returned vector with the supplied values. |
_mm256_set_m128⚠ |
Set packed __m256 returned vector with the supplied values. |
_mm256_set_m128d⚠ |
Set packed __m256d returned vector with the supplied values. |
_mm256_set_m128i⚠ |
Set packed __m256i returned vector with the supplied values. |
_mm256_set_pd⚠ |
Set packed double-precision (64-bit) floating-point elements in returned vector with the supplied values. |
_mm256_set_ps⚠ |
Set packed single-precision (32-bit) floating-point elements in returned vector with the supplied values. |
_mm256_setr_epi8⚠ |
Set packed 8-bit integers in returned vector with the supplied values in reverse order. |
_mm256_setr_epi16⚠ |
Set packed 16-bit integers in returned vector with the supplied values in reverse order. |
_mm256_setr_epi32⚠ |
Set packed 32-bit integers in returned vector with the supplied values in reverse order. |
_mm256_setr_epi64x⚠ |
Set packed 64-bit integers in returned vector with the supplied values in reverse order. |
_mm256_setr_m128⚠ |
Set packed __m256 returned vector with the supplied values. |
_mm256_setr_m128d⚠ |
Set packed __m256d returned vector with the supplied values. |
_mm256_setr_m128i⚠ |
Set packed __m256i returned vector with the supplied values. |
_mm256_setr_pd⚠ |
Set packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order. |
_mm256_setr_ps⚠ |
Set packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order. |
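The difference between the `set` and `setr` families is only the argument order: `set` takes the highest element first, while `setr` ("reverse") takes the lowest element first, which matches memory order. The sketch below passes the same eight values to `_mm256_set_epi32` and `_mm256_setr_epi32` and stores both results. It assumes the `std::arch::x86_64` module path of current Rust and runtime detection of `avx`; the helper names are hypothetical.

```rust
/// Scalar model: what memory holds after `set` and `setr` with arguments `v`.
fn set_vs_setr_model(v: [i32; 8]) -> ([i32; 8], [i32; 8]) {
    let mut reversed = v;
    reversed.reverse();
    // `set` stores the last argument at the lowest address; `setr` keeps order.
    (reversed, v)
}

/// AVX version: build both vectors from the same argument list and store them.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
#[allow(unused_unsafe)]
unsafe fn set_vs_setr_avx(v: [i32; 8]) -> ([i32; 8], [i32; 8]) {
    // Assumption: module path as exposed by current Rust, not `stdsimd::vendor`.
    use std::arch::x86_64::*;
    let mut from_set = [0i32; 8];
    let mut from_setr = [0i32; 8];
    unsafe {
        let a = _mm256_set_epi32(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]);
        let b = _mm256_setr_epi32(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]);
        _mm256_storeu_si256(from_set.as_mut_ptr() as *mut __m256i, a);
        _mm256_storeu_si256(from_setr.as_mut_ptr() as *mut __m256i, b);
    }
    (from_set, from_setr)
}

/// Hypothetical dispatcher: uses AVX when the CPU reports it.
fn set_vs_setr(v: [i32; 8]) -> ([i32; 8], [i32; 8]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { set_vs_setr_avx(v) };
        }
    }
    set_vs_setr_model(v)
}
```

With the arguments `0, 1, ..., 7`, the `set` result reads back from memory as `7, 6, ..., 0` and the `setr` result as `0, 1, ..., 7`.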
_mm256_setzero_pd⚠ |
Return vector of type __m256d with all elements set to zero. |
_mm256_setzero_ps⚠ |
Return vector of type __m256 with all elements set to zero. |
_mm256_setzero_si256⚠ |
Return vector of type __m256i with all elements set to zero. |
_mm256_shuffle_epi8⚠ |
Shuffle bytes from |
_mm256_shuffle_epi32⚠ |
Shuffle 32-bit integers in 128-bit lanes of |
_mm256_shuffle_pd⚠ |
Shuffle double-precision (64-bit) floating-point elements within 128-bit
lanes using the control in |
_mm256_sign_epi8⚠ |
Negate packed 8-bit integers in |
_mm256_sign_epi16⚠ |
Negate packed 16-bit integers in |
_mm256_sign_epi32⚠ |
Negate packed 32-bit integers in |
_mm256_sll_epi16⚠ |
Shift packed 16-bit integers in |
_mm256_sll_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_sll_epi64⚠ |
Shift packed 64-bit integers in |
_mm256_slli_epi16⚠ |
Shift packed 16-bit integers in |
_mm256_slli_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_slli_epi64⚠ |
Shift packed 64-bit integers in |
_mm256_sllv_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_sllv_epi64⚠ |
Shift packed 64-bit integers in |
_mm256_sqrt_pd⚠ |
Return the square root of packed double-precision (64-bit) floating-point
elements in |
_mm256_sqrt_ps⚠ |
Return the square root of packed single-precision (32-bit) floating-point
elements in |
_mm256_sra_epi16⚠ |
Shift packed 16-bit integers in |
_mm256_sra_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_srai_epi16⚠ |
Shift packed 16-bit integers in |
_mm256_srai_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_srav_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_srl_epi16⚠ |
Shift packed 16-bit integers in |
_mm256_srl_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_srl_epi64⚠ |
Shift packed 64-bit integers in |
_mm256_srli_epi16⚠ |
Shift packed 16-bit integers in |
_mm256_srli_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_srli_epi64⚠ |
Shift packed 64-bit integers in |
_mm256_srlv_epi32⚠ |
Shift packed 32-bit integers in |
_mm256_srlv_epi64⚠ |
Shift packed 64-bit integers in |
_mm256_storeu2_m128⚠ |
Store the high and low 128-bit halves (each composed of 4 packed
single-precision (32-bit) floating-point elements) from |
_mm256_storeu2_m128d⚠ |
Store the high and low 128-bit halves (each composed of 2 packed
double-precision (64-bit) floating-point elements) from |
_mm256_storeu2_m128i⚠ |
Store the high and low 128-bit halves (each composed of integer data) from
|
_mm256_storeu_pd⚠ |
Store 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from |
_mm256_storeu_ps⚠ |
Store 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from |
_mm256_storeu_si256⚠ |
Store 256 bits of integer data from |
_mm256_sub_epi8⚠ |
Subtract packed 8-bit integers in |
_mm256_sub_epi16⚠ |
Subtract packed 16-bit integers in |
_mm256_sub_epi32⚠ |
Subtract packed 32-bit integers in |
_mm256_sub_epi64⚠ |
Subtract packed 64-bit integers in |
_mm256_sub_pd⚠ |
Subtract packed double-precision (64-bit) floating-point elements in |
_mm256_sub_ps⚠ |
Subtract packed single-precision (32-bit) floating-point elements in |
_mm256_subs_epi8⚠ |
Subtract packed 8-bit integers in |
_mm256_subs_epi16⚠ |
Subtract packed 16-bit integers in |
_mm256_subs_epu8⚠ |
Subtract packed unsigned 8-bit integers in |
_mm256_subs_epu16⚠ |
Subtract packed unsigned 16-bit integers in |
_mm256_testc_pd⚠ |
Compute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in |
_mm256_testc_ps⚠ |
Compute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in |
_mm256_testc_si256⚠ |
Compute the bitwise AND of 256 bits (representing integer data) in |
_mm256_testnzc_pd⚠ |
Compute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in |
_mm256_testnzc_ps⚠ |
Compute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in |
_mm256_testz_pd⚠ |
Compute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in |
_mm256_testz_ps⚠ |
Compute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in |
_mm256_testz_si256⚠ |
Compute the bitwise AND of 256 bits (representing integer data) in |
_mm256_undefined_pd⚠ |
Return vector of type |
_mm256_undefined_ps⚠ |
Return vector of type |
_mm256_undefined_si256⚠ |
Return vector of type __m256i with undefined elements. |
_mm256_unpackhi_epi8⚠ |
Unpack and interleave 8-bit integers from the high half of each
128-bit lane in |
_mm256_unpackhi_epi16⚠ |
Unpack and interleave 16-bit integers from the high half of each
128-bit lane of |
_mm256_unpackhi_epi32⚠ |
Unpack and interleave 32-bit integers from the high half of each
128-bit lane of |
_mm256_unpackhi_epi64⚠ |
Unpack and interleave 64-bit integers from the high half of each
128-bit lane of |
_mm256_unpackhi_pd⚠ |
Unpack and interleave double-precision (64-bit) floating-point elements
from the high half of each 128-bit lane in |
_mm256_unpackhi_ps⚠ |
Unpack and interleave single-precision (32-bit) floating-point elements
from the high half of each 128-bit lane in |
_mm256_unpacklo_epi8⚠ |
Unpack and interleave 8-bit integers from the low half of each
128-bit lane of |
_mm256_unpacklo_epi16⚠ |
Unpack and interleave 16-bit integers from the low half of each
128-bit lane of |
_mm256_unpacklo_epi32⚠ |
Unpack and interleave 32-bit integers from the low half of each
128-bit lane of |
_mm256_unpacklo_epi64⚠ |
Unpack and interleave 64-bit integers from the low half of each
128-bit lane of |
_mm256_unpacklo_pd⚠ |
Unpack and interleave double-precision (64-bit) floating-point elements
from the low half of each 128-bit lane in |
_mm256_unpacklo_ps⚠ |
Unpack and interleave single-precision (32-bit) floating-point elements
from the low half of each 128-bit lane in |
_mm256_xor_pd⚠ |
Compute the bitwise XOR of packed double-precision (64-bit) floating-point
elements in |
_mm256_xor_ps⚠ |
Compute the bitwise XOR of packed single-precision (32-bit) floating-point
elements in |
_mm256_xor_si256⚠ |
Compute the bitwise XOR of 256 bits (representing integer data)
in |
_mm256_zeroall⚠ |
Zero the contents of all XMM or YMM registers. |
_mm256_zeroupper⚠ |
Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified. |
_mm256_zextpd128_pd256⚠ |
Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero. |
_mm256_zextps128_ps256⚠ |
Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero. |
_mm256_zextsi128_si256⚠ |
Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero. |
_mm_abs_epi8⚠ |
Compute the absolute value of packed 8-bit signed integers in |
_mm_abs_epi16⚠ |
Compute the absolute value of each of the packed 16-bit signed integers in
|
_mm_abs_epi32⚠ |
Compute the absolute value of each of the packed 32-bit signed integers in
|
_mm_add_epi8⚠ |
Add packed 8-bit integers in |
_mm_add_epi16⚠ |
Add packed 16-bit integers in |
_mm_add_epi32⚠ |
Add packed 32-bit integers in |
_mm_add_epi64⚠ |
Add packed 64-bit integers in |
_mm_add_pd⚠ |
Add packed double-precision (64-bit) floating-point elements in |
_mm_add_ps⚠ |
Adds f32x4 vectors. |
_mm_add_sd⚠ |
Return a new vector with the low element of |
_mm_add_ss⚠ |
Adds the first component of |
_mm_adds_epi8⚠ |
Add packed 8-bit integers in |
_mm_adds_epi16⚠ |
Add packed 16-bit integers in |
_mm_adds_epu8⚠ |
Add packed unsigned 8-bit integers in |
_mm_adds_epu16⚠ |
Add packed unsigned 16-bit integers in |
_mm_addsub_pd⚠ |
Alternately add and subtract packed double-precision (64-bit)
floating-point elements in |
_mm_addsub_ps⚠ |
Alternately add and subtract packed single-precision (32-bit)
floating-point elements in |
_mm_alignr_epi8⚠ |
Concatenate 16-byte blocks in |
_mm_and_pd⚠ |
Compute the bitwise AND of packed double-precision (64-bit) floating-point
elements in |
_mm_and_ps⚠ |
Bitwise AND of packed single-precision (32-bit) floating-point elements. |
_mm_and_si128⚠ |
Compute the bitwise AND of 128 bits (representing integer data) in |
_mm_andnot_pd⚠ |
Compute the bitwise NOT of |
_mm_andnot_ps⚠ |
Bitwise AND-NOT of packed single-precision (32-bit) floating-point elements. |
_mm_andnot_si128⚠ |
Compute the bitwise NOT of 128 bits (representing integer data) in |
_mm_avg_epu8⚠ |
Average packed unsigned 8-bit integers in |
_mm_avg_epu16⚠ |
Average packed unsigned 16-bit integers in |
_mm_blend_epi16⚠ | |
_mm_blend_epi32⚠ |
Blend packed 32-bit integers from |
_mm_blend_pd⚠ |
Blend packed double-precision (64-bit) floating-point elements from |
_mm_blend_ps⚠ |
Blend packed single-precision (32-bit) floating-point elements from |
_mm_blendv_epi8⚠ | |
_mm_blendv_pd⚠ |
Blend packed double-precision (64-bit) floating-point elements from |
_mm_blendv_ps⚠ |
Blend packed single-precision (32-bit) floating-point elements from |
_mm_broadcast_ss⚠ |
Broadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector. |
_mm_broadcastb_epi8⚠ |
Broadcast the low packed 8-bit integer from |
_mm_broadcastd_epi32⚠ |
Broadcast the low packed 32-bit integer from |
_mm_broadcastq_epi64⚠ |
Broadcast the low packed 64-bit integer from |
_mm_broadcastsd_pd⚠ |
Broadcast the low double-precision (64-bit) floating-point element
from |
_mm_broadcastss_ps⚠ |
Broadcast the low single-precision (32-bit) floating-point element
from |
_mm_broadcastw_epi16⚠ |
Broadcast the low packed 16-bit integer from a to all elements of the 128-bit returned value. |
_mm_bslli_si128⚠ |
Shift |
_mm_bsrli_si128⚠ |
Shift |
_mm_ceil_pd⚠ |
Round the packed double-precision (64-bit) floating-point elements in |
_mm_ceil_ps⚠ |
Round the packed single-precision (32-bit) floating-point elements in |
_mm_ceil_sd⚠ |
Round the lower double-precision (64-bit) floating-point element in |
_mm_ceil_ss⚠ |
Round the lower single-precision (32-bit) floating-point element in |
_mm_clflush⚠ |
Invalidate and flush the cache line that contains |
_mm_cmp_pd⚠ |
Compare packed double-precision (64-bit) floating-point
elements in |
_mm_cmp_ps⚠ |
Compare packed single-precision (32-bit) floating-point
elements in |
_mm_cmp_sd⚠ |
Compare the lower double-precision (64-bit) floating-point element in
|
_mm_cmp_ss⚠ |
Compare the lower single-precision (32-bit) floating-point element in
|
_mm_cmpeq_epi8⚠ |
Compare packed 8-bit integers in |
_mm_cmpeq_epi16⚠ |
Compare packed 16-bit integers in |
_mm_cmpeq_epi32⚠ |
Compare packed 32-bit integers in |
_mm_cmpeq_pd⚠ |
Compare corresponding elements in |
_mm_cmpeq_ps⚠ |
Compare each of the four floats in |
_mm_cmpeq_sd⚠ |
Return a new vector with the low element of |
_mm_cmpeq_ss⚠ |
Compare the lowest |
_mm_cmpestra⚠ |
Compare packed strings in |
_mm_cmpestrc⚠ |
Compare packed strings in |
_mm_cmpestri⚠ |
Compare packed strings |
_mm_cmpestrm⚠ |
Compare packed strings in |
_mm_cmpestro⚠ |
Compare packed strings in |
_mm_cmpestrs⚠ |
Compare packed strings in |
_mm_cmpestrz⚠ |
Compare packed strings in |
_mm_cmpge_pd⚠ |
Compare corresponding elements in |
_mm_cmpge_ps⚠ |
Compare each of the four floats in |
_mm_cmpge_sd⚠ |
Return a new vector with the low element of |
_mm_cmpge_ss⚠ |
Compare the lowest |
_mm_cmpgt_epi8⚠ |
Compare packed 8-bit integers in |
_mm_cmpgt_epi16⚠ |
Compare packed 16-bit integers in |
_mm_cmpgt_epi32⚠ |
Compare packed 32-bit integers in |
_mm_cmpgt_pd⚠ |
Compare corresponding elements in |
_mm_cmpgt_ps⚠ |
Compare each of the four floats in |
_mm_cmpgt_sd⚠ |
Return a new vector with the low element of |
_mm_cmpgt_ss⚠ |
Compare the lowest |
_mm_cmpistra⚠ |
Compare packed strings with implicit lengths in |
_mm_cmpistrc⚠ |
Compare packed strings with implicit lengths in |
_mm_cmpistri⚠ |
Compare packed strings with implicit lengths in |
_mm_cmpistrm⚠ |
Compare packed strings with implicit lengths in |
_mm_cmpistro⚠ |
Compare packed strings with implicit lengths in |
_mm_cmpistrs⚠ |
Compare packed strings with implicit lengths in |
_mm_cmpistrz⚠ |
Compare packed strings with implicit lengths in |
_mm_cmple_pd⚠ |
Compare corresponding elements in |
_mm_cmple_ps⚠ |
Compare each of the four floats in |
_mm_cmple_sd⚠ |
Return a new vector with the low element of |
_mm_cmple_ss⚠ |
Compare the lowest |
_mm_cmplt_epi8⚠ |
Compare packed 8-bit integers in |
_mm_cmplt_epi16⚠ |
Compare packed 16-bit integers in |
_mm_cmplt_epi32⚠ |
Compare packed 32-bit integers in |
_mm_cmplt_pd⚠ |
Compare corresponding elements in |
_mm_cmplt_ps⚠ |
Compare each of the four floats in |
_mm_cmplt_sd⚠ |
Return a new vector with the low element of |
_mm_cmplt_ss⚠ |
Compare the lowest |
_mm_cmpneq_pd⚠ |
Compare corresponding elements in |
_mm_cmpneq_ps⚠ |
Compare each of the four floats in |
_mm_cmpneq_sd⚠ |
Return a new vector with the low element of |
_mm_cmpneq_ss⚠ |
Compare the lowest |
_mm_cmpnge_pd⚠ |
Compare corresponding elements in |
_mm_cmpnge_ps⚠ |
Compare each of the four floats in |
_mm_cmpnge_sd⚠ |
Return a new vector with the low element of |
_mm_cmpnge_ss⚠ |
Compare the lowest |
_mm_cmpngt_pd⚠ |
Compare corresponding elements in |
_mm_cmpngt_ps⚠ |
Compare each of the four floats in |
_mm_cmpngt_sd⚠ |
Return a new vector with the low element of |
_mm_cmpngt_ss⚠ |
Compare the lowest |
_mm_cmpnle_pd⚠ |
Compare corresponding elements in |
_mm_cmpnle_ps⚠ |
Compare each of the four floats in |
_mm_cmpnle_sd⚠ |
Return a new vector with the low element of |
_mm_cmpnle_ss⚠ |
Compare the lowest |
_mm_cmpnlt_pd⚠ |
Compare corresponding elements in |
_mm_cmpnlt_ps⚠ |
Compare each of the four floats in |
_mm_cmpnlt_sd⚠ |
Return a new vector with the low element of |
_mm_cmpnlt_ss⚠ |
Compare the lowest |
_mm_cmpord_pd⚠ |
Compare corresponding elements in |
_mm_cmpord_ps⚠ |
Compare each of the four floats in |
_mm_cmpord_sd⚠ |
Return a new vector with the low element of |
_mm_cmpord_ss⚠ |
Check if the lowest |
_mm_cmpunord_pd⚠ |
Compare corresponding elements in |
_mm_cmpunord_ps⚠ |
Compare each of the four floats in |
_mm_cmpunord_sd⚠ |
Return a new vector with the low element of |
_mm_cmpunord_ss⚠ |
Check if the lowest |
_mm_comieq_sd⚠ |
Compare the lower element of |
_mm_comieq_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_comige_sd⚠ |
Compare the lower element of |
_mm_comige_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_comigt_sd⚠ |
Compare the lower element of |
_mm_comigt_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_comile_sd⚠ |
Compare the lower element of |
_mm_comile_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_comilt_sd⚠ |
Compare the lower element of |
_mm_comilt_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_comineq_sd⚠ |
Compare the lower element of |
_mm_comineq_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_crc32_u8⚠ |
Starting with the initial value in |
_mm_crc32_u16⚠ |
Starting with the initial value in |
_mm_crc32_u32⚠ |
Starting with the initial value in |
_mm_cvt_si2ss⚠ |
Alias for |
_mm_cvt_ss2si⚠ |
Alias for |
_mm_cvtepi32_pd⚠ |
Convert the lower two packed 32-bit integers in |
_mm_cvtepi32_ps⚠ |
Convert packed 32-bit integers in |
_mm_cvtpd_epi32⚠ |
Convert packed double-precision (64-bit) floating-point elements in |
_mm_cvtpd_ps⚠ |
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements |
_mm_cvtps_epi32⚠ |
Convert packed single-precision (32-bit) floating-point elements in |
_mm_cvtps_pd⚠ |
Convert packed single-precision (32-bit) floating-point elements in |
_mm_cvtsd_si32⚠ |
Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer. |
_mm_cvtsd_ss⚠ |
Convert the lower double-precision (64-bit) floating-point element in |
_mm_cvtsi128_si32⚠ |
Return the lowest element of |
_mm_cvtsi32_sd⚠ |
Return |
_mm_cvtsi32_si128⚠ |
Return a vector whose lowest element is |
_mm_cvtsi32_ss⚠ |
Convert a 32-bit integer to a 32-bit float. The result vector is the input
vector |
_mm_cvtss_f32⚠ |
Extract the lowest 32-bit float from the input vector. |
_mm_cvtss_sd⚠ |
Convert the lower single-precision (32-bit) floating-point element in |
_mm_cvtss_si32⚠ |
Convert the lowest 32-bit float in the input vector to a 32-bit integer. |
_mm_cvtt_ss2si⚠ |
Alias for |
_mm_cvttpd_epi32⚠ |
Convert packed double-precision (64-bit) floating-point elements in |
_mm_cvttps_epi32⚠ |
Convert packed single-precision (32-bit) floating-point elements in |
_mm_cvttsd_si32⚠ |
Convert the lower double-precision (64-bit) floating-point element in |
_mm_cvttss_si32⚠ |
Convert the lowest 32-bit float in the input vector to a 32-bit integer with truncation. |
_mm_div_pd⚠ |
Divide packed double-precision (64-bit) floating-point elements in |
_mm_div_ps⚠ |
Divides f32x4 vectors. |
_mm_div_sd⚠ |
Return a new vector with the low element of |
_mm_div_ss⚠ |
Divides the first component of |
_mm_dp_pd⚠ |
Returns the dot product of two f64x2 vectors. |
_mm_dp_ps⚠ |
Returns the dot product of two f32x4 vectors. |
_mm_extract_epi8⚠ |
Extract an 8-bit integer from |
_mm_extract_epi16⚠ |
Return the |
_mm_extract_epi32⚠ |
Extract a 32-bit integer from |
_mm_extract_ps⚠ |
Extract a single-precision (32-bit) floating-point element from |
_mm_floor_pd⚠ |
Round the packed double-precision (64-bit) floating-point elements in |
_mm_floor_ps⚠ |
Round the packed single-precision (32-bit) floating-point elements in |
_mm_floor_sd⚠ |
Round the lower double-precision (64-bit) floating-point element in |
_mm_floor_ss⚠ |
Round the lower single-precision (32-bit) floating-point element in |
_mm_getcsr⚠ |
Get the unsigned 32-bit value of the MXCSR control and status register. |
_mm_hadd_epi16⚠ |
Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. |
_mm_hadd_epi32⚠ |
Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [4 x i32]. |
_mm_hadd_pd⚠ |
Horizontally add adjacent pairs of double-precision (64-bit)
floating-point elements in |
_mm_hadd_ps⚠ |
Horizontally add adjacent pairs of single-precision (32-bit)
floating-point elements in |
_mm_hadds_epi16⚠ |
Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. Positive sums greater than 7FFFh are saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h. |
_mm_hsub_epi16⚠ |
Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. |
_mm_hsub_epi32⚠ |
Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [4 x i32]. |
_mm_hsub_pd⚠ |
Horizontally subtract adjacent pairs of double-precision (64-bit)
floating-point elements in |
_mm_hsub_ps⚠ |
Horizontally subtract adjacent pairs of single-precision (32-bit)
floating-point elements in |
_mm_hsubs_epi16⚠ |
Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. Positive differences greater than 7FFFh are saturated to 7FFFh. Negative differences less than 8000h are saturated to 8000h. |
_mm_insert_epi8⚠ |
Return a copy of |
_mm_insert_epi16⚠ |
Return a new vector where the |
_mm_insert_epi32⚠ |
Return a copy of |
_mm_insert_ps⚠ |
Select a single value in |
_mm_lddqu_si128⚠ |
Load 128 bits of integer data from unaligned memory.
This intrinsic may perform better than |
_mm_lfence⚠ |
Perform a serializing operation on all load-from-memory instructions that were issued prior to this instruction. |
_mm_load1_pd⚠ |
Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector. |
_mm_load1_ps⚠ |
Construct a |
_mm_load_pd⚠ |
Load 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory into the returned vector.
|
_mm_load_pd1⚠ |
Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector. |
_mm_load_ps⚠ |
Load four |
_mm_load_ps1⚠ |
Alias for |
_mm_load_si128⚠ |
Load 128 bits of integer data from memory into a new vector. |
_mm_load_ss⚠ |
Construct a |
_mm_loaddup_pd⚠ |
Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector. |
_mm_loadh_pi⚠ |
Set the upper two single-precision floating-point values with 64 bits of
data loaded from the address |
_mm_loadl_epi64⚠ |
Load 64-bit integer from memory into first element of returned vector. |
_mm_loadl_pi⚠ |
Load two floats from |
_mm_loadr_pd⚠ |
Load 2 double-precision (64-bit) floating-point elements from memory into
the returned vector in reverse order. |
_mm_loadr_ps⚠ |
Load four |
_mm_loadu_pd⚠ |
Load 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory into the returned vector.
|
_mm_loadu_ps⚠ |
Load four |
_mm_loadu_si128⚠ |
Load 128 bits of integer data from memory into a new vector. |
_mm_madd_epi16⚠ |
Multiply and then horizontally add signed 16-bit integers in |
_mm_maddubs_epi16⚠ |
Multiply corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, add pairs of contiguous products with signed saturation, and write the 16-bit sums to the corresponding bits in the destination. |
_mm_maskload_epi32⚠ |
Load packed 32-bit integers from memory pointed by |
_mm_maskload_epi64⚠ |
Load packed 64-bit integers from memory pointed by |
_mm_maskload_pd⚠ |
Load packed double-precision (64-bit) floating-point elements from memory
into result using |
_mm_maskload_ps⚠ |
Load packed single-precision (32-bit) floating-point elements from memory
into result using |
_mm_maskmoveu_si128⚠ |
Conditionally store 8-bit integer elements from |
_mm_maskstore_epi32⚠ |
Store packed 32-bit integers from |
_mm_maskstore_epi64⚠ |
Store packed 64-bit integers from |
_mm_maskstore_pd⚠ |
Store packed double-precision (64-bit) floating-point elements from |
_mm_maskstore_ps⚠ |
Store packed single-precision (32-bit) floating-point elements from |
_mm_max_epi8⚠ |
Compare packed 8-bit integers in |
_mm_max_epi16⚠ |
Compare packed 16-bit integers in |
_mm_max_epi32⚠ | |
_mm_max_epu8⚠ |
Compare packed unsigned 8-bit integers in |
_mm_max_epu16⚠ |
Compare packed unsigned 16-bit integers in |
_mm_max_epu32⚠ | |
_mm_max_pd⚠ |
Return a new vector with the maximum values from corresponding elements in
|
_mm_max_ps⚠ |
Compare packed single-precision (32-bit) floating-point elements in |
_mm_max_sd⚠ |
Return a new vector with the low element of |
_mm_max_ss⚠ |
Compare the first single-precision (32-bit) floating-point element of |
_mm_mfence⚠ |
Perform a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction. |
_mm_min_epi16⚠ |
Compare packed 16-bit integers in |
_mm_min_epu8⚠ |
Compare packed unsigned 8-bit integers in |
_mm_min_pd⚠ |
Return a new vector with the minimum values from corresponding elements in
|
_mm_min_ps⚠ |
Compare packed single-precision (32-bit) floating-point elements in |
_mm_min_sd⚠ |
Return a new vector with the low element of |
_mm_min_ss⚠ |
Compare the first single-precision (32-bit) floating-point element of |
_mm_move_epi64⚠ |
Return a vector where the low element is extracted from |
_mm_move_ss⚠ |
Return a |
_mm_movedup_pd⚠ |
Duplicate the low double-precision (64-bit) floating-point element
from |
_mm_movehdup_ps⚠ |
Duplicate odd-indexed single-precision (32-bit) floating-point elements
from |
_mm_movehl_ps⚠ |
Combine higher half of |
_mm_moveldup_ps⚠ |
Duplicate even-indexed single-precision (32-bit) floating-point elements
from |
_mm_movelh_ps⚠ |
Combine lower half of |
_mm_movemask_epi8⚠ |
Return a mask of the most significant bit of each element in |
_mm_movemask_pd⚠ |
Return a mask of the most significant bit of each element in |
_mm_movemask_ps⚠ |
Return a mask of the most significant bit of each element in |
_mm_mul_epu32⚠ |
Multiply the low unsigned 32-bit integers from each packed 64-bit element
in |
_mm_mul_pd⚠ |
Multiply packed double-precision (64-bit) floating-point elements in |
_mm_mul_ps⚠ |
Multiplies f32x4 vectors. |
_mm_mul_sd⚠ |
Return a new vector with the low element of |
_mm_mul_ss⚠ |
Multiplies the first component of |
_mm_mulhi_epi16⚠ |
Multiply the packed 16-bit integers in |
_mm_mulhi_epu16⚠ |
Multiply the packed unsigned 16-bit integers in |
_mm_mulhrs_epi16⚠ |
Multiply packed 16-bit signed integer values, truncate the 32-bit product to the 18 most significant bits by right-shifting, round the truncated value by adding 1, and write bits [16:1] to the destination. |
_mm_mullo_epi16⚠ |
Multiply the packed 16-bit integers in |
_mm_or_pd⚠ |
Compute the bitwise OR of |
_mm_or_ps⚠ |
Bitwise OR of packed single-precision (32-bit) floating-point elements. |
_mm_or_si128⚠ |
Compute the bitwise OR of 128 bits (representing integer data) in |
_mm_packs_epi16⚠ |
Convert packed 16-bit integers from |
_mm_packs_epi32⚠ |
Convert packed 32-bit integers from |
_mm_packus_epi16⚠ |
Convert packed 16-bit integers from |
_mm_pause⚠ |
Provide a hint to the processor that the code sequence is a spin-wait loop. |
_mm_permute_pd⚠ |
Shuffle double-precision (64-bit) floating-point elements in |
_mm_permute_ps⚠ |
Shuffle single-precision (32-bit) floating-point elements in |
_mm_permutevar_pd⚠ |
Shuffle double-precision (64-bit) floating-point elements in |
_mm_permutevar_ps⚠ |
Shuffle single-precision (32-bit) floating-point elements in |
_mm_prefetch⚠ |
Fetch the cache line that contains address |
_mm_rcp_ps⚠ |
Return the approximate reciprocal of packed single-precision (32-bit)
floating-point elements in |
_mm_rcp_ss⚠ |
Return the approximate reciprocal of the first single-precision
(32-bit) floating-point element in |
_mm_round_pd⚠ |
Round the packed double-precision (64-bit) floating-point elements in |
_mm_round_ps⚠ |
Round the packed single-precision (32-bit) floating-point elements in |
_mm_round_sd⚠ |
Round the lower double-precision (64-bit) floating-point element in |
_mm_round_ss⚠ |
Round the lower single-precision (32-bit) floating-point element in |
_mm_rsqrt_ps⚠ |
Return the approximate reciprocal square root of packed single-precision
(32-bit) floating-point elements in |
_mm_rsqrt_ss⚠ |
Return the approximate reciprocal square root of the first single-precision
(32-bit) floating-point element in |
_mm_sad_epu8⚠ |
Sum the absolute differences of packed unsigned 8-bit integers. |
_mm_set1_epi8⚠ |
Broadcast 8-bit integer |
_mm_set1_epi16⚠ |
Broadcast 16-bit integer |
_mm_set1_epi32⚠ |
Broadcast 32-bit integer |
_mm_set1_epi64x⚠ |
Broadcast 64-bit integer |
_mm_set1_pd⚠ |
Broadcast double-precision (64-bit) floating-point value a to all elements of the return value. |
_mm_set1_ps⚠ |
Construct a |
_mm_set_epi8⚠ |
Set packed 8-bit integers with the supplied values. |
_mm_set_epi16⚠ |
Set packed 16-bit integers with the supplied values. |
_mm_set_epi32⚠ |
Set packed 32-bit integers with the supplied values. |
_mm_set_epi64x⚠ |
Set packed 64-bit integers with the supplied values, from highest to lowest. |
_mm_set_pd⚠ |
Set packed double-precision (64-bit) floating-point elements in the return value with the supplied values. |
_mm_set_pd1⚠ |
Broadcast double-precision (64-bit) floating-point value a to all elements of the return value. |
_mm_set_ps⚠ |
Construct a |
_mm_set_ps1⚠ |
Alias for |
_mm_set_sd⚠ |
Copy double-precision (64-bit) floating-point element |
_mm_set_ss⚠ |
Construct a |
_mm_setcsr⚠ |
Set the MXCSR register with the 32-bit unsigned integer value. |
_mm_setr_epi8⚠ |
Set packed 8-bit integers with the supplied values in reverse order. |
_mm_setr_epi16⚠ |
Set packed 16-bit integers with the supplied values in reverse order. |
_mm_setr_epi32⚠ |
Set packed 32-bit integers with the supplied values in reverse order. |
_mm_setr_pd⚠ |
Set packed double-precision (64-bit) floating-point elements in the return value with the supplied values in reverse order. |
_mm_setr_ps⚠ |
Construct a |
_mm_setzero_pd⚠ |
Returns packed double-precision (64-bit) floating-point elements with all zeros. |
_mm_setzero_ps⚠ |
Construct a |
_mm_setzero_si128⚠ |
Returns a vector with all elements set to zero. |
_mm_sfence⚠ |
Perform a serializing operation on all store-to-memory instructions that were issued prior to this instruction. |
_mm_shuffle_epi8⚠ |
Shuffle bytes from |
_mm_shuffle_epi32⚠ |
Shuffle 32-bit integers in |
_mm_shuffle_ps⚠ |
Shuffle packed single-precision (32-bit) floating-point elements in |
_mm_shufflehi_epi16⚠ |
Shuffle 16-bit integers in the high 64 bits of |
_mm_shufflelo_epi16⚠ |
Shuffle 16-bit integers in the low 64 bits of |
_mm_sign_epi8⚠ |
Negate packed 8-bit integers in |
_mm_sign_epi16⚠ |
Negate packed 16-bit integers in |
_mm_sign_epi32⚠ |
Negate packed 32-bit integers in |
_mm_sll_epi16⚠ |
Shift packed 16-bit integers in |
_mm_sll_epi32⚠ |
Shift packed 32-bit integers in |
_mm_sll_epi64⚠ |
Shift packed 64-bit integers in |
_mm_slli_epi16⚠ |
Shift packed 16-bit integers in |
_mm_slli_epi32⚠ |
Shift packed 32-bit integers in |
_mm_slli_epi64⚠ |
Shift packed 64-bit integers in |
_mm_slli_si128⚠ |
Shift |
_mm_sllv_epi32⚠ |
Shift packed 32-bit integers in |
_mm_sllv_epi64⚠ |
Shift packed 64-bit integers in |
_mm_sqrt_pd⚠ |
Return a new vector with the square root of each of the values in |
_mm_sqrt_ps⚠ |
Return the square root of packed single-precision (32-bit) floating-point
elements in |
_mm_sqrt_sd⚠ |
Return a new vector with the low element of |
_mm_sqrt_ss⚠ |
Return the square root of the first single-precision (32-bit)
floating-point element in |
_mm_sra_epi16⚠ |
Shift packed 16-bit integers in |
_mm_sra_epi32⚠ |
Shift packed 32-bit integers in |
_mm_srai_epi16⚠ |
Shift packed 16-bit integers in |
_mm_srai_epi32⚠ |
Shift packed 32-bit integers in |
_mm_srav_epi32⚠ |
Shift packed 32-bit integers in |
_mm_srl_epi16⚠ |
Shift packed 16-bit integers in |
_mm_srl_epi32⚠ |
Shift packed 32-bit integers in |
_mm_srl_epi64⚠ |
Shift packed 64-bit integers in |
_mm_srli_epi16⚠ |
Shift packed 16-bit integers in |
_mm_srli_epi32⚠ |
Shift packed 32-bit integers in |
_mm_srli_epi64⚠ |
Shift packed 64-bit integers in |
_mm_srli_si128⚠ |
Shift |
_mm_srlv_epi32⚠ |
Shift packed 32-bit integers in |
_mm_srlv_epi64⚠ |
Shift packed 64-bit integers in |
_mm_store1_pd⚠ |
Store the lower double-precision (64-bit) floating-point element from |
_mm_store1_ps⚠ |
Store the lowest 32-bit float of |
_mm_store_pd⚠ |
Store 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from |
_mm_store_pd1⚠ |
Store the lower double-precision (64-bit) floating-point element from |
_mm_store_ps⚠ |
Store four 32-bit floats into aligned memory. |
_mm_store_ps1⚠ |
Alias for |
_mm_store_si128⚠ |
Store 128 bits of integer data from |
_mm_store_ss⚠ |
Store the lowest 32-bit float of |
_mm_storeh_pi⚠ |
Store the upper half of |
_mm_storel_epi64⚠ |
Store the lower 64-bit integer |
_mm_storel_pi⚠ |
Store the lower half of |
_mm_storer_pd⚠ |
Store 2 double-precision (64-bit) floating-point elements from |
_mm_storer_ps⚠ |
Store four 32-bit floats into aligned memory in reverse order. |
_mm_storeu_pd⚠ |
Store 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from |
_mm_storeu_ps⚠ |
Store four 32-bit floats into memory. There are no restrictions on memory
alignment. For aligned memory |
_mm_storeu_si128⚠ |
Store 128 bits of integer data from |
_mm_sub_epi8⚠ |
Subtract packed 8-bit integers in |
_mm_sub_epi16⚠ |
Subtract packed 16-bit integers in |
_mm_sub_epi32⚠ |
Subtract packed 32-bit integers in |
_mm_sub_epi64⚠ |
Subtract packed 64-bit integers in |
_mm_sub_pd⚠ |
Subtract packed double-precision (64-bit) floating-point elements in |
_mm_sub_ps⚠ |
Subtracts f32x4 vectors. |
_mm_sub_sd⚠ |
Return a new vector with the low element of |
_mm_sub_ss⚠ |
Subtracts the first component of |
_mm_subs_epi8⚠ |
Subtract packed 8-bit integers in |
_mm_subs_epi16⚠ |
Subtract packed 16-bit integers in |
_mm_subs_epu8⚠ |
Subtract packed unsigned 8-bit integers in |
_mm_subs_epu16⚠ |
Subtract packed unsigned 16-bit integers in |
_mm_testc_pd⚠ |
Compute the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
_mm_testc_ps⚠ |
Compute the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
_mm_testnzc_pd⚠ |
Compute the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
_mm_testnzc_ps⚠ |
Compute the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
_mm_testz_pd⚠ |
Compute the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
_mm_testz_ps⚠ |
Compute the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
_mm_tzcnt_u32⚠ |
Counts the number of trailing least significant zero bits. |
_mm_tzcnt_u64⚠ |
Counts the number of trailing least significant zero bits. |
_mm_ucomieq_sd⚠ |
Compare the lower element of |
_mm_ucomieq_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_ucomige_sd⚠ |
Compare the lower element of |
_mm_ucomige_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_ucomigt_sd⚠ |
Compare the lower element of |
_mm_ucomigt_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_ucomile_sd⚠ |
Compare the lower element of |
_mm_ucomile_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_ucomilt_sd⚠ |
Compare the lower element of |
_mm_ucomilt_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_ucomineq_sd⚠ |
Compare the lower element of |
_mm_ucomineq_ss⚠ |
Compare two 32-bit floats from the low-order bits of |
_mm_undefined_pd⚠ |
Return vector of type __m128d with undefined elements. |
_mm_undefined_ps⚠ |
Return vector of type __m128 with undefined elements. |
_mm_undefined_si128⚠ |
Return vector of type __m128i with undefined elements. |
_mm_unpackhi_epi8⚠ |
Unpack and interleave 8-bit integers from the high half of |
_mm_unpackhi_epi16⚠ |
Unpack and interleave 16-bit integers from the high half of |
_mm_unpackhi_epi32⚠ |
Unpack and interleave 32-bit integers from the high half of |
_mm_unpackhi_epi64⚠ |
Unpack and interleave 64-bit integers from the high half of |
_mm_unpackhi_ps⚠ |
Unpack and interleave single-precision (32-bit) floating-point elements
from the higher half of |
_mm_unpacklo_epi8⚠ |
Unpack and interleave 8-bit integers from the low half of |
_mm_unpacklo_epi16⚠ |
Unpack and interleave 16-bit integers from the low half of |
_mm_unpacklo_epi32⚠ |
Unpack and interleave 32-bit integers from the low half of |
_mm_unpacklo_epi64⚠ |
Unpack and interleave 64-bit integers from the low half of |
_mm_unpacklo_ps⚠ |
Unpack and interleave single-precision (32-bit) floating-point elements
from the lower half of |
_mm_xor_pd⚠ |
Compute the bitwise XOR of |
_mm_xor_ps⚠ |
Bitwise exclusive OR of packed single-precision (32-bit) floating-point elements. |
_mm_xor_si128⚠ |
Compute the bitwise XOR of 128 bits (representing integer data) in |
_mulx_u32⚠ |
Unsigned multiply without affecting flags. |
_pdep_u32⚠ |
Scatter contiguous low order bits of |
_pext_u32⚠ |
Gathers the bits of |
_popcnt32⚠ |
Counts the bits that are set. |
_popcnt64⚠ |
Counts the bits that are set. |
_t1mskc_u32⚠ |
Clears all bits below the least significant zero of |
_tzcnt_u16⚠ |
Counts the number of trailing least significant zero bits. |
_tzcnt_u32⚠ |
Counts the number of trailing least significant zero bits. |
_tzcnt_u64⚠ |
Counts the number of trailing least significant zero bits. |
_tzmsk_u32⚠ |
Sets all bits below the least significant one of |
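The "set" and "setr" families listed above differ only in argument order, which is easy to get backwards. The following is a minimal sketch, not part of this module's documentation: it assumes the same intrinsics are reachable through `std::arch::x86_64` (rather than the `stdsimd::vendor` path shown on this page), and the helper name `set_vs_setr` is hypothetical. On targets other than x86_64 it falls back to hard-coded arrays so the sketch still compiles.

```rust
// Sketch only: `set_vs_setr` is a hypothetical helper, and the
// `std::arch::x86_64` import path is an assumption (this page documents
// the intrinsics under `stdsimd::vendor`).
#[cfg(target_arch = "x86_64")]
fn set_vs_setr() -> ([i32; 4], [i32; 4]) {
    use std::arch::x86_64::*;

    let mut from_set = [0i32; 4];
    let mut from_setr = [0i32; 4];
    // SSE2 is part of the x86_64 baseline, so no runtime feature
    // detection is needed for these particular intrinsics.
    unsafe {
        // `_mm_set_epi32` takes its arguments from the highest lane down.
        let a = _mm_set_epi32(3, 2, 1, 0);
        // `_mm_setr_epi32` takes them in memory (lowest lane first) order.
        let b = _mm_setr_epi32(3, 2, 1, 0);
        // Unaligned stores: a plain `[i32; 4]` is not guaranteed to be
        // 16-byte aligned, so the aligned `_mm_store_si128` is avoided.
        _mm_storeu_si128(from_set.as_mut_ptr() as *mut __m128i, a);
        _mm_storeu_si128(from_setr.as_mut_ptr() as *mut __m128i, b);
    }
    (from_set, from_setr)
}

// Fallback for other targets: the lane order the x86_64 version produces.
#[cfg(not(target_arch = "x86_64"))]
fn set_vs_setr() -> ([i32; 4], [i32; 4]) {
    ([0, 1, 2, 3], [3, 2, 1, 0])
}
```

Calling `set_vs_setr()` returns `([0, 1, 2, 3], [3, 2, 1, 0])`: the same four arguments end up in opposite lane order depending on which constructor is used.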
Type Definitions
__m128i | |
__m256i |
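As an illustration of how the `__m128i` type is reinterpreted by the integer intrinsics, here is a small sketch that treats one 128-bit value as sixteen signed 8-bit lanes. It is an assumption-laden example rather than this module's documented usage: the import path `std::arch::x86_64` and the helper name `bytes_above` are not taken from this page, and a scalar fallback is provided for non-x86_64 targets.

```rust
// Sketch only: `bytes_above` is a hypothetical helper; the import path
// `std::arch::x86_64` is an assumption, not the path shown on this page.
#[cfg(target_arch = "x86_64")]
fn bytes_above(data: &[i8; 16], threshold: i8) -> u32 {
    use std::arch::x86_64::*;
    unsafe {
        // Load 16 bytes without any alignment requirement.
        let v = _mm_loadu_si128(data.as_ptr() as *const __m128i);
        // Broadcast the threshold to all sixteen 8-bit lanes.
        let t = _mm_set1_epi8(threshold);
        // Signed per-lane comparison: a lane becomes all ones where v > t.
        let gt = _mm_cmpgt_epi8(v, t);
        // Collect the most significant bit of each lane into a 16-bit mask.
        _mm_movemask_epi8(gt) as u32
    }
}

// Scalar fallback with the same result: bit i is set when data[i] > threshold.
#[cfg(not(target_arch = "x86_64"))]
fn bytes_above(data: &[i8; 16], threshold: i8) -> u32 {
    data.iter()
        .enumerate()
        .fold(0, |mask, (i, &b)| if b > threshold { mask | (1 << i) } else { mask })
}
```

With an input that is zero everywhere except `data[1] = 5` and `data[3] = 10`, `bytes_above(&data, 4)` yields `0b1010`, one bit per byte position that exceeded the threshold.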