Module stdsimd::vendor

Platform-dependent vendor intrinsics.

Constants

_CMP_EQ_OQ
_CMP_EQ_OS
_CMP_EQ_UQ
_CMP_EQ_US
_CMP_FALSE_OQ
_CMP_FALSE_OS
_CMP_GE_OQ
_CMP_GE_OS
_CMP_GT_OQ
_CMP_GT_OS
_CMP_LE_OQ
_CMP_LE_OS
_CMP_LT_OQ
_CMP_LT_OS
_CMP_NEQ_OQ
_CMP_NEQ_OS
_CMP_NEQ_UQ
_CMP_NEQ_US
_CMP_NGE_UQ
_CMP_NGE_US
_CMP_NGT_UQ
_CMP_NGT_US
_CMP_NLE_UQ
_CMP_NLE_US
_CMP_NLT_UQ
_CMP_NLT_US
_CMP_ORD_Q
_CMP_ORD_S
_CMP_TRUE_UQ
_CMP_TRUE_US
_CMP_UNORD_Q
_CMP_UNORD_S
_MM_EXCEPT_DENORM	See `_mm_setcsr`
_MM_EXCEPT_DIV_ZERO	See `_mm_setcsr`
_MM_EXCEPT_INEXACT	See `_mm_setcsr`
_MM_EXCEPT_INVALID	See `_mm_setcsr`
_MM_EXCEPT_MASK
_MM_EXCEPT_OVERFLOW	See `_mm_setcsr`
_MM_EXCEPT_UNDERFLOW	See `_mm_setcsr`
_MM_FLUSH_ZERO_MASK
_MM_FLUSH_ZERO_OFF	See `_mm_setcsr`
_MM_FLUSH_ZERO_ON	See `_mm_setcsr`
_MM_FROUND_CEIL	round up and do not suppress exceptions
_MM_FROUND_CUR_DIRECTION	use MXCSR.RC; see `vendor::_MM_SET_ROUNDING_MODE`
_MM_FROUND_FLOOR	round down and do not suppress exceptions
_MM_FROUND_NEARBYINT	use MXCSR.RC and suppress exceptions; see `vendor::_MM_SET_ROUNDING_MODE`
_MM_FROUND_NINT	round to nearest and do not suppress exceptions
_MM_FROUND_NO_EXC	suppress exceptions
_MM_FROUND_RAISE_EXC	do not suppress exceptions
_MM_FROUND_RINT	use MXCSR.RC and do not suppress exceptions; see `vendor::_MM_SET_ROUNDING_MODE`
_MM_FROUND_TO_NEAREST_INT	round to nearest
_MM_FROUND_TO_NEG_INF	round down
_MM_FROUND_TO_POS_INF	round up
_MM_FROUND_TO_ZERO	truncate
_MM_FROUND_TRUNC	truncate and do not suppress exceptions
_MM_HINT_NTA	See `_mm_prefetch`.
_MM_HINT_T0	See `_mm_prefetch`.
_MM_HINT_T1	See `_mm_prefetch`.
_MM_HINT_T2	See `_mm_prefetch`.
_MM_MASK_DENORM	See `_mm_setcsr`
_MM_MASK_DIV_ZERO	See `_mm_setcsr`
_MM_MASK_INEXACT	See `_mm_setcsr`
_MM_MASK_INVALID	See `_mm_setcsr`
_MM_MASK_MASK
_MM_MASK_OVERFLOW	See `_mm_setcsr`
_MM_MASK_UNDERFLOW	See `_mm_setcsr`
_MM_ROUND_DOWN	See `_mm_setcsr`
_MM_ROUND_MASK
_MM_ROUND_NEAREST	See `_mm_setcsr`
_MM_ROUND_TOWARD_ZERO	See `_mm_setcsr`
_MM_ROUND_UP	See `_mm_setcsr`
_SIDD_BIT_MASK	Mask only: return the bit mask
_SIDD_CMP_EQUAL_ANY	For each character in `a`, find if it is in `b` (Default)
_SIDD_CMP_EQUAL_EACH	The strings defined by `a` and `b` are equal
_SIDD_CMP_EQUAL_ORDERED	Search for the defined substring in the target
_SIDD_CMP_RANGES	For each character in `a`, determine if `b[0] <= c <= b[1] or b[2] <= c <= b[3]...`
_SIDD_LEAST_SIGNIFICANT	Index only: return the least significant bit (Default)
_SIDD_MASKED_NEGATIVE_POLARITY	Negate results only before the end of the string
_SIDD_MASKED_POSITIVE_POLARITY	Do not negate results before the end of the string
_SIDD_MOST_SIGNIFICANT	Index only: return the most significant bit
_SIDD_NEGATIVE_POLARITY	Negate results
_SIDD_POSITIVE_POLARITY	Do not negate results (Default)
_SIDD_SBYTE_OPS	String contains signed 8-bit characters
_SIDD_SWORD_OPS	String contains signed 16-bit characters
_SIDD_UBYTE_OPS	String contains unsigned 8-bit characters (Default)
_SIDD_UNIT_MASK	Mask only: return the byte mask
_SIDD_UWORD_OPS	String contains unsigned 16-bit characters
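
The `_MM_FROUND_*` descriptions above suggest that the composite modes are a rounding direction combined with an exception flag. The following is a minimal sketch of that composition, not the module's own definitions: the `FROUND_*` constants and both helper functions are hypothetical local names, and the numeric values are an assumption taken from Intel's C headers rather than from this page.

```rust
// Hypothetical local copies of the rounding-control bits. The numeric values
// are an assumption (Intel's <smmintrin.h>); this page does not list them.
pub const FROUND_TO_NEAREST_INT: i32 = 0x00;
pub const FROUND_TO_NEG_INF: i32 = 0x01;
pub const FROUND_TO_POS_INF: i32 = 0x02;
pub const FROUND_TO_ZERO: i32 = 0x03;
pub const FROUND_CUR_DIRECTION: i32 = 0x04;
pub const FROUND_RAISE_EXC: i32 = 0x00;
pub const FROUND_NO_EXC: i32 = 0x08;

// Composite modes: a direction OR-ed with an exception flag.
pub const FROUND_NINT: i32 = FROUND_RAISE_EXC | FROUND_TO_NEAREST_INT;
pub const FROUND_FLOOR: i32 = FROUND_RAISE_EXC | FROUND_TO_NEG_INF;
pub const FROUND_CEIL: i32 = FROUND_RAISE_EXC | FROUND_TO_POS_INF;
pub const FROUND_TRUNC: i32 = FROUND_RAISE_EXC | FROUND_TO_ZERO;
pub const FROUND_RINT: i32 = FROUND_RAISE_EXC | FROUND_CUR_DIRECTION;
pub const FROUND_NEARBYINT: i32 = FROUND_NO_EXC | FROUND_CUR_DIRECTION;

/// Returns true when the mode word asks for exceptions to be suppressed.
pub fn suppresses_exceptions(mode: i32) -> bool {
    (mode & FROUND_NO_EXC) != 0
}

/// Names the rounding direction encoded in the mode word.
pub fn direction(mode: i32) -> &'static str {
    if (mode & FROUND_CUR_DIRECTION) != 0 {
        // The direction is taken from the MXCSR.RC field at run time.
        return "MXCSR.RC";
    }
    match mode & 0x03 {
        0 => "nearest",
        1 => "down",
        2 => "up",
        _ => "truncate",
    }
}
```

Under these assumptions, `direction(FROUND_FLOOR)` yields `"down"`, and `suppresses_exceptions` is true for `FROUND_NEARBYINT` but false for `FROUND_RINT`, matching the descriptions in the table.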

Functions

_MM_GET_EXCEPTION_MASK ^⚠
_MM_GET_EXCEPTION_STATE ^⚠
_MM_GET_FLUSH_ZERO_MODE ^⚠
_MM_GET_ROUNDING_MODE ^⚠
_MM_SET_EXCEPTION_MASK ^⚠
_MM_SET_EXCEPTION_STATE ^⚠
_MM_SET_FLUSH_ZERO_MODE ^⚠
_MM_SET_ROUNDING_MODE ^⚠
_MM_TRANSPOSE4_PS ^⚠	Transpose the 4x4 matrix formed by 4 rows of f32x4 in place.
_andn_u32 ^⚠	Bitwise logical `AND` of inverted `a` with `b`.
_andn_u64 ^⚠	Bitwise logical `AND` of inverted `a` with `b`.
_bextr2_u32 ^⚠	Extracts bits of `a` specified by `control` into the least significant bits of the result.
_bextr_u32 ^⚠	Extracts bits in range [`start`, `start` + `length`) from `a` into the least significant bits of the result.
_blcfill_u32 ^⚠	Clears all bits below the least significant zero bit of `x`.
_blci_u32 ^⚠	Sets all bits of `x` to 1 except for the least significant zero bit.
_blcic_u32 ^⚠	Sets the least significant zero bit of `x` and clears all other bits.
_blcmsk_u32 ^⚠	Sets the least significant zero bit of `x` and clears all bits above that bit.
_blcs_u32 ^⚠	Sets the least significant zero bit of `x`.
_blsfill_u32 ^⚠	Sets all bits of `x` below the least significant one.
_blsi_u32 ^⚠	Extract lowest set isolated bit.
_blsic_u32 ^⚠	Clears the least significant set bit of `x` and sets all other bits.
_blsmsk_u32 ^⚠	Get mask up to lowest set bit.
_blsr_u32 ^⚠	Resets the lowest set bit of `x`.
_bzhi_u32 ^⚠	Zeroes the bits of `a` at positions >= `index`.
_lzcnt_u32 ^⚠	Counts the leading most significant zero bits.
_lzcnt_u64 ^⚠	Counts the leading most significant zero bits.
_mm256_abs_epi8 ^⚠	Computes the absolute values of packed 8-bit integers in `a`.
_mm256_abs_epi16 ^⚠	Computes the absolute values of packed 16-bit integers in `a`.
_mm256_abs_epi32 ^⚠	Computes the absolute values of packed 32-bit integers in `a`.
_mm256_add_epi8 ^⚠	Add packed 8-bit integers in `a` and `b`.
_mm256_add_epi16 ^⚠	Add packed 16-bit integers in `a` and `b`.
_mm256_add_epi32 ^⚠	Add packed 32-bit integers in `a` and `b`.
_mm256_add_epi64 ^⚠	Add packed 64-bit integers in `a` and `b`.
_mm256_add_pd ^⚠	Add packed double-precision (64-bit) floating-point elements in `a` and `b`.
_mm256_add_ps ^⚠	Add packed single-precision (32-bit) floating-point elements in `a` and `b`.
_mm256_adds_epi8 ^⚠	Add packed 8-bit integers in `a` and `b` using saturation.
_mm256_adds_epi16 ^⚠	Add packed 16-bit integers in `a` and `b` using saturation.
_mm256_adds_epu8 ^⚠	Add packed unsigned 8-bit integers in `a` and `b` using saturation.
_mm256_adds_epu16 ^⚠	Add packed unsigned 16-bit integers in `a` and `b` using saturation.
_mm256_addsub_pd ^⚠	Alternately add and subtract packed double-precision (64-bit) floating-point elements in `a` to/from packed elements in `b`.
_mm256_addsub_ps ^⚠	Alternately add and subtract packed single-precision (32-bit) floating-point elements in `a` to/from packed elements in `b`.
_mm256_alignr_epi8 ^⚠	Concatenate pairs of 16-byte blocks in `a` and `b` into a 32-byte temporary result, shift the result right by `n` bytes, and return the low 16 bytes.
_mm256_and_pd ^⚠	Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in `a` and `b`.
_mm256_and_ps ^⚠	Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in `a` and `b`.
_mm256_and_si256 ^⚠	Compute the bitwise AND of 256 bits (representing integer data) in `a` and `b`.
_mm256_andnot_pd ^⚠	Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in `a` and then AND with `b`.
_mm256_andnot_ps ^⚠	Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in `a` and then AND with `b`.
_mm256_andnot_si256 ^⚠	Compute the bitwise NOT of 256 bits (representing integer data) in `a` and then AND with `b`.
_mm256_avg_epu8 ^⚠	Average packed unsigned 8-bit integers in `a` and `b`.
_mm256_avg_epu16 ^⚠	Average packed unsigned 16-bit integers in `a` and `b`.
_mm256_blend_epi16 ^⚠	Blend packed 16-bit integers from `a` and `b` using control mask `imm8`.
_mm256_blend_epi32 ^⚠	Blend packed 32-bit integers from `a` and `b` using control mask `imm8`.
_mm256_blend_pd ^⚠	Blend packed double-precision (64-bit) floating-point elements from `a` and `b` using control mask `imm8`.
_mm256_blendv_epi8 ^⚠	Blend packed 8-bit integers from `a` and `b` using `mask`.
_mm256_blendv_pd ^⚠	Blend packed double-precision (64-bit) floating-point elements from `a` and `b` using `c` as a mask.
_mm256_blendv_ps ^⚠	Blend packed single-precision (32-bit) floating-point elements from `a` and `b` using `c` as a mask.
_mm256_broadcast_pd ^⚠	Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_ps ^⚠	Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_sd ^⚠	Broadcast a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
_mm256_broadcast_ss ^⚠	Broadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm256_broadcastb_epi8 ^⚠	Broadcast the low packed 8-bit integer from `a` to all elements of the 256-bit returned value.
_mm256_broadcastd_epi32 ^⚠	Broadcast the low packed 32-bit integer from `a` to all elements of the 256-bit returned value.
_mm256_broadcastq_epi64 ^⚠	Broadcast the low packed 64-bit integer from `a` to all elements of the 256-bit returned value.
_mm256_broadcastsd_pd ^⚠	Broadcast the low double-precision (64-bit) floating-point element from `a` to all elements of the 256-bit returned value.
_mm256_broadcastsi128_si256 ^⚠	Broadcast 128 bits of integer data from `a` to all 128-bit lanes in the 256-bit returned value.
_mm256_broadcastss_ps ^⚠	Broadcast the low single-precision (32-bit) floating-point element from `a` to all elements of the 256-bit returned value.
_mm256_broadcastw_epi16 ^⚠	Broadcast the low packed 16-bit integer from `a` to all elements of the 256-bit returned value.
_mm256_castpd128_pd256 ^⚠	Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.
_mm256_castpd256_pd128 ^⚠	Casts vector of type __m256d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castpd_ps ^⚠	Cast vector of type __m256d to type __m256.
_mm256_castpd_si256 ^⚠	Casts vector of type __m256d to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castps128_ps256 ^⚠	Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.
_mm256_castps256_ps128 ^⚠	Casts vector of type __m256 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castps_pd ^⚠	Cast vector of type __m256 to type __m256d.
_mm256_castps_si256 ^⚠	Casts vector of type __m256 to type __m256i.
_mm256_castsi128_si256 ^⚠	Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.
_mm256_castsi256_pd ^⚠	Casts vector of type __m256i to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_castsi256_ps ^⚠	Casts vector of type __m256i to type __m256.
_mm256_castsi256_si128 ^⚠	Casts vector of type __m256i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
_mm256_ceil_pd ^⚠	Round packed double-precision (64-bit) floating point elements in `a` toward positive infinity.
_mm256_ceil_ps ^⚠	Round packed single-precision (32-bit) floating point elements in `a` toward positive infinity.
_mm256_cmp_pd ^⚠	Compare packed double-precision (64-bit) floating-point elements in `a` and `b` based on the comparison operand specified by `imm8`.
_mm256_cmp_ps ^⚠	Compare packed single-precision (32-bit) floating-point elements in `a` and `b` based on the comparison operand specified by `imm8`.
_mm256_cmpeq_epi8 ^⚠	Compare packed 8-bit integers in `a` and `b` for equality.
_mm256_cmpeq_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b` for equality.
_mm256_cmpeq_epi32 ^⚠	Compare packed 32-bit integers in `a` and `b` for equality.
_mm256_cmpeq_epi64 ^⚠	Compare packed 64-bit integers in `a` and `b` for equality.
_mm256_cmpgt_epi8 ^⚠	Compare packed 8-bit integers in `a` and `b` for greater-than.
_mm256_cmpgt_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b` for greater-than.
_mm256_cmpgt_epi32 ^⚠	Compare packed 32-bit integers in `a` and `b` for greater-than.
_mm256_cmpgt_epi64 ^⚠	Compare packed 64-bit integers in `a` and `b` for greater-than.
_mm256_cvtepi16_epi32 ^⚠	Sign-extend 16-bit integers to 32-bit integers.
_mm256_cvtepi16_epi64 ^⚠	Sign-extend 16-bit integers to 64-bit integers.
_mm256_cvtepi32_epi64 ^⚠	Sign-extend 32-bit integers to 64-bit integers.
_mm256_cvtepi32_pd ^⚠	Convert packed 32-bit integers in `a` to packed double-precision (64-bit) floating-point elements.
_mm256_cvtepi32_ps ^⚠	Convert packed 32-bit integers in `a` to packed single-precision (32-bit) floating-point elements.
_mm256_cvtepi8_epi16 ^⚠	Sign-extend 8-bit integers to 16-bit integers.
_mm256_cvtepi8_epi32 ^⚠	Sign-extend 8-bit integers to 32-bit integers.
_mm256_cvtepi8_epi64 ^⚠	Sign-extend 8-bit integers to 64-bit integers.
_mm256_cvtpd_epi32 ^⚠	Convert packed double-precision (64-bit) floating-point elements in `a` to packed 32-bit integers.
_mm256_cvtpd_ps ^⚠	Convert packed double-precision (64-bit) floating-point elements in `a` to packed single-precision (32-bit) floating-point elements.
_mm256_cvtps_epi32 ^⚠	Convert packed single-precision (32-bit) floating-point elements in `a` to packed 32-bit integers.
_mm256_cvtps_pd ^⚠	Convert packed single-precision (32-bit) floating-point elements in `a` to packed double-precision (64-bit) floating-point elements.
_mm256_cvttpd_epi32 ^⚠	Convert packed double-precision (64-bit) floating-point elements in `a` to packed 32-bit integers with truncation.
_mm256_cvttps_epi32 ^⚠	Convert packed single-precision (32-bit) floating-point elements in `a` to packed 32-bit integers with truncation.
_mm256_div_pd ^⚠	Compute the division of each of the 4 packed 64-bit floating-point elements in `a` by the corresponding packed elements in `b`.
_mm256_div_ps ^⚠	Compute the division of each of the 8 packed 32-bit floating-point elements in `a` by the corresponding packed elements in `b`.
_mm256_dp_ps ^⚠	Conditionally multiply the packed single-precision (32-bit) floating-point elements in `a` and `b` using the high 4 bits in `imm8`, sum the four products, and conditionally return the sum using the low 4 bits of `imm8`.
_mm256_extract_epi8 ^⚠	Extract an 8-bit integer from `a`, selected with `imm8`.
_mm256_extract_epi16 ^⚠	Extract a 16-bit integer from `a`, selected with `imm8`.
_mm256_extract_epi32 ^⚠	Extract a 32-bit integer from `a`, selected with `imm8`.
_mm256_extract_epi64 ^⚠	Extract a 64-bit integer from `a`, selected with `imm8`.
_mm256_extractf128_pd ^⚠	Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from `a`, selected with `imm8`.
_mm256_extractf128_ps ^⚠	Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from `a`, selected with `imm8`.
_mm256_extractf128_si256 ^⚠	Extract 128 bits (composed of integer data) from `a`, selected with `imm8`.
_mm256_floor_pd ^⚠	Round packed double-precision (64-bit) floating point elements in `a` toward negative infinity.
_mm256_floor_ps ^⚠	Round packed single-precision (32-bit) floating point elements in `a` toward negative infinity.
_mm256_hadd_epi16 ^⚠	Horizontally add adjacent pairs of 16-bit integers in `a` and `b`.
_mm256_hadd_epi32 ^⚠	Horizontally add adjacent pairs of 32-bit integers in `a` and `b`.
_mm256_hadd_pd ^⚠	Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating points `a` and `b`. In the result, sums of elements from `a` are returned in even locations, while sums of elements from `b` are returned in odd locations.
_mm256_hadd_ps ^⚠	Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating points `a` and `b`. In the result, sums of elements from `a` are returned in locations of indices 0, 1, 4, 5; while sums of elements from `b` are locations 2, 3, 6, 7.
_mm256_hadds_epi16 ^⚠	Horizontally add adjacent pairs of 16-bit integers in `a` and `b` using saturation.
_mm256_hsub_epi16 ^⚠	Horizontally subtract adjacent pairs of 16-bit integers in `a` and `b`.
_mm256_hsub_epi32 ^⚠	Horizontally subtract adjacent pairs of 32-bit integers in `a` and `b`.
_mm256_hsub_pd ^⚠	Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating points `a` and `b`. In the result, differences of elements from `a` are returned in even locations, while differences of elements from `b` are returned in odd locations.
_mm256_hsub_ps ^⚠	Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating points `a` and `b`. In the result, differences of elements from `a` are returned in locations of indices 0, 1, 4, 5, while differences of elements from `b` are returned in locations 2, 3, 6, 7.
_mm256_hsubs_epi16 ^⚠	Horizontally subtract adjacent pairs of 16-bit integers in `a` and `b` using saturation.
_mm256_insert_epi8 ^⚠	Copy `a` to result, and insert the 8-bit integer `i` into result at the location specified by `index`.
_mm256_insert_epi16 ^⚠	Copy `a` to result, and insert the 16-bit integer `i` into result at the location specified by `index`.
_mm256_insert_epi32 ^⚠	Copy `a` to result, and insert the 32-bit integer `i` into result at the location specified by `index`.
_mm256_insert_epi64 ^⚠	Copy `a` to result, and insert the 64-bit integer `i` into result at the location specified by `index`.
_mm256_insertf128_pd ^⚠	Copy `a` to result, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from `b` into result at the location specified by `imm8`.
_mm256_insertf128_ps ^⚠	Copy `a` to result, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from `b` into result at the location specified by `imm8`.
_mm256_insertf128_si256 ^⚠	Copy `a` to result, then insert 128 bits from `b` into result at the location specified by `imm8`.
_mm256_lddqu_si256 ^⚠	Load 256-bits of integer data from unaligned memory into result. This intrinsic may perform better than `_mm256_loadu_si256` when the data crosses a cache line boundary.
_mm256_loadu2_m128 ^⚠	Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
_mm256_loadu2_m128d ^⚠	Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
_mm256_loadu2_m128i ^⚠	Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
_mm256_loadu_pd ^⚠	Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. `mem_addr` does not need to be aligned on any particular boundary.
_mm256_loadu_ps ^⚠	Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. `mem_addr` does not need to be aligned on any particular boundary.
_mm256_loadu_si256 ^⚠	Load 256-bits of integer data from memory into result. `mem_addr` does not need to be aligned on any particular boundary.
_mm256_madd_epi16 ^⚠	Multiply packed signed 16-bit integers in `a` and `b`, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers.
_mm256_maddubs_epi16 ^⚠	Vertically multiply each unsigned 8-bit integer from `a` with the corresponding signed 8-bit integer from `b`, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers
_mm256_maskload_epi32 ^⚠	Load packed 32-bit integers from memory pointed by `mem_addr` using `mask` (elements are zeroed out when the highest bit is not set in the corresponding element).
_mm256_maskload_epi64 ^⚠	Load packed 64-bit integers from memory pointed by `mem_addr` using `mask` (elements are zeroed out when the highest bit is not set in the corresponding element).
_mm256_maskload_pd ^⚠	Load packed double-precision (64-bit) floating-point elements from memory into result using `mask` (elements are zeroed out when the high bit of the corresponding element is not set).
_mm256_maskload_ps ^⚠	Load packed single-precision (32-bit) floating-point elements from memory into result using `mask` (elements are zeroed out when the high bit of the corresponding element is not set).
_mm256_maskstore_epi32 ^⚠	Store packed 32-bit integers from `a` into memory pointed by `mem_addr` using `mask` (elements are not stored when the highest bit is not set in the corresponding element).
_mm256_maskstore_epi64 ^⚠	Store packed 64-bit integers from `a` into memory pointed by `mem_addr` using `mask` (elements are not stored when the highest bit is not set in the corresponding element).
_mm256_maskstore_pd ^⚠	Store packed double-precision (64-bit) floating-point elements from `a` into memory using `mask`.
_mm256_maskstore_ps ^⚠	Store packed single-precision (32-bit) floating-point elements from `a` into memory using `mask`.
_mm256_max_epi8 ^⚠	Compare packed 8-bit integers in `a` and `b`, and return the packed maximum values.
_mm256_max_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b`, and return the packed maximum values.
_mm256_max_epi32 ^⚠	Compare packed 32-bit integers in `a` and `b`, and return the packed maximum values.
_mm256_max_epu8 ^⚠	Compare packed unsigned 8-bit integers in `a` and `b`, and return the packed maximum values.
_mm256_max_epu16 ^⚠	Compare packed unsigned 16-bit integers in `a` and `b`, and return the packed maximum values.
_mm256_max_epu32 ^⚠	Compare packed unsigned 32-bit integers in `a` and `b`, and return the packed maximum values.
_mm256_max_pd ^⚠	Compare packed double-precision (64-bit) floating-point elements in `a` and `b`, and return packed maximum values
_mm256_max_ps ^⚠	Compare packed single-precision (32-bit) floating-point elements in `a` and `b`, and return packed maximum values
_mm256_min_epi8 ^⚠	Compare packed 8-bit integers in `a` and `b`, and return the packed minimum values.
_mm256_min_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b`, and return the packed minimum values.
_mm256_min_epi32 ^⚠	Compare packed 32-bit integers in `a` and `b`, and return the packed minimum values.
_mm256_min_epu8 ^⚠	Compare packed unsigned 8-bit integers in `a` and `b`, and return the packed minimum values.
_mm256_min_epu16 ^⚠	Compare packed unsigned 16-bit integers in `a` and `b`, and return the packed minimum values.
_mm256_min_epu32 ^⚠	Compare packed unsigned 32-bit integers in `a` and `b`, and return the packed minimum values.
_mm256_min_pd ^⚠	Compare packed double-precision (64-bit) floating-point elements in `a` and `b`, and return packed minimum values
_mm256_min_ps ^⚠	Compare packed single-precision (32-bit) floating-point elements in `a` and `b`, and return packed minimum values
_mm256_movedup_pd ^⚠	Duplicate even-indexed double-precision (64-bit) floating-point elements from `a`, and return the results.
_mm256_movehdup_ps ^⚠	Duplicate odd-indexed single-precision (32-bit) floating-point elements from `a`, and return the results.
_mm256_moveldup_ps ^⚠	Duplicate even-indexed single-precision (32-bit) floating-point elements from `a`, and return the results.
_mm256_movemask_epi8 ^⚠	Create mask from the most significant bit of each 8-bit element in `a`, return the result.
_mm256_movemask_pd ^⚠	Set each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in `a`.
_mm256_movemask_ps ^⚠	Set each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in `a`.
_mm256_mpsadbw_epu8 ^⚠	Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in `a` compared to those in `b`, and return the 16-bit results. Eight SADs are performed for each 128-bit lane using one quadruplet from `b` and eight quadruplets from `a`. One quadruplet is selected from `b` starting at the offset specified in `imm8`. Eight quadruplets are formed from sequential 8-bit integers selected from `a` starting at the offset specified in `imm8`.
_mm256_mul_epi32 ^⚠	Multiply the low 32-bit integers from each packed 64-bit element in `a` and `b`
_mm256_mul_epu32 ^⚠	Multiply the low unsigned 32-bit integers from each packed 64-bit element in `a` and `b`
_mm256_mul_pd ^⚠	Multiply packed double-precision (64-bit) floating-point elements in `a` and `b`.
_mm256_mul_ps ^⚠	Multiply packed single-precision (32-bit) floating-point elements in `a` and `b`.
_mm256_mulhi_epi16 ^⚠	Multiply the packed 16-bit integers in `a` and `b`, producing intermediate 32-bit integers and returning the high 16 bits of the intermediate integers.
_mm256_mulhi_epu16 ^⚠	Multiply the packed unsigned 16-bit integers in `a` and `b`, producing intermediate 32-bit integers and returning the high 16 bits of the intermediate integers.
_mm256_mulhrs_epi16 ^⚠	Multiply packed 16-bit integers in `a` and `b`, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and return bits [16:1]
_mm256_mullo_epi16 ^⚠	Multiply the packed 16-bit integers in `a` and `b`, producing intermediate 32-bit integers, and return the low 16 bits of the intermediate integers
_mm256_mullo_epi32 ^⚠	Multiply the packed 32-bit integers in `a` and `b`, producing intermediate 64-bit integers, and return the low 32 bits of the intermediate integers.
_mm256_or_pd ^⚠	Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in `a` and `b`.
_mm256_or_ps ^⚠	Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in `a` and `b`.
_mm256_or_si256 ^⚠	Compute the bitwise OR of 256 bits (representing integer data) in `a` and `b`
_mm256_packs_epi16 ^⚠	Convert packed 16-bit integers from `a` and `b` to packed 8-bit integers using signed saturation
_mm256_packs_epi32 ^⚠	Convert packed 32-bit integers from `a` and `b` to packed 16-bit integers using signed saturation
_mm256_packus_epi16 ^⚠	Convert packed 16-bit integers from `a` and `b` to packed 8-bit integers using unsigned saturation
_mm256_packus_epi32 ^⚠	Convert packed 32-bit integers from `a` and `b` to packed 16-bit integers using unsigned saturation
_mm256_permute2f128_pd ^⚠	Shuffle 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) selected by `imm8` from `a` and `b`.
_mm256_permute2f128_ps ^⚠	Shuffle 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) selected by `imm8` from `a` and `b`.
_mm256_permute2f128_si256 ^⚠	Shuffle 256-bits (composed of integer data) selected by `imm8` from `a` and `b`.
_mm256_permute4x64_epi64 ^⚠	Permutes 64-bit integers from `a` using control mask `imm8`.
_mm256_permute_pd ^⚠	Shuffle double-precision (64-bit) floating-point elements in `a` within 128-bit lanes using the control in `imm8`.
_mm256_permute_ps ^⚠	Shuffle single-precision (32-bit) floating-point elements in `a` within 128-bit lanes using the control in `imm8`.
_mm256_permutevar8x32_epi32 ^⚠	Permutes packed 32-bit integers from `a` according to the content of `b`.
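_mm256_permutevar_pd ^⚠	Shuffle double-precision (64-bit) floating-point elements in `a` within 128-bit lanes using the control in `b`.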
_mm256_permutevar_ps ^⚠	Shuffle single-precision (32-bit) floating-point elements in `a` within 128-bit lanes using the control in `b`.
_mm256_rcp_ps ^⚠	Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in `a`, and return the results. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_round_pd ^⚠	Round packed double-precision (64-bit) floating point elements in `a` according to the flag `b`.
_mm256_round_ps ^⚠	Round packed single-precision (32-bit) floating point elements in `a` according to the flag `b`.
_mm256_rsqrt_ps ^⚠	Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in `a`, and return the results. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_sad_epu8 ^⚠	Compute the absolute differences of packed unsigned 8-bit integers in `a` and `b`, then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of each 64-bit element of the return value.
_mm256_set1_epi8 ^⚠	Broadcast 8-bit integer `a` to all elements of returned vector. This intrinsic may generate the `vpbroadcastb` instruction.
_mm256_set1_epi16 ^⚠	Broadcast 16-bit integer `a` to all elements of returned vector. This intrinsic may generate the `vpbroadcastw` instruction.
_mm256_set1_epi32 ^⚠	Broadcast 32-bit integer `a` to all elements of returned vector. This intrinsic may generate the `vpbroadcastd` instruction.
_mm256_set1_epi64x ^⚠	Broadcast 64-bit integer `a` to all elements of returned vector. This intrinsic may generate the `vpbroadcastq` instruction.
_mm256_set1_pd ^⚠	Broadcast double-precision (64-bit) floating-point value `a` to all elements of returned vector.
_mm256_set1_ps ^⚠	Broadcast single-precision (32-bit) floating-point value `a` to all elements of returned vector.
_mm256_set_epi8 ^⚠	Set packed 8-bit integers in returned vector with the supplied values.
_mm256_set_epi16 ^⚠	Set packed 16-bit integers in returned vector with the supplied values.
_mm256_set_epi32 ^⚠	Set packed 32-bit integers in returned vector with the supplied values.
_mm256_set_epi64x ^⚠	Set packed 64-bit integers in returned vector with the supplied values.
_mm256_set_m128 ^⚠	Set packed __m256 returned vector with the supplied values.
_mm256_set_m128d ^⚠	Set packed __m256d returned vector with the supplied values.
_mm256_set_m128i ^⚠	Set packed __m256i returned vector with the supplied values.
_mm256_set_pd ^⚠	Set packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
_mm256_set_ps ^⚠	Set packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
_mm256_setr_epi8 ^⚠	Set packed 8-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi16 ^⚠	Set packed 16-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi32 ^⚠	Set packed 32-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi64x ^⚠	Set packed 64-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_m128 ^⚠	Set packed __m256 returned vector with the supplied values.
_mm256_setr_m128d ^⚠	Set packed __m256d returned vector with the supplied values.
_mm256_setr_m128i ^⚠	Set packed __m256i returned vector with the supplied values.
_mm256_setr_pd ^⚠	Set packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_setr_ps ^⚠	Set packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_setzero_pd ^⚠	Return vector of type __m256d with all elements set to zero.
_mm256_setzero_ps ^⚠	Return vector of type __m256 with all elements set to zero.
_mm256_setzero_si256 ^⚠	Return vector of type __m256i with all elements set to zero.
_mm256_shuffle_epi8 ^⚠	Shuffle bytes from `a` according to the content of `b`.
_mm256_shuffle_epi32 ^⚠	Shuffle 32-bit integers in 128-bit lanes of `a` using the control in `imm8`.
_mm256_shuffle_pd ^⚠	Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in `imm8`.
_mm256_sign_epi8 ^⚠	Negate packed 8-bit integers in `a` when the corresponding signed 8-bit integer in `b` is negative, and return the results. Results are zeroed out when the corresponding element in `b` is zero.
_mm256_sign_epi16 ^⚠	Negate packed 16-bit integers in `a` when the corresponding signed 16-bit integer in `b` is negative, and return the results. Results are zeroed out when the corresponding element in `b` is zero.
_mm256_sign_epi32 ^⚠	Negate packed 32-bit integers in `a` when the corresponding signed 32-bit integer in `b` is negative, and return the results. Results are zeroed out when the corresponding element in `b` is zero.
_mm256_sll_epi16 ^⚠	Shift packed 16-bit integers in `a` left by `count` while shifting in zeros, and return the result.
_mm256_sll_epi32 ^⚠	Shift packed 32-bit integers in `a` left by `count` while shifting in zeros, and return the result.
_mm256_sll_epi64 ^⚠	Shift packed 64-bit integers in `a` left by `count` while shifting in zeros, and return the result.
_mm256_slli_epi16 ^⚠	Shift packed 16-bit integers in `a` left by `imm8` while shifting in zeros, and return the results.
_mm256_slli_epi32 ^⚠	Shift packed 32-bit integers in `a` left by `imm8` while shifting in zeros, and return the results.
_mm256_slli_epi64 ^⚠	Shift packed 64-bit integers in `a` left by `imm8` while shifting in zeros, and return the results.
_mm256_sllv_epi32 ^⚠	Shift packed 32-bit integers in `a` left by the amount specified by the corresponding element in `count` while shifting in zeros, and return the result.
_mm256_sllv_epi64 ^⚠	Shift packed 64-bit integers in `a` left by the amount specified by the corresponding element in `count` while shifting in zeros, and return the result.
_mm256_sqrt_pd ^⚠	Return the square root of packed double-precision (64-bit) floating point elements in `a`.
_mm256_sqrt_ps ^⚠	Return the square root of packed single-precision (32-bit) floating point elements in `a`.
_mm256_sra_epi16 ^⚠	Shift packed 16-bit integers in `a` right by `count` while shifting in sign bits.
_mm256_sra_epi32 ^⚠	Shift packed 32-bit integers in `a` right by `count` while shifting in sign bits.
_mm256_srai_epi16 ^⚠	Shift packed 16-bit integers in `a` right by `imm8` while shifting in sign bits.
_mm256_srai_epi32 ^⚠	Shift packed 32-bit integers in `a` right by `imm8` while shifting in sign bits.
_mm256_srav_epi32 ^⚠	Shift packed 32-bit integers in `a` right by the amount specified by the corresponding element in `count` while shifting in sign bits.
_mm256_srl_epi16 ^⚠	Shift packed 16-bit integers in `a` right by `count` while shifting in zeros.
_mm256_srl_epi32 ^⚠	Shift packed 32-bit integers in `a` right by `count` while shifting in zeros.
_mm256_srl_epi64 ^⚠	Shift packed 64-bit integers in `a` right by `count` while shifting in zeros.
_mm256_srli_epi16 ^⚠	Shift packed 16-bit integers in `a` right by `imm8` while shifting in zeros.
_mm256_srli_epi32 ^⚠	Shift packed 32-bit integers in `a` right by `imm8` while shifting in zeros.
_mm256_srli_epi64 ^⚠	Shift packed 64-bit integers in `a` right by `imm8` while shifting in zeros.
_mm256_srlv_epi32 ^⚠	Shift packed 32-bit integers in `a` right by the amount specified by the corresponding element in `count` while shifting in zeros.
_mm256_srlv_epi64 ^⚠	Shift packed 64-bit integers in `a` right by the amount specified by the corresponding element in `count` while shifting in zeros.
_mm256_storeu2_m128 ^⚠	Store the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from `a` into two different 128-bit memory locations. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
_mm256_storeu2_m128d ^⚠	Store the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from `a` into two different 128-bit memory locations. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
_mm256_storeu2_m128i ^⚠	Store the high and low 128-bit halves (each composed of integer data) from `a` into two different 128-bit memory locations. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
_mm256_storeu_pd ^⚠	Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from `a` into memory. `mem_addr` does not need to be aligned on any particular boundary.
_mm256_storeu_ps ^⚠	Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from `a` into memory. `mem_addr` does not need to be aligned on any particular boundary.
_mm256_storeu_si256 ^⚠	Store 256-bits of integer data from `a` into memory. `mem_addr` does not need to be aligned on any particular boundary.
_mm256_sub_epi8 ^⚠	Subtract packed 8-bit integers in `b` from packed 8-bit integers in `a`.
_mm256_sub_epi16 ^⚠	Subtract packed 16-bit integers in `b` from packed 16-bit integers in `a`.
_mm256_sub_epi32 ^⚠	Subtract packed 32-bit integers in `b` from packed 32-bit integers in `a`.
_mm256_sub_epi64 ^⚠	Subtract packed 64-bit integers in `b` from packed 64-bit integers in `a`.
_mm256_sub_pd ^⚠	Subtract packed double-precision (64-bit) floating-point elements in `b` from packed elements in `a`.
_mm256_sub_ps ^⚠	Subtract packed single-precision (32-bit) floating-point elements in `b` from packed elements in `a`.
_mm256_subs_epi8 ^⚠	Subtract packed 8-bit integers in `b` from packed 8-bit integers in `a` using saturation.
_mm256_subs_epi16 ^⚠	Subtract packed 16-bit integers in `b` from packed 16-bit integers in `a` using saturation.
_mm256_subs_epu8 ^⚠	Subtract packed unsigned 8-bit integers in `b` from packed unsigned 8-bit integers in `a` using saturation.
_mm256_subs_epu16 ^⚠	Subtract packed unsigned 16-bit integers in `b` from packed unsigned 16-bit integers in `a` using saturation.
_mm256_testc_pd ^⚠	Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and set `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return the `CF` value.
_mm256_testc_ps ^⚠	Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and set `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return the `CF` value.
_mm256_testc_si256 ^⚠	Compute the bitwise AND of 256 bits (representing integer data) in `a` and `b`, and set `ZF` to 1 if the result is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, and set `CF` to 1 if the result is zero, otherwise set `CF` to 0. Return the `CF` value.
_mm256_testnzc_pd ^⚠	Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and set `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return 1 if both the `ZF` and `CF` values are zero, otherwise return 0.
_mm256_testnzc_ps ^⚠	Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and set `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return 1 if both the `ZF` and `CF` values are zero, otherwise return 0.
_mm256_testz_pd ^⚠	Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and set `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return the `ZF` value.
_mm256_testz_ps ^⚠	Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and set `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return the `ZF` value.
_mm256_testz_si256 ^⚠	Compute the bitwise AND of 256 bits (representing integer data) in `a` and `b`, and set `ZF` to 1 if the result is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, and set `CF` to 1 if the result is zero, otherwise set `CF` to 0. Return the `ZF` value.
_mm256_undefined_pd ^⚠	Return vector of type `f64x4` with undefined elements.
_mm256_undefined_ps ^⚠	Return vector of type `f32x8` with undefined elements.
_mm256_undefined_si256 ^⚠	Return vector of type __m256i with undefined elements.
_mm256_unpackhi_epi8 ^⚠	Unpack and interleave 8-bit integers from the high half of each 128-bit lane in `a` and `b`.
_mm256_unpackhi_epi16 ^⚠	Unpack and interleave 16-bit integers from the high half of each 128-bit lane of `a` and `b`.
_mm256_unpackhi_epi32 ^⚠	Unpack and interleave 32-bit integers from the high half of each 128-bit lane of `a` and `b`.
_mm256_unpackhi_epi64 ^⚠	Unpack and interleave 64-bit integers from the high half of each 128-bit lane of `a` and `b`.
_mm256_unpackhi_pd ^⚠	Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in `a` and `b`.
_mm256_unpackhi_ps ^⚠	Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in `a` and `b`.
_mm256_unpacklo_epi8 ^⚠	Unpack and interleave 8-bit integers from the low half of each 128-bit lane of `a` and `b`.
_mm256_unpacklo_epi16 ^⚠	Unpack and interleave 16-bit integers from the low half of each 128-bit lane of `a` and `b`.
_mm256_unpacklo_epi32 ^⚠	Unpack and interleave 32-bit integers from the low half of each 128-bit lane of `a` and `b`.
_mm256_unpacklo_epi64 ^⚠	Unpack and interleave 64-bit integers from the low half of each 128-bit lane of `a` and `b`.
_mm256_unpacklo_pd ^⚠	Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in `a` and `b`.
_mm256_unpacklo_ps ^⚠	Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in `a` and `b`.
_mm256_xor_pd ^⚠	Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in `a` and `b`.
_mm256_xor_ps ^⚠	Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in `a` and `b`.
_mm256_xor_si256 ^⚠	Compute the bitwise XOR of 256 bits (representing integer data) in `a` and `b`.
_mm256_zeroall ^⚠	Zero the contents of all XMM or YMM registers.
_mm256_zeroupper ^⚠	Zero the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified.
_mm256_zextpd128_pd256 ^⚠	Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm256_zextps128_ps256 ^⚠	Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm256_zextsi128_si256 ^⚠	Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm_abs_epi8 ^⚠	Compute the absolute value of packed 8-bit signed integers in `a` and return the unsigned results.
_mm_abs_epi16 ^⚠	Compute the absolute value of each of the packed 16-bit signed integers in `a` and return the unsigned 16-bit results.
_mm_abs_epi32 ^⚠	Compute the absolute value of each of the packed 32-bit signed integers in `a` and return the unsigned 32-bit results.
_mm_add_epi8 ^⚠	Add packed 8-bit integers in `a` and `b`.
_mm_add_epi16 ^⚠	Add packed 16-bit integers in `a` and `b`.
_mm_add_epi32 ^⚠	Add packed 32-bit integers in `a` and `b`.
_mm_add_epi64 ^⚠	Add packed 64-bit integers in `a` and `b`.
_mm_add_pd ^⚠	Add packed double-precision (64-bit) floating-point elements in `a` and `b`.
_mm_add_ps ^⚠	Adds f32x4 vectors.
_mm_add_sd ^⚠	Return a new vector with the low element of `a` replaced by the sum of the low elements of `a` and `b`.
_mm_add_ss ^⚠	Adds the first component of `a` and `b`, the other components are copied from `a`.
_mm_adds_epi8 ^⚠	Add packed 8-bit integers in `a` and `b` using saturation.
_mm_adds_epi16 ^⚠	Add packed 16-bit integers in `a` and `b` using saturation.
_mm_adds_epu8 ^⚠	Add packed unsigned 8-bit integers in `a` and `b` using saturation.
_mm_adds_epu16 ^⚠	Add packed unsigned 16-bit integers in `a` and `b` using saturation.
_mm_addsub_pd ^⚠	Alternately add and subtract packed double-precision (64-bit) floating-point elements in `a` to/from packed elements in `b`.
_mm_addsub_ps ^⚠	Alternately add and subtract packed single-precision (32-bit) floating-point elements in `a` to/from packed elements in `b`.
_mm_alignr_epi8 ^⚠	Concatenate 16-byte blocks in `a` and `b` into a 32-byte temporary result, shift the result right by `n` bytes, and return the low 16 bytes.
_mm_and_pd ^⚠	Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in `a` and `b`.
_mm_and_ps ^⚠	Bitwise AND of packed single-precision (32-bit) floating-point elements.
_mm_and_si128 ^⚠	Compute the bitwise AND of 128 bits (representing integer data) in `a` and `b`.
_mm_andnot_pd ^⚠	Compute the bitwise NOT of `a` and then AND with `b`.
_mm_andnot_ps ^⚠	Bitwise AND-NOT of packed single-precision (32-bit) floating-point elements.
_mm_andnot_si128 ^⚠	Compute the bitwise NOT of 128 bits (representing integer data) in `a` and then AND with `b`.
_mm_avg_epu8 ^⚠	Average packed unsigned 8-bit integers in `a` and `b`.
_mm_avg_epu16 ^⚠	Average packed unsigned 16-bit integers in `a` and `b`.
_mm_blend_epi16 ^⚠	Blend packed 16-bit integers from `a` and `b` using control mask `imm8`.
_mm_blend_epi32 ^⚠	Blend packed 32-bit integers from `a` and `b` using control mask `imm8`.
_mm_blend_pd ^⚠	Blend packed double-precision (64-bit) floating-point elements from `a` and `b` using control mask `imm2`.
_mm_blend_ps ^⚠	Blend packed single-precision (32-bit) floating-point elements from `a` and `b` using mask `imm4`.
_mm_blendv_epi8 ^⚠	Blend packed 8-bit integers from `a` and `b` using `mask`.
_mm_blendv_pd ^⚠	Blend packed double-precision (64-bit) floating-point elements from `a` and `b` using `mask`.
_mm_blendv_ps ^⚠	Blend packed single-precision (32-bit) floating-point elements from `a` and `b` using `mask`.
_mm_broadcast_ss ^⚠	Broadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm_broadcastb_epi8 ^⚠	Broadcast the low packed 8-bit integer from `a` to all elements of the 128-bit returned value.
_mm_broadcastd_epi32 ^⚠	Broadcast the low packed 32-bit integer from `a` to all elements of the 128-bit returned value.
_mm_broadcastq_epi64 ^⚠	Broadcast the low packed 64-bit integer from `a` to all elements of the 128-bit returned value.
_mm_broadcastsd_pd ^⚠	Broadcast the low double-precision (64-bit) floating-point element from `a` to all elements of the 128-bit returned value.
_mm_broadcastss_ps ^⚠	Broadcast the low single-precision (32-bit) floating-point element from `a` to all elements of the 128-bit returned value.
_mm_broadcastw_epi16 ^⚠	Broadcast the low packed 16-bit integer from `a` to all elements of the 128-bit returned value.
_mm_bslli_si128 ^⚠	Shift `a` left by `imm8` bytes while shifting in zeros.
_mm_bsrli_si128 ^⚠	Shift `a` right by `imm8` bytes while shifting in zeros.
_mm_ceil_pd ^⚠	Round the packed double-precision (64-bit) floating-point elements in `a` up to an integer value, and store the results as packed double-precision floating-point elements.
_mm_ceil_ps ^⚠	Round the packed single-precision (32-bit) floating-point elements in `a` up to an integer value, and store the results as packed single-precision floating-point elements.
_mm_ceil_sd ^⚠	Round the lower double-precision (64-bit) floating-point element in `b` up to an integer value, store the result as a double-precision floating-point element in the lower element of the intrinsic result, and copy the upper element from `a` to the upper element of the intrinsic result.
_mm_ceil_ss ^⚠	Round the lower single-precision (32-bit) floating-point element in `b` up to an integer value, store the result as a single-precision floating-point element in the lower element of the intrinsic result, and copy the upper 3 packed elements from `a` to the upper elements of the intrinsic result.
_mm_clflush ^⚠	Invalidate and flush the cache line that contains `p` from all levels of the cache hierarchy.
_mm_cmp_pd ^⚠	Compare packed double-precision (64-bit) floating-point elements in `a` and `b` based on the comparison operand specified by `imm8`.
_mm_cmp_ps ^⚠	Compare packed single-precision (32-bit) floating-point elements in `a` and `b` based on the comparison operand specified by `imm8`.
_mm_cmp_sd ^⚠	Compare the lower double-precision (64-bit) floating-point element in `a` and `b` based on the comparison operand specified by `imm8`, store the result in the lower element of returned vector, and copy the upper element from `a` to the upper element of returned vector.
_mm_cmp_ss ^⚠	Compare the lower single-precision (32-bit) floating-point element in `a` and `b` based on the comparison operand specified by `imm8`, store the result in the lower element of returned vector, and copy the upper 3 packed elements from `a` to the upper elements of returned vector.
_mm_cmpeq_epi8 ^⚠	Compare packed 8-bit integers in `a` and `b` for equality.
_mm_cmpeq_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b` for equality.
_mm_cmpeq_epi32 ^⚠	Compare packed 32-bit integers in `a` and `b` for equality.
_mm_cmpeq_pd ^⚠	Compare corresponding elements in `a` and `b` for equality.
_mm_cmpeq_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input elements were equal, or `0` otherwise.
_mm_cmpeq_sd ^⚠	Return a new vector with the low element of `a` replaced by the equality comparison of the lower elements of `a` and `b`.
_mm_cmpeq_ss ^⚠	Compare the lowest `f32` of both inputs for equality. The lowest 32 bits of the result will be `0xffffffff` if the two inputs are equal, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpestra ^⚠	Compare packed strings in `a` and `b` with lengths `la` and `lb` using the control in `imm8`, and return `1` if `b` did not contain a null character and the resulting mask was zero, and `0` otherwise.
_mm_cmpestrc ^⚠	Compare packed strings in `a` and `b` with lengths `la` and `lb` using the control in `imm8`, and return `1` if the resulting mask was non-zero, and `0` otherwise.
_mm_cmpestri ^⚠	Compare packed strings `a` and `b` with lengths `la` and `lb` using the control in `imm8`, and return the generated index. Similar to `_mm_cmpistri` with the exception that `_mm_cmpistri` implicitly determines the length of `a` and `b`.
_mm_cmpestrm ^⚠	Compare packed strings in `a` and `b` with lengths `la` and `lb` using the control in `imm8`, and return the generated mask.
_mm_cmpestro ^⚠	Compare packed strings in `a` and `b` with lengths `la` and `lb` using the control in `imm8`, and return bit `0` of the resulting bit mask.
_mm_cmpestrs ^⚠	Compare packed strings in `a` and `b` with lengths `la` and `lb` using the control in `imm8`, and return `1` if any character in `a` was null, and `0` otherwise.
_mm_cmpestrz ^⚠	Compare packed strings in `a` and `b` with lengths `la` and `lb` using the control in `imm8`, and return `1` if any character in `b` was null, and `0` otherwise.
_mm_cmpge_pd ^⚠	Compare corresponding elements in `a` and `b` for greater-than-or-equal.
_mm_cmpge_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input element in `a` is greater than or equal to the corresponding element in `b`, or `0` otherwise.
_mm_cmpge_sd ^⚠	Return a new vector with the low element of `a` replaced by the greater-than-or-equal comparison of the lower elements of `a` and `b`.
_mm_cmpge_ss ^⚠	Compare the lowest `f32` of both inputs for greater than or equal. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is greater than or equal to `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpgt_epi8 ^⚠	Compare packed 8-bit integers in `a` and `b` for greater-than.
_mm_cmpgt_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b` for greater-than.
_mm_cmpgt_epi32 ^⚠	Compare packed 32-bit integers in `a` and `b` for greater-than.
_mm_cmpgt_pd ^⚠	Compare corresponding elements in `a` and `b` for greater-than.
_mm_cmpgt_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input element in `a` is greater than the corresponding element in `b`, or `0` otherwise.
_mm_cmpgt_sd ^⚠	Return a new vector with the low element of `a` replaced by the greater-than comparison of the lower elements of `a` and `b`.
_mm_cmpgt_ss ^⚠	Compare the lowest `f32` of both inputs for greater than. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is greater than `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpistra ^⚠	Compare packed strings with implicit lengths in `a` and `b` using the control in `imm8`, and return `1` if `b` did not contain a null character and the resulting mask was zero, and `0` otherwise.
_mm_cmpistrc ^⚠	Compare packed strings with implicit lengths in `a` and `b` using the control in `imm8`, and return `1` if the resulting mask was non-zero, and `0` otherwise.
_mm_cmpistri ^⚠	Compare packed strings with implicit lengths in `a` and `b` using the control in `imm8`, and return the generated index. Similar to `_mm_cmpestri` with the exception that `_mm_cmpestri` requires the lengths of `a` and `b` to be explicitly specified.
_mm_cmpistrm ^⚠	Compare packed strings with implicit lengths in `a` and `b` using the control in `imm8`, and return the generated mask.
_mm_cmpistro ^⚠	Compare packed strings with implicit lengths in `a` and `b` using the control in `imm8`, and return bit `0` of the resulting bit mask.
_mm_cmpistrs ^⚠	Compare packed strings with implicit lengths in `a` and `b` using the control in `imm8`, and return `1` if any character in `a` was null, and `0` otherwise.
_mm_cmpistrz ^⚠	Compare packed strings with implicit lengths in `a` and `b` using the control in `imm8`, and return `1` if any character in `b` was null, and `0` otherwise.
_mm_cmple_pd ^⚠	Compare corresponding elements in `a` and `b` for less-than-or-equal.
_mm_cmple_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input element in `a` is less than or equal to the corresponding element in `b`, or `0` otherwise.
_mm_cmple_sd ^⚠	Return a new vector with the low element of `a` replaced by the less-than-or-equal comparison of the lower elements of `a` and `b`.
_mm_cmple_ss ^⚠	Compare the lowest `f32` of both inputs for less than or equal. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is less than or equal to `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmplt_epi8 ^⚠	Compare packed 8-bit integers in `a` and `b` for less-than.
_mm_cmplt_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b` for less-than.
_mm_cmplt_epi32 ^⚠	Compare packed 32-bit integers in `a` and `b` for less-than.
_mm_cmplt_pd ^⚠	Compare corresponding elements in `a` and `b` for less-than.
_mm_cmplt_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input element in `a` is less than the corresponding element in `b`, or `0` otherwise.
_mm_cmplt_sd ^⚠	Return a new vector with the low element of `a` replaced by the less-than comparison of the lower elements of `a` and `b`.
_mm_cmplt_ss ^⚠	Compare the lowest `f32` of both inputs for less than. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is less than `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpneq_pd ^⚠	Compare corresponding elements in `a` and `b` for not-equal.
_mm_cmpneq_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input elements are not equal, or `0` otherwise.
_mm_cmpneq_sd ^⚠	Return a new vector with the low element of `a` replaced by the not-equal comparison of the lower elements of `a` and `b`.
_mm_cmpneq_ss ^⚠	Compare the lowest `f32` of both inputs for inequality. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is not equal to `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpnge_pd ^⚠	Compare corresponding elements in `a` and `b` for not-greater-than-or-equal.
_mm_cmpnge_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input element in `a` is not greater than or equal to the corresponding element in `b`, or `0` otherwise.
_mm_cmpnge_sd ^⚠	Return a new vector with the low element of `a` replaced by the not-greater-than-or-equal comparison of the lower elements of `a` and `b`.
_mm_cmpnge_ss ^⚠	Compare the lowest `f32` of both inputs for not-greater-than-or-equal. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is not greater than or equal to `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpngt_pd ^⚠	Compare corresponding elements in `a` and `b` for not-greater-than.
_mm_cmpngt_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input element in `a` is not greater than the corresponding element in `b`, or `0` otherwise.
_mm_cmpngt_sd ^⚠	Return a new vector with the low element of `a` replaced by the not-greater-than comparison of the lower elements of `a` and `b`.
_mm_cmpngt_ss ^⚠	Compare the lowest `f32` of both inputs for not-greater-than. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is not greater than `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpnle_pd ^⚠	Compare corresponding elements in `a` and `b` for not-less-than-or-equal.
_mm_cmpnle_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input element in `a` is not less than or equal to the corresponding element in `b`, or `0` otherwise.
_mm_cmpnle_sd ^⚠	Return a new vector with the low element of `a` replaced by the not-less-than-or-equal comparison of the lower elements of `a` and `b`.
_mm_cmpnle_ss ^⚠	Compare the lowest `f32` of both inputs for not-less-than-or-equal. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is not less than or equal to `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpnlt_pd ^⚠	Compare corresponding elements in `a` and `b` for not-less-than.
_mm_cmpnlt_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. The result in the output vector will be `0xffffffff` if the input element in `a` is not less than the corresponding element in `b`, or `0` otherwise.
_mm_cmpnlt_sd ^⚠	Return a new vector with the low element of `a` replaced by the not-less-than comparison of the lower elements of `a` and `b`.
_mm_cmpnlt_ss ^⚠	Compare the lowest `f32` of both inputs for not-less-than. The lowest 32 bits of the result will be `0xffffffff` if `a.extract(0)` is not less than `b.extract(0)`, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpord_pd ^⚠	Compare corresponding elements in `a` and `b` to see if neither is `NaN`.
_mm_cmpord_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. Returns four floats that have one of two possible bit patterns. The element in the output vector will be `0xffffffff` if the input elements in `a` and `b` are ordered (i.e., neither of them is a NaN), or 0 otherwise.
_mm_cmpord_sd ^⚠	Return a new vector with the low element of `a` replaced by the result of comparing both of the lower elements of `a` and `b` to `NaN`. If neither is `NaN`, the result is `0xFFFFFFFFFFFFFFFF`, and `0` otherwise.
_mm_cmpord_ss ^⚠	Check if the lowest `f32` of both inputs are ordered. The lowest 32 bits of the result will be `0xffffffff` if neither of `a.extract(0)` or `b.extract(0)` is a NaN, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_cmpunord_pd ^⚠	Compare corresponding elements in `a` and `b` to see if either is `NaN`.
_mm_cmpunord_ps ^⚠	Compare each of the four floats in `a` to the corresponding element in `b`. Returns four floats that have one of two possible bit patterns. The element in the output vector will be `0xffffffff` if the input elements in `a` and `b` are unordered (i.e., at least one of them is a NaN), or 0 otherwise.
_mm_cmpunord_sd ^⚠	Return a new vector with the low element of `a` replaced by the result of comparing both of the lower elements of `a` and `b` to `NaN`. If either is `NaN`, the result is `0xFFFFFFFFFFFFFFFF`, and `0` otherwise.
_mm_cmpunord_ss ^⚠	Check if the lowest `f32` of both inputs are unordered. The lowest 32 bits of the result will be `0xffffffff` if any of `a.extract(0)` or `b.extract(0)` is a NaN, or `0` otherwise. The upper 96 bits of the result are the upper 96 bits of `a`.
_mm_comieq_sd ^⚠	Compare the lower element of `a` and `b` for equality.
_mm_comieq_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if they are equal, or `0` otherwise.
_mm_comige_sd ^⚠	Compare the lower element of `a` and `b` for greater-than-or-equal.
_mm_comige_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if the value from `a` is greater than or equal to the one from `b`, or `0` otherwise.
_mm_comigt_sd ^⚠	Compare the lower element of `a` and `b` for greater-than.
_mm_comigt_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if the value from `a` is greater than the one from `b`, or `0` otherwise.
_mm_comile_sd ^⚠	Compare the lower element of `a` and `b` for less-than-or-equal.
_mm_comile_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if the value from `a` is less than or equal to the one from `b`, or `0` otherwise.
_mm_comilt_sd ^⚠	Compare the lower element of `a` and `b` for less-than.
_mm_comilt_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if the value from `a` is less than the one from `b`, or `0` otherwise.
_mm_comineq_sd ^⚠	Compare the lower element of `a` and `b` for not-equal.
_mm_comineq_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if they are not equal, or `0` otherwise.
_mm_crc32_u8 ^⚠	Starting with the initial value in `crc`, return the accumulated CRC32 value for unsigned 8-bit integer `v`.
_mm_crc32_u16 ^⚠	Starting with the initial value in `crc`, return the accumulated CRC32 value for unsigned 16-bit integer `v`.
_mm_crc32_u32 ^⚠	Starting with the initial value in `crc`, return the accumulated CRC32 value for unsigned 32-bit integer `v`.
_mm_cvt_si2ss ^⚠	Alias for `_mm_cvtsi32_ss`.
_mm_cvt_ss2si ^⚠	Alias for `_mm_cvtss_si32`.
_mm_cvtepi32_pd ^⚠	Convert the lower two packed 32-bit integers in `a` to packed double-precision (64-bit) floating-point elements.
_mm_cvtepi32_ps ^⚠	Convert packed 32-bit integers in `a` to packed single-precision (32-bit) floating-point elements.
_mm_cvtpd_epi32 ^⚠	Convert packed double-precision (64-bit) floating-point elements in `a` to packed 32-bit integers.
_mm_cvtpd_ps ^⚠	Convert packed double-precision (64-bit) floating-point elements in `a` to packed single-precision (32-bit) floating-point elements.
_mm_cvtps_epi32 ^⚠	Convert packed single-precision (32-bit) floating-point elements in `a` to packed 32-bit integers.
_mm_cvtps_pd ^⚠	Convert packed single-precision (32-bit) floating-point elements in `a` to packed double-precision (64-bit) floating-point elements.
_mm_cvtsd_si32 ^⚠	Convert the lower double-precision (64-bit) floating-point element in `a` to a 32-bit integer.
_mm_cvtsd_ss ^⚠	Convert the lower double-precision (64-bit) floating-point element in `b` to a single-precision (32-bit) floating-point element, store the result in the lower element of the return value, and copy the upper elements from `a` to the upper elements of the return value.
_mm_cvtsi128_si32 ^⚠	Return the lowest element of `a`.
_mm_cvtsi32_sd ^⚠	Return `a` with its lower element replaced by `b` after converting it to an `f64`.
_mm_cvtsi32_si128 ^⚠	Return a vector whose lowest element is `a` and all higher elements are `0`.
_mm_cvtsi32_ss ^⚠	Convert a 32 bit integer to a 32 bit float. The result vector is the input vector `a` with the lowest 32 bit float replaced by the converted integer.
_mm_cvtss_f32 ^⚠	Extract the lowest 32 bit float from the input vector.
_mm_cvtss_sd ^⚠	Convert the lower single-precision (32-bit) floating-point element in `b` to a double-precision (64-bit) floating-point element, store the result in the lower element of the return value, and copy the upper element from `a` to the upper element of the return value.
_mm_cvtss_si32 ^⚠	Convert the lowest 32 bit float in the input vector to a 32 bit integer.
_mm_cvtt_ss2si ^⚠	Alias for `_mm_cvttss_si32`.
_mm_cvttpd_epi32 ^⚠	Convert packed double-precision (64-bit) floating-point elements in `a` to packed 32-bit integers with truncation.
_mm_cvttps_epi32 ^⚠	Convert packed single-precision (32-bit) floating-point elements in `a` to packed 32-bit integers with truncation.
_mm_cvttsd_si32 ^⚠	Convert the lower double-precision (64-bit) floating-point element in `a` to a 32-bit integer with truncation.
_mm_cvttss_si32 ^⚠	Convert the lowest 32 bit float in the input vector to a 32 bit integer with truncation.
_mm_div_pd ^⚠	Divide packed double-precision (64-bit) floating-point elements in `a` by packed elements in `b`.
_mm_div_ps ^⚠	Divides f32x4 vectors.
_mm_div_sd ^⚠	Return a new vector with the low element of `a` replaced by the result of dividing the lower element of `a` by the lower element of `b`.
_mm_div_ss ^⚠	Divides the first component of `a` by the first component of `b`; the other components are copied from `a`.
_mm_dp_pd ^⚠	Returns the dot product of two f64x2 vectors.
_mm_dp_ps ^⚠	Returns the dot product of two f32x4 vectors.
_mm_extract_epi8 ^⚠	Extract an 8-bit integer from `a`, selected with `imm8`.
_mm_extract_epi16 ^⚠	Return the `imm8` element of `a`.
_mm_extract_epi32 ^⚠	Extract a 32-bit integer from `a`, selected with `imm8`.
_mm_extract_ps ^⚠	Extract a single-precision (32-bit) floating-point element from `a`, selected with `imm8`.
_mm_floor_pd ^⚠	Round the packed double-precision (64-bit) floating-point elements in `a` down to an integer value, and store the results as packed double-precision floating-point elements.
_mm_floor_ps ^⚠	Round the packed single-precision (32-bit) floating-point elements in `a` down to an integer value, and store the results as packed single-precision floating-point elements.
_mm_floor_sd ^⚠	Round the lower double-precision (64-bit) floating-point element in `b` down to an integer value, store the result as a double-precision floating-point element in the lower element of the intrinsic result, and copy the upper element from `a` to the upper element of the intrinsic result.
_mm_floor_ss ^⚠	Round the lower single-precision (32-bit) floating-point element in `b` down to an integer value, store the result as a single-precision floating-point element in the lower element of the intrinsic result, and copy the upper 3 packed elements from `a` to the upper elements of the intrinsic result.
_mm_getcsr ^⚠	Get the unsigned 32-bit value of the MXCSR control and status register.
_mm_hadd_epi16 ^⚠	Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16].
_mm_hadd_epi32 ^⚠	Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [4 x i32].
_mm_hadd_pd ^⚠	Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in `a` and `b`, and pack the results.
_mm_hadd_ps ^⚠	Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in `a` and `b`, and pack the results.
_mm_hadds_epi16 ^⚠	Horizontally add the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. Positive sums greater than 7FFFh are saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h.
_mm_hsub_epi16 ^⚠	Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16].
_mm_hsub_epi32 ^⚠	Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [4 x i32].
_mm_hsub_pd ^⚠	Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in `a` and `b`, and pack the results.
_mm_hsub_ps ^⚠	Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in `a` and `b`, and pack the results.
_mm_hsubs_epi16 ^⚠	Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of [8 x i16]. Positive differences greater than 7FFFh are saturated to 7FFFh. Negative differences less than 8000h are saturated to 8000h.
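The horizontal add and subtract entries above can be hard to picture from one-line summaries. The following is a minimal scalar sketch of the lane arithmetic, assuming index 0 is the lowest lane and that results from `a` fill the lower half of the output and results from `b` the upper half. The `*_model` helpers are hypothetical names used only for illustration; they are not part of `stdsimd::vendor`.

```rust
/// Scalar model of `_mm_hadd_ps`: sums of adjacent pairs, `a` first, then `b`.
fn hadd_ps_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]
}

/// Scalar model of `_mm_hsub_ps`: in each pair, the odd-indexed element
/// is subtracted from the even-indexed one.
fn hsub_ps_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]]
}

/// Scalar model of `_mm_hadds_epi16`: pairwise sums clamped to the
/// `i16` range (7FFFh / 8000h) instead of wrapping.
fn hadds_epi16_model(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..4 {
        out[i] = a[2 * i].saturating_add(a[2 * i + 1]);
        out[i + 4] = b[2 * i].saturating_add(b[2 * i + 1]);
    }
    out
}
```

Under these assumptions, `hsub_ps_model([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` yields `[-1.0, -1.0, -10.0, -10.0]`.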
_mm_insert_epi8 ^⚠	Return a copy of `a` with the 8-bit integer from `i` inserted at a location specified by `imm8`.
_mm_insert_epi16 ^⚠	Return a new vector where the `imm8` element of `a` is replaced with `i`.
_mm_insert_epi32 ^⚠	Return a copy of `a` with the 32-bit integer from `i` inserted at a location specified by `imm8`.
_mm_insert_ps ^⚠	Select a single value in `a` to store at some position in `b`, then zero elements according to `imm8`.
_mm_lddqu_si128 ^⚠	Load 128-bits of integer data from unaligned memory. This intrinsic may perform better than `_mm_loadu_si128` when the data crosses a cache line boundary.
_mm_lfence ^⚠	Perform a serializing operation on all load-from-memory instructions that were issued prior to this instruction.
_mm_load1_pd ^⚠	Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector.
_mm_load1_ps ^⚠	Construct a `f32x4` by duplicating the value read from `p` into all elements.
_mm_load_pd ^⚠	Load 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into the returned vector. `mem_addr` must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_load_pd1 ^⚠	Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector.
_mm_load_ps ^⚠	Load four `f32` values from aligned memory into a `f32x4`. If the pointer is not aligned to a 128-bit boundary (16 bytes) a general protection fault will be triggered (fatal program crash).
_mm_load_ps1 ^⚠	Alias for `_mm_load1_ps`.
_mm_load_si128 ^⚠	Load 128-bits of integer data from memory into a new vector.
_mm_load_ss ^⚠	Construct a `f32x4` with the lowest element read from `p` and the other elements set to zero.
_mm_loaddup_pd ^⚠	Load a double-precision (64-bit) floating-point element from memory into both elements of return vector.
_mm_loadh_pi ^⚠	Set the upper two single-precision floating-point values with 64 bits of data loaded from the address `p`; the lower two values are passed through from `a`.
_mm_loadl_epi64 ^⚠	Load 64-bit integer from memory into first element of returned vector.
_mm_loadl_pi ^⚠	Load two floats from `p` into the lower half of a `f32x4`. The upper half is copied from the upper half of `a`.
_mm_loadr_pd ^⚠	Load 2 double-precision (64-bit) floating-point elements from memory into the returned vector in reverse order. `mem_addr` must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_loadr_ps ^⚠	Load four `f32` values from aligned memory into a `f32x4` in reverse order.
_mm_loadu_pd ^⚠	Load 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into the returned vector. `mem_addr` does not need to be aligned on any particular boundary.
_mm_loadu_ps ^⚠	Load four `f32` values from memory into a `f32x4`. There are no restrictions on memory alignment. For aligned memory `_mm_load_ps` may be faster.
_mm_loadu_si128 ^⚠	Load 128-bits of integer data from memory into a new vector.
_mm_madd_epi16 ^⚠	Multiply and then horizontally add signed 16 bit integers in `a` and `b`.
_mm_maddubs_epi16 ^⚠	Multiply corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, add pairs of contiguous products with signed saturation, and writes the 16-bit sums to the corresponding bits in the destination.
_mm_maskload_epi32 ^⚠	Load packed 32-bit integers from memory pointed to by `mem_addr` using `mask` (elements are zeroed out when the highest bit is not set in the corresponding element).
_mm_maskload_epi64 ^⚠	Load packed 64-bit integers from memory pointed to by `mem_addr` using `mask` (elements are zeroed out when the highest bit is not set in the corresponding element).
_mm_maskload_pd ^⚠	Load packed double-precision (64-bit) floating-point elements from memory into result using `mask` (elements are zeroed out when the high bit of the corresponding element is not set).
_mm_maskload_ps ^⚠	Load packed single-precision (32-bit) floating-point elements from memory into result using `mask` (elements are zeroed out when the high bit of the corresponding element is not set).
_mm_maskmoveu_si128 ^⚠	Conditionally store 8-bit integer elements from `a` into memory using `mask`.
_mm_maskstore_epi32 ^⚠	Store packed 32-bit integers from `a` into memory pointed to by `mem_addr` using `mask` (elements are not stored when the highest bit is not set in the corresponding element).
_mm_maskstore_epi64 ^⚠	Store packed 64-bit integers from `a` into memory pointed to by `mem_addr` using `mask` (elements are not stored when the highest bit is not set in the corresponding element).
_mm_maskstore_pd ^⚠	Store packed double-precision (64-bit) floating-point elements from `a` into memory using `mask`.
_mm_maskstore_ps ^⚠	Store packed single-precision (32-bit) floating-point elements from `a` into memory using `mask`.
_mm_max_epi8 ^⚠	Compare packed 8-bit integers in `a` and `b`, and return the packed maximum values.
_mm_max_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b`, and return the packed maximum values.
_mm_max_epi32 ^⚠	Compare packed 32-bit integers in `a` and `b`, and return the packed maximum values.
_mm_max_epu8 ^⚠	Compare packed unsigned 8-bit integers in `a` and `b`, and return the packed maximum values.
_mm_max_epu16 ^⚠	Compare packed unsigned 16-bit integers in `a` and `b`, and return the packed maximum values.
_mm_max_epu32 ^⚠	Compare packed unsigned 32-bit integers in `a` and `b`, and return the packed maximum values.
_mm_max_pd ^⚠	Return a new vector with the maximum values from corresponding elements in `a` and `b`.
_mm_max_ps ^⚠	Compare packed single-precision (32-bit) floating-point elements in `a` and `b`, and return the corresponding maximum values.
_mm_max_sd ^⚠	Return a new vector with the low element of `a` replaced by the maximum of the lower elements of `a` and `b`.
_mm_max_ss ^⚠	Compare the first single-precision (32-bit) floating-point element of `a` and `b`, and return the maximum value in the first element of the return value, the other elements are copied from `a`.
_mm_mfence ^⚠	Perform a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction.
_mm_min_epi16 ^⚠	Compare packed 16-bit integers in `a` and `b`, and return the packed minimum values.
_mm_min_epu8 ^⚠	Compare packed unsigned 8-bit integers in `a` and `b`, and return the packed minimum values.
_mm_min_pd ^⚠	Return a new vector with the minimum values from corresponding elements in `a` and `b`.
_mm_min_ps ^⚠	Compare packed single-precision (32-bit) floating-point elements in `a` and `b`, and return the corresponding minimum values.
_mm_min_sd ^⚠	Return a new vector with the low element of `a` replaced by the minimum of the lower elements of `a` and `b`.
_mm_min_ss ^⚠	Compare the first single-precision (32-bit) floating-point element of `a` and `b`, and return the minimum value in the first element of the return value, the other elements are copied from `a`.
_mm_move_epi64 ^⚠	Return a vector where the low element is extracted from `a` and its upper element is zero.
_mm_move_ss ^⚠	Return a `f32x4` with the first component from `b` and the remaining components from `a`.
_mm_movedup_pd ^⚠	Duplicate the low double-precision (64-bit) floating-point element from `a`.
_mm_movehdup_ps ^⚠	Duplicate odd-indexed single-precision (32-bit) floating-point elements from `a`.
_mm_movehl_ps ^⚠	Combine higher half of `a` and `b`. The higher half of `b` occupies the lower half of result.
_mm_moveldup_ps ^⚠	Duplicate even-indexed single-precision (32-bit) floating-point elements from `a`.
_mm_movelh_ps ^⚠	Combine lower half of `a` and `b`. The lower half of `b` occupies the higher half of result.
_mm_movemask_epi8 ^⚠	Return a mask of the most significant bit of each element in `a`.
_mm_movemask_pd ^⚠	Return a mask of the most significant bit of each element in `a`.
_mm_movemask_ps ^⚠	Return a mask of the most significant bit of each element in `a`.
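As an illustration of the three `movemask` entries above, here is a small scalar sketch of how the mask could be assembled: bit `i` of the result is the most significant bit of element `i`. The helper names are hypothetical and only model the described behaviour; they do not call the intrinsics.

```rust
/// Scalar model of `_mm_movemask_ps`: bit `i` of the result is the
/// sign bit (bit 31) of lane `i`.
fn movemask_ps_model(a: [f32; 4]) -> i32 {
    let mut mask = 0;
    for (i, x) in a.iter().enumerate() {
        mask |= ((x.to_bits() >> 31) as i32) << i;
    }
    mask
}

/// Scalar model of `_mm_movemask_epi8`: bit `i` of the result is the
/// top bit (bit 7) of byte `i`.
fn movemask_epi8_model(a: [i8; 16]) -> i32 {
    let mut mask = 0;
    for (i, x) in a.iter().enumerate() {
        mask |= (((*x as u8) >> 7) as i32) << i;
    }
    mask
}
```

Note that `-0.0` counts as "set" in this model because only the sign bit is inspected, not the numeric value.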
_mm_mul_epu32 ^⚠	Multiply the low unsigned 32-bit integers from each packed 64-bit element in `a` and `b`.
_mm_mul_pd ^⚠	Multiply packed double-precision (64-bit) floating-point elements in `a` and `b`.
_mm_mul_ps ^⚠	Multiplies f32x4 vectors.
_mm_mul_sd ^⚠	Return a new vector with the low element of `a` replaced by multiplying the low elements of `a` and `b`.
_mm_mul_ss ^⚠	Multiplies the first component of `a` and `b`, the other components are copied from `a`.
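_mm_mulhi_epi16 ^⚠	Multiply the packed 16-bit integers in `a` and `b`, and return the high 16 bits of each 32-bit product.
_mm_mulhi_epu16 ^⚠	Multiply the packed unsigned 16-bit integers in `a` and `b`, and return the high 16 bits of each 32-bit product.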
_mm_mulhrs_epi16 ^⚠	Multiply packed 16-bit signed integer values, truncate the 32-bit product to the 18 most significant bits by right-shifting, round the truncated value by adding 1, and write bits [16:1] to the destination.
_mm_mullo_epi16 ^⚠	Multiply the packed 16-bit integers in `a` and `b`, and return the low 16 bits of each 32-bit product.
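The 16-bit multiply entries above differ only in which part of the 32-bit intermediate product they keep. A per-lane scalar sketch may make that concrete; the `*_lane` helpers are hypothetical names for illustration, and each models one lane of the corresponding intrinsic rather than calling it.

```rust
/// One lane of `_mm_mulhi_epi16`: high 16 bits of the signed 32-bit product.
fn mulhi_epi16_lane(a: i16, b: i16) -> i16 {
    ((a as i32 * b as i32) >> 16) as i16
}

/// One lane of `_mm_mullo_epi16`: low 16 bits of the 32-bit product.
fn mullo_epi16_lane(a: i16, b: i16) -> i16 {
    (a as i32 * b as i32) as i16
}

/// One lane of `_mm_mulhrs_epi16`: shift the product right by 14,
/// add 1 to round, then drop the lowest bit (keeping bits [16:1]).
fn mulhrs_epi16_lane(a: i16, b: i16) -> i16 {
    let product = a as i32 * b as i32;
    (((product >> 14) + 1) >> 1) as i16
}
```

Reading the inputs as Q15 fixed-point values, `mulhrs_epi16_lane(16384, 16384)` (0.5 × 0.5) gives `8192` (0.25).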
_mm_or_pd ^⚠	Compute the bitwise OR of `a` and `b`.
_mm_or_ps ^⚠	Bitwise OR of packed single-precision (32-bit) floating-point elements.
_mm_or_si128 ^⚠	Compute the bitwise OR of 128 bits (representing integer data) in `a` and `b`.
_mm_packs_epi16 ^⚠	Convert packed 16-bit integers from `a` and `b` to packed 8-bit integers using signed saturation.
_mm_packs_epi32 ^⚠	Convert packed 32-bit integers from `a` and `b` to packed 16-bit integers using signed saturation.
_mm_packus_epi16 ^⚠	Convert packed 16-bit integers from `a` and `b` to packed 8-bit integers using unsigned saturation.
_mm_pause ^⚠	Provide a hint to the processor that the code sequence is a spin-wait loop.
_mm_permute_pd ^⚠	Shuffle double-precision (64-bit) floating-point elements in `a` using the control in `imm8`.
_mm_permute_ps ^⚠	Shuffle single-precision (32-bit) floating-point elements in `a` using the control in `imm8`.
_mm_permutevar_pd ^⚠	Shuffle double-precision (64-bit) floating-point elements in `a` using the control in `b`.
_mm_permutevar_ps ^⚠	Shuffle single-precision (32-bit) floating-point elements in `a` using the control in `b`.
_mm_prefetch ^⚠	Fetch the cache line that contains address `p` using the given `strategy`.
_mm_rcp_ps ^⚠	Return the approximate reciprocal of packed single-precision (32-bit) floating-point elements in `a`.
_mm_rcp_ss ^⚠	Return the approximate reciprocal of the first single-precision (32-bit) floating-point element in `a`, the other elements are unchanged.
_mm_round_pd ^⚠	Round the packed double-precision (64-bit) floating-point elements in `a` using the `rounding` parameter, and store the results as packed double-precision floating-point elements. The `rounding` parameter is built from the `_MM_FROUND_*` constants.
_mm_round_ps ^⚠	Round the packed single-precision (32-bit) floating-point elements in `a` using the `rounding` parameter, and store the results as packed single-precision floating-point elements. The `rounding` parameter is built from the `_MM_FROUND_*` constants.
_mm_round_sd ^⚠	Round the lower double-precision (64-bit) floating-point element in `b` using the `rounding` parameter, store the result as a double-precision floating-point element in the lower element of the intrinsic result, and copy the upper element from `a` to the upper element of the intrinsic result. The `rounding` parameter is built from the `_MM_FROUND_*` constants.
_mm_round_ss ^⚠	Round the lower single-precision (32-bit) floating-point element in `b` using the `rounding` parameter, store the result as a single-precision floating-point element in the lower element of the intrinsic result, and copy the upper 3 packed elements from `a` to the upper elements of the intrinsic result. The `rounding` parameter is built from the `_MM_FROUND_*` constants.
_mm_rsqrt_ps ^⚠	Return the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in `a`.
_mm_rsqrt_ss ^⚠	Return the approximate reciprocal square root of the first single-precision (32-bit) floating-point element in `a`, the other elements are unchanged.
_mm_sad_epu8 ^⚠	Sum the absolute differences of packed unsigned 8-bit integers.
_mm_set1_epi8 ^⚠	Broadcast 8-bit integer `a` to all elements.
_mm_set1_epi16 ^⚠	Broadcast 16-bit integer `a` to all elements.
_mm_set1_epi32 ^⚠	Broadcast 32-bit integer `a` to all elements.
_mm_set1_epi64x ^⚠	Broadcast 64-bit integer `a` to all elements.
_mm_set1_pd ^⚠	Broadcast double-precision (64-bit) floating-point value `a` to all elements of the return value.
_mm_set1_ps ^⚠	Construct a `f32x4` with all elements set to `a`.
_mm_set_epi8 ^⚠	Set packed 8-bit integers with the supplied values.
_mm_set_epi16 ^⚠	Set packed 16-bit integers with the supplied values.
_mm_set_epi32 ^⚠	Set packed 32-bit integers with the supplied values.
_mm_set_epi64x ^⚠	Set packed 64-bit integers with the supplied values, from highest to lowest.
_mm_set_pd ^⚠	Set packed double-precision (64-bit) floating-point elements in the return value with the supplied values.
_mm_set_pd1 ^⚠	Broadcast double-precision (64-bit) floating-point value `a` to all elements of the return value.
_mm_set_ps ^⚠	Construct a `f32x4` from four floating point values highest to lowest.
_mm_set_ps1 ^⚠	Alias for `_mm_set1_ps`.
_mm_set_sd ^⚠	Copy double-precision (64-bit) floating-point element `a` to the lower element of the packed 64-bit return value.
_mm_set_ss ^⚠	Construct a `f32x4` with the lowest element set to `a` and the rest set to zero.
_mm_setcsr ^⚠	Set the MXCSR register with the 32-bit unsigned integer value.
_mm_setr_epi8 ^⚠	Set packed 8-bit integers with the supplied values in reverse order.
_mm_setr_epi16 ^⚠	Set packed 16-bit integers with the supplied values in reverse order.
_mm_setr_epi32 ^⚠	Set packed 32-bit integers with the supplied values in reverse order.
_mm_setr_pd ^⚠	Set packed double-precision (64-bit) floating-point elements in the return value with the supplied values in reverse order.
_mm_setr_ps ^⚠	Construct a `f32x4` from four floating point values lowest to highest.
_mm_setzero_pd ^⚠	Returns packed double-precision (64-bit) floating-point elements with all zeros.
_mm_setzero_ps ^⚠	Construct a `f32x4` with all elements initialized to zero.
_mm_setzero_si128 ^⚠	Returns a vector with all elements set to zero.
_mm_sfence ^⚠	Perform a serializing operation on all store-to-memory instructions that were issued prior to this instruction.
_mm_shuffle_epi8 ^⚠	Shuffle bytes from `a` according to the content of `b`.
_mm_shuffle_epi32 ^⚠	Shuffle 32-bit integers in `a` using the control in `imm8`.
_mm_shuffle_ps ^⚠	Shuffle packed single-precision (32-bit) floating-point elements in `a` and `b` using `mask`.
_mm_shufflehi_epi16 ^⚠	Shuffle 16-bit integers in the high 64 bits of `a` using the control in `imm8`.
_mm_shufflelo_epi16 ^⚠	Shuffle 16-bit integers in the low 64 bits of `a` using the control in `imm8`.
_mm_sign_epi8 ^⚠	Negate packed 8-bit integers in `a` when the corresponding signed 8-bit integer in `b` is negative, and return the result. Elements in result are zeroed out when the corresponding element in `b` is zero.
_mm_sign_epi16 ^⚠	Negate packed 16-bit integers in `a` when the corresponding signed 16-bit integer in `b` is negative, and return the results. Elements in result are zeroed out when the corresponding element in `b` is zero.
_mm_sign_epi32 ^⚠	Negate packed 32-bit integers in `a` when the corresponding signed 32-bit integer in `b` is negative, and return the results. Elements in result are zeroed out when the corresponding element in `b` is zero.
_mm_sll_epi16 ^⚠	Shift packed 16-bit integers in `a` left by `count` while shifting in zeros.
_mm_sll_epi32 ^⚠	Shift packed 32-bit integers in `a` left by `count` while shifting in zeros.
_mm_sll_epi64 ^⚠	Shift packed 64-bit integers in `a` left by `count` while shifting in zeros.
_mm_slli_epi16 ^⚠	Shift packed 16-bit integers in `a` left by `imm8` while shifting in zeros.
_mm_slli_epi32 ^⚠	Shift packed 32-bit integers in `a` left by `imm8` while shifting in zeros.
_mm_slli_epi64 ^⚠	Shift packed 64-bit integers in `a` left by `imm8` while shifting in zeros.
_mm_slli_si128 ^⚠	Shift `a` left by `imm8` bytes while shifting in zeros.
_mm_sllv_epi32 ^⚠	Shift packed 32-bit integers in `a` left by the amount specified by the corresponding element in `count` while shifting in zeros, and return the result.
_mm_sllv_epi64 ^⚠	Shift packed 64-bit integers in `a` left by the amount specified by the corresponding element in `count` while shifting in zeros, and return the result.
_mm_sqrt_pd ^⚠	Return a new vector with the square root of each of the values in `a`.
_mm_sqrt_ps ^⚠	Return the square root of packed single-precision (32-bit) floating-point elements in `a`.
_mm_sqrt_sd ^⚠	Return a new vector with the low element of `a` replaced by the square root of the lower element of `b`.
_mm_sqrt_ss ^⚠	Return the square root of the first single-precision (32-bit) floating-point element in `a`, the other elements are unchanged.
_mm_sra_epi16 ^⚠	Shift packed 16-bit integers in `a` right by `count` while shifting in sign bits.
_mm_sra_epi32 ^⚠	Shift packed 32-bit integers in `a` right by `count` while shifting in sign bits.
_mm_srai_epi16 ^⚠	Shift packed 16-bit integers in `a` right by `imm8` while shifting in sign bits.
_mm_srai_epi32 ^⚠	Shift packed 32-bit integers in `a` right by `imm8` while shifting in sign bits.
_mm_srav_epi32 ^⚠	Shift packed 32-bit integers in `a` right by the amount specified by the corresponding element in `count` while shifting in sign bits.
_mm_srl_epi16 ^⚠	Shift packed 16-bit integers in `a` right by `count` while shifting in zeros.
_mm_srl_epi32 ^⚠	Shift packed 32-bit integers in `a` right by `count` while shifting in zeros.
_mm_srl_epi64 ^⚠	Shift packed 64-bit integers in `a` right by `count` while shifting in zeros.
_mm_srli_epi16 ^⚠	Shift packed 16-bit integers in `a` right by `imm8` while shifting in zeros.
_mm_srli_epi32 ^⚠	Shift packed 32-bit integers in `a` right by `imm8` while shifting in zeros.
_mm_srli_epi64 ^⚠	Shift packed 64-bit integers in `a` right by `imm8` while shifting in zeros.
_mm_srli_si128 ^⚠	Shift `a` right by `imm8` bytes while shifting in zeros.
_mm_srlv_epi32 ^⚠	Shift packed 32-bit integers in `a` right by the amount specified by the corresponding element in `count` while shifting in zeros.
_mm_srlv_epi64 ^⚠	Shift packed 64-bit integers in `a` right by the amount specified by the corresponding element in `count` while shifting in zeros.
_mm_store1_pd ^⚠	Store the lower double-precision (64-bit) floating-point element from `a` into 2 contiguous elements in memory. `mem_addr` must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_store1_ps ^⚠	Store the lowest 32 bit float of `a` repeated four times into aligned memory.
_mm_store_pd ^⚠	Store 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from `a` into memory. `mem_addr` must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_store_pd1 ^⚠	Store the lower double-precision (64-bit) floating-point element from `a` into 2 contiguous elements in memory. `mem_addr` must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_store_ps ^⚠	Store four 32-bit floats into aligned memory.
_mm_store_ps1 ^⚠	Alias for `_mm_store1_ps`.
_mm_store_si128 ^⚠	Store 128-bits of integer data from `a` into memory.
_mm_store_ss ^⚠	Store the lowest 32 bit float of `a` into memory.
_mm_storeh_pi ^⚠	Store the upper half of `a` (64 bits) into memory.
_mm_storel_epi64 ^⚠	Store the lower 64-bit integer `a` to a memory location.
_mm_storel_pi ^⚠	Store the lower half of `a` (64 bits) into memory.
_mm_storer_pd ^⚠	Store 2 double-precision (64-bit) floating-point elements from `a` into memory in reverse order. `mem_addr` must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_storer_ps ^⚠	Store four 32-bit floats into aligned memory in reverse order.
_mm_storeu_pd ^⚠	Store 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from `a` into memory. `mem_addr` does not need to be aligned on any particular boundary.
_mm_storeu_ps ^⚠	Store four 32-bit floats into memory. There are no restrictions on memory alignment. For aligned memory `_mm_store_ps` may be faster.
_mm_storeu_si128 ^⚠	Store 128-bits of integer data from `a` into memory.
_mm_sub_epi8 ^⚠	Subtract packed 8-bit integers in `b` from packed 8-bit integers in `a`.
_mm_sub_epi16 ^⚠	Subtract packed 16-bit integers in `b` from packed 16-bit integers in `a`.
_mm_sub_epi32 ^⚠	Subtract packed 32-bit integers in `b` from packed 32-bit integers in `a`.
_mm_sub_epi64 ^⚠	Subtract packed 64-bit integers in `b` from packed 64-bit integers in `a`.
_mm_sub_pd ^⚠	Subtract packed double-precision (64-bit) floating-point elements in `b` from `a`.
_mm_sub_ps ^⚠	Subtracts f32x4 vectors.
_mm_sub_sd ^⚠	Return a new vector with the low element of `a` replaced by subtracting the low element of `b` from the low element of `a`.
_mm_sub_ss ^⚠	Subtracts the first component of `b` from `a`, the other components are copied from `a`.
_mm_subs_epi8 ^⚠	Subtract packed 8-bit integers in `b` from packed 8-bit integers in `a` using saturation.
_mm_subs_epi16 ^⚠	Subtract packed 16-bit integers in `b` from packed 16-bit integers in `a` using saturation.
_mm_subs_epu8 ^⚠	Subtract packed unsigned 8-bit integers in `b` from packed unsigned 8-bit integers in `a` using saturation.
_mm_subs_epu16 ^⚠	Subtract packed unsigned 16-bit integers in `b` from packed unsigned 16-bit integers in `a` using saturation.
_mm_testc_pd ^⚠	Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and set `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return the `CF` value.
_mm_testc_ps ^⚠	Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and set `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return the `CF` value.
_mm_testnzc_pd ^⚠	Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and set `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return 1 if both the `ZF` and `CF` values are zero, otherwise return 0.
_mm_testnzc_ps ^⚠	Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and set `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return 1 if both the `ZF` and `CF` values are zero, otherwise return 0.
_mm_testz_pd ^⚠	Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and set `ZF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return the `ZF` value.
_mm_testz_ps ^⚠	Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and set `ZF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `ZF` to 0. Compute the bitwise NOT of `a` and then AND with `b`, producing an intermediate value, and set `CF` to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set `CF` to 0. Return the `ZF` value.
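The `testz`, `testc` and `testnzc` entries above all compute the same two flags and differ only in what they return. A scalar sketch of the single-precision variants, with each lane given as its raw 32-bit pattern, could look like the following. The function names are hypothetical and exist only to illustrate the `ZF`/`CF` logic described above.

```rust
const SIGN: u32 = 0x8000_0000;

/// Compute (ZF, CF) from the sign bits, following the description above:
/// ZF is set when no lane of `a AND b` has its sign bit set,
/// CF is set when no lane of `(NOT a) AND b` has its sign bit set.
fn sign_flags_ps(a: [u32; 4], b: [u32; 4]) -> (bool, bool) {
    let zf = (0..4).all(|i| ((a[i] & b[i]) & SIGN) == 0);
    let cf = (0..4).all(|i| ((!a[i] & b[i]) & SIGN) == 0);
    (zf, cf)
}

/// Model of `_mm_testz_ps`: returns the ZF value.
fn testz_ps_model(a: [u32; 4], b: [u32; 4]) -> i32 {
    sign_flags_ps(a, b).0 as i32
}

/// Model of `_mm_testc_ps`: returns the CF value.
fn testc_ps_model(a: [u32; 4], b: [u32; 4]) -> i32 {
    sign_flags_ps(a, b).1 as i32
}

/// Model of `_mm_testnzc_ps`: returns 1 only when both ZF and CF are zero.
fn testnzc_ps_model(a: [u32; 4], b: [u32; 4]) -> i32 {
    let (zf, cf) = sign_flags_ps(a, b);
    (!zf && !cf) as i32
}
```

The `_pd` variants follow the same pattern with two 64-bit lanes and bit 63 as the sign bit.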
_mm_tzcnt_u32 ^⚠	Counts the number of trailing least significant zero bits.
_mm_tzcnt_u64 ^⚠	Counts the number of trailing least significant zero bits.
_mm_ucomieq_sd ^⚠	Compare the lower element of `a` and `b` for equality.
_mm_ucomieq_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if they are equal, or `0` otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
_mm_ucomige_sd ^⚠	Compare the lower element of `a` and `b` for greater-than-or-equal.
_mm_ucomige_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if the value from `a` is greater than or equal to the one from `b`, or `0` otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
_mm_ucomigt_sd ^⚠	Compare the lower element of `a` and `b` for greater-than.
_mm_ucomigt_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if the value from `a` is greater than the one from `b`, or `0` otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
_mm_ucomile_sd ^⚠	Compare the lower element of `a` and `b` for less-than-or-equal.
_mm_ucomile_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if the value from `a` is less than or equal to the one from `b`, or `0` otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
_mm_ucomilt_sd ^⚠	Compare the lower element of `a` and `b` for less-than.
_mm_ucomilt_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if the value from `a` is less than the one from `b`, or `0` otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
_mm_ucomineq_sd ^⚠	Compare the lower element of `a` and `b` for not-equal.
_mm_ucomineq_ss ^⚠	Compare two 32-bit floats from the low-order bits of `a` and `b`. Returns `1` if they are not equal, or `0` otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
_mm_undefined_pd ^⚠	Return vector of type __m128d with undefined elements.
_mm_undefined_ps ^⚠	Return vector of type __m128 with undefined elements.
_mm_undefined_si128 ^⚠	Return vector of type __m128i with undefined elements.
_mm_unpackhi_epi8 ^⚠	Unpack and interleave 8-bit integers from the high half of `a` and `b`.
_mm_unpackhi_epi16 ^⚠	Unpack and interleave 16-bit integers from the high half of `a` and `b`.
_mm_unpackhi_epi32 ^⚠	Unpack and interleave 32-bit integers from the high half of `a` and `b`.
_mm_unpackhi_epi64 ^⚠	Unpack and interleave 64-bit integers from the high half of `a` and `b`.
_mm_unpackhi_ps ^⚠	Unpack and interleave single-precision (32-bit) floating-point elements from the higher half of `a` and `b`.
_mm_unpacklo_epi8 ^⚠	Unpack and interleave 8-bit integers from the low half of `a` and `b`.
_mm_unpacklo_epi16 ^⚠	Unpack and interleave 16-bit integers from the low half of `a` and `b`.
_mm_unpacklo_epi32 ^⚠	Unpack and interleave 32-bit integers from the low half of `a` and `b`.
_mm_unpacklo_epi64 ^⚠	Unpack and interleave 64-bit integers from the low half of `a` and `b`.
_mm_unpacklo_ps ^⚠	Unpack and interleave single-precision (32-bit) floating-point elements from the lower half of `a` and `b`.
_mm_xor_pd ^⚠	Compute the bitwise XOR of `a` and `b`.
_mm_xor_ps ^⚠	Bitwise exclusive OR of packed single-precision (32-bit) floating-point elements.
_mm_xor_si128 ^⚠	Compute the bitwise XOR of 128 bits (representing integer data) in `a` and `b`.
_mulx_u32 ^⚠	Unsigned multiply without affecting flags.
_pdep_u32 ^⚠	Scatter contiguous low order bits of `a` to the result at the positions specified by the `mask`.
_pext_u32 ^⚠	Gathers the bits of `a` specified by the `mask` into the contiguous low order bit positions of the result.
_popcnt32 ^⚠	Counts the bits that are set.
_popcnt64 ^⚠	Counts the bits that are set.
_t1mskc_u32 ^⚠	Clears all bits below the least significant zero of `x` and sets all other bits.
_tzcnt_u16 ^⚠	Counts the number of trailing (least significant) zero bits.
_tzcnt_u32 ^⚠	Counts the number of trailing (least significant) zero bits.
_tzcnt_u64 ^⚠	Counts the number of trailing (least significant) zero bits.
_tzmsk_u32 ^⚠	Sets all bits below the least significant one of `x` and clears all other bits.
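
The one-line summaries for the bit-manipulation intrinsics above (`_pdep_u32`, `_pext_u32`, `_t1mskc_u32`, `_tzmsk_u32`, `_tzcnt_u32`) are terse, so a portable scalar model may help in reading them. The sketch below does not call the intrinsics and needs no particular CPU feature. The `*_model` function names are hypothetical and exist only for illustration, and the bodies are written from the descriptions in this listing rather than taken from the crate's implementation.

```rust
/// Model of `_pdep_u32`: scatter the contiguous low-order bits of `a`
/// to the positions of the set bits in `mask`.
fn pdep_u32_model(a: u32, mask: u32) -> u32 {
    let mut result = 0u32;
    let mut src_bit = 0u32; // index of the next low-order bit of `a` to place
    for pos in 0..32u32 {
        if mask & (1u32 << pos) != 0 {
            if a & (1u32 << src_bit) != 0 {
                result |= 1u32 << pos;
            }
            src_bit += 1;
        }
    }
    result
}

/// Model of `_pext_u32`: gather the bits of `a` selected by `mask`
/// into the contiguous low-order bits of the result.
fn pext_u32_model(a: u32, mask: u32) -> u32 {
    let mut result = 0u32;
    let mut dst_bit = 0u32; // index of the next low-order result bit to fill
    for pos in 0..32u32 {
        if mask & (1u32 << pos) != 0 {
            if a & (1u32 << pos) != 0 {
                result |= 1u32 << dst_bit;
            }
            dst_bit += 1;
        }
    }
    result
}

/// Model of `_tzmsk_u32`: set all bits below the least significant one
/// of `x` and clear all other bits.
fn tzmsk_u32_model(x: u32) -> u32 {
    !x & x.wrapping_sub(1)
}

/// Model of `_t1mskc_u32`: clear all bits below the least significant zero
/// of `x` and set all other bits.
fn t1mskc_u32_model(x: u32) -> u32 {
    !x | x.wrapping_add(1)
}

/// Model of `_tzcnt_u32`: count the trailing zero bits.
/// Assumption: a zero input yields the operand width (32), which is what
/// `u32::trailing_zeros` returns.
fn tzcnt_u32_model(x: u32) -> u32 {
    x.trailing_zeros()
}

fn main() {
    // Deposit the low three bits of 0b101 into mask positions 4, 6 and 7.
    let scattered = pdep_u32_model(0b101, 0b1101_0000);
    println!("pdep   = {:#010b}", scattered);
    // Extracting with the same mask recovers the original low bits.
    println!("pext   = {:#05b}", pext_u32_model(scattered, 0b1101_0000));
    println!("tzmsk  = {:#010b}", tzmsk_u32_model(0b0101_1000));
    println!("t1mskc = {:#x}", t1mskc_u32_model(0b0101_0111));
    println!("tzcnt  = {}", tzcnt_u32_model(0b1000));
}
```

In this model `pdep` and `pext` with the same mask are inverses on the masked bits, which is one way to sanity-check code that uses the real intrinsics.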

Type Definitions

__m128i
__m256i