pub struct InnerProduct;
Compute the inner-product between vector-like types.
Trait Implementations§
impl Clone for InnerProduct
fn clone(&self) -> InnerProduct
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for InnerProduct
impl<A, B, To> DistanceFunction<A, B, To> for InnerProduct where
InnerProduct: PureDistanceFunction<A, B, To>,
fn evaluate_similarity(&self, a: A, b: B) -> To
impl PureDistanceFunction<&[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
impl PureDistanceFunction<BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
impl PureDistanceFunction<BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
impl PureDistanceFunction<BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
impl PureDistanceFunction<BitSliceBase<4, Unsigned, SlicePtr<'_, u8>, BitTranspose>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
impl PureDistanceFunction<BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
impl PureDistanceFunction<BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct where
A: Architecture,
Compute the inner product between bitvectors x and y.
Returns an error if the arguments have different lengths.
impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>, BitTranspose>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct where
A: Architecture,
The strategy is to compute the inner product <x, y> by decomposing the problem into
groups of 64 dimensions.
For each group, we load the 64 bits of y into a word bits, and the four 64-bit words
of the group in x into b0, b1, b2, and b3.
Note that bit i in b0 is bit 0 of the i-th value in this group. Likewise, bit i
in b1 is bit 1 of the same value.
This means that we can compute the partial inner product for this group as
    (bits & b0).count_ones()         // Contribution of bit 0
    + 2 * (bits & b1).count_ones()   // Contribution of bit 1
    + 4 * (bits & b2).count_ones()   // Contribution of bit 2
    + 8 * (bits & b3).count_ones()   // Contribution of bit 3
We process as many full groups as we can.
To handle the remainder, we need to be careful about accessing y because BitSlice
only guarantees the validity of reads at the byte level. That is, we cannot assume that
a full 64-bit read is valid.
The bit-transposed x, on the other hand, guarantees allocations in blocks of
4 * 64 bits, so it can be treated as normal.
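The group computation described above can be sketched in plain Rust. This is only an illustration of the popcount identity, not the crate's kernel: the function names, the `[u64; 4]` plane layout, and the `transpose_group` helper are all hypothetical and exist here only so the identity can be checked against a direct sum.

```rust
/// Partial inner product for one group of 64 dimensions.
/// `x_planes[k]` holds bit `k` of each of the 64 four-bit values of `x`
/// (the `b0..b3` words from the text); `y_bits` holds the 64 one-bit values of `y`.
fn group_inner_product(x_planes: [u64; 4], y_bits: u64) -> u32 {
    x_planes
        .iter()
        .enumerate()
        // Plane `k` contributes with weight 2^k.
        .map(|(k, plane)| (plane & y_bits).count_ones() << k)
        .sum()
}

/// Hypothetical helper: bit-transpose 64 four-bit values into 4 planes.
fn transpose_group(values: &[u8; 64]) -> [u64; 4] {
    let mut planes = [0u64; 4];
    for (i, &v) in values.iter().enumerate() {
        for (k, plane) in planes.iter_mut().enumerate() {
            *plane |= (((v >> k) & 1) as u64) << i;
        }
    }
    planes
}

/// Direct reference: sum of x[i] * y[i] over the group.
fn reference_group(values: &[u8; 64], y_bits: u64) -> u32 {
    values
        .iter()
        .enumerate()
        .map(|(i, &v)| (v & 0xF) as u32 * ((y_bits >> i) & 1) as u32)
        .sum()
}
```

The sketch covers full groups only. The remainder handling, where `y` must be read byte by byte, is not modeled.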
impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct where
A: Architecture,
InnerProduct: for<'a> Target2<A, MathematicalValue<f32>, &'a [u8], &'a [u8]>,
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Implementation Notes
This can directly invoke the methods implemented in vector because
BitSlice<'_, 8, Unsigned> is isomorphic to &[u8].
impl<const N: usize> Target2<Scalar, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<N, Unsigned, SlicePtr<'_, u8>>> for InnerProduct where
Unsigned: Representation<N>,
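This impl carries no description, so the following is only a sketch of what a scalar inner product between an f32 slice and N-bit packed values might look like. The `UnequalLengths` struct here is a stand-in, the free functions are hypothetical, and the dense little-endian packing (value i in bits i * N .. (i + 1) * N of the byte stream) is an assumption rather than the crate's documented layout.

```rust
/// Stand-in for the crate's `UnequalLengths` error (hypothetical definition).
#[derive(Debug, PartialEq)]
struct UnequalLengths;

/// Read the `i`-th `N`-bit unsigned value from `packed`.
/// Assumed layout: value `i` occupies bits `i * N .. (i + 1) * N` of the
/// byte stream, least significant bit first.
fn get_packed<const N: usize>(packed: &[u8], i: usize) -> u8 {
    let mut v = 0u8;
    for k in 0..N {
        let bit = i * N + k;
        v |= ((packed[bit / 8] >> (bit % 8)) & 1) << k;
    }
    v
}

/// Scalar inner product between `x` and `y_len` packed `N`-bit values.
fn inner_product_f32_packed<const N: usize>(
    x: &[f32],
    y_packed: &[u8],
    y_len: usize,
) -> Result<f32, UnequalLengths> {
    if x.len() != y_len {
        return Err(UnequalLengths);
    }
    assert!(y_packed.len() * 8 >= y_len * N, "packed buffer too short");
    Ok(x.iter()
        .enumerate()
        .map(|(i, &xi)| xi * get_packed::<N>(y_packed, i) as f32)
        .sum())
}
```

For example, with N = 2 the byte 0x93 encodes the values 3, 0, 1, 2 under this assumed layout, so pairing it with [1.0, 2.0, 3.0, 4.0] gives 14.0.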
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
The main trick here is avoiding explicit conversion from 1-bit integers to 32-bit
floating-point numbers by using _mm256_permutevar_ps, which performs a shuffle on two
independent 128-bit lanes of f32 values in a register A using the lower 2 bits of
each 32-bit integer in a register B.
Importantly, this instruction only takes a single cycle and we can avoid any kind of
masking. Going the route of conversion would require an AND operation to isolate the
bottom bits and a somewhat lengthy 32-bit integer to f32 conversion instruction.
The overall strategy broadcasts a 32-bit integer (consisting of 32 1-bit values) across
8 lanes into a register A.
Each lane is then shifted by a different amount so:
- Lane 0 has value 0 as its least significant bit (LSB).
- Lane 1 has value 1 as its LSB.
- Lane 2 has value 2 as its LSB.
- etc.
These LSBs are used to drive the shuffle function to convert to f32 values (either
0.0 or 1.0) and we can FMA as needed.
To process the next group of 8 values, we shift all lanes in A by 8 bits so lane 0
has value 8 as its LSB, lane 1 has value 9, etc.
A total of three such shifts are applied to extract all 32 1-bit values as f32 in order.
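The lane-shift strategy can be modeled in scalar Rust without any intrinsics. This is a sketch only: the function name and the four-entry table are assumptions, and the `& 3` stands in for the hardware shuffle reading just the low 2 bits of each index, so it is not an extra masking step in the real kernel.

```rust
/// Scalar model of the lane-shift strategy for 32 one-bit values.
/// `x` holds 32 floats; bit `i` of `bits` is the i-th one-bit value of `y`.
fn inner_product_1bit_model(x: &[f32; 32], bits: u32) -> f32 {
    // Broadcast `bits` to 8 lanes, then shift lane `l` right by `l`,
    // so lane `l` has value `l` as its least significant bit.
    let mut lanes: [u32; 8] = core::array::from_fn(|l| bits >> l);
    // Assumed shuffle source: the pattern repeats so that the second index
    // bit (which belongs to a neighbouring value) does not change the result.
    const TABLE: [f32; 4] = [0.0, 1.0, 0.0, 1.0];
    let mut acc = [0.0f32; 8];
    for group in 0..4 {
        for l in 0..8 {
            // The shuffle looks at the low 2 bits of each lane.
            let y = TABLE[(lanes[l] & 3) as usize];
            // Fused multiply-add into the per-lane accumulator.
            acc[l] = x[8 * group + l].mul_add(y, acc[l]);
        }
        // Bring the next 8 values into the LSB positions.
        for lane in lanes.iter_mut() {
            *lane >>= 8;
        }
    }
    acc.iter().sum()
}
```

With x[i] = i + 1 and bits 0, 2, and 31 set, the model adds x[0] + x[2] + x[31] = 36.0.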
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
The strategy used here is almost identical to that used for 1-bit distances. The main
difference is that now we use the full 2-bit shuffle capabilities of _mm256_permutevar_ps
and the relative sizes of the shifts are slightly different.
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
The strategy here is similar to the 1-bit and 2-bit strategies. However, instead of using
_mm256_permutevar_ps, we now convert directly from 32-bit integers to 32-bit floating point.
This is because the shuffle intrinsic only supports 2-bit shuffles.
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Implementation Notes
This implementation is optimized around x86 with the AVX2 vector extension.
Specifically, we try to hit Wide::<i32, 8> as SIMDDotProduct<Wide<i16, 8>> so we can
hit the _mm256_madd_epi16 intrinsic.
Also note that AVX2 does not have 16-bit integer bit-shift instructions. Instead, we have to use 32-bit integer shifts and then bit-cast to 16-bit intrinsics. This works because we need to apply the same shift to all lanes.
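A scalar model may help show why a uniform 32-bit shift followed by a mask is enough to feed 16-bit lanes. It is a sketch under stated assumptions, not the crate's kernel: the function names are hypothetical, the packing (value i in bits 2 * i .. 2 * i + 2 of a u32) is assumed, and `madd_lane` only imitates one 32-bit output lane of _mm256_madd_epi16.

```rust
/// Scalar model of one 32-bit output lane of `_mm256_madd_epi16`:
/// two adjacent `i16` products are summed into a single `i32`.
fn madd_lane(a: [i16; 2], b: [i16; 2]) -> i32 {
    a[0] as i32 * b[0] as i32 + a[1] as i32 * b[1] as i32
}

/// Reinterpret a 32-bit word as two 16-bit lanes (low half first).
fn as_i16_lanes(w: u32) -> [i16; 2] {
    [(w & 0xFFFF) as i16, (w >> 16) as i16]
}

/// Inner product of two words that each pack sixteen 2-bit values.
fn inner_product_2bit_madd(x: u32, y: u32) -> u32 {
    // Keeps the low 2 bits of each 16-bit half and clears whatever the
    // 32-bit shift dragged across the half boundary.
    const MASK: u32 = 0x0003_0003;
    let mut acc = 0i32;
    for s in 0..8 {
        // The same shift amount is applied to both halves.
        let xs = (x >> (2 * s)) & MASK;
        let ys = (y >> (2 * s)) & MASK;
        acc += madd_lane(as_i16_lanes(xs), as_i16_lanes(ys));
    }
    acc as u32
}

/// Straightforward reference used to check the model.
fn reference_2bit_madd(x: u32, y: u32) -> u32 {
    (0..16)
        .map(|i| ((x >> (2 * i)) & 3) * ((y >> (2 * i)) & 3))
        .sum()
}
```

Because every lane needs the same shift amount, the wider shift followed by a per-lane mask produces the same lanes a 16-bit shift would.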
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Implementation Notes
This implementation is optimized around x86 with the AVX2 vector extension.
Specifically, we try to hit Wide::<i32, 8> as SIMDDotProduct<Wide<i16, 8>> so we can
hit the _mm256_madd_epi16 intrinsic.
Also note that AVX2 does not have 16-bit integer bit-shift instructions. Instead, we have to use 32-bit integer shifts and then bit-cast to 16-bit intrinsics. This works because we need to apply the same shift to all lanes.
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Implementation Notes
This is optimized around the _mm512_dpbusd_epi32 VNNI instruction, which multiplies
corresponding 8-bit integers pairwise and accumulates each group of 4 products into a
lane of an i32 accumulation vector.
One quirk of this instruction is that one argument must be unsigned and the other must be signed. Since this kernel works on 2-bit integers, this is not a limitation, just something to be aware of.
Since AVX512 does not have an 8-bit shift instruction, we generally load data as
u32x16 (which has a native shift) and bit-cast it to u8x64 as needed.
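The same idea can be modeled in scalar Rust for a single 32-bit lane. This is a sketch, not the crate's kernel: the function names are hypothetical, the packing (value i in bits 2 * i .. 2 * i + 2 of a u32) is assumed, and `dpbusd_lane` only imitates one i32 lane of the VNNI instruction, with an unsigned first operand and a signed second operand.

```rust
/// Scalar model of one `i32` lane of the VNNI dot-product instruction:
/// four unsigned-by-signed 8-bit products are added to the accumulator.
fn dpbusd_lane(acc: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let dot: i32 = a
        .iter()
        .zip(b.iter())
        .map(|(&u, &s)| u as i32 * s as i32)
        .sum();
    acc.wrapping_add(dot)
}

/// Inner product of two words that each pack sixteen 2-bit values.
fn inner_product_2bit_vnni(x: u32, y: u32) -> u32 {
    // Keeps the low 2 bits of every byte after the 32-bit shift.
    const MASK: u32 = 0x0303_0303;
    let mut acc = 0i32;
    for s in 0..4 {
        // Shift as a 32-bit integer, then view the result as 4 bytes.
        let xb = ((x >> (2 * s)) & MASK).to_le_bytes();
        // Values are at most 3, so the signed reinterpretation is lossless.
        let yb = ((y >> (2 * s)) & MASK).to_le_bytes().map(|b| b as i8);
        acc = dpbusd_lane(acc, xb, yb);
    }
    acc as u32
}

/// Straightforward reference used to check the model.
fn reference_2bit_vnni(x: u32, y: u32) -> u32 {
    (0..16)
        .map(|i| ((x >> (2 * i)) & 3) * ((y >> (2 * i)) & 3))
        .sum()
}
```

Since every 2-bit value fits in the non-negative range of i8, either operand can take the signed slot without changing the result.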
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Copy for InnerProduct
Auto Trait Implementations§
impl Freeze for InnerProduct
impl RefUnwindSafe for InnerProduct
impl Send for InnerProduct
impl Sync for InnerProduct
impl Unpin for InnerProduct
impl UnsafeUnpin for InnerProduct
impl UnwindSafe for InnerProduct
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where
T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.