pub struct InnerProduct;
Compute the inner product between vector-like types.
Trait Implementations
impl Clone for InnerProduct
fn clone(&self) -> InnerProduct
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for InnerProduct
impl<A, B, To> DistanceFunction<A, B, To> for InnerProduct
where
    InnerProduct: PureDistanceFunction<A, B, To>,
fn evaluate_similarity(&self, a: A, b: B) -> To
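The where clause above means that InnerProduct gets DistanceFunction::evaluate_similarity for every (A, B, To) combination it supports through PureDistanceFunction. Below is a minimal sketch of that blanket-impl pattern using simplified stand-ins. The traits PureDistance and Distance, their method signatures, and the Dot type are hypothetical. They are not this crate's definitions.

```rust
/// Hypothetical stand-in for a stateless distance: no `self` receiver.
pub trait PureDistance<A, B, To> {
    fn evaluate(a: A, b: B) -> To;
}

/// Hypothetical stand-in for a distance object called through `&self`.
pub trait Distance<A, B, To> {
    fn evaluate_similarity(&self, a: A, b: B) -> To;
}

/// Blanket impl: every pure distance is also usable as a distance object.
impl<T, A, B, To> Distance<A, B, To> for T
where
    T: PureDistance<A, B, To>,
{
    fn evaluate_similarity(&self, a: A, b: B) -> To {
        <T as PureDistance<A, B, To>>::evaluate(a, b)
    }
}

/// Hypothetical unit struct standing in for `InnerProduct`.
#[derive(Clone, Copy, Debug)]
pub struct Dot;

impl<'a> PureDistance<&'a [f32], &'a [f32], Option<f32>> for Dot {
    fn evaluate(a: &'a [f32], b: &'a [f32]) -> Option<f32> {
        // Reject mismatched lengths instead of silently truncating.
        if a.len() != b.len() {
            return None;
        }
        Some(a.iter().zip(b).map(|(x, y)| x * y).sum())
    }
}
```

With this design, each type combination's kernel is written once in the pure trait, and the `&self` interface is derived from it automatically.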
impl PureDistanceFunction<&[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
impl PureDistanceFunction<&[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
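Each of the impls above pairs an f32 slice with a packed N-bit unsigned slice. The sketch below is a rough scalar reference for what they compute: it unpacks each code and accumulates x[i] * y[i]. It rests on several assumptions that are not BitSlice's documented representation:

- codes are packed LSB-first with no padding;
- the function names are illustrative;
- UnequalLengths is a local stand-in for the crate's error type.

```rust
/// Local stand-in for the crate's error type (hypothetical).
#[derive(Debug, PartialEq)]
pub struct UnequalLengths;

/// Read the `i`-th `bits`-wide unsigned code from a packed buffer.
/// Assumes LSB-first packing with no padding between codes.
fn get_packed(bytes: &[u8], bits: usize, i: usize) -> u32 {
    let mut value = 0u32;
    for b in 0..bits {
        let bit_index = i * bits + b;
        let bit = (bytes[bit_index / 8] >> (bit_index % 8)) & 1;
        value |= u32::from(bit) << b;
    }
    value
}

/// Scalar inner product of `x` with `len` packed `bits`-wide codes in `y_bytes`.
/// Panics if `y_bytes` holds fewer than `len * bits` bits.
pub fn inner_product_f32_packed(
    x: &[f32],
    y_bytes: &[u8],
    bits: usize,
    len: usize,
) -> Result<f32, UnequalLengths> {
    // Mirror the documented behavior: mismatched lengths are an error.
    if x.len() != len {
        return Err(UnequalLengths);
    }
    Ok(x
        .iter()
        .enumerate()
        .map(|(i, &xi)| xi * get_packed(y_bytes, bits, i) as f32)
        .sum())
}
```

For example, the single byte 0x93 holds the 2-bit codes [3, 0, 1, 2], so its inner product with [1.0, 2.0, 3.0, 4.0] is 3 + 0 + 3 + 8 = 14.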
impl PureDistanceFunction<BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<4, Unsigned, SlicePtr<'_, u8>, BitTranspose>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct
impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where
    A: Architecture,
Compute the inner product between bitvectors x and y.
Returns an error if the arguments have different lengths.
impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>, BitTranspose>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where
    A: Architecture,
The strategy is to compute the inner product <x, y> by decomposing the problem into
groups of 64 dimensions.
For each group, we load the 64 bits of y into a single 64-bit word, bits, and load the four 64-bit words
of the group in x into b0, b1, b2, and b3.
Note that bit i in b0 is bit 0 of the i-th value in this group. Likewise, bit i
in b1 is bit 1 of the same value, and so on for b2 and b3.
This means that we can compute the partial inner product for this group as:
(bits & b0).count_ones()         // Contribution of bit 0
+ 2 * (bits & b1).count_ones()   // Contribution of bit 1
+ 4 * (bits & b2).count_ones()   // Contribution of bit 2
+ 8 * (bits & b3).count_ones()   // Contribution of bit 3
We process as many full groups as we can.
To handle the remainder, we need to be careful about accessing y because BitSlice
only guarantees the validity of reads at the byte level. That is, we cannot assume that
a full 64-bit read is valid.
The bit-transposed x, on the other hand, guarantees allocations in blocks of
4 * 64 bits, so its final group can be read with full 64-bit loads as normal.
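The per-group formula above can be modelled in portable Rust as follows. This is a sketch under stated assumptions:

- x is given as plain 4-bit values and transposed on the fly;
- y is given as 0/1 bytes;
- the last partial group is simply zero-padded, rather than using the byte-level remainder handling described above.

All function names are hypothetical.

```rust
/// Partial inner product of one 64-element group.
/// Bit i of `b[k]` is bit k of the i-th 4-bit value of x; bit i of `bits`
/// is the i-th 1-bit value of y.
fn group_inner_product(b: [u64; 4], bits: u64) -> u32 {
    (bits & b[0]).count_ones()            // contribution of bit 0
        + 2 * (bits & b[1]).count_ones()  // contribution of bit 1
        + 4 * (bits & b[2]).count_ones()  // contribution of bit 2
        + 8 * (bits & b[3]).count_ones()  // contribution of bit 3
}

/// Bit-transpose up to 64 4-bit values into four 64-bit planes.
fn transpose_group(values: &[u8]) -> [u64; 4] {
    assert!(values.len() <= 64);
    let mut planes = [0u64; 4];
    for (i, &v) in values.iter().enumerate() {
        for (k, plane) in planes.iter_mut().enumerate() {
            *plane |= u64::from((v >> k) & 1) << i;
        }
    }
    planes
}

/// Inner product of 4-bit `x` with 1-bit `y` (each entry 0 or 1).
pub fn inner_product_4x1(x: &[u8], y: &[u8]) -> Option<u32> {
    if x.len() != y.len() {
        return None;
    }
    let mut total = 0u32;
    for (xs, ys) in x.chunks(64).zip(y.chunks(64)) {
        let b = transpose_group(xs);
        // Pack this group's y values into one word; a short final group is zero-padded.
        let bits = ys
            .iter()
            .enumerate()
            .fold(0u64, |acc, (i, &v)| acc | (u64::from(v & 1) << i));
        total += group_inner_product(b, bits);
    }
    Some(total)
}
```

Transposing x once up front lets each group cost only four AND/popcount pairs, independent of the value width's bit layout in memory.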
impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where
    A: Architecture,
    InnerProduct: for<'a> Target2<A, MathematicalValue<f32>, &'a [u8], &'a [u8]>,
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Implementation Notes
This can directly invoke the methods implemented in vector because
BitSlice<'_, 8, Unsigned> is isomorphic to &[u8].
impl<const N: usize> Target2<Scalar, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<N, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where
    Unsigned: Representation<N>,
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Performance
This function uses a generic implementation and therefore is not very fast.
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
The main trick here is avoiding explicit conversion from 1-bit integers to 32-bit
floating-point numbers by using _mm256_permutevar_ps, which shuffles the f32 values within each of the two
independent 128-bit lanes of a register A, using the lower 2 bits of
each 32-bit integer in a register B as the index.
Importantly, this instruction takes only a single cycle, and because it ignores all but the lower 2 bits of each index, we can avoid any kind of
masking. Going the route of conversion would require an AND operation to isolate
the bottom bits and a somewhat lengthy 32-bit integer to f32 conversion instruction.
The overall strategy broadcasts a 32-bit integer (consisting of 32 1-bit values) across
8 lanes into a register A.
Each lane is then shifted by a different amount so:
- Lane 0 has value 0 as its least significant bit (LSB).
- Lane 1 has value 1 as its LSB.
- Lane 2 has value 2 as its LSB.
- etc.
These LSBs drive the shuffle, which converts them to f32 values (either
0.0 or 1.0) that we can then accumulate with FMA as needed.
To process the next group of 8 values, we shift all lanes in A right by 8 bits so that lane 0
has value 8 as its LSB, lane 1 has value 9, etc.
A total of three shifts are applied to extract all 32 1-bit values as f32 in order.
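To make the shift-and-shuffle scheme concrete, the sketch below models the eight 32-bit lanes as a plain array. It models _mm256_permutevar_ps as a per-128-bit-lane table lookup on the low 2 bits of each index. The lookup table [0.0, 1.0, 0.0, 1.0] is an assumption on our part: bit 1 of each index belongs to the neighboring value, so repeating the table makes the result depend only on bit 0. Function names are hypothetical.

```rust
/// Model of `_mm256_permutevar_ps` on one 128-bit lane: each output selects
/// `table[idx & 3]`, so all index bits above the lowest two are ignored.
fn permute_lane(table: [f32; 4], idx: [u32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = table[(idx[i] & 3) as usize];
    }
    out
}

/// Inner product of 32 f32 values with 32 1-bit values packed in `word`
/// (bit j holds value j), processed as 4 groups of 8 "lanes".
pub fn dot_32_bits(x: &[f32; 32], word: u32) -> f32 {
    // Repeated table: the result depends only on bit 0 of each index.
    let table = [0.0f32, 1.0, 0.0, 1.0];
    // Broadcast `word` to 8 lanes and shift lane i right by i.
    let mut lanes: [u32; 8] = core::array::from_fn(|i| word >> i);
    let mut acc = [0.0f32; 8];
    for group in 0..4 {
        // Two independent 128-bit halves, as in the AVX2 instruction.
        let lo = permute_lane(table, [lanes[0], lanes[1], lanes[2], lanes[3]]);
        let hi = permute_lane(table, [lanes[4], lanes[5], lanes[6], lanes[7]]);
        for i in 0..4 {
            acc[i] = x[group * 8 + i].mul_add(lo[i], acc[i]);
            acc[i + 4] = x[group * 8 + i + 4].mul_add(hi[i], acc[i + 4]);
        }
        // Three shifts in total: move the next 8 values into the LSBs.
        if group < 3 {
            for lane in lanes.iter_mut() {
                *lane >>= 8;
            }
        }
    }
    acc.iter().sum()
}
```

No AND or integer-to-float conversion appears anywhere: the table lookup both isolates the bit and produces the f32 value.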
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
The strategy used here is almost identical to that used for 1-bit distances. The main
difference is that we now use the full 2-bit shuffle capabilities of _mm256_permutevar_ps,
and the relative sizes of the shifts are slightly different.
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
The strategy here is similar to the 1-bit and 2-bit strategies. However, instead of using
_mm256_permutevar_ps, we now convert directly from 32-bit integers to 32-bit floating point.
This is because the shuffle intrinsic only supports 2-bit indices (a 4-entry table per 128-bit lane), which cannot cover the 16 possible 4-bit values.
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Implementation Notes
This implementation is optimized around x86 with the AVX2 vector extension.
Specifically, we try to hit Wide::<i32, 8> as SIMDDotProduct<Wide<i16, 8>> so we can
hit the _mm256_madd_epi16 intrinsic.
Also note that this kernel shifts 16-bit data using 32-bit integer shifts and then bit-casts the result back to 16-bit lanes. This works because the same shift is applied to all lanes.
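The arithmetic that _mm256_madd_epi16 contributes can be checked with a scalar model: multiply 16 pairs of i16 lanes and add adjacent products into 8 i32 lanes. The sketch below applies that model to 2-bit codes widened to i16. It models only the intrinsic's arithmetic, not the crate's Wide or SIMDDotProduct types. The packing (4 codes per byte, LSB-first, in 4-byte blocks) and the function names are assumptions.

```rust
/// Scalar model of `_mm256_madd_epi16`: 16 i16 × i16 products, with each
/// adjacent pair summed into one of 8 i32 lanes.
fn madd_epi16(a: [i16; 16], b: [i16; 16]) -> [i32; 8] {
    core::array::from_fn(|i| {
        i32::from(a[2 * i]) * i32::from(b[2 * i])
            + i32::from(a[2 * i + 1]) * i32::from(b[2 * i + 1])
    })
}

/// Widen 16 2-bit codes (4 per byte, LSB-first) into i16 lanes.
fn unpack_2bit(bytes: [u8; 4]) -> [i16; 16] {
    core::array::from_fn(|i| i16::from((bytes[i / 4] >> (2 * (i % 4))) & 0b11))
}

/// Inner product of two 2-bit vectors stored as 16-element blocks.
pub fn inner_product_2x2(x: &[[u8; 4]], y: &[[u8; 4]]) -> Option<u32> {
    if x.len() != y.len() {
        return None;
    }
    let mut acc = [0i32; 8];
    for (xb, yb) in x.iter().zip(y) {
        let products = madd_epi16(unpack_2bit(*xb), unpack_2bit(*yb));
        for (a, p) in acc.iter_mut().zip(products.iter()) {
            *a += *p;
        }
    }
    // Final horizontal reduction of the 8 accumulator lanes.
    Some(acc.iter().sum::<i32>() as u32)
}
```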
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Implementation Notes
This implementation is optimized around x86 with the AVX2 vector extension.
Specifically, we try to hit Wide::<i32, 8> as SIMDDotProduct<Wide<i16, 8>> so we can
hit the _mm256_madd_epi16 intrinsic.
Also note that this kernel shifts 16-bit data using 32-bit integer shifts and then bit-casts the result back to 16-bit lanes. This works because the same shift is applied to all lanes.
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
fn run(
    self,
    arch: V3,
    x: BitSlice<'_, N, Unsigned, Dense>,
    y: BitSlice<'_, N, Unsigned, Dense>,
) -> MathematicalResult<u32>
Computes the inner product of 8-bit unsigned × 1-bit unsigned vectors using V3 intrinsics.
For each 32-element block, we load 32 bytes from x and 4 bytes (32 bits) from y.
The 32 bits of y are expanded into a 32-byte mask, and ANDing x with this mask zeroes the bytes whose bit in y is 0.
Finally, _mm256_sad_epu8 horizontally sums the masked bytes in groups of 8.
The main loop is 4× unrolled, processing 128 elements per iteration.
§Overflow
Each sad output lane holds at most 8 × 255 = 2_040. Accumulated across d/32
blocks, the per-lane max is (d/32) × 2_040. At dim = 3072: 96 × 2_040 = 195_840,
well within i32 range.
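Below is a possible scalar model of the mask-and-SAD step, together with the overflow bound quoted above. Here, _mm256_sad_epu8 against a zero vector is modelled as summing each group of 8 bytes. The sketch assumes y arrives as one u32 of packed bits per 32-element block (bit i for element i). The function names are hypothetical.

```rust
/// Model of `_mm256_sad_epu8(v, zero)`: sum each group of 8 bytes into one lane.
fn sad_against_zero(v: [u8; 32]) -> [u64; 4] {
    core::array::from_fn(|g| v[g * 8..g * 8 + 8].iter().map(|&b| u64::from(b)).sum::<u64>())
}

/// One 32-element block: keep x[i] where bit i of `y_bits` is set.
fn block_8x1(x: [u8; 32], y_bits: u32) -> [u64; 4] {
    let masked: [u8; 32] = core::array::from_fn(|i| {
        // Expand bit i of y into an all-ones or all-zeros byte mask.
        let mask: u8 = if ((y_bits >> i) & 1) == 1 { 0xFF } else { 0x00 };
        x[i] & mask
    });
    sad_against_zero(masked)
}

/// Inner product of u8 blocks with packed 1-bit blocks.
pub fn inner_product_8x1(x: &[[u8; 32]], y: &[u32]) -> Option<u64> {
    if x.len() != y.len() {
        return None;
    }
    let mut lanes = [0u64; 4];
    for (xb, &yb) in x.iter().zip(y) {
        for (acc, v) in lanes.iter_mut().zip(block_8x1(*xb, yb).iter()) {
            *acc += *v;
        }
    }
    Some(lanes.iter().sum::<u64>())
}

/// Worst-case per-lane total after `dim / 32` blocks: (dim / 32) × 8 × 255.
pub const fn max_lane_sum(dim: u64) -> u64 {
    (dim / 32) * 8 * 255
}
```

max_lane_sum(3072) reproduces the 195_840 figure above.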
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
fn run(
    self,
    arch: V3,
    x: BitSlice<'_, N, Unsigned, Dense>,
    y: BitSlice<'_, N, Unsigned, Dense>,
) -> MathematicalResult<u32>
Computes the inner product of 8-bit unsigned × 2-bit unsigned vectors using AVX2.
§Strategy
Unpack each 16-byte chunk of y into 64 crumb (2-bit) values via a two-level cascade:
first unpack_half_bytes splits bytes into nibbles, then a second pass splits
nibbles into crumbs (masked with 0x03). Each unpacked half is paired with 32
bytes of x and multiplied via _mm256_maddubs_epi16.
The main loop is 4× unrolled: eight i16 products (4 blocks × 2 halves) are summed
in i16 before a single _mm256_madd_epi16(…, 1) widens to i32. This is safe
because 8 × (255 × 3 × 2) = 12_240 < i16::MAX.
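The two-level cascade and the i16 headroom argument can be sketched in scalar Rust as follows. A few assumptions apply:

- the sketch produces crumbs in logical LSB-first order, rather than the register layout the AVX2 kernel works with;
- it models _mm256_maddubs_epi16 as u8 × i8 products whose adjacent pairs are added with i16 saturation;
- the function names are hypothetical, and split_nibbles merely stands in for unpack_half_bytes.

```rust
/// First level: split each byte into (low nibble, high nibble).
fn split_nibbles(bytes: &[u8]) -> Vec<u8> {
    bytes.iter().flat_map(|&b| [b & 0x0F, b >> 4]).collect()
}

/// Second level: split each nibble into two 2-bit crumbs (masked with 0x03).
fn split_crumbs(bytes: &[u8]) -> Vec<u8> {
    split_nibbles(bytes)
        .into_iter()
        .flat_map(|n| [n & 0x03, (n >> 2) & 0x03])
        .collect()
}

/// Model of `_mm256_maddubs_epi16`: u8 × i8 products, adjacent pairs summed
/// and saturated to i16.
fn maddubs(a: &[u8], b: &[i8]) -> Vec<i16> {
    a.chunks(2)
        .zip(b.chunks(2))
        .map(|(ap, bp)| {
            let s: i32 = ap.iter().zip(bp).map(|(&u, &v)| i32::from(u) * i32::from(v)).sum();
            s.clamp(i32::from(i16::MIN), i32::from(i16::MAX)) as i16
        })
        .collect()
}

/// Inner product of u8 values `x` with packed 2-bit values `y_packed`.
pub fn inner_product_8x2(x: &[u8], y_packed: &[u8]) -> Option<u32> {
    let crumbs = split_crumbs(y_packed);
    if x.len() != crumbs.len() {
        return None;
    }
    // Crumbs are 0..=3, so reinterpreting them as i8 is lossless.
    let y: Vec<i8> = crumbs.iter().map(|&c| c as i8).collect();
    let total: i32 = maddubs(x, &y).iter().map(|&p| i32::from(p)).sum();
    Some(total as u32)
}
```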
impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
fn run(
    self,
    arch: V3,
    x: BitSlice<'_, N, Unsigned, Dense>,
    y: BitSlice<'_, N, Unsigned, Dense>,
) -> MathematicalResult<u32>
Computes the inner product of 8-bit unsigned × 4-bit unsigned vectors using V3 intrinsics.
§Strategy
Unpack each 16-byte chunk of y into 32 nibble values via unpack_half_bytes,
then multiply with the corresponding 32 bytes of x using _mm256_maddubs_epi16
(u8 × i8 → i16 with a saturating pairwise horizontal add; nibbles in 0..=15 are valid non-negative i8 values).
The main loop is 4× unrolled: four i16 products are summed in i16 before a single
_mm256_madd_epi16(…, 1) widens to i32. This is safe because
4 × (255 × 15 × 2) = 30_600 < i16::MAX.
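The i16 headroom argument above can be checked mechanically. One worst-case _mm256_maddubs_epi16 output lane here is 255 × 15 × 2 = 7_650. Four of them fit in i16 and a fifth would not, which is consistent with summing at most four products before widening. The sketch also gives a plain scalar reference for the 8-bit × 4-bit product, assuming nibbles are packed low-nibble-first. All names are hypothetical.

```rust
/// Worst case for one maddubs output lane: two products of 255 × 15.
const MAX_PAIR: i16 = 255 * 15 * 2;

/// Sum `n` worst-case lanes in i16, returning None if i16 would overflow.
fn accumulate_in_i16(n: usize) -> Option<i16> {
    (0..n).try_fold(0i16, |acc, _| acc.checked_add(MAX_PAIR))
}

/// Scalar reference: u8 `x` times 4-bit values packed two per byte in `y_packed`.
pub fn inner_product_8x4(x: &[u8], y_packed: &[u8]) -> Option<u32> {
    // Low nibble first, then high nibble (assumed ordering).
    let y: Vec<u8> = y_packed.iter().flat_map(|&b| [b & 0x0F, b >> 4]).collect();
    if x.len() != y.len() {
        return None;
    }
    Some(x.iter().zip(&y).map(|(&a, &b)| u32::from(a) * u32::from(b)).sum())
}
```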
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
Available on x86-64 only.
Compute the inner product between x and y.
Returns an error if the arguments have different lengths.
§Implementation Notes
This is optimized around the __mm512_dpbusd_epi32 VNNI instruction, which computes the
pairwise dot product between vectors of 8-bit integers and accumulates groups of 4 with
an i32 accumulation vector.
One quirk of this instruction is that one argument must be unsigned and the other must be signed. Since thie kernsl works on 2-bit integers, this is not a limitation. Just something to be aware of.
Since AVX512 does not have an 8-bit shift instruction, we generally load data as
u32x16 (which has a native shift) and bit-cast it to u8x64 as needed.
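A scalar (SWAR) analogue of this trick fits in one function: shift a whole u32, then mask every byte so that bits which slid in from the neighbouring byte are discarded, and finally view the u32 as four bytes. This is a sketch assuming four 2-bit codes per byte, least-significant bits first; it is not the crate's vectorized code, and `extract_u2_fields` is a hypothetical name.

```rust
/// Extract the `k`-th 2-bit field (k in 0..4) from each of the 4 bytes in `word`.
/// Shifting the full u32 lets bits from the next byte move into the high bits of
/// each byte; the per-byte mask 0x03 removes them, so each byte holds one code.
fn extract_u2_fields(word: u32, k: u32) -> [u8; 4] {
    assert!(k < 4, "only four 2-bit fields per byte");
    ((word >> (2 * k)) & 0x0303_0303).to_le_bytes()
}
```

In the AVX-512 version, the same shift and mask would be applied to all 16 u32 lanes at once before the register is bit-cast to u8x64.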
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
impl Copy for InnerProduct
Auto Trait Implementations§
impl Freeze for InnerProduct
impl RefUnwindSafe for InnerProduct
impl Send for InnerProduct
impl Sync for InnerProduct
impl Unpin for InnerProduct
impl UnsafeUnpin for InnerProduct
impl UnwindSafe for InnerProduct
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.