Struct InnerProduct

(bits & b0).count_ones()                // Contribution of bit 0
    + 2 * (bits & b1).count_ones()      // Contribution of bit 1
    + 4 * (bits & b2).count_ones()      // Contribution of bit 2
    + 8 * (bits & b3).count_ones()      // Contribution of bit 3

We process as many full groups as we can.
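
The bit-plane identity above can be checked with a scalar sketch. It assumes a hypothetical layout in which each full group of 64 four-bit values of x is stored as four u64 bit-planes b0..b3 (plane k holds bit k of every value), and the matching 64 one-bit values of y are packed into a single u64. transpose4 and group_inner_product are illustrative names, not the crate's API.

```rust
/// Transpose 64 four-bit values into four bit-planes (hypothetical layout):
/// bit i of plane k is bit k of value i.
fn transpose4(values: &[u8; 64]) -> [u64; 4] {
    let mut planes = [0u64; 4];
    for (i, &v) in values.iter().enumerate() {
        for (k, plane) in planes.iter_mut().enumerate() {
            *plane |= u64::from((v >> k) & 1) << i;
        }
    }
    planes
}

/// Inner product of one full 64-element group: bit-transposed 4-bit x times 1-bit y.
fn group_inner_product(planes: &[u64; 4], bits: u64) -> u32 {
    (bits & planes[0]).count_ones()           // contribution of bit 0
        + 2 * (bits & planes[1]).count_ones() // contribution of bit 1
        + 4 * (bits & planes[2]).count_ones() // contribution of bit 2
        + 8 * (bits & planes[3]).count_ones() // contribution of bit 3
}

fn main() {
    // Deterministic data: x cycles through 0..=15, y selects every third element.
    let x: [u8; 64] = core::array::from_fn(|i| (i % 16) as u8);
    let bits = (0..64u32).filter(|i| i % 3 == 0).fold(0u64, |acc, i| acc | (1u64 << i));

    let fast = group_inner_product(&transpose4(&x), bits);
    // Reference: sum of x[i] over the positions where the y bit is 1.
    let naive: u32 = (0..64usize)
        .filter(|&i| (bits >> i) & 1 == 1)
        .map(|i| u32::from(x[i]))
        .sum();
    assert_eq!(fast, naive);
    println!("inner product of one group = {fast}");
}
```

Each weighted popcount replaces 64 multiply-adds with a single AND and POPCNT per bit-plane, which is why the bit-transposed layout pays off for the 4-bit × 1-bit case.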

To handle the remainder, we need to be careful about accessing y, because BitSlice only guarantees that reads are valid at the byte level. That is, we cannot assume that a full 64-bit read is valid.

The bit-transposed x, on the other hand, is guaranteed to be allocated in whole blocks of 4 × 64 bits, so full 64-bit reads are always valid and it can be treated as normal.
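
One common way to read the tail of y without over-reading is to copy the remaining bytes into a zeroed 8-byte buffer and then mask off any padding bits. The sketch assumes little-endian, LSB-first bit packing; load_partial_u64 and mask_low are hypothetical helpers, not the crate's API.

```rust
/// Read up to 8 bytes starting at `offset` as a little-endian u64 without
/// touching memory past the end of `bytes`. Missing bytes read as zero.
fn load_partial_u64(bytes: &[u8], offset: usize) -> u64 {
    let start = offset.min(bytes.len());
    let end = bytes.len().min(start + 8);
    let mut buf = [0u8; 8];
    // Copy only the bytes that actually exist; the rest of `buf` stays zero.
    buf[..end - start].copy_from_slice(&bytes[start..end]);
    u64::from_le_bytes(buf)
}

/// Keep only the low `valid_bits` bits, discarding padding past the logical end.
fn mask_low(word: u64, valid_bits: u32) -> u64 {
    if valid_bits >= 64 {
        word
    } else {
        word & ((1u64 << valid_bits) - 1)
    }
}

fn main() {
    // 10 one-bit values need 2 bytes; the high 6 bits of the last byte are padding.
    let y = [0b1111_0101u8, 0b1111_1110];
    let tail = mask_low(load_partial_u64(&y, 0), 10);
    assert_eq!(tail, 0b10_1111_0101);
    println!("tail word = {tail:#b}");
}
```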

fn run( self, _: A, x: BitSlice<'_, N, Unsigned, BitTranspose>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where A: Architecture, InnerProduct: for<'a> Target2<A, MathematicalValue<f32>, &'a [u8], &'a [u8]>,

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Implementation Notes

This can directly invoke the methods implemented in vector because BitSlice<'_, 8, Unsigned> is isomorphic to &[u8].
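
Because an 8-bit unsigned BitSlice has the same layout as a byte slice, the operation reduces to an ordinary u8 inner product. A minimal scalar sketch, using a stand-in UnequalLengths type and a hypothetical inner_product_u8 function rather than the actual methods in vector:

```rust
/// Stand-in for the crate's `UnequalLengths` error (hypothetical definition).
#[derive(Debug, PartialEq)]
struct UnequalLengths;

/// Inner product of two byte slices, widening each element to u32 before multiplying.
fn inner_product_u8(x: &[u8], y: &[u8]) -> Result<u32, UnequalLengths> {
    if x.len() != y.len() {
        return Err(UnequalLengths);
    }
    Ok(x.iter().zip(y).map(|(&a, &b)| u32::from(a) * u32::from(b)).sum())
}

fn main() {
    assert_eq!(inner_product_u8(&[1, 2, 3], &[4, 5, 6]), Ok(32));
    assert_eq!(inner_product_u8(&[1], &[1, 2]), Err(UnequalLengths));
    println!("ok");
}
```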

fn run( self, arch: A, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl<const N: usize> Target2<Scalar, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<N, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where Unsigned: Representation<N>,

fn run( self, _: Scalar, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

A fallback implementation that uses scalar indexing to retrieve values from the corresponding BitSlice.
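
A minimal sketch of what such a scalar fallback can look like. It assumes that value i occupies bits i*N through i*N + N - 1 with the least significant bit first; the layout, get_unsigned, pack, and inner_product_f32 are assumptions for illustration, not the crate's API, and None stands in for the UnequalLengths error.

```rust
/// Extract the `index`-th N-bit unsigned value (N <= 8) from a densely packed
/// buffer, assuming value i occupies bits [i*N, (i+1)*N), least significant first.
fn get_unsigned<const N: usize>(bytes: &[u8], index: usize) -> u8 {
    let mut value = 0u8;
    for k in 0..N {
        let bit = index * N + k;
        value |= ((bytes[bit / 8] >> (bit % 8)) & 1) << k;
    }
    value
}

/// Pack N-bit values into bytes using the same assumed layout.
fn pack<const N: usize>(values: &[u8]) -> Vec<u8> {
    let mut bytes = vec![0u8; (values.len() * N + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        for k in 0..N {
            let bit = i * N + k;
            bytes[bit / 8] |= ((v >> k) & 1) << (bit % 8);
        }
    }
    bytes
}

/// Scalar fallback: f32 inner product against `len` N-bit unsigned values.
fn inner_product_f32<const N: usize>(x: &[f32], y: &[u8], len: usize) -> Option<f32> {
    if x.len() != len {
        return None;
    }
    Some(
        x.iter()
            .enumerate()
            .map(|(i, &xi)| xi * f32::from(get_unsigned::<N>(y, i)))
            .sum(),
    )
}

fn main() {
    let y = pack::<3>(&[7, 0, 5, 2]);
    println!("{:?}", inner_product_f32::<3>(&[1.0, 2.0, 0.5, 4.0], &y, 4));
}
```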

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.
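
A generic implementation can decode one element at a time. The sketch below shows one way to do that, not necessarily the crate's: it assumes LSB-first dense packing and reads a 16-bit window, so a field that crosses a byte boundary is decoded in a single step. decode, inner_product, and pack are hypothetical helpers, and None stands in for the UnequalLengths error.

```rust
/// Decode the i-th N-bit value (1 <= N <= 8) by loading a 16-bit window: an N-bit
/// field straddles at most one byte boundary. LSB-first dense layout assumed.
fn decode<const N: usize>(bytes: &[u8], i: usize) -> u32 {
    let bit = i * N;
    let lo = u16::from(bytes[bit / 8]);
    // The following byte may not exist for the last value; treat it as zero.
    let hi = u16::from(bytes.get(bit / 8 + 1).copied().unwrap_or(0));
    let window = lo | (hi << 8);
    u32::from((window >> (bit % 8)) & ((1u16 << N) - 1))
}

/// Generic N-bit x N-bit unsigned inner product, one element at a time.
fn inner_product<const N: usize>(x: &[u8], x_len: usize, y: &[u8], y_len: usize) -> Option<u32> {
    if x_len != y_len {
        return None; // stands in for the UnequalLengths error
    }
    Some((0..x_len).map(|i| decode::<N>(x, i) * decode::<N>(y, i)).sum())
}

/// Pack helper using the same assumed layout.
fn pack<const N: usize>(values: &[u8]) -> Vec<u8> {
    let mut bytes = vec![0u8; (values.len() * N + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        for k in 0..N {
            let bit = i * N + k;
            bytes[bit / 8] |= ((v >> k) & 1) << (bit % 8);
        }
    }
    bytes
}

fn main() {
    let x = pack::<2>(&[3, 1, 2, 0, 3]);
    let y = pack::<2>(&[1, 3, 2, 3, 3]);
    println!("{:?}", inner_product::<2>(&x, 5, &y, 5));
}
```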

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Performance

This function uses a generic implementation and therefore is not very fast.

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

The main trick here is avoiding explicit conversion from 1-bit integers to 32-bit floating-point numbers by using _mm256_permutevar_ps, which performs a shuffle on two independent 128-bit lanes of f32 values in a register A using the lower 2 bits of each 32-bit integer in a register B.

Importantly, this instruction takes only a single cycle and lets us avoid any kind of masking. Going the route of conversion would require an AND operation to isolate the bottom bits, followed by a longer-latency 32-bit integer to f32 conversion instruction.

The overall strategy broadcasts a 32-bit integer (holding thirty-two 1-bit values) across all 8 lanes of the control register B.

Each lane is then shifted by a different amount so:

Lane 0 has element 0 as its least significant bit (LSB).
Lane 1 has element 1 as its LSB.
Lane 2 has element 2 as its LSB.
etc.

These LSBs drive the shuffle, which converts each bit to an f32 value (either 0.0 or 1.0) that is then multiplied with x and accumulated using FMA.

To process the next group of 8 values, we shift all lanes in B right by 8 bits, so lane 0 has element 8 as its LSB, lane 1 has element 9, and so on.

After the initial per-lane shift, three further 8-bit shifts are needed to extract all 32 1-bit values as f32, in order.
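
The broadcast, shift, and shuffle steps can be modeled in portable scalar Rust, which makes the bit bookkeeping easy to test. This is a sketch, not the crate's code: permutevar_ps only emulates the per-lane semantics of _mm256_permutevar_ps, and the repeating lookup table [0.0, 1.0, 0.0, 1.0] is one assumed way to make the neighbouring element's bit (index bit 1) irrelevant.

```rust
/// Scalar model of `_mm256_permutevar_ps`: each output lane picks one of the four
/// f32 values in its own 128-bit half of `a`, indexed by the low 2 bits of `b`.
fn permutevar_ps(a: [f32; 8], b: [u32; 8]) -> [f32; 8] {
    core::array::from_fn(|j| a[(j / 4) * 4 + (b[j] & 3) as usize])
}

/// Inner product of 32 f32 values with 32 one-bit values packed into `word`
/// (element i at bit i), following the broadcast / shift / shuffle strategy.
fn dot_1bit_block(x: &[f32; 32], word: u32) -> f32 {
    // Index bit 0 selects 0.0 or 1.0. Index bit 1 belongs to the next element,
    // but the table repeats [0.0, 1.0], so it is ignored and no AND is needed.
    let table = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0];
    // Broadcast `word`, then shift lane j right by j so element j is its LSB.
    let mut b: [u32; 8] = core::array::from_fn(|j| word >> j);
    let mut acc = [0.0f32; 8];
    for group in 0..4 {
        let y = permutevar_ps(table, b);
        for j in 0..8 {
            acc[j] = x[group * 8 + j].mul_add(y[j], acc[j]); // per-lane FMA
        }
        if group < 3 {
            // Three uniform 8-bit shifts expose elements 8.., 16.., and 24..
            for lane in b.iter_mut() {
                *lane >>= 8;
            }
        }
    }
    acc.iter().sum()
}

fn main() {
    let x: [f32; 32] = core::array::from_fn(|i| i as f32);
    println!("{}", dot_1bit_block(&x, 0xA5A5_0F0F));
}
```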

fn run( self, arch: V3, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

The strategy used here is almost identical to the one used for 1-bit distances. The main difference is that we now use the full 2-bit shuffle capability of _mm256_permutevar_ps, and the relative sizes of the shifts are slightly different.
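
A scalar model of the 2-bit variant, under the assumption that a 32-bit word holds sixteen 2-bit values with element i in bits 2i and 2i+1. With that packing, lane j is shifted by 2j and a single further 16-bit shift exposes the second group of 8; these shift amounts follow from the assumed layout, not from the crate source. permutevar_ps again only emulates the intrinsic.

```rust
/// Scalar model of `_mm256_permutevar_ps` (see the 1-bit case).
fn permutevar_ps(a: [f32; 8], b: [u32; 8]) -> [f32; 8] {
    core::array::from_fn(|j| a[(j / 4) * 4 + (b[j] & 3) as usize])
}

/// 16 f32 values times 16 two-bit values packed in `word`.
fn dot_2bit_block(x: &[f32; 16], word: u32) -> f32 {
    // Both index bits are now meaningful: the table maps 0..=3 to 0.0..=3.0.
    let table = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0];
    // Lane j is shifted by 2*j so element j occupies its low 2 bits.
    let mut b: [u32; 8] = core::array::from_fn(|j| word >> (2 * j));
    let mut acc = [0.0f32; 8];
    for group in 0..2 {
        let y = permutevar_ps(table, b);
        for j in 0..8 {
            acc[j] = x[group * 8 + j].mul_add(y[j], acc[j]); // per-lane FMA
        }
        if group == 0 {
            // One uniform 16-bit shift exposes elements 8..16.
            for lane in b.iter_mut() {
                *lane >>= 16;
            }
        }
    }
    acc.iter().sum()
}

fn main() {
    println!("{}", dot_2bit_block(&[1.0; 16], 0x1B1B_E4E4));
}
```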

fn run( self, arch: V3, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

The strategy here is similar to the 1-bit and 2-bit strategies. However, instead of using _mm256_permutevar_ps, we now convert directly from 32-bit integers to 32-bit floating point.

This is because the shuffle intrinsic only supports 2-bit indices (four entries per 128-bit lane), which cannot cover the 16 possible 4-bit values.
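
For the 4-bit case, the shuffle table is replaced by a plain shift, mask, and convert. This scalar sketch assumes eight 4-bit values per u32 word, with element j in bits 4j through 4j+3. In AVX2 terms the per-lane steps would plausibly correspond to _mm256_srlv_epi32, an AND, and _mm256_cvtepi32_ps, but which intrinsics the crate actually uses is an assumption.

```rust
/// Model of the 4-bit path: per-lane variable shift, 4-bit mask, then a direct
/// i32 -> f32 conversion (no shuffle table).
fn unpack_nibbles_f32(word: u32) -> [f32; 8] {
    // Lane j holds element j, stored in bits 4j..4j+3 of `word`.
    core::array::from_fn(|j| ((word >> (4 * j)) & 0xF) as i32 as f32)
}

/// f32 inner product against 4-bit values, 8 elements per packed u32 word.
fn dot_4bit(x: &[f32], words: &[u32]) -> f32 {
    assert_eq!(x.len(), 8 * words.len(), "length mismatch");
    let mut acc = [0.0f32; 8];
    for (chunk, &word) in x.chunks_exact(8).zip(words) {
        let y = unpack_nibbles_f32(word);
        for j in 0..8 {
            acc[j] = chunk[j].mul_add(y[j], acc[j]); // per-lane FMA
        }
    }
    acc.iter().sum()
}

fn main() {
    println!("{}", dot_4bit(&[1.0; 8], &[0x8765_4321]));
}
```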

fn run( self, arch: V3, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Implementation Notes

This implementation is optimized for x86 with the AVX2 vector extension. Specifically, we route the computation through Wide::<i32, 8> as SIMDDotProduct<Wide<i16, 8>> so that it can use the _mm256_madd_epi16 intrinsic.

Also note that the bit shifts are performed with 32-bit integer shift instructions, and the result is then bit-cast to 16-bit lanes. This works because the same shift is applied to all lanes.
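
The shift-then-bit-cast idea can be checked in scalar code by treating a u32 as two packed 16-bit lanes. The sketch assumes that each shift is followed by a mask that keeps only the low bits of every 16-bit lane (0x0003 here, as for 2-bit values); with such a mask, any bits that a 32-bit shift carries across the 16-bit boundary are discarded. The function names are hypothetical.

```rust
/// Shift as one 32-bit lane, then reinterpret as two 16-bit lanes and mask.
fn shift32_then_mask16(word: u32, shift: u32, mask: u16) -> [u16; 2] {
    let shifted = word >> shift;
    // Little-endian "bit-cast": the low half is lane 0, the high half is lane 1.
    [(shifted as u16) & mask, ((shifted >> 16) as u16) & mask]
}

/// Reference: a true per-lane 16-bit shift followed by the same mask.
fn shift16_then_mask16(word: u32, shift: u32, mask: u16) -> [u16; 2] {
    let lanes = [word as u16, (word >> 16) as u16];
    lanes.map(|lane| (lane >> shift) & mask)
}

fn main() {
    // Pseudo-random words from a simple linear congruential generator.
    let mut w: u32 = 0x1234_5678;
    for _ in 0..1000 {
        w = w.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        // With a 2-bit mask, any shift up to 14 gives identical lanes.
        for shift in 0..=14 {
            assert_eq!(shift32_then_mask16(w, shift, 0x3), shift16_then_mask16(w, shift, 0x3));
        }
    }
    // Without a narrow mask, bits leak from lane 1 into the top of lane 0.
    assert_ne!(
        shift32_then_mask16(0x0001_0000, 1, 0xFFFF),
        shift16_then_mask16(0x0001_0000, 1, 0xFFFF)
    );
    println!("equivalence holds for masked shifts");
}
```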

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Implementation Notes

This implementation is optimized for x86 with the AVX2 vector extension. Specifically, we route the computation through Wide::<i32, 8> as SIMDDotProduct<Wide<i16, 8>> so that it can use the _mm256_madd_epi16 intrinsic.

Also note that the bit shifts are performed with 32-bit integer shift instructions, and the result is then bit-cast to 16-bit lanes. This works because the same shift is applied to all lanes.
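
To make the 16-bit multiply-add step concrete, here is a scalar model of what _mm256_madd_epi16 computes, applied to a small 4-bit × 4-bit example. The low-nibble-first element order and the helper names are assumptions for illustration; the crate reaches the instruction through its Wide and SIMDDotProduct abstractions instead.

```rust
/// Scalar model of `_mm256_madd_epi16`: multiply 16 pairs of i16 lanes and add
/// adjacent products into 8 i32 lanes (wrapping, like the hardware).
fn madd_epi16(a: [i16; 16], b: [i16; 16]) -> [i32; 8] {
    core::array::from_fn(|k| {
        let p0 = i32::from(a[2 * k]) * i32::from(b[2 * k]);
        let p1 = i32::from(a[2 * k + 1]) * i32::from(b[2 * k + 1]);
        p0.wrapping_add(p1)
    })
}

/// Split 8 bytes into 16 nibbles, low nibble first (assumed element order).
fn unpack_nibbles(bytes: [u8; 8]) -> [i16; 16] {
    core::array::from_fn(|i| i16::from((bytes[i / 2] >> (4 * (i % 2))) & 0xF))
}

/// 4-bit x 4-bit inner product of 16 elements through one madd step.
fn dot_4bit_16(x: [u8; 8], y: [u8; 8]) -> u32 {
    let lanes = madd_epi16(unpack_nibbles(x), unpack_nibbles(y));
    // All products are non-negative here, so the i32 lanes convert losslessly.
    lanes.iter().map(|&v| v as u32).sum()
}

fn main() {
    println!("{}", dot_4bit_16([0xFF; 8], [0xFF; 8]));
}
```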

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Computes the inner product of 8-bit unsigned × 1-bit unsigned vectors using V3 intrinsics.

For each 32-element block, we load 32 bytes from x and 4 bytes (32 bits) from y. The 32 bits of y are turned into a byte mask, and ANDing x with this mask zeroes the bytes whose y bit is 0. Finally, _mm256_sad_epu8 horizontally sums the masked bytes in groups of 8.

The main loop is 4× unrolled, processing 128 elements per iteration.

§Overflow

Each sad output lane holds at most 8 × 255 = 2_040. Accumulated across d/32 blocks, the per-lane maximum is (d/32) × 2_040. At d = 3072, this is 96 × 2_040 = 195_840, well within the i32 range.
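
A scalar model of the mask-and-sum step, useful for checking the bit ordering and the overflow bound. It assumes that element i of a block maps to bit i of the y word and byte i of the x chunk; expand_mask and sad_against_zero are hypothetical helpers that only emulate what the AVX2 instructions compute.

```rust
/// Expand 32 one-bit values into a 32-byte mask: 0xFF where the bit is set.
fn expand_mask(bits: u32) -> [u8; 32] {
    core::array::from_fn(|i| if (bits >> i) & 1 == 1 { 0xFF } else { 0x00 })
}

/// Scalar model of `_mm256_sad_epu8(v, zero)`: each of the four 64-bit output
/// lanes holds the sum of one group of 8 bytes.
fn sad_against_zero(v: [u8; 32]) -> [u64; 4] {
    core::array::from_fn(|g| v[g * 8..g * 8 + 8].iter().map(|&b| u64::from(b)).sum())
}

/// 8-bit x 1-bit inner product: one u32 of y bits per 32 bytes of x.
fn dot_8x1(x: &[u8], y_bits: &[u32]) -> u64 {
    assert_eq!(x.len(), 32 * y_bits.len(), "length mismatch");
    let mut acc = [0u64; 4];
    for (chunk, &bits) in x.chunks_exact(32).zip(y_bits) {
        let mask = expand_mask(bits);
        // AND zeroes every byte of x whose y bit is 0.
        let masked: [u8; 32] = core::array::from_fn(|i| chunk[i] & mask[i]);
        for (lane, s) in acc.iter_mut().zip(sad_against_zero(masked)) {
            *lane += s;
        }
    }
    acc.iter().sum()
}

fn main() {
    let x: Vec<u8> = (0..64).collect();
    println!("{}", dot_8x1(&x, &[0x0000_00FF, 0x8000_0000]));
}
```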

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Computes the inner product of 8-bit unsigned × 2-bit unsigned vectors using AVX2.

§Strategy

Unpack each 16-byte chunk of y into 64 crumb values via a two-level cascade: first, unpack_half_bytes splits bytes into nibbles; then a second pass splits the nibbles into crumbs (masked with 0x03). Each unpacked half is paired with 32 bytes of x and multiplied via _mm256_maddubs_epi16.

The main loop is 4× unrolled: eight i16 products (4 blocks × 2 halves) are summed in i16 before a single _mm256_madd_epi16(…, 1) widens to i32. This is safe because 8 × (255 × 3 × 2) = 12_240 < i16::MAX.
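
The two-level unpacking cascade can be modeled in scalar code. This sketch keeps elements in their natural order and assumes LSB-first packing (byte b holds crumbs 4b through 4b+3); the real AVX2 cascade may arrange lanes differently, and bytes_to_nibbles only loosely mirrors unpack_half_bytes.

```rust
/// First level: split each byte into its low and high nibble, keeping element order.
fn bytes_to_nibbles(bytes: &[u8; 16]) -> [u8; 32] {
    core::array::from_fn(|i| (bytes[i / 2] >> (4 * (i % 2))) & 0x0F)
}

/// Second level: split each nibble into two crumbs, masked with 0x03.
fn nibbles_to_crumbs(nibbles: &[u8; 32]) -> [u8; 64] {
    core::array::from_fn(|i| (nibbles[i / 2] >> (2 * (i % 2))) & 0x03)
}

/// Inner product of 64 bytes of x with one 16-byte chunk of 2-bit y.
fn dot_8x2_chunk(x: &[u8; 64], y: &[u8; 16]) -> u32 {
    let crumbs = nibbles_to_crumbs(&bytes_to_nibbles(y));
    x.iter().zip(crumbs.iter()).map(|(&a, &c)| u32::from(a) * u32::from(c)).sum()
}

fn main() {
    // 0xE4 = 0b11_10_01_00 packs the crumbs 0, 1, 2, 3 (lowest bits first).
    let crumbs = nibbles_to_crumbs(&bytes_to_nibbles(&[0xE4; 16]));
    assert_eq!(crumbs[..4], [0, 1, 2, 3]);
    // Bound used to justify summing eight maddubs results in i16.
    assert!(8 * (255 * 3 * 2) < i32::from(i16::MAX));
    println!("{}", dot_8x2_chunk(&[255; 64], &[0xFF; 16]));
}
```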

impl Target2<V3, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

fn run( self, arch: V3, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Computes the inner product of 8-bit unsigned × 4-bit unsigned vectors using V3 intrinsics.

§Strategy

Unpack each 16-byte chunk of y into 32 nibble values via unpack_half_bytes, then multiply them with the corresponding 32 bytes of x using _mm256_maddubs_epi16 (u8 × i8 products, with adjacent pairs added and saturated to i16).

The main loop is 4× unrolled: four i16 products are summed in i16 before a single _mm256_madd_epi16(…, 1) widens to i32. This is safe because 4 × (255 × 15 × 2) = 30_600 < i16::MAX.
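
The i16 accumulation argument can be checked with a scalar model. maddubs_epi16 emulates the documented behaviour of _mm256_maddubs_epi16 (unsigned × signed bytes, adjacent pairs added with signed saturation); the four-block structure and helper names are assumptions for this sketch. Since 4-bit values of y are at most 15, reinterpreting them as i8 is lossless.

```rust
/// Scalar model of `_mm256_maddubs_epi16`: multiply unsigned bytes of `a` by
/// signed bytes of `b`, add adjacent products, and saturate the sum to i16.
fn maddubs_epi16(a: [u8; 32], b: [i8; 32]) -> [i16; 16] {
    core::array::from_fn(|k| {
        let p0 = i32::from(a[2 * k]) * i32::from(b[2 * k]);
        let p1 = i32::from(a[2 * k + 1]) * i32::from(b[2 * k + 1]);
        (p0 + p1).clamp(i32::from(i16::MIN), i32::from(i16::MAX)) as i16
    })
}

/// Sum four maddubs results in i16 lanes (as the 4x-unrolled loop does), then
/// widen to i32 only once at the end.
fn sum_four_blocks(x: &[[u8; 32]; 4], y: &[[i8; 32]; 4]) -> i32 {
    let mut acc = [0i16; 16];
    for (xb, yb) in x.iter().zip(y) {
        for (lane, p) in acc.iter_mut().zip(maddubs_epi16(*xb, *yb)) {
            // Checked add: fails loudly if the i16 bound were ever exceeded.
            *lane = lane.checked_add(p).expect("i16 accumulator overflow");
        }
    }
    acc.iter().map(|&v| i32::from(v)).sum()
}

fn main() {
    // Worst case: every x byte is 255 and every 4-bit y value is 15.
    println!("{}", sum_four_blocks(&[[255; 32]; 4], &[[15; 32]; 4]));
}
```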

impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

Available on x86-64 only.

Compute the inner product between x and y.

Returns an error if the arguments have different lengths.

§Implementation Notes

This is optimized around the _mm512_dpbusd_epi32 VNNI instruction, which multiplies 8-bit unsigned integers by 8-bit signed integers, sums each group of 4 adjacent products, and adds the result to an i32 accumulation vector.

One quirk of this instruction is that one argument must be unsigned and the other must be signed. Since this kernel works on 2-bit unsigned integers, which fit in either representation, this is not a limitation, just something to be aware of.

Since AVX512 does not have an 8-bit shift instruction, we generally load data as u32x16 (which has a native shift) and bit-cast it to u8x64 as needed.
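
Both points can be modeled in scalar code: dpbusd_lane mirrors one 32-bit lane of _mm512_dpbusd_epi32, and crumb_plane shows how a 32-bit shift plus a 0x03 byte mask stands in for the missing 8-bit shift. The crumb-plane grouping and helper names are assumptions made for this sketch and need not match how the kernel lays out its data.

```rust
/// Scalar model of one 32-bit lane of `_mm512_dpbusd_epi32`: four u8 x i8
/// products are summed and added to the i32 accumulator (wrapping, no saturation).
fn dpbusd_lane(acc: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let dot: i32 = (0..4).map(|i| i32::from(a[i]) * i32::from(b[i])).sum();
    acc.wrapping_add(dot)
}

/// Extract crumb plane k (k in 0..4) from four packed bytes using one 32-bit
/// shift plus a per-byte mask, in place of an 8-bit shift.
fn crumb_plane(word: u32, k: u32) -> [u8; 4] {
    ((word >> (2 * k)) & 0x0303_0303).to_le_bytes()
}

/// 2-bit x 2-bit inner product of the 16 values packed in one u32 per operand.
fn dot_2bit_word(x: u32, y: u32) -> i32 {
    let mut acc = 0;
    for k in 0..4 {
        // 2-bit values are in 0..=3, so reinterpreting y's bytes as i8 is lossless.
        acc = dpbusd_lane(acc, crumb_plane(x, k), crumb_plane(y, k).map(|v| v as i8));
    }
    acc
}

fn main() {
    println!("{}", dot_2bit_word(0x1B1B_E4E4, 0x9C9C_3636));
}
```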

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Target2<V4, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

fn run( self, arch: V4, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

Run the operation with the provided Architecture.

impl Copy for InnerProduct

Auto Trait Implementations§

impl UnwindSafe for InnerProduct

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Generator<T> for T
where T: Clone,

fn generate(&mut self) -> T

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

Trait Implementations§

impl Clone for InnerProduct

fn clone(&self) -> InnerProduct

fn clone_from(&mut self, source: &Self)

impl Debug for InnerProduct

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl<A, B, To> DistanceFunction<A, B, To> for InnerProduct
where InnerProduct: PureDistanceFunction<A, B, To>,

fn evaluate_similarity(&self, a: A, b: B) -> To

impl PureDistanceFunction<&[f32], BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct

fn evaluate( x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

impl PureDistanceFunction<&[f32], BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct

fn evaluate( x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

impl PureDistanceFunction<&[f32], BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct

fn evaluate( x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

impl PureDistanceFunction<&[f32], BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct

fn evaluate( x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

impl PureDistanceFunction<&[f32], BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct

fn evaluate( x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

impl PureDistanceFunction<&[f32], BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct

fn evaluate( x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

impl PureDistanceFunction<&[f32], BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct

fn evaluate( x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

impl PureDistanceFunction<&[f32], BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<f32>, UnequalLengths>> for InnerProduct

fn evaluate( x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>

impl PureDistanceFunction<BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<4, Unsigned, SlicePtr<'_, u8>, BitTranspose>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, BitTranspose>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl PureDistanceFunction<BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, Result<MathematicalValue<u32>, UnequalLengths>> for InnerProduct

fn evaluate( x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where A: Architecture,

fn run( self, _: A, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>
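For two 1-bit vectors, the inner product reduces to counting the positions where both bits are set. A minimal sketch follows, assuming bits are packed LSB-first into bytes. `Bits` is a hypothetical view type, and `UnequalLengths` here is only a stand-in for the crate's error. Full 8-byte chunks are loaded as `u64` words. Trailing bytes are read one at a time so nothing past the slice is ever touched, which mirrors the byte-level validity guarantee of `BitSlice`. Unused bits in a final partial byte are masked off.

```rust
/// Stand-in for the crate's `UnequalLengths` error (hypothetical).
#[derive(Debug, PartialEq)]
struct UnequalLengths;

/// Hypothetical view of a packed 1-bit vector: `len` bits stored LSB-first in `bytes`.
#[derive(Clone, Copy)]
struct Bits<'a> {
    bytes: &'a [u8],
    len: usize,
}

/// Load exactly eight bytes as a little-endian `u64`.
fn load_u64(chunk: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(chunk);
    u64::from_le_bytes(buf)
}

/// Number of positions where both `x` and `y` have a 1 bit.
fn inner_product_1bit(x: Bits<'_>, y: Bits<'_>) -> Result<u32, UnequalLengths> {
    if x.len != y.len {
        return Err(UnequalLengths);
    }
    let full = x.len / 8; // bytes that lie entirely within the vector
    let tail_bits = x.len % 8; // bits in a trailing partial byte, if any
    let mut xc = x.bytes[..full].chunks_exact(8);
    let mut yc = y.bytes[..full].chunks_exact(8);
    let mut total = 0u32;
    // Whole 64-bit words: `chunks_exact` never yields bytes past the slice.
    for (a, b) in (&mut xc).zip(&mut yc) {
        total += (load_u64(a) & load_u64(b)).count_ones();
    }
    // Leftover whole bytes are read individually.
    for (&a, &b) in xc.remainder().iter().zip(yc.remainder()) {
        total += (a & b).count_ones();
    }
    // A final partial byte is masked so padding bits cannot contribute.
    if tail_bits != 0 {
        let mask = (1u8 << tail_bits) - 1;
        total += (x.bytes[full] & y.bytes[full] & mask).count_ones();
    }
    Ok(total)
}
```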

impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>, BitTranspose>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where A: Architecture,

fn run( self, _: A, x: BitSlice<'_, N, Unsigned, BitTranspose>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl<A> Target2<A, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where A: Architecture, InnerProduct: for<'a> Target2<A, MathematicalValue<f32>, &'a [u8], &'a [u8]>,

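§Implementation Notes

This can directly invoke the methods implemented in vector because BitSlice<'_, 8, Unsigned> is isomorphic to &[u8].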

fn run( self, arch: A, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>
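Because `BitSlice<'_, 8, Unsigned>` is isomorphic to `&[u8]`, the 8-bit by 8-bit case is a plain byte dot product. The sketch below computes it directly in `u32`. `inner_product_u8` and the `UnequalLengths` stand-in are hypothetical. According to its notes and `where` bound, the crate instead delegates to an existing `&[u8]` kernel.

```rust
/// Stand-in for the crate's `UnequalLengths` error (hypothetical).
#[derive(Debug, PartialEq)]
struct UnequalLengths;

/// Dot product of two byte vectors, accumulated in `u32`.
fn inner_product_u8(x: &[u8], y: &[u8]) -> Result<u32, UnequalLengths> {
    if x.len() != y.len() {
        return Err(UnequalLengths);
    }
    // Widen each byte before multiplying so 255 * 255 does not overflow `u8`.
    Ok(x.iter().zip(y).map(|(&a, &b)| u32::from(a) * u32::from(b)).sum())
}
```

Accumulating in `u32` is exact for up to 66,051 elements even at the maximum value 255. Longer inputs could overflow, which panics in debug builds and wraps in release builds.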

impl<const N: usize> Target2<Scalar, Result<MathematicalValue<f32>, UnequalLengths>, &[f32], BitSliceBase<N, Unsigned, SlicePtr<'_, u8>>> for InnerProduct
where Unsigned: Representation<N>,

fn run( self, _: Scalar, x: &[f32], y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<f32>
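The scalar-indexing fallback can be sketched as follows. The sketch assumes values are packed densely and LSB-first, with `N` between 1 and 8. `get_dense`, `inner_product_f32`, and the `UnequalLengths` stand-in are hypothetical names. The real `BitSlice` indexing API is not used here.

```rust
/// Stand-in for the crate's `UnequalLengths` error (hypothetical).
#[derive(Debug, PartialEq)]
struct UnequalLengths;

/// Read the `i`-th `n_bits`-wide unsigned value from a densely packed,
/// LSB-first byte buffer (an assumed layout). Supports 1 to 8 bits.
fn get_dense(bytes: &[u8], n_bits: usize, i: usize) -> u8 {
    assert!((1..=8).contains(&n_bits));
    let bit = i * n_bits;
    let (byte, shift) = (bit / 8, bit % 8);
    // A value may straddle two bytes; the second byte is read only if present.
    let lo = u16::from(bytes[byte]);
    let hi = u16::from(bytes.get(byte + 1).copied().unwrap_or(0));
    let word = lo | (hi << 8);
    ((word >> shift) & ((1u16 << n_bits) - 1)) as u8
}

/// Scalar fallback: sum of `x[i] * y[i]`, fetching each `y[i]` by index.
fn inner_product_f32<const N: usize>(
    x: &[f32],
    y_bytes: &[u8],
    y_len: usize,
) -> Result<f32, UnequalLengths> {
    if x.len() != y_len {
        return Err(UnequalLengths);
    }
    Ok(x
        .iter()
        .enumerate()
        .map(|(i, &xi)| xi * f32::from(get_dense(y_bytes, N, i)))
        .sum())
}
```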

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<2, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

§Performance

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>
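A scalar 2-bit kernel does not have to unpack values one at a time. The sketch below shows one popcount-based approach that splits each 2-bit lane into its low and high bits. It is an illustration only: it assumes LSB-first dense packing and zeroed unused lanes, and the names are hypothetical rather than taken from the crate.

```rust
/// Mask selecting bit 0 of every 2-bit lane.
const LO: u64 = 0x5555_5555_5555_5555;

/// Load exactly eight bytes as a little-endian `u64`.
fn load_u64(chunk: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(chunk);
    u64::from_le_bytes(buf)
}

/// Sum of lane-wise products of the 2-bit lanes in `x` and `y`.
/// With a = 2*a1 + a0 and b = 2*b1 + b0:
/// a*b = 4*a1*b1 + 2*(a1*b0 + a0*b1) + a0*b0.
fn lane_dot(x: u64, y: u64) -> u32 {
    let (x0, x1) = (x & LO, (x >> 1) & LO);
    let (y0, y1) = (y & LO, (y >> 1) & LO);
    4 * (x1 & y1).count_ones()
        + 2 * ((x1 & y0).count_ones() + (x0 & y1).count_ones())
        + (x0 & y0).count_ones()
}

/// Inner product of two densely packed 2-bit vectors (unused lanes assumed zero).
fn inner_product_2bit(x: &[u8], y: &[u8]) -> u32 {
    assert_eq!(x.len(), y.len(), "inputs must have the same byte length");
    let (mut xc, mut yc) = (x.chunks_exact(8), y.chunks_exact(8));
    let mut total = 0u32;
    for (a, b) in (&mut xc).zip(&mut yc) {
        total += lane_dot(load_u64(a), load_u64(b));
    }
    // Each remaining byte holds four complete lanes, so it can be widened as-is.
    for (&a, &b) in xc.remainder().iter().zip(yc.remainder()) {
        total += lane_dot(u64::from(a), u64::from(b));
    }
    total
}
```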

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<3, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

§Performance

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<4, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

§Performance

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<5, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

§Performance

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<6, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

§Performance

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<7, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

§Performance

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>

impl Target2<Scalar, Result<MathematicalValue<u32>, UnequalLengths>, BitSliceBase<8, Unsigned, SlicePtr<'_, u8>>, BitSliceBase<1, Unsigned, SlicePtr<'_, u8>>> for InnerProduct

§Performance

fn run( self, _: Scalar, x: BitSlice<'_, N, Unsigned, Dense>, y: BitSlice<'_, N, Unsigned, Dense>, ) -> MathematicalResult<u32>
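In the mixed 8-bit by 1-bit case, each product `x[i] * y[i]` is either `0` or `x[i]`. The inner product is therefore the sum of `x[i]` over the set bits of `y`. Below is a minimal scalar sketch, assuming `y` is packed LSB-first. The function name and the `UnequalLengths` stand-in are hypothetical.

```rust
/// Stand-in for the crate's `UnequalLengths` error (hypothetical).
#[derive(Debug, PartialEq)]
struct UnequalLengths;

/// Inner product of 8-bit values `x` with a 1-bit vector of `y_len` bits packed
/// LSB-first into `y_bits` (which must hold at least ceil(y_len / 8) bytes).
fn inner_product_u8_1bit(x: &[u8], y_bits: &[u8], y_len: usize) -> Result<u32, UnequalLengths> {
    if x.len() != y_len {
        return Err(UnequalLengths);
    }
    let mut total = 0u32;
    // Each byte of `y` selects among the next (up to) eight elements of `x`,
    // so padding bits in a final partial byte are never consulted.
    for (chunk, &mask) in x.chunks(8).zip(y_bits) {
        for (k, &v) in chunk.iter().enumerate() {
            if (mask >> k) & 1 == 1 {
                total += u32::from(v);
            }
        }
    }
    Ok(total)
}
```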

impl<A, B, To> DistanceFunction<A, B, To> for InnerProduct
where InnerProduct: PureDistanceFunction<A, B, To>,
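The generic `DistanceFunction` impl forwards to `PureDistanceFunction` whenever `InnerProduct` implements the latter for the same argument and output types. The sketch reproduces that forwarding pattern with simplified stand-in traits. The method name `evaluate_similarity` and the `&[u8]` impl are hypothetical. Only the shape of `evaluate` (an associated function without `self`) is taken from the signatures listed here.

```rust
/// Simplified stand-in: a stateless distance computed without `self`.
trait PureDistanceFunction<A, B, To> {
    fn evaluate(x: A, y: B) -> To;
}

/// Simplified stand-in: a distance invoked through an instance.
/// The method name is hypothetical.
trait DistanceFunction<A, B, To> {
    fn evaluate_similarity(&self, x: A, y: B) -> To;
}

struct InnerProduct;

/// Forwarding impl: any `PureDistanceFunction` on `InnerProduct` is also
/// usable as a `DistanceFunction` with the same types.
impl<A, B, To> DistanceFunction<A, B, To> for InnerProduct
where
    InnerProduct: PureDistanceFunction<A, B, To>,
{
    fn evaluate_similarity(&self, x: A, y: B) -> To {
        <InnerProduct as PureDistanceFunction<A, B, To>>::evaluate(x, y)
    }
}

/// Hypothetical concrete impl used to exercise the forwarding.
impl<'a> PureDistanceFunction<&'a [u8], &'a [u8], u32> for InnerProduct {
    fn evaluate(x: &'a [u8], y: &'a [u8]) -> u32 {
        x.iter().zip(y).map(|(&a, &b)| u32::from(a) * u32::from(b)).sum()
    }
}
```

Arguments must already be slices (for example `let x: &[u8] = &[1, 2, 3];`) because generic parameters do not trigger array-to-slice coercion.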