Module distances


§Low-level functions

The methods here are meant to be primitives used by the distance functions for the various scalar-quantized-like quantizers.

As such, they typically return integer distance results since they largely operate over raw bit-slices.
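To make the "integer result over raw bit-slices" idea concrete, here is a minimal sketch of a 1-bit inner product. The function name `inner_product_1bit` and the plain `&[u8]` packing are assumptions made for illustration; they are not this crate's API, and the crate's kernels may be organized differently.

```rust
/// Inner product of two 1-bit vectors packed eight dimensions per byte.
///
/// Hypothetical helper for illustration only; not part of this crate's API.
/// Any unused padding bits in the final byte are assumed to be zero.
fn inner_product_1bit(lhs: &[u8], rhs: &[u8]) -> u32 {
    assert_eq!(lhs.len(), rhs.len(), "packed slices must have equal length");
    lhs.iter()
        .zip(rhs)
        // A dimension contributes 1 only when both bits are set, so the
        // product over a whole byte is the popcount of the bitwise AND.
        .map(|(a, b)| (a & b).count_ones())
        .sum()
}
```

Because each dimension is 0 or 1, the result is an exact count and fits naturally in a `u32`.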

There are two interfaces for interacting with the distance primitives:

diskann_wide::arch::Target2: A micro-architecture aware interface where the target micro-architecture is provided as an explicit argument.

This can be used in conjunction with diskann_wide::Architecture::run2 to apply the necessary target features and opt into newer architecture code generation when the whole binary is compiled for an older architecture.

This interface also composes with micro-architecture dispatching done higher in the call stack, and so should be preferred when incorporating these primitives into quantizer distance computations.
diskann_vector::PureDistanceFunction: If micro-architecture awareness is not needed, this provides a simple interface targeting diskann_wide::ARCH (the current compilation architecture).

This interface will always yield a binary compatible with the compilation architecture target, but will not enable faster code-paths when compiling for older architectures.
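The difference between the two interfaces can be illustrated with the standard library alone. The sketch below does not use `diskann_wide`; it relies only on `#[target_feature]` and `std::arch::is_x86_feature_detected!`, and every function name in it is hypothetical. It shows the general technique of compiling one kernel body a second time with AVX2 enabled (AVX2 is part of x86-64-v3) and selecting it at runtime, so a binary built for a baseline target can still use the faster code path.

```rust
/// Portable kernel: sum of squared differences between two byte slices.
/// `#[inline(always)]` lets it be inlined into the feature-enabled wrapper.
#[inline(always)]
fn squared_l2_u8_generic(lhs: &[u8], rhs: &[u8]) -> u32 {
    lhs.iter()
        .zip(rhs)
        .map(|(&a, &b)| {
            let d = i32::from(a) - i32::from(b);
            (d * d) as u32
        })
        .sum()
}

/// Same source, compiled with AVX2 enabled so the compiler may generate
/// wider vector code than the baseline target allows.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn squared_l2_u8_avx2(lhs: &[u8], rhs: &[u8]) -> u32 {
    squared_l2_u8_generic(lhs, rhs)
}

/// Runtime dispatch: use the AVX2 build when the CPU supports it,
/// otherwise fall back to the portable build.
pub fn squared_l2_u8(lhs: &[u8], rhs: &[u8]) -> u32 {
    assert_eq!(lhs.len(), rhs.len(), "slices must have equal length");
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { squared_l2_u8_avx2(lhs, rhs) };
        }
    }
    squared_l2_u8_generic(lhs, rhs)
}
```

Calling only `squared_l2_u8_generic` corresponds to the "compilation architecture" style of interface, while the dispatched entry point corresponds to the micro-architecture aware style. Both return the same value for the same inputs.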

The following table summarizes the implementation status of kernels. All kernels have diskann_wide::arch::Scalar implementation fallbacks.

Implementation Kind:

“Fallback”: A fallback implementation using scalar indexing.
“Optimized”: A better implementation than “Fallback” that does not contain target-dependent code, instead relying on compiler optimizations.

Micro-architecture dispatch is still relevant as it allows the compiler to generate better code for newer machines.
“Yes”: An architecture-specific SIMD implementation exists.
“No”: An architecture-specific implementation does not exist, so the next most-specific implementation is used. For example, if an x86-64-v3 implementation does not exist, then the “scalar” implementation will be used instead.
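As a rough picture of what a "Fallback" kernel using scalar indexing might look like, the sketch below reads one packed value at a time and accumulates an integer inner product. The dense, least-significant-bit-first layout and the helper names `pack`, `get_packed`, and `inner_product_fallback` are assumptions for illustration; the crate's actual slice types and encoding may differ.

```rust
/// Pack `NBITS`-wide values densely, least-significant bit first.
/// Hypothetical layout chosen for this sketch.
fn pack<const NBITS: usize>(values: &[u32]) -> Vec<u8> {
    assert!((1..=8).contains(&NBITS));
    let mut out = vec![0u8; (values.len() * NBITS + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        let bit = i * NBITS;
        let word = (v & ((1u32 << NBITS) - 1)) << (bit % 8);
        out[bit / 8] |= word as u8;
        // Spill any high bits into the next byte when the value straddles.
        if let Some(next) = out.get_mut(bit / 8 + 1) {
            *next |= (word >> 8) as u8;
        }
    }
    out
}

/// Read the `i`-th `NBITS`-wide value from a buffer produced by `pack`.
fn get_packed<const NBITS: usize>(data: &[u8], i: usize) -> u32 {
    assert!((1..=8).contains(&NBITS));
    let bit = i * NBITS;
    let lo = u32::from(data[bit / 8]);
    // Load a second byte (if any) so straddling values are handled.
    let hi = data.get(bit / 8 + 1).map_or(0, |&b| u32::from(b));
    ((lo | (hi << 8)) >> (bit % 8)) & ((1u32 << NBITS) - 1)
}

/// Scalar-indexing inner product over `len` packed dimensions.
fn inner_product_fallback<const NBITS: usize>(lhs: &[u8], rhs: &[u8], len: usize) -> u32 {
    (0..len)
        .map(|i| get_packed::<NBITS>(lhs, i) * get_packed::<NBITS>(rhs, i))
        .sum()
}
```

A loop of this shape is easy to get right for every bit width, which is why it makes a reasonable baseline; the per-element shifts and masks are what the SIMD kernels marked “Yes” would avoid.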

Type Aliases

§Inner Product

LHS	RHS	Result	Scalar	x86-64-v3	x86-64-v4	Neon
`USlice<1>`	`USlice<1>`	`MV<u32>`	Optimized	Optimized	Uses V3	Optimized
`USlice<2>`	`USlice<2>`	`MV<u32>`	Fallback	Yes	Yes	Fallback
`USlice<3>`	`USlice<3>`	`MV<u32>`	Fallback	No	Uses V3	Fallback
`USlice<4>`	`USlice<4>`	`MV<u32>`	Fallback	Yes	Uses V3	Fallback
`USlice<5>`	`USlice<5>`	`MV<u32>`	Fallback	No	Uses V3	Fallback
`USlice<6>`	`USlice<6>`	`MV<u32>`	Fallback	No	Uses V3	Fallback
`USlice<7>`	`USlice<7>`	`MV<u32>`	Fallback	No	Uses V3	Fallback
`USlice<8>`	`USlice<8>`	`MV<u32>`	Yes	Yes	Yes	Fallback

`TSlice<4>`	`USlice<1>`	`MV<u32>`	Optimized	Optimized	Optimized	Optimized

`&[f32]`	`USlice<1>`	`MV<f32>`	Fallback	Yes	Uses V3	Fallback
`&[f32]`	`USlice<2>`	`MV<f32>`	Fallback	Yes	Uses V3	Fallback
`&[f32]`	`USlice<3>`	`MV<f32>`	Fallback	No	Uses V3	Fallback
`&[f32]`	`USlice<4>`	`MV<f32>`	Fallback	Yes	Uses V3	Fallback
`&[f32]`	`USlice<5>`	`MV<f32>`	Fallback	No	Uses V3	Fallback
`&[f32]`	`USlice<6>`	`MV<f32>`	Fallback	No	Uses V3	Fallback
`&[f32]`	`USlice<7>`	`MV<f32>`	Fallback	No	Uses V3	Fallback
`&[f32]`	`USlice<8>`	`MV<f32>`	Fallback	No	Uses V3	Fallback
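The mixed rows above pair a full-precision left-hand side with a packed right-hand side and produce a floating-point result. A minimal scalar sketch of the 1-bit case follows; the function name and the least-significant-bit-first packing are assumptions for illustration, not the crate's API.

```rust
/// Inner product between an `f32` vector and a 1-bit vector packed
/// eight dimensions per byte, least-significant bit first.
///
/// Hypothetical helper for illustration only.
fn inner_product_f32_1bit(lhs: &[f32], rhs_bits: &[u8]) -> f32 {
    assert!(rhs_bits.len() * 8 >= lhs.len(), "not enough packed bits");
    lhs.iter()
        .enumerate()
        // Keep only the dimensions whose bit is set on the right-hand side.
        .filter(|&(i, _)| (rhs_bits[i / 8] >> (i % 8)) & 1 == 1)
        .map(|(_, &x)| x)
        .sum()
}
```

With a 1-bit right-hand side the product reduces to summing the selected `f32` entries, so no multiplications are needed.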

§Squared L2

LHS	RHS	Result	Scalar	x86-64-v3	x86-64-v4	Neon
`USlice<1>`	`USlice<1>`	`MV<u32>`	Optimized	Optimized	Uses V3	Optimized
`USlice<2>`	`USlice<2>`	`MV<u32>`	Fallback	Yes	Uses V3	Fallback
`USlice<3>`	`USlice<3>`	`MV<u32>`	Fallback	No	Uses V3	Fallback
`USlice<4>`	`USlice<4>`	`MV<u32>`	Fallback	Yes	Uses V3	Fallback
`USlice<5>`	`USlice<5>`	`MV<u32>`	Fallback	No	Uses V3	Fallback
`USlice<6>`	`USlice<6>`	`MV<u32>`	Fallback	No	Uses V3	Fallback
`USlice<7>`	`USlice<7>`	`MV<u32>`	Fallback	No	Uses V3	Fallback
`USlice<8>`	`USlice<8>`	`MV<u32>`	Yes	Yes	Yes	Fallback

LHS	RHS	Result	Scalar	x86-64-v3	x86-64-v4	Neon
`BSlice`	`BSlice`	`MV<u32>`	Optimized	Optimized	Uses V3	Optimized