Crate numkong

§NumKong - Hardware-Accelerated Numerics

Provides SIMD-accelerated distance metrics, elementwise operations, and tensor algebra targeting ARM NEON/SVE/SME and x86 AVX2/AVX-512 backends.

§Modules

  • types: Mixed-precision scalar types (f16, bf16, FP8, packed integers) and FloatLike trait
  • spatial: Dot products, angular (cosine), and Euclidean distances
  • each: Elementwise operations and trigonometry
  • reduce: Statistical reductions (moments, min/max)
  • set: Binary set similarity (Hamming, Jaccard)
  • probability: Probability divergences (KL, JS)
  • curved: Curved metric spaces (Bilinear, Mahalanobis)
  • mesh: Mesh alignment (Kabsch, Umeyama, RMSD)
  • geospatial: Geospatial distances (Haversine, Vincenty)
  • sparse: Sparse set operations
  • cast: Type casting between scalar formats
  • capabilities: Runtime SIMD feature detection
  • matrix: Batch matrix operations (GEMM, packed spatial distances)
  • tensor: N-dimensional tensors with elementwise/reduction operations

§Implemented operations

  • Euclidean (L2), inner product, and angular (cosine) spatial distances.
  • Hamming and Jaccard binary distances.
  • Kullback-Leibler divergence and Jensen-Shannon distance.
  • Elementwise scale, sum, blend, and FMA operations.
  • Trigonometric functions (sin, cos, atan).
  • Type casting between all scalar formats.
  • Matrix multiplication with pre-packing (GEMM).

§Example

use numkong::{Dot, Angular, Euclidean};

let a = &[1.0_f32, 2.0, 3.0];
let b = &[4.0_f32, 5.0, 6.0];

let dot_product = f32::dot(a, b);
let angular_dist = f32::angular(a, b);
let l2sq_dist = f32::sqeuclidean(a, b);

// Enable AMX and other platform-specific SIMD features
numkong::capabilities::configure_thread();

§Mixed Precision Support

use numkong::{Angular, f16, bf16};

// Work with half-precision floats
let half_a: Vec<f16> = vec![1.0, 2.0, 3.0].iter().map(|&x| f16::from_f32(x)).collect();
let half_b: Vec<f16> = vec![4.0, 5.0, 6.0].iter().map(|&x| f16::from_f32(x)).collect();
let half_angular_dist = f16::angular(&half_a, &half_b);

// Work with brain floats
let brain_a: Vec<bf16> = vec![1.0, 2.0, 3.0].iter().map(|&x| bf16::from_f32(x)).collect();
let brain_b: Vec<bf16> = vec![4.0, 5.0, 6.0].iter().map(|&x| bf16::from_f32(x)).collect();
let brain_angular_dist = bf16::angular(&brain_a, &brain_b);

// Direct bit manipulation
let half = f16::from_f32(3.14);
let bits = half.0; // Access raw u16 representation
let reconstructed = f16(bits);

§Traits

The SpatialSimilarity trait (combining Dot, Angular, Euclidean) covers:

  • dot(a, b): Computes dot product between two slices.
  • angular(a, b) / cosine(a, b): Computes angular distance (1 − cosine similarity).
  • sqeuclidean(a, b): Computes squared Euclidean distance.
  • euclidean(a, b): Computes Euclidean distance.
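As a point of reference, the quantities these methods compute can be sketched in scalar form. This is an illustrative standalone sketch, not the crate's SIMD implementation; the free functions below merely mirror the trait method names:

```rust
// Scalar reference for the spatial distance semantics (illustrative; the
// trait methods dispatch to SIMD kernels but compute the same quantities).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn sqeuclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn angular(a: &[f32], b: &[f32]) -> f32 {
    // Angular distance = 1 − cosine similarity.
    1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

fn main() {
    let a = [1.0_f32, 2.0, 3.0];
    let b = [4.0_f32, 5.0, 6.0];
    println!("dot = {}", dot(&a, &b)); // 32
    println!("l2sq = {}", sqeuclidean(&a, &b)); // 27
    println!("angular = {}", angular(&a, &b));
}
```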

The BinarySimilarity trait (combining Hamming, Jaccard) covers:

  • hamming(a, b): Computes Hamming distance between two slices.
  • jaccard(a, b): Computes Jaccard distance between two slices.
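The binary distances operate on bit-packed words (see the u1x8 type). A scalar sketch of the semantics, assuming plain u8 words of packed bits rather than the crate's own types:

```rust
// Scalar reference for binary set distances over bit-packed words
// (illustrative only; the trait methods use SIMD population counts).
fn hamming(a: &[u8], b: &[u8]) -> u32 {
    // Number of differing bits.
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn jaccard(a: &[u8], b: &[u8]) -> f32 {
    // 1 − |A ∩ B| / |A ∪ B| over set bits.
    let inter: u32 = a.iter().zip(b).map(|(x, y)| (x & y).count_ones()).sum();
    let union: u32 = a.iter().zip(b).map(|(x, y)| (x | y).count_ones()).sum();
    if union == 0 { 0.0 } else { 1.0 - inter as f32 / union as f32 }
}

fn main() {
    let a = [0b1010_1010_u8];
    let b = [0b0110_0110_u8];
    println!("hamming = {}", hamming(&a, &b)); // 4 differing bits
    println!("jaccard = {}", jaccard(&a, &b));
}
```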

The ProbabilitySimilarity trait (combining KullbackLeibler, JensenShannon) covers:

  • jensenshannon(a, b): Computes Jensen-Shannon distance.
  • kullbackleibler(a, b): Computes Kullback-Leibler divergence.
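In scalar terms, these compute the standard information-theoretic quantities. An illustrative sketch, assuming `a` and `b` are valid probability distributions with matching support:

```rust
// Scalar reference for the probability divergence semantics (illustrative).
fn kullbackleibler(a: &[f64], b: &[f64]) -> f64 {
    // KL(a ‖ b) = Σ a[i] · ln(a[i] / b[i]), with 0 · ln(0/q) taken as 0.
    a.iter()
        .zip(b)
        .map(|(&p, &q)| if p > 0.0 { p * (p / q).ln() } else { 0.0 })
        .sum()
}

fn jensenshannon(a: &[f64], b: &[f64]) -> f64 {
    // JS distance = square root of the JS divergence against the midpoint m.
    let m: Vec<f64> = a.iter().zip(b).map(|(&p, &q)| 0.5 * (p + q)).collect();
    (0.5 * kullbackleibler(a, &m) + 0.5 * kullbackleibler(b, &m)).sqrt()
}

fn main() {
    let p = [0.25, 0.25, 0.5];
    let q = [0.5, 0.25, 0.25];
    println!("KL(p ‖ q) = {}", kullbackleibler(&p, &q));
    println!("JS(p, q)  = {}", jensenshannon(&p, &q));
}
```

Note that KL divergence is asymmetric, while the Jensen-Shannon distance is a symmetric metric bounded by √(ln 2).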

The elementwise traits (including EachScale, EachSum, EachBlend, EachFMA) cover:

  • scale(a, alpha, beta, result): Element-wise result[i] = α × a[i] + β.
  • sum(a, b, result): Element-wise result[i] = a[i] + b[i].
  • blend(a, b, alpha, beta, result): Blend result[i] = α × a[i] + β × b[i].
  • fma(a, b, c, alpha, beta, result): Fused multiply-add result[i] = α × a[i] × b[i] + β × c[i].
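The formulas above can be sketched as plain scalar loops; this is an illustrative reference for the semantics, assuming the trait methods write into a caller-provided `result` slice as shown:

```rust
// Scalar reference for the elementwise semantics (illustrative only).
fn scale(a: &[f32], alpha: f32, beta: f32, result: &mut [f32]) {
    // result[i] = α · a[i] + β
    for (r, &x) in result.iter_mut().zip(a) {
        *r = alpha * x + beta;
    }
}

fn fma(a: &[f32], b: &[f32], c: &[f32], alpha: f32, beta: f32, result: &mut [f32]) {
    // result[i] = α · a[i] · b[i] + β · c[i]
    for i in 0..result.len() {
        result[i] = alpha * a[i] * b[i] + beta * c[i];
    }
}

fn main() {
    let a = [1.0_f32, 2.0, 3.0];
    let b = [4.0_f32, 5.0, 6.0];
    let mut out = [0.0_f32; 3];
    scale(&a, 2.0, 1.0, &mut out); // out = [3, 5, 7]
    println!("scale: {out:?}");
    fma(&a, &b, &a, 1.0, -1.0, &mut out); // out = [3, 8, 15]
    println!("fma:   {out:?}");
}
```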

The Trigonometry trait (combining EachSin, EachCos, EachATan) covers:

  • sin(input, result): Element-wise sine.
  • cos(input, result): Element-wise cosine.
  • atan(input, result): Element-wise arctangent.

Additional traits: VDot, Roots, SparseIntersect, SparseDot.
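For the sparse traits, the underlying idea is a merge over sorted index arrays. The following is a hypothetical scalar sketch, assuming an index-array-plus-value-array layout (the crate's actual SparseDot signature may differ):

```rust
use std::cmp::Ordering;

// Illustrative sketch of a sparse weighted dot product: indices are sorted
// ascending, and values are multiplied wherever the indices coincide.
fn sparse_dot(ai: &[u32], av: &[f32], bi: &[u32], bv: &[f32]) -> f32 {
    let (mut i, mut j, mut acc) = (0, 0, 0.0);
    while i < ai.len() && j < bi.len() {
        match ai[i].cmp(&bi[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                acc += av[i] * bv[j];
                i += 1;
                j += 1;
            }
        }
    }
    acc
}

fn main() {
    // Non-zeros at indices {1, 4, 7} and {4, 7, 9}; overlap at 4 and 7.
    let dot = sparse_dot(&[1, 4, 7], &[1.0, 2.0, 3.0], &[4, 7, 9], &[5.0, 6.0, 7.0]);
    println!("sparse dot = {dot}"); // 2·5 + 3·6 = 28
}
```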

Re-exports§

pub use types::bf16;
pub use types::bf16c;
pub use types::e2m3;
pub use types::e3m2;
pub use types::e4m3;
pub use types::e5m2;
pub use types::f16;
pub use types::f16c;
pub use types::f32c;
pub use types::f64c;
pub use types::i4x2;
pub use types::is_close;
pub use types::u1x8;
pub use types::u4x2;
pub use types::DimMut;
pub use types::DimRef;
pub use types::FloatConvertible;
pub use types::FloatLike;
pub use types::NumberLike;
pub use types::StorageElement;
pub use spatial::Angular;
pub use spatial::Dot;
pub use spatial::Euclidean;
pub use spatial::Roots;
pub use spatial::SpatialSimilarity;
pub use spatial::VDot;
pub use set::BinarySimilarity;
pub use set::Hamming;
pub use set::Jaccard;
pub use probability::JensenShannon;
pub use probability::KullbackLeibler;
pub use probability::ProbabilitySimilarity;
pub use each::EachATan;
pub use each::EachBlend;
pub use each::EachCos;
pub use each::EachFMA;
pub use each::EachScale;
pub use each::EachSin;
pub use each::EachSum;
pub use each::Trigonometry;
pub use reduce::ReduceMinMax;
pub use reduce::ReduceMoments;
pub use reduce::Reductions;
pub use curved::Bilinear;
pub use curved::Mahalanobis;
pub use mesh::MeshAlignment;
pub use mesh::MeshAlignmentResult;
pub use geospatial::Geospatial;
pub use geospatial::Haversine;
pub use geospatial::Vincenty;
pub use sparse::SparseDot;
pub use sparse::SparseIntersect;
pub use cast::cast;
pub use cast::CastDtype;
pub use capabilities::cap;
pub use capabilities::available;
pub use capabilities::configure_thread;
pub use capabilities::uses_dynamic_dispatch;
pub use tensor::AllCloseOps;
pub use tensor::Allocator;
pub use tensor::AxisIterator;
pub use tensor::AxisIteratorMut;
pub use tensor::BlendOps;
pub use tensor::CastOps;
pub use tensor::FmaOps;
pub use tensor::Global;
pub use tensor::Matrix;
pub use tensor::MatrixSpan;
pub use tensor::MatrixView;
pub use tensor::MinMaxOps;
pub use tensor::MinMaxResult;
pub use tensor::MomentsOps;
pub use tensor::RangeStep;
pub use tensor::ScaleOps;
pub use tensor::SliceArg;
pub use tensor::SliceRange;
pub use tensor::SliceSpec;
pub use tensor::SumOps;
pub use tensor::Tensor;
pub use tensor::TensorDims;
pub use tensor::TensorError;
pub use tensor::TensorIterator;
pub use tensor::TensorMut;
pub use tensor::TensorRef;
pub use tensor::TensorSpan;
pub use tensor::TensorSpanDims;
pub use tensor::TensorSpanIterator;
pub use tensor::TensorView;
pub use tensor::TensorViewDims;
pub use tensor::TensorViewIterator;
pub use tensor::TrigAtanOps;
pub use tensor::TrigCosOps;
pub use tensor::TrigSinOps;
pub use tensor::DEFAULT_MAX_RANK;
pub use tensor::SIMD_ALIGNMENT;
pub use matrix::Angulars;
pub use matrix::Dots;
pub use matrix::Euclideans;
pub use matrix::Hammings;
pub use matrix::Jaccards;
pub use matrix::PackedMatrix;
pub use matrix::SymmetricAngulars;
pub use matrix::SymmetricDots;
pub use matrix::SymmetricEuclideans;
pub use matrix::SymmetricHammings;
pub use matrix::SymmetricJaccards;
pub use vector::Vector;
pub use vector::VectorIndex;
pub use vector::VectorIterator;
pub use vector::VectorSpan;
pub use vector::VectorSpanIterator;
pub use vector::VectorView;
pub use vector::VectorViewIterator;
pub use maxsim::MaxSim;
pub use maxsim::MaxSimPackedMatrix;

Modules§

capabilities
Runtime CPU capability detection.
cast
Type casting between scalar formats.
curved
Curved metric spaces: Bilinear forms and Mahalanobis distance.
each
Elementwise operations and trigonometry.
geospatial
Geospatial distance functions: Haversine and Vincenty.
matrix
Batch matrix operations: GEMM, packed spatial distances.
maxsim
MaxSim (ColBERT late-interaction) scoring with pre-packed matrices.
mesh
Mesh superposition and alignment: Kabsch, Umeyama, RMSD.
probability
Probability measures: Kullback-Leibler divergence and Jensen-Shannon distance.
reduce
Statistical reductions: moments (sum/sum-of-squares) and min/max.
set
Binary set similarity: Hamming and Jaccard distances.
sparse
Sparse set intersection and weighted dot products.
spatial
Spatial similarity: dot products, angular (cosine), and Euclidean distances.
tensor
Core N-dimensional tensor types with elementwise, trigonometric, reduction, and cast operations.
types
Scalar types and conversion trait for mixed-precision computing.
vector
Owning and non-owning vector types with signed indexing and sub-byte support.