innr
SIMD-accelerated vector similarity primitives: dot, cosine, and
Euclidean distance over f32 / i8 / u8, plus binary, ternary,
and scalar quantization. Targets x86 AVX2/AVX-512 and aarch64 NEON,
with scalar fallback.
Quickstart
[]
= "0.2"
use ;
let a = ;
let b = ;
let d = dot; // 0.707
let c = cosine; // 0.707
let n = norm; // 1.0
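For reference, the three quantities above can be written as a plain scalar sketch. This is not innr's implementation: the names `dot_ref`, `norm_ref`, and `cosine_ref` are hypothetical, and returning 0.0 for a zero-length vector is an assumed convention.

```rust
/// Scalar dot product: sum of element-wise products.
/// Panics if the slices differ in length.
fn dot_ref(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) norm: square root of the vector's dot product with itself.
fn norm_ref(a: &[f32]) -> f32 {
    dot_ref(a, a).sqrt()
}

/// Cosine similarity: dot product divided by the product of the norms.
/// Assumption: a zero vector yields 0.0 instead of NaN.
fn cosine_ref(a: &[f32], b: &[f32]) -> f32 {
    let denom = norm_ref(a) * norm_ref(b);
    if denom == 0.0 {
        0.0
    } else {
        dot_ref(a, b) / denom
    }
}
```

When both inputs already have unit norm, `cosine_ref` and `dot_ref` agree, which is why the dot product alone is enough for pre-normalized embeddings.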
Batch search
use ;
// 4 vectors of dimension 3
let corpus = vec!;
let batch = from_rows;
let query = ;
let result = batch_knn_dot;
// result.indices: top-2 nearest by dot product
// result.scores: corresponding similarity scores
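To make the meaning of the indices and scores concrete, here is a brute-force sketch of top-k selection by dot product over row-major data. It is only an illustration: `top_k_dot` is a hypothetical helper that returns a tuple, and it does not use the crate's columnar batch layout or result type.

```rust
/// Returns (indices, scores) of the `k` rows with the largest dot product
/// against `query`, best match first.
fn top_k_dot(rows: &[Vec<f32>], query: &[f32], k: usize) -> (Vec<usize>, Vec<f32>) {
    // Score every row against the query.
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| {
            let score: f32 = row.iter().zip(query).map(|(x, q)| x * q).sum();
            (i, score)
        })
        .collect();

    // Sort by descending score. `total_cmp` is a total order, so NaN cannot
    // panic the sort; the sort is stable, so ties keep their row order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored.into_iter().unzip()
}
```

A full sort costs O(n log n). For large corpora a bounded heap of size k would be the usual replacement, at the price of a little more code.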
Operations
Core: dot, cosine, norm, l2_distance, l2_distance_squared, l1_distance, angular_distance, normalize. Portable fallbacks in innr::dense (e.g. dot_portable).
Matryoshka: matryoshka_dot, matryoshka_cosine -- similarity on a prefix of the embedding.
Binary quantization (1-bit): encode_binary to packed bits, binary_dot, binary_hamming, binary_jaccard. 32x memory reduction over f32.
Ternary quantization (1.58-bit): ternary::encode_ternary to {-1, 0, +1}, ternary_dot, ternary_hamming, asymmetric_dot (float query x ternary doc). 16-20x compression.
Scalar quantization (uint8): scalar::QuantizationParams (from fit(), fit_quantile(), or from_range()), quantize_u8, asymmetric_dot_u8. Precomputed query path via query_context() + asymmetric_dot_u8_precomputed. Batch search via batch_knn_u8. 4x compression.
Fast approximate math: fast_cosine_dispatch (SIMD-dispatched), fast_cosine (portable, based on the Quake III fast inverse square root), fast_rsqrt, fast_rsqrt_precise.
Batch operations: batch::VerticalBatch (PDX-style columnar layout) with batch_dot, batch_l2_squared, batch_l2_squared_pruning, batch_cosine, batch_norms, batch_knn, batch_knn_cosine, batch_knn_dot, batch_knn_filtered (predicate pushdown), batch_knn_reordered (variance-ordered pruning), batch_knn_adaptive (approximate early-exit), batch_dimension_variance.
Sparse vectors: sparse_dot.
Late interaction: maxsim, maxsim_cosine (ColBERT-style), sparse_maxsim (sparse late interaction).
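Of the entries above, the asymmetric scalar-quantized dot product is probably the least obvious from its name. The sketch below shows one common way to do it, assuming a simple affine mapping of a fixed range onto u8 codes. The `Params` struct, its fields, and the function signatures are hypothetical and are not taken from innr's scalar module.

```rust
/// Affine u8 quantization parameters (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy)]
struct Params {
    min: f32,
    /// Width of one quantization step.
    scale: f32,
}

impl Params {
    /// Map the closed range [min, max] onto the 256 available u8 codes.
    fn from_range(min: f32, max: f32) -> Self {
        assert!(max > min, "range must be non-empty");
        Params { min, scale: (max - min) / 255.0 }
    }

    /// Quantize each component to the nearest code, clamping out-of-range values.
    fn quantize(&self, v: &[f32]) -> Vec<u8> {
        v.iter()
            .map(|&x| ((x - self.min) / self.scale).round().clamp(0.0, 255.0) as u8)
            .collect()
    }
}

/// Dot product of a float query with a quantized document, without
/// materializing the dequantized document. With x_i ~ min + scale * c_i:
///   sum_i q_i * x_i = min * sum_i q_i + scale * sum_i q_i * c_i
fn asymmetric_dot(params: &Params, query: &[f32], codes: &[u8]) -> f32 {
    assert_eq!(query.len(), codes.len(), "vectors must have equal length");
    // This term depends only on the query, so it can be computed once per
    // query and reused across all documents.
    let query_sum: f32 = query.iter().sum();
    let mixed: f32 = query
        .iter()
        .zip(codes)
        .map(|(q, &c)| q * f32::from(c))
        .sum();
    params.min * query_sum + params.scale * mixed
}
```

Storing one byte per component in place of a four-byte f32 is where the 4x figure comes from. For in-range values the rounding error per component is at most half a step, so the dot-product error is bounded by `0.5 * scale * sum(|q_i|)`.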
SIMD Dispatch
| Architecture | Instructions | Detection |
|---|---|---|
| x86_64 | AVX-512F | Runtime |
| x86_64 | AVX2 + FMA | Runtime |
| aarch64 | NEON | Always |
| Other | Portable | LLVM auto-vec |
Vectors with fewer than 16 dimensions always use the portable code path. The MSRV of 1.75 applies to aarch64 and portable targets; x86_64 requires Rust 1.89 or later, because that release stabilized the AVX-512 intrinsics.
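The dispatch rules in the table can be sketched with the standard library's runtime feature detection. This is an assumption-laden illustration, not the crate's dispatcher: the `Backend` enum, `select_backend`, and `MIN_SIMD_DIMS` are hypothetical names, and preferring AVX-512F over AVX2 + FMA is assumed from the order of the table rows.

```rust
/// Below this many dimensions the portable path is used (threshold from the text).
const MIN_SIMD_DIMS: usize = 16;

/// Which kernel family a call would be routed to (hypothetical enum).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Avx512,
    Avx2Fma,
    Neon,
    Portable,
}

fn select_backend(dims: usize) -> Backend {
    // Short vectors skip SIMD entirely.
    if dims < MIN_SIMD_DIMS {
        return Backend::Portable;
    }

    // x86_64: CPU features are probed at runtime, widest first (assumed order).
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return Backend::Avx512;
        }
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return Backend::Avx2Fma;
        }
    }

    // aarch64: NEON is part of the baseline, so no runtime probe is needed.
    if cfg!(target_arch = "aarch64") {
        return Backend::Neon;
    }

    // Everything else relies on compiler auto-vectorization.
    Backend::Portable
}
```

A real dispatcher would typically cache the detection result or resolve a function pointer once, so the feature probe is not repeated on every call.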
Performance

Benchmarks were measured on Apple Silicon (NEON). Run cargo bench to reproduce on your hardware.
For maximum performance, build with native CPU features:
RUSTFLAGS="-C target-cpu=native"
Run benchmarks with cargo bench.
Examples
01_basic_ops.rs -- Core similarity metrics and their mathematical relationships. Proves L2^2(a,b) = 2(1 - cosine(a,b)) for normalized vectors.
batch_demo.rs -- PDX-style columnar layout for batch retrieval. Transposes 10K vectors (128d), runs 100 queries, verifies k-NN against brute-force.
binary_demo.rs -- Binary quantization for first-stage retrieval. 32x memory reduction, measures recall@10 against full-precision search.
See examples/ for more: fast_math_demo, matryoshka_search, maxsim_colbert, ternary_demo.
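The identity mentioned for 01_basic_ops.rs follows from expanding the squared distance: ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a . b), which reduces to 2(1 - cosine(a,b)) when both norms are 1. A small numerical check, written independently of the crate with hypothetical helper names, might look like this:

```rust
/// Plain scalar dot product.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Returns a copy of `v` scaled to unit length. Panics on a zero vector.
fn unit(v: &[f32]) -> Vec<f32> {
    let n = dot(v, v).sqrt();
    assert!(n > 0.0, "cannot normalize a zero vector");
    v.iter().map(|x| x / n).collect()
}

/// Squared Euclidean distance.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Absolute difference between the two sides of
/// L2^2(a, b) = 2 * (1 - cosine(a, b)) after normalizing both inputs.
fn identity_gap(a: &[f32], b: &[f32]) -> f32 {
    let (ua, ub) = (unit(a), unit(b));
    // For unit vectors the cosine is just the dot product.
    let cosine = dot(&ua, &ub);
    (l2_squared(&ua, &ub) - 2.0 * (1.0 - cosine)).abs()
}
```

In f32 the gap is not exactly zero but should sit at rounding-error level. This is also why ranking by L2 distance and ranking by cosine give the same order on normalized embeddings.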
Tests
License
Dual-licensed under MIT or Apache-2.0.