# innr
[](https://crates.io/crates/innr)
[](https://docs.rs/innr)
[](https://github.com/arclabs561/innr/actions/workflows/ci.yml)
SIMD-accelerated vector similarity primitives. Pure Rust, zero runtime dependencies.
Unlike simsimd (C bindings) or ndarray (full linear algebra), innr is pure Rust, zero-dep, and focused on the similarity primitives that retrieval and embedding systems need.
Dual-licensed under MIT or Apache-2.0.
## Quickstart
```toml
[dependencies]
innr = "0.1.6"
```
```rust
use innr::{dot, cosine, norm};
let a = [1.0_f32, 0.0, 0.0];
let b = [0.707, 0.707, 0.0];
let d = dot(&a, &b); // 0.707
let c = cosine(&a, &b); // 0.707
let n = norm(&a); // 1.0
```
## Operations
### Core (always available)
| `dot`, `dot_portable` | Inner product (SIMD / portable) |
| `cosine` | Cosine similarity |
| `norm` | L2 norm |
| `l2_distance` | Euclidean distance |
| `l2_distance_squared` | Squared Euclidean distance (avoids sqrt) |
| `l1_distance` | Manhattan distance |
| `angular_distance` | Angular distance (arccos-based) |
| `pool_mean` | Mean pooling over a set of vectors |
| `bilinear` | Scaled dot product (`phi^T * psi / sqrt(d)`) |
| `geometric_outer_product` | Tensor (outer) product of two vectors |
| `metric_residual` | MRN distance (symmetric + asymmetric components) |
### Matryoshka embeddings
| `matryoshka_dot` | Dot product on a prefix of the embedding |
| `matryoshka_cosine` | Cosine similarity on a prefix of the embedding |
### Binary quantization (1-bit)
| `encode_binary` | Quantize `f32` vector to packed bits |
| `PackedBinary` | Packed bit-vector type |
| `binary_dot` | Dot product on packed binary vectors |
| `binary_hamming` | Hamming distance |
| `binary_jaccard` | Jaccard similarity |
### Ternary quantization (1.58-bit)
| `ternary::encode_ternary` | Quantize `f32` to {-1, 0, +1} |
| `ternary::PackedTernary` | Packed ternary vector type |
| `ternary::ternary_dot` | Inner product on packed ternary vectors |
| `ternary::ternary_hamming` | Hamming distance on ternary vectors |
| `ternary::asymmetric_dot` | Float query x ternary doc product |
| `ternary::sparsity` | Fraction of zero entries |
### Fast approximate math
| `fast_cosine` | Approximate cosine via `fast_rsqrt` |
| `fast_rsqrt` | Fast inverse square root (hardware rsqrt + Newton-Raphson) |
| `fast_rsqrt_precise` | Two-iteration Newton-Raphson variant |
| `fast_cosine_distance` | `1 - fast_cosine` |
### Batch operations (PDX-style columnar layout)
| `batch::VerticalBatch` | Columnar (SoA) vector store |
| `batch::batch_dot` | Batch dot products against a query |
| `batch::batch_l2_squared` | Batch squared L2 distances |
| `batch::batch_cosine` | Batch cosine similarities |
| `batch::batch_norms` | Norms for all vectors in the batch |
| `batch::batch_knn` | Exact k-NN over a batch |
| `batch::batch_knn_adaptive` | Adaptive early-exit k-NN |
| `batch::batch_l2_squared_pruning` | Batch L2 with early termination |
| `batch::BatchKnnResult` | k-NN result (indices + distances) |
### Metric traits
| `SymmetricMetric` | Symmetric distance interface (`d(a,b) = d(b,a)`) |
| `Quasimetric` | Directed distance interface (`d(a,b) != d(b,a)`) |
### Clifford algebra
| `clifford::Rotor2D` | 2D rotor (even subalgebra of Cl(2)) |
| `clifford::wedge_2d` | 2D wedge (outer) product |
| `clifford::geometric_product_2d` | 2D geometric product (scalar + bivector) |
### Feature-gated
| `sparse_dot`, `sparse_dot_portable` | `sparse` | Sparse vector dot (sorted-index merge) |
| `sparse_maxsim` | `sparse` | Sparse MaxSim scoring |
| `maxsim`, `maxsim_cosine` | `maxsim` | ColBERT late interaction scoring |
## SIMD Dispatch
| x86_64 | AVX-512F | Runtime |
| x86_64 | AVX2 + FMA | Runtime |
| aarch64 | NEON | Always |
| Other | Portable | LLVM auto-vec |
Vectors < 16 dimensions use portable code.
## Features
- `sparse` -- sparse vector operations
- `maxsim` -- ColBERT late interaction scoring
- `full` -- all features
## Performance

*Apple Silicon (NEON). Run `cargo bench` to reproduce on your hardware.*
For maximum performance, build with native CPU features:
```bash
RUSTFLAGS="-C target-cpu=native" cargo build --release
```
Or specify a portable baseline with SIMD:
```bash
# AVX2 (89% of x86_64 CPUs)
RUSTFLAGS="-C target-cpu=x86-64-v3" cargo build --release
# SSE2 only (100% compatible)
RUSTFLAGS="-C target-cpu=x86-64" cargo build --release
```
Run benchmarks:
```bash
cargo bench
```
Generate flamegraphs (requires `cargo-flamegraph`):
```bash
./scripts/profile.sh dense
```
## Examples
[**01_basic_ops.rs**](examples/01_basic_ops.rs) -- The three core similarity metrics (dot product, cosine, L2 distance) and their mathematical relationships. Proves the identity `L2^2(a,b) = 2(1 - cosine(a,b))` for normalized vectors, showing that cosine and L2 are interchangeable for ranking.
[**batch_demo.rs**](examples/batch_demo.rs) -- PDX-style columnar layout for batch retrieval. Transposes 10K vectors (128d) into column-major order, runs 100 queries, and verifies k-NN results against brute-force. Demonstrates the cache-friendly memory access pattern that enables auto-vectorization.
[**binary_demo.rs**](examples/binary_demo.rs) -- Binary (1-bit) quantization for first-stage retrieval. Quantizes 384d vectors to packed bits (32x memory reduction: 150 MB vs 4.6 GB for 1M documents), computes Hamming distance and binary dot product, and measures recall@10 against full-precision search.
[**fast_math_demo.rs**](examples/fast_math_demo.rs) -- Newton-Raphson rsqrt approximation for fast cosine similarity. Benchmarks the hot path in ANN search (640 distance calls per query in HNSW at 1M scale), measures 3-10x speedup over standard cosine at `<1e-4` error, and shows architecture-specific SIMD dispatch.
[**matryoshka_search.rs**](examples/matryoshka_search.rs) -- Two-stage retrieval using Matryoshka embeddings. Uses a 128d prefix for coarse filtering (100 candidates from 10K corpus), then rescores with full 768d vectors to produce the final top-10. Measures recall and speedup vs single-stage search.
[**maxsim_colbert.rs**](examples/maxsim_colbert.rs) -- ColBERT-style late interaction scoring. Computes MaxSim (sum of per-query-token maximum similarities across document tokens) for 32 query tokens x 128 doc tokens at 128d. Demonstrates non-commutativity and batch scoring of 1000 documents.
[**ternary_demo.rs**](examples/ternary_demo.rs) -- Ternary (1.58-bit) quantization for extreme compression. Quantizes 768d vectors to `{-1, 0, +1}` (16-20x memory reduction: 90 MB vs 3.1 GB for 1M documents), measures recall trade-offs, and analyzes sparsity patterns.
```bash
cargo run --example 01_basic_ops
cargo run --example batch_demo
cargo run --example binary_demo
cargo run --example fast_math_demo
cargo run --example matryoshka_search
cargo run --example maxsim_colbert --features maxsim
cargo run --example ternary_demo
```
## Tests
```bash
cargo test -p innr
```
## License
Dual-licensed under MIT or Apache-2.0.