//! SIMD-accelerated vector similarity primitives.
//!
//! Fast building blocks for embedding similarity with automatic hardware dispatch.
//!
//! # Which Function Should I Use?
//!
//! | Task | Function | Notes |
//! |------|----------|-------|
//! | **Similarity (normalized)** | [`cosine`] | Most embeddings are normalized |
//! | **Similarity (raw)** | [`dot`] | When you know norms |
//! | **Distance (L2)** | [`l2_distance`] | For k-NN, clustering |
//! | **Token-level matching** | [`maxsim`] | ColBERT-style late interaction |
//! | **Sparse vectors** | [`sparse_dot`] | BM25 scores, SPLADE |
//! | **INT8 embeddings** | [`dot_u8`] | Quantized vector search |
//! | **Binary embeddings** | [`hamming_distance`] | Byte-packed bit vectors |
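//!
//! As one example from the table, [`sparse_dot`] walks two sorted index lists
//! in lockstep (a merge join). A portable sketch of that strategy, with
//! illustrative names rather than the crate's API:
//!
//! ```rust
//! // Dot product of two sparse vectors given as (sorted indices, values).
//! // Only indices present in both vectors contribute to the sum.
//! fn sparse_dot_ref(ai: &[u32], av: &[f32], bi: &[u32], bv: &[f32]) -> f32 {
//!     let (mut i, mut j, mut acc) = (0, 0, 0.0);
//!     while i < ai.len() && j < bi.len() {
//!         match ai[i].cmp(&bi[j]) {
//!             std::cmp::Ordering::Less => i += 1,
//!             std::cmp::Ordering::Greater => j += 1,
//!             std::cmp::Ordering::Equal => {
//!                 acc += av[i] * bv[j];
//!                 i += 1;
//!                 j += 1;
//!             }
//!         }
//!     }
//!     acc
//! }
//!
//! // Overlap only at index 3: 2.0 * 4.0 = 8.0
//! let d = sparse_dot_ref(&[1, 3], &[1.0, 2.0], &[3, 7], &[4.0, 5.0]);
//! assert_eq!(d, 8.0);
//! ```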
//!
//! # SIMD Dispatch
//!
//! All functions automatically dispatch to the fastest available instruction set:
//!
//! | Architecture | Instructions | Detection |
//! |--------------|--------------|-----------|
//! | x86_64 | AVX-512F | Runtime |
//! | x86_64 | AVX2 + FMA | Runtime |
//! | aarch64 | NEON | Always available |
//! | Other | Portable | LLVM auto-vectorizes |
//!
//! Vectors shorter than [`MIN_DIM_SIMD`] (16) dimensions take the portable path,
//! where SIMD setup overhead would outweigh the gains.
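//!
//! The portable path is equivalent to the naive loop below (LLVM usually
//! auto-vectorizes it; shown here only to pin down the semantics):
//!
//! ```rust
//! // Reference dot product: matches the SIMD kernels up to
//! // floating-point reassociation.
//! fn dot_ref(a: &[f32], b: &[f32]) -> f32 {
//!     a.iter().zip(b).map(|(x, y)| x * y).sum()
//! }
//!
//! assert_eq!(dot_ref(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]), 32.0);
//! ```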
//!
//! # Historical Context
//!
//! The inner product (dot product) dates to Grassmann's 1844 "Ausdehnungslehre" and
//! Hamilton's quaternions, formalized in Gibbs and Heaviside's vector calculus (~1880s).
//! Modern embedding similarity (Word2Vec 2013, BERT 2018) relies on inner products
//! in high-dimensional spaces where SIMD acceleration is essential.
//!
//! ColBERT's MaxSim (Khattab & Zaharia, 2020) extends this to token-level late
//! interaction, requiring O(|Q| × |D|) inner products per query-document pair.
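//!
//! Concretely, MaxSim takes, for each query token embedding, the maximum inner
//! product over all document token embeddings, and sums those maxima. A scalar
//! sketch of that scoring rule (illustrative, not the crate's kernel):
//!
//! ```rust
//! // score(Q, D) = sum over q in Q of ( max over d in D of <q, d> )
//! fn maxsim_ref(q: &[Vec<f32>], d: &[Vec<f32>]) -> f32 {
//!     q.iter()
//!         .map(|qt| {
//!             d.iter()
//!                 .map(|dt| qt.iter().zip(dt).map(|(x, y)| x * y).sum::<f32>())
//!                 .fold(f32::NEG_INFINITY, f32::max)
//!         })
//!         .sum()
//! }
//!
//! let q = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
//! let d = vec![vec![1.0, 0.0], vec![0.5, 0.5]];
//! // First query token best-matches d[0] (1.0); second best-matches d[1] (0.5).
//! assert_eq!(maxsim_ref(&q, &d), 1.5);
//! ```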
//!
//! # Example
//!
//! ```rust
//! use innr::{dot, cosine, norm};
//!
//! let a = [1.0_f32, 0.0, 0.0];
//! let b = [0.707, 0.707, 0.0];
//!
//! // Dot product
//! let d = dot(&a, &b);
//! assert!((d - 0.707).abs() < 0.01);
//!
//! // Cosine similarity (normalized dot product)
//! let c = cosine(&a, &b);
//! assert!((c - 0.707).abs() < 0.01);
//!
//! // L2 norm
//! let n = norm(&a);
//! assert!((n - 1.0).abs() < 1e-6);
//! ```
//!
//! # References
//!
//! - Gibbs, J.W. (1881). "Elements of Vector Analysis"
//! - Mikolov et al. (2013). "Efficient Estimation of Word Representations" (Word2Vec)
//! - Khattab & Zaharia (2020). "ColBERT: Efficient and Effective Passage Search"
/// Binary (1-bit) quantization: encode, Hamming distance, dot product, Jaccard.
pub mod binary;
/// Dense vector primitives: dot, cosine, norm, L2/L1 distance, matryoshka.
pub mod dense;
/// Fast math operations using hardware-aware approximations (rsqrt, NR iteration).
pub mod fastmath;
/// Batch vector operations with columnar (PDX-style) layout.
pub mod batch;
/// Sparse vector dot product via sorted-index merge join.
pub mod sparse;
/// ColBERT MaxSim late interaction scoring for multi-vector retrieval.
pub mod maxsim;

// Re-export core operations
pub use dense::*;
pub use sparse::*;
pub use maxsim::*;
// Re-export binary operations
pub use binary::*;
// Re-export fast math (rsqrt-based approximations)
pub use fastmath::*;

/// Ternary quantization (1.58-bit) for ultra-compressed embeddings.
pub mod ternary;
/// Scalar quantization (uint8) for memory-efficient asymmetric similarity.
pub mod scalar;
/// Integer quantization primitives: u8 dot product and Hamming distance.
pub mod quant;
/// Fixed-capacity top-K nearest neighbor tracker for ANN inner-loop use.
pub mod topk;

pub use ternary::*;
pub use scalar::*;
pub use quant::dot_u8;
pub use topk::TopK;
/// Minimum vector dimension for SIMD to be worthwhile.
///
/// Below this, function call overhead outweighs SIMD benefits.
pub const MIN_DIM_SIMD: usize = 16;
/// Threshold for treating a norm as "effectively zero".
///
/// Chosen to be larger than `f32::EPSILON` (~1.19e-7) to provide numerical
/// headroom while remaining small enough to only catch degenerate cases.
///
/// Used by [`normalize`](dense::normalize) to avoid division by zero.
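///
/// A portable sketch of that guard (illustrative, not the crate's kernel):
///
/// ```rust
/// const NORM_EPSILON: f32 = 1e-9;
///
/// // Scale to unit length, but leave effectively-zero vectors untouched
/// // instead of dividing by (almost) zero.
/// fn normalize_ref(v: &mut [f32]) {
///     let n = v.iter().map(|x| x * x).sum::<f32>().sqrt();
///     if n > NORM_EPSILON {
///         for x in v.iter_mut() {
///             *x /= n;
///         }
///     }
/// }
///
/// let mut v = [3.0, 4.0];
/// normalize_ref(&mut v);
/// assert_eq!(v, [0.6, 0.8]);
/// ```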
pub const NORM_EPSILON: f32 = 1e-9;
/// Squared norm threshold for cosine similarity.
///
/// Cosine kernels accumulate `||a||²` and `||b||²` (squared norms), so
/// they compare against `NORM_EPSILON²` rather than `NORM_EPSILON`.
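///
/// A scalar sketch of that comparison (illustrative, not the crate's kernel;
/// the zero-score convention for degenerate inputs is an assumption here):
///
/// ```rust
/// const NORM_EPSILON_SQ: f32 = 1e-9 * 1e-9;
///
/// // Single pass accumulating <a,b>, ||a||², ||b||²; effectively-zero
/// // vectors score 0 instead of producing NaN.
/// fn cosine_ref(a: &[f32], b: &[f32]) -> f32 {
///     let (mut ab, mut aa, mut bb) = (0.0f32, 0.0f32, 0.0f32);
///     for (x, y) in a.iter().zip(b) {
///         ab += x * y;
///         aa += x * x;
///         bb += y * y;
///     }
///     if aa <= NORM_EPSILON_SQ || bb <= NORM_EPSILON_SQ {
///         return 0.0;
///     }
///     ab / (aa.sqrt() * bb.sqrt())
/// }
///
/// assert_eq!(cosine_ref(&[0.0, 0.0], &[1.0, 0.0]), 0.0); // degenerate input
/// assert_eq!(cosine_ref(&[2.0, 0.0], &[1.0, 0.0]), 1.0);
/// ```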
pub const NORM_EPSILON_SQ: f32 = NORM_EPSILON * NORM_EPSILON;