//! # ELID - Embedding Locality IDentifier
//!
//! ELID enables vector search without a vector store by encoding high-dimensional embeddings
//! into sortable string IDs that preserve locality. Similar vectors produce similar IDs,
//! allowing you to use standard database indexes for similarity search.
//!
//! ELID also includes a complete suite of fast, zero-dependency string similarity algorithms.
//!
//! ## Feature Sets
//!
//! ### Embedding Encoding (`embeddings` feature)
//!
//! Convert embeddings from any ML model into compact, sortable identifiers:
//!
//! - **Mini128**: 128-bit SimHash using signed random projections (fast, Hamming distance)
//! - **Morton10x10**: Z-order curve encoding (database range queries)
//! - **Hilbert10x10**: Hilbert curve encoding (maximum locality preservation)
//!
//! ### String Similarity (`strings` feature, default)
//!
//! - **Levenshtein Distance**: Classic edit distance algorithm
//! - **Normalized Levenshtein**: Returns similarity as a value between 0.0 and 1.0
//! - **Jaro-Winkler Similarity**: Better for short strings like names
//! - **Hamming Distance**: For equal-length strings
//! - **Optimal String Alignment (OSA)**: Levenshtein with transpositions
//! - **SimHash**: Locality-sensitive hashing for string similarity queries
//!
//! ## Feature Flags
//!
//! - `strings` (default): Zero-dependency string similarity algorithms
//! - `embeddings` (default): Vector encoding with Mini128, Morton, and Hilbert profiles
//! - `models`: Base ONNX model support using tract-onnx (WASM compatible)
//! - `models-text`: Text embedding models (Model2Vec potion-base-8M)
//! - `models-image`: Image embedding models (MobileNetV3-Small)
//! - `wasm`: WebAssembly bindings (includes embeddings)
//! - `python`: Python bindings via PyO3 (includes embeddings + numpy)
//! - `ffi`: C FFI bindings
//!
//! ## Embedding Encoding Example
//!
//! ```rust,ignore
//! use elid::embeddings::{encode, Profile, hamming_distance};
//!
//! // Get embeddings from your ML model
//! let embedding1 = model.embed("Hello, world!")?;
//! let embedding2 = model.embed("Hello, universe!")?;
//!
//! // Encode to sortable ELIDs
//! let profile = Profile::default(); // Mini128
//! let elid1 = encode(&embedding1, &profile)?;
//! let elid2 = encode(&embedding2, &profile)?;
//!
//! // Compare via Hamming distance (lower = more similar)
//! let distance = hamming_distance(&elid1, &elid2)?;
//! ```
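//!
//! Because ELIDs are plain sortable strings, a sorted container (or any database
//! B-tree index) yields a coarse candidate window around a probe ID, with no vector
//! store involved. A minimal sketch with stand-in hex strings in place of real ELIDs:
//!
//! ```rust
//! // Stand-ins for encoded IDs; real ones come from `elid::embeddings::encode`.
//! let mut ids = vec!["0a3f", "0a41", "0b00", "7fce"];
//! ids.sort();
//! // Locate where a probe ID would land, then inspect its neighbors.
//! let probe = "0a40";
//! let pos = ids.partition_point(|id| *id < probe);
//! let window = &ids[pos.saturating_sub(1)..(pos + 1).min(ids.len())];
//! assert_eq!(window, ["0a3f", "0a41"]);
//! ```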
//!
//! ## String Similarity Example
//!
//! ```rust
//! use elid::{levenshtein, normalized_levenshtein, jaro_winkler, simhash, simhash_similarity};
//!
//! let distance = levenshtein("kitten", "sitting");
//! assert_eq!(distance, 3);
//!
//! let similarity = normalized_levenshtein("kitten", "sitting");
//! assert!(similarity > 0.5 && similarity < 0.7);
//!
//! let jw_similarity = jaro_winkler("martha", "marhta");
//! assert!(jw_similarity > 0.9);
//!
//! // SimHash for numeric database queries
//! let hash1 = simhash("iPhone 14");
//! let hash2 = simhash("iPhone 15");
//! let sim = simhash_similarity("iPhone 14", "iPhone 15");
//! assert!(sim > 0.8);
//! ```
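//!
//! Since SimHash values are numeric, two hashes can also be compared directly by
//! counting differing bits, which is what makes range filters in SQL practical.
//! A sketch with stand-in `u64` values in place of real hashes:
//!
//! ```rust
//! // Stand-in hash values; real ones come from `elid::simhash`.
//! let (a, b): (u64, u64) = (0b1011_0110, 0b1011_0010);
//! let differing_bits = (a ^ b).count_ones(); // fewer differing bits = more similar
//! assert_eq!(differing_bits, 1);
//! ```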
// Re-export everything from strings for backwards compatibility
// (Assumes the `strings` module is declared elsewhere in this crate.)
pub use strings::*;
/// Compute the best matching similarity between two strings using multiple algorithms
/// and return the highest score.
///
/// This function runs multiple algorithms and returns the best result, useful when
/// you're not sure which algorithm will work best for your data.
///
/// # Example
///
/// ```rust
/// use elid::best_match;
///
/// let score = best_match("hello", "hallo");
/// assert!(score > 0.7);
/// ```
pub fn best_match(a: &str, b: &str) -> f64 {
    // Sketch: runs two complementary metrics and keeps the higher score; the
    // exact set of algorithms is an implementation detail.
    normalized_levenshtein(a, b).max(jaro_winkler(a, b))
}
/// Find the best match for a query string in a list of candidates.
///
/// Returns the index and similarity score of the best match.
///
/// # Example
///
/// ```rust
/// use elid::find_best_match;
///
/// let candidates = vec!["apple", "application", "apply"];
/// let (idx, score) = find_best_match("app", &candidates);
/// assert!(score > 0.5);
/// ```
pub fn find_best_match(query: &str, candidates: &[&str]) -> (usize, f64) {
    // Sketch: scores every candidate with `best_match` and keeps the top one.
    let mut best = (0, 0.0);
    for (i, candidate) in candidates.iter().enumerate() {
        let score = best_match(query, candidate);
        if score > best.1 {
            best = (i, score);
        }
    }
    best
}
/// Find all matches above a threshold score.
///
/// Returns a vector of (index, score) tuples for all candidates above the threshold.
///
/// # Example
///
/// ```rust
/// use elid::find_matches_above_threshold;
///
/// let candidates = vec!["apple", "application", "apply", "banana"];
/// let matches = find_matches_above_threshold("app", &candidates, 0.5);
/// assert!(matches.len() >= 2); // Should match at least "apple" and "apply"
/// ```
pub fn find_matches_above_threshold(
    query: &str,
    candidates: &[&str],
    threshold: f64,
) -> Vec<(usize, f64)> {
    // Sketch: keeps every candidate whose `best_match` score exceeds the threshold.
    candidates
        .iter()
        .enumerate()
        .map(|(i, candidate)| (i, best_match(query, candidate)))
        .filter(|&(_, score)| score > threshold)
        .collect()
}
// Python module is defined in python.rs and exported via #[pymodule]