# ELID - Embedding Locality IDentifier
ELID enables vector search without a vector store by encoding high-dimensional embeddings into sortable string IDs that preserve locality. Similar vectors produce similar IDs, allowing you to use standard database indexes for similarity search.
ELID also includes a complete suite of fast string similarity algorithms.
## Features

### Embedding Encoding (Vector Search Without Vector Stores)

Convert embeddings from any ML model into compact, sortable identifiers:
| Profile | Output | Best For |
|---|---|---|
| Mini128 | 26-char base32hex | Fast similarity via Hamming distance |
| Morton10x10 | 20-char base32hex | Database range queries (Z-order) |
| Hilbert10x10 | 20-char base32hex | Maximum locality preservation |
Key benefits:
- Similar vectors produce similar IDs (locality preservation)
- IDs are lexicographically sortable for database indexing
- No vector store required - use any database with string indexes
- Deterministic: same embedding always produces the same ID
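The sortability property is what makes these IDs database-friendly. A minimal pure-Python sketch (using made-up ID values, independent of the library) of how a sorted index answers a prefix "neighborhood" query:

```python
import bisect

# Hypothetical IDs: similar vectors share a prefix (illustrative values,
# not real ELID output).
ids = sorted([
    "01a3f5g7", "01a3f5h2", "01a3f9k1",  # near-neighbors cluster together
    "7ze0q1aa", "7ze0q1ab",              # a second, distant cluster
])

def prefix_scan(sorted_ids, prefix):
    """Return all IDs sharing `prefix` via two binary searches,
    the same access pattern a B-tree index uses for LIKE 'prefix%'."""
    lo = bisect.bisect_left(sorted_ids, prefix)
    hi = bisect.bisect_right(sorted_ids, prefix + "\xff")
    return sorted_ids[lo:hi]

print(prefix_scan(ids, "01a3f5"))  # ['01a3f5g7', '01a3f5h2']
```

Because similar vectors share long ID prefixes, this one range scan retrieves an approximate neighborhood without any vector index.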
### String Similarity Algorithms
| Algorithm | Type | Best For |
|---|---|---|
| Levenshtein | Edit distance | General-purpose comparison, spell checking |
| Normalized Levenshtein | Similarity (0-1) | When you need a percentage match |
| Jaro | Similarity (0-1) | Short strings |
| Jaro-Winkler | Similarity (0-1) | Names and record linkage |
| Hamming | Distance | Fixed-length strings, DNA, error codes |
| OSA | Edit distance | Typo detection (counts transpositions) |
| SimHash | LSH fingerprint | Database-queryable similarity, near-duplicate detection |
| Best Match | Composite (0-1) | When unsure which algorithm fits |
## Installation

### Rust

```toml
# String similarity only (zero dependencies)
[dependencies]
elid = "0.1"

# Embedding encoding
[dependencies]
elid = { version = "0.1", features = ["embeddings"] }

# Both features
[dependencies]
elid = { version = "0.1", features = ["strings", "embeddings"] }
```
### Python

### JavaScript (WASM)

### C/C++

Build with `cargo build --release --features ffi` to get `libelid.so` and `elid.h`.
## Quick Start

### Embedding Encoding (Rust)

```rust
use elid::{encode, Elid, Profile};

// Get an embedding from your ML model (e.g., OpenAI, Cohere, sentence-transformers)
let embedding: Vec<f32> = model.embed("hello world")?;

// Encode to a sortable ELID
let profile = Profile::default(); // Mini128
let elid: Elid = encode(&embedding, &profile)?;
println!("{elid}"); // e.g., "01a3f5g7h9jklmnopqrstuv"

// Similar texts produce similar ELIDs
let elid2 = encode(&model.embed("hello, world!")?, &profile)?;

// Compare similarity via Hamming distance
use elid::hamming_distance;
let distance = hamming_distance(&elid, &elid2)?; // Lower = more similar
```
### Encoding Profiles

```rust
use elid::Profile;

// Mini128: 128-bit SimHash (default)
// Best for: Fast similarity search via Hamming distance
let mini = Profile::Mini128;

// Morton10x10: Z-order curve encoding
// Best for: Database range queries
let morton = Profile::Morton10x10;

// Hilbert10x10: Hilbert curve encoding
// Best for: Maximum locality preservation
let hilbert = Profile::Hilbert10x10;
```
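For intuition on the Morton (Z-order) profile, here is a simplified 2D bit-interleaving sketch in plain Python. The real Morton10x10 profile presumably interleaves more dimensions and bits, so this is illustrative only:

```python
def morton_2d(x, y, bits=10):
    """Interleave the bits of x and y (Z-order curve). Nearby (x, y)
    points tend to receive nearby codes, which is what makes the
    encoding range-queryable in an ordinary B-tree index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code

# Neighboring points map to close codes:
print(morton_2d(3, 4), morton_2d(3, 5))  # 37 39
```

The Hilbert profile follows the same idea but with a space-filling curve that avoids Z-order's long "jumps", trading encoding cost for better locality.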
### String Similarity (Rust)

```rust
use elid::*;

// Edit distance
let distance = levenshtein("kitten", "sitting"); // 3

// Normalized similarity (0.0 to 1.0)
let similarity = normalized_levenshtein("hello", "hallo"); // 0.8

// Name matching
let similarity = jaro_winkler("martha", "marhta"); // 0.961

// SimHash for database queries
let hash = simhash("the quick brown fox");
let sim = simhash_similarity("the quick brown fox", "the quick brown dog"); // higher = more similar

// Find best match in a list
let candidates = vec!["apple", "apply", "ample"];
let (best, score) = find_best_match("appel", &candidates);
```
### Python

```python
import elid

# String similarity
elid.levenshtein("kitten", "sitting")    # 3
elid.jaro_winkler("martha", "marhta")    # 0.961
elid.simhash_similarity("hello world", "hello there")  # similarity in [0, 1]

# Embedding encoding (with embeddings feature)
embedding = model.embed("hello world")
eid = elid.encode(embedding)
```
### JavaScript

```javascript
import init, { levenshtein, jaro_winkler, simhash_similarity } from 'elid';

await init();

levenshtein('kitten', 'sitting');    // 3
jaro_winkler('martha', 'marhta');    // 0.961
simhash_similarity('hello world', 'hello there'); // similarity in [0, 1]
```
## Configuration

Use `SimilarityOpts` for case-insensitive or whitespace-trimmed comparisons:

```rust
use elid::{levenshtein_with_opts, SimilarityOpts};

// Field names shown here are illustrative; see the crate docs for the exact API.
let opts = SimilarityOpts { case_insensitive: true, ..Default::default() };
let distance = levenshtein_with_opts("Hello", "hello", &opts); // 0
```
## Feature Flags

| Feature | Description | Dependencies |
|---|---|---|
| `strings` | String similarity algorithms (default) | None |
| `embeddings` | Embedding encoding (default) | `rand`, `blake3`, etc. |
| `models` | Base ONNX model support | `tract-onnx` |
| `models-text` | Text embedding (Model2Vec, 256-dim) | `models` |
| `models-image` | Image embedding (MobileNetV3, 1024-dim) | `models`, `image` |
| `wasm` | WebAssembly bindings (includes embeddings) | `wasm-bindgen`, `js-sys`, `getrandom` |
| `python` | Python bindings via PyO3 (includes embeddings) | `pyo3`, `numpy`, `rayon` |
| `ffi` | C FFI bindings | None (enables `unsafe`) |
## Performance
- Zero external dependencies for string-only use
- O(min(m,n)) space-optimized Levenshtein
- 1.4M+ string comparisons per second (Python benchmarks)
- ~96KB WASM binary (strings only)
- Embedding encoding: <1ms per vector
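The O(min(m, n)) space claim refers to the standard trick of keeping a single DP row over the shorter string rather than the full m x n matrix. A plain-Python sketch of that optimization (independent of the crate):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using one DP row over the shorter string,
    i.e. O(min(m, n)) extra space instead of a full matrix."""
    if len(a) < len(b):
        a, b = b, a                       # make b the shorter string
    prev = list(range(len(b) + 1))        # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb)  # substitution (free on match)
            ))
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```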
## Built-in Embedding Models

ELID includes optional ONNX models for generating embeddings directly, without external API calls. Models are bundled via separate packages:

| Package | Model | Dimensions | Size |
|---|---|---|---|
| `elid-text` | Model2Vec potion-base-8M | 256 | ~8MB |
| `elid-image` | MobileNetV3-Small | 1024 | ~5MB |
Text embeddings:

```rust
use elid::embed_text;

let embedding = embed_text("hello world")?;
assert_eq!(embedding.len(), 256);
```

Image embeddings:

```rust
use elid::embed_image;

let bytes = std::fs::read("photo.jpg")?;
let embedding = embed_image(&bytes)?;
assert_eq!(embedding.len(), 1024);
```
## LSH Bands for Database Querying

Convert embeddings to LSH bands for efficient database similarity search:

```javascript
import init, { embedding_to_bands } from 'elid';

await init();

// Split embedding into 4 bands (32 bits each)
const bands = embedding_to_bands(embedding, 4);

// Store bands in database columns.
// Query with OR across bands for approximate nearest neighbors:
// SELECT * FROM embeddings WHERE band0 = ? OR band1 = ? OR band2 = ? OR band3 = ?
```

```rust
use elid::embedding_to_bands;

let bands = embedding_to_bands(&embedding, 4);
// bands: Vec<String> with 4 base32hex-encoded band strings
```
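For intuition on why banding works, here is a pure-Python sketch of the idea, assuming a 128-bit fingerprint and using hex keys for brevity (the library itself emits base32hex strings):

```python
def to_bands(fingerprint: int, n_bands: int = 4, band_bits: int = 32):
    """Split a 128-bit fingerprint into n_bands exact-match keys.
    Fingerprints differing in only a few bits almost always agree on
    at least one whole band, so an OR of equality filters finds them."""
    mask = (1 << band_bits) - 1
    return [f"{(fingerprint >> (i * band_bits)) & mask:08x}" for i in range(n_bands)]

a = 0x0123456789ABCDEF0123456789ABCDEF
b = a ^ (1 << 5)                      # flip one bit: a "near duplicate"

bands_a, bands_b = to_bands(a), to_bands(b)
shared = [i for i in range(4) if bands_a[i] == bands_b[i]]
print(shared)  # [1, 2, 3] -- only band 0 changed, so 3 bands still collide
```

A single flipped bit lands in exactly one band, so the other bands still match and an indexed equality query retrieves the near duplicate.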
## Use Cases

### Vector Search Without Vector Stores

Store ELIDs directly in PostgreSQL, SQLite, or any database:

```sql
-- Create index on ELID column
CREATE INDEX idx_documents_elid ON documents (elid);

-- Find similar documents using string prefix matching
SELECT * FROM documents
WHERE elid LIKE 'abc%' -- Prefix match for locality
ORDER BY elid;
```
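The same pattern end-to-end with Python's stdlib `sqlite3`, using toy ID values (the schema and ELIDs here are illustrative, not real library output):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, elid TEXT, title TEXT)")
con.execute("CREATE INDEX idx_documents_elid ON documents (elid)")

# Toy ELIDs: near-neighbors share a prefix (illustrative values).
rows = [("01a3f5g7", "doc A"), ("01a3f5h2", "doc B"), ("7ze0q1aa", "doc C")]
con.executemany("INSERT INTO documents (elid, title) VALUES (?, ?)", rows)

# Prefix matching pulls back the locality neighborhood of the query ID.
hits = con.execute(
    "SELECT title FROM documents WHERE elid LIKE '01a3f5%' ORDER BY elid"
).fetchall()
print(hits)  # [('doc A',), ('doc B',)]
```

Note that SQLite only rewrites `LIKE 'prefix%'` into an index range scan when case-sensitive LIKE semantics apply (e.g. `PRAGMA case_sensitive_like = ON`); the query returns correct results either way.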
### Deduplication

Use SimHash to find near-duplicate content:

```rust
let hash1 = simhash(doc1);
let hash2 = simhash(doc2);
let similarity = simhash_similarity_from_hashes(hash1, hash2);

if similarity > 0.9 {
    // Likely near-duplicates
}
```
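For background, here is a conceptual 64-bit SimHash in plain Python (bitwise voting over token hashes); it illustrates the technique but is not the library's exact fingerprint:

```python
import hashlib

def simhash64(text: str) -> int:
    """Conceptual SimHash: each token's hash casts a +1/-1 vote per bit
    position; the sign of each vote total becomes that fingerprint bit.
    Texts sharing most tokens therefore share most bits."""
    counts = [0] * 64
    for token in text.lower().split():
        h = int.from_bytes(hashlib.blake2b(token.encode(), digest_size=8).digest(), "big")
        for i in range(64):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, c in enumerate(counts) if c > 0)

def similarity(h1: int, h2: int) -> float:
    """Fraction of fingerprint bits in agreement."""
    return 1 - bin(h1 ^ h2).count("1") / 64

a = simhash64("the quick brown fox jumps over the lazy dog")
b = simhash64("the quick brown fox jumped over the lazy dog")
print(similarity(a, b))  # close to 1.0 for near-duplicate sentences
```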
### Fuzzy Search

Find matches with typo tolerance:

```rust
let candidates = vec!["apple", "banana", "cherry"];
let matches = find_matches_above_threshold("appel", &candidates, 0.7);
// Returns: [("apple", 0.8), ...]
```
## Building
## License
Dual-licensed under MIT or Apache-2.0 at your option.