Crate elid

§ELID - Embedding Locality IDentifier

ELID enables vector search without a vector store by encoding high-dimensional embeddings into sortable string IDs that preserve locality. Similar vectors produce similar IDs, allowing you to use standard database indexes for similarity search.
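To illustrate how a sortable ID lets an ordinary ordered index answer "nearby" queries, here is a minimal sketch using a two-dimensional Z-order code and a `BTreeMap` as a stand-in for a database index. The helpers `morton2`, `sortable_id`, and `neighbours` are hypothetical names for this illustration only; they are not part of the elid API and do not reproduce the crate's actual ID layout.

```rust
use std::collections::BTreeMap;

/// Interleave the low 10 bits of `x` and `y` into a 20-bit Z-order code.
fn morton2(x: u16, y: u16) -> u32 {
    let mut code = 0u32;
    for i in 0..10 {
        code |= (((x >> i) & 1) as u32) << (2 * i);
        code |= (((y >> i) & 1) as u32) << (2 * i + 1);
    }
    code
}

/// Fixed-width hex, so that string order matches numeric order.
fn sortable_id(x: u16, y: u16) -> String {
    format!("{:05x}", morton2(x, y))
}

/// Return the values whose IDs share the first `prefix_len` hex digits
/// with `query`, using a single ordered range scan.
fn neighbours<'a>(
    index: &'a BTreeMap<String, &'a str>,
    query: &str,
    prefix_len: usize,
) -> Vec<&'a str> {
    let prefix = &query[..prefix_len];
    index
        .range(prefix.to_string()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(_, value)| *value)
        .collect()
}
```

Points that fall in the same grid cell share an ID prefix, so a prefix scan over the sorted keys returns them together. Points that are close but sit on opposite sides of a cell boundary can still receive distant IDs, which is a known limitation of prefix matching on Z-order codes.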

ELID also includes a complete suite of fast, zero-dependency string similarity algorithms.

§Feature Sets

§Embedding Encoding (embeddings feature)

Convert embeddings from any ML model into compact, sortable identifiers:

  • Mini128: 128-bit SimHash using signed random projections (fast, Hamming distance)
  • Morton10x10: Z-order curve encoding (database range queries)
  • Hilbert10x10: Hilbert curve encoding (maximum locality preservation)
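
The idea behind a SimHash built from signed random projections can be sketched as follows: each output bit records on which side of a pseudo-random hyperplane the embedding lies. This is an illustrative sketch, not the crate's Mini128 implementation. The function names, the seed handling, and the use of SplitMix64 with uniform weights in place of Gaussian ones are all assumptions made for the example.

```rust
/// SplitMix64 step, used here as a small deterministic stand-in for a seeded RNG.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical helper: pack the signs of 128 pseudo-random projections into a u128.
fn simhash128(embedding: &[f32], seed: u64) -> u128 {
    let mut bits: u128 = 0;
    for bit in 0..128u64 {
        // Each bit gets its own hyperplane, derived deterministically from the seed.
        let mut state = seed ^ bit.wrapping_mul(0xD6E8_FEB8_6659_FD93);
        let mut dot = 0.0f64;
        for &x in embedding {
            // Map the top 53 random bits to a weight in [-1, 1).
            let r = (splitmix64(&mut state) >> 11) as f64 / (1u64 << 53) as f64;
            dot += (2.0 * r - 1.0) * x as f64;
        }
        if dot >= 0.0 {
            bits |= 1u128 << bit;
        }
    }
    bits
}

/// Number of differing bits between two fingerprints.
fn hamming128(a: u128, b: u128) -> u32 {
    (a ^ b).count_ones()
}
```

Because only the sign of each projection is kept, scaling a vector by a positive constant leaves its fingerprint unchanged, and vectors separated by a small angle disagree on few bits.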

§String Similarity (strings feature, default)

  • Levenshtein Distance: Classic edit distance algorithm
  • Normalized Levenshtein: Returns similarity as a value between 0.0 and 1.0
  • Jaro-Winkler Similarity: Better for short strings like names
  • Hamming Distance: Counts differing positions between equal-length strings
  • Optimal String Alignment (OSA): Levenshtein extended with transposition of adjacent characters
  • SimHash: Locality-sensitive hashing for string similarity queries
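
To make the difference between Levenshtein and OSA concrete, here is a minimal dynamic-programming sketch of the OSA recurrence. It is written for illustration under the textbook definition and is not the crate's `osa_distance` implementation; `osa_sketch` is a hypothetical name.

```rust
/// Textbook Optimal String Alignment distance over Unicode scalar values.
fn osa_sketch(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());

    // d[i][j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut d = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        d[i][0] = i;
    }
    for j in 0..=m {
        d[0][j] = j;
    }

    for i in 1..=n {
        for j in 1..=m {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let mut best = (d[i - 1][j] + 1) // deletion
                .min(d[i][j - 1] + 1) // insertion
                .min(d[i - 1][j - 1] + cost); // substitution or match
            // The extra OSA case: swap two adjacent characters at a cost of 1.
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    d[n][m]
}
```

With this recurrence, "ab" and "ba" are one edit apart, whereas plain Levenshtein needs two substitutions. Removing the transposition branch turns the function back into ordinary Levenshtein distance.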

§Feature Flags

  • strings (default): Zero-dependency string similarity algorithms
  • embeddings (default): Vector encoding with Mini128, Morton, and Hilbert profiles
  • models: Base ONNX model support using tract-onnx (WASM compatible)
  • models-text: Text embedding models (Model2Vec potion-base-8M)
  • models-image: Image embedding models (MobileNetV3-Small)
  • wasm: WebAssembly bindings (includes embeddings)
  • python: Python bindings via PyO3 (includes embeddings + numpy)
  • ffi: C FFI bindings

§Embedding Encoding Example

use elid::embeddings::{encode, Profile, hamming_distance};

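// Get embeddings from your ML model. `model` is a placeholder for your own
// embedding provider; the `?` operators assume an enclosing function that returns a Result.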
let embedding1 = model.embed("Hello, world!")?;
let embedding2 = model.embed("Hello, universe!")?;

// Encode to sortable ELIDs
let profile = Profile::default(); // Mini128
let elid1 = encode(&embedding1, &profile)?;
let elid2 = encode(&embedding2, &profile)?;

// Compare via Hamming distance (lower = more similar)
let distance = hamming_distance(&elid1, &elid2)?;

§String Similarity Example

use elid::{levenshtein, normalized_levenshtein, jaro_winkler, simhash, simhash_similarity};

let distance = levenshtein("kitten", "sitting");
assert_eq!(distance, 3);

let similarity = normalized_levenshtein("kitten", "sitting");
assert!(similarity > 0.5 && similarity < 0.7);

let jw_similarity = jaro_winkler("martha", "marhta");
assert!(jw_similarity > 0.9);

// SimHash: fingerprints are numeric, so they can be stored and queried in an ordinary database column
let hash1 = simhash("iPhone 14");
let hash2 = simhash("iPhone 15");
let sim = simhash_similarity("iPhone 14", "iPhone 15");
assert!(sim > 0.8);

Modules§

embeddings
Embedding encoding module for ELID

Structs§

SimilarityOpts
Options for configuring string similarity algorithms

Functions§

best_match
Compute the best matching similarity between two strings using multiple algorithms and return the highest score.
find_best_match
Find the best match for a query string in a list of candidates.
find_matches_above_threshold
Find all matches above a threshold score.
find_similar_hashes
Find all items within a given SimHash distance threshold.
hamming
Compute the Hamming distance between two strings.
jaro
Compute the Jaro similarity between two strings.
jaro_winkler
Compute the Jaro-Winkler similarity between two strings.
jaro_winkler_with_prefix
Compute the Jaro-Winkler similarity with a custom prefix scale.
levenshtein
Compute the Levenshtein distance between two strings.
levenshtein_with_opts
Compute Levenshtein distance with configurable options.
normalized_hamming
Compute the normalized Hamming similarity between two strings.
normalized_levenshtein
Compute the normalized Levenshtein similarity between two strings.
normalized_osa
Compute the normalized OSA similarity between two strings.
osa_distance
Compute the Optimal String Alignment distance between two strings.
simhash
Compute the SimHash fingerprint of a string.
simhash_distance
Compute the Hamming distance between two SimHash values.
simhash_similarity
Compute the normalized SimHash similarity between two strings.