FNV-1a feature hashing embedder.
This module provides a deterministic, fast embedder that uses FNV-1a hashing to project text into a fixed-dimension vector space. While not “truly” semantic (it captures lexical overlap rather than meaning), it provides:
- Instant embedding: No model loading, no initialization delay
- Deterministic output: Same input always produces same output
- Zero network dependency: Works offline, no downloads required
§Algorithm
- Tokenize: Lowercase, split on non-alphanumeric characters, drop tokens with len < 2
- Hash: Apply FNV-1a to each token
- Project: Use hash to determine dimension index and sign (+1 or -1)
- Normalize: L2 normalize the resulting vector to unit length
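The four steps above can be sketched as follows. This is a minimal illustration, not the crate's actual implementation: the function names (`fnv1a_64`, `tokenize`, `hash_embed`) are hypothetical, and the source does not say which hash width or which hash bits select the index and sign. The sketch assumes 64-bit FNV-1a, takes the index as the hash modulo the dimension, and reads the sign from the top bit.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: XOR each byte into the hash, then multiply by the prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Lowercase, split on non-alphanumeric characters, drop tokens shorter than 2 bytes.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| t.len() >= 2)
        .map(str::to_owned)
        .collect()
}

/// Hypothetical embedder: hash each token into one signed slot, then L2-normalize.
fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dim];
    if dim == 0 {
        return v;
    }
    for token in tokenize(text) {
        let h = fnv1a_64(token.as_bytes());
        // Assumption: index from the hash modulo the dimension, sign from the top bit.
        let idx = (h % dim as u64) as usize;
        let sign = if h >> 63 == 0 { 1.0 } else { -1.0 };
        v[idx] += sign;
    }
    // L2 normalization; an all-zero vector (no tokens) is left unchanged.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

Because each token contributes to exactly one dimension, two texts score as similar only when they share tokens, which is why the result reflects lexical overlap rather than meaning.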
§When to Use
- When the ML model is not installed
- When the user explicitly opts for hash mode (CASS_SEMANTIC_EMBEDDER=hash)
- As a fallback when ML inference fails
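One way the selection logic for these three cases might look is sketched below. Only the `CASS_SEMANTIC_EMBEDDER=hash` setting comes from this page; `EmbedderMode`, `choose_mode`, and `embed_or_fallback` are hypothetical names introduced for illustration, and the real crate may structure this differently.

```rust
/// Hypothetical selector for which embedder to use.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum EmbedderMode {
    Hash,
    Ml,
}

/// Choose a mode from the environment value and model availability.
/// `env_value` is expected to hold the value of CASS_SEMANTIC_EMBEDDER, if set.
fn choose_mode(env_value: Option<&str>, model_installed: bool) -> EmbedderMode {
    // An explicit opt-in to hash mode always wins.
    if env_value == Some("hash") {
        return EmbedderMode::Hash;
    }
    // Without an installed model, the hash embedder is the only option.
    if !model_installed {
        return EmbedderMode::Hash;
    }
    EmbedderMode::Ml
}

/// Run ML inference and fall back to the hash embedding if it fails.
fn embed_or_fallback<E>(
    ml: impl FnOnce() -> Result<Vec<f32>, E>,
    hash: impl FnOnce() -> Vec<f32>,
) -> Vec<f32> {
    ml().unwrap_or_else(|_| hash())
}
```

In practice the first argument could be read with `std::env::var("CASS_SEMANTIC_EMBEDDER").ok()` and passed in via `as_deref()`; keeping `choose_mode` free of direct environment access makes it easy to test.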
§Example
use crate::search::embedder::Embedder;
use crate::search::hash_embedder::HashEmbedder;
let embedder = HashEmbedder::new(384);
let embedding = embedder.embed_sync("hello world").unwrap();
assert_eq!(embedding.len(), 384);
Structs§
- HashEmbedder - FNV-1a feature hashing embedder.
Constants§
- DEFAULT_DIMENSION - Default embedding dimension (matches MiniLM for compatibility).