Module hash_embedder

FNV-1a feature hashing embedder.

This module provides a fast, deterministic embedder that uses FNV-1a hashing to project text into a fixed-dimension vector space. It is not truly semantic: it captures lexical overlap rather than meaning. It does, however, provide:

  • Instant embedding: No model loading, no initialization delay
  • Deterministic output: Same input always produces same output
  • Zero network dependency: Works offline, no downloads required

§Algorithm

  1. Tokenize: Lowercase the text, split on non-alphanumeric characters, and discard tokens with len < 2
  2. Hash: Apply FNV-1a to each token
  3. Project: Use hash to determine dimension index and sign (+1 or -1)
  4. Normalize: L2 normalize the resulting vector to unit length
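
The four steps above can be sketched as follows. This is a minimal illustration, not the crate's actual implementation. The function names (`fnv1a_64`, `tokenize`, `hash_embed`) are hypothetical, and several details are assumptions: the 64-bit FNV-1a variant, taking the index as the hash modulo the dimension, reading the sign from the top bit of the hash, and measuring token length in bytes.

```rust
/// Standard 64-bit FNV-1a constants.
const FNV_OFFSET_BASIS: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// FNV-1a: XOR each byte into the hash, then multiply by the prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Step 1: lowercase, split on non-alphanumeric characters,
/// and drop tokens shorter than 2 (byte length; an assumption).
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| t.len() >= 2)
        .map(str::to_owned)
        .collect()
}

/// Steps 2 to 4: hash each token, add +1 or -1 at the hashed index,
/// then L2-normalize. Empty input yields an all-zero vector.
fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    assert!(dim > 0, "dimension must be positive");
    let mut v = vec![0.0f32; dim];
    for token in tokenize(text) {
        let h = fnv1a_64(token.as_bytes());
        // Assumption: low bits pick the index, the top bit picks the sign.
        let idx = (h % dim as u64) as usize;
        let sign = if h >> 63 == 0 { 1.0 } else { -1.0 };
        v[idx] += sign;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

Calling `hash_embed("hello world", 384)` returns a 384-element vector. Using a signed update rather than a plain count means that tokens colliding on the same index tend to cancel instead of accumulating, which is the usual motivation for the sign bit in feature hashing.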

§When to Use

  • When the ML model is not installed
  • When the user explicitly opts for hash mode (CASS_SEMANTIC_EMBEDDER=hash)
  • As a fallback when ML inference fails
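
As a rough illustration of the first two conditions, the selection logic might look like the sketch below. Only the `CASS_SEMANTIC_EMBEDDER` variable and its `hash` value come from this page. The `EmbedderChoice` enum, both function names, the `ml_model_installed` flag, and the case-insensitive comparison are all hypothetical. The third condition (falling back after a failed ML inference) would be handled at call time and is not shown.

```rust
/// Hypothetical result type: which embedder to construct.
#[derive(Debug, PartialEq, Eq)]
enum EmbedderChoice {
    Hash,
    Ml,
}

/// Pure decision function, kept separate from the environment for testing.
/// `env_value` is the raw value of CASS_SEMANTIC_EMBEDDER, if set.
fn choose_embedder(env_value: Option<&str>, ml_model_installed: bool) -> EmbedderChoice {
    match env_value.map(|v| v.trim().to_ascii_lowercase()) {
        // Explicit opt-in to hash mode always wins.
        Some(v) if v == "hash" => EmbedderChoice::Hash,
        // Without an installed model, hashing is the only option.
        _ if !ml_model_installed => EmbedderChoice::Hash,
        _ => EmbedderChoice::Ml,
    }
}

/// Thin wrapper that reads the real environment variable.
fn choose_embedder_from_env(ml_model_installed: bool) -> EmbedderChoice {
    let value = std::env::var("CASS_SEMANTIC_EMBEDDER").ok();
    choose_embedder(value.as_deref(), ml_model_installed)
}
```

Keeping the decision in a function that takes the value as a parameter avoids mutating process-wide environment state in tests.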

§Example

use crate::search::embedder::Embedder;
use crate::search::hash_embedder::HashEmbedder;

let embedder = HashEmbedder::new(384);
let embedding = embedder.embed_sync("hello world").unwrap();
assert_eq!(embedding.len(), 384);

Structs§

HashEmbedder
FNV-1a feature hashing embedder.

Constants§

DEFAULT_DIMENSION
Default embedding dimension (matches MiniLM for compatibility).