//! Embedding function abstractions for converting text to vector representations.
//!
//! This module provides the [`EmbeddingFunction`] trait that defines how to transform
//! text strings into embeddings. Implementations are available for various
//! embedding models, including dense embeddings (Ollama) and sparse embeddings (BM25).
/// BM25 sparse embedding implementation.
/// Text tokenization utilities for BM25.
/// MurmurHash3 absolute value hasher for token hashing.
/// Transforms text strings into embeddings.
///
/// Embedding functions are the bridge between human-readable text and the vector space
/// where similarity search operates. This trait supports both dense embeddings (e.g., from
/// neural models) and sparse embeddings (e.g., BM25 token weights). Implementations must
/// be thread-safe and support batch processing for efficiency.
///
/// # Examples
///
/// ```ignore
/// use chroma::embed::EmbeddingFunction;
///
/// async fn process_documents<E: EmbeddingFunction>(embedder: E, docs: Vec<&str>) {
///     let embeddings = embedder.embed_strs(&docs).await.unwrap();
///     assert_eq!(embeddings.len(), docs.len());
/// }
/// ```
/// Generic tokenizer interface for text processing.
/// Hashes tokens to u32 identifiers for sparse representations.
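The tokenizer and hasher described above can be sketched as follows. This is a minimal, hypothetical illustration, not this module's implementation: the `hash_token` and `sparse_embed` names are assumptions, and `DefaultHasher` (truncated to `u32`) stands in for the MurmurHash3 absolute-value hasher to keep the sketch dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for the MurmurHash3-based hasher: maps a token to a `u32` id.
/// (The real module uses MurmurHash3; std's DefaultHasher is used here only
/// so the sketch compiles without external crates.)
fn hash_token(token: &str) -> u32 {
    let mut h = DefaultHasher::new();
    token.hash(&mut h);
    h.finish() as u32
}

/// Build a sparse embedding as a map from hashed token id to term frequency,
/// using whitespace tokenization as a simple stand-in for the tokenizer.
fn sparse_embed(text: &str) -> HashMap<u32, f32> {
    let mut weights = HashMap::new();
    for token in text.to_lowercase().split_whitespace() {
        *weights.entry(hash_token(token)).or_insert(0.0) += 1.0;
    }
    weights
}

fn main() {
    let emb = sparse_embed("the quick brown fox the fox");
    // Four distinct tokens: "the", "quick", "brown", "fox".
    assert_eq!(emb.len(), 4);
    // Repeated tokens accumulate weight.
    assert_eq!(emb.get(&hash_token("fox")), Some(&2.0));
    println!("{} distinct token ids", emb.len());
}
```

A BM25 implementation would further reweight these raw frequencies by document length and inverse document frequency; the hashing step shown here is what keeps the sparse representation's keys fixed-width.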