pub struct Embedder { /* private fields */ }
Text embedding generator using a configurable model (default: BGE-large-en-v1.5).
Automatically downloads the model from the HuggingFace Hub on first use. Detects GPU availability and uses CUDA/TensorRT when available.
§Example

```rust
use cqs::Embedder;
use cqs::embedder::ModelConfig;

let embedder = Embedder::new(ModelConfig::resolve(None, None))?;
let embedding = embedder.embed_query("parse configuration file")?;
println!("Embedding dimension: {}", embedding.len()); // e.g. 1024 for BGE-large-en-v1.5
```

Implementations§
impl Embedder

pub fn new(model_config: ModelConfig) -> Result<Self, EmbedderError>
Create a new embedder with lazy model loading.
When force_cpu is false, automatically detects GPU and uses CUDA/TensorRT
when available, falling back to CPU if no GPU is found.
When force_cpu is true, always uses CPU – use this for single-query
embedding where CPU is faster than GPU due to CUDA context setup overhead.
Note: model download and ONNX session creation are deferred to the first embedding request. This avoids HuggingFace API calls for commands that don't need embeddings.
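The lazy-loading behavior described above can be sketched with `std::sync::OnceLock` (a simplified model, not the crate's actual implementation; `Session` here is a stand-in for the real ONNX session type):

```rust
use std::sync::OnceLock;

// Stand-in for the ONNX session; the real type lives elsewhere.
struct Session {
    provider: &'static str,
}

struct LazyEmbedder {
    session: OnceLock<Session>,
}

impl LazyEmbedder {
    fn new() -> Self {
        // No download, no session creation here: construction stays cheap.
        LazyEmbedder { session: OnceLock::new() }
    }

    fn embed(&self, _text: &str) -> usize {
        // First call pays the model-load cost; later calls reuse the session.
        let session = self.session.get_or_init(|| Session { provider: "cpu" });
        session.provider.len() // dummy work standing in for inference
    }
}
```

Because `OnceLock` initializes at most once, repeated `embed` calls share a single session no matter how many threads race on the first request.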
pub fn new_cpu(model_config: ModelConfig) -> Result<Self, EmbedderError>
Create a CPU-only embedder with lazy model loading.
Convenience wrapper for new() — use this for single-query embedding
where CPU is faster than GPU due to CUDA context setup overhead.
pub fn model_config(&self) -> &ModelConfig
Get the model configuration
pub fn token_count(&self, text: &str) -> Result<usize, EmbedderError>
Counts the number of tokens in the given text using the configured tokenizer.
§Arguments
text - The text string to tokenize and count
§Returns
Returns Ok(usize) containing the number of tokens in the text, or Err(EmbedderError) if tokenization fails.
§Errors
Returns EmbedderError::Tokenizer if the tokenizer is unavailable or if encoding the text fails.
pub fn token_counts_batch(&self, texts: &[&str]) -> Result<Vec<usize>, EmbedderError>
Count tokens for multiple texts in a single batch.
Uses encode_batch for potentially better throughput than individual
token_count calls when processing many texts.
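The contract between the two calls can be illustrated with a toy whitespace tokenizer (hypothetical, not the crate's actual tokenizer): the batch result must match the per-text counts, just computed in one pass.

```rust
// Toy tokenizer: whitespace-separated words stand in for subword tokens.
fn token_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn token_counts_batch(texts: &[&str]) -> Vec<usize> {
    // A real tokenizer's encode_batch can amortize work across texts;
    // the observable contract is the same as calling token_count per text.
    texts.iter().map(|t| token_count(t)).collect()
}
```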
pub fn split_into_windows(&self, text: &str, max_tokens: usize, overlap: usize) -> Result<Vec<(String, u32)>, EmbedderError>
Split text into overlapping windows of max_tokens, with overlap tokens of shared context between adjacent windows. Returns a Vec of (window_content, window_index). If the text fits in max_tokens, returns a single window with index 0.
§Panics
Panics if overlap >= max_tokens / 2, since such a small stride inflates the window count.
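The windowing arithmetic can be sketched over token indices (a hypothetical helper, not the crate's code): each window advances by a stride of `max_tokens - overlap`, which is why a large overlap is rejected — the stride shrinks and the window count grows.

```rust
// Sketch: compute (start, end) token ranges for overlapping windows.
fn window_ranges(n_tokens: usize, max_tokens: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(overlap < max_tokens / 2, "overlap must be < max_tokens / 2");
    if n_tokens <= max_tokens {
        return vec![(0, n_tokens)]; // single window, index 0
    }
    let stride = max_tokens - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < n_tokens {
        let end = (start + max_tokens).min(n_tokens);
        ranges.push((start, end));
        if end == n_tokens {
            break; // final window reached the end of the text
        }
        start += stride;
    }
    ranges
}
```

For example, 100 tokens with max_tokens = 40 and overlap = 10 yields three windows with a stride of 30, each sharing 10 tokens with its neighbor.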
pub fn embed_documents(&self, texts: &[&str]) -> Result<Vec<Embedding>, EmbedderError>
Embed documents (code chunks). Adds model-specific document prefix.
Large inputs are processed in batches of 64 to cap GPU memory usage.
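The fixed-size batching maps naturally onto slice chunking; a minimal sketch (with a dummy per-batch step standing in for the real inference call):

```rust
const BATCH_SIZE: usize = 64;

// Dummy stand-in for one batched inference call.
fn embed_batch(texts: &[&str]) -> Vec<usize> {
    texts.iter().map(|t| t.len()).collect()
}

// Process inputs in fixed-size batches so peak memory is bounded by
// BATCH_SIZE, not by the total number of documents.
fn embed_documents(texts: &[&str]) -> Vec<usize> {
    texts
        .chunks(BATCH_SIZE)
        .flat_map(|batch| embed_batch(batch))
        .collect()
}
```

With 130 inputs this runs three batches (64 + 64 + 2) and still returns one result per input, in order.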
pub fn embed_query(&self, text: &str) -> Result<Embedding, EmbedderError>
pub fn provider(&self) -> ExecutionProvider
Get the execution provider being used
pub fn clear_session(&self)
Clear the ONNX session to free memory (~500MB).
The session will be lazily re-initialized on the next embedding request. Use this in long-running processes during idle periods to reduce memory footprint.
§Safety constraint
Must only be called during idle periods – not while embedding is in progress. Watch mode guarantees single-threaded access.
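The reclaim-and-reinit pattern can be sketched with a `Mutex<Option<_>>` (simplified; the real session type and locking strategy belong to the crate):

```rust
use std::sync::Mutex;

struct Session; // stand-in for the large ONNX session

struct Embedder {
    session: Mutex<Option<Session>>,
}

impl Embedder {
    fn new() -> Self {
        Embedder { session: Mutex::new(None) }
    }

    // Drop the session; its memory is freed when the Option is emptied.
    fn clear_session(&self) {
        self.session.lock().unwrap().take();
    }

    // Lazily re-create the session on the next embedding request.
    fn embed(&self, text: &str) -> usize {
        let mut guard = self.session.lock().unwrap();
        guard.get_or_insert_with(|| Session);
        text.len() // dummy work standing in for inference
    }
}
```

Because the lock serializes access, clearing during an idle period cannot race with an in-flight `embed` call in this sketch; the single-threaded guarantee of watch mode makes the same property hold in practice.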
pub fn warm(&self) -> Result<(), EmbedderError>
Warm up the model with a dummy inference
pub fn embedding_dim(&self) -> usize
Returns the embedding dimension detected from the model. Falls back to the model config’s declared dimension if no inference has been run yet.
Auto Trait Implementations§
impl !Freeze for Embedder
impl RefUnwindSafe for Embedder
impl Send for Embedder
impl Sync for Embedder
impl Unpin for Embedder
impl UnsafeUnpin for Embedder
impl UnwindSafe for Embedder