§Seasoning
Retrieval-focused embedding and reranking infrastructure with explicit model semantics, rate limiting, retries, and optional local llama.cpp execution.
Config-driven local setups accept the llama.cpp, llamacpp, llama-cpp,
or llama_cpp dialect spellings when converting into Dialect::LlamaCpp.
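All four spellings normalize to the same variant; a standalone sketch of that normalization (the helper function here is hypothetical, not the crate's actual conversion code):

```rust
// Hypothetical sketch (not the crate's code): map the accepted
// config spellings onto a single canonical dialect name.
fn parse_llamacpp_spelling(s: &str) -> Option<&'static str> {
    match s {
        // All accepted spellings converge on Dialect::LlamaCpp.
        "llama.cpp" | "llamacpp" | "llama-cpp" | "llama_cpp" => Some("LlamaCpp"),
        _ => None,
    }
}
```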
Seasoning separates backend/runtime selection from retrieval formatting:
Dialect selects transport or local execution, ModelFamily selects
retrieval-family formatting, and EmbeddingRole identifies whether a
semantic embedding input is a query or document.
Embedding execution exposes a semantic public API: callers supply retrieval-level inputs, and the crate formats and prepares the final model payload internally, behind the API boundary.
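As an illustration of what role-aware formatting can mean, the following standalone sketch (the enum, function, and instruction text are assumptions, not the crate's internals) prefixes queries with a task instruction while embedding documents verbatim, a common pattern for instruction-tuned retrieval models:

```rust
// Illustrative only: role-aware payload formatting. None of these
// names or strings are guaranteed to match the crate's internals.
#[derive(Clone, Copy)]
enum Role {
    Query,
    Document,
}

fn format_input(role: Role, text: &str) -> String {
    match role {
        // Queries may carry a retrieval-task instruction prefix.
        Role::Query => format!("Instruct: retrieve relevant passages\nQuery: {text}"),
        // Documents are typically embedded as-is.
        Role::Document => text.to_string(),
    }
}
```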
§Embeddings
use std::sync::Arc;
use std::time::Duration;

use secrecy::SecretString;
use seasoning::EmbeddingProvider;
use seasoning::embedding::{
    Client as EmbedClient, Dialect, EmbedderConfig, EmbeddingInput, EmbeddingRole,
    ModelFamily, RemoteEmbedderConfig, Tokenizer,
};

let embedder = EmbedClient::new(EmbedderConfig::remote(
    ModelFamily::Qwen3,
    Tokenizer::Tiktoken {
        encoding: "cl100k_base".to_string(),
        tokenizer: Arc::new(
            tiktoken_rs::cl100k_base()
                .map_err(|e| seasoning::Error::InvalidConfiguration { message: e.to_string() })?,
        ),
    },
    "Qwen/Qwen3-Embedding-0.6B",
    None,
    RemoteEmbedderConfig {
        api_key: Some(SecretString::from("YOUR_API_KEY")),
        base_url: "https://api.deepinfra.com/v1/openai".to_string(),
        timeout: Duration::from_secs(10),
        dialect: Dialect::DeepInfra,
        embedding_dim: 1024,
        requests_per_minute: 1000,
        max_concurrent_requests: 50,
        tokens_per_minute: 1_000_000,
    },
)?)?;

let inputs = vec![EmbeddingInput {
    role: EmbeddingRole::Query,
    text: "memory-safe systems programming".to_string(),
    title: None,
    token_count: 4,
}];
let _ = embedder.embed(&inputs).await?;

§Reranking
use std::time::Duration;

use secrecy::SecretString;
use seasoning::RerankingProvider;
use seasoning::embedding::{Dialect, ModelFamily};
use seasoning::reranker::{Client as RerankerClient, RerankerConfig};

let reranker = RerankerClient::new(RerankerConfig {
    api_key: Some(SecretString::from("YOUR_API_KEY")),
    base_url: "https://api.deepinfra.com/v1".to_string(),
    timeout: Duration::from_secs(10),
    dialect: Dialect::DeepInfra,
    model_family: ModelFamily::Qwen3,
    model: "Qwen/Qwen3-Reranker-0.6B".to_string(),
    instruction: None,
    requests_per_minute: 1000,
    max_concurrent_requests: 50,
    tokens_per_minute: 1_000_000,
})?;

let query = seasoning::RerankQuery {
    text: "memory-safe systems programming".to_string(),
    token_count: 4,
};
let documents = vec![seasoning::RerankDocument {
    text: "Rust uses ownership and borrowing".to_string(),
    token_count: 6,
}];
let scores = reranker.rerank(&query, &documents).await?;
assert_eq!(scores.len(), documents.len());

Modules§
- batching
- embedding - Text embedding generation with rate limiting and retrieval-aware formatting.
- reranker - Document reranking based on query relevance.
- service
Structs§
- AppConfig - Top-level application config for embedding and reranking clients.
- BatchItem - One semantic embedding item for crate::service::EmbedderService.
- EmbedOutput - Output from an embedding request.
- Embedding - Embedding client configuration.
- EmbeddingInput - Input for a single embedding request.
- RerankDocument
- RerankQuery
- Reranker - Reranker client configuration.
Enums§
- AddDecision - Batching strategy response for a newly added item.
- Dialect - Backend/runtime dialect for embedding and reranking execution.
- EmbeddingRole - Retrieval role for an embedding input.
- Error - Crate-wide error type.
- ModelFamily - Retrieval-family semantics used to format embedding and reranking inputs.
- Tokenizer - Preloaded tokenizer instances used by the embedding model layer.
Traits§
- BatchingStrategy - Strategy interface for token-aware batch assembly.
- EmbeddingProvider - Trait for embedding providers.
- RerankingProvider - Trait for reranking providers.
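Token-aware batch assembly of the kind BatchingStrategy and AddDecision describe amounts to a budget loop: each added item either fits the current batch or starts a new one. A minimal standalone sketch (all names and the budget rule here are assumptions, not the crate's actual API):

```rust
// Assumed sketch of a token-budget batching decision, not the
// crate's BatchingStrategy trait.
#[derive(Debug, PartialEq)]
enum AddDecision {
    Accepted,      // item fits in the current batch
    StartNewBatch, // item overflows the budget; begin a fresh batch with it
}

struct TokenBudget {
    max_tokens: usize,
    used: usize,
}

impl TokenBudget {
    fn add(&mut self, token_count: usize) -> AddDecision {
        if self.used + token_count > self.max_tokens {
            // Overflow: the item becomes the first member of a new batch.
            self.used = token_count;
            AddDecision::StartNewBatch
        } else {
            self.used += token_count;
            AddDecision::Accepted
        }
    }
}
```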
Type Aliases§
- ProviderDialect - Backwards-compatible alias for the previous public name.
- Result - Crate-wide result alias.