# seasoning
Retrieval-focused embedding and reranking infrastructure for Rust.
## What this gives you
- Semantic embedding inputs with explicit query/document roles.
- Model-family-aware formatting for Gemma and Qwen3 retrieval models.
- Remote OpenAI-compatible and DeepInfra backends.
- Optional local llama.cpp execution for Hugging Face GGUF models.
- Rate limiting, retry handling, and async public APIs.
## Semantic model

Seasoning separates backend dialect from model-family behavior:

- `Dialect::{OpenAI, DeepInfra, LlamaCpp}` chooses the backend/runtime.
- `ModelFamily::{Gemma, Qwen3}` chooses retrieval formatting semantics.
- `EmbeddingRole::{Query, Document}` tells the crate how to format each embedding input.
You pass semantic embedding inputs. Seasoning renders the model-specific text and tokenizes it internally before execution.
For example:

- Gemma queries become `task: <task> | query: <text>`.
- Gemma documents become `title: <title-or-none> | text: <text>`.
- Qwen3 queries become `Instruct: <instruction>\nQuery: <text>`.
- Qwen3 documents stay plain text unless a title is present, in which case the title is prepended on its own line.
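The formatting rules above can be sketched as plain string rendering. These helper functions are illustrative only, not part of the seasoning API; the crate performs this rendering internally when you pass `EmbeddingInput` values.

```rust
// Illustrative renderers mirroring the documented formats; seasoning
// does this internally, so you never call anything like these directly.

/// Gemma query: `task: <task> | query: <text>`
fn format_gemma_query(task: &str, text: &str) -> String {
    format!("task: {task} | query: {text}")
}

/// Gemma document: `title: <title-or-none> | text: <text>`
fn format_gemma_document(title: Option<&str>, text: &str) -> String {
    format!("title: {} | text: {}", title.unwrap_or("none"), text)
}

/// Qwen3 query: `Instruct: <instruction>\nQuery: <text>`
fn format_qwen3_query(instruction: &str, text: &str) -> String {
    format!("Instruct: {instruction}\nQuery: {text}")
}

/// Qwen3 document: plain text, with the title prepended on its own line if present.
fn format_qwen3_document(title: Option<&str>, text: &str) -> String {
    match title {
        Some(t) => format!("{t}\n{text}"),
        None => text.to_string(),
    }
}

fn main() {
    println!("{}", format_gemma_query("retrieval", "rust borrow checker"));
    println!("{}", format_gemma_document(None, "Ownership rules"));
}
```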
## Install

Remote-only usage:

```toml
[dependencies]
seasoning = { path = "." }
```

Local llama.cpp usage:

```toml
[dependencies]
seasoning = { path = ".", features = ["local"] }
```

Accelerator passthrough features imply `local`:

```toml
[dependencies]
seasoning = { path = ".", features = ["cuda"] }
```
## Usage

### Embeddings

```rust
use std::sync::Arc;
use std::time::Duration;

use secrecy::SecretString;
use seasoning::EmbeddingProvider;
use seasoning::embedding::{
    Client as EmbedClient, Dialect, EmbedderConfig, EmbeddingInput, EmbeddingRole, ModelFamily,
    RemoteEmbedderConfig, Tokenizer,
};

# async fn example() -> seasoning::Result<()> {
let embedder = EmbedClient::new(EmbedderConfig::remote(
    ModelFamily::Qwen3,
    Tokenizer::Tiktoken {
        encoding: "cl100k_base".to_string(),
        tokenizer: Arc::new(tiktoken_rs::cl100k_base()?),
    },
    "Qwen/Qwen3-Embedding-0.6B",
    Some("Given a user query, retrieve matching passages".to_string()),
    RemoteEmbedderConfig {
        api_key: Some(SecretString::from("YOUR_API_KEY")),
        base_url: "https://api.deepinfra.com/v1/openai".to_string(),
        timeout: Duration::from_secs(10),
        dialect: Dialect::DeepInfra,
        embedding_dim: 1024,
        requests_per_minute: 1000,
        max_concurrent_requests: 50,
        tokens_per_minute: 1_000_000,
    },
)?)?;

let inputs = vec![EmbeddingInput {
    role: EmbeddingRole::Query,
    text: "memory safety without garbage collection".to_string(),
    title: None,
    token_count: 6,
}];

let result = embedder.embed(&inputs).await?;
println!("got {} embeddings", result.embeddings.len());
# Ok(())
# }
```
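Once embeddings come back, query and document vectors are typically compared with cosine similarity for retrieval. This stdlib-only helper is a sketch, not part of the seasoning API:

```rust
// Cosine similarity between two embedding vectors. Assumes equal length
// (e.g. both of the configured embedding_dim) and nonzero norms.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy 3-dimensional vectors standing in for real embeddings.
    let query = [1.0_f32, 0.0, 1.0];
    let doc = [1.0_f32, 1.0, 1.0];
    println!("{:.3}", cosine_similarity(&query, &doc));
}
```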
### Reranking

```rust
use std::time::Duration;

use secrecy::SecretString;
use seasoning::RerankingProvider;
use seasoning::embedding::{Dialect, ModelFamily};
use seasoning::reranker::{Client as RerankerClient, RerankerConfig};

# async fn example() -> seasoning::Result<()> {
let reranker = RerankerClient::new(RerankerConfig {
    api_key: Some(SecretString::from("YOUR_API_KEY")),
    base_url: "https://api.deepinfra.com/v1".to_string(),
    timeout: Duration::from_secs(10),
    dialect: Dialect::DeepInfra,
    model_family: ModelFamily::Qwen3,
    model: "Qwen/Qwen3-Reranker-0.6B".to_string(),
    instruction: None,
    requests_per_minute: 1000,
    max_concurrent_requests: 50,
    tokens_per_minute: 1_000_000,
})?;

let query = seasoning::RerankQuery {
    text: "memory-safe systems programming".to_string(),
    token_count: 4,
};
let docs = vec![
    seasoning::RerankDocument {
        text: "Rust offers ownership and borrowing".to_string(),
        token_count: 6,
    },
    seasoning::RerankDocument {
        text: "Python emphasizes developer ergonomics".to_string(),
        token_count: 5,
    },
];

let scores = reranker.rerank(&query, &docs).await?;
println!("{scores:?}");
# Ok(())
# }
```
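To turn the scores into a ranking, you can sort document indices by score. This sketch assumes the returned scores align index-wise with the input documents (an assumption about `rerank`; the helper itself is not part of seasoning):

```rust
// Return document indices ordered best-first by rerank score.
// Assumes scores[i] corresponds to docs[i] and scores contain no NaN.
fn rank_by_score(scores: &[f32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap());
    idx
}

fn main() {
    let scores = [0.12_f32, 0.87, 0.45];
    let order = rank_by_score(&scores);
    println!("{order:?}"); // best match first: [1, 2, 0]
}
```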
### Local llama.cpp embeddings and reranking

```rust
use std::sync::Arc;
use std::time::Duration;

use seasoning::embedding::{Client as EmbedClient, Dialect, EmbedderConfig, ModelFamily, Tokenizer};
use seasoning::reranker::{Client as RerankerClient, RerankerConfig};

let embedder = EmbedClient::new(EmbedderConfig::local(
    ModelFamily::Gemma,
    Tokenizer::HuggingFace {
        model_id: "google/embeddinggemma-300m".to_string(),
        tokenizer: Arc::new(tokenizers::Tokenizer::from_pretrained(
            "google/embeddinggemma-300m",
            None,
        )?),
    },
    "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf",
    None,
))?;

let reranker = RerankerClient::new(RerankerConfig {
    api_key: None,
    base_url: String::new(),
    timeout: Duration::from_secs(30),
    dialect: Dialect::LlamaCpp,
    model_family: ModelFamily::Qwen3,
    model: "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf".to_string(),
    instruction: None,
    requests_per_minute: 1,
    max_concurrent_requests: 1,
    tokens_per_minute: 1_000_000,
})?;
# let _ = (embedder, reranker);
# Ok::<(), seasoning::Error>(())
```
Example local GGUF model ids:

- `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf`
- `hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf`
- `hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf`

When a local `hf:` GGUF artifact needs to be fetched from Hugging Face, download progress is enabled by default. You can control it with environment variables:

- `SEASONING_HF_HUB_PROGRESS=0|1|false|true|off|on`
- `HF_HUB_DISABLE_PROGRESS_BARS=1|0|true|false|off|on`

If both are set, `SEASONING_HF_HUB_PROGRESS` wins.
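A plausible stdlib-only sketch of how those two variables could combine, under the assumptions that each accepts the value set listed above and that `HF_HUB_DISABLE_PROGRESS_BARS` has inverted polarity (setting it truthy *disables* progress). The crate's actual parsing is not shown here, so treat this as an illustration of the documented precedence, not its implementation:

```rust
// Parse the documented flag values; anything else is treated as unset.
fn parse_flag(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "on" => Some(true),
        "0" | "false" | "off" => Some(false),
        _ => None,
    }
}

// Documented precedence: SEASONING_HF_HUB_PROGRESS wins if both are set;
// progress is enabled by default when neither variable is set.
fn progress_enabled(seasoning_var: Option<&str>, hf_disable_var: Option<&str>) -> bool {
    if let Some(v) = seasoning_var.and_then(parse_flag) {
        return v;
    }
    if let Some(disabled) = hf_disable_var.and_then(parse_flag) {
        return !disabled; // the HF variable *disables* progress when truthy
    }
    true
}

fn main() {
    // SEASONING_HF_HUB_PROGRESS=off wins over HF_HUB_DISABLE_PROGRESS_BARS=0.
    println!("{}", progress_enabled(Some("off"), Some("0")));
}
```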
## Notes

- Build with the `local` feature to use `Dialect::LlamaCpp`.
- Gemma documents use `title: none | text: ...` when no title is supplied.
- Qwen3 query instructions apply only to query embeddings.
- `ModelFamily` controls embedding formatting semantics and local reranker prompt formatting.
## Modules

- `seasoning::embedding` for embeddings and retrieval formatting inputs
- `seasoning::reranker` for reranking
- `seasoning::reqwestx` for the rate-limited API client
- `seasoning::{AppConfig, Embedding, Reranker}` for config structs