# seasoning
Retrieval-focused embedding and reranking infrastructure for Rust.
## What this gives you
- Semantic embedding inputs with explicit query/document roles.
- Model-family-aware formatting for Gemma and Qwen3 retrieval models.
- Remote OpenAI-compatible and DeepInfra backends.
- Feature-gated local llama.cpp execution for a scoped set of Hugging Face GGUF models.
- Rate limiting, retry handling, and async public APIs.
## Semantic model
Seasoning now separates backend dialect from model-family behavior:
- `Dialect::{OpenAI, DeepInfra, LlamaCpp}` chooses the transport/runtime (config parsing accepts `llama.cpp`, `llamacpp`, `llama-cpp`, or `llama_cpp`).
- `ModelFamily::{Gemma, Qwen3}` chooses retrieval formatting semantics.
- `EmbeddingRole::{Query, Document}` tells the crate how to format each semantic embedding input.
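The four accepted local-dialect spellings can be normalized with a simple case-insensitive match. A minimal sketch, assuming nothing about the crate's internals — the function name and canonical strings below are illustrative, not seasoning's API:

```rust
/// Illustrative only: maps the documented dialect spellings to a
/// canonical name; this is not seasoning's actual config parser.
fn normalize_dialect(raw: &str) -> Option<&'static str> {
    match raw.to_ascii_lowercase().as_str() {
        "openai" => Some("openai"),
        "deepinfra" => Some("deepinfra"),
        // all four documented spellings resolve to the same local dialect
        "llama.cpp" | "llamacpp" | "llama-cpp" | "llama_cpp" => Some("llamacpp"),
        _ => None,
    }
}

fn main() {
    assert_eq!(normalize_dialect("llama.cpp"), Some("llamacpp"));
    assert_eq!(normalize_dialect("LLAMA_CPP"), Some("llamacpp"));
    assert_eq!(normalize_dialect("unknown"), None);
    println!("ok");
}
```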
Callers render semantic inputs first, then tokenize the rendered payload with the tokenizer for the target embedding model before execution.
For example:

- Gemma queries become `task: <task> | query: <text>`.
- Gemma documents become `title: <title-or-none> | text: <text>`.
- Qwen3 queries become `Instruct: <instruction>\nQuery: <text>`.
- Qwen3 documents stay plain text unless a title is present, in which case the title is prepended on its own line.
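The templates above can be exercised with plain string formatting. A sketch of the documented rendering rules — the function names are illustrative; the crate drives this through `ModelFamily` and `EmbeddingRole` rather than free functions:

```rust
// Illustrative renderers for the documented retrieval templates.
fn gemma_query(task: &str, text: &str) -> String {
    format!("task: {task} | query: {text}")
}

fn gemma_document(title: Option<&str>, text: &str) -> String {
    // Gemma documents use "none" when no title is supplied.
    format!("title: {} | text: {}", title.unwrap_or("none"), text)
}

fn qwen3_query(instruction: &str, text: &str) -> String {
    format!("Instruct: {instruction}\nQuery: {text}")
}

fn qwen3_document(title: Option<&str>, text: &str) -> String {
    // Plain text unless a title exists; then the title gets its own line.
    match title {
        Some(t) => format!("{t}\n{text}"),
        None => text.to_string(),
    }
}

fn main() {
    assert_eq!(
        gemma_query("retrieval", "rust rerankers"),
        "task: retrieval | query: rust rerankers"
    );
    assert_eq!(gemma_document(None, "body"), "title: none | text: body");
    assert_eq!(qwen3_query("Find docs", "q"), "Instruct: Find docs\nQuery: q");
    assert_eq!(qwen3_document(Some("Title"), "body"), "Title\nbody");
    println!("ok");
}
```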
## Install
Remote-only usage:
```toml
[dependencies]
seasoning = { version = "..." }
```
Local llama.cpp usage:
```toml
[dependencies]
seasoning = { version = "...", features = ["local"] }
```
Accelerator passthrough features imply local:
```toml
[dependencies]
seasoning = { version = "...", features = ["cuda"] }
# or: ["metal"], ["vulkan"]
```
## Usage
### Embeddings

```rust
use std::time::Duration;
use secrecy::SecretString;
use seasoning::embedding::EmbeddingProvider;
// ... (remaining imports and the async example body were elided)
```
### Reranking

```rust
use std::time::Duration;
use secrecy::SecretString;
use seasoning::reranker::RerankingProvider;
// ... (remaining imports and the async example body were elided)
```
### Local llama.cpp embeddings and reranking

```rust
use std::time::Duration;

// Type names and constructor arguments were elided from this example:
// let embedder = ...::new(...)?;
// let reranker = ...::new(...)?;
```
Supported local GGUF artifacts are intentionally narrow for this change, and unsupported local models fail during client construction. Config-driven setups may spell the local dialect as `llama.cpp`, `llamacpp`, `llama-cpp`, or `llama_cpp`.
When a local `hf:` GGUF artifact needs to be fetched from Hugging Face, download progress is enabled by default. You can control it with environment variables:
- `SEASONING_HF_HUB_PROGRESS=0|1|false|true|off|on`
- `HF_HUB_DISABLE_PROGRESS_BARS=1|0|true|false|off|on`

If both are set, `SEASONING_HF_HUB_PROGRESS` wins.
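The precedence between the two variables can be sketched as follows. The helper names are illustrative, not the crate's API; note that `HF_HUB_DISABLE_PROGRESS_BARS` is inverted (a truthy value means progress is *off*):

```rust
use std::env;

// Illustrative only: which values count as "truthy" per the documented sets.
fn truthy(v: &str) -> bool {
    matches!(v.to_ascii_lowercase().as_str(), "1" | "true" | "on")
}

// Decides whether download progress should be shown, mirroring the
// documented precedence: the seasoning-specific variable wins.
fn progress_enabled() -> bool {
    if let Ok(v) = env::var("SEASONING_HF_HUB_PROGRESS") {
        return truthy(&v);
    }
    // Inverted semantics: truthy means "disable progress bars".
    if let Ok(v) = env::var("HF_HUB_DISABLE_PROGRESS_BARS") {
        return !truthy(&v);
    }
    true // progress is enabled by default
}

fn main() {
    env::set_var("SEASONING_HF_HUB_PROGRESS", "off");
    env::set_var("HF_HUB_DISABLE_PROGRESS_BARS", "0");
    // The seasoning-specific variable takes precedence when both are set.
    assert!(!progress_enabled());

    env::remove_var("SEASONING_HF_HUB_PROGRESS");
    env::remove_var("HF_HUB_DISABLE_PROGRESS_BARS");
    assert!(progress_enabled()); // default: enabled
    println!("ok");
}
```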
Supported model ids:
- `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf`
- `hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf`
- `hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf`
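The narrow allow-list behavior — unsupported local models failing at client construction — can be sketched as a membership check over the documented ids. The constant and function below are illustrative; the real check lives inside seasoning's client constructors:

```rust
// Illustrative only: the three locally supported GGUF model ids.
const SUPPORTED_GGUF: &[&str] = &[
    "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf",
    "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf",
    "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf",
];

// Fails early, mirroring "unsupported local models fail during
// client construction".
fn check_local_model(id: &str) -> Result<(), String> {
    if SUPPORTED_GGUF.contains(&id) {
        Ok(())
    } else {
        Err(format!("unsupported local GGUF model: {id}"))
    }
}

fn main() {
    assert!(check_local_model(SUPPORTED_GGUF[0]).is_ok());
    assert!(check_local_model("hf:someone/other-model.gguf").is_err());
    println!("ok");
}
```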
## Notes

- Local `Dialect::LlamaCpp` construction fails explicitly when the crate is built without the `local` feature.
- Gemma document formatting uses `title: none | text: ...` when no title is supplied.
- Qwen3 query instructions apply only to query embeddings; document embeddings ignore them.
- Embedding execution consumes `PreparedEmbeddingInput`; semantic rendering happens before tokenization.
- Retrieval semantics come from `ModelFamily` rather than transport labels.
## Modules

- `seasoning::embedding` for embeddings and retrieval formatting inputs
- `seasoning::reranker` for reranking
- `seasoning::reqwestx` for the rate-limited API client
- `seasoning::config` for config structs (no I/O)