seasoning 0.2.1

Embedding and reranking infrastructure with rate limiting and retry logic

seasoning

Retrieval-focused embedding and reranking infrastructure for Rust.

What this gives you

  • Semantic embedding inputs with explicit query/document roles.
  • Model-family-aware formatting for Gemma and Qwen3 retrieval models.
  • Remote OpenAI-compatible and DeepInfra backends.
  • Optional local llama.cpp execution for Hugging Face GGUF models.
  • Rate limiting, retry handling, and async public APIs.

Semantic model

Seasoning separates backend dialect from model-family behavior:

  • Dialect::{OpenAI, DeepInfra, LlamaCpp} chooses the backend/runtime.
  • ModelFamily::{Gemma, Qwen3} chooses retrieval formatting semantics.
  • EmbeddingRole::{Query, Document} tells the crate how to format each embedding input.

You pass semantic embedding inputs. Seasoning renders the model-specific text and tokenizes it internally before execution.

For example:

  • Gemma queries become task: <task> | query: <text>.
  • Gemma documents become title: <title-or-none> | text: <text>.
  • Qwen3 queries become Instruct: <instruction>\nQuery: <text>.
  • Qwen3 documents stay plain text unless a title is present, in which case the title is prepended on its own line.
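The four rules above can be sketched as one pure function. This is an illustration of the documented output shapes only, not the crate's implementation: `Family`, `Role`, and `render` are hypothetical stand-ins, and passing the Gemma task or Qwen3 instruction as a plain `&str` is an assumption made for brevity.

```rust
/// Hypothetical stand-in for the crate's `ModelFamily` (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Family {
    Gemma,
    Qwen3,
}

/// Hypothetical stand-in for the crate's `EmbeddingRole` (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    Query,
    Document,
}

/// Renders one embedding input following the formatting rules listed above.
///
/// `instruction` is the Gemma task or the Qwen3 instruction. It is ignored
/// for documents, matching the note that instructions apply only to queries.
pub fn render(
    family: Family,
    role: Role,
    text: &str,
    title: Option<&str>,
    instruction: &str,
) -> String {
    match (family, role) {
        // Gemma query: task: <task> | query: <text>
        (Family::Gemma, Role::Query) => format!("task: {instruction} | query: {text}"),
        // Gemma document: a missing title is rendered as the literal "none".
        (Family::Gemma, Role::Document) => {
            format!("title: {} | text: {text}", title.unwrap_or("none"))
        }
        // Qwen3 query: instruction and query on separate lines.
        (Family::Qwen3, Role::Query) => format!("Instruct: {instruction}\nQuery: {text}"),
        // Qwen3 document: plain text, with the title on its own line if present.
        (Family::Qwen3, Role::Document) => match title {
            Some(t) => format!("{t}\n{text}"),
            None => text.to_string(),
        },
    }
}
```

Keeping the rendering in a single `match` over `(family, role)` makes it easy to see that every combination is handled, which is the main point of separating model family from embedding role.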

Install

Remote-only usage:

[dependencies]
seasoning = "0.2.1"

Local llama.cpp usage:

[dependencies]
seasoning = { version = "0.2.1", features = ["local"] }

The accelerator passthrough features (cuda, metal, vulkan) each imply local:

[dependencies]
seasoning = { version = "0.2.1", features = ["cuda"] }
# or: ["metal"], ["vulkan"]

Usage

Embeddings

use std::sync::Arc;
use std::time::Duration;

use secrecy::SecretString;
use seasoning::EmbeddingProvider;
use seasoning::embedding::{
    Client as EmbedClient, Dialect, EmbedderConfig, EmbeddingInput, EmbeddingRole, ModelFamily,
    RemoteEmbedderConfig, Tokenizer,
};

# async fn example() -> seasoning::Result<()> {
let embedder = EmbedClient::new(EmbedderConfig::remote(
    ModelFamily::Qwen3,
    Tokenizer::Tiktoken {
        encoding: "cl100k_base".to_string(),
        tokenizer: Arc::new(tiktoken_rs::cl100k_base()?),
    },
    "Qwen/Qwen3-Embedding-0.6B",
    Some("Given a user query, retrieve matching passages".to_string()),
    RemoteEmbedderConfig {
        api_key: Some(SecretString::from("YOUR_API_KEY")),
        base_url: "https://api.deepinfra.com/v1/openai".to_string(),
        timeout: Duration::from_secs(10),
        dialect: Dialect::DeepInfra,
        embedding_dim: 1024,
        requests_per_minute: 1000,
        max_concurrent_requests: 50,
        tokens_per_minute: 1_000_000,
    },
)?)?;

let inputs = vec![EmbeddingInput {
    role: EmbeddingRole::Query,
    text: "memory safety without garbage collection".to_string(),
    title: None,
    token_count: 6,
}];
let result = embedder.embed(&inputs).await?;
println!("got {} embeddings", result.embeddings.len());
# Ok(())
# }

Reranking

use std::time::Duration;

use secrecy::SecretString;
use seasoning::RerankingProvider;
use seasoning::embedding::{Dialect, ModelFamily};
use seasoning::reranker::{Client as RerankerClient, RerankerConfig};

# async fn example() -> seasoning::Result<()> {
let reranker = RerankerClient::new(RerankerConfig {
    api_key: Some(SecretString::from("YOUR_API_KEY")),
    base_url: "https://api.deepinfra.com/v1".to_string(),
    timeout: Duration::from_secs(10),
    dialect: Dialect::DeepInfra,
    model_family: ModelFamily::Qwen3,
    model: "Qwen/Qwen3-Reranker-0.6B".to_string(),
    instruction: None,
    requests_per_minute: 1000,
    max_concurrent_requests: 50,
    tokens_per_minute: 1_000_000,
})?;

let query = seasoning::RerankQuery {
    text: "memory-safe systems programming".to_string(),
    token_count: 4,
};
let docs = vec![
    seasoning::RerankDocument {
        text: "Rust offers ownership and borrowing".to_string(),
        token_count: 6,
    },
    seasoning::RerankDocument {
        text: "Python emphasizes developer ergonomics".to_string(),
        token_count: 5,
    },
];

let scores = reranker.rerank(&query, &docs).await?;
println!("{scores:?}");
# Ok(())
# }
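After reranking, a common next step is to order the documents by relevance. The sketch below assumes the scores come back as one `f32` per document, in input order, with higher meaning more relevant. This README does not confirm the return type of `rerank`, so treat that shape as an assumption. `rank_by_score` is a hypothetical helper, not part of the crate.

```rust
/// Pairs each score with its input index and sorts by descending score.
///
/// Assumes `scores[i]` belongs to the i-th document passed to the reranker
/// and that a higher score means a more relevant document (not confirmed by
/// the README). Ties keep their input order because `sort_by` is stable.
pub fn rank_by_score(scores: &[f32]) -> Vec<(usize, f32)> {
    let mut ranked: Vec<(usize, f32)> = scores.iter().copied().enumerate().collect();
    // `total_cmp` gives a total order over f32, so NaN cannot panic the sort.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

The returned indices can then be used to look up the original `RerankDocument` values in the `docs` vector.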

Local llama.cpp embeddings and reranking

use std::sync::Arc;
use std::time::Duration;

use seasoning::embedding::{Client as EmbedClient, Dialect, EmbedderConfig, ModelFamily, Tokenizer};
use seasoning::reranker::{Client as RerankerClient, RerankerConfig};

let embedder = EmbedClient::new(EmbedderConfig::local(
    ModelFamily::Gemma,
    Tokenizer::HuggingFace {
        model_id: "google/embeddinggemma-300m".to_string(),
        tokenizer: Arc::new(tokenizers::Tokenizer::from_pretrained("google/embeddinggemma-300m", None)?),
    },
    "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf",
    None,
))?;

let reranker = RerankerClient::new(RerankerConfig {
    api_key: None,
    base_url: String::new(),
    timeout: Duration::from_secs(30),
    dialect: Dialect::LlamaCpp,
    model_family: ModelFamily::Qwen3,
    model: "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf".to_string(),
    instruction: None,
    requests_per_minute: 1,
    max_concurrent_requests: 1,
    tokens_per_minute: 1_000_000,
})?;
# let _ = (embedder, reranker);
# Ok::<(), seasoning::Error>(())

Example local GGUF model ids:

  • hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf
  • hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf
  • hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf

When a local hf: GGUF artifact needs to be fetched from Hugging Face, download progress is enabled by default. You can control it with environment variables:

  • SEASONING_HF_HUB_PROGRESS=0|1|false|true|off|on
  • HF_HUB_DISABLE_PROGRESS_BARS=1|0|true|false|off|on

If both are set, SEASONING_HF_HUB_PROGRESS wins.
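The precedence between the two download-progress variables in this section can be sketched as follows. This is a minimal illustration, not the crate's code: `parse_flag`, `progress_enabled`, and `progress_enabled_from_env` are hypothetical names, and the case-insensitive matching and the choice to ignore unrecognized values are assumptions.

```rust
/// Parses the flag spellings listed in this section.
/// Assumption: matching is case-insensitive and unknown values yield `None`.
fn parse_flag(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "on" => Some(true),
        "0" | "false" | "off" => Some(false),
        _ => None,
    }
}

/// Decides whether download progress is shown.
///
/// `seasoning_progress` is the value of SEASONING_HF_HUB_PROGRESS and
/// `hf_disable` is the value of HF_HUB_DISABLE_PROGRESS_BARS, if set.
pub fn progress_enabled(seasoning_progress: Option<&str>, hf_disable: Option<&str>) -> bool {
    // The crate-specific variable wins whenever it holds a recognized value.
    if let Some(enabled) = seasoning_progress.and_then(parse_flag) {
        return enabled;
    }
    // The Hugging Face variable has inverted meaning: true disables progress.
    if let Some(disabled) = hf_disable.and_then(parse_flag) {
        return !disabled;
    }
    // Progress is enabled by default.
    true
}

/// Reads both variables from the process environment.
pub fn progress_enabled_from_env() -> bool {
    let seasoning = std::env::var("SEASONING_HF_HUB_PROGRESS").ok();
    let hf = std::env::var("HF_HUB_DISABLE_PROGRESS_BARS").ok();
    progress_enabled(seasoning.as_deref(), hf.as_deref())
}
```

Splitting the decision from the environment lookup keeps the precedence logic easy to test without mutating process-wide state.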

Notes

  • Build with the local feature to use Dialect::LlamaCpp.
  • Gemma documents use title: none | text: ... when no title is supplied.
  • Qwen3 query instructions apply only to query embeddings.
  • ModelFamily controls embedding formatting semantics and local reranker prompt formatting.

Modules

  • seasoning::embedding for embeddings and retrieval formatting inputs
  • seasoning::reranker for reranking
  • seasoning::reqwestx for the rate-limited API client
  • seasoning::{AppConfig, Embedding, Reranker} for config structs