Crate thryd

Thryd - A lightweight, embedded LLM request router with caching.

Thryd is a Rust library that routes requests across multiple LLM providers behind a single, unified interface. Persistent caching, token usage tracking, rate limiting, and load balancing are built in, so callers do not have to manage provider quotas or token counts themselves.

§Key Features

Multi-provider routing: Support for OpenAI-compatible APIs and custom providers
Intelligent caching: Persistent request caching with automatic deduplication
Token tracking: Accurate token counting using tiktoken
Rate limiting: Configurable RPM (requests per minute) and TPM (tokens per minute) quotas
Load balancing: Multiple routing strategies (round-robin, least-loaded, first-available)
Async-first: Built on Tokio for high-performance concurrent requests
Extensible: Easy to add new providers and model types
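
The three routing strategies named in the list above can be pictured with a small, self-contained sketch. This is not thryd's implementation: `Strategy`, `Slot`, and `pick` are hypothetical names invented for illustration, and the real selection logic in the `route` module may differ.

```rust
/// Hypothetical strategy names mirroring the three listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    RoundRobin,
    LeastLoaded,
    FirstAvailable,
}

/// Hypothetical stand-in for a deployment: current load plus availability.
#[derive(Debug, Clone)]
pub struct Slot {
    pub name: &'static str,
    pub in_flight: u32,
    pub available: bool,
}

/// Returns the index of the slot to use, or `None` if no slot is available.
/// `cursor` is the round-robin position; only `RoundRobin` advances it.
pub fn pick(strategy: Strategy, slots: &[Slot], cursor: &mut usize) -> Option<usize> {
    if slots.is_empty() {
        return None;
    }
    match strategy {
        // Take the first slot that can accept work.
        Strategy::FirstAvailable => slots.iter().position(|s| s.available),
        // Take the available slot with the fewest in-flight requests.
        Strategy::LeastLoaded => slots
            .iter()
            .enumerate()
            .filter(|(_, s)| s.available)
            .min_by_key(|(_, s)| s.in_flight)
            .map(|(i, _)| i),
        // Scan at most one full cycle, starting at the cursor.
        Strategy::RoundRobin => {
            let n = slots.len();
            for offset in 0..n {
                let i = (*cursor + offset) % n;
                if slots[i].available {
                    *cursor = (i + 1) % n;
                    return Some(i);
                }
            }
            None
        }
    }
}
```

In this sketch, round-robin skips unavailable slots rather than failing, which is one plausible way to combine rotation with rate limits; whether thryd does the same is an assumption.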

§Crate Organization

The main public modules are:

Module	Description
`cache`	Persistent request caching backed by redb
`connections`	HTTP client connection management and pooling
`constants`	Rate limiting and configuration constants
`error`	Error types and result handling
`models`	LLM model definitions and implementations
`provider`	Provider implementations and factory functions
`route`	Request routing and load balancing logic
`tracker`	Token usage tracking and rate limiting

§Usage Example

The following example creates a provider, mounts a persistent cache, deploys a model with rate limits, and sends a completion request:

use thryd::*;
use secrecy::SecretString;
use std::sync::Arc;
use std::path::PathBuf;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create a provider
    let api_key = SecretString::from("your-api-key-here".to_string());
    let openai_provider = Arc::new(OpenaiCompatible::openai(api_key));

    // 2. Create a router for completions
    let mut router = Router::<CompletionTag>::default();

    // 3. Mount persistent cache (optional but recommended)
    router.mount_cache(PathBuf::from("./llm-cache.db"))?;

    // 4. Add the provider to the router
    router.add_provider(openai_provider)?;

    // 5. Deploy a model with rate limits
    router.deploy(
        "default".to_string(),
        "openai::gpt-4".to_string(),
        Some(60),          // 60 requests per minute
        Some(100_000),     // 100,000 tokens per minute
    )?;

    // 6. Create and make a request
    let request = CompletionRequest {
        message: "Explain the difference between async and sync programming.".to_string(),
        top_p: 0.9,
        temperature: 0.7,
        stream: false,
        max_completion_tokens: 200,
        presence_penalty: 0.0,
        frequency_penalty: 0.0,
    };

    // First request hits the API
    let response = router.invoke("default".to_string(), request).await?;
    println!("Response: {}", response);

    // Subsequent identical requests are served from cache

    Ok(())
}
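
The closing comment in the example, that identical requests are served from cache, can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: `SketchRequest`, `cache_key`, and `MemoCache` are hypothetical, thryd's real cache is persistent and backed by redb, and its actual key derivation is not documented on this page.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, trimmed-down request used only in this sketch.
pub struct SketchRequest {
    pub message: String,
    pub temperature: f32,
    pub max_completion_tokens: u32,
}

/// Derives a lookup key from the deployment name and the request fields.
/// `f32` does not implement `Hash`, so its bit pattern is hashed instead.
pub fn cache_key(deployment: &str, req: &SketchRequest) -> u64 {
    let mut hasher = DefaultHasher::new();
    deployment.hash(&mut hasher);
    req.message.hash(&mut hasher);
    req.temperature.to_bits().hash(&mut hasher);
    req.max_completion_tokens.hash(&mut hasher);
    hasher.finish()
}

/// In-memory memoization table that counts how often the backend is hit.
pub struct MemoCache {
    entries: HashMap<u64, String>,
    pub backend_calls: u32,
}

impl MemoCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), backend_calls: 0 }
    }

    /// Returns the cached value for `key`, or runs `compute` once and stores it.
    pub fn get_or_compute<F: FnOnce() -> String>(&mut self, key: u64, compute: F) -> String {
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.backend_calls += 1;
        let value = compute();
        self.entries.insert(key, value.clone());
        value
    }
}
```

Two requests with the same fields produce the same key, so the second call returns the stored response without invoking the closure. `DefaultHasher` output is not guaranteed to be stable across Rust releases and a 64-bit key can collide, so a persistent cache would need a stable, collision-resistant key; the sketch ignores both concerns.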

§Architecture Overview

§Providers

Providers represent LLM API endpoints. The main provider types are:

OpenaiCompatible - Works with OpenAI API and compatible services
DummyProvider - For testing without API calls

§Models

Models represent specific LLM instances. The main traits are:

CompletionModel trait - For text generation tasks
EmbeddingModel trait - For text embedding tasks

§Deployments

Deployments wrap models with usage tracking and rate limiting. They are created via Router::deploy.

§Routers

Routers manage multiple deployments and route requests based on configured strategies:

Router<CompletionTag> - For completion/chat requests
Router<EmbeddingTag> - For embedding requests
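
Parameterizing one router type by a tag, as `Router<CompletionTag>` and `Router<EmbeddingTag>` do, is commonly achieved with a zero-sized marker type. The sketch below shows that general pattern under stated assumptions: the `Tag` trait, the `Sketch*` types, and the echo-style handlers are hypothetical and do not reflect thryd's actual definitions.

```rust
use std::marker::PhantomData;

/// Hypothetical trait tying a tag to its request and response types.
pub trait Tag {
    type Request;
    type Response;
    /// Stand-in for a real backend call.
    fn handle(request: Self::Request) -> Self::Response;
}

/// Zero-sized markers; they carry no data and exist only at the type level.
pub struct SketchCompletionTag;
pub struct SketchEmbeddingTag;

impl Tag for SketchCompletionTag {
    type Request = String;
    type Response = String;
    fn handle(request: String) -> String {
        format!("echo: {}", request)
    }
}

impl Tag for SketchEmbeddingTag {
    type Request = Vec<String>;
    type Response = Vec<Vec<f32>>;
    fn handle(request: Vec<String>) -> Vec<Vec<f32>> {
        // Fake one-dimensional "embedding": the byte length of each text.
        request.iter().map(|text| vec![text.len() as f32]).collect()
    }
}

/// One router implementation, specialized per tag at compile time.
pub struct SketchRouter<T: Tag> {
    deployments: Vec<String>,
    _tag: PhantomData<T>,
}

impl<T: Tag> SketchRouter<T> {
    pub fn new() -> Self {
        Self { deployments: Vec::new(), _tag: PhantomData }
    }

    pub fn deploy(&mut self, name: &str) {
        self.deployments.push(name.to_string());
    }

    /// Returns `None` when no deployment with that name exists.
    pub fn invoke(&self, name: &str, request: T::Request) -> Option<T::Response> {
        if self.deployments.iter().any(|d| d == name) {
            Some(T::handle(request))
        } else {
            None
        }
    }
}
```

With this shape, passing an embedding request to a completion router is rejected by the compiler rather than at runtime, which is the usual motivation for tag types.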

§Feature Flags

pyo3: Enables Python bindings via PyO3
stubgen: Generates Python type stubs for better IDE support

§Rate Limiting

Thryd uses a sliding window algorithm for rate limiting. The constants BUCKET_COUNT and BUCKETS_WINDOW_S in the constants module control the granularity of rate limit tracking.
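
A bucketed sliding window of the kind described above can be sketched as follows. The constant values, the `SlidingWindow` type, and its methods are hypothetical; the actual values and types of BUCKET_COUNT and BUCKETS_WINDOW_S, and the way thryd's tracker uses them, are not shown on this page.

```rust
/// Hypothetical values chosen for the sketch, not thryd's real constants.
const SKETCH_BUCKET_COUNT: usize = 6;
const SKETCH_WINDOW_S: u64 = 60;

/// Counts usage (requests or tokens) in fixed-width buckets covering the window.
pub struct SlidingWindow {
    buckets: [u64; SKETCH_BUCKET_COUNT],
    /// Absolute index (time / bucket width) of the newest bucket.
    head: u64,
    limit: u64,
}

impl SlidingWindow {
    pub fn new(limit: u64) -> Self {
        Self { buckets: [0; SKETCH_BUCKET_COUNT], head: 0, limit }
    }

    fn bucket_width_s() -> u64 {
        SKETCH_WINDOW_S / SKETCH_BUCKET_COUNT as u64
    }

    /// Zeroes every bucket that has slid out of the window as of `now_s`.
    fn advance(&mut self, now_s: u64) {
        let current = now_s / Self::bucket_width_s();
        if current <= self.head {
            return;
        }
        // Clearing more than one full cycle would only repeat work.
        let steps = (current - self.head).min(SKETCH_BUCKET_COUNT as u64);
        for i in 1..=steps {
            let idx = ((self.head + i) % SKETCH_BUCKET_COUNT as u64) as usize;
            self.buckets[idx] = 0;
        }
        self.head = current;
    }

    /// Records `amount` at `now_s` if it fits under the limit.
    /// Returns whether the usage was admitted. Assumes `now_s` never decreases.
    pub fn try_acquire(&mut self, now_s: u64, amount: u64) -> bool {
        self.advance(now_s);
        let used: u64 = self.buckets.iter().sum();
        if used + amount > self.limit {
            return false;
        }
        let idx = (self.head % SKETCH_BUCKET_COUNT as u64) as usize;
        self.buckets[idx] += amount;
        true
    }
}
```

More buckets give finer granularity at the cost of more state: with six buckets over 60 seconds, usage expires in 10-second steps rather than continuously. Timestamps are passed in explicitly so the sketch stays deterministic; a real tracker would read a clock.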

Re-exports§

pub use error::Result;
pub use error::ThrydError;
pub use provider::ProviderType;
pub use provider::create_provider;
pub use tracker::UsageTracker;
pub use tracker::count_token;
pub use cache::*;
pub use constants::*;
pub use models::dummy::*;
pub use models::openai::*;
pub use provider::dummy::*;
pub use provider::openai::*;
pub use route::*;

Modules§

cache
connections
constants
deployment: Deployment wrapper with usage tracking and rate limiting.
error: Error types for the Thryd system.
models: Concrete model implementations for LLM providers.
provider
route: Request Routing System
tracker
utils: Internal utility functions for the Thryd system.

Structs§

CompletionRequest: Model-related re-exports.
EmbeddingRequest: Model-related re-exports.
RerankerRequest: Model-related re-exports.

Traits§

CompletionModel: Model-related re-exports.
EmbeddingModel: Model-related re-exports.
Model: Model-related re-exports.
RerankerModel: Model-related re-exports.

Type Aliases§

Completion: Model-related re-exports.
Embedding: Model-related re-exports.
Embeddings: Model-related re-exports.
Ranking: Model-related re-exports.

Attribute Macros§

async_trait
