Crate thryd

Thryd - A lightweight, embedded LLM request router with caching.

Thryd is a Rust library that routes requests to multiple LLM providers through a unified interface. It handles caching, token usage tracking, rate limiting, and load balancing automatically.

§Key Features

  • Multi-provider routing: Support for OpenAI-compatible APIs and custom providers
  • Intelligent caching: Persistent request caching with automatic deduplication
  • Token tracking: Accurate token counting using tiktoken
  • Rate limiting: Configurable RPM (requests per minute) and TPM (tokens per minute) quotas
  • Load balancing: Multiple routing strategies (round-robin, least-loaded, first-available)
  • Async-first: Built on Tokio for high-performance concurrent requests
  • Extensible: Easy to add new providers and model types

§Crate Organization

The crate is organized into the following public modules:

Module       Description
cache        Persistent request caching backed by redb
connections  HTTP client connection management and pooling
constants    Rate limiting and configuration constants
error        Error types and result handling
models       LLM model definitions and implementations
provider     Provider implementations and factory functions
route        Request routing and load balancing logic
tracker      Token usage tracking and rate limiting

§Usage Example

The following example creates a provider, mounts a persistent cache, deploys a rate-limited model, and sends a completion request:

use thryd::*;
use secrecy::SecretString;
use std::sync::Arc;
use std::path::PathBuf;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create a provider
    let api_key = SecretString::from("your-api-key-here".to_string());
    let openai_provider = Arc::new(OpenaiCompatible::openai(api_key));

    // 2. Create a router for completions
    let mut router = Router::<CompletionTag>::default();

    // 3. Mount persistent cache (optional but recommended)
    router.mount_cache(PathBuf::from("./llm-cache.db"))?;

    // 4. Add the provider to the router
    router.add_provider(openai_provider)?;

    // 5. Deploy a model with rate limits
    router.deploy(
        "default".to_string(),
        "openai::gpt-4".to_string(),
        Some(60),          // 60 requests per minute
        Some(100_000),     // 100,000 tokens per minute
    )?;

    // 6. Create and make a request
    let request = CompletionRequest {
        message: "Explain the difference between async and sync programming.".to_string(),
        top_p: 0.9,
        temperature: 0.7,
        stream: false,
        max_completion_tokens: 200,
        presence_penalty: 0.0,
        frequency_penalty: 0.0,
    };

    // First request hits the API
    let response = router.invoke("default".to_string(), request).await?;
    println!("Response: {}", response);

    // Subsequent identical requests are served from cache

    Ok(())
}
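
The cache behaviour noted in the example (identical requests served from cache) can be illustrated with a small standalone sketch. This is not thryd's implementation: cache_key, Cache, and get_or_fetch are hypothetical names, an in-memory HashMap stands in for the redb-backed store, and the choice of which request fields feed the key is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache key: the model name plus request fields that affect the output.
/// Note: `DefaultHasher` output is not guaranteed stable across Rust releases,
/// so a persistent cache would need a stable hash function instead.
pub fn cache_key(model: &str, message: &str, temperature: f32, top_p: f32) -> u64 {
    let mut hasher = DefaultHasher::new();
    model.hash(&mut hasher);
    message.hash(&mut hasher);
    // Floats do not implement `Hash`, so hash their bit patterns.
    temperature.to_bits().hash(&mut hasher);
    top_p.to_bits().hash(&mut hasher);
    hasher.finish()
}

/// In-memory stand-in for a persistent store (assumption, for illustration only).
pub struct Cache {
    entries: HashMap<u64, String>,
    /// Number of lookups that had to call the provider.
    pub misses: u32,
}

impl Cache {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    /// Returns the cached response, or calls `fetch` once and stores its result.
    pub fn get_or_fetch(&mut self, key: u64, fetch: impl FnOnce() -> String) -> String {
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.misses += 1;
        let value = fetch();
        self.entries.insert(key, value.clone());
        value
    }
}
```

Two requests with the same model, message, and sampling parameters produce the same key, so the second get_or_fetch call returns the stored response without running its closure.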

§Architecture Overview

§Providers

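Providers represent LLM API endpoints. Provider implementations live in the provider::openai and provider::dummy modules; the usage example above uses OpenaiCompatible.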

§Models

Models represent specific LLM instances. The main model traits are:

  • CompletionModel trait - For text generation tasks
  • EmbeddingModel trait - For text embedding tasks

§Deployments

Deployments wrap models with usage tracking and rate limiting. Created via Router::deploy.
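
As a rough illustration of the admission check a deployment could perform before forwarding a request, here is a minimal sketch. Quota, Usage, and admits are hypothetical names, not part of thryd's API. The two optional limits mirror the Some(60) and Some(100_000) arguments passed to Router::deploy in the usage example, and treating None as "unlimited" is an assumption.

```rust
/// Hypothetical per-minute quotas. `None` is assumed to mean "no limit".
pub struct Quota {
    /// Maximum requests per minute.
    pub rpm: Option<u64>,
    /// Maximum tokens per minute.
    pub tpm: Option<u64>,
}

/// Usage already counted in the current one-minute window.
pub struct Usage {
    pub requests: u64,
    pub tokens: u64,
}

impl Quota {
    /// Returns true if one more request costing `tokens` tokens fits both quotas.
    pub fn admits(&self, used: &Usage, tokens: u64) -> bool {
        let rpm_ok = self.rpm.map_or(true, |limit| used.requests < limit);
        let tpm_ok = self.tpm.map_or(true, |limit| used.tokens + tokens <= limit);
        rpm_ok && tpm_ok
    }
}
```

A request is admitted only when both limits hold, so a deployment that still has request quota left can be skipped when the token quota is exhausted.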

§Routers

Routers manage multiple deployments and route each request according to the configured strategy: round-robin, least-loaded, or first-available.
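
The three strategies named under Key Features (round-robin, least-loaded, first-available) can be sketched as a single selection function. This is an illustration under stated assumptions, not the code in the route module: Candidate, Strategy, and pick are hypothetical names, and "load" is assumed to be a simple counter per deployment.

```rust
/// Hypothetical snapshot of one deployment's state.
pub struct Candidate {
    pub name: &'static str,
    /// Usage currently counted against the deployment (assumed metric).
    pub load: u64,
    /// Whether the deployment still has quota left.
    pub available: bool,
}

/// Hypothetical strategy selector.
pub enum Strategy {
    RoundRobin,
    LeastLoaded,
    FirstAvailable,
}

/// Picks the index of an available deployment, or `None` if there is none.
/// `cursor` carries the round-robin position between calls.
pub fn pick(strategy: &Strategy, candidates: &[Candidate], cursor: &mut usize) -> Option<usize> {
    let n = candidates.len();
    match strategy {
        // Take the first deployment that has quota left.
        Strategy::FirstAvailable => candidates.iter().position(|c| c.available),
        // Among available deployments, take the one with the smallest load.
        Strategy::LeastLoaded => candidates
            .iter()
            .enumerate()
            .filter(|(_, c)| c.available)
            .min_by_key(|(_, c)| c.load)
            .map(|(i, _)| i),
        // Scan at most `n` slots starting at the cursor, skipping unavailable ones.
        Strategy::RoundRobin => {
            for offset in 0..n {
                let i = (*cursor + offset) % n;
                if candidates[i].available {
                    *cursor = (i + 1) % n;
                    return Some(i);
                }
            }
            None
        }
    }
}
```

Keeping the cursor outside the function makes the round-robin state explicit; a real router would more likely store it alongside its deployments.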

§Feature Flags

  • pyo3: Enables Python bindings via PyO3
  • stubgen: Generates Python type stubs for better IDE support

§Rate Limiting

Thryd uses a sliding window algorithm for rate limiting. The constants BUCKET_COUNT and BUCKETS_WINDOW_S in the constants module control the granularity of rate limit tracking.
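
To make the role of the two constants concrete, here is a minimal sketch of a bucketed sliding window. It assumes BUCKET_COUNT is the number of buckets and BUCKETS_WINDOW_S is the total window length in seconds; the values 6 and 60 are placeholders, SlidingWindow is a hypothetical type, and the tracker module may work differently.

```rust
/// Placeholder values; the real constants live in `thryd::constants`.
const BUCKET_COUNT: usize = 6;
const BUCKETS_WINDOW_S: u64 = 60;

/// Bucketed sliding-window counter (illustrative, not thryd's implementation).
pub struct SlidingWindow {
    /// Usage recorded per bucket, addressed as a ring buffer.
    buckets: [u64; BUCKET_COUNT],
    /// Absolute bucket index (time / bucket width) of the latest observation.
    last_index: u64,
}

impl SlidingWindow {
    pub fn new() -> Self {
        Self { buckets: [0; BUCKET_COUNT], last_index: 0 }
    }

    /// Width of one bucket in seconds.
    fn bucket_width() -> u64 {
        BUCKETS_WINDOW_S / BUCKET_COUNT as u64
    }

    /// Clears every bucket that has fallen out of the window as of `now_s`.
    fn advance(&mut self, now_s: u64) {
        let index = now_s / Self::bucket_width();
        if index <= self.last_index {
            return;
        }
        // Clearing more than BUCKET_COUNT slots would only repeat work.
        let elapsed = (index - self.last_index).min(BUCKET_COUNT as u64);
        for step in 1..=elapsed {
            let slot = ((self.last_index + step) % BUCKET_COUNT as u64) as usize;
            self.buckets[slot] = 0;
        }
        self.last_index = index;
    }

    /// Total usage inside the window ending at `now_s`.
    pub fn total(&mut self, now_s: u64) -> u64 {
        self.advance(now_s);
        self.buckets.iter().sum()
    }

    /// Records `amount` if the window total stays within `limit`.
    pub fn try_record(&mut self, now_s: u64, amount: u64, limit: u64) -> bool {
        if self.total(now_s) + amount > limit {
            return false;
        }
        let slot = (self.last_index % BUCKET_COUNT as u64) as usize;
        self.buckets[slot] += amount;
        true
    }
}
```

More buckets give finer granularity: usage expires in steps of one bucket width (10 seconds with these placeholder values) rather than all at once at the end of the window. The sketch takes the current time as a parameter; production code would read a monotonic clock instead.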

Re-exports§

pub use error::Result;
pub use error::ThrydError;
pub use provider::ProviderType;
pub use provider::create_provider;
pub use tracker::UsageTracker;
pub use tracker::count_token;
pub use cache::*;
pub use constants::*;
pub use models::dummy::*;
pub use models::openai::*;
pub use provider::dummy::*;
pub use provider::openai::*;
pub use route::*;

Modules§

cache
connections
constants
deployment
Deployment wrapper with usage tracking and rate limiting.
error
Error types for the Thryd system.
models
Concrete model implementations for LLM providers.
provider
route
Request Routing System
tracker
utils
Internal utility functions for the Thryd system.

Structs§

CompletionRequest
Model-related re-exports.
EmbeddingRequest
Model-related re-exports.
RerankerRequest
Model-related re-exports.

Traits§

CompletionModel
Model-related re-exports.
EmbeddingModel
Model-related re-exports.
Model
Model-related re-exports.
RerankerModel
Model-related re-exports.

Type Aliases§

Completion
Model-related re-exports.
Embedding
Model-related re-exports.
Embeddings
Model-related re-exports.
Ranking
Model-related re-exports.

Attribute Macros§

async_trait