Crate ultrafast_models_sdk

§Ultrafast Models SDK

A high-performance Rust SDK for interacting with multiple AI/LLM providers through a unified interface. The SDK provides seamless integration with various AI services including OpenAI, Anthropic, Google, and more.

§Overview

The Ultrafast Models SDK provides:

  • Unified Interface: Single API for multiple AI providers
  • Intelligent Routing: Automatic provider selection and load balancing
  • Circuit Breakers: Automatic failover and recovery mechanisms
  • Caching Layer: Built-in response caching for performance
  • Rate Limiting: Per-provider rate limiting and throttling
  • Error Handling: Comprehensive error handling and retry logic
  • Metrics Collection: Performance monitoring and analytics

§Supported Providers

The SDK supports a wide range of AI providers:

  • OpenAI: GPT-4, GPT-3.5, and other OpenAI models
  • Anthropic: Claude-3, Claude-2, and Claude Instant
  • Google: Gemini Pro, Gemini Pro Vision, and PaLM
  • Azure OpenAI: Azure-hosted OpenAI models
  • Ollama: Local and remote Ollama instances
  • Mistral AI: Mistral 7B, Mixtral, and other models
  • Cohere: Command, Command R, and other Cohere models
  • Custom Providers: Extensible provider system

§Quick Start

use ultrafast_models_sdk::{UltrafastClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a client with multiple providers
    let client = UltrafastClient::standalone()
        .with_openai("your-openai-key")
        .with_anthropic("your-anthropic-key")
        .with_ollama("http://localhost:11434")
        .build()?;

    // Create a chat request
    let request = ChatRequest {
        model: "gpt-4".to_string(),
        messages: vec![Message::user("Hello, world!")],
        temperature: Some(0.7),
        max_tokens: Some(100),
        stream: Some(false),
        ..Default::default()
    };

    // Send the request
    let response = client.chat_completion(request).await?;
    println!("Response: {}", response.choices[0].message.content);

    Ok(())
}
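
In practice, provider keys are usually read from the environment rather than hard-coded. A minimal sketch using only std::env (the variable names are conventional for these providers, not an SDK requirement):

use ultrafast_models_sdk::UltrafastClient;
use std::env;

// Fail fast if a key is missing rather than sending unauthenticated requests.
let openai_key = env::var("OPENAI_API_KEY")?;
let anthropic_key = env::var("ANTHROPIC_API_KEY")?;

let client = UltrafastClient::standalone()
    .with_openai(&openai_key)
    .with_anthropic(&anthropic_key)
    .build()?;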

§Client Modes

The SDK supports two client modes:

§Standalone Mode

Direct provider communication without gateway:

let client = UltrafastClient::standalone()
    .with_openai("your-key")
    .with_anthropic("your-key")
    .build()?;

§Gateway Mode

Communication through the Ultrafast Gateway:

let client = UltrafastClient::gateway("http://localhost:3000")
    .with_api_key("your-gateway-key")
    .build()?;
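
Gateway mode lets several applications share one centrally managed set of provider credentials, routing rules, and caching policy; standalone mode avoids the extra network hop and suits a single application talking to providers directly.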

§Routing Strategies

The SDK provides multiple routing strategies:

  • Single Provider: Route all requests to one provider
  • Load Balancing: Distribute requests across providers
  • Failover: Primary provider with automatic fallback
  • Conditional Routing: Route based on request characteristics
  • A/B Testing: Route requests for testing different providers

use ultrafast_models_sdk::routing::RoutingStrategy;

// Load balancing with custom weights
let client = UltrafastClient::standalone()
    .with_openai("openai-key")
    .with_anthropic("anthropic-key")
    .with_routing_strategy(RoutingStrategy::LoadBalance {
        weights: vec![0.6, 0.4], // 60% OpenAI, 40% Anthropic
    })
    .build()?;

// Failover strategy
let client = UltrafastClient::standalone()
    .with_openai("primary-key")
    .with_anthropic("fallback-key")
    .with_routing_strategy(RoutingStrategy::Failover)
    .build()?;
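
Conditional routing is built from the routing module's RoutingRule and Condition types (re-exported at the crate root). Their exact constructors are defined in that module; the Conditional variant and the field names below are illustrative assumptions, not the verified API:

use ultrafast_models_sdk::routing::{Condition, RoutingRule, RoutingStrategy};

// Hypothetical sketch: route requests whose model name starts with "gpt-"
// to the OpenAI provider. Variant and field names here are assumptions;
// consult the routing module documentation for the real shapes.
let client = UltrafastClient::standalone()
    .with_openai("openai-key")
    .with_anthropic("anthropic-key")
    .with_routing_strategy(RoutingStrategy::Conditional {
        rules: vec![RoutingRule {
            condition: Condition::ModelPrefix("gpt-".to_string()),
            provider: "openai".to_string(),
        }],
    })
    .build()?;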

§Advanced Features

§Circuit Breakers

Automatic failover and recovery:

use ultrafast_models_sdk::circuit_breaker::CircuitBreakerConfig;
use std::time::Duration;

let client = UltrafastClient::standalone()
    .with_openai("your-key")
    .with_circuit_breaker_config(CircuitBreakerConfig {
        failure_threshold: 5,
        recovery_timeout: Duration::from_secs(60),
        request_timeout: Duration::from_secs(30),
        half_open_max_calls: 3,
    })
    .build()?;
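
With these settings the breaker follows the usual circuit-breaker life cycle (modeled by the re-exported CircuitState type): after 5 consecutive failures it opens and rejects calls immediately, after 60 seconds it transitions to half-open and admits up to 3 trial calls, and it closes again once those succeed.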

§Caching

Built-in response caching:

use ultrafast_models_sdk::cache::CacheConfig;
use std::time::Duration;

let client = UltrafastClient::standalone()
    .with_openai("your-key")
    .with_cache_config(CacheConfig {
        enabled: true,
        ttl: Duration::from_secs(60 * 60), // one hour
        max_size: 1000,
    })
    .build()?;
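
Caching pays off for repeated, deterministic requests: a request identical to a recent one is typically served from cache until the TTL expires, and max_size bounds how many entries are retained before old ones are evicted.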

§Rate Limiting

Per-provider rate limiting:

use ultrafast_models_sdk::rate_limiting::RateLimitConfig;

let client = UltrafastClient::standalone()
    .with_openai("your-key")
    .with_rate_limit_config(RateLimitConfig {
        requests_per_minute: 100,
        tokens_per_minute: 10000,
        burst_size: 10,
    })
    .build()?;
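
When a configured limit is exhausted, requests fail with a rate-limit error carrying an optional retry_after hint; see the Error Handling section below for matching on it.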

§API Examples

§Chat Completions

use ultrafast_models_sdk::{ChatRequest, Message, Role};

let request = ChatRequest {
    model: "gpt-4".to_string(),
    messages: vec![
        Message {
            role: Role::System,
            content: "You are a helpful assistant.".to_string(),
        },
        Message {
            role: Role::User,
            content: "What is the capital of France?".to_string(),
        },
    ],
    temperature: Some(0.7),
    max_tokens: Some(150),
    stream: Some(false),
    ..Default::default()
};

let response = client.chat_completion(request).await?;
println!("Response: {}", response.choices[0].message.content);

§Streaming Responses

use futures::StreamExt;

let mut stream = client
    .stream_chat_completion(ChatRequest {
        model: "gpt-4".to_string(),
        messages: vec![Message::user("Tell me a story")],
        stream: Some(true),
        ..Default::default()
    })
    .await?;

while let Some(chunk) = stream.next().await {
    match chunk {
        Ok(chunk) => {
            // Guard against chunks with no choices instead of indexing blindly.
            if let Some(content) = chunk.choices.first().and_then(|c| c.delta.content.as_ref()) {
                print!("{}", content);
                // Flush so tokens appear immediately rather than line-buffered.
                std::io::Write::flush(&mut std::io::stdout()).ok();
            }
        }
        Err(e) => eprintln!("Error: {:?}", e),
    }
}

§Embeddings

use ultrafast_models_sdk::{EmbeddingRequest, EmbeddingInput};

let request = EmbeddingRequest {
    model: "text-embedding-ada-002".to_string(),
    input: EmbeddingInput::String("This is a test sentence.".to_string()),
    ..Default::default()
};

let response = client.embedding(request).await?;
println!("Embedding dimensions: {}", response.data[0].embedding.len());
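
Embedding vectors are typically compared with cosine similarity. The helper below is plain Rust, with no SDK-specific assumptions beyond embedding being a Vec<f32> (consistent with the .len() call above):

// Cosine similarity between two embedding vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

let similarity = cosine_similarity(&response.data[0].embedding, &response.data[0].embedding);
println!("Self-similarity (should be ~1.0): {}", similarity);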

§Image Generation

use ultrafast_models_sdk::ImageRequest;

let request = ImageRequest {
    model: "dall-e-3".to_string(),
    prompt: "A beautiful sunset over the ocean".to_string(),
    n: Some(1),
    size: Some("1024x1024".to_string()),
    ..Default::default()
};

let response = client.generate_image(request).await?;
println!("Image URL: {}", response.data[0].url);

§Error Handling

Comprehensive error handling with specific error types:

use ultrafast_models_sdk::error::ClientError;

match client.chat_completion(request).await {
    Ok(response) => println!("Success: {:?}", response),
    Err(ClientError::AuthenticationError { .. }) => {
        eprintln!("Authentication failed");
    }
    Err(ClientError::RateLimitExceeded { retry_after, .. }) => {
        eprintln!("Rate limit exceeded, retry after: {:?}", retry_after);
    }
    Err(ClientError::ProviderError { provider, message, .. }) => {
        eprintln!("Provider {} error: {}", provider, message);
    }
    Err(e) => eprintln!("Other error: {:?}", e),
}
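
The retry_after hint can drive a simple retry loop. A minimal sketch, assuming ChatRequest implements Clone and retry_after is an Option<Duration>:

use std::time::Duration;
use tokio::time::sleep;
use ultrafast_models_sdk::error::ClientError;

// Retry rate-limited requests up to three times, honoring the server's hint.
let mut attempts = 0;
let response = loop {
    match client.chat_completion(request.clone()).await {
        Ok(response) => break response,
        Err(ClientError::RateLimitExceeded { retry_after, .. }) if attempts < 3 => {
            attempts += 1;
            // Fall back to a one-second pause when no hint is provided.
            sleep(retry_after.unwrap_or(Duration::from_secs(1))).await;
        }
        Err(e) => return Err(e.into()),
    }
};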

§Configuration

Advanced client configuration:

use ultrafast_models_sdk::{UltrafastClient, ClientConfig};
use std::time::Duration;

let config = ClientConfig {
    timeout: Duration::from_secs(30),
    max_retries: 3,
    retry_delay: Duration::from_secs(1),
    user_agent: Some("MyApp/1.0".to_string()),
    ..Default::default()
};

let client = UltrafastClient::standalone()
    .with_config(config)
    .with_openai("your-key")
    .build()?;

§Testing

Clients can be exercised with standard Rust async tests:

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_chat_completion() {
        let client = UltrafastClient::standalone()
            .with_openai("test-key")
            .build()
            .unwrap();

        let request = ChatRequest {
            model: "gpt-4".to_string(),
            messages: vec![Message::user("Hello")],
            ..Default::default()
        };

        let result = client.chat_completion(request).await;
        assert!(result.is_ok());
    }
}
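
Note that the test above issues a real network call and will fail without a valid key; in CI you would typically point the client at a local mock server (for example, gateway mode against a stub) or gate such tests behind an environment variable.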

§Performance Optimization

Tips for optimal performance:

use std::time::Duration;

// Use connection pooling
let client = UltrafastClient::standalone()
    .with_connection_pool_size(10)
    .with_openai("your-key")
    .build()?;

// Enable compression
let client = UltrafastClient::standalone()
    .with_compression(true)
    .with_openai("your-key")
    .build()?;

// Configure timeouts
let client = UltrafastClient::standalone()
    .with_timeout(Duration::from_secs(15))
    .with_openai("your-key")
    .build()?;
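
Independent requests can also be issued concurrently on one client. A sketch using tokio::try_join!, assuming chat_completion borrows the client (&self) as the earlier examples suggest; request_one and request_two are hypothetical ChatRequest values:

use tokio::try_join;

// Run two independent completions concurrently; both share the same
// connection pool, cache, and rate limiter configured on the client.
let (first, second) = try_join!(
    client.chat_completion(request_one),
    client.chat_completion(request_two),
)?;
println!("{}", first.choices[0].message.content);
println!("{}", second.choices[0].message.content);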

§Migration Guide

§From OpenAI SDK

// Before (OpenAI SDK)
use openai::Client;
let client = Client::new("your-key");
let response = client.chat().create(request).await?;

// After (Ultrafast SDK)
use ultrafast_models_sdk::UltrafastClient;
let client = UltrafastClient::standalone()
    .with_openai("your-key")
    .build()?;
let response = client.chat_completion(request).await?;

§From Anthropic SDK

// Before (Anthropic SDK)
use anthropic::Client;
let client = Client::new("your-key");
let response = client.messages().create(request).await?;

// After (Ultrafast SDK)
use ultrafast_models_sdk::UltrafastClient;
let client = UltrafastClient::standalone()
    .with_anthropic("your-key")
    .build()?;
let response = client.chat_completion(request).await?;

§Contributing

We welcome contributions! Please see our contributing guide for details on:

  • Code style and formatting
  • Testing requirements
  • Documentation standards
  • Pull request process

§License

This project is licensed under the MIT License - see the LICENSE file for details.

§Support

For support and questions:

Re-exports§

pub use circuit_breaker::CircuitBreaker;
pub use circuit_breaker::CircuitBreakerConfig;
pub use circuit_breaker::CircuitState;
pub use client::ClientMode;
pub use client::UltrafastClient;
pub use client::UltrafastClientBuilder;
pub use error::ClientError;
pub use error::ProviderError;
pub use models::AudioRequest;
pub use models::AudioResponse;
pub use models::ChatRequest;
pub use models::ChatResponse;
pub use models::Choice;
pub use models::EmbeddingRequest;
pub use models::EmbeddingResponse;
pub use models::ImageRequest;
pub use models::ImageResponse;
pub use models::Message;
pub use models::Role;
pub use models::SpeechRequest;
pub use models::SpeechResponse;
pub use models::Usage;
pub use providers::create_provider_with_circuit_breaker;
pub use providers::Provider;
pub use providers::ProviderConfig;
pub use providers::ProviderMetrics;
pub use routing::Condition;
pub use routing::RoutingRule;
pub use routing::RoutingStrategy;

Modules§

cache
Caching Module
circuit_breaker
Circuit Breaker Module
client
Ultrafast Client Module
common
error
Error Handling Module
models
AI Model Types and Structures
providers
Provider System Module
routing
Intelligent Routing Module

Type Aliases§

Result
Result type for SDK operations.