flyllm 0.1.1

A rust library for unifying LLM backends as an abstraction layer with load balancing.
Documentation
# FlyLLM


FlyLLM is a Rust library that provides a load-balanced, multi-provider client for Large Language Models. It enables developers to seamlessly work with multiple LLM providers (OpenAI, Anthropic, Google, Mistral...) through a unified API with request routing, load balancing, and failure handling.

<div align="center">
  <img src="https://github.com/user-attachments/assets/00adb692-d65f-47ca-aeed-fda5264ef094" width="300" alt="FlyLLM Logo">
</div>

## Features


- **Multiple Provider Support** 🌐: Unified interface for OpenAI, Anthropic, Google, and Mistral APIs
- **Task-Based Routing** 🧭: Route requests to the most appropriate provider based on predefined tasks
- **Load Balancing** ⚖️: Automatically distribute load across multiple provider instances
- **Failure Handling** 🛡️: Retry mechanisms and automatic failover between providers
- **Parallel Processing** ⚡: Process multiple requests concurrently for improved throughput
- **Custom Parameters** 🔧: Set provider-specific parameters per task or request
- **Usage Tracking** 📊: Monitor token consumption for cost management

## Installation


Add FlyLLM to your `Cargo.toml`:

```toml
[dependencies]
flyllm = "0.1.0"
tokio = { version = "1", features = ["full"] } # For async runtime
```

## Architecture


![Open Escordia_2025-04-25_13-41-55](https://github.com/user-attachments/assets/a56e375b-0bca-4de6-a4d3-c000812105d5)

The LLM Manager (`LLMManager`) serves as the core component for orchestrating language model interactions in your application. It manages multiple LLM instances (`LLMInstance`), each defined by a model, API key, and supported tasks (`TaskDefinition`).

When your application sends a generation request (`GenerationRequest`), the manager automatically selects an appropriate instance based on configurable strategies (Last Recently Used, Quickest Response Time, etc.) and returns the generated response by the LLM (`LLMResponse`). This design prevents rate limiting by distributing requests across multiple instances (even of the same model) with different API keys.

The manager handles failures gracefully by re-routing requests to alternative instances. It also supports parallel execution for significant performance improvements when processing multiple requests simultaneously!

You can define default parameters (temperature, max_tokens) for each task while retaining the ability to override these settings in specific requests. The system also tracks token usage across all instances:

```
--- Token Usage Statistics ---
ID    Provider        Model                          Prompt Tokens   Completion Tokens Total Tokens
-----------------------------------------------------------------------------------------------
0     mistral         mistral-small-latest           109             897             1006
1     anthropic       claude-3-sonnet-20240229       133             1914            2047
2     anthropic       claude-3-opus-20240229         51              529             580
3     google          gemini-2.0-flash               0               0               0
4     openai          gpt-3.5-turbo                  312             1003            1315
```

## Usage Examples


You can activate FlyLLM's debug messages by setting the environment variable `RUST_LOG` to the value `"debug"`.

### Quick Start


```rust
use flyllm::{
    ProviderType, LlmManager, GenerationRequest, TaskDefinition,
    use_logging
};
use std::collections::HashMap;
use serde_json::json;

#[tokio::main]

async fn main() {
    // Initialize logging
    use_logging();

    // Create an LLM manager
    let mut manager = LlmManager::new();

    // Define a task with specific parameters
    let summary_task = TaskDefinition {
        name: "summary".to_string(),
        parameters: HashMap::from([
            ("max_tokens".to_string(), json!(500)),
            ("temperature".to_string(), json!(0.3)),
        ]),
    };

    // Add a provider with its supported tasks
    manager.add_provider(
        ProviderType::OpenAI,
        std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set"),
        "gpt-3.5-turbo".to_string(),
        vec![summary_task.clone()],
        true,
    );

    // Create a request
    let request = GenerationRequest {
        prompt: "Summarize the following text: Climate change refers to long-term shifts in temperatures...".to_string(),
        task: Some("summary".to_string()),
        params: None,
    };

    // Generate response 
    // The Manager will automatically choose a provider fit for the task according to the selected strategy
    let responses = manager.generate_sequentially(vec![request]).await;
    
    // Handle the response
    if let Some(response) = responses.first() {
        if response.success {
            println!("Response: {}", response.content);
        } else {
            println!("Error: {}", response.error.as_ref().unwrap_or(&"Unknown error".to_string()));
        }
    }
}
```

### Adding Multiple Providers


```rust
// Add OpenAI
manager.add_provider(
    ProviderType::OpenAI,
    openai_api_key,
    "gpt-4-turbo".to_string(),
    vec![summary_task.clone(), qa_task.clone()],
    true,
);

// Add Anthropic
manager.add_provider(
    ProviderType::Anthropic,
    anthropic_api_key,
    "claude-3-sonnet-20240229".to_string(),
    vec![creative_writing_task.clone(), code_generation_task.clone()],
    true,
);

// Add Mistral
manager.add_provider(
    ProviderType::Mistral,
    mistral_api_key,
    "mistral-large-latest".to_string(),
    vec![summary_task.clone(), translation_task.clone()],
    true,
);

// Add Google (Gemini)
manager.add_provider(
    ProviderType::Google,
    google_api_key,
    "gemini-1.5-pro".to_string(),
    vec![qa_task.clone(), creative_writing_task.clone()],
    true,
);
```

### Task-Based Routing


```rust
// Define tasks with specific parameters
let summary_task = TaskDefinition {
    name: "summary".to_string(),
    parameters: HashMap::from([
        ("max_tokens".to_string(), json!(500)),
        ("temperature".to_string(), json!(0.3)),
    ]),
};

let creative_writing_task = TaskDefinition {
    name: "creative_writing".to_string(),
    parameters: HashMap::from([
        ("max_tokens".to_string(), json!(1500)),
        ("temperature".to_string(), json!(0.9)),
    ]),
};

// Create requests with different tasks
let summary_request = GenerationRequest {
    prompt: "Summarize the following article: ...".to_string(),
    task: Some("summary".to_string()),
    params: None,
};

let story_request = GenerationRequest {
    prompt: "Write a short story about a robot discovering emotions.".to_string(),
    task: Some("creative_writing".to_string()),
    params: None,
};
```

### Parallel Processing


```rust
// Create multiple requests
let requests = vec![
    GenerationRequest { /* ... */ },
    GenerationRequest { /* ... */ },
    GenerationRequest { /* ... */ },
];

// Process in parallel
let parallel_results = manager.batch_generate(requests).await;

// Process each result
for result in parallel_results {
    if result.success {
        println!("Success: {}", result.content);
    } else {
        println!("Error: {}", result.error.as_ref().unwrap_or(&"Unknown error".to_string()));
    }
}
```

## License


This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing


Contributions are always welcome! If you're interested in contributing to FlyLLM, please fork the repository and create a new branch for your changes. When you're done with your changes, submit a pull request to merge your changes into the main branch.

## Supporting FlyLLM


If you want to support FlyLLM, you can:
- **Star** :star: the project in Github!
- **Donate** :coin: to my [Ko-fi]https://ko-fi.com/rodmarkun page!
- **Share** :heart: the project with your friends!

## To-Do List

Next features planned:

- [x] ~~Usage tracking~~
- [ ] Log redirection to file
- [ ] Streaming responses
- [x] Internal documentation
- [ ] Unit & Integration tests
- [ ] Builder Pattern
- [ ] Aggregation of more strategies