FlyLLM
FlyLLM is a Rust library that provides a load-balanced, multi-provider client for Large Language Models. It lets developers work with multiple LLM providers (OpenAI, Anthropic, Google, Mistral, and others) through a unified API with request routing, load balancing, and failure handling.
Features
- Multiple Provider Support 🌐: Unified interface for OpenAI, Anthropic, Google, Ollama and Mistral APIs
- Task-Based Routing 🧭: Route requests to the most appropriate provider based on predefined tasks
- Load Balancing ⚖️: Automatically distribute load across multiple provider instances
- Failure Handling 🛡️: Retry mechanisms and automatic failover between providers
- Parallel Processing ⚡: Process multiple requests concurrently for improved throughput
- Custom Parameters 🔧: Set provider-specific parameters per task or request
- Usage Tracking 📊: Monitor token consumption for cost management
- Debug Logging 🔍: Optional request/response logging to JSON files for debugging and analysis
- Builder Pattern Configuration ✨: Fluent and readable setup for tasks and providers
Installation
Add FlyLLM to your Cargo.toml:
[dependencies]
flyllm = "0.3.0"
tokio = { version = "1", features = ["macros", "rt-multi-thread", "sync"] } # For async runtime
Architecture
The LLM Manager (LLMManager) serves as the core component for orchestrating language model interactions in your application. It manages multiple LLM instances (LLMInstance), each defined by a model, API key, and supported tasks (TaskDefinition).
When your application sends a generation request (GenerationRequest), the manager automatically selects an appropriate instance based on a configurable strategy (Least Recently Used, Quickest Response Time, etc.) and returns the response generated by the LLM (LLMResponse). This design helps avoid rate limiting by distributing requests across multiple instances (even of the same model) with different API keys.
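To make the selection step concrete, here is a minimal sketch of a least-recently-used strategy. It is not FlyLLM's implementation: the `Instance` struct, the `pick_least_recently_used` function, and the logical clock `now` are all hypothetical names chosen for illustration.

```rust
/// Hypothetical stand-in for a configured provider instance (not a FlyLLM type).
#[derive(Debug)]
struct Instance {
    id: usize,
    /// Names of the tasks this instance is allowed to serve.
    tasks: Vec<String>,
    /// Logical clock tick of the most recent request; 0 means "never used".
    last_used: u64,
}

/// Picks the instance that supports `task` and was used least recently,
/// marks it as used at tick `now`, and returns its id.
/// Returns `None` when no instance supports the task.
fn pick_least_recently_used(instances: &mut [Instance], task: &str, now: u64) -> Option<usize> {
    let chosen = instances
        .iter_mut()
        .filter(|i| i.tasks.iter().any(|t| t == task))
        // On ties, `min_by_key` keeps the first candidate.
        .min_by_key(|i| i.last_used)?;
    chosen.last_used = now;
    Some(chosen.id)
}
```

With two instances that both support the same task, successive calls alternate between them, which is how requests end up spread across API keys.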
The manager handles failures gracefully by re-routing requests to alternative instances. It also supports parallel execution, which improves throughput when many requests are processed at once.
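The re-routing idea can be sketched as a loop that tries candidate instances in order and returns the first success. This is an illustrative sketch only: `generate_with_failover` and the `call` closure are hypothetical and do not reflect FlyLLM's internal API or its retry policy.

```rust
/// Tries each candidate instance in order and returns the first successful
/// response. `call` is a hypothetical closure standing in for a provider request.
/// If every candidate fails, the collected error messages are returned.
fn generate_with_failover<F>(candidates: &[&str], mut call: F) -> Result<String, Vec<String>>
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut errors = Vec::new();
    for &name in candidates {
        match call(name) {
            // Stop at the first instance that answers.
            Ok(text) => return Ok(text),
            // Remember why this instance failed, then move on to the next one.
            Err(e) => errors.push(format!("{}: {}", name, e)),
        }
    }
    Err(errors)
}
```

Returning all collected errors, rather than only the last one, keeps enough context to see which instances were tried before the request was given up.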
You can define default parameters (temperature, max_tokens) for each task while retaining the ability to override these settings in specific requests. The system also tracks token usage across all instances:
ID  Provider   Model                     Prompt Tokens  Completion Tokens  Total Tokens
----------------------------------------------------------------------------------------
0   mistral    mistral-small-latest      109            897                1006
1   anthropic  claude-3-sonnet-20240229  133            1914               2047
2   anthropic  claude-3-opus-20240229    51             529                580
3   google     gemini-2.0-flash          0              0                  0
4   openai     gpt-3.5-turbo             312            1003               1315
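As a rough illustration of how per-instance totals like these can be accumulated, the sketch below keeps a running count per instance id. `TokenUsage` and `UsageTracker` here are hypothetical types written for this example and are not FlyLLM's own.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-instance counters (not FlyLLM's own type).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct TokenUsage {
    prompt_tokens: u64,
    completion_tokens: u64,
}

impl TokenUsage {
    /// Total tokens is the sum of prompt and completion tokens.
    fn total(&self) -> u64 {
        self.prompt_tokens + self.completion_tokens
    }
}

/// Accumulates usage keyed by instance id; BTreeMap keeps ids sorted for reporting.
#[derive(Default)]
struct UsageTracker {
    per_instance: BTreeMap<usize, TokenUsage>,
}

impl UsageTracker {
    /// Adds the token counts of one finished request to the instance's totals.
    fn record(&mut self, instance_id: usize, prompt: u64, completion: u64) {
        let entry = self.per_instance.entry(instance_id).or_default();
        entry.prompt_tokens += prompt;
        entry.completion_tokens += completion;
    }

    /// Returns the accumulated usage, or zeros for an instance that was never used.
    fn usage(&self, instance_id: usize) -> TokenUsage {
        self.per_instance.get(&instance_id).copied().unwrap_or_default()
    }
}
```

An instance that never served a request reports zeros, matching a row such as the gemini-2.0-flash one above.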
Usage Examples
The following sections describe how to use FlyLLM. You can also check out the example given in examples/task_routing.rs! To activate FlyLLM's debug messages, set the environment variable RUST_LOG to "debug".
Quick Start
use ;
use std::env; // To read API keys from environment variables
async
Adding Multiple Providers
Configure the LlmManager with various providers, each supporting different tasks.
use ;
use std::env;
use std::path::PathBuf;
async
Task-Based Routing
Define tasks with specific default parameters and create requests targeting those tasks. FlyLLM routes the request to a provider configured to support that task.
use ;
use std::env;
// Assume manager is configured as shown in "Adding Multiple Providers"
async
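The relationship between task defaults and per-request overrides can be sketched as a simple map merge in which request values win. This is an assumption about the general idea, not FlyLLM's code: `effective_params` is a hypothetical helper, and parameters are simplified to `f64` values.

```rust
use std::collections::HashMap;

/// Merges a task's default parameters with per-request overrides.
/// Keys present in `overrides` replace the task defaults; all other
/// defaults are kept. Neither input map is modified.
fn effective_params(
    defaults: &HashMap<String, f64>,
    overrides: &HashMap<String, f64>,
) -> HashMap<String, f64> {
    let mut merged = defaults.clone();
    // `extend` overwrites existing keys, so request-level values take priority.
    merged.extend(overrides.iter().map(|(k, v)| (k.clone(), *v)));
    merged
}
```

For example, a task with defaults temperature 0.7 and max_tokens 500 combined with a request override of temperature 0.2 yields temperature 0.2 and max_tokens 500.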
Parallel Processing
// Process in parallel
let parallel_results = manager.batch_generate.await;
// Process each result
for result in parallel_results
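To show the shape of batch processing without depending on any crate, here is a sketch that fans work out and collects one result per input, in input order. It uses OS threads from the standard library rather than the async runtime FlyLLM runs on, and both `batch_process` and `fake_generate` are hypothetical placeholders, not FlyLLM functions.

```rust
use std::thread;

/// Hypothetical placeholder for a provider call: echoes the prompt,
/// and fails on an empty prompt so the error path can be observed.
fn fake_generate(prompt: &str) -> Result<String, String> {
    if prompt.is_empty() {
        Err("empty prompt".to_string())
    } else {
        Ok(format!("echo: {}", prompt))
    }
}

/// Runs every prompt on its own thread and returns one result per prompt,
/// in the same order as the input. A failed request does not abort the batch.
fn batch_process(prompts: Vec<String>) -> Vec<Result<String, String>> {
    // Start all workers first so they run concurrently.
    let handles: Vec<_> = prompts
        .into_iter()
        .map(|p| thread::spawn(move || fake_generate(&p)))
        .collect();

    // Join in spawn order; a panicked worker is reported as an error result.
    handles
        .into_iter()
        .map(|h| h.join().unwrap_or_else(|_| Err("worker panicked".to_string())))
        .collect()
}
```

Because each result is a `Result`, the caller can handle successes and failures individually when iterating over the batch.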
Debug Logging
FlyLLM supports optional debug logging to help you analyze requests and responses. When enabled, it creates JSON files with detailed information about each generation call.
use ;
use std::path::PathBuf;
async
The debug files contain structured JSON with:
- Metadata: timestamp, instance details, request duration
- Input: prompt, task, parameters used
- Output: success status, generated content or error, token usage
Example debug file structure:
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are always welcome! If you're interested in contributing to FlyLLM, please fork the repository and create a new branch for your changes. When you're done, submit a pull request to merge them into the main branch.
Supporting FlyLLM
If you want to support FlyLLM, you can:
- Star :star: the project on GitHub!
- Donate :coin: to my Ko-fi page!
- Share :heart: the project with your friends!