# MiniLLMLib-RS
A minimalist, async-first Rust library for LLM interactions with streaming support.
## Features
- Async-first: Built on Tokio for high-performance async operations
- Streaming Support: First-class SSE streaming for real-time responses
- Conversation Trees: `ChatNode` provides a tree-based conversation structure with branching
- Tree Manipulation: `detach()`, `merge()`, tree iterators (depth-first, breadth-first, leaves)
- Template Substitution: Format kwargs with `{placeholders}` in messages
- Thread Serialization: Save/load conversation threads to/from JSON files
- Cost Tracking: OpenRouter usage accounting with callbacks
- Multimodal: Support for images and audio in messages
- JSON Repair: Robust handling of malformed JSON from LLM outputs
- OpenRouter Compatible: Works with OpenRouter, OpenAI, and any OpenAI-compatible API
- Retry with Backoff: Built-in exponential backoff and retry logic
- Provider Routing: OpenRouter provider settings (sort, ignore, data collection)
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
# Crate name assumed from the project title; check crates.io for the published name.
minillmlib = "0.2"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
```
## Quick Start

```rust
use minillmlib::{ChatNode, GeneratorInfo, NodeCompletionParameters};

// Reconstructed example: the model name, prompt text, and exact method
// signatures are illustrative (method names come from the API reference below).
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let generator = GeneratorInfo::openrouter("openai/gpt-4o-mini")
        .with_api_key_from_env();

    let root = ChatNode::root("You are a helpful assistant.");
    let user = root.add_user("Hello!");

    let response = user.complete(&generator, NodeCompletionParameters::new()).await?;
    println!("{}", response.text());
    Ok(())
}
```
## Environment Variables

Set your API key in a `.env` file or environment:

```bash
OPENROUTER_API_KEY=sk-or-v1-your-key-here
# Or for direct OpenAI:
OPENAI_API_KEY=sk-your-key-here
```
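If you keep keys in a `.env` file, load it into the process environment before constructing a generator. A minimal sketch, assuming the `dotenvy` crate (not a dependency of this library) and the argument-free form of `with_api_key_from_env` listed in the API reference:

```rust
use minillmlib::GeneratorInfo;

fn make_generator() -> GeneratorInfo {
    // Populate the process environment from .env (dotenvy is an assumption here).
    dotenvy::dotenv().ok();
    // Picks up the provider's key (e.g. OPENROUTER_API_KEY) from the environment.
    GeneratorInfo::openrouter("openai/gpt-4o-mini").with_api_key_from_env()
}
```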
## Usage Examples
### Basic Completion

```rust
use minillmlib::{ChatNode, CompletionParameters, GeneratorInfo, NodeCompletionParameters};

// Model name and prompts are illustrative.
let generator = GeneratorInfo::openrouter("openai/gpt-4o-mini");
let root = ChatNode::root("You are a helpful assistant.");
let user = root.add_user("What is the capital of France?");

// With custom parameters (construction style reconstructed; fields per the reference below)
let params = NodeCompletionParameters::new()
    .with_params(CompletionParameters {
        temperature: Some(0.2),
        max_tokens: Some(512),
        ..Default::default()
    });

let response = user.complete(&generator, params).await?;
println!("{}", response.text());
```
### Streaming

```rust
let root = ChatNode::root("You are a helpful assistant.");
let user = root.add_user("Tell me a story.");

let mut stream = user.complete_streaming(&generator, NodeCompletionParameters::new()).await?;
// Chunk handling reconstructed: chunks arrive as SSE deltas.
while let Some(chunk) = stream.next_chunk().await {
    print!("{}", chunk);
}
```
### Multi-turn Conversation

```rust
let root = ChatNode::root("You are a helpful assistant.");

// First turn (`chat` arguments reconstructed: user text plus generator)
let response1 = root.chat("Hi, my name is Alice.", &generator).await?;

// Second turn - context is preserved
let response2 = response1.chat("What is my name?", &generator).await?;
// Response will mention "Alice"
```
### Image Input

```rust
use minillmlib::{ChatNode, GeneratorInfo, Message, MessageContent, NodeCompletionParameters};

// Vision-capable model name is illustrative.
let generator = GeneratorInfo::openrouter("openai/gpt-4o").with_vision();

// The image-loading type and the `with_images`/`Message` constructor
// shapes are reconstructed; only the method names come from this README.
let image = Image::from_file("photo.png")?;
let content = MessageContent::with_images("What is in this picture?", vec![image]);

let root = ChatNode::root("You are a helpful assistant.");
let user = root.add_child(Message::user(content)); // constructor name assumed
let response = user.complete(&generator, NodeCompletionParameters::new()).await?;
```
### Audio Input

```rust
use minillmlib::MessageContent;

// As with images, the loading type and argument shapes are reconstructed.
let audio = Audio::from_file("clip.mp3")?;
let content = MessageContent::with_audio("Transcribe this clip.", vec![audio]);
```
### JSON Response with Repair

```rust
// Builder arguments are reconstructed.
let params = NodeCompletionParameters::new()
    .with_parse_json(true)       // Enable JSON repair
    .with_crash_on_refusal(true) // Retry if no valid JSON
    .with_retry(3);              // Number of retries

let response = user.complete(&generator, params).await?;
// response.text() will contain valid, repaired JSON
```
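Since the repaired text is valid JSON, it can be handed straight to a parser. A sketch of how you might consume it with `serde_json` (which is not part of this library; the field name is illustrative):

```rust
// Deserialize the repaired output into a dynamic JSON value.
let value: serde_json::Value = serde_json::from_str(&response.text())?;
println!("{}", value["score"]); // "score" is a hypothetical field
```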
### Retry with Exponential Backoff

```rust
// Builder arguments are reconstructed; defaults are listed in the reference below.
let params = NodeCompletionParameters::new()
    .with_retry(5)
    .with_exp_back_off(true)
    .with_back_off_time(1.0)    // Start with 1 second
    .with_max_back_off(30.0)    // Max 30 seconds
    .with_crash_on_empty(true); // Retry on empty responses
```
### Force Prepend (Constrained Generation)

```rust
// Force the model to start its response with specific text
let params = NodeCompletionParameters::new()
    .with_force_prepend("Score: ");
// Response will start with "Score: " followed by the model's completion
```
### OpenRouter Provider Settings

```rust
use minillmlib::{CompletionParameters, ProviderSettings};

// Provider names and argument shapes are illustrative.
let provider = ProviderSettings::new()
    .sort_by_throughput() // or .sort_by_price()
    .deny_data_collection()
    .with_ignore(vec!["SomeProvider".to_string()]); // Exclude providers

let params = CompletionParameters::new()
    .with_provider(provider);
```
### Custom/Extra Parameters

```rust
// Pass arbitrary parameters to the API (keys and values are illustrative,
// and the `with_extra` argument shape is reconstructed)
let params = CompletionParameters::new()
    .with_extra("frequency_penalty", serde_json::json!(0.5))
    .with_extra("user", serde_json::json!("session-123"));
```
### Pretty Print Conversations

```rust
use minillmlib::{format_conversation, ChatNode};

let root = ChatNode::root("You are helpful.");
let user = root.add_user("Hello");
let assistant = user.add_assistant("Hi there!");

// Default formatting
let pretty = format_conversation(&assistant);
// Output: "SYSTEM: You are helpful.\n\nUSER: Hello\n\nASSISTANT: Hi there!"

// Custom formatting (the config type's name is assumed; it is not
// preserved in this README)
let config = PrettyConfig::new();
let pretty = pretty_messages(&assistant, &config);
```
### Template Substitution (Format Kwargs)

```rust
use minillmlib::ChatNode;

// Create a reusable prompt template (placeholder names are illustrative)
let root = ChatNode::root("You are {name}, a {role}.");
root.set_format_kwarg("name", "Claude");
root.set_format_kwarg("role", "helpful assistant");
let user = root.add_user("Hello!");

// Get formatted messages with placeholders replaced
let formatted = user.formatted_thread();
// Messages now contain "You are Claude, a helpful assistant." etc.
```
### Save and Load Conversation Threads

```rust
use minillmlib::ChatNode;

// Build a conversation (prompts and file path are illustrative)
let root = ChatNode::root("You are {name}.");
root.set_format_kwarg("name", "Claude");
let user = root.add_user("Hello");
let assistant = user.add_assistant("Hi!");

// Save to JSON file
assistant.save_thread("thread.json")?;

// Load from JSON file (returns root and leaf)
let (root, leaf) = ChatNode::from_thread_file("thread.json")?;

// Or load from JSON string
let json = r#"{"prompts": [{"role": "system", "content": "Hello"}], "required_kwargs": {}}"#;
let (root, leaf) = ChatNode::from_thread_json(json)?;
```
### Tree Manipulation

```rust
use minillmlib::ChatNode;

// Navigate to root from any node
let root = some_deep_node.get_root();

// Detach a subtree
let subtree = node.detach(); // node is now a new root

// Merge trees (argument reconstructed: the other tree's root)
let merged = tree1_leaf.merge(tree2_root); // tree2's root becomes child of tree1_leaf

// Iterate over tree
for node in root.iter_depth_first() {
    // visit nodes in depth-first order
}

// Get all leaves
let leaves = root.iter_leaves();

// Count nodes
let count = root.node_count();
```
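Because `ChatNode` is a tree rather than a flat list, one node can carry several alternative continuations. A minimal branching sketch, reusing the methods above (prompt texts are illustrative):

```rust
use minillmlib::ChatNode;

let root = ChatNode::root("You are a helpful assistant.");
let question = root.add_user("Name a color.");

// Two alternative assistant replies branch from the same user turn.
let branch_a = question.add_assistant("Red.");
let branch_b = question.add_assistant("Blue.");

// Each branch is now a separate leaf under the same root.
let leaves = root.iter_leaves(); // yields branch_a and branch_b
```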
### Cost Tracking (OpenRouter)

```rust
use minillmlib::{ChatNode, GeneratorInfo, NodeCompletionParameters};
use std::sync::{Arc, Mutex};

// Model name is illustrative.
let generator = GeneratorInfo::openrouter("openai/gpt-4o-mini");

// Track costs across multiple requests
let total_cost = Arc::new(Mutex::new(0.0_f64));
let cost_tracker = total_cost.clone();

// Callback signature and the `CostInfo` field access are reconstructed.
let params = NodeCompletionParameters::new()
    .with_openrouter_cost_tracking()
    .with_cost_callback(move |info| {
        *cost_tracker.lock().unwrap() += info.cost;
    });

let root = ChatNode::root("You are helpful.");
let user = root.add_user("Hello!");
let response = user.complete(&generator, params).await?;

println!("Total cost: ${:.6}", total_cost.lock().unwrap());
```
## API Reference

### Core Types

| Type | Description |
|---|---|
| `ChatNode` | A node in the conversation tree |
| `GeneratorInfo` | LLM provider configuration |
| `CompletionParameters` | Generation parameters (temperature, max_tokens, etc.) |
| `NodeCompletionParameters` | Per-request settings (retry, JSON parsing, cost tracking, etc.) |
| `Message` | A single message with role and content |
| `MessageContent` | Text or multimodal content |
| `ThreadData` | Serializable conversation thread with format kwargs |
| `CostInfo` | Cost and token usage information from completions |
| `CostTrackingType` | Cost tracking mode (None, OpenRouter) |
### GeneratorInfo Methods

```rust
// Pre-configured providers (argument lists elided)
GeneratorInfo::openrouter(...) // OpenRouter API
GeneratorInfo::openai(...)     // OpenAI API
GeneratorInfo::anthropic(...)  // Anthropic API
GeneratorInfo::custom(...)     // Custom endpoint

// Builder methods
.with_api_key(...)
.with_api_key_from_env()
.with_header(...)
.with_vision()
.with_audio()
.with_max_context(...)
.with_default_params(...)
```
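For example, pointing the client at a self-hosted OpenAI-compatible server might look like the sketch below. The URL, model name, header, and argument shapes are assumptions; only the method names come from the list above.

```rust
use minillmlib::GeneratorInfo;

// Hypothetical local vLLM/llama.cpp-style endpoint.
let generator = GeneratorInfo::custom("http://localhost:8080/v1", "local-model")
    .with_header("X-Team", "research") // extra HTTP header sent with each request
    .with_max_context(32_768)          // cap the usable context window
    .with_vision();                    // mark the model as image-capable
```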
### CompletionParameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_tokens` | `Option<u32>` | `4096` | Maximum tokens to generate |
| `temperature` | `Option<f32>` | `0.7` | Sampling temperature |
| `top_p` | `Option<f32>` | `None` | Nucleus sampling |
| `top_k` | `Option<u32>` | `None` | Top-k sampling |
| `stop` | `Option<Vec<String>>` | `None` | Stop sequences |
| `seed` | `Option<u64>` | `None` | Random seed |
| `provider` | `Option<ProviderSettings>` | `None` | OpenRouter provider routing |
| `extra` | `Option<HashMap>` | `None` | Custom parameters |
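Assuming the struct's fields are public and it implements `Default` (not confirmed by this table), a request tuned for deterministic short answers could be sketched as:

```rust
use minillmlib::CompletionParameters;

// Field names and types come from the table above; the
// literal-with-Default construction style is an assumption.
let params = CompletionParameters {
    max_tokens: Some(256),
    temperature: Some(0.0),
    seed: Some(42),
    stop: Some(vec!["\n\n".to_string()]),
    ..Default::default()
};
```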
### NodeCompletionParameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `system_prompt` | `Option<String>` | `None` | Override system prompt |
| `parse_json` | `bool` | `false` | Parse/repair JSON response |
| `force_prepend` | `Option<String>` | `None` | Force response prefix |
| `retry` | `u32` | `4` | Retry attempts |
| `exp_back_off` | `bool` | `false` | Exponential backoff |
| `back_off_time` | `f64` | `1.0` | Initial backoff (seconds) |
| `max_back_off` | `f64` | `15.0` | Max backoff (seconds) |
| `crash_on_refusal` | `bool` | `false` | Error if no JSON |
| `crash_on_empty_response` | `bool` | `false` | Error if empty |
| `cost_tracking` | `CostTrackingType` | `None` | Enable cost tracking |
| `cost_callback` | `Option<CostCallback>` | `None` | Callback for cost info |
### ProviderSettings (OpenRouter)

| Parameter | Description |
|---|---|
| `order` | Ordered list of providers to try |
| `sort` | Sort by: `"price"`, `"throughput"`, `"latency"` |
| `ignore` | Providers to exclude |
| `data_collection` | `"allow"` or `"deny"` |
| `allow_fallbacks` | Allow fallback providers |
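These fields map onto the `provider` routing object in an OpenRouter request body. A sketch of what the settings serialize to, built here with `serde_json` purely for illustration (values are examples):

```rust
// What the routing settings look like on the wire.
let provider = serde_json::json!({
    "sort": "throughput",
    "ignore": ["SomeProvider"],
    "data_collection": "deny",
    "allow_fallbacks": true
});
```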
## CLI Tool

The library includes a CLI for JSON repair:

```bash
# Repair JSON from file (binary name is assumed; check the crate's [[bin]] targets)
json-repair broken.json

# Repair JSON from stdin
cat broken.json | json-repair
```
## Running Tests

```bash
# Run all tests (unit + integration)
cargo test

# Run only unit tests (fast, no API calls)
cargo test --lib

# Run integration tests (requires API key)
cargo test --test '*'

# Run with output
cargo test -- --nocapture
```
## License
MIT License - see LICENSE for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.