# miyabi-llm

LLM abstraction layer for Miyabi with GPT-OSS-20B integration.
## Features

- **Provider abstraction**: Unified trait for all LLM providers
- **GPT-OSS-20B support**: Native support for OpenAI's open-weight model (Apache 2.0 license)
- **Multiple backends**: vLLM, Ollama, and Groq
- **Async/await**: Built on tokio for high performance
- **Function calling**: Support for structured function calls (planned)
- **Reasoning levels**: Low, Medium, and High reasoning effort
## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
miyabi-llm = { version = "1.0.0", path = "../miyabi-llm" }
```
## Quick Start

### Groq (Recommended for quick start)
A minimal sketch, assuming the provider exposes a `complete` method and the response a `content` field (both are assumptions; the constructor and request builders are documented in the API Reference below):
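```rust
use miyabi_llm::{GPTOSSProvider, LLMRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads the Groq API key from the environment (see Configuration below).
    let provider = GPTOSSProvider::new_groq(std::env::var("GROQ_API_KEY")?);

    // `complete` and `response.content` are assumed names, not verified API.
    let request = LLMRequest::new("Summarize the Rust ownership model in one sentence");
    let response = provider.complete(request).await?;
    println!("{}", response.content);
    Ok(())
}
```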
### vLLM (Recommended for production)

First start a vLLM server. The invocation below is one common form and an assumption; adjust it to your deployment:

```bash
# Start vLLM server
vllm serve openai/gpt-oss-20b
```
Then point the provider at the running server; a minimal sketch under the same assumptions as the Groq example:
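```rust
use miyabi_llm::{GPTOSSProvider, LLMRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Endpoint matches the default vLLM address used throughout this README.
    let provider = GPTOSSProvider::new_vllm("http://localhost:8000");
    let response = provider
        .complete(LLMRequest::new("Hello from vLLM")) // `complete` is an assumed name
        .await?;
    println!("{}", response.content);
    Ok(())
}
```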
### Ollama (Recommended for development)

```bash
# Pull model
ollama pull gpt-oss:20b

# Run model
ollama run gpt-oss:20b
```
A minimal sketch under the same assumptions:
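```rust
use miyabi_llm::{GPTOSSProvider, LLMRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Uses the default local Ollama endpoint (http://localhost:11434).
    let provider = GPTOSSProvider::new_ollama();
    let response = provider
        .complete(LLMRequest::new("Hello from Ollama")) // assumed method name
        .await?;
    println!("{}", response.content);
    Ok(())
}
```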
## API Reference

### LLMProvider trait

Core trait for all LLM providers.
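The exact signatures live in the generated docs; as a rough sketch of the shape implied by the examples above (the `complete` method and `LLMError` type are assumptions):

```rust
use async_trait::async_trait;

// Hypothetical shape only; the real trait may differ.
#[async_trait]
pub trait LLMProvider {
    /// Run a single inference request against the backing model.
    async fn complete(&self, request: LLMRequest) -> Result<LLMResponse, LLMError>;
}
```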
### GPTOSSProvider

GPT-OSS-20B provider implementation.

Constructors:

- `GPTOSSProvider::new_groq(api_key)` - Groq provider
- `GPTOSSProvider::new_vllm(endpoint)` - vLLM provider
- `GPTOSSProvider::new_ollama()` - Ollama provider
Builder methods (combined in the sketch below):

- `.with_model(model)` - Set a custom model name
- `.with_timeout(duration)` - Set the request timeout
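A constructor-plus-builders example (the endpoint and model name are illustrative values):

```rust
use std::time::Duration;
use miyabi_llm::GPTOSSProvider;

fn main() {
    // Chain the builders onto any constructor.
    let provider = GPTOSSProvider::new_vllm("http://localhost:8000")
        .with_model("openai/gpt-oss-20b")
        .with_timeout(Duration::from_secs(60));
    let _ = provider; // silence the unused-variable warning in this sketch
}
```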
### LLMRequest

Request configuration for LLM inference.

Builder methods (see the sketch below):

- `LLMRequest::new(prompt)` - Create a new request with defaults
- `.with_temperature(temp)` - Set temperature (0.0-2.0)
- `.with_max_tokens(tokens)` - Set max tokens
- `.with_reasoning_effort(effort)` - Set reasoning level
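For example, building a request with the defaults from the Configuration section set explicitly:

```rust
use miyabi_llm::{LLMRequest, ReasoningEffort};

fn main() {
    // Temperature and max_tokens mirror the defaults in .miyabi.yml below.
    let request = LLMRequest::new("Review this diff for bugs")
        .with_temperature(0.2)
        .with_max_tokens(4096)
        .with_reasoning_effort(ReasoningEffort::High);
    let _ = request;
}
```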
### ReasoningEffort

Reasoning effort level for inference.

- `ReasoningEffort::Low` - Fast inference for simple tasks
- `ReasoningEffort::Medium` - Balanced quality and speed (default)
- `ReasoningEffort::High` - High-quality reasoning for complex tasks
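The variants map onto a plain enum along these lines (a sketch; the actual definition may derive additional traits):

```rust
// Hypothetical definition matching the documented variants.
pub enum ReasoningEffort {
    Low,    // fast inference for simple tasks
    Medium, // balanced quality and speed (default)
    High,   // high-quality reasoning for complex tasks
}
```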
### LLMResponse

Response from LLM inference.
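Its fields are not listed here; a minimal sketch consistent with the examples above (the field name is an assumption):

```rust
// Hypothetical shape; `content` matches its use in the sketches above.
pub struct LLMResponse {
    pub content: String,
}
```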
## Chat Completion

A multi-turn sketch; the `ChatMessage` type and `chat` method below are assumptions (see `examples/chat.rs` for the actual API):
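```rust
use miyabi_llm::{ChatMessage, GPTOSSProvider};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = GPTOSSProvider::new_ollama();

    // `ChatMessage` constructors and `chat` are assumed names, shown only
    // to illustrate the intended multi-turn flow.
    let messages = vec![
        ChatMessage::system("You are a concise assistant."),
        ChatMessage::user("What is GPT-OSS-20B?"),
    ];
    let response = provider.chat(messages).await?;
    println!("{}", response.content);
    Ok(())
}
```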
## Error Handling

Inference calls return `Result`, so failures can be matched rather than unwrapped. A minimal sketch (the `complete` method is an assumed name):
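```rust
use miyabi_llm::{GPTOSSProvider, LLMRequest};

#[tokio::main]
async fn main() {
    let provider = GPTOSSProvider::new_ollama();

    // Match on the Result so network or backend failures surface as
    // messages rather than panics.
    match provider.complete(LLMRequest::new("ping")).await {
        Ok(response) => println!("{}", response.content),
        Err(err) => eprintln!("LLM request failed: {err}"),
    }
}
```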
## Cost Comparison

### Groq (Pay-per-use)
- Input: $0.10 / 1M tokens
- Output: $0.50 / 1M tokens
- Speed: 1000+ tokens/second
- Best for: Prototyping, low-frequency use
Example cost (500 Agent executions/month, assuming roughly 2K input and 1K output tokens per execution):
- Input: 1M tokens × $0.10 = $0.10
- Output: 0.5M tokens × $0.50 = $0.25
- Total: $0.35/month ($4.20/year)
### vLLM (Self-hosted)

- Infrastructure: AWS p3.2xlarge @ $3.06/hour
- Monthly: $2,203 (24/7 = 720 hours) or $539 (8h/day × 22 days = 176 hours)
- Best for: Production, high-frequency use
### Ollama (Local)
- Hardware: NVIDIA RTX 4080 16GB (~$1,200)
- Electricity: ~$6.76/month
- Best for: Development, privacy-sensitive applications
## Performance
| Provider | Speed | Latency | Cost/1M tokens |
|---|---|---|---|
| Groq | 1000+ t/s | ~1-2s | $0.10 in, $0.50 out |
| vLLM | 500-1000 t/s | ~2-3s | Self-hosted |
| Ollama | 50-100 t/s | ~5-15s | Self-hosted |
## Configuration

### Environment Variables

The endpoint variable names below are assumptions; `GROQ_API_KEY` matches the `.miyabi.yml` reference that follows.

```bash
# Groq API key (required for Groq provider)
export GROQ_API_KEY="..."

# vLLM endpoint (optional, default: http://localhost:8000)
export VLLM_ENDPOINT="http://localhost:8000"    # variable name is an assumption

# Ollama endpoint (optional, default: http://localhost:11434)
export OLLAMA_ENDPOINT="http://localhost:11434" # variable name is an assumption
```
### .miyabi.yml

```yaml
llm:
  provider: "groq"  # "vllm" | "ollama" | "groq"
  groq:
    api_key: "${GROQ_API_KEY}"
    model: "openai/gpt-oss-20b"
  vllm:
    endpoint: "http://localhost:8000"
  ollama:
    model: "gpt-oss:20b"
  default_temperature: 0.2
  default_max_tokens: 4096
  default_reasoning_effort: "medium"
```
## Testing

```bash
# Run tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run a specific test
cargo test <test_name>
```
## Examples

See the `examples/` directory for more examples:

- `basic.rs` - Basic usage example
- `chat.rs` - Chat completion example
- `streaming.rs` - Streaming responses (planned)
- `function_calling.rs` - Function calling example (planned)
## Roadmap

- [x] Core `LLMProvider` trait
- [x] `GPTOSSProvider` implementation
- [x] Groq support
- [x] vLLM support
- [x] Ollama support
- [x] Basic chat completion
- [ ] Streaming responses
- [ ] Function calling
- [ ] Token counting utilities
- [ ] Retry logic with exponential backoff
- [ ] Response caching
## License
Apache-2.0
## Contributing

Contributions are welcome! Please see `CONTRIBUTING.md` for guidelines.
## Related Projects

- Miyabi - Complete autonomous AI development operations platform
- miyabi-agents - Agent implementations built on miyabi-llm
- GPT-OSS - OpenAI's open-weight model