# llm-gateway
## Table of Contents

- Features
- Supported providers (library mode)
- Usage
  - YAML config
  - Load in code
  - Cargo dependency
  - Init from env
  - Explicit config
- Environment variables
- Streaming
- End-to-end flow
- Troubleshooting
- License
A lightweight Rust library to normalize requests and responses across multiple LLM providers.
## Features
- Provider-agnostic request/response types
- Simple trait for implementing new providers
- Built-in routing by provider prefix: deepseek/…, glm/…, qwen/…, kimi/…
## Supported providers (library mode)
- DeepSeek (prefix: deepseek/)
- Zhipu GLM (prefix: glm/ or zhipu/)
- Alibaba Qwen (prefix: qwen/ or alibaba/)
- Moonshot Kimi (prefix: kimi/ or moonshot/)
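For illustration, here is a minimal sketch of how a prefixed model name could be split into a provider and a bare model id. The `route` helper and its return shape are hypothetical (not the crate's API), but the alias table mirrors the list above:

```rust
/// Hypothetical helper (not the crate's actual API) showing how prefix
/// routing can resolve "provider/model" strings; the alias table mirrors
/// the supported-provider list above.
fn route(model: &str) -> Option<(&'static str, &str)> {
    let (prefix, bare) = model.split_once('/')?;
    let provider = match prefix {
        "deepseek" => "deepseek",
        "glm" | "zhipu" => "glm",
        "qwen" | "alibaba" => "qwen",
        "kimi" | "moonshot" => "kimi",
        _ => return None, // unknown prefix: let the caller report an error
    };
    Some((provider, bare))
}
```

For example, `route("zhipu/glm-4")` yields `Some(("glm", "glm-4"))`.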
## Usage
### Config via YAML (recommended)
- Create a file (not committed to Git), e.g. `llm-gateway.config.yaml` in your project root, or `~/.config/llm-gateway/config.yaml`
- Example contents:
```yaml
# llm-gateway.config.yaml
# Do NOT commit this file (contains secrets)
deepseek:
  api_key:
  base_url: https://api.deepseek.com/v1
glm:
  api_key:
  base_url: https://open.bigmodel.cn/api/paas/v4
qwen:
  api_key:
  base_url: https://dashscope.aliyun.com/compatible-mode/v1
kimi:
  api_key:
  base_url: https://api.moonshot.cn/v1
```
Load it in code automatically (a minimal sketch; the exact signatures are assumptions, check the crate docs):

```rust
use llm_gateway::Client;
use llm_gateway::ChatRequest;

// Inside an async context (e.g. #[tokio::main]).
// Assumed: from_yaml_file_auto() returns Option<Client> after checking
// ./llm-gateway.config.yaml, then ~/.config/llm-gateway/config.yaml.
let client = Client::from_yaml_file_auto().expect("no config file found");

// ChatRequest field names are assumptions; the provider prefix in
// `model` selects the backend.
let request = ChatRequest {
    model: "deepseek/deepseek-chat".to_string(),
    ..Default::default()
};
let response = client.chat(request).await?;
println!("{:?}", response);
```
Add to your Cargo.toml:

```toml
[dependencies]
llm-gateway = "0.1.0"
```
Initialize the client from environment variables (recommended; a sketch with assumed signatures):

```rust
use llm_gateway::Client;
use llm_gateway::ChatRequest;

// Inside an async context. from_env() reads the variables listed under
// "Environment variables" below.
let client = Client::from_env();
let request = ChatRequest {
    model: "glm/glm-4".to_string(), // field names are assumptions
    ..Default::default()
};
let response = client.chat(request).await?;
```
Or configure explicitly (a sketch; the explicit-config constructor shown here is hypothetical):

```rust
use llm_gateway::Client;
use llm_gateway::ChatRequest;

// Inside an async context. from_yaml_file() is a hypothetical name for
// loading a specific file instead of the automatic search or env vars.
let client = Client::from_yaml_file("llm-gateway.config.yaml")?;
let request = ChatRequest {
    model: "qwen/qwen2-7b-instruct".to_string(), // fields assumed
    ..Default::default()
};
let response = client.chat(request).await?;
```
## Environment variables

Recognized by `from_env()`:
- DEEPSEEK_API_KEY, DEEPSEEK_BASE_URL (default https://api.deepseek.com/v1)
- GLM_API_KEY or ZHIPU_API_KEY, GLM_BASE_URL or ZHIPU_BASE_URL (default https://open.bigmodel.cn/api/paas/v4)
- QWEN_API_KEY or ALIBABA_QWEN_API_KEY, QWEN_BASE_URL or ALIBABA_QWEN_BASE_URL (default https://dashscope.aliyun.com/compatible-mode/v1)
- KIMI_API_KEY or MOONSHOT_API_KEY, KIMI_BASE_URL or MOONSHOT_BASE_URL (default https://api.moonshot.cn/v1)
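The fallback rule is the same for every provider: try the primary variable, then the alias, then (for the base URL) the built-in default. A hypothetical sketch of that logic, not the crate's actual `from_env()` internals:

```rust
use std::env;

// Hypothetical sketch of the documented fallbacks (primary var, alias
// var, then a default base URL); not the crate's actual internals.
fn provider_from_env(
    key_var: &str,
    key_alias: &str,
    url_var: &str,
    url_alias: &str,
    default_url: &str,
) -> Option<(String, String)> {
    // No key under either name means the provider is not configured.
    let api_key = env::var(key_var).or_else(|_| env::var(key_alias)).ok()?;
    let base_url = env::var(url_var)
        .or_else(|_| env::var(url_alias))
        .unwrap_or_else(|_| default_url.to_string());
    Some((api_key, base_url))
}
```

For Qwen this would be called as `provider_from_env("QWEN_API_KEY", "ALIBABA_QWEN_API_KEY", "QWEN_BASE_URL", "ALIBABA_QWEN_BASE_URL", "https://dashscope.aliyun.com/compatible-mode/v1")`.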
## Streaming

A sketch (the streaming method name is an assumption; `StreamExt` comes from the `futures`/`futures-util` crate):

```rust
use futures_util::StreamExt;
use llm_gateway::Client;
use llm_gateway::ChatRequest;

// Inside an async context. chat_stream() is an assumed name for the
// streaming variant of chat(); it yields incremental chunks.
let client = Client::from_env();
let request = ChatRequest {
    model: "deepseek/deepseek-chat".to_string(), // fields assumed
    ..Default::default()
};
let mut stream = client.chat_stream(request).await?;
while let Some(chunk) = stream.next().await {
    // Each chunk carries a partial piece of the reply.
    println!("{:?}", chunk?);
}
```
Note: The implementation uses OpenAI-compatible SSE parsing (lines beginning with `data:`). If a provider deviates, we can add provider-specific parsers.
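For context, this is roughly what OpenAI-compatible SSE line handling looks like (a hypothetical helper, not the crate's internal parser): each event line starts with `data:`, and the stream ends with the `data: [DONE]` sentinel.

```rust
/// Hypothetical sketch of OpenAI-compatible SSE line handling; not the
/// crate's internal parser.
fn parse_sse_line(line: &str) -> Option<&str> {
    let payload = line.strip_prefix("data:")?.trim_start();
    if payload == "[DONE]" {
        return None; // end-of-stream sentinel used by OpenAI-compatible APIs
    }
    Some(payload) // a JSON chunk containing the next delta
}
```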
## End-to-end flow (from minimal example to ready-to-use)
This complements the quick-start snippets above and shows the full path from configuration to calling chat, including model discovery and caching.
### 1) Prepare configuration (choose one)

File locations (first existing wins; a lookup sketch follows the options below):

- Project root: `./llm-gateway.config.yaml`
- User config: `~/.config/llm-gateway/config.yaml`
**Option A: Online model discovery + result cache (TTL)**

```yaml
# Only config controls discovery in library mode
discover_models: true
discover_models_ttl_secs: 600 # 10 minutes; when 0, a built-in default of 300s is used
deepseek:
  base_url: https://api.deepseek.com/v1
qwen:
  base_url: https://dashscope.aliyun.com/compatible-mode/v1
glm:
  base_url: https://open.bigmodel.cn/api/paas/v4
kimi:
  base_url: https://api.moonshot.cn/v1
```
**Option B: Import model list via YAML (no online discovery, no cache)**

```yaml
discover_models: false
deepseek:
  base_url: https://api.deepseek.com/v1
  models:
    - deepseek-chat
    - deepseek-reasoner
```
Tip: Avoid committing API keys to the repo. Put them in your local config or inject at runtime.
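The "first existing wins" lookup could be sketched like this (a hypothetical helper; `from_yaml_file_auto()` in the next step presumably performs an equivalent search internally):

```rust
use std::path::PathBuf;

// Hypothetical sketch of the documented lookup order; not the crate's
// actual code.
fn find_config() -> Option<PathBuf> {
    let home = std::env::var_os("HOME").map(PathBuf::from)?;
    [
        PathBuf::from("llm-gateway.config.yaml"),     // project root
        home.join(".config/llm-gateway/config.yaml"), // user config
    ]
    .into_iter()
    .find(|p| p.exists())
}
```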
### 2) Load config and create client

```rust
use llm_gateway::Client;

// Prefer the config file when present; otherwise fall back to env vars.
// Assumed: from_yaml_file_auto() returns Option<Client>.
let client = if let Some(c) = Client::from_yaml_file_auto() {
    c
} else {
    Client::from_env()
};
```
### 3) List available models

Local/default only (no online discovery):

```rust
let local = client.list_models();
println!("{:?}", local);
```

Auto aggregation based on config (triggers discovery + cache when `discover_models: true`):

```rust
let auto = client.list_models_auto().await?;
println!("{:?}", auto);
```
Cache semantics (illustrated by the sketch after this list):
- When discover_models=true, discovery results are cached using discover_models_ttl_secs;
- When set to 0, a built-in default of 300s is used;
- Cache hit returns immediately; after TTL expiration, results are refreshed automatically;
- Discovery failures fall back to local models or built-in defaults.
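A minimal, self-contained sketch of those semantics, assuming a simple `(Instant, Vec<String>)` cache entry; the crate's real cache is internal and may differ:

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch of the documented TTL semantics; not the crate's
// actual internals.
struct ModelCache {
    cached: Option<(Instant, Vec<String>)>,
    ttl: Duration,
}

impl ModelCache {
    fn new(ttl_secs: u64) -> Self {
        // discover_models_ttl_secs = 0 falls back to the built-in 300s default.
        let secs = if ttl_secs == 0 { 300 } else { ttl_secs };
        Self { cached: None, ttl: Duration::from_secs(secs) }
    }

    fn get(
        &mut self,
        discover: impl Fn() -> Result<Vec<String>, String>,
        local_fallback: &[String],
    ) -> Vec<String> {
        // A cache hit within the TTL returns immediately.
        if let Some((at, models)) = &self.cached {
            if at.elapsed() < self.ttl {
                return models.clone();
            }
        }
        // Empty or expired: refresh from remote discovery.
        match discover() {
            Ok(models) => {
                self.cached = Some((Instant::now(), models.clone()));
                models
            }
            // Discovery failure: fall back to local or built-in models.
            Err(_) => local_fallback.to_vec(),
        }
    }
}
```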
### 4) Send a chat request

```rust
use llm_gateway::ChatRequest;

// Field names on ChatRequest are assumptions; check the crate docs.
let req = ChatRequest {
    model: "glm/glm-4".to_string(),
    ..Default::default()
};
let resp = client.chat(req).await?;
println!("{:?}", resp);
```
### 5) Validate cache behavior (optional)
- The first `list_models_auto()` call triggers remote discovery and writes the cache;
- Subsequent calls within the TTL are fast cache hits;
- After the TTL expires, calling again refreshes the remote lists;
- When the network is flaky, the library falls back to the cache or local lists for resilience.
## Troubleshooting
- Config not found: ensure it resides in one of the two paths above, or implement your own loader;
- Model name parsing: prefer `provider/model` (e.g., `glm/glm-4`, `qwen/qwen2-7b-instruct`);
- 401/403 from provider: verify API key injection (via config or at runtime);
- Base URL mismatch: check the provider's OpenAI-compatible endpoint.
See also: ../../docs/config.sample.yaml
## License
MIT