# llm-gateway

English | [中文](./README.zh-CN.md)

## Table of Contents
- Features
- Supported providers (library mode)
- Usage
  - YAML config
  - Load in code
  - Cargo dependency
  - Init from env
  - Explicit config
- Environment variables
- Streaming
- End-to-end flow
- License

A lightweight Rust library to normalize requests and responses across multiple LLM providers.

## Features

- Provider-agnostic request/response types
- Simple trait for implementing new providers
- Built-in routing by provider prefix: deepseek/…, glm/…, qwen/…, kimi/…

## Supported providers (library mode)
- DeepSeek (prefix: deepseek/)
- Zhipu GLM (prefix: glm/ or zhipu/)
- Alibaba Qwen (prefix: qwen/ or alibaba/)
- Moonshot Kimi (prefix: kimi/ or moonshot/)
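
Routing is driven purely by the prefix of the model name, with the aliases above mapping to the same provider. A minimal sketch of such a prefix split, purely for illustration (the function and the `&str` provider tags are not part of the library's API):

```rust
// Illustrative only: split "provider/model" into a provider tag and the bare model name.
fn split_model(model: &str) -> Option<(&'static str, &str)> {
    let (prefix, rest) = model.split_once('/')?;
    let provider = match prefix {
        "deepseek" => "deepseek",
        "glm" | "zhipu" => "glm",
        "qwen" | "alibaba" => "qwen",
        "kimi" | "moonshot" => "kimi",
        _ => return None,
    };
    Some((provider, rest))
}

// split_model("glm/glm-4") == Some(("glm", "glm-4"))
```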

## Usage

### YAML config (recommended)

- Create a file (not committed to Git), e.g. `llm-gateway.config.yaml` in your project root or `~/.config/llm-gateway/config.yaml`
- Example contents:

```yaml
# llm-gateway.config.yaml
# Do NOT commit this file (contains secrets)
deepseek:
  api_key: "{{YOUR_DEEPSEEK_API_KEY}}"
  base_url: https://api.deepseek.com/v1
glm:
  api_key: "{{YOUR_ZHIPU_API_KEY}}"
  base_url: https://open.bigmodel.cn/api/paas/v4
qwen:
  api_key: "{{YOUR_QWEN_API_KEY}}"
  base_url: https://dashscope.aliyun.com/compatible-mode/v1
kimi:
  api_key: "{{YOUR_KIMI_API_KEY}}"
  base_url: https://api.moonshot.cn/v1
```

### Load in code

Load the config file automatically, falling back to environment variables:
```rust
use llm_gateway::{Client};
use llm_gateway::types::ChatRequest;

# async fn run() -> anyhow::Result<()> {
let client = Client::from_yaml_file_auto().unwrap_or_else(|| Client::from_env());
let resp = client.chat(ChatRequest {
    model: "deepseek/deepseek-chat".into(),
    messages: vec![("user".into(), "Hello".into())],
}).await?;
println!("{}", resp.text);
# Ok(())
# }
```

### Cargo dependency

Add to your `Cargo.toml`:

```toml
[dependencies]
llm-gateway = "0.1.1"
```

### Init from env

Alternatively, initialize the client from environment variables:

```rust
use llm_gateway::{Client};
use llm_gateway::types::ChatRequest;

# async fn run() -> anyhow::Result<()> {
let client = Client::from_env();
let resp = client.chat(ChatRequest {
    model: "deepseek/deepseek-chat".into(),
    messages: vec![("user".into(), "Hello".into())],
}).await?;
println!("{}", resp.text);
# Ok(())
# }
```

### Explicit config

Or configure providers explicitly:

```rust
use llm_gateway::{Client, Config, ProviderConfig};
use llm_gateway::types::ChatRequest;

# async fn run() -> anyhow::Result<()> {
let client = Client::with_config(Config {
    deepseek: ProviderConfig {
        base_url: Some("https://api.deepseek.com/v1".into()),
        api_key: Some("{{DEEPSEEK_API_KEY}}".into()),
    },
    glm: ProviderConfig { base_url: None, api_key: None },
    qwen: ProviderConfig { base_url: None, api_key: None },
    kimi: ProviderConfig { base_url: None, api_key: None },
});
let resp = client.chat(ChatRequest {
    model: "deepseek/deepseek-chat".into(),
    messages: vec![("user".into(), "Hello".into())],
}).await?;
println!("{}", resp.text);
# Ok(())
# }
```

## Environment variables

Environment variables recognized by `from_env()`:
- DEEPSEEK_API_KEY, DEEPSEEK_BASE_URL (default https://api.deepseek.com/v1)
- GLM_API_KEY or ZHIPU_API_KEY, GLM_BASE_URL or ZHIPU_BASE_URL (default https://open.bigmodel.cn/api/paas/v4)
- QWEN_API_KEY or ALIBABA_QWEN_API_KEY, QWEN_BASE_URL or ALIBABA_QWEN_BASE_URL (default https://dashscope.aliyun.com/compatible-mode/v1)
- KIMI_API_KEY or MOONSHOT_API_KEY, KIMI_BASE_URL or MOONSHOT_BASE_URL (default https://api.moonshot.cn/v1)
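
Each provider resolves its key and base URL through a simple fallback chain: one of the listed variables, then the built-in default base URL. A sketch of that resolution for the Kimi provider, purely illustrative (the `kimi_env` helper is not part of the library, and the precedence between the two variable names is an assumption):

```rust
use std::env;

// Illustrative: the lookup order described above, shown for the Kimi provider.
fn kimi_env() -> (Option<String>, String) {
    let api_key = env::var("KIMI_API_KEY")
        .or_else(|_| env::var("MOONSHOT_API_KEY"))
        .ok();
    let base_url = env::var("KIMI_BASE_URL")
        .or_else(|_| env::var("MOONSHOT_BASE_URL"))
        .unwrap_or_else(|_| "https://api.moonshot.cn/v1".to_string());
    (api_key, base_url)
}
```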

## Streaming

```rust
use futures_util::StreamExt;
use llm_gateway::{Client};
use llm_gateway::types::ChatRequest;

# async fn run() -> anyhow::Result<()> {
let client = Client::from_env();
let mut stream = client.chat_stream(ChatRequest {
    model: "qwen/qwen2-7b-instruct".into(),
    messages: vec![("user".into(), "流式测试".into())],
}).await?;

while let Some(chunk) = stream.next().await {
    let text = chunk?; // each item is a text delta
    print!("{}", text);
}
# Ok(())
# }
```

Note: The implementation uses OpenAI-compatible SSE parsing (lines beginning with `data: `). If a provider deviates, we can add provider-specific parsers.
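
For context, here is a minimal sketch of that kind of line handling, assuming `serde_json` and the usual OpenAI-compatible chunk shape (`choices[0].delta.content`); it is illustrative only, not the library's actual parser:

```rust
// Illustrative: pull the text delta out of one OpenAI-compatible SSE line.
fn delta_from_sse_line(line: &str) -> Option<String> {
    let payload = line.strip_prefix("data: ")?.trim();
    if payload == "[DONE]" {
        return None; // end-of-stream marker
    }
    let value: serde_json::Value = serde_json::from_str(payload).ok()?;
    value["choices"][0]["delta"]["content"]
        .as_str()
        .map(str::to_owned)
}
```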

## End-to-end flow (from a minimal example to a ready-to-use setup)

This complements the quick-start snippets above and shows the full path from configuration to calling chat, including model discovery and caching.

1) Prepare configuration (choose one)
- File locations (first existing wins):
  - Project root: ./llm-gateway.config.yaml
  - User config: ~/.config/llm-gateway/config.yaml
- Option A: Online model discovery + result cache (TTL)
  ```yaml
  # Only config controls discovery in library mode
  discover_models: true
  discover_models_ttl_secs: 600  # 10 minutes; when 0, a built-in default of 300s is used
  deepseek:
    base_url: https://api.deepseek.com/v1
  qwen:
    base_url: https://dashscope.aliyun.com/compatible-mode/v1
  glm:
    base_url: https://open.bigmodel.cn/api/paas/v4
  kimi:
    base_url: https://api.moonshot.cn/v1
  ```
- Option B: Import model list via YAML (no online discovery, no cache)
  ```yaml
  discover_models: false
  deepseek:
    base_url: https://api.deepseek.com/v1
    models:
      - deepseek-chat
      - deepseek-reasoner
  ```
Tip: Avoid committing API keys to the repo. Put them in your local config or inject at runtime.

2) Load config and create client
```rust
use llm_gateway::Client;

let client = if let Some(c) = Client::from_yaml_file_auto() {
    c
} else {
    // Fallback to environment-driven defaults for provider api_key/base_url only;
    // discovery and TTL are config-only in library mode
    Client::from_env()
};
```

3) List available models
- Local/default only (no online discovery):
```rust
let local = client.list_models();
println!("local: {:?}", local);
```
- Auto aggregation based on config (triggers discovery and caching when `discover_models: true`):
```rust
let auto = client.list_models_auto().await?;
println!("auto: {:?}", auto);
```
Cache semantics:
- When discover_models=true, discovery results are cached using discover_models_ttl_secs;
- When set to 0, a built-in default of 300s is used;
- Cache hit returns immediately; after TTL expiration, results are refreshed automatically;
- Discovery failures fall back to local models or built-in defaults.
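
Put together, this amounts to a small TTL cache around the discovery call. A sketch of the idea with illustrative names only (this is not the library's internal code):

```rust
use std::time::{Duration, Instant};

// Illustrative only: cache the discovered model list for one TTL period.
struct ModelCache {
    models: Vec<String>,
    fetched_at: Instant,
    ttl: Duration, // a configured value of 0 maps to the 300 s default
}

impl ModelCache {
    fn get_or_refresh(
        &mut self,
        discover: impl Fn() -> anyhow::Result<Vec<String>>,
    ) -> Vec<String> {
        if self.fetched_at.elapsed() < self.ttl {
            return self.models.clone(); // cache hit: return immediately
        }
        match discover() {
            Ok(models) => {
                self.models = models;
                self.fetched_at = Instant::now();
            }
            Err(_) => { /* discovery failed: keep the stale/local list as fallback */ }
        }
        self.models.clone()
    }
}
```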

4) Send a chat request
```rust
use llm_gateway::types::ChatRequest;

let resp = client.chat(ChatRequest {
    model: "glm/glm-4".into(),  // prefer the provider/model form
    messages: vec![("user".into(), "Write a 2-line haiku about Rust".into())],
}).await?;
println!("{}", resp.text);
```

5) Validate cache behavior (optional)
- First list_models_auto() triggers remote discovery and writes cache;
- Subsequent calls within TTL should be fast cache hits;
- After TTL expires, calling again refreshes remote lists;
- When network is flaky, the library falls back to cache or local lists for resilience.
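
One quick way to observe this, assuming the `client` from step 2 and an async context (timings are only indicative):

```rust
use std::time::Instant;

let t0 = Instant::now();
let first = client.list_models_auto().await?;  // remote discovery, populates the cache
println!("first call took {:?}: {:?}", t0.elapsed(), first);

let t1 = Instant::now();
let second = client.list_models_auto().await?; // within the TTL: should be a fast cache hit
println!("second call took {:?}: {:?}", t1.elapsed(), second);
```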

### Troubleshooting
- Config not found: ensure it resides in one of the two paths above, or implement your own loader;
- Model name parsing: prefer provider/model (e.g., glm/glm-4, qwen/qwen2-7b-instruct);
- 401/403 from provider: verify API key injection (via config or runtime);
- Base URL mismatch: check the provider's OpenAI-compatible endpoint.

See also: ../../docs/config.sample.yaml

## License

MIT