# Onwards
[crates.io](https://crates.io/crates/onwards) | [docs.rs](https://docs.rs/onwards) | [GitHub](https://github.com/doublewordai/onwards)
A Rust-based AI Gateway that provides a unified interface for routing requests
to OpenAI-compatible targets. The goal is to be as 'transparent' as possible.
## Quickstart
Create a `config.json` file with your target configurations:
```json
{
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key",
"onwards_model": "gpt-4"
},
"claude-3": {
"url": "https://api.anthropic.com",
"onwards_key": "sk-ant-your-anthropic-key"
},
"local-model": {
"url": "http://localhost:8080"
}
}
}
```
Start the gateway:
```bash
cargo run -- -f config.json
```
Modifying the file automatically and atomically reloads the configuration (to
disable this, set the `--watch` flag to false).
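For example, a hypothetical way to add a target while the gateway is running (using `jq`, which Onwards does not require, purely for illustration):
```bash
# Add a new target to config.json; the running gateway reloads it without a restart
jq '.targets["local-model-2"] = {"url": "http://localhost:8081"}' config.json > config.json.tmp \
  && mv config.json.tmp config.json
```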
### Configuration Options
- `url`: The base URL of the AI provider
- `onwards_key`: API key to include in requests to the target (optional)
- `onwards_model`: Model name to use when forwarding requests (optional)
- `keys`: Array of API keys required for authentication to this target (optional)
- `rate_limit`: Rate limiting configuration with `requests_per_second` and `burst_size` (optional)
- `concurrency_limit`: Concurrency limiting configuration with `max_concurrent_requests` (optional)
- `upstream_auth_header_name`: Custom header name for upstream authentication (optional, defaults to "Authorization")
- `upstream_auth_header_prefix`: Custom prefix for upstream authentication header value (optional, defaults to "Bearer ")
- `response_headers`: Key-value pairs to add or override headers in the response (optional)
## Usage
### Command Line Options
- `--targets <file>`: Path to configuration file (required)
- `--port <port>`: Port to listen on (default: 3000)
- `--watch`: Enable configuration file watching for hot-reloading (default: true)
- `--metrics`: Enable Prometheus metrics endpoint (default: true)
- `--metrics-port <port>`: Port for Prometheus metrics (default: 9090)
- `--metrics-prefix <prefix>`: Prefix for metrics (default: "onwards")
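For example, to serve on port 8080 with metrics exposed on port 9100:
```bash
cargo run -- --targets config.json --port 8080 --metrics-port 9100
```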
### API Usage
### List Available Models
Get a list of all configured targets, in the OpenAI models format:
```bash
curl http://localhost:3000/v1/models
```
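For example, with the quickstart config above, piping through `jq` (not required, just for illustration) lists the configured aliases:
```bash
# The response follows the OpenAI list shape: {"object": "list", "data": [{"id": ...}, ...]}
curl -s http://localhost:3000/v1/models | jq -r '.data[].id'
# gpt-4
# claude-3
# local-model
```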
### Sending Requests
Send requests to the gateway using the standard OpenAI API format:
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
### Model Override Header
Override the target using the `model-override` header:
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "model-override: claude-3" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
This is also used for routing requests without bodies - for example, to get the
embeddings usage for your organization:
```bash
curl -X GET http://localhost:3000/v1/organization/usage/embeddings \
-H "model-override: claude-3"
```
### Metrics
Prometheus metrics are enabled by default (controlled by the `--metrics` flag).
Access the metrics endpoint with:
```bash
curl http://localhost:9090/metrics
```
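Individual metric names aren't listed here, but they share the configured prefix (default `onwards`), so you can filter for them:
```bash
# Show only the gateway's own metric samples
curl -s http://localhost:9090/metrics | grep '^onwards'
```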
## Authentication
Onwards supports bearer token authentication to control access to your AI
targets. You can configure authentication keys both globally and per-target.
### Global Authentication Keys
Global keys apply to all targets that have authentication enabled:
```json
{
"auth": {
"global_keys": ["global-api-key-1", "global-api-key-2"]
},
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key",
"keys": ["target-specific-key"]
}
}
}
```
### Per-Target Authentication
You can also specify authentication keys for individual targets:
```json
{
"targets": {
"secure-gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key",
"keys": ["secure-key-1", "secure-key-2"]
},
"open-local": {
"url": "http://localhost:8080"
}
}
}
```
In this example:
- `secure-gpt-4` requires a valid bearer token from the `keys` array
- `open-local` has no authentication requirements
If both global and per-target keys are configured, a target that defines its own
`keys` accepts either kind of key.
### How Authentication Works
When a target has `keys` configured, requests must include a valid
`Authorization: Bearer <token>` header where `<token>` matches one of the
configured keys. If global keys are configured, they are automatically added to
each target's key set.
**Successful authenticated request:**
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer secure-key-1" \
-H "Content-Type: application/json" \
-d '{
"model": "secure-gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
**Failed authentication (invalid key):**
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer wrong-key" \
-H "Content-Type: application/json" \
-d '{
"model": "secure-gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Returns: 401 Unauthorized
```
**Failed authentication (missing header):**
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "secure-gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Returns: 401 Unauthorized
```
**No authentication required:**
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "open-local",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Success - no authentication required for this target
```
## Upstream Authentication Configuration
By default, Onwards sends upstream API keys using the standard `Authorization:
Bearer <key>` header format. However, some AI providers use different
authentication header formats. You can customize both the header name and
prefix per target.
### Custom Header Name
Some providers use custom header names for authentication:
```json
{
"targets": {
"custom-api": {
"url": "https://api.custom-provider.com",
"onwards_key": "your-api-key-123",
"upstream_auth_header_name": "X-API-Key"
}
}
}
```
This sends: `X-API-Key: Bearer your-api-key-123`
### Custom Header Prefix
Some providers use different prefixes or no prefix at all:
```json
{
"targets": {
"api-with-prefix": {
"url": "https://api.provider1.com",
"onwards_key": "token-xyz",
"upstream_auth_header_prefix": "ApiKey "
},
"api-without-prefix": {
"url": "https://api.provider2.com",
"onwards_key": "plain-key-456",
"upstream_auth_header_prefix": ""
}
}
}
```
This sends:
- To provider1: `Authorization: ApiKey token-xyz`
- To provider2: `Authorization: plain-key-456`
### Combining Custom Name and Prefix
You can customize both the header name and prefix:
```json
{
"targets": {
"fully-custom": {
"url": "https://api.custom.com",
"onwards_key": "secret-key",
"upstream_auth_header_name": "X-Custom-Auth",
"upstream_auth_header_prefix": "Token "
}
}
}
```
This sends: `X-Custom-Auth: Token secret-key`
### Default Behavior
If these options are not specified, Onwards uses the standard OpenAI-compatible format:
```json
{
"targets": {
"standard-api": {
"url": "https://api.openai.com",
"onwards_key": "sk-openai-key"
}
}
}
```
This sends: `Authorization: Bearer sk-openai-key`
## Rate Limiting
Onwards supports per-target rate limiting using a token bucket algorithm. This
allows you to control the request rate to each AI provider independently.
### Configuration
Add rate limiting to any target in your `config.json`:
```json
{
  "targets": {
    "rate-limited-model": {
      "url": "https://api.provider.com",
      "onwards_key": "your-api-key",
      "rate_limit": {
        "requests_per_second": 5.0,
        "burst_size": 10
      }
    }
  }
}
```
### How It Works
Onwards uses a token bucket algorithm: each target gets its own bucket. Tokens
are refilled at the rate set by `requests_per_second`, and the bucket holds at
most `burst_size` tokens. Each request consumes one token; when the bucket is
empty, requests to that target are rejected with a `429 Too Many Requests`
response.
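With the configuration above (`requests_per_second: 5.0`, `burst_size: 10`), a cold bucket accepts 10 requests immediately and then refills at one token every 200 ms. A rough way to see this from the command line (illustrative only; exact counts depend on timing):
```bash
# Fire 15 back-to-back requests; with burst_size 10, the last several
# should come back as 429 before the bucket has time to refill
for i in $(seq 15); do
  curl -s -o /dev/null -w "%{http_code}\n" -X POST http://localhost:3000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "rate-limited-model", "messages": [{"role": "user", "content": "Hello!"}]}'
done
```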
### Examples
```json
// Allow 1 request per second with burst of 5
"rate_limit": {
"requests_per_second": 1.0,
"burst_size": 5
}
// Allow 100 requests per second with burst of 200
"rate_limit": {
"requests_per_second": 100.0,
"burst_size": 200
}
```
Rate limiting is optional - targets without `rate_limit` configuration have no
rate limiting applied.
## Per-API-Key Rate Limiting
In addition to per-target rate limiting, Onwards supports individual rate
limits for different API keys. This allows you to provide different service
tiers to your users - for example, basic users might have lower limits while
premium users get higher limits.
### Configuration
Per-key rate limiting uses a `key_definitions` section in the auth configuration:
```json
{
"auth": {
"global_keys": ["fallback-key"],
"key_definitions": {
"basic_user": {
"key": "sk-user-12345",
"rate_limit": {
"requests_per_second": 10,
"burst_size": 20
}
},
"premium_user": {
"key": "sk-premium-67890",
"rate_limit": {
"requests_per_second": 100,
"burst_size": 200
}
},
"enterprise_user": {
"key": "sk-enterprise-abcdef",
"rate_limit": {
"requests_per_second": 500,
"burst_size": 1000
}
}
}
},
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key",
"keys": ["basic_user", "premium_user", "enterprise_user", "fallback-key"]
}
}
}
```
### Priority Order
Rate limits are checked in this order:
1. **Per-key rate limits** (if the API key has limits configured)
2. **Per-target rate limits** (if the target has limits configured)
If either limit is exceeded, the request returns `429 Too Many Requests`.
### Usage Examples
**Basic user request (10/sec limit):**
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer sk-user-12345" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```
**Premium user request (100/sec limit):**
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer sk-premium-67890" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```
**Legacy key (no per-key limits):**
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer fallback-key" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```
## Concurrency Limiting
In addition to rate limiting (which controls *how fast* requests are made),
Onwards supports concurrency limiting to control *how many* requests are
processed simultaneously. This is useful for managing resource usage and
preventing overload.
### Per-Target Concurrency Limiting
Limit the number of concurrent requests to a specific target:
```json
{
"targets": {
"resource-limited-model": {
"url": "https://api.provider.com",
"onwards_key": "your-api-key",
"concurrency_limit": {
"max_concurrent_requests": 5
}
}
}
}
```
With this configuration, only 5 requests will be processed concurrently for
this target. Additional requests will receive a `429 Too Many Requests`
response until an in-flight request completes.
### Per-API-Key Concurrency Limiting
You can also set different concurrency limits for different API keys:
```json
{
"auth": {
"key_definitions": {
"basic_user": {
"key": "sk-user-12345",
"concurrency_limit": {
"max_concurrent_requests": 2
}
},
"premium_user": {
"key": "sk-premium-67890",
"concurrency_limit": {
"max_concurrent_requests": 10
},
"rate_limit": {
"requests_per_second": 100,
"burst_size": 200
}
}
}
},
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key"
}
}
}
```
### Combining Rate Limiting and Concurrency Limiting
You can use both rate limiting and concurrency limiting together:
- **Rate limiting** controls how fast requests are made over time
- **Concurrency limiting** controls how many requests are active at once
```json
{
"targets": {
"balanced-model": {
"url": "https://api.provider.com",
"onwards_key": "your-api-key",
"rate_limit": {
"requests_per_second": 10,
"burst_size": 20
},
"concurrency_limit": {
"max_concurrent_requests": 5
}
}
}
}
```
### How It Works
Concurrency limits use a semaphore-based approach:
1. When a request arrives, it tries to acquire a permit
2. If a permit is available, the request proceeds (holding the permit)
3. If no permits are available, the request is rejected with `429 Too Many Requests`
4. When the request completes, the permit is automatically released
The error response distinguishes between rate limiting and concurrency limiting:
- Rate limit: `"code": "rate_limit"`
- Concurrency limit: `"code": "concurrency_limit_exceeded"`
Both use HTTP 429 status code for consistency.
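A rough way to exercise the limit from the shell, using the `resource-limited-model` target above (`max_concurrent_requests: 5`); whether the extra request is actually rejected depends on how quickly the first five complete:
```bash
# Launch 6 requests in parallel against a target capped at 5 concurrent requests;
# any request that can't get a permit returns 429 with code "concurrency_limit_exceeded"
for i in $(seq 6); do
  curl -s -o /dev/null -w "%{http_code}\n" -X POST http://localhost:3000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "resource-limited-model", "messages": [{"role": "user", "content": "Hello!"}]}' &
done
wait
```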
## Response Headers
Onwards can include custom headers in the response for each target. These can
override existing headers or add new ones.
### Pricing
One use of this feature is to attach pricing information. If you charge a
dynamic token price, the price in effect when a request is accepted is agreed
at that point and recorded in the response headers.
Add pricing information to any target in your `config.json`:
```json
{
  "targets": {
    "priced-model": {
      "url": "https://api.provider.com",
      "onwards_key": "your-api-key",
      "response_headers": {
        "Input-Price-Per-Token": "0.0001",
        "Output-Price-Per-Token": "0.0002"
      }
    }
  }
}
```
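The configured headers then appear on responses for that target. For example, dumping response headers with curl:
```bash
# -D - writes response headers to stdout; the pricing headers come from the target config above
curl -s -D - -o /dev/null -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "priced-model", "messages": [{"role": "user", "content": "Hello!"}]}' \
  | grep -i 'price-per-token'
# Input-Price-Per-Token: 0.0001
# Output-Price-Per-Token: 0.0002
```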
## Response Sanitization
Onwards can enforce strict OpenAI API schema compliance for `/v1/chat/completions` responses. This feature:
- **Removes provider-specific fields** from responses
- **Rewrites the model field** to match what the client originally requested
- **Supports both streaming and non-streaming** responses
- **Validates responses** against OpenAI's official API schema
This is useful when proxying to non-OpenAI providers that add custom fields, or when using `onwards_model` to rewrite model names upstream.
### Enabling Response Sanitization
Add `sanitize_response: true` to any target or provider in your `config.json`:
**Single provider:**
```json
{
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-key",
"onwards_model": "gpt-4-turbo-2024-04-09",
"sanitize_response": true
}
}
}
```
**Pool with multiple providers:**
```json
{
"targets": {
"gpt-4": {
"sanitize_response": true,
"providers": [
{
"url": "https://api1.example.com",
"onwards_key": "sk-key-1"
},
{
"url": "https://api2.example.com",
"onwards_key": "sk-key-2"
}
]
}
}
}
```
### How It Works
When `sanitize_response: true` and a client requests `model: gpt-4`:
1. **Request sent upstream** with `model: gpt-4`
2. **Upstream responds** with custom fields and `model: gpt-4-turbo-2024-04-09`
3. **Onwards sanitizes**:
- Parses response using OpenAI schema (removes unknown fields)
- Rewrites `model` field to `gpt-4` (matches original request)
- Reserializes clean response
4. **Client receives** standard OpenAI response with `model: gpt-4`
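A quick client-side check (assuming the single-provider config above and `jq`, which is not required):
```bash
# With sanitize_response enabled, the model field reflects the alias the client asked for
curl -s -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}' | jq -r '.model'
# gpt-4   (not the upstream gpt-4-turbo-2024-04-09)
```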
### Common Use Cases
- **Third-party providers** (e.g., OpenRouter, Together AI) often add extra fields such as `provider`, `native_finish_reason`, and `cost`
- **Provider comparison** - normalize responses from different providers for consistent handling
- **Debugging** - reduce noise by filtering to only standard OpenAI fields
### Supported Endpoints
Currently supports:
- `/v1/chat/completions` (streaming and non-streaming)
## Load Balancing
Onwards supports load balancing across multiple providers for a single alias,
with automatic failover, weighted distribution, and configurable retry behavior.
### Configuration
```json
{
"targets": {
"gpt-4": {
"strategy": "weighted_random",
"fallback": {
"enabled": true,
"on_status": [429, 5],
"on_rate_limit": true
},
"providers": [
{ "url": "https://api.openai.com", "onwards_key": "sk-key-1", "weight": 3 },
{ "url": "https://api.openai.com", "onwards_key": "sk-key-2", "weight": 1 }
]
}
}
}
```
### Strategy
- **`weighted_random`** (default): Distributes traffic randomly based on weights.
A provider with `weight: 3` receives ~3x the traffic of `weight: 1`.
- **`priority`**: Always routes to the first provider. Falls through to subsequent
providers only when fallback is triggered.
### Fallback
Controls automatic retry on other providers when requests fail:
- `enabled`: Master switch (default: false)
- `on_status`: Status codes that trigger fallback. Supports wildcards:
- `5` → all 5xx (500-599)
- `50` → 500-509
- `502` → exact match
- `on_rate_limit`: Fallback when hitting local rate limits (default: false)
When fallback triggers, the next provider is selected based on strategy
(weighted random resamples from remaining pool; priority uses definition order).
### Pool-Level Options
Settings that apply to the entire alias:

| Option | Description |
|---|---|
| `keys` | Access control keys for this alias |
| `rate_limit` | Rate limit for all requests to this alias |
| `concurrency_limit` | Max concurrent requests to this alias |
| `response_headers` | Headers added to all responses |
| `strategy` | `weighted_random` or `priority` |
| `fallback` | Retry configuration (see above) |
| `providers` | Array of provider configurations |
### Provider-Level Options
Settings specific to each provider:

| Option | Description |
|---|---|
| `url` | Provider endpoint URL |
| `onwards_key` | API key for this provider |
| `onwards_model` | Model name override |
| `weight` | Traffic weight (default: 1) |
| `rate_limit` | Provider-specific rate limit |
| `concurrency_limit` | Provider-specific concurrency limit |
| `response_headers` | Provider-specific headers |
### Examples
**Primary/backup failover:**
```json
{
"targets": {
"gpt-4": {
"strategy": "priority",
"fallback": { "enabled": true, "on_status": [5], "on_rate_limit": true },
"providers": [
{ "url": "https://primary.example.com", "onwards_key": "sk-primary" },
{ "url": "https://backup.example.com", "onwards_key": "sk-backup" }
]
}
}
}
```
**Multiple API keys with pool-level rate limit:**
```json
{
"targets": {
"gpt-4": {
"rate_limit": { "requests_per_second": 100, "burst_size": 200 },
"providers": [
{ "url": "https://api.openai.com", "onwards_key": "sk-key-1" },
{ "url": "https://api.openai.com", "onwards_key": "sk-key-2" }
]
}
}
}
```
### Backwards Compatibility
Single-provider configs still work unchanged:
```json
{ "targets": { "gpt-4": { "url": "https://api.openai.com", "onwards_key": "sk-key" } } }
```
## Testing
Run the test suite:
```bash
cargo test
```