Onwards
A Rust-based AI Gateway that provides a unified interface for routing requests to openAI compatible targets. The goal is to be as 'transparent' as possible.
Quickstart
Create a config.json file with your target configurations:
Start the gateway:
Modifying the file will automatically & atomically reload the configuration (to
disable, set the --watch flag to false).
Configuration Options
url: The base URL of the AI provideronwards_key: API key to include in requests to the target (optional)onwards_model: Model name to use when forwarding requests (optional)keys: Array of API keys required for authentication to this target (optional)upstream_auth_header_name: Custom header name for upstream authentication (optional, defaults to "Authorization")upstream_auth_header_prefix: Custom prefix for upstream authentication header value (optional, defaults to "Bearer ")
Usage
Command Line Options
--targets <file>: Path to configuration file (required)--port <port>: Port to listen on (default: 3000)--watch: Enable configuration file watching for hot-reloading (default: true)--metrics: Enable Prometheus metrics endpoint (default: true)--metrics-port <port>: Port for Prometheus metrics (default: 9090)--metrics-prefix <prefix>: Prefix for metrics (default: "onwards")
API Usage
List Available Models
Get a list of all configured targets, in the openAI models format:
Sending requests
Send requests to the gateway using the standard OpenAI API format:
Model Override Header
Override the target using the model-override header:
This is also used for routing requests without bodies - for example, to get the embeddings usage for your organization:
Metrics
To enable Prometheus metrics, start the gateway with the --metrics flag, then access the metrics endpoint by:
Authentication
Onwards supports bearer token authentication to control access to your AI targets. You can configure authentication keys both globally and per-target.
Global Authentication Keys
Global keys apply to all targets that have authentication enabled:
Per-Target Authentication
You can also specify authentication keys for individual targets:
In this example:
secure-gpt-4requires a valid bearer token from thekeysarrayopen-localhas no authentication requirements
If both global and local keys are supplied, either global or local keys will be valid for accessing models with local keys.
How Authentication Works
When a target has keys configured, requests must include a valid Authorization: Bearer <token> header where <token> matches one of the configured keys. If global keys are configured, they are automatically added to each target's key set.
Successful authenticated request:
Failed authentication (invalid key):
# Returns: 401 Unauthorized
Failed authentication (missing header):
# Returns: 401 Unauthorized
No authentication required:
# Success - no authentication required for this target
Upstream Authentication Configuration
By default, Onwards sends upstream API keys using the standard Authorization: Bearer <key> header format. However, some AI providers use different authentication header formats. You can customize both the header name and prefix per target.
Custom Header Name
Some providers use custom header names for authentication:
This sends: X-API-Key: Bearer your-api-key-123
Custom Header Prefix
Some providers use different prefixes or no prefix at all:
This sends:
- To provider1:
Authorization: ApiKey token-xyz - To provider2:
Authorization: plain-key-456
Combining Custom Name and Prefix
You can customize both the header name and prefix:
This sends: X-Custom-Auth: Token secret-key
Default Behavior
If these options are not specified, Onwards uses the standard OpenAI-compatible format:
This sends: Authorization: Bearer sk-openai-key
Rate Limiting
Onwards supports per-target rate limiting using a token bucket algorithm. This allows you to control the request rate to each AI provider independently.
Configuration
Add rate limiting to any target in your config.json:
How It Works
We use a "Token Bucket Algorithm": Each target gets its own token bucket.Tokens
are refilled at a rate determined by the "requests_per_second" parameter. The
maximum number of tokens in the bucket is determined by the "burst_size"
parameter. When the bucket is empty, requests to that target will be rejected
with a 429 Too Many Requests response.
Examples
// Allow 1 request per second with burst of 5
"rate_limit":
// Allow 100 requests per second with burst of 200
"rate_limit":
Rate limiting is optional - targets without rate_limit configuration have no
rate limiting applied.
Per-API-Key Rate Limiting
In addition to per-target rate limiting, Onwards supports individual rate limits for different API keys. This allows you to provide different service tiers to your users - for example, basic users might have lower limits while premium users get higher limits.
Configuration
Per-key rate limiting uses a key_definitions section in the auth configuration:
Priority Order
Rate limits are checked in this order:
- Per-key rate limits (if the API key has limits configured)
- Per-target rate limits (if the target has limits configured)
If either limit is exceeded, the request returns 429 Too Many Requests.
Usage Examples
Basic user request (10/sec limit):
Premium user request (100/sec limit):
Legacy key (no per-key limits):
Testing
Run the test suite: