Onwards
A Rust-based AI gateway that provides a unified interface for routing requests to OpenAI-compatible targets. The goal is to be as transparent as possible.
Quickstart
Create a config.json file with your target configurations:
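For example (the exact schema is defined by Onwards; target names, URLs, and keys below are placeholders):

```json
{
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com/v1",
      "onwards_key": "sk-your-openai-key"
    },
    "local-llama": {
      "url": "http://localhost:8080/v1"
    }
  }
}
```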
Start the gateway:
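Assuming the gateway binary is named `onwards` (adjust to match your build):

```shell
onwards --targets config.json
```

The gateway listens on port 3000 by default.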
Modifying the file will automatically & atomically reload the configuration (to
disable, set the --watch flag to false).
Configuration Options
- `url`: The base URL of the AI provider
- `onwards_key`: API key to include in requests to the target (optional)
- `onwards_model`: Model name to use when forwarding requests (optional)
- `keys`: Array of API keys required for authentication to this target (optional)
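A target using all four options might look like this (illustrative values only):

```json
"renamed-model": {
  "url": "https://api.openai.com/v1",
  "onwards_key": "sk-your-openai-key",
  "onwards_model": "gpt-4o",
  "keys": ["client-key-1"]
}
```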
Usage
Command Line Options
- `--targets <file>`: Path to configuration file (required)
- `--port <port>`: Port to listen on (default: 3000)
- `--watch`: Enable configuration file watching for hot-reloading (default: true)
- `--metrics`: Enable Prometheus metrics endpoint (default: true)
- `--metrics-port <port>`: Port for Prometheus metrics (default: 9090)
- `--metrics-prefix <prefix>`: Prefix for metrics (default: "onwards")
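For example, to serve on port 8080 with metrics on port 9191 (binary name assumed to be `onwards`):

```shell
onwards --targets config.json --port 8080 --metrics-port 9191
```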
API Usage
List Available Models
Get a list of all configured targets, in the OpenAI models format:
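Assuming the gateway exposes the standard OpenAI `/v1/models` path on the default port:

```shell
curl http://localhost:3000/v1/models
```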
Sending requests
Send requests to the gateway using the standard OpenAI API format:
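For example, a chat completion request (the `model` value must match a configured target name; paths follow the OpenAI API):

```shell
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```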
Model Override Header
Override the target using the model-override header:
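For example (the target name is a placeholder; here the header is assumed to take precedence over the body's `model` field):

```shell
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "model-override: local-llama" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```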
This is also used for routing requests without bodies - for example, to get the embeddings usage for your organization:
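A body-less request might look like the following (shown against OpenAI's organization usage endpoint as an illustration; adjust the path to your provider):

```shell
curl http://localhost:3000/v1/organization/usage/embeddings \
  -H "model-override: gpt-4"
```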
Metrics
Prometheus metrics are enabled by default via the --metrics flag. Access the metrics endpoint:
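Using the default metrics port:

```shell
curl http://localhost:9090/metrics
```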
Authentication
Onwards supports bearer token authentication to control access to your AI targets. You can configure authentication keys both globally and per-target.
Global Authentication Keys
Global keys apply to all targets that have authentication enabled:
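One possible shape (the exact location of global keys in the config is an assumption; key values are placeholders):

```json
{
  "auth": {
    "keys": ["global-key-1", "global-key-2"]
  },
  "targets": {
    "gpt-4": { "url": "https://api.openai.com/v1" }
  }
}
```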
Per-Target Authentication
You can also specify authentication keys for individual targets:
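An illustrative config (schema sketch; URLs and keys are placeholders):

```json
{
  "targets": {
    "secure-gpt-4": {
      "url": "https://api.openai.com/v1",
      "onwards_key": "sk-your-openai-key",
      "keys": ["client-key-1", "client-key-2"]
    },
    "open-local": {
      "url": "http://localhost:8080/v1"
    }
  }
}
```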
In this example:
- `secure-gpt-4` requires a valid bearer token from the `keys` array
- `open-local` has no authentication requirements
If both global and per-target keys are configured, requests to a target with its own keys are accepted with either a global key or one of that target's keys.
How Authentication Works
When a target has keys configured, requests must include a valid Authorization: Bearer <token> header where <token> matches one of the configured keys. If global keys are configured, they are automatically added to each target's key set.
- Successful authenticated request: a valid `Authorization: Bearer <token>` header is present and the token matches; the request is forwarded to the target.
- Failed authentication (invalid key): returns 401 Unauthorized.
- Failed authentication (missing header): returns 401 Unauthorized.
- No authentication required: targets without `keys` configured accept requests without an `Authorization` header.
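These cases look like the following with curl (host, target names, and keys are placeholders):

```shell
# Successful authenticated request
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer client-key-1" \
  -H "Content-Type: application/json" \
  -d '{"model": "secure-gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'

# Failed authentication (invalid key)
# Returns: 401 Unauthorized
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer wrong-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "secure-gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'

# Failed authentication (missing header)
# Returns: 401 Unauthorized
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "secure-gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'

# No authentication required for this target
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "open-local", "messages": [{"role": "user", "content": "Hi"}]}'
```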
Rate Limiting
Onwards supports per-target rate limiting using a token bucket algorithm. This allows you to control the request rate to each AI provider independently.
Configuration
Add rate limiting to any target in your config.json:
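For example (target name, URL, and limit values are illustrative):

```json
"gpt-4": {
  "url": "https://api.openai.com/v1",
  "rate_limit": {
    "requests_per_second": 10,
    "burst_size": 20
  }
}
```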
How It Works
We use a token bucket algorithm: each target gets its own token bucket. Tokens are refilled at a rate determined by the `requests_per_second` parameter, and the maximum number of tokens in the bucket is determined by the `burst_size` parameter. When the bucket is empty, requests to that target are rejected with a 429 Too Many Requests response.
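The behaviour can be sketched in Rust (a self-contained illustration of the token bucket idea, not Onwards' actual implementation):

```rust
use std::time::Instant;

/// Minimal token bucket: `capacity` plays the role of `burst_size`,
/// `refill_rate` the role of `requests_per_second`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_rate: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(requests_per_second: f64, burst_size: f64) -> Self {
        TokenBucket {
            capacity: burst_size,
            tokens: burst_size, // bucket starts full
            refill_rate: requests_per_second,
            last: Instant::now(),
        }
    }

    /// Returns true if a request may proceed; false means "reply 429".
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.last = now;
        // Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // requests_per_second = 1, burst_size = 5:
    // of 6 back-to-back requests, the first 5 pass and the 6th is rejected.
    let mut bucket = TokenBucket::new(1.0, 5.0);
    let allowed = (0..6).filter(|_| bucket.try_acquire()).count();
    println!("{}", allowed); // 5
}
```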
Examples
```json
// Allow 1 request per second with burst of 5
"rate_limit": { "requests_per_second": 1, "burst_size": 5 }

// Allow 100 requests per second with burst of 200
"rate_limit": { "requests_per_second": 100, "burst_size": 200 }
```
Rate limiting is optional - targets without rate_limit configuration have no
rate limiting applied.
Per-API-Key Rate Limiting
In addition to per-target rate limiting, Onwards supports individual rate limits for different API keys. This allows you to provide different service tiers to your users - for example, basic users might have lower limits while premium users get higher limits.
Configuration
Per-key rate limiting uses a key_definitions section in the auth configuration:
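One possible shape (the nesting under `auth` and the per-key fields are assumptions beyond the parameter names documented above; burst values are illustrative):

```json
"auth": {
  "keys": ["legacy-key"],
  "key_definitions": {
    "basic-user-key": {
      "rate_limit": { "requests_per_second": 10, "burst_size": 10 }
    },
    "premium-user-key": {
      "rate_limit": { "requests_per_second": 100, "burst_size": 200 }
    }
  }
}
```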
Priority Order
Rate limits are checked in this order:
1. Per-key rate limits (if the API key has limits configured)
2. Per-target rate limits (if the target has limits configured)
If either limit is exceeded, the request returns 429 Too Many Requests.
Usage Examples
- Basic user request (10/sec limit)
- Premium user request (100/sec limit)
- Legacy key (no per-key limits; only per-target limits apply)
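With the placeholder keys from the configuration sketch above, the three tiers differ only in the bearer token:

```shell
# Basic user (10/sec limit)
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer basic-user-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'

# Premium user (100/sec limit)
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer premium-user-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'

# Legacy key (no per-key limits)
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer legacy-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'
```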
Testing
Run the test suite:
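Onwards is a Rust project, so the standard Cargo test runner applies:

```shell
cargo test
```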