forge-guardrails
A Rust implementation of antoinezambelli/forge. See the Python version for the original.
This was mostly a test of my clean-room-skill repo, to see whether it could reproduce the project in Rust. It partly succeeded, but I made a lot of tweaks to match the original project more closely.
Summary
A reliability layer for self-hosted LLM tool-calling. You give forge a set of tools; the model calls whichever it wants in whatever order. Workflow structure is opt-in — required_steps, prerequisites, and terminal_tool let you constrain the loop when you need to, but forge's guardrails (rescue parsing, retry nudges, response validation) apply with zero required steps too.
What forge-guardrails isn't:
- Not an agent orchestrator. Forge sits inside one agentic loop and makes its tool calls reliable. Multi-agent graphs, DAG planners, and cross-agent coordination are out of scope.
- Not a coding harness. Forge is domain-agnostic. If you're building a coding agent, proxy mode lifts your existing harness with forge's guardrails — no rewrite.
Three ways to use it:
- Proxy server: Drop-in OpenAI-compatible and Anthropic-compatible proxy (forge-guardrails-proxy binary) that sits between any client and a local model server. Applies guardrails transparently. Also accepts Anthropic Messages API requests at POST /v1/messages, translated through anyllm_translate.
- WorkflowRunner: Define tools, pick a backend, run structured agent loops. Forge manages the full lifecycle: system prompts, tool execution, context compaction, and guardrails. SlotWorker adds priority-queued access to a shared inference slot with auto-preemption, for multi-agent architectures where specialist workflows share a GPU slot. Best when you're building on forge directly.
- Guardrails middleware: Use forge's reliability stack inside your own orchestration loop. You control the loop; forge validates responses, rescues malformed tool calls, and enforces required steps.
Supports Ollama, llama-server (llama.cpp), Llamafile, Anthropic, and anyllm-routed OpenAI-compatible upstreams as backends.
Status: experimental. Behavioral parity with the Python reference has been verified through the parity test suite. Review for production hardening before deployment — see Known review areas.
Provenance
- Original project: antoinezambelli/forge
Release notes live in CHANGELOG.md.
Requirements
- Rust 1.95+
- A running LLM backend (see below)
Install
Install the proxy binary:
# macOS, using the Homebrew cask
# macOS or Linux, from crates.io
Use it with an existing OpenAI-compatible local backend:
Then point OpenAI-compatible clients at http://localhost:8081/v1.
Requests should include their own model field. The proxy does not pick a
default upstream model unless you explicitly set --model, FORGE_MODEL, or
SMALL_MODEL; managed ollama still requires --model, and managed
llamaserver / llamafile use --gguf.
Add to your Cargo.toml:
[]
= "0.1"
For development:
The Makefile wraps common development and eval commands. make build builds
all targets with the default classifier feature; override with
FEATURES="" for a no-feature build.
The forge/ submodule contains the Python reference for fixture generation and parity checks. Initialize it with:
Release
Release is tag-driven. After the version in Cargo.toml is ready and main
is pushed, create and push a matching tag:
The release workflow verifies the tag matches the crate version, runs format,
clippy, tests, cargo package, and cargo publish, builds platform archives
for forge-guardrails-proxy, publishes the GitHub release, then updates
whit3rabbit/homebrew-tap when HOMEBREW_TAP_TOKEN is configured. Users can
install the cask with:
Backend setup (pick one)
llama-server (recommended — top eval configs all run on llama-server):
Recommended model: mistralai_Ministral-3-8B-Instruct-2512-Q8_0.gguf (bartowski / HuggingFace)
# Install from https://github.com/ggml-org/llama.cpp/releases
Ollama (alternative — easier setup):
# Install from https://ollama.com/download
Anthropic (API, no local GPU needed):
See Backend Setup for full instructions.
Quick Start
Run the proxy server as a reliability layer between your client agent and the LLM backend.
- Run the proxy on the default port 8081, pointing to your local LLM backend.
- Run on a different port by specifying the --port flag.
- Run with tool-output compression using the --tool-output-compression flag to automatically compress prior tool results.
- Run with proxy input redaction using the --redact-secrets flag to redact selected request text before it reaches the upstream model.
- Run with classifier/validator using the --classify flag to score model tool calls against the verifier ONNX model.
Client & Backend Integration
Configure your agent clients or model backends to route through the proxy.
Claude Code Env Variables
Set the Anthropic base URL environment variable to point to the proxy:
# Or your actual key if proxying to Anthropic
Backends (llama-server / LM Studio / Ollama)
Point the proxy's --backend-url to your running model server:
- llama-server (default port 8080): --backend-url http://localhost:8080
- LM Studio (default port 1234): --backend-url http://localhost:1234
- Ollama (default port 11434): --backend-url http://localhost:11434
Library Usage (Rust)
For direct library integration, use the WorkflowRunner in your Rust code:
use ;
use HashMap;
async
For multi-step workflows, multi-turn conversations, and backend auto-management, see Eval Guide and Backend Setup.
Proxy Server
Drop-in OpenAI-compatible (and Anthropic-compatible) proxy that sits between any client and a local model server. Point your client at the proxy and forge applies its guardrails transparently.
This is the path for using forge with an existing harness (opencode, Continue, aider, Cline, anything that speaks the OpenAI chat-completions schema). No rewrite.
# External mode — you manage the backend, forge proxies it.
# Managed mode — forge starts the backend and proxy together.
# Optional ONNX tool-call classifier shortcut.
# Prefetch the quantized classifier artifact and print its location.
# Convenience launcher for the recommended Ministral GGUF.
The launcher uses managed llamaserver mode, verifies the GGUF path,
requires llama-server on PATH, checks that the proxy and backend ports are
free, and reuses an existing proxy binary from PATH, CARGO_TARGET_DIR, or
target/. If no binary is found, it falls back to cargo build.
Without an explicit path it searches for
mistralai_Ministral-3-8B-Instruct-2512-Q8_0.gguf; set FORGE_MODELS_DIR or
MODELS_DIR to point at your model directory. Defaults are proxy port 8081
and managed backend port 8080; override them with FORGE_PROXY_PORT and
FORGE_BACKEND_PORT. Press Ctrl+C to stop the proxy and its managed
llama-server backend.
The --classify shortcut is opt-in and requires building with
--features classifier. It downloads the pinned quantized tool-call ONNX
classifier if needed, stores it outside target/, enables advisory mode, and
prints the artifact directory during startup. By default it uses
FORGE_CLASSIFIER_CACHE_DIR, then XDG_CACHE_HOME, then
$HOME/.cache/forge-guardrails/classifiers. Use --classifier-dir to provide
an explicit artifact directory.
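The lookup order for the classifier cache can be sketched as follows. This is a minimal illustration, not the crate's code: the function name classifier_cache_dir is hypothetical, the subdirectory used under XDG_CACHE_HOME is an assumption, and treating empty values as unset is also an assumption.

```rust
use std::path::PathBuf;

/// Resolve the classifier cache directory from an environment lookup.
/// `lookup` is injected so the order can be exercised without touching
/// the real process environment. Hypothetical helper, not the crate API.
fn classifier_cache_dir<F>(lookup: F) -> Option<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: an empty value counts as unset.
    let get = |key: &str| lookup(key).filter(|v| !v.is_empty());

    // 1. Explicit override wins.
    if let Some(dir) = get("FORGE_CLASSIFIER_CACHE_DIR") {
        return Some(PathBuf::from(dir));
    }
    // 2. XDG cache root. The subdirectory layout here is an assumption.
    if let Some(xdg) = get("XDG_CACHE_HOME") {
        return Some(PathBuf::from(xdg).join("forge-guardrails").join("classifiers"));
    }
    // 3. Fallback under $HOME/.cache, as named in the text above.
    get("HOME").map(|home| {
        PathBuf::from(home)
            .join(".cache")
            .join("forge-guardrails")
            .join("classifiers")
    })
}
```

In a real binary the lookup would typically be a closure such as |k| std::env::var(k).ok().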
Secret input redaction is compiled in by default through the secrets-scanner
Cargo feature, but runtime redaction is off unless you pass
--redact-secrets or set FORGE_REDACT_SECRETS=true. When enabled, Forge
redacts proxy-bound input before upstream forwarding: OpenAI and Anthropic
message text, tool-result text, and prior assistant tool-call argument payloads.
It does not redact LLM responses, tool names, tool IDs, roles, model names, or
tool schemas. Builds made with --no-default-features reject
--redact-secrets and FORGE_REDACT_SECRETS=true.
Tool-output compression is enabled by default. It mutates only prior tool-result
content before forwarding a request upstream; tool calls, tool IDs, arguments,
and final responses are left unchanged. Start conservatively with safe or
standard (the default); dictionary compression requires explicit aggressive mode.
See Tool Output Compression for modes, request-level
_forge overrides, and method details.
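As a rough picture of how the four compression modes relate, the sketch below parses a mode string and falls back to standard when nothing is set. The type and function names (CompressionMode, mode_from_setting, uses_dictionary) are hypothetical, and exact-match parsing is an assumption; the real CLI may be more lenient.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the documented modes; `Standard` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum CompressionMode {
    Disabled,
    Safe,
    #[default]
    Standard,
    Aggressive,
}

impl FromStr for CompressionMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: values must match the documented spellings exactly.
        match s {
            "disabled" => Ok(Self::Disabled),
            "safe" => Ok(Self::Safe),
            "standard" => Ok(Self::Standard),
            "aggressive" => Ok(Self::Aggressive),
            other => Err(format!("unknown compression mode: {other}")),
        }
    }
}

/// Use the default when no flag or env value is present.
fn mode_from_setting(value: Option<&str>) -> Result<CompressionMode, String> {
    match value {
        None => Ok(CompressionMode::default()),
        Some(s) => s.parse(),
    }
}

/// Dictionary compression is only reachable in aggressive mode.
fn uses_dictionary(mode: CompressionMode) -> bool {
    mode == CompressionMode::Aggressive
}
```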
Then configure OpenAI-compatible clients to use http://localhost:8081/v1 as the API base URL. Anthropic-compatible clients should use http://localhost:8081; the proxy accepts Anthropic Messages API requests at POST /v1/messages. Requests without model are rejected unless you explicitly configured a fallback with --model, FORGE_MODEL, or SMALL_MODEL.
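The model-selection rule above can be sketched as a small function. The name resolve_model is hypothetical. The request's own model is taken first, as the text states; the precedence among the three fallbacks (--model, then FORGE_MODEL, then SMALL_MODEL) and the treatment of blank values as missing are assumptions made for illustration.

```rust
/// Pick the model to send upstream, or fail when the request has no
/// `model` and no fallback was configured. Hypothetical helper.
fn resolve_model(
    request_model: Option<&str>,
    cli_model: Option<&str>,       // --model
    forge_model_env: Option<&str>, // FORGE_MODEL
    small_model_env: Option<&str>, // SMALL_MODEL
) -> Result<String, String> {
    // Assumption: fallback precedence follows the order listed here.
    [request_model, cli_model, forge_model_env, small_model_env]
        .iter()
        .copied()
        .flatten()
        .map(str::trim)
        // Assumption: a blank value is treated the same as a missing one.
        .find(|m| !m.is_empty())
        .map(str::to_owned)
        .ok_or_else(|| "request has no model and no fallback is configured".to_string())
}
```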
Backend compatibility:
- Managed mode spins up the backend for you. Supported backends: llamaserver, llamafile, ollama (use --gguf for GGUF-based backends, or --model for Ollama).
- External mode is backend-agnostic: forge talks POST /v1/chat/completions to whatever you point --backend-url at, as long as it speaks the OpenAI schema. Tool calls must come back in OpenAI tool_calls format or in one of forge's rescue-parsed formats (Mistral [TOOL_CALLS], Qwen <tool_call> XML, fenced JSON).
- Anthropic-compatible inbound uses anyllm_translate for Anthropic/OpenAI conversion by default. With --backend-protocol anthropic, external mode sends Anthropic Messages requests to an Anthropic-shape downstream. On the Anthropic-downstream path (Path 1), block-level cache_control is preserved only on clean calls; retries, compaction, and context warnings rebuild the request and drop block metadata. On the translated OpenAI path (Path 2), Anthropic-only block metadata is dropped at the OpenAI boundary.
- Env-routed mode remains a Rust extension for Docker/provider routing. If neither --backend-url nor --backend is passed, the binary uses existing anyllm/provider env vars such as PROXY_CONFIG, OPENAI_BASE_URL, and BACKEND.
This proxy does not enforce inbound authentication. Do not expose it publicly without a reverse proxy, network policy, or another auth layer.
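To make the rescue-parsed formats mentioned under backend compatibility more concrete, here is a std-only sketch that pulls the raw JSON payload out of a Qwen-style <tool_call> wrapper or a fenced code block. It is an illustration under assumptions, not forge's parser: the helper names between and rescue_payload are hypothetical, the Mistral [TOOL_CALLS] format is left out, and the payload is returned as text without JSON validation.

```rust
/// Return the trimmed text between the first `open` marker and the next
/// `close` marker, if both are present. Hypothetical helper.
fn between<'a>(text: &'a str, open: &str, close: &str) -> Option<&'a str> {
    let start = text.find(open)? + open.len();
    let end = text[start..].find(close)? + start;
    Some(text[start..end].trim())
}

/// Try each rescue strategy in turn and return the raw payload text.
fn rescue_payload(text: &str) -> Option<&str> {
    // Strategy 1: Qwen-style XML wrapper.
    if let Some(body) = between(text, "<tool_call>", "</tool_call>") {
        return Some(body);
    }
    // Strategy 2: fenced block. The fence marker is built at runtime so
    // this example does not embed a literal fence in its own source.
    let fence = "`".repeat(3);
    let after_open = &text[text.find(fence.as_str())? + fence.len()..];
    // Skip the info string (for example "json") on the opening fence line.
    let body = &after_open[after_open.find('\n')? + 1..];
    let end = body.find(fence.as_str())?;
    Some(body[..end].trim())
}
```

A real implementation would go on to parse the payload as JSON and re-emit it in the OpenAI tool_calls shape; that step is omitted here to keep the sketch dependency-free.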
What proxy mode fortifies
On every POST /v1/chat/completions, forge applies (in order):
- Response validation: each tool call in the model's response is checked against the tools array in the request. Calls to unknown tool names or with malformed shapes are caught before the response returns to your client.
- Rescue parsing: when the model emits tool calls in the wrong format (JSON in a code fence, Mistral's [TOOL_CALLS]name{args}, Qwen's <tool_call>...</tool_call> XML), forge extracts the structured call and re-emits it in the canonical OpenAI tool_calls schema.
- Retry loop with error tracking: if validation fails, forge retries inference up to --max-retries (default 3) with a corrective tool-result message on the canonical channel, rather than returning a malformed response.
- Synthetic respond tool injection: when tools are present in the request, forge injects a synthetic respond tool the model calls instead of producing bare text. The respond call is stripped from the outbound response, so the client sees a normal text response (finish_reason: "stop") and never knows the tool exists. Essential for small local models (~8B) that can't reliably choose between text and tool calls.
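The validate-then-retry behaviour described above can be sketched with plain std types. Everything here is hypothetical scaffolding (ToolCall, validate, infer_with_retries); the sketch only checks tool names, and it assumes a retry budget of N means one initial attempt plus up to N retries, which the source does not spell out.

```rust
/// A tool call as extracted from a model response (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

/// Check every call against the tool names declared in the request.
fn validate(calls: &[ToolCall], declared: &[&str]) -> Result<(), String> {
    for call in calls {
        if !declared.contains(&call.name.as_str()) {
            return Err(format!("unknown tool: {}", call.name));
        }
    }
    Ok(())
}

/// Run inference, feeding back a corrective message until the response
/// validates or the retry budget is exhausted.
fn infer_with_retries<F>(
    mut infer: F,
    declared: &[&str],
    max_retries: u32,
) -> Result<Vec<ToolCall>, String>
where
    F: FnMut(Option<&str>) -> Vec<ToolCall>,
{
    let mut nudge: Option<String> = None;
    // Assumption: one initial attempt plus up to `max_retries` retries.
    for _ in 0..=max_retries {
        let calls = infer(nudge.as_deref());
        match validate(&calls, declared) {
            Ok(()) => return Ok(calls),
            Err(reason) => nudge = Some(format!("Previous response rejected: {reason}")),
        }
    }
    Err(format!("retries exhausted after {max_retries} retries"))
}
```

The closure stands in for a backend call; passing the previous rejection reason back in is what lets the next attempt correct itself instead of repeating the same malformed call.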
What proxy mode does not do
Proxy mode is single-shot per request; some forge features need multi-turn workflow state that the OpenAI chat-completions schema doesn't carry:
- Prerequisite enforcement and step-ordering: these need a workflow definition spanning turns. Available in WorkflowRunner.
- Context window management: proxy mode does not choose or maintain the client's rolling message window. Default tool-output compression can rewrite prior tool-result content, and dedup can use an explicit session_id, but the client still owns conversation memory.
- VRAM-aware budget detection: opt in with --budget-mode forge-full or --budget-mode forge-fast; otherwise proxy uses the backend's reported budget. Env-routed mode can also use FORGE_CONTEXT_TOKENS.
Useful flags
| Flag | Default | Purpose |
|---|---|---|
| --backend-url URL | — | External OpenAI-compatible backend |
| --backend {llamaserver,llamafile,ollama} | — | Managed backend type |
| --model MODEL | — | Model name, required for ollama |
| --gguf PATH | — | GGUF path, required for llamaserver / llamafile |
| --backend-port N | 8080 | Managed backend port |
| --host HOST | 127.0.0.1 | Proxy bind host in CLI mode |
| --port N | 8081 | Proxy bind port |
| --max-retries N | 3 | Retry budget per validation failure |
| --classify | off | Enable the quantized tool-call ONNX classifier in advisory mode |
| --classify-download | off | Download the quantized tool-call ONNX classifier and exit |
| --redact-secrets | off | Redact selected proxy-bound input before upstream forwarding |
| --tool-output-compression {disabled,safe,standard,aggressive} | standard | Compress prior tool-result content before upstream forwarding |
| --tool-output-compression-method {lzw,repair,auto} | lzw | Aggressive dictionary method |
| --no-rescue | rescue on | Disable rescue parsing |
| --budget-mode {backend,manual,forge-full,forge-fast} | backend | Context budget source |
| --budget-tokens N | — | Manual token budget |
| --serialize / --no-serialize | auto | Force request serialization |
| --extra-flags -- FLAG VALUE ... | — | Pass additional flags to the managed backend |
Useful environment variables (Docker / env-routed mode)
| Variable | Default | Purpose |
|---|---|---|
| FORGE_HOST | 0.0.0.0 | Bind address |
| FORGE_PORT / PORT / LISTEN_PORT | 8081 | Forge proxy listen port |
| FORGE_MODEL / SMALL_MODEL | (none) | Optional fallback model when a request omits model |
| FORGE_CONTEXT_TOKENS | 128000 | Token budget |
| FORGE_MAX_RETRIES | 3 | Retry budget per validation failure |
| FORGE_RESCUE_ENABLED | true | Enable rescue parsing |
| FORGE_SERIALIZE_REQUESTS | false | Force request serialization |
| FORGE_SENTRY_ENABLED | false | Opt in to Sentry crash and aggregate guardrail telemetry |
| FORGE_CLASSIFIER_CACHE_DIR | platform cache | User-facing classifier download cache root |
| FORGE_CLASSIFIER_DIR | — | Local ONNX tool-call classifier artifact directory |
| FORGE_CLASSIFIER_MODE | shadow | disabled, shadow, advisory, or enforce |
| FORGE_CLASSIFIER_MODEL | quantized | quantized or full classifier ONNX file |
| FORGE_REDACT_SECRETS | false | Redact selected proxy-bound input before upstream forwarding |
| FORGE_TOOL_OUTPUT_COMPRESSION | standard | disabled, safe, standard, or aggressive |
| FORGE_TOOL_OUTPUT_COMPRESSION_METHOD | lzw | lzw, repair, or auto; used only by aggressive mode |
| FORGE_START_SIDECAR | Docker: auto | Start the internal anyllm sidecar in Docker |
| ANYLLM_LISTEN_PORT | Docker: 3000 | Internal anyllm sidecar port; do not publish it |
| FORGE_SIDECAR_API_KEY / PROXY_API_KEYS | generated | Shared Forge-to-sidecar key in Docker |
| BACKEND | openai | anyllm provider id or first-party backend |
| OPENAI_BASE_URL | — | Route to a local OpenAI-compatible backend |
| OPENAI_API_KEY | — | API key forwarded to the upstream |
Existing anyllm env and config are still honored, including provider API keys, PROXY_CONFIG, BIG_MODEL, SMALL_MODEL, and LiteLLM aliases such as LITELLM_CONFIG.
FORGE_SENTRY_ENABLED=true enables Sentry for the proxy binary only. Sentry
events are limited to scrubbed crashes and aggregate guardrail signals such as
classifier labels, retry exhaustion reasons, counts, and tool names. Prompts,
messages, headers, request bodies, tool arguments, tool outputs, and final
responses are not sent. Use FORGE_TRAINING_CAPTURE_LOG or
FORGE_CLASSIFIER_LOG for private local JSONL training/eval examples.
Docker
You can run the Forge proxy as a Docker container. The image starts Forge plus an internal anyllm sidecar by default, and exposes only the Forge proxy port (8081) to clients. The sidecar is an upstream hop from Forge to anyllm; do not publish the sidecar port.
Build the image locally:
The default Dockerfile builds the normal proxy image without ONNX classifier
support. Use Dockerfile.classifier when you want the quantized tool-call
classifier artifact downloaded into the image and loaded on proxy startup:
The classifier image sets:
FORGE_CLASSIFIER_DIR=/opt/forge/classifiers/tool-call/onnx
FORGE_CLASSIFIER_MODE=advisory
FORGE_CLASSIFIER_MODEL=quantized
Set FORGE_CLASSIFIER_MODE=disabled at runtime to use the classifier image as a
plain proxy. This image bundles only the ONNX classifier artifact; it does not
bundle a GGUF or provider LLM.
After publishing, replace forge-guardrails:local in these examples with followthewhit3rabbit/forge-guardrails:latest.
Run with OpenAI through the internal anyllm sidecar:
Run the classifier-ready image the same way:
The entrypoint generates a private Forge-to-sidecar key unless you set FORGE_SIDECAR_API_KEY or PROXY_API_KEYS. It starts the sidecar with the upstream provider environment, then starts Forge with OPENAI_API_KEY set to the sidecar key and --backend-url http://127.0.0.1:3000.
Start Ollama on the host in another terminal:
Then run the proxy container:
Start llama-server on the host in another terminal:
Then run the proxy container:
On Linux Docker engines that do not define host.docker.internal, add:
--add-host=host.docker.internal:host-gateway
Smoke-test the running proxy:
OpenAI-compatible clients should use:
base_url: http://localhost:8081/v1
api_key: dummy
model: qwen2.5-coder:14b
Claude Code can use the same Docker proxy through Forge's Anthropic-compatible endpoint:
Do not add /v1 to ANTHROPIC_BASE_URL; Claude Code sends Anthropic Messages requests and Forge serves those at /v1/messages. If you want Claude Code's model picker to query Forge's /v1/models endpoint, set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1.
Publish to Docker Hub as followthewhit3rabbit/forge-guardrails:
Override VERSION, IMAGE, PLATFORMS, or BUILDER when publishing a different tag or registry.
Backends
| Backend | Best for | Native FC? |
|---|---|---|
| Ollama | Easiest setup, model management built-in | Yes |
| llama-server | Best performance, full control | Yes (with --jinja) |
| Llamafile | Single binary, zero dependencies | No (prompt-injected) |
| Anthropic | Frontier baseline, hybrid workflows | Yes |
| anyllm runtime | In-process provider routing, OpenAI-compatible | Provider-dependent |
| anyllm sidecar | Separate process; admin UI, cache, metrics | Provider-dependent |
See Backend Setup for installation details.
macOS / Apple Silicon
Apple Silicon is supported through all backends. Ollama can be installed with Homebrew or the official macOS download. llama.cpp / llama-server can be installed with Homebrew or a Metal-enabled release build. Llamafile works on macOS as a downloaded binary after chmod +x.
Managed llama.cpp and llamafile startup passes -ngl 999; on macOS that uses Metal rather than CUDA. Automatic Ollama context budgets use Rust VRAM tiers: <24 GB → 4096 tokens, 24–47 GB → 32768 tokens, ≥48 GB → 262144 tokens.
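The VRAM tiers above amount to a three-way lookup. A minimal sketch, assuming VRAM is reported in whole gigabytes (how fractional values are rounded is not stated in the source) and using a hypothetical function name:

```rust
/// Map detected VRAM (whole gigabytes) to the automatic Ollama context
/// budget in tokens, following the tiers listed above.
fn ollama_context_tokens(vram_gb: u32) -> u32 {
    match vram_gb {
        0..=23 => 4_096,    // under 24 GB
        24..=47 => 32_768,  // 24 to 47 GB
        _ => 262_144,       // 48 GB and up
    }
}
```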
MLX is supported as an optional eval path on macOS through an OpenAI-compatible server such as mlx_lm.server, routed by AnyLlmRuntimeClient or AnyLlmProxyClient. It is not a managed ServerManager backend. Prefer llama-server for parity runs; treat GGUF-on-MLX as experimental.
Running Tests
# Parity suite only (requires the Python golden fixture)
# With coverage (requires cargo-llvm-cov)
Regenerate the Python golden fixture after intentional reference-behavior changes:
Eval Harness
The eval harness measures how reliably a model + backend combo navigates multi-step tool-calling workflows. See Eval Guide for full CLI reference.
# 10-run release benchmark without classifier, with resource baseline enabled.
# 10-run release benchmark with classifier, with resource baseline enabled.
# Fast smoke variants, also with resource baseline enabled.
# Managed local smoke without the classifier
# Managed local smoke with the user-cache classifier shortcut.
# Downloads or validates the quantized tool-call artifact before the proxy starts.
# Python oracle against a running Rust proxy
# Native Rust smoke runner
Common Makefile overrides:
The Rust smoke runner supports basic_2step, sequential_3step, and error_recovery scenarios and emits JSONL for quick CI/smoke checks.
Project Structure
src/
lib.rs Public API re-exports
error.rs ForgeError hierarchy
server.rs setup_backend(), ServerManager, BudgetMode
classifier_download.rs Classifier artifact download logic (--features classifier)
tool_output.rs Tool-output compression pipeline (safe / standard / aggressive)
tool_policy.rs Per-request allowed/blocked tool sets and prerequisite policy
core/
message.rs Message, MessageRole, MessageType, MessageMeta, ToolCallInfo
tool_spec.rs ToolSpec, ToolDef, ParamModel — tool schema and callable defs
workflow.rs Workflow model, terminal tools, prerequisites
steps.rs StepTracker, step tracking and required-step state
inference.rs run_inference() — shared front half (compact, fold, validate, retry)
runner.rs WorkflowRunner — the agentic loop
slot_worker.rs SlotWorker — priority-queued slot access
guardrails/
guardrails.rs Guardrails facade — applies the full stack in foreign loops
nudge.rs Nudge struct
response_validator.rs ResponseValidator, ValidationResult
step_enforcer.rs StepEnforcer, StepCheck, StepPrerequisite
error_tracker.rs ErrorTracker
scoring.rs ScoringPipeline, ScoringExecutor — async classifier dispatch
scoring_context.rs ScoringContext — serialized input for ONNX scorer
classifier_artifact.rs Artifact loader, manifest validation, threshold policy
onnx_scorer.rs OnnxToolCallScorer, OnnxFinalResponseScorer (--features classifier)
history.rs Events timeline for validation results and violations
policy.rs Allowed/blocked tool policy based on sequence prerequisites
clients/
base.rs LLMClient trait, ChunkType, StreamChunk, LLMCallInfo, TokenUsage
sampling.rs Model sampling defaults, MODEL_SAMPLING_DEFAULTS
anthropic/ AnthropicClient (frontier baseline, native FC)
llamafile/ LlamafileClient (native FC or prompt-injected)
ollama/ OllamaClient (native FC)
anyllm_proxy.rs AnyLlmRuntimeClient, AnyLlmProxyClient
context/
manager.rs ContextManager, CompactEvent
strategies.rs NoCompact, TieredCompact, SlidingWindowCompact
hardware.rs HardwareProfile, detect_hardware()
prompts/
mod.rs Tool prompt builders (prompt-injected path)
nudges.rs Retry, step-enforcement, and semantic classifier nudge templates
parse_strategies.rs Rescue parsing: Mistral, Qwen, fenced JSON
tools/
respond.rs Synthetic respond tool (respond_tool(), respond_spec())
proxy/
handler.rs Request handler — bridge between HTTP and run_inference
proxy.rs OpenAI messages ↔ forge Messages conversion, SSE helpers
server.rs HTTPServer — axum HTTP/SSE server
bin/
forge-guardrails-proxy.rs CLI proxy entry point
download-classifier.rs Standalone artifact downloader for eval / training paths
forge-eval/ Native Rust eval smoke runner
model/
README.md Artifact repository index and download commands
MODEL.md Full model card: training config, metrics, labels, thresholds
tests/
parity/ Python-generated golden fixtures for Rust parity tests
parity_tests.rs Rust assertions against python_golden.json
engine_tests.rs WorkflowRunner / inference integration tests
guardrails_tests.rs Guardrails and step enforcement tests
compact_tests.rs Compaction strategy tests
context_tests.rs ContextManager tests
*_tests.rs Unit and integration tests per subsystem
scripts/
eval_openai_proxy.py Python eval oracle wrapper for Rust proxy checks
docs/
CLEANROOM.md Clean-room run summary and parity review
PARITY.md Parity contract and subsystem alignment status
EVAL_GUIDE.md Eval harness CLI reference
BACKEND_SETUP.md Backend installation and server setup
COMPRESSION.md Tool-output compression modes and request overrides
Public API Surface
The crate re-exports the main building blocks from src/lib.rs:
use ;
Usage Modes
1. Workflow runner
Use WorkflowRunner when you want the library to manage the LLM loop: system prompt construction, message folding, validation, retries, tool execution, context compaction, and terminal-tool detection.
2. Guardrails middleware
Use the guardrail primitives directly when you already own the orchestration loop but want validation and policy enforcement.
Relevant pieces:
- Guardrails: composable facade for the full stack
- ResponseValidator / ValidationResult
- StepEnforcer / StepCheck / StepPrerequisite
- ErrorTracker
- retry_nudge, step_nudge, prerequisite_nudge, unknown_tool_nudge, classifier_nudge
- ScoringPipeline / ScoringExecutor: async classifier dispatch for shadow, advisory, and enforce modes
- ScoringContext: serializes workflow state into the canonical toolcall-verifier-input/v1 format for the ONNX scorer
The ONNX tool-call verifier and final-response verifier are built with --features classifier. Both start in shadow mode and are promoted only after eval replay proves safety. See model/README.md for artifact contracts, labels, thresholds, and promotion criteria.
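For readers who own their loop, the required-step and prerequisite checks named in the Summary (required_steps, prerequisites, terminal_tool) can be pictured with the std-only sketch below. The Workflow struct and check_step function are hypothetical stand-ins, not the StepEnforcer API; the real types and return values may differ.

```rust
use std::collections::HashSet;

/// Hypothetical workflow constraints mirroring the opt-in settings.
struct Workflow<'a> {
    required_steps: Vec<&'a str>,
    /// (tool, prerequisite) pairs: `tool` may run only after `prerequisite`.
    prerequisites: Vec<(&'a str, &'a str)>,
    terminal_tool: &'a str,
}

/// Decide whether `tool` may run given the tools completed so far.
fn check_step(wf: &Workflow, completed: &HashSet<&str>, tool: &str) -> Result<(), String> {
    // Prerequisites gate individual tools.
    for (gated, prereq) in &wf.prerequisites {
        if *gated == tool && !completed.contains(*prereq) {
            return Err(format!("{tool} requires {prereq} first"));
        }
    }
    // The terminal tool is blocked until every required step has run.
    if tool == wf.terminal_tool {
        let missing: Vec<&str> = wf
            .required_steps
            .iter()
            .copied()
            .filter(|step| !completed.contains(*step))
            .collect();
        if !missing.is_empty() {
            return Err(format!("missing required steps: {}", missing.join(", ")));
        }
    }
    Ok(())
}
```

With an empty required_steps list and no prerequisites, every call passes, which matches the idea that workflow structure is opt-in. An Err value is where a loop would send a step or prerequisite nudge back to the model instead of executing the tool.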
3. OpenAI-compatible proxy / server layer
Use the proxy and HTTP server pieces when you need an OpenAI-compatible request/response boundary around a backend.
Relevant pieces:
- openai_to_messages, tool_calls_to_openai, text_response_to_openai
- text_to_sse_events, tool_calls_to_sse_events
- handle_chat_completions, handle_anthropic_messages
- HTTPServer, ServerManager, setup_backend
4. anyllm runtime and sidecar integration
Use AnyLlmRuntimeClient for in-process anyllm provider routing (no HTTP overhead; Forge still owns interception and nudging):
use AnyLlmRuntimeClient;
let client = from_multi_config
.with_context_length;
Use AnyLlmProxyClient when you prefer to run anyllm_proxy as a separate sidecar process. The sidecar URL is an upstream hop from Forge to anyllm, not the public client-facing Forge proxy URL. Keep the sidecar private and expose only the Forge proxy unless you intentionally need direct anyllm access.
use AnyLlmProxyClient;
let client = new
.with_base_url
.with_api_key
.with_context_length;
Both clients expose provider observability through LLMClient::last_call_info(). Cost estimates, routing metadata, cache state, and rate-limit details come from anyllm runtime or sidecar metadata; Forge does not maintain separate pricing logic.
Testing Scope
- 487+ passing tests across 16 test files
- Deterministic parity suite against tests/parity/fixtures/python_golden.json
- Classifier tests (--test classifier_tests) cover artifact loading, serializer parity, ONNX scorer output, and scoring pipeline routing
- 0 contamination incidents in the clean-room run
Keep tests deterministic where possible. Backend integration tests use mock servers (via mockito) unless they intentionally qualify a live backend. Classifier tests require --features classifier and the pinned ONNX artifact; they are gated separately from the core test suite.
Known Review Areas Before Release
The implementation should be reviewed for protocol correctness and production hardening before publication or deployment. Behavioral parity with the Python reference is covered by the parity test suite; the following areas need additional protocol and integration review:
- tool-call ID pairing across assistant tool calls and tool results
- transcript validity after guardrail-blocked steps
- compaction behavior around tool-call / tool-result groups
- true progressive streaming behavior for each backend
- HTTP parsing and CORS/header handling if exposed beyond local development
- backend startup ordering and context-budget discovery
- serialization behavior for OpenAI, Ollama, and Anthropic formats
Python Parity
The parity suite compares Rust behavior to synthetic golden outputs generated by the Python reference submodule. The source of truth for fixture generation is tests/parity/generate_fixtures.py; the checked-in output is tests/parity/fixtures/python_golden.json; Rust assertions live in tests/parity_tests.rs.
When updating parity behavior:
- Add or update the Python fixture first.
- Regenerate tests/parity/fixtures/python_golden.json.
- Add or update the matching Rust assertion in tests/parity_tests.rs.
- Run cargo test --test parity_tests before broader repo gates.
See docs/PARITY.md for the full parity contract.
Relationship to Upstream Forge
The upstream Forge project is a Python reliability layer for self-hosted LLM tool-calling and multi-step agentic workflows. This repository is a Rust implementation inspired by that project's behavior — not a direct source translation — and has been verified for full behavioral parity with the Python reference through the parity test suite.
The Python reference is included as the forge/ git submodule for use in fixture generation and parity checks.
Use the upstream repository for the original Python implementation, documentation, paper citation, and release history:
The forge guardrail framework and ablation study are published as:
Zambelli, A. Forge: A Reliability Layer for Self-Hosted LLM Tool-Calling. https://doi.org/10.1145/3786335.3813193
License
MIT — Rust implementation copyright (c) 2025-2026 whit3rabbit.
The upstream Forge project is separately licensed by its author, also under MIT. Preserve upstream attribution and review license compatibility before redistribution.