# sgr-agent
Pure Rust LLM client and agent framework based on Schema-Guided Reasoning (SGR) by Rinat Abdullin. No dlopen, no external binaries.
Works on iOS, Android, WASM — anywhere reqwest + rustls compiles.
## Three backends

| Backend | Feature | API | Best for |
|---|---|---|---|
| openai-oxide | oxide | Responses API | OpenAI models (fastest — HTTP/2 keep-alive, WebSocket, hedged requests) |
| genai | genai | Chat Completions / Responses | Multi-provider (Gemini, Anthropic, OpenRouter, Ollama) |
| async-openai | async-openai-backend | Responses API | Migration from async-openai |
Llm::new() auto-selects the backend by model name:

- gpt-*, o3*, o4*, chatgpt-* → oxide (Responses API, gzip, HTTP/2 keep-alive)
- Everything else → genai (multi-provider)
- Custom base_url / Vertex AI → always genai
```rust
// Model names and the accessor below are illustrative.
let llm = Llm::new("gpt-4o")?;           // → oxide
let llm = Llm::new("gemini-2.0-flash")?; // → genai
println!("{}", llm.backend_name());      // "oxide" or "genai"
```
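The prefix routing described above can be sketched in plain Rust; the function name is hypothetical, and the crate's real logic also forces genai for custom base_url / Vertex AI setups:

```rust
/// Hypothetical helper mirroring the routing rules above; the crate's
/// actual selection also routes custom base_url / Vertex AI to genai.
fn pick_backend(model: &str) -> &'static str {
    const OXIDE_PREFIXES: [&str; 4] = ["gpt-", "o3", "o4", "chatgpt-"];
    if OXIDE_PREFIXES.iter().any(|p| model.starts_with(p)) {
        "oxide" // Responses API, gzip, HTTP/2 keep-alive
    } else {
        "genai" // multi-provider fallback
    }
}

fn main() {
    println!("{}", pick_backend("gpt-4o"));           // oxide
    println!("{}", pick_backend("gemini-2.0-flash")); // genai
}
```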
With the oxide-ws feature, upgrade to WebSocket for ~20% lower latency in agent loops:

```rust
// Reconstructed sketch; type and config names are illustrative.
let oxide = OxideClient::from_config(config)?;
oxide.connect_ws().await?; // all calls now go through wss://
```
See the openai-oxide benchmarks — it wins 10 of 13 vs Python.
## Two layers

**Layer 1 — LLM Client** (default features: gemini, openai):
structured output, function calling, flexible parsing. Just add a dependency and call an API.

**Layer 2 — Agent Framework** (feature: agent):
Tool trait, registry, agent loop with loop detection, 4 agent variants, dual-model routing, retry, streaming.
Build autonomous agents that reason and act.
## Quick start

```toml
# Cargo.toml

# Client only (structured output + function calling)
sgr-agent = "0.4"

# Full agent framework
sgr-agent = { version = "0.2", features = ["agent"] }
```
### Structured output (client only)

```rust
// Reconstructed sketch — constructor and method names may differ
// from the crate's actual API; see the docs.
use sgr_agent::{GeminiClient, ProviderConfig};
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(JsonSchema, Deserialize)]
struct Answer { summary: String }

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = GeminiClient::new(ProviderConfig::default());
    let answer: Answer = client.structured_call("Summarize Rust in one line").await?;
    println!("{}", answer.summary);
    Ok(())
}
```
### Agent with tools

```rust
// Reconstructed sketch from the original import list; constructors and
// signatures are illustrative — see the crate docs for the real API.
use sgr_agent::{AgentContext, GeminiClient, Message, ProviderConfig, SgrAgent, ToolRegistry};
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = GeminiClient::new(ProviderConfig::default());
    let tools = ToolRegistry::new(); // .register(...) your Tool impls here
    let agent = SgrAgent::new(client);
    let mut ctx = AgentContext::default();
    let mut messages = vec![Message::user("List the files in the repo")];
    // Drive the loop — see "Agent loop" below for run_loop(...)
    Ok(())
}
```
## Features

| Feature | Default | What |
|---|---|---|
| gemini | yes | Google AI + Vertex AI backend |
| openai | yes | OpenAI + OpenRouter + Ollama backend |
| oxide | no | openai-oxide backend — fastest for OpenAI models (Responses API, HTTP/2, gzip) |
| oxide-ws | no | WebSocket mode for oxide (-20% latency on agent loops) |
| genai | no | Multi-provider via genai crate (Gemini, Anthropic, OpenRouter, Ollama) |
| async-openai-backend | no | async-openai backend for comparison/migration |
| agent | no | Full agent framework (traits, loop, registry, routing) |
| session | no | Session persistence, 4-tier loop detection, memory context, hints, tasks, intent guard |
| app-tools | no | Shared tools: bash, fs (read/write/edit), git, apply_patch |
| providers | no | Provider config (TOML), auth, CLI proxy, Codex proxy |
| telemetry | no | OTEL telemetry → Phoenix / LangSmith (OpenInference conventions) |
| logging | no | File-based JSONL logging |
| search | no | Fuzzy session search (nucleo-matcher) |
## Architecture

### LLM Client layer

| Module | What |
|---|---|
| gemini | Gemini client — Google AI (generativelanguage.googleapis.com) and Vertex AI (aiplatform.googleapis.com) |
| openai | OpenAI-compatible client — works with OpenAI, OpenRouter, Ollama, any compatible API |
| types | Message, ToolCall, SgrError, ProviderConfig, RateLimitInfo |
| tool | ToolDef — tool definition (name, description, JSON Schema parameters) |
| schema | `json_schema_for::<T>()` — derive JSON Schema from Rust types via schemars |
| flexible_parser | Extract JSON from markdown blocks, broken JSON, streaming chunks, chain-of-thought text |
| coerce | Fuzzy type coercion — "42" → 42, "true" → true, fuzzy enum matching |
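The coercions listed for the coerce module can be sketched with stdlib parsing alone — a toy version; the real module also handles fuzzy enum matching:

```rust
/// Toy value type standing in for a JSON value.
#[derive(Debug, PartialEq)]
enum Coerced {
    Int(i64),
    Bool(bool),
    Str(String),
}

/// Coerce a string the way the coerce module is described to:
/// "42" -> 42, "true" -> true, otherwise keep the string.
fn coerce(raw: &str) -> Coerced {
    let t = raw.trim();
    if let Ok(n) = t.parse::<i64>() {
        Coerced::Int(n)
    } else if let Ok(b) = t.to_ascii_lowercase().parse::<bool>() {
        Coerced::Bool(b)
    } else {
        Coerced::Str(t.to_string())
    }
}

fn main() {
    println!("{:?}", coerce("42"));   // Int(42)
    println!("{:?}", coerce("true")); // Bool(true)
}
```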
### Agent Framework layer (feature = "agent")

| Module | What |
|---|---|
| agent | Agent trait with decide() + lifecycle hooks (prepare_context, prepare_tools, after_action) |
| agent_tool | Tool trait — name(), description(), parameters_schema(), execute() |
| agent_loop | run_loop() — decide → execute → feed back, with 3-tier loop detection + auto-completion + sliding window |
| registry | ToolRegistry — ordered collection, case-insensitive lookup, fuzzy resolve, filtering |
| context | AgentContext — working directory, state machine, per-tool config, custom metadata |
| client | LlmClient trait — abstraction over any LLM backend |
| agents/sgr | SgrAgent — structured output via discriminated union schema |
| agents/tool_calling | ToolCallingAgent — native function calling (simplest variant) |
| agents/flexible | FlexibleAgent — text parsing with retry and error feedback (for weak models) |
| agents/hybrid | HybridAgent — 2-phase: reasoning-only FC → full toolkit with reasoning context |
| agents/planning | PlanningAgent — read-only wrapper that produces structured plans (like Claude Code plan mode) |
| agents/clarification | ClarificationTool + PlanTool — built-in system tools for interactive agents |
| router | ModelRouter — transparent dual-model routing (smart for complex, fast for simple tasks) |
| retry | RetryClient — exponential backoff with jitter, honors Retry-After headers |
| factory | AgentFactory — create agents from JSON config |
| discovery | ToolFilter — progressive tool discovery via keyword/TF-IDF scoring |
| streaming | StreamingSender/StreamingReceiver — channel-based event streaming |
| schema_simplifier | Convert JSON Schema to human-readable text (for FlexibleAgent prompts) |
| union_schema | Build discriminated union JSON Schema from tool definitions at runtime |
## Agent variants

### SgrAgent (structured output)

Best for capable models (Gemini 3.1 Pro, GPT-4o). Builds a discriminated union JSON Schema from your tools at runtime, sends it via structured_call, and parses the response with the flexible parser + coercion.

```rust
let agent = SgrAgent::new(client); // constructor arguments illustrative
```
### ToolCallingAgent (native function calling)

Simplest variant. Sends tools via the native FC API and gets Vec<ToolCall> back directly. Works with any model that supports function calling.

```rust
let agent = ToolCallingAgent::new(client); // constructor arguments illustrative
```
### FlexibleAgent (text parsing)

For weak models or text-only backends (Ollama, local models). Puts tool descriptions in the system prompt as human-readable text and parses JSON from the model's free-form response. Includes retry with error feedback.

```rust
let agent = FlexibleAgent::new(client); // constructor arguments illustrative
```
### HybridAgent (2-phase reasoning)

Two-phase approach: Phase 1 calls a "reasoning" tool only (think step), Phase 2 sends the full toolkit with the reasoning context. Best for complex multi-step tasks.

```rust
let agent = HybridAgent::new(client); // constructor arguments illustrative
```
### PlanningAgent (read-only plan mode)

Wraps any agent to restrict tools to a read-only subset. The agent explores the codebase, then calls submit_plan with a structured plan. Like Claude Code's plan mode.

```rust
// Reconstructed sketch; helper names and run_loop's signature are illustrative.
use sgr_agent::{ClarificationTool, PlanningAgent, PlanTool, ToolRegistry};

let inner = SgrAgent::new(client);
let planner = PlanningAgent::new(inner);
let tools = ToolRegistry::new()
    .register(ReadFile)
    .register(ListDir)
    .register(Grep)
    .register(PlanTool)           // submit_plan — produces structured Plan
    .register(ClarificationTool); // ask_user — pause for questions

run_loop(&planner, &tools, &mut ctx, &mut messages, &config).await?;

// Extract the plan after completion
let plan = Plan::from_context(&ctx).unwrap();
println!("{}", plan.title); // field name illustrative
for (i, step) in plan.steps.iter().enumerate() {
    println!("{}. {}", i + 1, step);
}

// Inject the plan into the build agent's context
let plan_msg = plan.to_message();
build_messages.insert(0, plan_msg);
```
## Interactive agents (clarification)

Use run_loop_interactive when the agent may need to ask the user questions:

```rust
// Reconstructed sketch; the callback and signature are illustrative.
use sgr_agent::{run_loop_interactive, ClarificationTool};

let tools = ToolRegistry::new()
    .register(ReadFile)
    .register(Bash)
    .register(ClarificationTool) // ask_user tool
    .register(FinishTask);

// Async callback — called when the agent needs user input
run_loop_interactive(&agent, &tools, &mut ctx, &mut messages, &config, on_question).await?;
```
The regular run_loop also supports WaitingForInput events but continues with a placeholder instead of pausing.
## Dual-model routing

Use a smart model for complex decisions and a fast model for simple ones:

```rust
// Reconstructed sketch; constructor and config are illustrative.
use sgr_agent::ModelRouter;

let router = ModelRouter::new(smart_client, fast_client).with_config(router_config);

// Use the router as any LlmClient — routing is transparent
let agent = SgrAgent::new(router);
```
## Retry with backoff

Wrap any client with automatic retry on transient errors (rate limits, 5xx, timeouts):

```rust
// Reconstructed sketch; the config value is illustrative.
use sgr_agent::RetryClient;

let client = RetryClient::new(inner_client)
    .with_config(retry_config);
```
Honors Retry-After headers from rate limit responses.
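The backoff math can be sketched as follows; parameter names are assumptions, not the crate's RetryClient config:

```rust
/// Exponential backoff delay in milliseconds, honoring a server-provided
/// Retry-After when present. Parameter names are illustrative.
fn next_delay_ms(attempt: u32, base_ms: u64, cap_ms: u64, retry_after_ms: Option<u64>) -> u64 {
    if let Some(ra) = retry_after_ms {
        return ra; // the server's hint wins
    }
    // base * 2^attempt, capped; a real client also multiplies by a random
    // jitter factor to avoid thundering-herd retries.
    base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms)
}

fn main() {
    println!("{}", next_delay_ms(3, 500, 30_000, None));       // 4000
    println!("{}", next_delay_ms(0, 500, 30_000, Some(1200))); // 1200
}
```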
## Agent loop

The loop drives the agent: decide → execute tools → feed results back → repeat.

```rust
// Reconstructed sketch; LoopConfig fields and run_loop's exact signature
// are illustrative.
use sgr_agent::{run_loop, LoopConfig};

let config = LoopConfig::default();
let steps = run_loop(&agent, &tools, &mut ctx, &mut messages, &config).await?;
```
3-tier loop detection:
- Exact signature — same tool call sequence repeats N times
- Tool frequency — single tool dominates >90% of all calls
- Output stagnation — tool outputs are identical across steps
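A toy version of the first tier might look like this; the real detector compares whole call sequences and combines all three tiers:

```rust
/// Flag a loop when the last `n` tool-call signatures are identical.
/// A simplification of the exact-signature tier described above.
fn exact_signature_loop(signatures: &[&str], n: usize) -> bool {
    if n < 2 || signatures.len() < n {
        return false;
    }
    let tail = &signatures[signatures.len() - n..];
    tail.windows(2).all(|w| w[0] == w[1])
}

fn main() {
    println!("{}", exact_signature_loop(&["ls /", "ls /", "ls /"], 3));  // true
    println!("{}", exact_signature_loop(&["ls /", "cat a", "ls /"], 3)); // false
}
```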
Auto-completion detection:
- Catches agents that finished but forgot to call finish_task
- Keyword detection ("task is complete", "all done", etc.)
- Repeated situation text (agent stuck describing the same state)
Sliding window:
- Keeps first 2 messages (system + user prompt) + last N
- Inserts a summary marker where messages were trimmed
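The trimming rule can be sketched like this (marker text illustrative):

```rust
/// Keep the first two messages (system + user prompt) plus the last `n`,
/// inserting a marker where history was dropped. Marker text illustrative.
fn slide_window(messages: &[String], n: usize) -> Vec<String> {
    const HEAD: usize = 2;
    if messages.len() <= HEAD + n {
        return messages.to_vec();
    }
    let mut out = messages[..HEAD].to_vec();
    out.push("[earlier messages summarized]".to_string());
    out.extend_from_slice(&messages[messages.len() - n..]);
    out
}

fn main() {
    let msgs: Vec<String> = ["sys", "user", "a", "b", "c", "d"]
        .iter().map(|s| s.to_string()).collect();
    println!("{:?}", slide_window(&msgs, 2));
    // ["sys", "user", "[earlier messages summarized]", "c", "d"]
}
```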
## Agent lifecycle hooks

Override hooks on the Agent trait (prepare_context, prepare_tools, after_action) for cross-cutting concerns.
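The pattern is ordinary trait methods with default implementations. This self-contained sketch uses simplified synchronous signatures; the crate's real Agent trait is async and richer:

```rust
/// Simplified stand-in for the Agent trait; real hook signatures differ.
trait Agent {
    fn decide(&self, situation: &str) -> String;

    // Lifecycle hooks default to no-ops — override only what you need.
    fn prepare_context(&self, _ctx: &mut Vec<String>) {}
    fn after_action(&self, _result: &str, _log: &mut Vec<String>) {}
}

struct AuditedAgent;

impl Agent for AuditedAgent {
    fn decide(&self, situation: &str) -> String {
        format!("handle: {}", situation)
    }
    // Cross-cutting concern: audit every tool result.
    fn after_action(&self, result: &str, log: &mut Vec<String>) {
        log.push(format!("[audit] {}", result));
    }
}

fn main() {
    let agent = AuditedAgent;
    let mut log = Vec::new();
    agent.after_action("wrote file", &mut log);
    println!("{}", log[0]); // [audit] wrote file
}
```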
## Vertex AI

```rust
// Reconstructed sketch; constructor arguments are illustrative.
let config = ProviderConfig::vertex(project_id, region);
// Default region: "global" (aiplatform.googleapis.com)
```
## Flexible parser

The flexible parser extracts JSON from messy LLM output — markdown blocks, broken JSON, streaming chunks, chain-of-thought wrapping:

```rust
// Reconstructed sketch; the function's exact arguments are illustrative.
use sgr_agent::parse_flexible_coerced;
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(JsonSchema, Deserialize)]
struct Output { answer: String }

// Handles: ```json {...} ```, bare JSON, broken brackets, single quotes, trailing commas
let result: Output = parse_flexible_coerced(&raw_llm_text)?;
```
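The first step — peeling a fenced block — can be sketched with stdlib string ops; the real parser goes much further, repairing broken JSON and coercing types:

```rust
/// Pull the payload out of a ```json fenced block, falling back to the
/// trimmed input. A tiny slice of what the flexible parser handles.
fn extract_fenced_json(text: &str) -> &str {
    if let Some(start) = text.find("```json") {
        let body = &text[start + 7..]; // 7 == "```json".len()
        if let Some(end) = body.find("```") {
            return body[..end].trim();
        }
    }
    text.trim()
}

fn main() {
    let raw = "Sure!\n```json\n{\"ok\": true}\n```\nDone.";
    println!("{}", extract_fenced_json(raw)); // {"ok": true}
}
```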
## Progressive discovery

Filter tools by relevance when you have many tools but want to show only the most relevant ones:

```rust
// Reconstructed sketch; constructor and method arguments are illustrative.
use sgr_agent::ToolFilter;

let filter = ToolFilter::new(5); // show max 5 tools
let relevant = filter.select(&task_text, &registry);
// Returns: system tools (always) + top-scored tools by keyword overlap
```
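Keyword-overlap scoring is roughly this — a toy version; the real ToolFilter also supports TF-IDF and always keeps system tools:

```rust
use std::collections::HashSet;

/// Score a tool description by how many task words it shares.
fn overlap_score(task: &str, tool_desc: &str) -> usize {
    let task_words: HashSet<&str> = task.split_whitespace().collect();
    tool_desc.split_whitespace().filter(|w| task_words.contains(w)).count()
}

/// Return the `max` best-scoring tool names for a task.
fn top_tools<'a>(task: &str, tools: &[(&'a str, &'a str)], max: usize) -> Vec<&'a str> {
    let mut scored: Vec<(usize, &'a str)> = tools
        .iter()
        .map(|&(name, desc)| (overlap_score(task, desc), name))
        .collect();
    scored.sort_by(|a, b| b.0.cmp(&a.0)); // highest score first
    scored.into_iter().take(max).map(|(_, name)| name).collect()
}

fn main() {
    let tools = [
        ("read_file", "read a file from disk"),
        ("git_diff", "show the git diff"),
        ("bash", "run a shell command"),
    ];
    println!("{:?}", top_tools("read the file and show the diff", &tools, 2));
}
```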
## Running the example

A full 15-tool coding agent demo is included; it runs with either Google AI or Vertex AI credentials.
The example includes: ReadFile, WriteFile, EditFile, ListDir, Bash (with 30s timeout), BackgroundTask, SearchCode, Grep, Glob, GitDiff, GitStatus, GitLog, GetCwd, ChangeDir, FinishTask.
## Standalone project example

```toml
# Cargo.toml
[package]
name = "my-agent"
version = "0.1.0"
edition = "2021"

[dependencies]
# The serde and async-trait names below are assumptions — the original
# listed only version numbers for those two dependencies.
sgr-agent = { version = "0.2", features = ["agent", "gemini"] }
serde = "1"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
async-trait = "0.1"
```
See /tmp/my-agent for a full working standalone project.
## License
MIT