ai
Simple to use LLM library for Rust with streaming, tool calling, OAuth helpers,
and a lightweight agent loop, inspired by pi.
Table of Contents
- Supported Providers
- Installation
- Quick Start
- Tools
- Image Input
- Image Generation
- Thinking/Reasoning
- Stop Reasons
- Error Handling
- APIs, Models, and Providers
- Cross-Provider Handoffs
- Context Serialization
- Browser Usage
- OAuth Providers
- Agent Core
- Development
- License
Supported Providers
- OpenAI via Chat Completions, Responses, and Images
- Anthropic via Messages
- GitHub Copilot through OAuth-backed OpenAI/Anthropic-compatible routes
- OpenRouter for image generation
- Azure Foundry and other compatible endpoints through provider handles with
explicit
base_url, headers, and compatibility settings
The active built-in stream APIs are:
openai-completionsopenai-responsesanthropic-messages
The active built-in image generation APIs are:
openai-imagesopenrouter-images
The active built-in provider handles are focused on openai, anthropic, and
github_copilot for chat, plus openai and openrouter for image generation. Azure
Foundry, llama.cpp, MLX, Ollama, vLLM, and other compatible endpoints can use
configured provider handles with explicit base_url, HTTP headers, and
compatibility settings.
Broad native provider-specific APIs outside OpenAI, Anthropic, GitHub Copilot, and custom compatible routing are not part of the active built-in provider surface. PRs to add support for additional providers are welcome.
Image generation is exposed through OpenAI-compatible image models and OpenRouter image models. Chat image input and image blocks in tool results are still supported by the regular chat APIs.
Installation
This crate uses Tokio-compatible async APIs. The examples use
#[tokio::main], which requires Tokio's macros and rt-multi-thread
features. The examples also use futures::StreamExt for stream iteration and
serde_json::json for JSON Schema values.
Quick Start
use ;
async
Streaming
Use stream_simple when the UI should update as tokens arrive. The
futures::StreamExt import is only needed for .next().await.
use StreamExt;
use ;
async
Provider Handles
Use openai::from_env() for OpenAI Responses. Use openai::builder() when
selecting Chat Completions or an OpenAI-compatible endpoint.
OpenAI Responses
use openai;
let openai_responses_from_env = from_env?;
let openai_responses_with_key = builder
.api_key
.responses
.build?;
OpenAI Chat Completions
use openai;
let openai_chat_with_key = builder
.api_key
.chat_completions
.build?;
let ollama_chat = builder
.base_url
.chat_completions
.build?;
Anthropic
use anthropic;
let anthropic_from_env = from_env?;
let anthropic_with_key = builder
.api_key
.build?;
Dynamic Provider Choice
Provider handles are trait objects when the application wants to choose a backend at runtime.
use ;
async
Tools
Tools enable LLMs to interact with external systems. This crate uses JSON Schema values for tool definitions and provides validation helpers for tool calls.
Defining Tools
use Tool;
use json;
let weather_tool = builder
.description
.parameters
.build?;
Handling Tool Calls
Tool results use content blocks and can include both text and images.
use ;
if let ToolCall = block
Streaming Tool Calls with Partial JSON
During streaming, tool call arguments are progressively parsed as they arrive. This enables real-time UI updates before the complete arguments are available.
use AssistantMessageEvent;
match event
Important notes about partial tool arguments:
- During
ToolCallDelta, arguments may be incomplete. - Fields may be missing or partially parsed.
- String values may be truncated mid-word.
- Arrays and nested objects may be incomplete.
- Always validate final tool arguments before executing external effects.
- Use
content_indexto associate events with the right assistant content block.
Validating Tool Arguments
When using the agent loop, tool arguments are validated before execution. When
implementing your own loop with stream or complete, use
validate_tool_call or validate_tool_arguments.
use ;
let validated = validate_tool_call?;
Complete Event Reference
All streaming events emitted during assistant message generation:
| Event | Description | Key Properties |
|---|---|---|
Start |
Stream begins | partial: initial assistant message structure |
TextStart |
Text block starts | content_index: position in content array |
TextDelta |
Text chunk received | delta, content_index |
TextEnd |
Text block complete | content, content_index |
ThinkingStart |
Thinking block starts | content_index |
ThinkingDelta |
Thinking chunk received | delta, content_index |
ThinkingEnd |
Thinking block complete | content, content_index |
ToolCallStart |
Tool call begins | content_index |
ToolCallDelta |
Tool arguments stream | delta, partial |
ToolCallEnd |
Tool call complete | tool_call |
Done |
Stream complete | reason, message |
Error |
Error occurred | reason, error |
Streaming events for different content blocks are not guaranteed to be
contiguous. Consumers should use content_index to associate deltas and end
events with their blocks.
Image Input
Models with vision capabilities can process images. Check model.input for
ModelInput::Image. If you pass images to a non-vision model, the message
transform layer downgrades unsupported image content to text placeholders.
use ;
let openai = from_env?;
let model = openai.model.build?;
if model.input.contains
let context = Context ;
Image Generation
Use generate_images with an OpenAI-compatible or OpenRouter image model. The
returned AssistantImages can contain text and image output blocks, matching
the selected model configuration.
Basic Image Generation
use ;
async
For llama.cpp, MLX, Ollama, or another OpenAI-compatible image endpoint, use the OpenAI provider with the compatible server's base URL. For example, with Ollama:
use ;
let ollama = builder
.provider_id
.base_url
.images
.build?;
let model = ollama.model.build_image?;
let context = builder
.text
.build;
let images = generate_images.await?;
Set OPENROUTER_API_KEY for openrouter::from_env(), or pass a key through
providers::openrouter::builder().api_key(Some("...")).
OpenRouter image models remain available through the openrouter provider:
use ;
let openrouter = from_env?;
let model = openrouter
.model
.build_image?;
let context = builder.text.build;
let images = generate_images.await?;
OpenRouter image models use conservative defaults because this crate does not ship a built-in model catalog. They default to text input and image output. If a specific OpenRouter model supports image input or text output, set those capabilities on the model builder:
use ;
let openrouter = from_env?;
let model = openrouter
.model
.input
.output
.build_image?;
let context = builder.text.build;
let images = generate_images.await?;
Notes and Limitations
The active Rust image-generation surface covers OpenAI-compatible
/images/generations models through the openai-images API and OpenRouter's
chat-completions-style image models through the openrouter-images API. The
OpenAI-compatible generations path supports text input; image edits are not
implemented yet. OpenRouter image input and text output are opt-in model
capabilities configured by the caller. Provider errors are returned as
AssistantImages with stop_reason: ImagesStopReason::Error; cancelled
requests use ImagesStopReason::Aborted.
Thinking/Reasoning
Many models support thinking or reasoning content. Check model.reasoning and
use get_supported_thinking_levels to inspect supported levels.
Unified Interface (streamSimple/completeSimple)
Rust exports these as stream_simple and complete_simple.
use ;
let anthropic = from_env?;
let model = anthropic.model.build?;
let options = SimpleStreamOptions ;
let response = complete_simple
.await?;
Provider-Specific Options (stream/complete)
stream_simple and complete_simple are the preferred app-level APIs,
They take SimpleStreamOptions, resolve the model's API, and map common
options such as reasoning, cache retention, API key, cancellation, payload
hooks, retry settings, and provider options onto the selected provider.
stream and complete are the lower-level APIs. Use them when you need the
non-simple StreamOptions shape or direct provider-option forwarding. For
provider-specific escape hatches, place fields in
StreamOptions::provider_options using provider option names such as
toolChoice, serviceTier, or thinkingDisplay.
The crate root also exports scoped direct provider stream functions:
stream_openai_completions/stream_simple_openai_completionsstream_openai_responses/stream_simple_openai_responsesstream_anthropic/stream_simple_anthropic
Provider modules expose typed provider options for direct provider calls:
providers::openai_completions::OpenAICompletionsOptionsproviders::openai_responses::OpenAIResponsesOptionsproviders::anthropic::AnthropicOptions
Streaming Thinking Content
Thinking content streams through ThinkingStart, ThinkingDelta, and
ThinkingEnd events. Completed messages store thinking blocks as
AssistantContent::Thinking.
Stop Reasons
Every AssistantMessage includes a stop_reason field that indicates how the
generation ended:
Stop- Normal completionLength- Output hit the maximum token limitToolUse- Model is calling tools and expects tool resultsError- An error occurred during generationAborted- Request was cancelled
AssistantMessage may also include response_id, a provider-specific response
or message identifier when the underlying API exposes one.
Error Handling
Setup failures before a stream exists are returned as Error values. Once a
provider stream exists, provider-declared failures and cancellation are
surfaced as terminal AssistantMessageEvent::Error events carrying the final
assistant message. The complete_* helpers return that final assistant message;
check message.stop_reason for StopReason::Error or StopReason::Aborted.
Transport or decoder failures that cannot be represented as provider messages
still return Err.
Aborting Requests
Use a Tokio cancellation token to abort in-flight requests. Prefer direct struct initialization over mutating default options:
use ;
use CancellationToken;
let token = new;
let options = SimpleStreamOptions ;
token.cancel;
Continuing After Abort
Abort produces an assistant message with StopReason::Aborted. The transform
layer drops aborted assistant turns before follow-up messages so conversations
can continue cleanly.
Debugging Provider Payloads
Use StreamOptions::on_payload and StreamOptions::on_response hooks to
inspect or override provider payloads and observe raw provider responses. The
hooks are supported by stream, complete, stream_simple, and
complete_simple.
APIs, Models, and Providers
Provider handles build executable models. Built-in language model APIs include:
anthropic-messages: Anthropic Messages APIopenai-completions: OpenAI Chat Completions APIopenai-responses: OpenAI Responses API
Faux provider for tests
register_faux_provider() registers a temporary in-memory provider for tests
and demos. It is opt-in and not part of the built-in provider set.
Providers and Models
A provider offers models through a specific API. In this crate:
- Anthropic models use
anthropic-messages. - OpenAI models use
openai-completionsoropenai-responses. - GitHub Copilot models use OAuth-backed OpenAI/Anthropic-compatible routes.
- Azure Foundry and other compatible endpoints are configured as custom
models by choosing an active API, setting
base_url, and fillingModelCompatwhere the endpoint differs from the default request shape.
Built-in provider handles create executable model values directly from string IDs. There is no built-in model catalog; applications that need one should keep it in application state and build models through configured provider handles.
Querying Providers and Models
use ;
let provider = from_env?;
let capabilities = provider.capabilities;
let model = provider.model.build?;
let copilot = builder
.api_key
.anthropic_messages
.build?;
let claude = copilot.model.build?;
Custom Models
You can create provider-bound models for local inference servers or custom endpoints:
use ;
let provider = builder
.provider_id
.api_key
.base_url
.chat_completions
.build?;
let model = provider.model.build?;
let stream = stream_simple?;
The same pattern works for local inference servers such as llama.cpp, MLX, Ollama, vLLM, and LM Studio when they expose an OpenAI-compatible chat endpoint.
Some OpenAI-compatible servers do not understand the developer role used for
reasoning-capable models. For those endpoints, build the model with compat
metadata so the system prompt is sent as a system message instead. If the
server also does not support reasoning_effort, disable that compat flag too.
OpenAI Compatibility Settings
The openai-completions API is implemented by many providers with minor
differences. ModelCompat stores compatibility metadata for explicit custom
models, but the active built-in surface does not infer broad provider-specific
behavior from provider names or base URLs.
Set model-builder compat metadata when the target OpenAI-compatible endpoint needs payload differences such as non-standard reasoning, cache-control, max-token, or tool-result behavior.
Thread Safety
Provider handles are regular cloneable values. Build them during application
startup, pass them where needed, and create executable model values with
provider.model(id).build()?.
Type Safety
Public types are serializable with serde where they represent portable
context or message state.
Cross-Provider Handoffs
The library supports handoffs between OpenAI, Anthropic, and GitHub Copilot-compatible models within the same conversation.
How It Works
When messages from one provider are sent to a different provider, the crate transforms them for compatibility:
- User and tool-result messages are passed through.
- Assistant messages from the same provider/API are preserved as-is.
- Assistant messages from different providers have thinking blocks converted to
text with
<thinking>tags where needed. - Tool calls and regular text are preserved.
Example: Multi-Provider Conversation
use ;
let mut context = Context ;
let anthropic = from_env?;
let claude = anthropic.model.build?;
let claude_response =
complete_simple.await?;
context.messages.push;
let openai = from_env?;
let gpt = openai.model.build?;
context.messages.push;
let gpt_response =
complete_simple.await?;
context.messages.push;
Provider Compatibility
All active providers can handle shared text, tool calls, tool results including images, thinking/reasoning blocks after transformation, and aborted messages with partial content.
Context Serialization
Context, Message, assistant content, tool calls, and tool results implement
Serialize and Deserialize, so context can be persisted or handed to another
process.
use Context;
let serialized = to_string?;
let restored: Context = from_str?;
If the context contains images encoded as base64, those are serialized too.
Serialized user, assistant, and tool-result messages include stable role
fields, including assistant messages nested in stream events.
Browser Usage
This Rust crate is server/native focused and does not provide browser-specific packaging. Pass API keys explicitly through options or use environment variables on the server.
Browser Compatibility Notes
Not applicable to this Rust crate.
Environment Variables
| Provider | Environment variables |
|---|---|
openai |
OPENAI_API_KEY |
anthropic |
ANTHROPIC_OAUTH_TOKEN, then ANTHROPIC_API_KEY |
github-copilot |
COPILOT_GITHUB_TOKEN |
Explicit API keys in StreamOptions take precedence over environment lookup.
Checking Environment Variables
use get_env_api_key;
let key = get_env_api_key;
OAuth Providers
The OAuth registry includes:
- Anthropic (Claude Pro/Max subscription)
- GitHub Copilot (Copilot subscription)
CLI Login
This crate exposes login primitives for applications that want to provide their own login UI.
Programmatic OAuth
use ;
let provider = get_oauth_provider.expect;
Login Flow Example
Use login_anthropic or login_github_copilot with OAuthLoginCallbacks to
drive the login UI from your application.
use ;
async
Using OAuth Tokens
Use provider-specific helpers to turn stored credentials into the API key used
by provider builders or stream options. For GitHub Copilot, the helper refreshes
expired credentials and returns new_credentials; persist those back to your
auth store.
use ;
async
Provider Notes
GitHub Copilot: OAuth helpers and dynamic request headers are included. Some Copilot model ids use vendor names, but they are routed through the active OpenAI/Anthropic-compatible APIs in this crate. Other native provider APIs are not registered.
Anthropic: OAuth follows the Claude Pro/Max OAuth flow.
Agent Core
Stateful agent support is part of this crate. It runs model turns, executes registered tools, appends tool results, and continues until the assistant stops, an error occurs, or a hook asks the loop to stop.
Agent Installation
No separate crate is required. The agent core lives in ai.
Agent Quick Start
use ;
async
Core Concepts
AgentMessage vs LLM Message
AgentMessage is the same portable Message enum used by the shared LLM API.
It can contain standard LLM messages (User, Assistant, ToolResult) and
Custom app-owned messages.
LLMs only understand user, assistant, and tool-result messages. The
convert_to_llm function bridges this gap by filtering or transforming custom
messages before each provider call.
Message Flow
AgentMessage[] -> transform_context() -> AgentMessage[] -> convert_to_llm() -> Message[] -> LLM
(optional) (required)
transform_context is intended for pruning, compaction, or external context
injection. convert_to_llm filters or converts app-owned messages.
Event Flow
The agent emits events for UI updates. Understanding the event sequence helps build responsive interfaces.
prompt_text() Event Sequence
When you call prompt_text("Hello"), the wrapper emits this core sequence:
prompt_text("Hello")
|- agent_start
|- turn_start
|- message_start user
|- message_end user
|- message_start assistant
|- message_update assistant delta
|- message_end assistant
|- turn_end
`- agent_end
With Tool Calls
If the assistant calls tools, the loop emits tool_execution_start, optional
tool_execution_update, tool_execution_end, then a tool-result
message_start / message_end. If the batch does not terminate, the next turn
starts and the model receives the tool results.
Tool execution mode is configurable:
Parallelis the default. Preflight runs sequentially, allowed tools execute concurrently, completion events emit as each tool finalizes, and persisted tool-result messages remain in assistant source order.Sequentialexecutes tool calls one by one.
before_tool_call runs after tool_execution_start and validated argument
parsing. after_tool_call runs after tool execution and before final tool
events. Tool results can set terminate = true; the loop stops early only when
every finalized result in the batch terminates.
Low-level loop callers can set should_stop_after_turn to stop gracefully after
the current turn completes. It runs after turn_end, before steering/follow-up
queues are polled, and before another model request starts.
continue_run() Event Sequence
continue_run() resumes from existing context without adding a new message.
Use it for retries after errors. The last message in context must be a user or
tool-result message, not an assistant message.
Event Types
| Event | Description |
|---|---|
AgentStart |
Agent begins processing |
AgentEnd |
Final event for the run. Awaited subscribers for this event still count toward settlement |
TurnStart |
New turn begins: one LLM call plus tool executions |
TurnEnd |
Turn completes with assistant message and tool results |
MessageStart |
Any message begins: user, assistant, or tool result |
MessageUpdate |
Assistant-only update containing the underlying assistant stream event |
MessageEnd |
Message completes |
ToolExecutionStart |
Tool begins |
ToolExecutionUpdate |
Tool streams progress |
ToolExecutionEnd |
Tool completes |
Agent::subscribe listeners are awaited in registration order. agent_end
means no more loop events will be emitted, but wait_for_idle and
prompt_text settle only after awaited final listeners finish.
Agent Options
AgentOptions contains:
initial_state: system prompt, model, thinking level, tools, and messages.convert_to_llm: converts agent messages to LLM messages.transform_context: prunes, compacts, or injects context before conversion.steering_modeandfollow_up_mode: queue handling behavior.stream_fn: custom stream function for proxy backends.session_id: forwarded throughSimpleStreamOptions.tool_execution: parallel or sequential tool execution.before_tool_callandafter_tool_call: preflight and postprocess hooks.prepare_next_turn: updates context, model, or thinking level before another turn starts.options: transport, retry, cancellation, payload hooks, provider options, thinking budgets, and API key defaults.
use ;
let initial_state = builder
.system_prompt
.thinking_level
.tools
.messages
.build;
let agent = new;
Agent State
AgentState contains the system prompt, active model, thinking level, tools,
message history, streaming status, pending tool call IDs, and the latest error
message.
let state = builder
.system_prompt
.thinking_level
.message
.build;
During streaming, streaming_message contains the current partial assistant
message. is_streaming remains true until the run fully settles, including
awaited agent_end subscribers.
Methods
Prompting
agent.prompt_text.await?;
agent.prompt_messages.await?;
agent.continue_run.await?;
continue_run resumes from current context. The last message must be a user or
tool-result message.
State Management
Use the state and option mutation helpers to update system prompt, model,
thinking level, tools, messages, session ID, queues, hooks, and tool execution
mode. reset returns the agent to its initial state.
agent.set_system_prompt.await;
agent.set_model.await;
agent.set_thinking_level.await;
agent.set_tools.await;
agent.set_tool_execution.await;
agent.set_messages.await;
agent.push_message.await;
agent.reset.await;
Session and Thinking Budgets
AgentOptions::session_id is forwarded to providers that support prompt-cache
or session affinity behavior. AgentOptions::options.thinking_budgets is
applied by the simple-stream option builder before each model call.
agent.set_session_id.await;
agent.set_thinking_budgets;
Control
agent.abort.await;
agent.wait_for_idle.await;
Events
let subscription = agent.subscribe;
subscription.unsubscribe;
Keep the subscription handle alive while the listener remains registered. Dropping the handle also unsubscribes.
Steering and Follow-up
Steering messages let you interrupt the agent while it is running. Follow-up messages let you queue work after the agent would otherwise stop.
When steering messages are detected after a turn completes:
- All tool calls from the current assistant message have already finished.
- Steering messages are injected.
- The LLM responds on the next turn.
Follow-up messages are checked only when there are no more tool calls and no steering messages.
Custom Message Types
Use Message::Custom for app-specific agent transcript entries. Custom
messages are retained in agent state, then filtered or converted by
convert_to_llm before provider calls.
Agent Tools
Agent tools implement the AgentTool trait. definition() returns the shared
Tool schema, label() provides UI text, execution_mode() can force a whole
batch to run sequentially, prepare_arguments() can reshape model arguments
before validation, and execute() performs the tool work.
use ;
use ;
async
Implement AgentTool directly when a tool needs state, custom argument
preparation, an execution mode override, cancellation handling, or streaming
updates.
use ;
use async_trait;
use Value;
use CancellationToken;
;
Agent Tool Error Handling
Tool failures should return an error from execute(). The loop catches that
error and reports a tool-result message with is_error = true.
Return terminate = true from execute() or after_tool_call to hint that
the agent should stop after the current tool batch. This only takes effect when
every finalized tool result in the batch is terminating.
Proxy Usage
For proxy backends, pass a custom StreamFn through AgentOptions::stream_fn
or directly to agent_loop. The function receives the selected Model, the
converted Context, and SimpleStreamOptions.
Low-Level API
Use agent_loop or agent_loop_continue when you want an event stream, and
run_agent_loop or run_agent_loop_continue when you want to await the whole
loop directly.
use StreamExt;
use ;
async
Low-level streams are observational. They preserve event order, but they do not
wait for async event handling to settle before later producer phases continue.
Use Agent when message processing must be a barrier before tool preflight.
Development
This crate currently keeps Rust test coverage in module-level unit tests under
src; there is no crates/ai/tests integration-test directory at the moment.
Adding a New Provider
Adding a new LLM provider generally requires changes across multiple files:
1. Core Types (src/types.rs)
- Add the API identifier if the provider needs a new transport shape.
- Create provider-specific options where direct provider calls need typed options.
- Add or extend compatibility metadata only when the payload behavior differs.
2. Provider Implementation (src/providers/)
Create a provider module that exports:
stream_<provider>()stream_simple_<provider>()- Provider-specific options
- Message conversion from
Contextto provider payload - Tool conversion if the provider supports tools
- Response parsing into standardized assistant events
3. Provider Factory
- Implement
Providerfor the configured provider handle. - Return model builders from
model(id)and future capability builders. - Add root-level exports in
src/lib.rswhen the provider should be public.
4. Runtime API
- Implement the capability runtime trait carried by the built model, such as
LanguageModelApi. - Map provider capability, cost, input, context-window, and reasoning metadata
onto the shared
Modeltype.
5. Tests
Create or update tests for streaming, tool use, token usage, abort behavior, context overflow, empty messages, Unicode handling, tool-result edge cases, image input/tool-result images if applicable, and cross-provider handoff.
6. Agent Integration
If the provider needs agent-specific behavior, update this crate's agent tests and examples directly.
7. Documentation
Update this README with provider scope, authentication, provider-specific options, and environment variables.
License
MIT