# rstructor: Structured LLM Outputs for Rust
rstructor is a Rust library for extracting structured data from Large Language Models (LLMs) with built-in validation. Define your schemas as Rust structs/enums, and rstructor will handle the rest: generating JSON Schemas, communicating with LLMs, parsing responses, and validating the results.
Think of it as the Rust equivalent of Instructor + Pydantic for Python, bringing the same structured output capabilities to the Rust ecosystem.
## Features
- **Type-Safe Definitions**: Define data models as standard Rust structs/enums with attributes
- **JSON Schema Generation**: Auto-generates JSON Schema from your Rust types
- **Built-in Validation**: Type checking plus custom business-rule validation
- **Multiple LLM Providers**: Support for OpenAI, Anthropic, Grok (xAI), and Gemini (Google), with an extensible backend system
- **Complex Data Structures**: Support for nested objects, arrays, optional fields, and deeply nested enums
- **Schema Fidelity**: Heuristic-free JSON Schema generation that preserves nested struct and enum detail
- **Custom Validation Rules**: Add domain-specific validation with automatically detected `validate` methods
- **Async API**: Fully asynchronous API for efficient operations
- **Builder Pattern**: Fluent API for configuring LLM clients (temperature, retries, timeouts, etc.)
- **Feature Flags**: Optional backends via feature flags
## Installation
Add rstructor to your `Cargo.toml`:

```toml
[dependencies]
rstructor = "0.1.0"
serde = { version = "1.0", features = ["derive"] }
tokio = { version = "1.0", features = ["rt-multi-thread", "macros"] }
```
## Quick Start
Here's a simple example of extracting structured information about a movie from an LLM:

```rust
use rstructor::{Instructor, LLMClient, OpenAIClient};
use serde::{Deserialize, Serialize};
use std::env;
use std::time::Duration;

// Define your data model (field attribute names shown are indicative)
#[derive(Instructor, Serialize, Deserialize, Debug)]
struct Movie {
    #[llm(description = "Title of the movie")]
    title: String,
    #[llm(description = "Year the movie was released")]
    year: u16,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = OpenAIClient::new(env::var("OPENAI_API_KEY")?)?
        .timeout(Duration::from_secs(60));

    let movie: Movie = client.materialize("Tell me about the movie The Matrix").await?;
    println!("{:#?}", movie);
    Ok(())
}
```
## Detailed Examples
### Production Example with Automatic Retry
For production use, configure the client with retry options so it automatically retries on validation errors:

```rust
use rstructor::{Instructor, LLMClient, OpenAIClient};
use std::time::Duration;

// Configure the client to automatically retry validation errors
let client = OpenAIClient::from_env()?
    .max_retries(3)                // retry up to 3 times on validation errors
    .include_error_feedback(true); // include validation feedback in retry prompts

let movie: Movie = client.materialize("Tell me about a classic sci-fi movie").await?;
```
### Extended Thinking (GPT-5.x, Gemini 3, Claude 4.x)
Configure the depth of reasoning for models that support extended thinking:

```rust
use rstructor::{AnthropicClient, GeminiClient, OpenAIClient, ThinkingLevel};

// GPT-5.2 with Low thinking (default)
let client = OpenAIClient::from_env()?;

// Enable higher thinking for complex reasoning tasks
let client = OpenAIClient::from_env()?
    .thinking_level(ThinkingLevel::High);

// Gemini 3 Flash Preview with Low thinking (default)
let client = GeminiClient::from_env()?;

// Disable thinking for maximum speed
let client = GeminiClient::from_env()?
    .thinking_level(ThinkingLevel::Off);

// Claude 4.x with thinking enabled
let client = AnthropicClient::from_env()?
    .thinking_level(ThinkingLevel::Medium);
```
Thinking Levels:

- `Off`: disabled (fastest, no reasoning overhead)
- `Minimal`: minimal reasoning (Gemini Flash only)
- `Low`: light reasoning (default for GPT-5.x, Gemini 3)
- `Medium`: balanced reasoning
- `High`: deep reasoning for complex problem-solving
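These levels form a natural ordering from cheapest to deepest. A minimal sketch of that idea in plain Rust (the enum below is illustrative, not the crate's actual `ThinkingLevel` type):

```rust
// Illustrative ordering of the thinking levels described above.
// Deriving Ord makes comparisons follow declaration order (Off < ... < High).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ThinkingLevel {
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

fn main() {
    // Higher levels trade latency and tokens for deeper reasoning
    assert!(ThinkingLevel::High > ThinkingLevel::Low);
    assert!(ThinkingLevel::Off < ThinkingLevel::Minimal);
    println!("{:?}", ThinkingLevel::Medium);
}
```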
### Token Usage / Metadata
Track token usage for monitoring costs and debugging:

```rust
use rstructor::{Instructor, LLMClient, OpenAIClient};
use serde::{Deserialize, Serialize};

let client = OpenAIClient::from_env()?;

// The `_with_metadata` variants return the parsed data plus usage info
let result = client.materialize_with_metadata::<Movie>("Tell me about a movie").await?;
let movie = result.data;
if let Some(usage) = result.usage {
    println!(
        "model: {}, input tokens: {}, output tokens: {}",
        usage.model, usage.input_tokens, usage.output_tokens
    );
}
```
The `_with_metadata` variants return a result wrapper that includes:

- `data`/`text`: the actual response data
- `usage`: optional `TokenUsage` with `model`, `input_tokens`, and `output_tokens`
Use the standard `materialize()` and `generate()` methods if you don't need token tracking.
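For cost accounting, the wrapper can be pictured as a simple generic struct. The sketch below is illustrative only (not the crate's exact types), and `total_tokens` is a hypothetical helper:

```rust
// Illustrative shapes mirroring the metadata wrapper described above
struct TokenUsage {
    model: String,
    input_tokens: u32,
    output_tokens: u32,
}

struct WithMetadata<T> {
    data: T,
    usage: Option<TokenUsage>,
}

impl TokenUsage {
    // Hypothetical convenience helper for cost accounting
    fn total_tokens(&self) -> u32 {
        self.input_tokens + self.output_tokens
    }
}

fn main() {
    let result = WithMetadata {
        data: "parsed response",
        usage: Some(TokenUsage {
            model: "example-model".to_string(),
            input_tokens: 120,
            output_tokens: 40,
        }),
    };
    if let Some(u) = &result.usage {
        assert_eq!(u.total_tokens(), 160);
    }
    println!("{}", result.data);
}
```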
### Error Handling
rstructor provides rich, actionable error types to help you handle API failures gracefully:

```rust
use rstructor::{LLMClient, OpenAIClient, RStructorError};
use std::time::Duration;

let client = OpenAIClient::from_env()?
    .timeout(Duration::from_secs(30));

// Variant payloads shown here are indicative; see the lists below for the full taxonomy
match client.materialize::<Movie>("Tell me about a movie").await {
    Ok(movie) => println!("{:#?}", movie),
    Err(RStructorError::RateLimited { .. }) => {
        // back off and retry later
    }
    Err(RStructorError::Timeout) => {
        // consider a longer timeout
    }
    Err(e) => return Err(e.into()),
}
```
Retryable errors (automatically retried when `.max_retries()` is configured):

- `RateLimited`: 429 errors with optional `retry_after` duration
- `ServiceUnavailable`: 503 errors
- `GatewayError`: 520-524 Cloudflare errors
- `ServerError`: 500/502 errors
- `Timeout`: request timeout

Non-retryable errors:

- `AuthenticationFailed`: invalid/missing API key (401)
- `PermissionDenied`: access denied (403)
- `InvalidModel`: model not found (404)
- `RequestTooLarge`: payload too large (413)
- `BadRequest`: malformed request (400)
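The retryable/non-retryable split above can be mirrored in plain Rust. This is an illustrative sketch of the taxonomy, not rstructor's actual error type:

```rust
use std::time::Duration;

// Illustrative mirror of the error taxonomy above (not rstructor's actual enum)
#[derive(Debug)]
enum ApiError {
    RateLimited { retry_after: Option<Duration> },
    ServiceUnavailable,
    GatewayError(u16),
    ServerError(u16),
    Timeout,
    AuthenticationFailed,
    PermissionDenied,
    InvalidModel,
    RequestTooLarge,
    BadRequest,
}

impl ApiError {
    /// True for the errors listed as retryable above.
    fn is_retryable(&self) -> bool {
        matches!(
            self,
            ApiError::RateLimited { .. }
                | ApiError::ServiceUnavailable
                | ApiError::GatewayError(_)
                | ApiError::ServerError(_)
                | ApiError::Timeout
        )
    }
}

fn main() {
    assert!(ApiError::RateLimited { retry_after: Some(Duration::from_secs(1)) }.is_retryable());
    assert!(!ApiError::BadRequest.is_retryable());
    println!("ok");
}
```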
### Model Discovery
Query available models from any provider's API:

```rust
use rstructor::{LLMClient, OpenAIClient};

let client = OpenAIClient::from_env()?;

// Method name below is indicative of the discovery API
let models = client.list_models().await?;
for model in models {
    println!("{}", model);
}
```
Each provider filters to relevant chat completion models:

- OpenAI: GPT models (`gpt-*`, `o1-*`, `o3-*`)
- Anthropic: Claude models (`claude-*`)
- Gemini: models supporting `generateContent`
- Grok: Grok models (`grok-*`)
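The filtering amounts to simple prefix matching. A minimal sketch of the OpenAI rule from the list above (illustrative, not the crate's implementation):

```rust
// Keep only chat-completion model IDs, per the prefixes listed above
fn is_openai_chat_model(id: &str) -> bool {
    ["gpt-", "o1-", "o3-"].iter().any(|prefix| id.starts_with(prefix))
}

fn main() {
    assert!(is_openai_chat_model("gpt-4o"));
    assert!(is_openai_chat_model("o1-mini"));
    assert!(!is_openai_chat_model("claude-3-opus"));
    println!("ok");
}
```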
### Basic Example with Validation
Add custom validation rules to enforce business logic beyond type checking:

```rust
use rstructor::{Instructor, RStructorError, Result};
use serde::{Deserialize, Serialize};

#[derive(Instructor, Serialize, Deserialize, Debug)]
struct MovieReview {
    #[llm(description = "Rating from 0.0 to 10.0")]
    rating: f32,
}

// Custom validation logic in an automatically detected `validate` method
// (error variant name is indicative)
impl MovieReview {
    fn validate(&self) -> Result<()> {
        if !(0.0..=10.0).contains(&self.rating) {
            return Err(RStructorError::ValidationError(
                "rating must be between 0 and 10".to_string(),
            ));
        }
        Ok(())
    }
}
```
### Complex Nested Structures
rstructor supports complex nested data structures:

```rust
use rstructor::{Instructor, LLMClient, OpenAIClient};
use serde::{Deserialize, Serialize};
use std::time::Duration;

// Define a nested data model for a recipe (field names are illustrative)
#[derive(Instructor, Serialize, Deserialize, Debug)]
struct Ingredient {
    name: String,
    quantity: String,
}

#[derive(Instructor, Serialize, Deserialize, Debug)]
struct Recipe {
    title: String,
    ingredients: Vec<Ingredient>,
    steps: Vec<String>,
}

// Usage:
// let recipe: Recipe = client.materialize("Give me a recipe for chocolate chip cookies").await?;
```
### Working with Enums
rstructor supports both simple enums and enums with associated data.
#### Simple Enums
Use enums for categorical data:

```rust
use rstructor::Instructor;
use serde::{Deserialize, Serialize};

// Define an enum for sentiment analysis (variant names are illustrative)
#[derive(Instructor, Serialize, Deserialize, Debug)]
enum Sentiment {
    Positive,
    Negative,
    Neutral,
}

#[derive(Instructor, Serialize, Deserialize, Debug)]
struct SentimentAnalysis {
    sentiment: Sentiment,
    confidence: f32,
}

// Usage:
// let analysis: SentimentAnalysis = client.materialize("Analyze the sentiment of: I love this product!").await?;
```
#### Enums with Associated Data (Tagged Unions)
rstructor also supports more complex enums with associated data:

```rust
use rstructor::Instructor;
use serde::{Deserialize, Serialize};

// Enum with different types of associated data
#[derive(Instructor, Serialize, Deserialize, Debug)]
enum UserStatus {
    Online,
    Away(String), // tuple variant carrying a status message
    // Using struct variants for more complex associated data
    Busy { until: String, reason: String },
}

// Usage:
// let user_status: UserStatus = client.materialize("What's the user's status?").await?;
```
#### Nested Enums Across Structs

Enums can be freely nested inside other enums and structs; `#[derive(Instructor)]` now generates the correct schema without requiring manual `SchemaType` implementations:

```rust
// Works automatically: TaskState::schema() includes the nested enum structure.
```

See `examples/nested_enum_example.rs` for a complete runnable walkthrough that also exercises deserialization of nested enum variants.
When serialized to JSON, these enum variants with data become tagged unions (shown here with serde's default externally tagged representation).

`UserStatus::Away("Back in 10 minutes")` becomes:

```json
{ "Away": "Back in 10 minutes" }
```

`PaymentMethod::Card { number: "4111...", expiry: "12/25" }` becomes:

```json
{ "Card": { "number": "4111...", "expiry": "12/25" } }
```
### Working with Custom Types (Dates, UUIDs, etc.)
rstructor provides the `CustomTypeSchema` trait to handle types that don't have direct JSON representations but need specific schema formats. This is particularly useful for:

- Date/time types (e.g., `chrono::DateTime`)
- UUIDs (e.g., `uuid::Uuid`)
- Email addresses
- URLs
- Custom domain-specific types
#### Basic Implementation

```rust
use rstructor::CustomTypeSchema;
use chrono::{DateTime, Utc};
use uuid::Uuid;

// Implement CustomTypeSchema for chrono::DateTime<Utc>
// (method signatures here are indicative; see the API reference below)
impl CustomTypeSchema for DateTime<Utc> {
    fn schema_type() -> String {
        "string".to_string()
    }

    fn schema_format() -> Option<String> {
        Some("date-time".to_string())
    }
}

// Implement CustomTypeSchema for UUID
impl CustomTypeSchema for Uuid {
    fn schema_type() -> String {
        "string".to_string()
    }

    fn schema_format() -> Option<String> {
        Some("uuid".to_string())
    }
}
```
#### Usage in Structs

Once implemented, these custom types can be used directly in your struct fields, and the derive macro picks up their schema formats. See `examples/custom_type_example.rs` for a complete example.
#### Advanced Customization

For more complex validation, you can implement the trait's remaining methods to attach additional schema properties (patterns, descriptions, and so on).
The macro automatically detects these custom types and generates appropriate JSON Schema with format specifications that guide LLMs to produce correctly formatted values. The library includes built-in recognition of common date and UUID types, but you can implement the trait for any custom type.
### Flexible Model Selection and Custom Endpoints
rstructor supports both enum variants and string-based model selection, making it easy to:
- Use new models without waiting for library updates
- Work with local LLMs or OpenAI-compatible endpoints
- Configure models from environment variables or config files
#### Using String Model Names
You can specify models as strings, which is especially useful for:
- New models that haven't been added to the enum yet
- Local LLMs running on your infrastructure
- Custom or fine-tuned models
```rust
use rstructor::OpenAIClient;
use std::env;

// Use a string directly - works with any model name
let client = OpenAIClient::new(env::var("OPENAI_API_KEY")?)?
    .model("my-custom-model"); // custom model name (illustrative)

// Works with environment variables (env var name is illustrative)
let model_name = env::var("LLM_MODEL").unwrap_or_else(|_| "gpt-4o".to_string());
let client = OpenAIClient::new(env::var("OPENAI_API_KEY")?)?
    .model(model_name); // dynamic model selection

// Enum variants still work for convenience
// (the exact enum and variant names are indicative)
let client = OpenAIClient::new(env::var("OPENAI_API_KEY")?)?
    .model(OpenAIModel::Gpt4O); // type-safe enum variant
```
#### Custom Endpoints for Local LLMs
Use rstructor with local LLMs or proxy endpoints by setting a custom base URL:

```rust
use rstructor::OpenAIClient;

// Connect to a local LLM server (e.g., Ollama, vLLM, etc.)
let client = OpenAIClient::new("unused")?      // API key may not be required for local servers
    .base_url("http://localhost:11434/v1")     // custom endpoint (Ollama's default port)
    .model("llama3");                          // local model name

// Or use with OpenAI-compatible proxy services
let client = OpenAIClient::new(api_key)?
    .base_url("https://llm-proxy.example.com/v1") // proxy endpoint (illustrative)
    .model("gpt-4o");
```
Note: all providers (OpenAI, Anthropic, Grok, Gemini) support both string model selection and custom endpoints via the `.base_url()` method.
### Configuring Different LLM Providers
Choose between different providers:

```rust
use rstructor::{AnthropicClient, GeminiClient, GrokClient, OpenAIClient};
use std::time::Duration;

// Model names below are illustrative

// Using OpenAI
let openai_client = OpenAIClient::new(api_key)?
    .model("gpt-4o")
    .temperature(0.2)
    .max_tokens(1024)
    .timeout(Duration::from_secs(60)); // Optional: set 60 second timeout

// Using Anthropic
let anthropic_client = AnthropicClient::new(api_key)?
    .model("claude-sonnet-4-5")
    .temperature(0.2)
    .max_tokens(1024)
    .timeout(Duration::from_secs(60)); // Optional: set 60 second timeout

// Using Grok (xAI) - reads from XAI_API_KEY environment variable
let grok_client = GrokClient::from_env()?
    .model("grok-3")
    .temperature(0.2)
    .max_tokens(1024)
    .timeout(Duration::from_secs(60)); // Optional: set 60 second timeout

// Using Gemini (Google) - reads from GEMINI_API_KEY environment variable
let gemini_client = GeminiClient::from_env()?
    .model("gemini-2.5-flash")
    .temperature(0.2)
    .max_tokens(1024)
    .timeout(Duration::from_secs(60)); // Optional: set 60 second timeout
```
### Configuring Request Timeouts
All clients (`OpenAIClient`, `AnthropicClient`, `GrokClient`, and `GeminiClient`) support configurable timeouts for HTTP requests using the builder pattern:

```rust
use rstructor::OpenAIClient;
use std::time::Duration;

let client = OpenAIClient::new(api_key)?
    .model("gpt-4o") // model name illustrative
    .temperature(0.0)
    .timeout(Duration::from_secs(30)); // set 30 second timeout
```
Timeout Behavior:

- The timeout applies to each HTTP request made by the client
- If a request exceeds the timeout, it returns `RStructorError::Timeout`
- If no timeout is specified, the client uses reqwest's default timeout behavior
- Timeout values are specified as `std::time::Duration` (e.g., `Duration::from_secs(30)` or `Duration::from_millis(2500)`)
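The `Duration` constructors mentioned above are ordinary `std::time` values; a quick sanity check of how the two forms relate:

```rust
use std::time::Duration;

fn main() {
    // from_secs and from_millis describe the same timeout at different granularity
    assert_eq!(Duration::from_secs(30), Duration::from_millis(30_000));
    // 2500 ms sits between 2 s and 3 s
    assert!(Duration::from_millis(2500) > Duration::from_secs(2));
    assert!(Duration::from_millis(2500) < Duration::from_secs(3));
    println!("ok");
}
```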
Example with timeout error handling:

```rust
use rstructor::{LLMClient, OpenAIClient, RStructorError};
use std::time::Duration;

let client = OpenAIClient::from_env()?
    .timeout(Duration::from_secs(30));

match client.materialize::<Movie>("Tell me about a movie").await {
    Ok(movie) => println!("{:#?}", movie),
    Err(RStructorError::Timeout) => eprintln!("request timed out; consider a longer timeout"),
    Err(e) => return Err(e.into()),
}
```
### Handling Container-Level Attributes

Add metadata and examples at the container level:

```rust
// Container-level attributes (exact attribute syntax is indicative)
#[derive(Instructor, Serialize, Deserialize)]
#[llm(
    title = "MovieInfo",
    description = "Information about a movie, including title and release year"
)]
struct Movie {
    title: String,
    year: u16,
}
```
## API Reference
### Instructor Trait
The `Instructor` trait is the core of rstructor. It's implemented automatically via the derive macro and provides schema generation (via `schema()`) and validation (via `validate()`). Override the `validate` method to add custom validation logic.
### CustomTypeSchema Trait
The `CustomTypeSchema` trait lets you define JSON Schema representations for types that don't have direct JSON equivalents, like dates and UUIDs. Implement it for types such as `DateTime<Utc>` or `Uuid` to control their JSON Schema representation. Most implementations only need to specify `schema_type()` and `schema_format()`; the remaining methods provide additional schema customization when needed.
### LLMClient Trait
The `LLMClient` trait defines the interface all LLM providers implement, including `materialize()`/`generate()` and their `_with_metadata` variants. Note: for production applications, configure the client with `.max_retries()` and `.include_error_feedback()` and call `materialize()`.
### Supported Attributes
#### Field Attributes

- `description`: text description of the field
- `example`: a single example value
- `examples`: multiple example values
#### Container Attributes

- `description`: text description of the struct or enum
- `title`: custom title for the JSON Schema
- `examples`: example instances as JSON objects
## Feature Flags
Configure rstructor with feature flags:

```toml
[dependencies]
rstructor = { version = "0.1.0", features = ["openai", "anthropic", "grok", "gemini"] }
```
Available features:

- `openai`: include the OpenAI client
- `anthropic`: include the Anthropic client
- `grok`: include the Grok (xAI) client
- `gemini`: include the Gemini (Google) client
- `derive`: include the derive macro (enabled by default)
- `logging`: enable tracing integration with a default subscriber
## Logging and Tracing
rstructor includes structured logging via the `tracing` crate:

```rust
// Module path and level type are indicative
use rstructor::logging::{init_logging, LogLevel};

// Initialize with desired level
init_logging(LogLevel::Info);

// Or use filter strings for granular control
// init_logging_with_filter("rstructor=info,rstructor::backend=trace");
```
Override with environment variables:

```sh
RSTRUCTOR_LOG=debug
```
Validation errors, retries, and API interactions are thoroughly logged at appropriate levels.
## Examples
See the `examples/` directory for complete, working examples:

- `structured_movie_info.rs`: basic example of getting movie information with validation
- `nested_objects_example.rs`: working with complex nested structures for recipe data
- `news_article_categorizer.rs`: using enums for categorization
- `enum_with_data_example.rs`: working with enums that have associated data (tagged unions)
- `event_planner.rs`: interactive event planning with user input
- `weather_example.rs`: simple model with validation demonstration
- `validation_example.rs`: demonstrates custom validation without dead-code warnings
- `custom_type_example.rs`: using custom types like dates and UUIDs with JSON Schema format support
- `logging_example.rs`: demonstrates tracing integration with custom log levels
- `nested_enum_example.rs`: shows automatic schema generation for nested enums inside structs
## Running the Examples
```sh
# Set environment variables
export OPENAI_API_KEY="your-openai-key"
# or
export ANTHROPIC_API_KEY="your-anthropic-key"
# or
export XAI_API_KEY="your-xai-key"
# or
export GEMINI_API_KEY="your-gemini-key"

# Run examples
cargo run --example structured_movie_info
```
## Current Limitations
rstructor currently focuses on single-turn, synchronous structured output generation. The following features are planned but not yet implemented:
- Streaming Responses: Real-time streaming of partial results as they're generated
- Conversation History: Multi-turn conversations with message history (currently only single prompts supported)
- System Messages: Explicit system prompts for role-based interactions
- Response Modes: Different validation strategies (strict, partial, etc.)
- Rate Limiting: Built-in rate limit handling and backoff strategies
## Roadmap
- [x] Core traits and interfaces
- [x] OpenAI backend implementation
- [x] Anthropic backend implementation
- [x] Procedural macro for deriving `Instructor`
- [x] Schema generation functionality
- [x] Custom validation capabilities
- [x] Support for nested structures
- [x] Rich validation API with custom domain rules
- [x] Support for enums with associated data (tagged unions)
- [x] Support for custom types (dates, UUIDs, etc.)
- [x] Structured logging and tracing
- [x] Automatic retry with validation error feedback
- [ ] Streaming responses
- [ ] Conversation history / multi-turn support
- [ ] System messages and role-based prompts
- [ ] Response modes (strict, partial, retry)
- [ ] Rate limiting and backoff strategies
- [ ] Support for additional LLM providers
- [ ] Integration with web frameworks (Axum, Actix)
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## For Python Developers
If you're coming from Python and searching for:

- "pydantic rust" or "rust pydantic" → rstructor provides similar schema validation and type safety
- "instructor rust" or "rust instructor" → rstructor offers the same structured LLM output extraction
- "structured output rust" or "llm structured output rust" → this is exactly what rstructor does
- "type-safe llm rust" → rstructor ensures type safety from LLM responses to Rust structs
rstructor brings the familiar Python patterns you know from Instructor and Pydantic to Rust, with the added benefits of Rust's type system and performance.