# Secretary
[crates.io](https://crates.io/crates/secretary) | [docs.rs](https://docs.rs/secretary) | [License](LICENSE)
**Secretary** is a Rust library that transforms natural language into structured data using large language models (LLMs). With its powerful derive macro system, you can extract structured information from unstructured text with minimal boilerplate code.
## Features
- **Unified Task Trait**: Single trait combining data extraction, schema definition, and system prompt generation with `#[derive(Task)]`
- **Schema-Based Extraction**: Define your data structure using Rust structs with field-level instructions
- **Declarative Field Instructions**: Use `#[task(instruction = "...")]` attributes to guide extraction
- **Async Support**: Built-in async/await support for concurrent processing
- **Reasoning Model Support**: Force generation methods for models without JSON mode (o1, deepseek, etc.)
- **Extensible LLM Support**: Currently supports the OpenAI API, with more providers planned
- **Type Safety**: Leverage Rust's type system for reliable data extraction
- **Simplified API**: Consolidated traits reduce boilerplate and complexity
## Quick Start
```bash
cargo add secretary
```
### Basic Example
```rust
use secretary::Task;
use secretary::llm_providers::openai::OpenAILLM;
use secretary::traits::GenerateData;
use serde::{Serialize, Deserialize};

// Define your data structure with extraction instructions
#[derive(Task, Serialize, Deserialize, Debug, Default)]
struct PersonInfo {
    // Data fields with specific extraction instructions
    #[task(instruction = "Extract the person's full name")]
    pub name: String,

    #[task(instruction = "Extract age as a number")]
    pub age: u32,

    #[task(instruction = "Extract email address if mentioned")]
    pub email: Option<String>,

    #[task(instruction = "List all hobbies or interests mentioned")]
    pub interests: Vec<String>,
}

fn main() -> anyhow::Result<()> {
    // Create a task instance
    let task = PersonInfo::new();

    // Additional instructions for the LLM
    let additional_instructions = vec![
        "Be precise with personal information".to_string(),
        "Use 'Unknown' for missing data".to_string(),
    ];

    // Initialize LLM client
    let llm = OpenAILLM::new(
        "https://api.openai.com/v1",
        "your-api-key",
        "gpt-4",
    )?;

    // Natural language input to process
    let input = "Hi, I'm Jane Smith, 29 years old. My email is jane@example.com. I love hiking, coding, and playing piano.";

    // Extract structured data directly into the struct
    let person: PersonInfo = llm.generate_data(&task, input, &additional_instructions)?;
    println!("{:#?}", person);

    Ok(())
}
```
## How It Works
1. **Define Your Schema**: Create a Rust struct with `#[derive(Task)]` and field-level instructions
2. **Annotate Fields**: Use `#[task(instruction = "...")]` to guide the LLM on how to extract each field
3. **Automatic Implementation**: The derive macro implements all necessary traits (data model, system prompt generation)
4. **Create Task Instance**: Initialize with `YourStruct::new()`
5. **Process Text**: Send natural language input to an LLM through the Secretary API with additional instructions
6. **Get Structured Data**: Receive structured data parsed into your struct
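To make steps 2 and 3 more concrete, here is a minimal, std-only sketch of how field instructions could be folded into a system prompt. It is not Secretary's generated code: `FieldSpec`, `build_system_prompt`, and the prompt wording are all hypothetical and only illustrate the idea.

```rust
/// Hypothetical description of one struct field and its extraction instruction.
pub struct FieldSpec {
    pub name: &'static str,
    pub instruction: &'static str,
}

/// Builds a system prompt that lists each field with its instruction,
/// followed by any additional instructions supplied by the caller.
/// The wording is invented for illustration only.
pub fn build_system_prompt(fields: &[FieldSpec], additional: &[String]) -> String {
    let mut prompt = String::from("Return a JSON object with these fields:\n");
    for field in fields {
        prompt.push_str(&format!("- \"{}\": {}\n", field.name, field.instruction));
    }
    if !additional.is_empty() {
        prompt.push_str("Additional instructions:\n");
        for line in additional {
            prompt.push_str(&format!("- {}\n", line));
        }
    }
    prompt
}
```

Passing two `FieldSpec` values for `name` and `age` plus one additional instruction yields a prompt with one bullet per field and a trailing "Additional instructions" section. The prompt that `#[derive(Task)]` really produces can be inspected with `get_system_prompt()`.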
### Field Instructions
The `#[task(instruction = "...")]` attribute tells the LLM how to extract each field:
```rust
#[derive(Task, Serialize, Deserialize, Debug, Default)]
struct ProductInfo {
    #[task(instruction = "Extract the product name or title")]
    pub name: String,

    #[task(instruction = "Extract price as a number without currency symbols")]
    pub price: f64,

    #[task(instruction = "Categorize the product type (electronics, clothing, etc.)")]
    pub category: String,

    #[task(instruction = "Extract brand name if mentioned, otherwise null")]
    pub brand: Option<String>,

    #[task(instruction = "Determine if product is available (true/false)")]
    pub in_stock: bool,
}
```
## Advanced Features
### Async Processing
Secretary provides full async support for concurrent processing:
```rust
// Requires the `tokio` and `futures` crates; `PersonInfo` is the struct from the Basic Example.
use secretary::Task;
use secretary::llm_providers::openai::OpenAILLM;
use secretary::traits::AsyncGenerateData;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let llm = OpenAILLM::new("https://api.openai.com/v1", "your-api-key", "gpt-4")?;
    let task = PersonInfo::new();
    let additional_instructions = vec!["Extract accurately".to_string()];

    // Process multiple inputs concurrently
    let inputs = vec![
        "John Doe, 25, loves gaming",
        "Alice Smith, 30, enjoys reading and cooking",
        "Bob Johnson, 35, passionate about photography",
    ];

    // Build one future per input; each borrows the shared client and task
    let futures: Vec<_> = inputs.into_iter().map(|input| {
        let llm = &llm;
        let task = &task;
        let additional_instructions = &additional_instructions;
        async move {
            llm.async_generate_data(task, input, additional_instructions).await
        }
    }).collect();

    let results = futures::future::join_all(futures).await;

    for result in results {
        match result {
            // PersonInfo derives Debug, so print it with the debug formatter
            Ok(person) => println!("Extracted: {:#?}", person),
            Err(e) => eprintln!("Error: {}", e),
        }
    }

    Ok(())
}
```
### Multiple Extractions
Process multiple inputs with the same task configuration:
```rust
fn main() -> anyhow::Result<()> {
    let task = PersonInfo::new();
    let additional_instructions = vec!["Extract all available information".to_string()];
    let llm = OpenAILLM::new("https://api.openai.com/v1", "your-api-key", "gpt-4")?;

    let inputs = vec![
        "Hi, I'm John, 25 years old",
        "Sarah works as a designer and is 30",
        "Mike's email is mike@example.com",
    ];

    for input in inputs {
        let person: PersonInfo = llm.generate_data(&task, input, &additional_instructions)?;
        println!("{:#?}", person);
    }

    Ok(())
}
```
### Force Generation for Models Without a JSON Mode
Reasoning models such as o1 and deepseek do not offer a built-in JSON mode. For these models, Secretary provides force generation methods:
```rust
use secretary::traits::{GenerateData, AsyncGenerateData};

// `llm`, `task`, `input`, and `additional_instructions` are set up as in the Basic Example.

// Synchronous force generation
let result: PersonInfo = llm.force_generate_data(&task, input, &additional_instructions)?;

// Asynchronous force generation (call from within an async function)
let result: PersonInfo = llm.async_force_generate_data(&task, input, &additional_instructions).await?;
```
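Models without a JSON mode often wrap their JSON in explanatory prose. One common way to cope, sketched below with the standard library only, is to pull the first balanced `{ ... }` object out of the raw reply before deserializing it. This README does not say how Secretary's force generation works internally, so treat `extract_json_object` as a hypothetical helper that illustrates the general technique, not the library's method.

```rust
/// Returns the first balanced `{ ... }` span in `raw`, or `None` if there is none.
/// Braces that appear inside JSON string literals are ignored.
pub fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (offset, ch) in raw[start..].char_indices() {
        if in_string {
            if escaped {
                // The character after a backslash never ends the string.
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // `}` is a single byte, so the span ends right after it.
                    return Some(&raw[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None // The object was never closed.
}
```

The returned slice can then be handed to a JSON deserializer. Replies whose object is truncated return `None`, which a caller might treat as a signal to retry.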
### System Prompt Generation
The derive macro automatically generates comprehensive system prompts:
```rust
let task = PersonInfo::new();
let prompt = task.get_system_prompt();
println!("{}", prompt);
// Output includes:
// - JSON structure specification
// - Field-specific extraction instructions
// - Response format requirements
```
## Examples
The `examples/` directory contains practical demonstrations:
### Basic Usage
- **`sync.rs`** - Basic person information extraction using synchronous API
- **`async.rs`** - Async product information extraction with comprehensive testing
### Force Generation (for Reasoning Models)
- **`sync_force.rs`** - Financial report extraction using force generation for models without JSON mode
- **`async_force.rs`** - Research paper extraction using async force generation for reasoning models
Run examples with:
```bash
# Basic synchronous example
cargo run --example sync
# Async example with comprehensive testing
cargo run --example async
# Force generation examples (for o1, deepseek, etc.)
cargo run --example sync_force
cargo run --example async_force
# To test against a real API, set these environment variables:
export SECRETARY_OPENAI_API_BASE="https://api.openai.com/v1"
export SECRETARY_OPENAI_API_KEY="your-api-key"
export SECRETARY_OPENAI_MODEL="gpt-4" # or "o1-preview", "deepseek-reasoner", etc.
cargo run --example async
```
## Environment Setup
For production use with OpenAI:
```bash
export SECRETARY_OPENAI_API_BASE="https://api.openai.com/v1"
export SECRETARY_OPENAI_API_KEY="your-openai-api-key"
export SECRETARY_OPENAI_MODEL="gpt-4"
```
In your code:
```rust
let api_base = std::env::var("SECRETARY_OPENAI_API_BASE")
    .expect("SECRETARY_OPENAI_API_BASE environment variable not set");
let api_key = std::env::var("SECRETARY_OPENAI_API_KEY")
    .expect("SECRETARY_OPENAI_API_KEY environment variable not set");
let model = std::env::var("SECRETARY_OPENAI_MODEL")
    .expect("SECRETARY_OPENAI_MODEL environment variable not set");

let llm = OpenAILLM::new(&api_base, &api_key, &model)?;
```
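The snippet above panics when a variable is missing. If you would rather get an error value, a small loader along these lines may help. This is a sketch, not part of Secretary: `OpenAIConfig`, `load_config_from`, and `load_config` are hypothetical names, and falling back to the example base URL and model from this README is an assumption you may not want in production.

```rust
use std::env;

/// Hypothetical bundle of the three settings passed to `OpenAILLM::new` above.
#[derive(Debug, PartialEq)]
pub struct OpenAIConfig {
    pub api_base: String,
    pub api_key: String,
    pub model: String,
}

/// Builds a config from any key lookup, which keeps the logic testable
/// without touching the real process environment.
/// The API key is required; base URL and model fall back to example defaults.
pub fn load_config_from(
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<OpenAIConfig, String> {
    let api_key = lookup("SECRETARY_OPENAI_API_KEY")
        .ok_or_else(|| "SECRETARY_OPENAI_API_KEY environment variable not set".to_string())?;
    Ok(OpenAIConfig {
        api_base: lookup("SECRETARY_OPENAI_API_BASE")
            .unwrap_or_else(|| "https://api.openai.com/v1".to_string()),
        api_key,
        model: lookup("SECRETARY_OPENAI_MODEL").unwrap_or_else(|| "gpt-4".to_string()),
    })
}

/// Convenience wrapper that reads from the process environment.
pub fn load_config() -> Result<OpenAIConfig, String> {
    load_config_from(|name| env::var(name).ok())
}
```

The fields of the returned config can then be passed to `OpenAILLM::new(&cfg.api_base, &cfg.api_key, &cfg.model)` as in the snippet above.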
## API Reference
### Core Traits
| Trait | Purpose | Key Methods |
|-------|---------|-------------|
| `Task` | Main trait for data extraction tasks | `new()`, `get_system_prompt()`, `push()` |
| `GenerateData` | Synchronous LLM interaction | `generate_data()`, `force_generate_data()` |
| `AsyncGenerateData` | Asynchronous LLM interaction | `async_generate_data()`, `async_force_generate_data()` |
| `IsLLM` | LLM provider abstraction | `access_client()`, `access_model()` |
| `ToJSON`/`FromJSON` | Serialization utilities | `to_json()`, `from_json()` |
### Derive Macro Attributes
- `#[derive(Task)]` - Implements the Task trait automatically
- `#[task(instruction = "...")]` - Provides field-specific extraction instructions
## Troubleshooting
### Common Issues
**"Failed to execute function" Error**
- Check your API key and endpoint configuration
- Verify network connectivity
- Ensure the model name is correct
**Serialization Errors**
- Ensure all data fields implement `Serialize` and `Deserialize`
- Check that field types match the expected JSON structure
- Verify that optional fields are properly handled
### Performance Tips
- Use async methods for concurrent processing
- Batch multiple requests when possible
- Consider caching LLM responses for repeated queries
- Use specific field instructions to improve extraction accuracy
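As one way to act on the caching tip, the sketch below keeps raw responses in an in-memory map keyed by model and input text. `ResponseCache` and `get_or_fetch` are hypothetical names, not part of Secretary, and the `fetch` closure stands in for whatever call actually reaches the LLM.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory cache keyed by (model, input text).
#[derive(Default)]
pub struct ResponseCache {
    entries: HashMap<(String, String), String>,
}

impl ResponseCache {
    /// Returns the cached response, or runs `fetch` once and stores its result.
    /// Errors from `fetch` are passed through and are not cached.
    pub fn get_or_fetch<E>(
        &mut self,
        model: &str,
        input: &str,
        fetch: impl FnOnce() -> Result<String, E>,
    ) -> Result<String, E> {
        let key = (model.to_string(), input.to_string());
        if let Some(hit) = self.entries.get(&key) {
            return Ok(hit.clone());
        }
        let response = fetch()?;
        self.entries.insert(key, response.clone());
        Ok(response)
    }
}
```

If the additional instructions vary between calls, they should be part of the key as well. Otherwise a response produced under one set of instructions could be returned for another.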
## Roadmap
- [ ] Context-aware conversations and multi-turn interactions
- [ ] Support for additional LLM providers (Anthropic, Azure OpenAI, etc.)
- [ ] Enhanced error handling and validation
- [ ] Performance optimizations and caching
- [ ] Integration with more serialization formats
- [ ] Advanced prompt engineering features
- [ ] Streaming response support
## Contributing
Contributions are welcome!
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.