# LLM Program (Rust Implementation)
`llmprogram` is a Rust crate for building and running programs that use Large Language Models (LLMs). Each program's behavior is defined in a YAML configuration file, which makes programs easy to create, manage, and share.
## Features
- **YAML-based Configuration:** Define your LLM programs using simple and intuitive YAML files.
- **Input/Output Validation:** Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.
- **Tera Templating:** Build dynamic prompts with Tera templates, a Jinja2-style template engine for Rust.
- **Caching:** Built-in support for Redis caching to save time and reduce costs.
- **Execution Logging:** Automatically log program executions to a SQLite database for analysis and debugging.
- **Analytics:** Comprehensive analytics tracking with SQLite for token usage, LLM calls, program usage, and timing metrics.
- **Streaming:** Support for streaming responses from the LLM.
- **Batch Processing:** Process multiple inputs in parallel for improved performance.
- **CLI for Dataset Generation:** A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.
- **AI-Assisted YAML Generation:** Generate LLM program YAML files automatically based on natural language descriptions.
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
llmprogram = "0.1.0"
```
Or install the CLI globally:
```bash
cargo install llmprogram
```
## Usage
### CLI Usage
1. **Set your OpenAI API Key:**
```bash
export OPENAI_API_KEY='your-api-key'
```
2. **Create a program YAML file:**
Create a file named `sentiment_analysis.yaml`:
```yaml
name: sentiment_analysis
description: Analyzes the sentiment of a given text.
version: 1.0.0
model:
  provider: openai
  name: gpt-4.1-mini
  temperature: 0.5
  max_tokens: 100
  response_format: json_object
system_prompt: |
  You are a sentiment analysis expert. Analyze the sentiment of the given text and return a JSON response with the following format:
  - sentiment (string): "positive", "negative", or "neutral"
  - score (number): A score from -1 (most negative) to 1 (most positive)
input_schema:
  type: object
  required:
    - text
  properties:
    text:
      type: string
      description: The text to analyze.
output_schema:
  type: object
  required:
    - sentiment
    - score
  properties:
    sentiment:
      type: string
      enum: ["positive", "negative", "neutral"]
    score:
      type: number
      minimum: -1
      maximum: 1
template: |
  Analyze the following text:
  {{text}}
```
3. **Run the program using the CLI:**
```bash
# Run with inputs from a JSON file
llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json
# Run with inputs from the command line
llmprogram run sentiment_analysis.yaml --input-json '{"text": "I love this product!"}'
# Run with inputs from stdin
echo '{"text": "I love this product!"}' | llmprogram run sentiment_analysis.yaml
# Run with streaming output
llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --stream
# Save output to a file
llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --output result.json
```
### Programmatic Usage
You can also use the `llmprogram` library directly in your Rust code:
```rust
use llmprogram::LLMProgram;
use std::collections::HashMap;
use serde_json::Value;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create and run the sentiment analysis program
    let program = LLMProgram::new("sentiment_analysis.yaml")?;
    let mut inputs = HashMap::new();
    inputs.insert(
        "text".to_string(),
        Value::String("I love this new product! It is amazing.".to_string()),
    );
    let result = program.run(&inputs).await?;
    println!("{}", serde_json::to_string_pretty(&result)?);
    Ok(())
}
```
## Configuration
The behavior of each LLM program is defined in a YAML file. Here are the key sections:
- `name`, `description`, `version`: Basic metadata for your program.
- `model`: Defines the LLM provider, model name, and other parameters like `temperature` and `max_tokens`.
- `system_prompt`: The instructions that are given to the LLM to guide its behavior.
- `input_schema`: A JSON schema that defines the expected input for the program. The program will validate the input against this schema before execution.
- `output_schema`: A JSON schema that defines the expected output from the LLM. The program will validate the LLM's output against this schema.
- `template`: A Tera template that is used to generate the prompt that is sent to the LLM. The template is rendered with the input variables.
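To make the roles of `input_schema`, `template`, and `output_schema` concrete, here is a minimal standard-library sketch of the check, render, check flow for the sentiment example. It is not how `llmprogram` is implemented: the crate uses Tera and JSON Schema, whereas `render`, `SentimentOutput`, and `validate_output` are hypothetical helpers that only handle plain `{{name}}` placeholders and the two output fields from the sentiment program.

```rust
use std::collections::HashMap;

/// Replace each `{{name}}` placeholder with the matching input value.
/// Hypothetical stand-in for Tera: no filters, loops, or conditionals.
/// A missing input is reported as an error, mirroring a `required` field.
fn render(template: &str, inputs: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find("}}").ok_or("unclosed placeholder")?;
        let name = after[..end].trim();
        let value = inputs
            .get(name)
            .ok_or_else(|| format!("missing input: {name}"))?;
        out.push_str(value);
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Output shape of the sentiment example (assumed already parsed from JSON).
struct SentimentOutput {
    sentiment: String,
    score: f64,
}

/// Hand-written version of the `enum` and `minimum`/`maximum` constraints.
fn validate_output(out: &SentimentOutput) -> Result<(), String> {
    const ALLOWED: [&str; 3] = ["positive", "negative", "neutral"];
    if !ALLOWED.contains(&out.sentiment.as_str()) {
        return Err(format!("sentiment not in enum: {}", out.sentiment));
    }
    if !(-1.0..=1.0).contains(&out.score) {
        return Err(format!("score out of range: {}", out.score));
    }
    Ok(())
}
```

Calling `render` with the template from the sentiment program and an input map containing `text` yields the prompt text, and `validate_output` rejects a response such as `sentiment: "happy"` or `score: 1.5`.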
## Using with other OpenAI-compatible endpoints
You can use `llmprogram` with any OpenAI-compatible endpoint, such as [Ollama](https://ollama.ai/). To do this, pass the `api_key` and `base_url` to `LLMProgram::new_with_options`:
```rust
let program = LLMProgram::new_with_options(
    "your_program.yaml",
    Some("your-api-key".to_string()),
    Some("http://localhost:11434/v1".to_string()), // example for Ollama
    true, // enable_cache
    "redis://localhost:6379"
)?;
```
## Caching
`llmprogram` can cache LLM responses in Redis to improve performance and reduce costs. Caching requires a running Redis server.
Caching is enabled by default. You can disable it, or configure the Redis connection and cache TTL (time-to-live), when you create an `LLMProgram` instance. The following example disables caching:
```rust
let program = LLMProgram::new_with_options(
    "your_program.yaml",
    None, // api_key
    None, // base_url
    false, // enable_cache
    "redis://localhost:6379"
)?;
```
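As a rough illustration of what response caching with a TTL involves, the sketch below derives a key from the values that determine a response and stores entries with an expiry. Everything here is an assumption for illustration: `cache_key`, `TtlCache`, and the `llmprogram:` key prefix are hypothetical, an in-memory map stands in for Redis, and the crate's actual key scheme is not documented here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Derive a cache key from everything that affects the response.
/// `DefaultHasher` keeps the sketch dependency-free; its algorithm is not
/// guaranteed stable across Rust releases, so a cache shared between
/// builds would want a fixed hash such as SHA-256 instead.
fn cache_key(model: &str, system_prompt: &str, prompt: &str) -> String {
    let mut hasher = DefaultHasher::new();
    (model, system_prompt, prompt).hash(&mut hasher);
    format!("llmprogram:{:016x}", hasher.finish())
}

/// In-memory stand-in for a Redis cache with per-entry expiry.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    /// Return the cached response unless it is older than the TTL.
    fn get(&self, key: &str) -> Option<&str> {
        self.entries
            .get(key)
            .filter(|(stored_at, _)| stored_at.elapsed() < self.ttl)
            .map(|(_, value)| value.as_str())
    }

    /// Store a response, stamping it with the current time.
    fn set(&mut self, key: String, value: String) {
        self.entries.insert(key, (Instant::now(), value));
    }
}
```

The same inputs always map to the same key, so a repeated call can be answered from the cache, while a changed prompt or model produces a different key and a fresh LLM call.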
## Logging and Dataset Generation
`llmprogram` automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a `.db` extension.
The log is useful for debugging, and it also serves as source data for building fine-tuning datasets for your own LLMs. Each record in the log contains:
- `function_input`: The input given to the program.
- `function_output`: The output received from the LLM.
- `llm_input`: The prompt sent to the LLM.
- `llm_output`: The raw response from the LLM.
### Generating a Dataset
You can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.
```bash
llmprogram generate-dataset /path/to/your_program.db /path/to/your_dataset.jsonl
```
Each line in the output file will be a JSON object with the following keys:
- `instruction`: The system prompt and the user prompt, combined to form the instruction for the LLM.
- `output`: The output from the LLM.
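The shape of one dataset line can be sketched with the standard library alone. This is an illustration under stated assumptions, not the CLI's actual code: `json_escape` and `dataset_line` are hypothetical helpers, and joining the system prompt and user prompt with a blank line is a guess at how the two are combined.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSONL record with `instruction` and `output` keys.
/// Assumption: the instruction is the system prompt, a blank line,
/// then the user prompt.
fn dataset_line(system_prompt: &str, user_prompt: &str, output: &str) -> String {
    let instruction = format!("{system_prompt}\n\n{user_prompt}");
    format!(
        "{{\"instruction\":\"{}\",\"output\":\"{}\"}}",
        json_escape(&instruction),
        json_escape(output)
    )
}
```

Because newlines inside the prompts are escaped, each record stays on a single physical line, which is what JSONL readers expect.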
## Command-Line Interface (CLI)
`llmprogram` comes with a command-line interface for common tasks.
### `run`
Run an LLM program with inputs from the command line, a file, or stdin.
**Usage:**
```bash
# First, set your OpenAI API key
export OPENAI_API_KEY='your-api-key'
# Run with inputs from a JSON file
llmprogram run program.yaml --inputs inputs.json
# Run with inputs from command line
llmprogram run program.yaml --input-json '{"text": "I love this product!"}'
# Run with inputs from stdin
echo '{"text": "I love this product!"}' | llmprogram run program.yaml
# Run with streaming output
llmprogram run program.yaml --inputs inputs.json --stream
# Save output to a file
llmprogram run program.yaml --inputs inputs.json --output result.json
```
**Arguments:**
- `program_path`: The path to the program YAML file.
- `--inputs`, `-i`: Path to JSON/YAML file containing inputs.
- `--input-json`: JSON string of inputs.
- `--output`, `-o`: Path to output file (default: stdout).
- `--stream`, `-s`: Stream the response.
### `generate-yaml`
Generate an LLM program YAML file from a natural language description, using an AI assistant.
**Usage:**
```bash
# Generate a YAML program with a simple description
llmprogram generate-yaml "Create a program that analyzes the sentiment of text" --output sentiment_analyzer.yaml
# Generate a YAML program with examples
llmprogram generate-yaml "Create a program that extracts key information from customer reviews" \
--example-input "The battery life on this phone is amazing! It lasts all day." \
--example-output '{"product_quality": "positive", "battery": "positive", "durability": "neutral"}' \
--output review_analyzer.yaml
# Generate a YAML program and output to stdout
llmprogram generate-yaml "Create a program that summarizes long texts"
```
**Arguments:**
- `description`: A detailed description of what the LLM program should do.
- `--example-input`: Example of the input the program will receive.
- `--example-output`: Example of the output the program should generate.
- `--output`, `-o`: Path to output YAML file (default: stdout).
- `--api-key`: OpenAI API key (optional, defaults to the `OPENAI_API_KEY` environment variable).
### `analytics`
Show analytics data collected from LLM program executions.
**Usage:**
```bash
# Show all analytics data
llmprogram analytics
# Show analytics for a specific program
llmprogram analytics --program sentiment_analysis
# Show analytics for a specific model
llmprogram analytics --model gpt-4
# Use a custom analytics database path
llmprogram analytics --db-path /path/to/custom/analytics.db
```
**Arguments:**
- `--db-path`: Path to the analytics database (default: `llmprogram_analytics.db`).
- `--program`: Filter by program name.
- `--model`: Filter by model name.
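The effect of the `--program` and `--model` filters can be pictured as a filter followed by an aggregation. The sketch below is only an illustration: `CallRecord`, `Summary`, and `summarize` are hypothetical names, the fields are guesses at what the analytics database tracks, and exact string matching on the filters is an assumption.

```rust
/// One logged LLM call (hypothetical schema).
struct CallRecord {
    program: String,
    model: String,
    total_tokens: u64,
    duration_ms: u64,
}

/// Aggregates over the records that pass the filters.
struct Summary {
    calls: usize,
    total_tokens: u64,
    avg_duration_ms: f64,
}

/// Keep records matching the optional filters, then total them up.
/// `None` means "do not filter on this field".
fn summarize(records: &[CallRecord], program: Option<&str>, model: Option<&str>) -> Summary {
    let selected: Vec<&CallRecord> = records
        .iter()
        .filter(|r| program.map_or(true, |p| r.program == p))
        .filter(|r| model.map_or(true, |m| r.model == m))
        .collect();
    let calls = selected.len();
    let total_tokens: u64 = selected.iter().map(|r| r.total_tokens).sum();
    let total_ms: u64 = selected.iter().map(|r| r.duration_ms).sum();
    // Avoid dividing by zero when nothing matches.
    let avg_duration_ms = if calls == 0 {
        0.0
    } else {
        total_ms as f64 / calls as f64
    };
    Summary { calls, total_tokens, avg_duration_ms }
}
```

Passing `None` for both filters corresponds to running `llmprogram analytics` with no options, while `Some("sentiment_analysis")` narrows the summary to a single program.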
### `generate-dataset`
Generate an instruction dataset for LLM fine-tuning from a SQLite log file.
**Usage:**
```bash
llmprogram generate-dataset <database_path> <output_path>
```
**Arguments:**
- `database_path`: The path to the SQLite database file.
- `output_path`: The path to write the generated dataset to.
## Examples
You can find more examples in the `examples` directory:
- **Sentiment Analysis:** A simple program to analyze the sentiment of a piece of text. (`examples/sentiment_analysis.yaml`)
- **Code Generator:** A program that generates Python code from a natural language description. (`examples/code_generator.yaml`)
- **Email Generator:** A program that generates professional emails based on input parameters. (`examples/email_generator.yaml`)
To run the examples:
1. Navigate to the project directory.
2. Run the corresponding example command:
```bash
# Run with a single input
llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json
# Run with a batch of inputs
llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_batch_inputs.json
# Run with streaming output
llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --stream
# Save output to a file
llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --output result.json
# Show all analytics data
llmprogram analytics
# Show analytics for a specific program
llmprogram analytics --program sentiment_analysis
# Generate a YAML program from a description and examples
llmprogram generate-yaml "Create a program that classifies email priority" \
  --example-input "Subject: Urgent meeting tomorrow. Body: Please prepare the Q3 report." \
  --example-output '{"priority": "high", "category": "work", "response_required": true}' \
  --output email_classifier.yaml
# Generate a fine-tuning dataset from the execution log
llmprogram generate-dataset sentiment_analysis.db dataset.jsonl
```
## Development
To run the tests for this package:
```bash
cargo test
```
To build the documentation:
```bash
cargo doc --open
```
## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.