# LLM Program (Rust Implementation)

`llmprogram` is a Rust crate that provides a structured and powerful way to create and run programs that use Large Language Models (LLMs). It uses a YAML-based configuration to define the behavior of your LLM programs, making them easy to create, manage, and share.
## Features
- **YAML-based Configuration**: Define your LLM programs using simple and intuitive YAML files.
- **Input/Output Validation**: Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.
- **Tera Templating**: Use the power of Tera templates (Rust's Jinja2-style engine) to create dynamic prompts for your LLMs.
- **Caching**: Built-in support for Redis caching to save time and reduce costs.
- **Execution Logging**: Automatically log program executions to a SQLite database for analysis and debugging.
- **Analytics**: Comprehensive analytics tracking with SQLite for token usage, LLM calls, program usage, and timing metrics.
- **Streaming**: Support for streaming responses from the LLM.
- **Batch Processing**: Process multiple inputs in parallel for improved performance.
- **CLI for Dataset Generation**: A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.
- **AI-Assisted YAML Generation**: Generate LLM program YAML files automatically from natural-language descriptions.
## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
llmprogram = "0.1.0"
```
Or install the CLI globally:
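```bash
cargo install llmprogram
```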
## Usage

### CLI Usage
1. Set your OpenAI API key: `export OPENAI_API_KEY="your-api-key"`

2. Create a program YAML file named `sentiment_analysis.yaml`:

   ```yaml
   name: sentiment_analysis
   description: Analyzes the sentiment of a given text.
   version: 1.0.0
   model:
     provider: openai
     name: gpt-4.1-mini
     temperature: 0.5
     max_tokens: 100
     response_format: json_object
   system_prompt: |
     You are a sentiment analysis expert. Analyze the sentiment of the given
     text and return a JSON response with the following format:
     - sentiment (string): "positive", "negative", or "neutral"
     - score (number): A score from -1 (most negative) to 1 (most positive)
   input_schema:
     type: object
     required:
       - text
     properties:
       text:
         type: string
         description: The text to analyze.
   output_schema:
     type: object
     required:
       - sentiment
       - score
     properties:
       sentiment:
         type: string
         enum:
           - positive
           - negative
           - neutral
       score:
         type: number
         minimum: -1
         maximum: 1
   template: |
     Analyze the following text:

     {{text}}
   ```
3. Run the program using the CLI (the commands below assume the installed binary is named `llmprogram`):

   ```bash
   # Using a JSON input file
   llmprogram run sentiment_analysis.yaml --inputs inputs.json

   # Using inline JSON
   llmprogram run sentiment_analysis.yaml --input-json '{"text": "I love this product!"}'

   # Using stdin
   cat inputs.json | llmprogram run sentiment_analysis.yaml

   # Using streaming output
   llmprogram run sentiment_analysis.yaml --inputs inputs.json --stream

   # Saving output to a file
   llmprogram run sentiment_analysis.yaml --inputs inputs.json --output result.json
   ```
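A minimal `inputs.json` for this program (any JSON object matching the program's `input_schema` works) could be:

```json
{
  "text": "I love this product!"
}
```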
### Programmatic Usage

You can also use the `llmprogram` library directly in your Rust code:
```rust
use llmprogram::LLMProgram;
use serde_json::Value;
use std::collections::HashMap;

// Sketch only: the constructor and execution method names below are
// illustrative; check the crate documentation for the exact API.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the program definition from its YAML file.
    let program = LLMProgram::new("sentiment_analysis.yaml")?;

    // Build the inputs expected by the program's input_schema.
    let mut inputs: HashMap<String, Value> = HashMap::new();
    inputs.insert("text".to_string(), Value::String("I love this product!".to_string()));

    // Run the program and print the validated output.
    let output = program.execute(&inputs).await?;
    println!("{output}");

    Ok(())
}
```
## Configuration
The behavior of each LLM program is defined in a YAML file. Here are the key sections:
- `name`, `description`, `version`: Basic metadata for your program.
- `model`: Defines the LLM provider, model name, and other parameters like `temperature` and `max_tokens`.
- `system_prompt`: The instructions that are given to the LLM to guide its behavior.
- `input_schema`: A JSON schema that defines the expected input for the program. The program validates the input against this schema before execution.
- `output_schema`: A JSON schema that defines the expected output from the LLM. The program validates the LLM's output against this schema.
- `template`: A Tera template used to generate the prompt that is sent to the LLM. The template is rendered with the input variables.
### Using with other OpenAI-compatible endpoints

You can use `llmprogram` with any OpenAI-compatible endpoint, such as Ollama. To do this, pass the `api_key` and `base_url` to the `LLMProgram` constructor:
```rust
// Illustrative sketch: see the crate docs for the exact `new_with_options` signature.
let program = LLMProgram::new_with_options(
    "program.yaml",
    Some("ollama".to_string()),                    // api_key: local servers accept any value
    Some("http://localhost:11434/v1".to_string()), // base_url: Ollama's OpenAI-compatible endpoint
    /* ...other options... */
)?;
```
## Caching

`llmprogram` supports caching of LLM responses in Redis to improve performance and reduce costs. To enable caching, you need to have a Redis server running.

By default, caching is enabled. You can disable it or configure the Redis connection and cache TTL (time-to-live) when you create an `LLMProgram` instance:
```rust
// Illustrative sketch: the cache option names are not fixed here; see the
// crate docs for the exact `new_with_options` signature.
let program = LLMProgram::new_with_options(
    "program.yaml",
    Some("redis://127.0.0.1:6379".to_string()), // Redis connection URL
    Some(3600),                                 // cache TTL in seconds
    /* ...other options... */
)?;
```
## Logging and Dataset Generation

`llmprogram` automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a `.db` extension.
This logging feature is not just for debugging; it's also a powerful tool for creating high-quality datasets for fine-tuning your own LLMs. Each record in the log contains:
- `function_input`: The input given to the program.
- `function_output`: The output received from the LLM.
- `llm_input`: The prompt sent to the LLM.
- `llm_output`: The raw response from the LLM.
### Generating a Dataset
You can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.
Each line in the output file will be a JSON object with the following keys:
- `instruction`: The system prompt and the user prompt, combined to form the instruction for the LLM.
- `output`: The output from the LLM.
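For illustration only, a generated line might look like this (values invented):

```json
{"instruction": "You are a sentiment analysis expert. [...]\n\nAnalyze the following text:\n\nI love this product!", "output": "{\"sentiment\": \"positive\", \"score\": 0.9}"}
```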
## Command-Line Interface (CLI)

`llmprogram` comes with a command-line interface (invoked below as `llmprogram`) for common tasks.

### `run`
Run an LLM program with inputs from command line or files.
Usage:

```bash
# First, set your OpenAI API key
export OPENAI_API_KEY="your-api-key"

# Run with inputs from a JSON file
llmprogram run program.yaml --inputs inputs.json

# Run with inputs from command line
llmprogram run program.yaml --input-json '{"text": "Hello"}'

# Run with inputs from stdin
cat inputs.json | llmprogram run program.yaml

# Run with streaming output
llmprogram run program.yaml --inputs inputs.json --stream

# Save output to a file
llmprogram run program.yaml --inputs inputs.json --output result.json
```
Arguments:

- `program_path`: The path to the program YAML file.
- `--inputs`, `-i`: Path to a JSON/YAML file containing inputs.
- `--input-json`: JSON string of inputs.
- `--output`, `-o`: Path to an output file (default: stdout).
- `--stream`, `-s`: Stream the response.
### `generate-yaml`

Generate an LLM program YAML file from a natural-language description, using an AI assistant.
Usage:

```bash
# Generate a YAML program with a simple description
llmprogram generate-yaml "Analyze the sentiment of a given text" --output sentiment.yaml

# Generate a YAML program with examples
llmprogram generate-yaml "Analyze the sentiment of a given text" \
  --example-input '{"text": "I love this!"}' \
  --example-output '{"sentiment": "positive", "score": 0.9}' \
  --output sentiment.yaml

# Generate a YAML program and output to stdout
llmprogram generate-yaml "Analyze the sentiment of a given text"
```
Arguments:

- `description`: A detailed description of what the LLM program should do.
- `--example-input`: Example of the input the program will receive.
- `--example-output`: Example of the output the program should generate.
- `--output`, `-o`: Path to an output YAML file (default: stdout).
- `--api-key`: OpenAI API key (optional; defaults to the `OPENAI_API_KEY` environment variable).
### `analytics`
Show analytics data collected from LLM program executions.
Usage:

```bash
# Show all analytics data
llmprogram analytics

# Show analytics for a specific program
llmprogram analytics --program sentiment_analysis

# Show analytics for a specific model
llmprogram analytics --model gpt-4.1-mini

# Use a custom analytics database path
llmprogram analytics --db-path /path/to/analytics.db
```
Arguments:

- `--db-path`: Path to the analytics database (default: `llmprogram_analytics.db`).
- `--program`: Filter by program name.
- `--model`: Filter by model name.
### `generate-dataset`
Generate an instruction dataset for LLM fine-tuning from a SQLite log file.
Usage:
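```bash
llmprogram generate-dataset <database_path> <output_path>

# Example
llmprogram generate-dataset sentiment_analysis.db dataset.jsonl
```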
Arguments:

- `database_path`: The path to the SQLite database file.
- `output_path`: The path to write the generated dataset to.
## Examples

You can find more examples in the `examples` directory:
- Sentiment Analysis (`examples/sentiment_analysis.yaml`): A simple program to analyze the sentiment of a piece of text.
- Code Generator (`examples/code_generator.yaml`): A program that generates Python code from a natural language description.
- Email Generator (`examples/email_generator.yaml`): A program that generates professional emails based on input parameters.
To run the examples:
1. Navigate to the project directory.
2. Run the corresponding example command:
```bash
# Using the CLI with a JSON input file
llmprogram run examples/sentiment_analysis.yaml --inputs inputs.json

# Using the CLI with batch processing (batch flags are not documented here; see `llmprogram run --help`)

# Using the CLI with streaming
llmprogram run examples/sentiment_analysis.yaml --inputs inputs.json --stream

# Using the CLI and saving output to a file
llmprogram run examples/sentiment_analysis.yaml --inputs inputs.json --output result.json

# View analytics data
llmprogram analytics

# View analytics for a specific program
llmprogram analytics --program sentiment_analysis

# Generate a new YAML program
llmprogram generate-yaml "Generate professional emails from input parameters" --output email_generator.yaml

# Generate a dataset
llmprogram generate-dataset examples/sentiment_analysis.db dataset.jsonl
```
## Development
To run the tests for this package:
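```bash
cargo test
```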
To build the documentation:
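```bash
cargo doc --open
```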
## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.