# LLM Program (Rust Implementation)

`llmprogram` is a Rust crate for defining and running programs built on Large Language Models (LLMs). Each program is described in a YAML file that specifies the model, the prompts, and the input and output schemas, which makes programs easy to create, manage, and share.

## Features

- **YAML-based Configuration:** Define your LLM programs using simple and intuitive YAML files.
- **Input/Output Validation:** Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.
- **Tera Templating:** Build dynamic prompts with Tera templates, a Jinja2-style template engine for Rust.
- **Caching:** Built-in support for Redis caching to save time and reduce costs.
- **Execution Logging:** Automatically log program executions to a SQLite database for analysis and debugging.
- **Analytics:** Comprehensive analytics tracking with SQLite for token usage, LLM calls, program usage, and timing metrics.
- **Streaming:** Support for streaming responses from the LLM.
- **Batch Processing:** Process multiple inputs in parallel for improved performance.
- **CLI for Dataset Generation:** A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.
- **AI-Assisted YAML Generation:** Generate LLM program YAML files automatically based on natural language descriptions.

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
llmprogram = "0.1.0"
```

Or install the CLI globally:

```bash
cargo install llmprogram
```

## Usage

### CLI Usage

1. **Set your OpenAI API Key:**

    ```bash
    export OPENAI_API_KEY='your-api-key'
    ```

2. **Create a program YAML file:**

    Create a file named `sentiment_analysis.yaml`:

    ```yaml
    name: sentiment_analysis
    description: Analyzes the sentiment of a given text.
    version: 1.0.0

    model:
      provider: openai
      name: gpt-4.1-mini
      temperature: 0.5
      max_tokens: 100
      response_format: json_object

    system_prompt: |
      You are a sentiment analysis expert. Analyze the sentiment of the given text and return a JSON response with the following format:
      - sentiment (string): "positive", "negative", or "neutral"
      - score (number): A score from -1 (most negative) to 1 (most positive)

    input_schema:
      type: object
      required:
        - text
      properties:
        text:
          type: string
          description: The text to analyze.

    output_schema:
      type: object
      required:
        - sentiment
        - score
      properties:
        sentiment:
          type: string
          enum: ["positive", "negative", "neutral"]
        score:
          type: number
          minimum: -1
          maximum: 1

    template: |
      Analyze the following text:
      {{text}}
    ```

3. **Run the program using the CLI:**

    ```bash
    # Using a JSON input file
    llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json
    
    # Using inline JSON
    llmprogram run sentiment_analysis.yaml --input-json '{"text": "I love this product!"}'
    
    # Using stdin
    echo '{"text": "I love this product!"}' | llmprogram run sentiment_analysis.yaml
    
    # Using streaming output
    llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --stream
    
    # Saving output to a file
    llmprogram run sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --output result.json
    ```
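
The `output_schema` in step 2 accepts only responses whose `sentiment` is one of three labels and whose `score` lies between -1 and 1. As a rough illustration of what that constraint means, the same rule can be written by hand in plain Rust. This is not the crate's JSON Schema validator; `SentimentResult` and `check_sentiment` are hypothetical names used only for this sketch.

```rust
/// Hypothetical typed view of the JSON object described by `output_schema`.
#[derive(Debug, PartialEq)]
pub struct SentimentResult {
    pub sentiment: String,
    pub score: f64,
}

/// Applies the same constraints as the schema: `sentiment` must be one of
/// the three enum values and `score` must lie in the closed range [-1, 1].
pub fn check_sentiment(result: &SentimentResult) -> Result<(), String> {
    const ALLOWED: [&str; 3] = ["positive", "negative", "neutral"];
    if !ALLOWED.contains(&result.sentiment.as_str()) {
        return Err(format!("unexpected sentiment: {}", result.sentiment));
    }
    // NaN fails this range check, so it is rejected as well.
    if !(-1.0..=1.0).contains(&result.score) {
        return Err(format!("score out of range: {}", result.score));
    }
    Ok(())
}
```

A response such as `{"sentiment": "mixed", "score": 0.0}` would fail the first check, and a score of `1.5` would fail the second.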

### Programmatic Usage

You can also use the `llmprogram` library directly in your Rust code:

```rust
use llmprogram::LLMProgram;
use std::collections::HashMap;
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create and run the sentiment analysis program
    let program = LLMProgram::new("sentiment_analysis.yaml")?;
    
    let mut inputs = HashMap::new();
    inputs.insert("text".to_string(), Value::String("I love this new product! It is amazing.".to_string()));
    
    let result = program.run(&inputs).await?;
    println!("{}", serde_json::to_string_pretty(&result)?);
    
    Ok(())
}
```

## Configuration

The behavior of each LLM program is defined in a YAML file. Here are the key sections:

- `name`, `description`, `version`: Basic metadata for your program.
- `model`: Defines the LLM provider, model name, and other parameters like `temperature` and `max_tokens`.
- `system_prompt`: The instructions that are given to the LLM to guide its behavior.
- `input_schema`: A JSON schema that defines the expected input for the program. The program will validate the input against this schema before execution.
- `output_schema`: A JSON schema that defines the expected output from the LLM. The program will validate the LLM's output against this schema.
- `template`: A Tera template that is used to generate the prompt that is sent to the LLM. The template is rendered with the input variables.
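
To make the `template` step concrete: rendering substitutes each input variable into its `{{...}}` placeholder. The sketch below is a deliberately simplified stand-in written with the standard library only. It is not Tera and supports none of Tera's filters, loops, or conditionals; `render_simple` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Replaces every `{{name}}` or `{{ name }}` placeholder with the matching
/// value from `vars`. Placeholders with no matching variable are left as-is.
pub fn render_simple(template: &str, vars: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let name = after_open[..end].trim();
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    // Unknown variable: keep the placeholder text unchanged.
                    None => out.push_str(&rest[start..start + 2 + end + 2]),
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // Unterminated `{{`: emit the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With `text` set to `I love this product!`, the template from the sentiment example renders to the two-line prompt that is sent as the user message.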

## Using with other OpenAI-compatible endpoints

You can use `llmprogram` with any OpenAI-compatible endpoint, such as [Ollama](https://ollama.ai/). To do this, pass an `api_key` and a `base_url` to `LLMProgram::new_with_options`:

```rust
let program = LLMProgram::new_with_options(
    "your_program.yaml",
    Some("your-api-key".to_string()),
    Some("http://localhost:11434/v1".to_string()),  // example for Ollama
    true,  // enable_cache
    "redis://localhost:6379"
)?;
```
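
If the endpoint should be switchable without recompiling, the two optional arguments can be read from the environment. The following is a minimal sketch under stated assumptions: `OPENAI_API_KEY` is the variable named in this README, while `LLM_BASE_URL`, `resolve_endpoint`, and `endpoint_from_env` are hypothetical names, and returning `None` is assumed to leave the crate to apply its own default.

```rust
use std::env;

/// Looks up the API key and base URL through `lookup`, treating missing or
/// blank values as `None`. Taking the lookup as a parameter keeps the
/// function testable without touching the process environment.
pub fn resolve_endpoint<F>(lookup: F) -> (Option<String>, Option<String>)
where
    F: Fn(&str) -> Option<String>,
{
    let non_empty = |name: &str| lookup(name).filter(|v| !v.trim().is_empty());
    // `LLM_BASE_URL` is a hypothetical variable name chosen for this sketch.
    (non_empty("OPENAI_API_KEY"), non_empty("LLM_BASE_URL"))
}

/// Convenience wrapper that reads from the real process environment.
pub fn endpoint_from_env() -> (Option<String>, Option<String>) {
    resolve_endpoint(|name| env::var(name).ok())
}
```

The resulting pair can be passed as the second and third arguments of `LLMProgram::new_with_options`.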

## Caching

`llmprogram` can cache LLM responses in Redis to improve performance and reduce costs. Caching requires a running Redis server.

By default, caching is enabled. You can disable it or configure the Redis connection and cache TTL (time-to-live) when you create an `LLMProgram` instance. The example below disables caching; the final argument is the Redis URL:

```rust
let program = LLMProgram::new_with_options(
    "your_program.yaml",
    None,  // api_key
    None,  // base_url
    false, // enable_cache
    "redis://localhost:6379"
)?;
```
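
This README does not document how cache entries are keyed. One common approach, sketched here purely as an assumption, is to derive the key from the program name, its version, and the inputs in a canonical order, so that identical requests map to the same Redis entry. `cache_key` and the `llmprogram:` prefix are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Builds a cache key from the program identity and its inputs.
///
/// `DefaultHasher` is used only to keep the sketch dependency-free. Its
/// algorithm is not guaranteed to stay the same across Rust releases, so a
/// cache shared between builds would want a stable hash such as SHA-256.
pub fn cache_key(program: &str, version: &str, inputs: &BTreeMap<String, String>) -> String {
    let mut hasher = DefaultHasher::new();
    program.hash(&mut hasher);
    version.hash(&mut hasher);
    // BTreeMap iterates in sorted key order, so the order in which inputs
    // were inserted does not affect the resulting hash.
    for (name, value) in inputs {
        name.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    format!("llmprogram:{}:{:016x}", program, hasher.finish())
}
```

Including the version in the key means that bumping `version` in the YAML file would naturally bypass entries cached for the earlier prompt.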

## Logging and Dataset Generation

`llmprogram` automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a `.db` extension.

The log is useful beyond debugging: it also serves as source data for building fine-tuning datasets for your own LLMs. Each record in the log contains:

- `function_input`: The input given to the program.
- `function_output`: The output received from the LLM.
- `llm_input`: The prompt sent to the LLM.
- `llm_output`: The raw response from the LLM.

### Generating a Dataset

You can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.

```bash
llmprogram generate-dataset /path/to/your_program.db /path/to/your_dataset.jsonl
```

Each line in the output file will be a JSON object with the following keys:

- `instruction`: The system prompt and the user prompt, combined to form the instruction for the LLM.
- `output`: The output from the LLM.
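
As an illustration of the record shape, the sketch below builds one JSONL line from a system prompt, a rendered user prompt, and the model output. How the crate actually joins the two prompts is not specified here; the blank-line separator and the names `to_jsonl_line` and `json_escape` are assumptions made for the example.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats one dataset record as a single JSON object on one line.
/// The prompts are joined with a blank line (an assumption of this sketch).
pub fn to_jsonl_line(system_prompt: &str, user_prompt: &str, output: &str) -> String {
    let instruction = format!("{}\n\n{}", system_prompt.trim_end(), user_prompt.trim_end());
    format!(
        "{{\"instruction\":\"{}\",\"output\":\"{}\"}}",
        json_escape(&instruction),
        json_escape(output)
    )
}
```

Because newlines inside the prompts are escaped, each record stays on a single physical line, which is what JSONL readers expect.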

## Command-Line Interface (CLI)

`llmprogram` comes with a command-line interface for common tasks.

### `run`

Run an LLM program with inputs from the command line, a file, or stdin.

**Usage:**

```bash
# First, set your OpenAI API key
export OPENAI_API_KEY='your-api-key'

# Run with inputs from a JSON file
llmprogram run program.yaml --inputs inputs.json

# Run with inputs from command line
llmprogram run program.yaml --input-json '{"text": "I love this product!"}'

# Run with inputs from stdin
echo '{"text": "I love this product!"}' | llmprogram run program.yaml

# Run with streaming output
llmprogram run program.yaml --inputs inputs.json --stream

# Save output to a file
llmprogram run program.yaml --inputs inputs.json --output result.json
```

**Arguments:**

- `program_path`: The path to the program YAML file.
- `--inputs`, `-i`: Path to a JSON/YAML file containing inputs.
- `--input-json`: JSON string of inputs.
- `--output`, `-o`: Path to the output file (default: stdout).
- `--stream`, `-s`: Stream the response.

### `generate-yaml`

Generate an LLM program YAML file from a natural-language description using an AI assistant.

**Usage:**

```bash
# Generate a YAML program with a simple description
llmprogram generate-yaml "Create a program that analyzes the sentiment of text" --output sentiment_analyzer.yaml

# Generate a YAML program with examples
llmprogram generate-yaml "Create a program that extracts key information from customer reviews" \
  --example-input "The battery life on this phone is amazing! It lasts all day." \
  --example-output '{"product_quality": "positive", "battery": "positive", "durability": "neutral"}' \
  --output review_analyzer.yaml

# Generate a YAML program and output to stdout
llmprogram generate-yaml "Create a program that summarizes long texts"
```

**Arguments:**

- `description`: A detailed description of what the LLM program should do.
- `--example-input`: Example of the input the program will receive.
- `--example-output`: Example of the output the program should generate.
- `--output`, `-o`: Path to the output YAML file (default: stdout).
- `--api-key`: OpenAI API key (optional, defaults to the `OPENAI_API_KEY` environment variable).

### `analytics`

Show analytics data collected from LLM program executions.

**Usage:**

```bash
# Show all analytics data
llmprogram analytics

# Show analytics for a specific program
llmprogram analytics --program sentiment_analysis

# Show analytics for a specific model
llmprogram analytics --model gpt-4

# Use a custom analytics database path
llmprogram analytics --db-path /path/to/custom/analytics.db
```

**Arguments:**

- `--db-path`: Path to the analytics database (default: `llmprogram_analytics.db`).
- `--program`: Filter by program name.
- `--model`: Filter by model name.

### `generate-dataset`

Generate an instruction dataset for LLM fine-tuning from a SQLite log file.

**Usage:**

```bash
llmprogram generate-dataset <database_path> <output_path>
```

**Arguments:**

- `database_path`: The path to the SQLite database file.
- `output_path`: The path to write the generated dataset to.

## Examples

You can find more examples in the `examples` directory:

- **Sentiment Analysis:** A simple program to analyze the sentiment of a piece of text. (`examples/sentiment_analysis.yaml`)
- **Code Generator:** A program that generates Python code from a natural language description. (`examples/code_generator.yaml`)
- **Email Generator:** A program that generates professional emails based on input parameters. (`examples/email_generator.yaml`)

To run the examples:

1. Navigate to the project directory.
2. Run the corresponding example command:

    ```bash
    # Using the CLI with a JSON input file
    llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json
    
    # Using the CLI with batch processing
    llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_batch_inputs.json
    
    # Using the CLI with streaming
    llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --stream
    
    # Using the CLI and saving output to a file
    llmprogram run examples/sentiment_analysis.yaml --inputs examples/sentiment_inputs.json --output result.json
    
    # View analytics data
    llmprogram analytics
    
    # View analytics for a specific program
    llmprogram analytics --program sentiment_analysis
    
    # Generate a new YAML program
    llmprogram generate-yaml "Create a program that classifies email priority" \
      --example-input "Subject: Urgent meeting tomorrow. Body: Please prepare the Q3 report." \
      --example-output '{"priority": "high", "category": "work", "response_required": true}' \
      --output email_classifier.yaml
    
    # Generate a dataset
    llmprogram generate-dataset sentiment_analysis.db dataset.jsonl
    ```

## Development

To run the tests for this package:

```bash
cargo test
```

To build the documentation:

```bash
cargo doc --open
```

## License

MIT

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.