# LLM Program (Rust Implementation)

`llmprogram` is a Rust crate that provides a structured and powerful way to create and run programs that use Large Language Models (LLMs). It uses a YAML-based configuration to define the behavior of your LLM programs, making them easy to create, manage, and share.
## Features
- **YAML-based Configuration**: Define your LLM programs using simple and intuitive YAML files.
- **Input/Output Validation**: Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.
- **Tera Templating**: Use the power of Tera templates (Rust's Jinja2-style engine) to create dynamic prompts for your LLMs.
- **Caching**: Built-in support for Redis caching to save time and reduce costs.
- **Execution Logging**: Automatically log program executions to a SQLite database for analysis and debugging.
- **Analytics**: Comprehensive analytics tracking with SQLite for token usage, LLM calls, program usage, and timing metrics.
- **Streaming**: Support for streaming responses from the LLM.
- **Batch Processing**: Process multiple inputs in parallel for improved performance.
- **CLI for Dataset Generation**: A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.
- **AI-Assisted YAML Generation**: Generate LLM program YAML files automatically from natural-language descriptions.
## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
llmprogram = "0.1.0"
```
Or install the CLI globally:
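```bash
cargo install llmprogram
```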
## Usage

### CLI Usage
1. Set your OpenAI API key: `export OPENAI_API_KEY="your-api-key"`

2. Create a program YAML file named `sentiment_analysis.yaml`:

   ```yaml
   name: sentiment_analysis
   description: Analyzes the sentiment of a given text.
   version: 1.0.0
   model:
     provider: openai
     name: gpt-4.1-mini
     temperature: 0.5
     max_tokens: 100
     response_format: json_object
   system_prompt: |
     You are a sentiment analysis expert. Analyze the sentiment of the given
     text and return a JSON response with the following format:
     - sentiment (string): "positive", "negative", or "neutral"
     - score (number): A score from -1 (most negative) to 1 (most positive)
   input_schema:
     type: object
     required:
       - text
     properties:
       text:
         type: string
         description: The text to analyze.
   output_schema:
     type: object
     required:
       - sentiment
       - score
     properties:
       sentiment:
         type: string
         enum:
           - positive
           - negative
           - neutral
       score:
         type: number
         minimum: -1
         maximum: 1
   template: |
     Analyze the following text:

     {{text}}
   ```
3. Run the program using the CLI (the commands below assume the installed binary is named `llmprogram`):

   ```bash
   # Using a JSON input file
   llmprogram run sentiment_analysis.yaml --inputs inputs.json

   # Using inline JSON
   llmprogram run sentiment_analysis.yaml --input-json '{"text": "I love this product!"}'

   # Using stdin
   cat inputs.json | llmprogram run sentiment_analysis.yaml

   # Using streaming output
   llmprogram run sentiment_analysis.yaml --inputs inputs.json --stream

   # Saving output to a file
   llmprogram run sentiment_analysis.yaml --inputs inputs.json --output result.json
   ```
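A minimal `inputs.json` for this program (any JSON object matching the program's `input_schema` works) could be:

```json
{
  "text": "I love this product!"
}
```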
### Programmatic Usage

You can also use the `llmprogram` library directly in your Rust code:
```rust
use llmprogram::LLMProgram;
use serde_json::Value;
use std::collections::HashMap;

// Sketch only: the constructor and execution method names below are
// illustrative; check the crate documentation for the exact API.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the program definition from its YAML file.
    let program = LLMProgram::new("sentiment_analysis.yaml")?;

    // Build the inputs expected by the program's input_schema.
    let mut inputs: HashMap<String, Value> = HashMap::new();
    inputs.insert("text".to_string(), Value::String("I love this product!".to_string()));

    // Run the program and print the validated output.
    let output = program.execute(&inputs).await?;
    println!("{output}");

    Ok(())
}
```
## Configuration
The behavior of each LLM program is defined in a YAML file. Here are the key sections:
- `name`, `description`, `version`: Basic metadata for your program.
- `model`: Defines the LLM provider, model name, and other parameters like `temperature` and `max_tokens`.
- `system_prompt`: The instructions that are given to the LLM to guide its behavior.
- `input_schema`: A JSON schema that defines the expected input for the program. The program validates the input against this schema before execution.
- `output_schema`: A JSON schema that defines the expected output from the LLM. The program validates the LLM's output against this schema.
- `template`: A Tera template used to generate the prompt that is sent to the LLM. The template is rendered with the input variables.
### Using with other OpenAI-compatible endpoints

You can use `llmprogram` with any OpenAI-compatible endpoint, such as Ollama. To do this, pass the `api_key` and `base_url` to the `LLMProgram` constructor:
```rust
// Illustrative sketch: see the crate docs for the exact `new_with_options` signature.
let program = LLMProgram::new_with_options(
    "program.yaml",
    Some("ollama".to_string()),                    // api_key: local servers accept any value
    Some("http://localhost:11434/v1".to_string()), // base_url: Ollama's OpenAI-compatible endpoint
    /* ...other options... */
)?;
```
## Caching

`llmprogram` supports caching of LLM responses in Redis to improve performance and reduce costs. To enable caching, you need to have a Redis server running.

By default, caching is enabled. You can disable it or configure the Redis connection and cache TTL (time-to-live) when you create an `LLMProgram` instance:
```rust
// Illustrative sketch: the cache option names are not fixed here; see the
// crate docs for the exact `new_with_options` signature.
let program = LLMProgram::new_with_options(
    "program.yaml",
    Some("redis://127.0.0.1:6379".to_string()), // Redis connection URL
    Some(3600),                                 // cache TTL in seconds
    /* ...other options... */
)?;
```
## Logging and Dataset Generation

`llmprogram` automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a `.db` extension.
This logging feature is not just for debugging; it's also a powerful tool for creating high-quality datasets for fine-tuning your own LLMs. Each record in the log contains:
- `function_input`: The input given to the program.
- `function_output`: The output received from the LLM.
- `llm_input`: The prompt sent to the LLM.
- `llm_output`: The raw response from the LLM.
### Generating a Dataset
You can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.
Each line in the output file will be a JSON object with the following keys:
- `instruction`: The system prompt and the user prompt, combined to form the instruction for the LLM.
- `output`: The output from the LLM.
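For illustration only, a generated line might look like this (values invented):

```json
{"instruction": "You are a sentiment analysis expert. [...]\n\nAnalyze the following text:\n\nI love this product!", "output": "{\"sentiment\": \"positive\", \"score\": 0.9}"}
```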
## Command-Line Interface (CLI)

`llmprogram` comes with a command-line interface (invoked below as `llmprogram`) for common tasks.

### `run`
Run an LLM program with inputs from command line or files.
Usage:

```bash
# First, set your OpenAI API key
export OPENAI_API_KEY="your-api-key"

# Run with inputs from a JSON file
llmprogram run program.yaml --inputs inputs.json

# Run with inputs from command line
llmprogram run program.yaml --input-json '{"text": "Hello"}'

# Run with inputs from stdin
cat inputs.json | llmprogram run program.yaml

# Run with streaming output
llmprogram run program.yaml --inputs inputs.json --stream

# Save output to a file
llmprogram run program.yaml --inputs inputs.json --output result.json
```
Arguments:

- `program_path`: The path to the program YAML file.
- `--inputs`, `-i`: Path to a JSON/YAML file containing inputs.
- `--input-json`: JSON string of inputs.
- `--output`, `-o`: Path to an output file (default: stdout).
- `--stream`, `-s`: Stream the response.
### `generate-yaml`

Generate an LLM program YAML file from a natural-language description, using an AI assistant.
Usage:

```bash
# Generate a YAML program with a simple description
llmprogram generate-yaml "Analyze the sentiment of a given text" --output sentiment.yaml

# Generate a YAML program with examples
llmprogram generate-yaml "Analyze the sentiment of a given text" \
  --example-input '{"text": "I love this!"}' \
  --example-output '{"sentiment": "positive", "score": 0.9}' \
  --output sentiment.yaml

# Generate a YAML program and output to stdout
llmprogram generate-yaml "Analyze the sentiment of a given text"
```
Arguments:

- `description`: A detailed description of what the LLM program should do.
- `--example-input`: Example of the input the program will receive.
- `--example-output`: Example of the output the program should generate.
- `--output`, `-o`: Path to an output YAML file (default: stdout).
- `--api-key`: OpenAI API key (optional; defaults to the `OPENAI_API_KEY` environment variable).
### `analytics`
Show analytics data collected from LLM program executions.
Usage:

```bash
# Show all analytics data
llmprogram analytics

# Show analytics for a specific program
llmprogram analytics --program sentiment_analysis

# Show analytics for a specific model
llmprogram analytics --model gpt-4.1-mini

# Use a custom analytics database path
llmprogram analytics --db-path /path/to/analytics.db
```
Arguments:

- `--db-path`: Path to the analytics database (default: `llmprogram_analytics.db`).
- `--program`: Filter by program name.
- `--model`: Filter by model name.
### `generate-dataset`
Generate an instruction dataset for LLM fine-tuning from a SQLite log file.
Usage:
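```bash
llmprogram generate-dataset <database_path> <output_path>

# Example
llmprogram generate-dataset sentiment_analysis.db dataset.jsonl
```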
Arguments:

- `database_path`: The path to the SQLite database file.
- `output_path`: The path to write the generated dataset to.
## Examples

You can find more examples in the `examples` directory:
- Sentiment Analysis (`examples/sentiment_analysis.yaml`): A simple program to analyze the sentiment of a piece of text.
- Code Generator (`examples/code_generator.yaml`): A program that generates Python code from a natural language description.
- Email Generator (`examples/email_generator.yaml`): A program that generates professional emails based on input parameters.
To run the examples:
1. Navigate to the project directory.
2. Run the corresponding example command:
```bash
# Using the CLI with a JSON input file
llmprogram run examples/sentiment_analysis.yaml --inputs inputs.json

# Using the CLI with batch processing (batch flags are not documented here; see `llmprogram run --help`)

# Using the CLI with streaming
llmprogram run examples/sentiment_analysis.yaml --inputs inputs.json --stream

# Using the CLI and saving output to a file
llmprogram run examples/sentiment_analysis.yaml --inputs inputs.json --output result.json

# View analytics data
llmprogram analytics

# View analytics for a specific program
llmprogram analytics --program sentiment_analysis

# Generate a new YAML program
llmprogram generate-yaml "Generate professional emails from input parameters" --output email_generator.yaml

# Generate a dataset
llmprogram generate-dataset examples/sentiment_analysis.db dataset.jsonl
```
## Development
To run the tests for this package:
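```bash
cargo test
```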
To build the documentation:
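```bash
cargo doc --open
```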
## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.