# LLMY
All-in-one LLM utilities for Rust — plug OpenAI / Azure settings straight into clap, track spend with built-in billing, and replay every request when things go wrong.
## Harnessing An Agent
The harness layer gives you a concrete in-memory `Agent` that can hold conversation state, expose tools to the model, and run a full user turn through the tool-call loop. A minimal coding agent only needs a system prompt, an LLM, and a `ToolBox` with the tools you want to expose.
The example below builds a basic agent that can read files, list directories, and search for files by glob pattern:
```toml
[dependencies]
clap = { version = "4", features = ["derive"] }
llmy = "0.7"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```
```rust
use std::path::PathBuf;

use clap::Parser;
use llmy::agent::{Agent, ToolBox};
use llmy::clap::OpenAISetup;

#[tokio::main]
async fn main() {
    // ... parse the OpenAISetup, build the ToolBox with the file tools,
    // create the Agent, and run a user turn ...
}
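```

One way the elided body might be filled in is sketched below. This is illustrative only: `into_llm`, `ToolBox::default`, `Agent::new`, and `run_turn` are assumed names, not confirmed llmy API.

```rust
// Inside main — a sketch; the names below are assumptions, not confirmed API.
let setup = OpenAISetup::parse();                  // assumes OpenAISetup derives clap::Parser
let llm = setup.into_llm().expect("LLM settings"); // hypothetical conversion to the core client
let tools = ToolBox::default();
// ... register the read-file / list-dir / glob tools here ...
let mut agent = Agent::new("You are a coding agent.", llm, tools); // assumed constructor
let reply = agent.run_turn("What files live in src/?").await.expect("turn failed");
println!("{reply}");
```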
Run it with your OpenAI settings:
```bash
OPENAI_API_KEY=sk-... cargo run
```
## CLI
Install the command-line tool:
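Assuming the binary is published with the root crate:

```bash
cargo install llmy
```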
### llmy chat — interactive chat
```
$ OPENAI_API_KEY=sk-... llmy chat
You: Explain async Rust in one sentence.
Assistant: Async Rust uses futures and an executor to let you write non-blocking,
concurrent code with zero-cost abstractions at compile time.
```
Supports `--system` for a custom system prompt. Reads from stdin when not a TTY.
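For example, combining the two behaviors just described (the pipeline is a sketch):

```bash
# Non-interactive: the prompt comes from stdin, the system prompt from --system
git diff | OPENAI_API_KEY=sk-... llmy chat --system "You are a terse code reviewer."
```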
### llmy tokenizer — count tokens offline
```
$ echo -n "Hello, world!" | llmy tokenizer
# 4
# 0 9906 "Hello"
# 1 11 ","
# 2 1917 " world"
# 3 0 "!"
```
### llmy models — list supported models
```
$ llmy models
Model                      Input (per 1M)  Output (per 1M)  Max Input  Max Output  Encoding
anthropic/claude-sonnet-4  $3.00           $15.00           136000     64000       claude
google/gemini-2.5-flash    $0.30           $2.50            936000     64000       o200k_base
google/gemini-2.5-pro      $1.25           $10.00           983040     65536       o200k_base
openai/gpt-4.1             $2.00           $8.00            1014808    32768       o200k_base
openai/gpt-4o              $2.50           $10.00           111616     16384       o200k_base
openai/gpt-4o-mini         $0.15           $0.60            111616     16384       o200k_base
openai/o1                  $15.00          $60.00           100000     100000      o200k_base
openai/o3                  $2.00           $8.00            100000     100000      o200k_base
openai/o4-mini             $1.10           $4.40            100000     100000      o200k_base
… (112 models total)
```
## Library
Add the dependency (the root crate re-exports everything):
```toml
[dependencies]
llmy = "0.7"
```
### 1. Clap integration — up to 3 LLM slots
`llmy-clap` provides three generated arg structs (`OpenAISetup`, `OptOpenAISetup`, `OptOptOpenAISetup`) so you can wire one, two, or three LLMs into any clap-based CLI with zero boilerplate. Each slot is controlled by its own set of env-vars / flags and can be converted to the core LLM client in one call.
```rust
use clap::Parser;
use llmy::clap::OpenAISetup;    // primary
use llmy::clap::OptOpenAISetup; // optional secondary

#[tokio::main]
async fn main() {
    // ... flatten both setups into an Args struct, then build the clients ...
}
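```

The elided `Args` struct would flatten both setups, roughly like this (a sketch; the one-call conversion to the core client is not shown because its exact name isn't given here):

```rust
#[derive(Parser)]
struct Args {
    /// Primary slot, read from the OPENAI_* flags / env-vars.
    #[command(flatten)]
    primary: OpenAISetup,
    /// Optional secondary slot, read from the OPT_* flags / env-vars.
    #[command(flatten)]
    secondary: OptOpenAISetup,
}
```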
Run it:
```bash
# OpenAI
OPENAI_API_KEY=sk-... cargo run
# Azure
OPENAI_API_KEY=... cargo run
```
Every setting (temperature, timeout, retries, max tokens, reasoning effort, tool choice, …) is exposed as a flag and an env-var:
| Flag | Env var | Default |
|---|---|---|
| `--model` | `OPENAI_API_MODEL` | `o1` |
| `--llm-temperature` | `LLM_TEMPERATURE` | — |
| `--llm-presence-penalty` | `LLM_PRESENCE_PENALTY` | — |
| `--llm-max-completion-tokens` | `LLM_MAX_COMPLETION_TOKENS` | — |
| `--top-p` | `LLM_TOP_P` | — |
| `--llm-retry` | `LLM_RETRY` | `5` |
| `--llm-prompt-timeout` | `LLM_PROMPT_TIMEOUT` | `1200` (s) |
| `--llm-stream` | `LLM_STREAM` | `false` |
| `--reasoning-effort` | `LLM_REASONING_EFFORT` | — |
The second and third slots use the prefixes `OPT_` and `OPT_OPT_` for their env-vars (e.g. `OPT_OPENAI_API_KEY`, `OPT_OPT_OPENAI_API_MODEL`).
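For example, driving two slots from the environment (the key values are placeholders):

```bash
OPENAI_API_KEY=sk-primary... \
OPT_OPENAI_API_KEY=sk-secondary... \
OPT_OPENAI_API_MODEL=gpt-4o-mini \
cargo run
```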
### 2. Detailed debug logging (`LLM_DEBUG`)
Point `LLM_DEBUG` at a directory and every LLM round-trip is saved as an XML-like `.xml` (not strict XML — just an easy-to-skim tagged format) and a raw `.json` — perfect for post-mortem debugging or dataset building.
```bash
LLM_DEBUG=./debug_logs OPENAI_API_KEY=sk-... cargo run
```
This creates a per-process subfolder with numbered files:
```
debug_logs/
└── 48291-0-main/
    ├── llm-000000000001.xml
    ├── llm-000000000001.json
    ├── llm-000000000002.xml
    └── llm-000000000002.json
```
The `.xml` file looks roughly like this (the tag names shown here are illustrative):
```
=====================
<system>
You are a helpful assistant.
</system>
<user>
Explain async Rust in one sentence.
</user>
<tool>
{
  "type": "object",
  "properties": { "query": { "type": "string" } }
}
</tool>
=====================
=====================
<assistant>
Async Rust lets you write concurrent code ...
</assistant>
=====================
```
The `.json` companion contains the full serialised `CreateChatCompletionRequest` / `CreateChatCompletionResponse` objects for programmatic analysis.
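A quick way to poke at those files in a post-mortem script — a sketch, assuming only that each file holds one JSON document (it makes no assumptions about the internal keys):

```rust
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load one logged round-trip and pretty-print it for inspection.
    let raw = fs::read_to_string("debug_logs/48291-0-main/llm-000000000001.json")?;
    let v: serde_json::Value = serde_json::from_str(&raw)?;
    println!("{}", serde_json::to_string_pretty(&v)?);
    Ok(())
}
```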
### 3. Built-in billing with automatic budget enforcement
llmy ships with up-to-date per-token pricing for 110+ models (GPT-4o, o1, o3, the GPT-5 family, Claude, Gemini, …). Token usage is tracked in real time, including cached-input and reasoning-token discounts. When spend exceeds the budget cap the client returns `LLMYError::Billing` immediately — no more surprise bills.
```rust
use llmy::{LLM, LLMSettings};

let settings = LLMSettings::default();
let model = "gpt-4o".parse().unwrap();
let llm = LLM::new(model, settings); // assumed argument order
match llm.prompt_once("Hello!").await {
    Ok(reply) => println!("{reply}"),
    Err(err) => eprintln!("{err}"), // LLMYError::Billing once the cap is exceeded
}
```
Via clap the cap defaults to $10 and can be overridden with the corresponding flag / env-var.
For models not in the built-in list, pricing can be passed inline as `name, in, out, cached`.
### 4. Offline token estimation
llmy includes a built-in tokenizer with fast, offline BPE token estimation for 110+ models across OpenAI, Anthropic, Google, and more. Encodings and model metadata are baked into the binary at compile time — no network calls, no data files to ship.
Four encodings are supported: `cl100k_base`, `o200k_base`, `p50k_base` (OpenAI / tiktoken) and `claude` (Anthropic).
```rust
use llmy::tokenizer::{count_tokens, count_tokens_for_model, encode, Encoding};

// Encode text into token IDs (the Encoding variant name is assumed)
let tokens: Vec<u32> = encode(Encoding::Cl100kBase, "Hello, world!");
// Count tokens directly
let n = count_tokens(Encoding::Cl100kBase, "Hello, world!");
// Or let the library resolve the encoding from a model ID
let n = count_tokens_for_model("gpt-4o", "Hello, world!");        // Some(4)
let n = count_tokens_for_model("openai/gpt-4o", "Hello, world!"); // Some(4)
```
The model registry is generated from the same source-of-truth JSON used by the billing system, so model look-ups, pricing, and token counts always stay in sync.
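For instance, pairing a token count with the pricing table above gives a back-of-the-envelope cost estimate — a sketch that reuses `count_tokens_for_model` as shown in the snippet above:

```rust
// openai/gpt-4o-mini costs $0.15 per 1M input tokens (from the `llmy models` table).
let prompt = "Explain async Rust in one sentence.";
let tokens = count_tokens_for_model("openai/gpt-4o-mini", prompt).unwrap_or(0);
let input_cost_usd = tokens as f64 * 0.15 / 1_000_000.0;
println!("{tokens} input tokens ≈ ${input_cost_usd:.8}");
```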
### 5. Defining tools with `Tool` and `#[tool(...)]`
`llmy-agent` models callable tools as a Rust trait. A tool has a strongly typed argument struct, a stable tool name, an optional description, and an async `invoke` method that returns `Result<String, LLMYError>`.
You can depend on either the focused crate pair:
```toml
[dependencies]
llmy-agent = "0.5"
llmy-agent-derive = "0.7"
```
or the root crate plus the derive crate:
```toml
[dependencies]
llmy = "0.7"
llmy-agent-derive = "0.7"
```
The trait contract is:
```rust
use std::future::Future;

use llmy_agent::LLMYError;
use schemars::JsonSchema;
use serde::de::DeserializeOwned;
```
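Given the description above, the trait's shape is roughly the following — a sketch whose method names and associated-type bound mirror the prose, not verified signatures:

```rust
pub trait Tool {
    /// Strongly typed arguments, deserialised from the model's JSON tool call.
    type Arguments: DeserializeOwned + JsonSchema;

    /// Stable tool name exposed to the model.
    fn name(&self) -> &str;

    /// Optional human-readable description.
    fn description(&self) -> Option<&str> {
        None
    }

    /// Async invocation returning the tool's textual result.
    fn invoke(
        &self,
        args: Self::Arguments,
    ) -> impl Future<Output = Result<String, LLMYError>> + Send;
}
```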
In practice you usually write the typed arguments and the async method, then let `llmy-agent-derive` generate the `impl Tool` for you:
```rust
use std::path::PathBuf;

use llmy::LLMYError;
use llmy_agent_derive::tool;
use schemars::JsonSchema;
use serde::Deserialize;
```
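A hypothetical read-file tool might then look like this. It is a sketch only: the attribute keys follow the notes below, but the exact `#[tool(...)]` grammar and the error conversion are assumptions.

```rust
#[derive(Deserialize, JsonSchema)]
struct ReadFileArgs {
    /// Path of the file to read.
    path: PathBuf,
}

struct ReadFileTool;

// `arguments` and `invoke` are the required keys; `description` is optional.
#[tool(
    arguments = ReadFileArgs,
    invoke = invoke,
    description = "Read a UTF-8 text file from disk"
)]
impl ReadFileTool {
    async fn invoke(&self, args: ReadFileArgs) -> Result<String, LLMYError> {
        // Assumes an io::Error -> LLMYError conversion exists.
        std::fs::read_to_string(&args.path).map_err(Into::into)
    }
}
```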
Notes:
- `arguments` and `invoke` are required in `#[tool(...)]`.
- `description` is optional.
- `name` is optional; if omitted, the struct name is converted to `snake_case`, for example `ReadFileTool` -> `read_file_tool`.
- The generated impl works with either `llmy_agent::Tool` or `llmy::agent::Tool`.
## License
MIT