# llm-rs
A Unix-philosophy agentic CLI for Large Language Models. Inspired by [simonw/llm](https://github.com/simonw/llm), built for composability --- stdin/stdout pipelines, subprocess-based tool and provider extensibility (`llm-tool-*`, `llm-provider-*`), and multi-target builds (native CLI, WASM, Python).
**Scope.** llm-rs is a library (with a CLI), not an orchestration framework. Hierarchical workflows compose via specialist tools — small `llm-tool-*` executables that may internally invoke `llm prompt` with a narrow agent. See [`doc/spec/external-tools.md`](doc/spec/external-tools.md) for the protocol, [`doc/cookbook/specialist-tools.md`](doc/cookbook/specialist-tools.md) for a worked example, and [`doc/research/specialist-tools-vs-sub-agents.md`](doc/research/specialist-tools-vs-sub-agents.md) for why llm-rs is not building recursive sub-agent delegation.
## Usage
```bash
# Send a prompt (streams to stdout)
echo "Hello" | llm
# Positional text works too
llm "Explain monads in one sentence" -m claude-sonnet-4-6
# Specify model and system prompt
llm "What is 2+2?" -m gpt-4o -s "Answer only with the number"
# Use Anthropic models
llm "Hello" -m claude-sonnet-4-6
# Disable streaming
llm "Hello" --no-stream
# Show token usage on stderr
llm "Hello" -u
# Skip logging this prompt
llm "Hello" -n
```
### Tool calling
Built-in tools let the model call functions during a conversation. The CLI manages the chain loop automatically --- it sends tool calls to the executor, feeds results back, and repeats until the model responds with text.
```bash
# Enable a built-in tool
llm "What time is it?" -T llm_time
# Multiple tools
llm "What version are you and what time is it?" -T llm_version -T llm_time
# Limit chain iterations (default: 5 for prompt/chat, 10 for agents)
llm "Do something" -T llm_version --chain-limit 3
# Debug mode: show tool calls/results on stderr
llm "What version?" -T llm_version --tools-debug
# Verbose mode: see chain loop iterations (-v summary, -vv full messages)
llm "What time is it?" -T llm_time --verbose
llm "What time is it?" -T llm_time -vv
# List available built-in tools
llm tools list
```
Available built-in tools:
- `llm_version` --- returns the CLI version
- `llm_time` --- returns current UTC and local time with timezone
### Verbose chain observability
When using tools, the `-v`/`--verbose` flag reveals what happens inside the chain loop --- which iteration you're on, what messages are being sent, per-iteration token usage, and tool call/result details.
```bash
# Level 1 (-v): iteration summary + tool debug
llm "What time is it?" -T llm_time -v
# stderr output:
# [chain] Iteration 1/5 | 1 message [user]
# [chain] Iteration 1 complete | usage: 10 input, 5 output | 1 tool call(s)
# Tool call: llm_time (id: call_1)
# Arguments: {}
# Tool result: {"utc_time":"...","local_time":"...","timezone":"..."}
# [chain] Iteration 2/5 | 3 messages [user, assistant+tools(1), tool(1)]
# [chain] Iteration 2 complete | usage: 20 input, 10 output | 0 tool call(s)
# Level 2 (-vv): also dumps full message JSON per iteration
llm "What time is it?" -T llm_time -vv
# stderr additionally includes:
# [chain] Messages:
# [
# {"role": "user", "content": "What time is it?"}
# ]
```
`--verbose` implies `--tools-debug` --- no need for both flags. Works on both `prompt` and `chat` commands.
### External tools
Any executable on `$PATH` named `llm-tool-*` is automatically discovered and usable with `-T`. External tools can be written in any language.
```bash
# List all tools (built-in + external)
llm tools list
# Use an external tool
llm "Make this loud: hello" -T upper -m gpt-4o
# Mix built-in and external tools
llm "What time is it, and shout it" -T llm_time -T shout
```
Writing an external tool requires two things:
1. **Schema**: respond to `--schema` with JSON describing the tool:
```bash
$ llm-tool-upper --schema
{"name":"upper","description":"Uppercase text","input_schema":{"type":"object","properties":{"text":{"type":"string"}},"required":["text"]}}
```
2. **Execution**: read arguments JSON from stdin, write result to stdout:
```bash
$ echo '{"text":"hello"}' | llm-tool-upper
HELLO
```
Exit 0 means success (stdout = output). Non-zero means error (stderr = error message). Default timeout: 30 seconds.
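Putting the two together, a complete minimal tool can be a few lines of shell. A sketch of the `upper` tool above (assumes `jq` is on `$PATH`):
```bash
#!/usr/bin/env sh
# llm-tool-upper: sketch of a minimal external tool (requires jq)
if [ "$1" = "--schema" ]; then
  echo '{"name":"upper","description":"Uppercase text","input_schema":{"type":"object","properties":{"text":{"type":"string"}},"required":["text"]}}'
  exit 0
fi
# Execution path: arguments JSON on stdin, result on stdout, exit 0 on success
jq -r '.text' | tr '[:lower:]' '[:upper:]'
```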
**Specialist tools.** An external tool can itself call `llm prompt` internally with a cheaper model and a narrow system prompt — an opaque "specialist" function from the parent LLM's perspective. This is llm-rs's answer to hierarchical workflows, in place of a recursive sub-agent runtime. Worked example: [`doc/cookbook/specialist-tools.md`](doc/cookbook/specialist-tools.md). Rationale: [`doc/research/specialist-tools-vs-sub-agents.md`](doc/research/specialist-tools-vs-sub-agents.md).
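A sketch of that pattern: a hypothetical `llm-tool-summarize` that wraps a cheaper model (the tool name, model, and system prompt here are illustrative):
```bash
#!/usr/bin/env sh
# llm-tool-summarize: hypothetical specialist tool delegating to a cheap model
if [ "$1" = "--schema" ]; then
  echo '{"name":"summarize","description":"Summarize text in one paragraph","input_schema":{"type":"object","properties":{"text":{"type":"string"}},"required":["text"]}}'
  exit 0
fi
# Pipe the tool argument into a narrow `llm prompt` call; -n skips logging
jq -r '.text' | llm -m gpt-4o-mini -s "Summarize the input in one paragraph." -n
```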
### External providers
Any executable on `$PATH` named `llm-provider-*` extends llm-rs with new model providers. External providers can serve models from Ollama, llama.cpp, or any custom backend.
```bash
# Models from external providers appear alongside built-in ones
llm models list
# Use a model from an external provider
llm "Hello" -m llama3
# See all providers and tools
llm plugins list
```
Writing an external provider requires implementing three metadata flags and a JSON stdin/stdout protocol:
- `--id` --- print the provider name (e.g. `ollama`)
- `--models` --- print JSON array of model metadata
- `--needs-key` --- print `{"needed":false}` or `{"needed":true,"env_var":"MY_KEY"}`
On invocation, the provider reads a JSON request from stdin and writes either streaming JSONL lines or a single JSON response to stdout. See `doc/implementation.md` for the full protocol specification.
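A sketch of the metadata half of that contract, as a hypothetical `llm-provider-echo` (the model-metadata fields shown are assumptions; `doc/implementation.md` defines the real shapes):
```bash
#!/usr/bin/env sh
# llm-provider-echo: skeleton covering the three metadata flags
case "$1" in
  --id)        echo "echo" ;;
  --models)    echo '[{"id":"echo-1"}]' ;;   # field names assumed, not from the spec
  --needs-key) echo '{"needed":false}' ;;
  *)
    # Invocation path: read a JSON request from stdin and write streaming
    # JSONL or a single JSON response to stdout (see doc/implementation.md).
    exit 1 ;;
esac
```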
### Conversations
Continue previous conversations, use multi-turn message input, and chat interactively.
```bash
# Continue the most recent conversation
llm -c "And what about 3+3?"
# Continue a specific conversation by ID
llm --cid 01j5a... "Follow up question"
# Load messages from a JSON file
llm --messages conversation.json "What next?"
# Load messages from stdin
echo '[{"role":"user","content":"hi"},{"role":"assistant","content":"hello!"}]' | llm --messages - "Follow up"
# Get JSON output instead of streaming text
llm --json "What is 2+2?"
# Combine: messages input with JSON output
llm --messages history.json --json "Summarize"
```
### Interactive chat
```bash
# Start an interactive chat session
llm chat
# Chat with a specific model and system prompt
llm chat -m claude-sonnet-4-6 -s "You are a helpful assistant"
# Chat with tools enabled
llm chat -T llm_time -T llm_version
# Chat with verbose tool chain output
llm chat -T llm_time -v
```
### Parallel tool execution
When the model requests multiple tool calls in a single turn, llm-rs dispatches them concurrently by default. Results are returned in the same order the model asked for them.
```bash
# Default: parallel dispatch, unlimited concurrency within a single iteration
llm "Check version and time" -T llm_version -T llm_time
# Cap concurrency
llm "Run N tools" -T tool_a -T tool_b --max-parallel-tools 2
# Force sequential dispatch (e.g. to inspect tools one at a time)
llm "Run N tools" -T tool_a -T tool_b --sequential-tools
```
`--tools-approve` forces sequential dispatch automatically so approval prompts don't interleave on stdin. Flags apply to `prompt`, `chat`, and `agent run`. Agents can set `parallel_tools` / `max_parallel_tools` in TOML; CLI flags override.
### Agents
Agents are TOML files that bundle a system prompt, model, tools, chain limit, options, budget, retry, and parallel-tool config. Global agents live in `~/.config/llm/agents/`; project-local agents in `./.llm/agents/` (local shadows global).
```bash
llm agent init researcher # Scaffold a local agent template
llm agent init planner --global # Scaffold a global agent
llm agent list # List discovered agents (name, model, source)
llm agent show researcher # Print resolved agent config
llm agent path # Print global and local agent directory paths
# Run an agent
llm agent run researcher "summarize recent changes"
echo "some input" | llm agent run researcher
# CLI flags override agent TOML
llm agent run researcher "hi" -m claude-sonnet-4-6 --chain-limit 3 -v
# Dry-run: resolve model, provider, tools, options, budget, retry, and parallel config without calling the LLM
llm agent run researcher "hi" --dry-run
llm agent run researcher "hi" --dry-run --json
llm agent run researcher "hi" --dry-run -vv # also includes the serialized Prompt payload
```
Example `~/.config/llm/agents/researcher.toml`:
```toml
model = "claude-sonnet-4-6"
system_prompt = "You are a careful research assistant."
tools = ["llm_time", "llm_version"]
chain_limit = 10
parallel_tools = true
max_parallel_tools = 4
[options]
temperature = 0.2
[budget]
max_tokens = 50000
[retry]
max_retries = 3
base_delay_ms = 1000
```
### Budget tracking
Token usage accumulates across chain iterations. Pass `-u` to print cumulative totals; set `[budget] max_tokens` in an agent file to stop the chain when the total exceeds the cap. The chain finishes the current turn, emits a `[budget]` warning, and returns the partial result.
```bash
# Show cumulative usage across all chain iterations
llm "Plan a trip" -T llm_time -u
# llm chat prints a session-wide usage summary on exit
llm chat -u
```
### Retry and backoff
Transient HTTP errors (429, 5xx) are retried with exponential backoff and jitter; retries apply only before any response bytes have been streamed. Configure per-invocation with `--retries` or per-agent via `[retry]`.
```bash
llm "Hello" --retries 5
llm chat --retries 3
llm agent run researcher "hi" --retries 5 # overrides agent TOML
```
### Options and aliases
Set persistent per-model options and model-name aliases in `config.toml`. CLI `-o` flags override config defaults per invocation.
```bash
# Options
llm options set gpt-4o temperature 0.7
llm options set gpt-4o max_tokens 1000
llm options get gpt-4o
llm options list
llm options clear gpt-4o temperature
# Aliases
llm aliases set fast gpt-4o-mini
llm aliases set claude claude-sonnet-4-6
llm aliases list
llm aliases show fast
llm aliases remove fast
llm aliases path
llm "Hello" -m claude # Uses the alias
llm "Hello" -o temperature 0.9 # Overrides config default
```
### Structured output
Force the model to return JSON conforming to a schema. Works with both OpenAI (native `response_format`) and Anthropic (transparent tool wrapping).
```bash
# Schema DSL: simple field definitions
llm "Extract: John is 30" --schema "name str, age int"
# With field descriptions
llm "Extract: John is 30" --schema "name str:The person's name, age int:Their age"
# JSON Schema literal
llm "Extract name" --schema '{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}'
# Schema from a file
llm "Extract data" --schema schema.json
# Multiple items: wrap in array
llm "List the planets" --schema "name str, diameter_km int" --schema-multi
# Preview DSL output
llm schemas dsl "name str, age int"
```
Schema DSL types: `str` (default), `int`, `float`, `bool`.
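As a rough sketch of the expansion (the exact JSON Schema is whatever `llm schemas dsl` prints; the shape below is an assumption, not verified output):
```bash
llm schemas dsl "name str, age int"
# Plausible output (assumed):
# {"type":"object",
#  "properties":{"name":{"type":"string"},"age":{"type":"integer"}},
#  "required":["name","age"]}
```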
### Key management
```bash
llm keys set openai            # Prompts for key (hidden input)
llm keys set anthropic # Set Anthropic API key
llm keys get openai # Print stored key
llm keys list # List all stored key names
llm keys path # Print path to keys.toml
```
Keys are resolved in order: `--key` flag, `keys.toml`, environment variable (e.g. `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
### Model management
```bash
llm models list # List available models (OpenAI + Anthropic)
llm models default # Show current default model
llm models default gpt-4o # Set default model
```
Available models:
- **OpenAI:** `gpt-4o`, `gpt-4o-mini`
- **Anthropic:** `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`
### Conversation logs
Every prompt is logged to a JSONL file (one per conversation). Logs are plain text --- inspect them with `cat`, `grep`, `jq`.
```bash
llm logs list # List recent conversations
llm logs list --json # JSON output (pipe to jq)
llm logs list -r # Print the most recent response text
llm logs list -m gpt-4o # Filter by model
llm logs list -q "rust" # Full-text search
llm logs list -u # Show token usage
llm logs path # Print logs directory path
llm logs status # Show logging on/off state
llm logs on # Enable logging
llm logs off # Disable logging
```
Log files live at `~/.local/share/llm/logs/`. Each file is a JSONL conversation:
```jsonl
{"type":"conversation","v":1,"id":"01j5a...","model":"gpt-4o","name":"Hello","created":"2026-04-03T12:00:00Z"}
{"type":"response","id":"01j5b...","model":"gpt-4o","prompt":"Hello","response":"Hi!","usage":{"input":5,"output":3},"duration_ms":230,...}
```
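Since each record is one JSON object per line, shell tooling composes directly. For example, summing output tokens across all conversations (fields as in the records above):
```bash
# Sum output tokens across every logged response
cat ~/.local/share/llm/logs/* \
  | jq -s '[.[] | select(.type=="response") | (.usage.output // 0)] | add'
```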
### Schema management
```bash
llm schemas dsl "name str, age int" # Preview DSL -> JSON Schema
llm schemas list # List schemas used in logs
llm schemas show <id> # Show schema by ID
```
### Plugins
```bash
llm plugins list # Show all providers (compiled + external) and external tools
```
Example output:
```
Compiled providers:
openai (2 models: gpt-4o, gpt-4o-mini)
anthropic (3 models: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5)
External providers:
ollama (/usr/local/bin/llm-provider-ollama) (3 models: llama3, mistral, phi3)
External tools:
web_search (/usr/local/bin/llm-tool-web-search) — Search the web
upper (/usr/local/bin/llm-tool-upper) — Uppercase text
```
### Exit codes
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Runtime error (I/O failure, storage error) |
| 2 | Configuration error (missing key, unknown model, bad config) |
| 3 | Provider error (API failure, network timeout) |
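These codes are meant for scripting; a sketch of branching on them:
```bash
# Branch on llm's exit code (codes per the table above)
llm "Hello" -m gpt-4o
case $? in
  0) ;;                                        # success
  2) echo "fix config or keys" >&2; exit 2 ;;  # configuration error
  3) echo "provider/API failure" >&2 ;;        # provider error
  *) echo "runtime error" >&2 ;;
esac
```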
## Library usage
In addition to the CLI, llm-rs can be used as a library from JavaScript/TypeScript (via WASM) or Python (via native module). Both support OpenAI and Anthropic.
### WASM (browser / Obsidian plugin)
```typescript
import init, { LlmClient } from '@llm-rs/wasm';
await init();
// Auto-detects provider from model name
const openai = new LlmClient('sk-...', 'gpt-4o');
const claude = new LlmClient('sk-ant-...', 'claude-sonnet-4-6');
// Or use explicit constructors
const client = LlmClient.newAnthropic('sk-ant-...', 'claude-sonnet-4-6');
const custom = LlmClient.newAnthropicWithBaseUrl('sk-ant-...', 'claude-sonnet-4-6', 'https://my-proxy.example.com');
// Non-streaming
const response = await client.prompt('Hello');
// With system prompt
const answer = await client.promptWithSystem('What is 2+2?', 'Answer only with the number');
// Streaming (callback per chunk)
await client.promptStreaming('Tell me a story', (chunk) => {
process.stdout.write(chunk);
});
// With options
const result = await client.promptWithOptions(
'Hello',
null, // system prompt (optional)
'{"temperature": 0.7, "max_tokens": 1000}'
);
```
Build from source:
```bash
wasm-pack build crates/llm-wasm --target web # ES module for browsers
wasm-pack build crates/llm-wasm --target bundler # For webpack/rollup (Obsidian plugins)
```
The WASM module is stateless --- no config files, no log storage. HTTP goes through the browser's `fetch()` API. The host application manages API keys and persistence.
### Python
```python
import llm_rs
# Auto-detects provider from model name
client = llm_rs.LlmClient("sk-...", "gpt-4o-mini")
claude = llm_rs.LlmClient("sk-ant-...", "claude-sonnet-4-6")
# Or specify provider explicitly
client = llm_rs.LlmClient("sk-ant-...", "claude-sonnet-4-6", provider="anthropic")
# Non-streaming
response = client.prompt("Hello, world!")
print(response)
# With system prompt
answer = client.prompt("What is 2+2?", system="Answer only with the number")
# Streaming (Python iterator)
for chunk in client.prompt_stream("Tell me a story"):
print(chunk, end="", flush=True)
```
Build from source (requires [uv](https://docs.astral.sh/uv/)):
```bash
cd crates/llm-python
uv venv && uv pip install maturin
uv run maturin develop # Install to current venv
uv run maturin build --release # Build wheel for distribution
```
Optional parameters: `provider` (`"openai"` or `"anthropic"`), `base_url` for custom API endpoints, `log_dir` to enable JSONL logging.
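A sketch combining them (assuming they are constructor keyword arguments, as `provider` is; the values are placeholders):
```python
import llm_rs

# base_url/log_dir as constructor kwargs is an assumption; values are placeholders
client = llm_rs.LlmClient(
    "sk-...",
    "gpt-4o-mini",
    provider="openai",                        # "openai" or "anthropic"
    base_url="https://my-proxy.example.com",  # custom API endpoint
    log_dir="/tmp/llm-logs",                  # enable JSONL logging
)
print(client.prompt("Hello"))
```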
## Installation
Requires Rust 1.85+ (2024 edition).
```bash
git clone https://github.com/user/llm-rs
cd llm-rs
cargo install --path crates/llm-cli
```
Or build from the workspace:
```bash
cargo build --release -p llm-cli
# Binary is at target/release/llm
```
## Configuration
Config files live in XDG-standard directories:
```
~/.config/llm/config.toml # Main configuration
~/.config/llm/keys.toml # API keys (0600 permissions)
~/.local/share/llm/logs/ # Conversation logs (JSONL)
```
Set `LLM_USER_PATH` to put everything in one directory (useful for testing or migrating from Python `llm`).
**config.toml:**
```toml
default_model = "gpt-4o-mini"
logging = true
[aliases]
claude = "claude-sonnet-4-6"
fast = "gpt-4o-mini"
```
**keys.toml:**
```toml
openai = "sk-..."
anthropic = "sk-ant-..."
```
### Environment variables
| Variable | Purpose |
|----------|---------|
| `OPENAI_API_KEY` | OpenAI API key (fallback if not in keys.toml) |
| `ANTHROPIC_API_KEY` | Anthropic API key (fallback if not in keys.toml) |
| `OPENAI_BASE_URL` | Override OpenAI API endpoint (for compatible APIs) |
| `ANTHROPIC_BASE_URL` | Override Anthropic API endpoint |
| `LLM_DEFAULT_MODEL` | Override default model |
| `LLM_USER_PATH` | Override config/data directory (flat layout) |
## Architecture
Seven Rust crates in a Cargo workspace:
```
crates/
llm-core/ Traits, types, streaming, errors, config, keys, schema DSL, chain loop
llm-openai/ OpenAI Chat API provider (streaming + tools + structured output)
llm-anthropic/ Anthropic Messages API provider (streaming + tools + structured output)
llm-store/ JSONL conversation log storage and queries
llm-cli/ CLI binary (the `llm` command)
llm-wasm/ WASM library for browser/Obsidian (excluded from workspace)
llm-python/ Python native module via PyO3 (excluded from workspace)
```
Dependency flow: `llm-cli`, `llm-wasm`, and `llm-python` (top-level entry points) -> `llm-openai` + `llm-anthropic` + `llm-store` -> `llm-core`. No cycles.
Key design choices vs the Python original:
- **Subprocess extensibility, not in-process plugins.** Instead of Python's pluggy-based plugin system, external tools (`llm-tool-*`) and providers (`llm-provider-*`) are standalone executables discovered on `$PATH`. Any language can implement the JSON stdin/stdout protocol. Compiled-in providers (OpenAI, Anthropic) are feature-gated for a minimal core binary.
- **JSONL storage.** One file per conversation instead of SQLite. Append-only, human-readable, no migrations.
- **Async-first.** Single `Provider` trait using futures streams, no sync/async class duplication.
- **TOML config.** Two files (`config.toml` + `keys.toml`) instead of six scattered JSON/YAML/text files.
- **Feature-gated providers.** Compile only the providers you need: `--features openai,anthropic` (both default), or `--no-default-features` for a minimal binary (build sketch after this list).
- **Multi-target.** Core crates compile for both native and `wasm32-unknown-unknown`. The same provider code runs in the CLI, in a browser, and in a Python module.
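A sketch of those builds (assuming the features are exposed on the `llm-cli` crate):
```bash
# Default: both providers compiled in
cargo build --release -p llm-cli
# OpenAI only
cargo build --release -p llm-cli --no-default-features --features openai
# Minimal core binary, no compiled-in providers
cargo build --release -p llm-cli --no-default-features
```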
See [`doc/design/architecture.md`](doc/design/architecture.md) for design rationale, [`doc/roadmap.md`](doc/roadmap.md) for the phased roadmap.
## Testing
```bash
cargo test --workspace # 530 tests across core workspace crates
```
| Crate | Tests | What's covered |
|-------|------:|----------------|
| `llm-core` | 198 | Types, config, keys, streams, schema DSL, chain loop, ChainEvent, ParallelConfig dispatch, messages, agent config, retry, budget (mock provider) |
| `llm-openai` | 44 | HTTP mocking (wiremock), SSE parsing, tool calls, structured output, multi-turn, HttpError mapping |
| `llm-anthropic` | 50 | HTTP mocking (wiremock), typed SSE, tool_use blocks, transparent schema wrapping, multi-turn, HttpError mapping |
| `llm-store` | 49 | JSONL round-trips, unicode, malformed recovery, listing/queries, message reconstruction |
| `llm-cli` | 189 | Subprocess protocol/discovery/execution, retry wrapper, dry-run rendering (62 unit), CLI integration (127 e2e with assert_cmd) |
Library targets are verified by their build toolchains: `wasm-pack build` for WASM, `maturin develop` for Python.
## Status
Current version: **v0.9**. Phases 1–9 complete. See [`doc/roadmap.md`](doc/roadmap.md) for the full status table and remaining work.
- **v0.1** --- CLI, WASM library, Python module; OpenAI + Anthropic providers end-to-end.
- **v0.2** --- Tool calling, chain loop, built-in tools, structured output, schema DSL.
- **v0.3** --- Multi-turn conversations, `-c`/`--cid`, `llm chat` REPL, full `llm logs`.
- **v0.4** --- Subprocess extensibility (`llm-tool-*`, `llm-provider-*`), `llm plugins`, `-v/--verbose`, `-o/--option`, aliases.
- **v0.5** --- Agent config & discovery (`llm agent run/list/show/init/path`).
- **v0.6** --- Budget tracking with cumulative usage and per-chain enforcement.
- **v0.7** --- Retry/backoff with exponential delay and jitter for transient HTTP errors.
- **v0.8** --- `--dry-run` for `llm agent run` (plain or `--json`).
- **v0.9** --- Parallel tool execution within a chain iteration, order-preserving, opt-out with `--sequential-tools`.
Next up: Ollama provider, attachments, extract flags. See the Future Work section of the roadmap. Sub-agent delegation and an agent memory system are explicitly parked — llm-rs delegates hierarchical workflows to [specialist tools](doc/cookbook/specialist-tools.md). See [the design note](doc/research/specialist-tools-vs-sub-agents.md) for the rationale.
**Out of scope:** token budget enforcement across nested invocations. Each `llm` call tracks its own budget; users who need hierarchical budget caps should do shell-level accounting (e.g., sum usage from `--json` output across a wrapping script).
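A sketch of that shell-level accounting (assuming the `--json` payload carries a `usage` object shaped like the log records; that shape is an assumption here):
```bash
# Hypothetical wrapper summing output tokens across sequential llm calls
total=0
for prompt in "step one" "step two"; do
  out=$(llm "$prompt" --json)
  total=$((total + $(echo "$out" | jq '.usage.output // 0')))
done
echo "total output tokens: $total" >&2
```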
## License
GPLv3