exp 0.1.2

CLI experiment tracker for agent runs, prompt testing, and simulations

exp

A CLI experiment tracker for agent runs, prompt testing, and simulations.

exp helps you create experiments, define variables, execute runs, record structured results, and compare outcomes — all from the command line. It's designed to be used by both humans and autonomous agents.

Features

  • Domain-agnostic — works for LLM prompt testing, agent strategy evaluation, parameter sweeps, and scientific simulations
  • Self-contained storage — single SQLite database file with artifacts stored as blobs
  • Agent-friendly — built-in guide and describe commands let agents discover capabilities at runtime
  • Composable — stdin/stdout interface works with shell scripts, Python, or any language
  • Zero config — single binary, no runtime dependencies

Installation

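From a local checkout

# Run from the repository root; Cargo installs the exp binary into ~/.cargo/bin by default
cargo install --path .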

Build from source

git clone https://github.com/jstockdi/exp.git
cd exp
cargo build --release
# Binary is at target/release/exp

Quick Start

# Create an experiment
exp create "cot-eval" --description "Test chain-of-thought on legal docs"

# Define variables
exp var set cot-eval --control model=claude-sonnet-4-20250514
exp var set cot-eval --independent strategy="direct,cot,cot+fanout"

# Run and record results
RUN=$(exp run start cot-eval --strategy="direct")
# ... run your evaluation ...
exp run record "$RUN" --output '{"accuracy": 0.85, "tokens": 940}'

RUN=$(exp run start cot-eval --strategy="cot")
# ... run your evaluation ...
exp run record "$RUN" --output '{"accuracy": 0.92, "tokens": 1240}'

# Compare results
exp compare cot-eval --sort-by accuracy
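
The Quick Start records two of the three strategy values by hand. Below is a minimal sketch of sweeping every value in a loop. It uses only the documented run start / run record commands, with the result JSON piped in through stdin (--output -). The evaluate function is a hypothetical placeholder for your own evaluation and must print one JSON object. Its metric values are dummies, not real results.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder: replace with your real evaluation.
# It must print a JSON object on stdout (the values here are dummies).
evaluate() {
  printf '{"strategy": "%s", "accuracy": 0.0}\n' "$1"
}

# Hypothetical helper: sweep_strategies <experiment> <value>...
sweep_strategies() {
  local experiment=$1
  shift
  local strategy run
  for strategy in "$@"; do
    # exp run start prints the new run ID on stdout
    run=$(exp run start "$experiment" --strategy="$strategy")
    # --output - makes exp read the result JSON from stdin
    evaluate "$strategy" | exp run record "$run" --output -
  done
}
```

For example, sweep_strategies cot-eval direct cot cot+fanout would create and record one run per strategy.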

Usage

Experiment Lifecycle

exp create <name>                    # Create a new experiment
exp list [--status <status>]         # List experiments
exp status <experiment>              # Show experiment status
exp delete <experiment> [--force]    # Delete an experiment

Variables

exp var set <experiment> --control <name>=<value>           # Set a constant
exp var set <experiment> --independent <name>="val1,val2"   # Set a variable that changes per run
exp var list <experiment>                                    # List variables
exp var rm <experiment> <name>                               # Remove a variable

Runs

exp run start <experiment> --key=value    # Start a run (prints run ID)
exp run record <run-id> --output <json>   # Record results from a JSON file, stdin (-), or inline JSON
exp run fail <run-id> [--reason <text>]   # Mark a run as failed
exp run comment <run-id> <text>           # Add a comment to a run
exp run artifact <run-id> <file>          # Attach a file artifact
exp run list <experiment>                 # List runs
exp run show <run-id>                     # Show run details
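
These subcommands can be combined into a small wrapper. The sketch below defines a hypothetical exp_try function. It starts a run and executes an arbitrary command, then records that command's stdout as the result through stdin. If the command exits non-zero, the run is instead marked failed with the exit status as the reason. It assumes the command prints only JSON on stdout, and it sends exp's own output to stderr so that only the run ID reaches the caller.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exp_try <experiment> <name=value> <command> [args...]
# Prints the run ID on stdout and returns the command's exit status.
exp_try() {
  local experiment=$1 assignment=$2
  shift 2
  local run out status=0
  # exp run start prints the new run ID
  run=$(exp run start "$experiment" "--$assignment") || return 1
  out=$(mktemp) || return 1
  if "$@" > "$out"; then
    # Feed the captured JSON to exp via stdin; exp's own output goes to stderr
    exp run record "$run" --output - < "$out" >&2
  else
    status=$?
    exp run fail "$run" --reason "command exited with status $status" >&2
  fi
  rm -f "$out"
  echo "$run"
  return "$status"
}
```

For example, run=$(exp_try cot-eval strategy=cot ./eval.sh cot), where ./eval.sh is a placeholder for your evaluation. The returned ID can then be passed to exp run comment or exp run artifact.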

Analysis

exp compare <experiment>                  # Compare all runs side by side
exp compare <experiment> --sort-by accuracy --desc
exp compare <experiment> --where "tokens<2000"
exp compare <experiment> --format csv
exp export <experiment> --format json     # Export full data
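
To keep a snapshot of an experiment next to your code or reports, redirect the documented --format outputs to files. This is a minimal sketch. The snapshot_experiment name and the results/ directory layout are illustrative assumptions.

```shell
#!/usr/bin/env bash

# Hypothetical helper: snapshot_experiment <experiment> [dir]
snapshot_experiment() {
  local experiment=$1 dir=${2:-results}
  mkdir -p "$dir" || return 1
  # Side-by-side comparison of all runs as CSV (e.g. for a spreadsheet)
  exp compare "$experiment" --format csv > "$dir/$experiment.csv" || return 1
  # Full experiment data as JSON
  exp export "$experiment" --format json > "$dir/$experiment.json" || return 1
}
```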

Agent Discovery

exp guide                     # Full usage walkthrough
exp guide --format json       # Structured for programmatic consumption
exp templates                 # List built-in templates
exp describe <experiment>     # Introspect current state and remaining work
exp plan <experiment>         # Generate shell script for remaining runs
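
Because exp plan emits a shell script, a person or an agent can save it, review it, and then execute it. This is a minimal sketch that assumes the generated script can be run with sh. Check its shebang line, if it has one. run_plan is a hypothetical name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run_plan <experiment>
run_plan() {
  local experiment=$1 script
  script=$(mktemp) || return 1
  exp plan "$experiment" > "$script" || { rm -f "$script"; return 1; }
  # Show the generated plan on stderr before executing it
  cat "$script" >&2
  sh "$script"
  local status=$?
  rm -f "$script"
  return "$status"
}
```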

Templates

Name             Description
prompt-ab        Compare prompt variants (A/B or multi-way)
model-compare    Same task across different models
strategy-sweep   Compare agent strategies/approaches
param-sweep      Sweep numeric parameters
custom           Blank, no pre-set variables

Apply a template when creating an experiment:

exp create my-test --template prompt-ab

Examples

Runnable Demo

A self-contained shell script that walks through the full experiment lifecycle with simulated data:

./examples/prompt-eval.sh

This creates an experiment, defines variables, runs four prompt strategies, records results, and compares outcomes. It uses a temporary directory, so no cleanup is needed.

Using exp with an AI Agent

See examples/agent-workflow.md for a guide on instructing an AI agent to use exp. It covers what to put in your system prompt, the typical agent session flow, and tips on resumability, failure handling, and artifact management.

Storage

All data lives in a single SQLite file at .exp/experiments.db, relative to the current working directory. Override the location with the EXP_DB environment variable or the --db <path> flag.
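
Because the database location can be overridden, throwaway experiments can be kept out of the project's .exp/experiments.db. This is a minimal sketch using EXP_DB. with_scratch_db is a hypothetical helper, and the filename inside the temporary directory is arbitrary.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command with EXP_DB pointing at a temporary database,
# then delete that database when the command finishes.
with_scratch_db() {
  local dir status
  dir=$(mktemp -d) || return 1
  EXP_DB="$dir/experiments.db" "$@"
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

For example, with_scratch_db ./my-trial.sh (a placeholder script) would run every exp call in that script against a database that is removed afterwards, since child processes inherit EXP_DB.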

License

MIT