# exp
A CLI experiment tracker for agent runs, prompt testing, and simulations.
exp helps you create experiments, define variables, execute runs, record structured results, and compare outcomes — all from the command line. It's designed to be used by both humans and autonomous agents.
## Features
- Domain-agnostic — works for LLM prompt testing, agent strategy evaluation, parameter sweeps, and scientific simulations
- Self-contained storage — single SQLite database file with artifacts stored as blobs
- Agent-friendly — built-in `guide` and `describe` commands let agents discover capabilities at runtime
- Composable — stdin/stdout interface works with shell scripts, Python, or any language
- Zero config — single binary, no runtime dependencies
## Installation

### From source

```sh
# Build from source with Cargo
cargo build --release

# Binary is at target/release/exp
```
## Quick Start

```sh
# Create an experiment

# Define variables

# Run and record results
RUN=
# ... run your evaluation ...

RUN=
# ... run your evaluation ...

# Compare results
```
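Filled in as a hedged sketch, an end-to-end session might read as follows. The subcommand names (`create`, `var set`, `run start`, `run record`, `compare`) are illustrative assumptions, not documented commands:

```sh
# Create an experiment (hypothetical subcommand names throughout)
exp create "prompt-style-test" --template prompt-ab

# Define variables
exp var set style concise verbose

# Run and record results
RUN=$(exp run start --set style=concise)
# ... run your evaluation ...
exp run record "$RUN" --metric score=0.82

RUN=$(exp run start --set style=verbose)
# ... run your evaluation ...
exp run record "$RUN" --metric score=0.65

# Compare results
exp compare --by style
```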
## Usage
### Experiment Lifecycle
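The exact lifecycle subcommands aren't pinned down in this README; as a sketch, a session might look like this (`create`, `list`, and `archive` are illustrative names, not documented):

```sh
exp create "prompt-style-test" --template prompt-ab   # hypothetical: new experiment from a template
exp list                                              # hypothetical: show existing experiments
exp archive prompt-style-test                         # hypothetical: retire a finished experiment
```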
### Variables
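A hedged sketch of defining variables, assuming a `var set` subcommand (an illustrative name, not documented here) that records the values a run may take:

```sh
# Hypothetical syntax: declare the dimensions runs can vary over
exp var set style concise verbose     # categorical variable
exp var set temperature 0.0 0.7 1.0   # numeric sweep values
```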
### Runs
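The Quick Start's `RUN=` pattern suggests a run id is captured from stdout. Assuming hypothetical `run start` and `run record` subcommands, that might look like:

```sh
RUN=$(exp run start --set style=concise)    # hypothetical: start a run, capture its id
# ... run your evaluation ...
exp run record "$RUN" --metric score=0.82   # hypothetical: attach a structured result
```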
### Analysis
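As an illustrative sketch of comparing and exporting outcomes (the `compare` and `export` subcommand names are assumptions):

```sh
exp compare --by style       # hypothetical: group outcomes by a variable
exp export --format json     # hypothetical: emit results on stdout for other tools
```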
### Agent Discovery
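The Features list names built-in `guide` and `describe` commands; their exact output formats aren't specified here, but invoking them is just:

```sh
exp guide      # usage guide an agent can read at runtime
exp describe   # machine-readable listing of capabilities (format assumed)
```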
### Templates

| Name | Description |
|---|---|
| `prompt-ab` | Compare prompt variants (A/B or multi-way) |
| `model-compare` | Same task across different models |
| `strategy-sweep` | Compare agent strategies/approaches |
| `param-sweep` | Sweep numeric parameters |
| `custom` | Blank — no pre-set variables |
## Examples
### Runnable Demo
A self-contained shell script walks through the full experiment lifecycle with simulated data.
This creates an experiment, defines variables, runs four prompt strategies, records results, and compares outcomes. Uses a temp directory — no cleanup needed.
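Assuming the demo lives under `examples/` (the exact filename isn't given in this README, so `demo.sh` is a guess), running it could be as simple as:

```sh
sh examples/demo.sh   # hypothetical path; the demo uses a temp dir, so no cleanup is needed
```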
### Using exp with an AI Agent

See `examples/agent-workflow.md` for a guide on instructing an AI agent to use exp. It covers what to put in your system prompt, the typical agent session flow, and tips for resumability, failure handling, and artifact management.
## Storage
All data lives in a single SQLite file at `.exp/experiments.db` (relative to the working directory). Override with the `EXP_DB` env var or `--db <path>`.
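Both override forms, shown here with the documented `guide` command:

```sh
EXP_DB=/tmp/experiments.db exp guide   # environment-variable override
exp --db /tmp/experiments.db guide     # flag override
```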
## License
MIT