# Cache Benchmark Suite

A 21-turn, tool-heavy conversation benchmark that measures the effectiveness of Anthropic prompt caching.

## What it measures
- Per turn: input tokens, output tokens, cache-read tokens, cache-write tokens, cache hit rate (%), cost
- Session totals: aggregate cost, average hit rate, cache efficiency
- Latency: time to first token (TTFT) per turn, when streaming
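
As a rough illustration of the per-turn numbers, the sketch below derives a hit rate and a cost from the token counts in the Anthropic `usage` object (`input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`). The `TurnUsage` class, the function names, the hit-rate definition, and the price multipliers are assumptions for illustration, not the benchmark's actual code. Prices are passed in as parameters because they depend on the model.

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    """Token counts for one turn (hypothetical container, fields named after the API)."""
    input_tokens: int                 # uncached input tokens
    output_tokens: int
    cache_read_input_tokens: int      # input tokens served from the cache
    cache_creation_input_tokens: int  # input tokens written to the cache


def hit_rate(u: TurnUsage) -> float:
    """Fraction of all prompt tokens that were read from cache (0.0 to 1.0).

    ASSUMPTION: this is one common definition; the benchmark may use another.
    """
    total = u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens
    return u.cache_read_input_tokens / total if total else 0.0


def turn_cost(
    u: TurnUsage,
    input_per_mtok: float,
    output_per_mtok: float,
    write_mult: float = 1.25,  # ASSUMPTION: cache writes cost 1.25x base input
    read_mult: float = 0.10,   # ASSUMPTION: cache reads cost 0.1x base input
) -> float:
    """Cost of one turn in the same currency as the per-million-token prices."""
    per_tok_in = input_per_mtok / 1_000_000
    per_tok_out = output_per_mtok / 1_000_000
    return (
        u.input_tokens * per_tok_in
        + u.cache_creation_input_tokens * per_tok_in * write_mult
        + u.cache_read_input_tokens * per_tok_in * read_mult
        + u.output_tokens * per_tok_out
    )
```

For example, `hit_rate(TurnUsage(100, 50, 800, 100))` counts 800 cached tokens out of 1000 prompt tokens. Check the multipliers against current pricing before relying on the cost figure.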

## How it works
Each run simulates a realistic coding session:
1. Scaffolds a project in a temp dir
2. Fires 21 sequential prompts that create, read, edit, grep, and refactor files
3. Checks file state after each turn, so every answer is verifiable
4. Writes a JSONL log and a summary table
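
The verify-and-log part of the steps above could look roughly like the following. This is a minimal sketch, not the code in `bench/run.py`: `check_file_state`, `log_turn`, the substring-based check, and the record fields (`turn`, `verified`) are all hypothetical, and the model's tool call is replaced by a plain file write.

```python
import json
import tempfile
from pathlib import Path


def check_file_state(root: Path, expected: dict[str, str]) -> bool:
    """Return True if every expected file exists and contains its expected substring."""
    for rel_path, needle in expected.items():
        path = root / rel_path
        if not path.is_file() or needle not in path.read_text(encoding="utf-8"):
            return False
    return True


def log_turn(log_path: Path, record: dict) -> None:
    """Append one turn record to the log as a single JSON line."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def demo() -> list[dict]:
    """Scaffold a temp project, verify two turns, and return the logged records."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        log_path = root / "run.jsonl"  # kept in the temp dir only for this demo

        # Stand-in for a turn in which the model creates a file via a tool call.
        (root / "main.py").write_text("def greet():\n    return 'hello'\n", encoding="utf-8")
        log_turn(log_path, {"turn": 1, "verified": check_file_state(root, {"main.py": "def greet"})})

        # A turn whose expected file was never created fails verification.
        log_turn(log_path, {"turn": 2, "verified": check_file_state(root, {"missing.py": "x"})})

        return [json.loads(line) for line in log_path.read_text(encoding="utf-8").splitlines()]
```

Appending one JSON object per line keeps the log readable even if a run is interrupted partway through.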

## Usage
```bash
# Set auth
export ANTHROPIC_API_KEY=sk-ant-...

# Run with default caching (current strategy)
python3 bench/run.py

# Run with a specific strategy
python3 bench/run.py --strategy single-last     # 1 marker on last message
python3 bench/run.py --strategy sliding-4       # current: 2 markers, stride 4
python3 bench/run.py --strategy last-3          # OpenCode style: last 3 messages

# Compare runs
python3 bench/compare.py results/run-*.jsonl
```
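
To make the strategy names more concrete, here is a hedged sketch of how cache markers might be placed on a message list. `marker_indices` and `apply_markers` are hypothetical helpers, not functions from this repository. `single-last` and `last-3` follow the comments above. The `sliding-4` branch is only one possible reading of "2 markers, stride 4" (a marker on the last message and another four messages earlier), and the real strategy may differ. The `cache_control` value is the `{"type": "ephemeral"}` block from the Anthropic Messages API.

```python
import copy


def marker_indices(strategy: str, n_messages: int) -> list[int]:
    """Return the indices of the messages that would carry a cache marker."""
    if n_messages <= 0:
        return []
    last = n_messages - 1
    if strategy == "single-last":
        return [last]
    if strategy == "last-3":
        return list(range(max(0, n_messages - 3), n_messages))
    if strategy == "sliding-4":
        # ASSUMPTION: one marker on the last message, one 4 messages earlier.
        stride = 4
        earlier = last - stride
        return [earlier, last] if earlier >= 0 else [last]
    raise ValueError(f"unknown strategy: {strategy}")


def apply_markers(messages: list[dict], indices: list[int]) -> list[dict]:
    """Return a copy of messages with cache_control set on the marked ones.

    Expects each message's "content" to be a list of content blocks; the
    marker goes on the last block of each marked message.
    """
    marked = copy.deepcopy(messages)
    for i in indices:
        content = marked[i]["content"]
        if isinstance(content, list) and content:
            content[-1]["cache_control"] = {"type": "ephemeral"}
    return marked
```

Usage would be along the lines of `apply_markers(messages, marker_indices("last-3", len(messages)))` before sending the request. Working on a deep copy keeps the stored conversation free of markers, so each turn can place them afresh.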