# testwall
CLI tool that enforces test immutability for agentic TDD workflows. Prevents implementing agents from cheating test gates by snapshotting test files, locking them read-only, and verifying integrity before accepting implementation results.
## Background
LLM coding agents routinely cheat test gates by weakening assertions, deleting failing tests, modifying test config, or special-casing test inputs. Research (ImpossibleBench, arXiv 2510.20270) shows frontier models exploit test cases 76% of the time when given write access, but cheating drops to near zero when tests are hidden or read-only. testwall enforces that boundary.
## Architecture
Single-binary Rust CLI. No runtime dependencies. The core mechanism:
1. **Snapshot** test files + compute SHA-256 checksums → `.testwall/manifest.json` + `.testwall/snapshot/`
2. **Lock** test files read-only (`chmod 444`) in the working tree
3. **Run** always restores from snapshot before executing the test runner — even if the agent bypassed file permissions, the real tests execute
4. **Verify** compares current checksums against manifest — exits nonzero on any mismatch
5. **Accept** is the merge gate — verify + unlock + clean up snapshot
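
The verify step above can be sketched as a comparison between two path-to-checksum maps. This is a minimal illustration, not the code in `src/main.rs`: the `Mismatch` enum and the `verify` and `exit_code` functions are hypothetical names, and hashing is left out so the example needs only the standard library.

```rust
use std::collections::BTreeMap;

/// One discrepancy between the recorded manifest and the working tree.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum Mismatch {
    /// File is listed in the manifest but its checksum differs.
    Modified(String),
    /// File is listed in the manifest but is no longer present.
    Missing(String),
}

/// Compare recorded checksums against freshly computed ones.
/// Both maps go from relative path to hex digest; computing the
/// digests (SHA-256 in testwall) is assumed to happen elsewhere.
fn verify(
    manifest: &BTreeMap<String, String>,
    current: &BTreeMap<String, String>,
) -> Vec<Mismatch> {
    let mut mismatches = Vec::new();
    for (path, recorded) in manifest {
        match current.get(path) {
            Some(actual) if actual == recorded => {}
            Some(_) => mismatches.push(Mismatch::Modified(path.clone())),
            None => mismatches.push(Mismatch::Missing(path.clone())),
        }
    }
    mismatches
}

/// Map the result to a process exit code: 0 when clean, 1 on any mismatch.
fn exit_code(mismatches: &[Mismatch]) -> i32 {
    if mismatches.is_empty() {
        0
    } else {
        1
    }
}
```

Because `BTreeMap` iterates in key order, mismatches come out sorted by path, which keeps reports stable between runs.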
## Current State
- **Rust source** (`src/main.rs`): Complete, compiles, includes unit tests for SHA-256, glob matching, and file permissions. ~550 lines, single-file.
- **Python reference** (`testwall.py`): Functionally identical implementation used for integration testing. The Rust binary has reached full parity, so this can be removed, or kept only as a test oracle if useful.
- **Integration tested**: Full workflow validated — init, lock, verify (clean), tamper simulation, verify (catches it), accept (rejects), run (restores from snapshot), accept (passes).
- **Not yet done**: See roadmap below.
## Commands
```
testwall init [-p PATTERN...] [-c CMD] # Snapshot test files, record checksums
testwall lock # Set test files read-only
testwall unlock # Restore write permissions
testwall run [-c CMD] [-- extra args] # Restore from snapshot + execute tests
testwall verify [--report-only] # Check checksums, exit 1 on mismatch
testwall accept # Verify + unlock + clean snapshot
testwall status # Show current testwall state
```
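
As a rough sketch of what `lock` and `unlock` do to each file, the following uses the standard library's Unix permission extension. The function names are hypothetical, the code is Unix-only, and the `0o644` mode used when unlocking is an assumption; the actual implementation may restore permissions differently.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt; // Unix-only: exposes raw mode bits
use std::path::Path;

/// Make a file read-only for everyone, the equivalent of `chmod 444`.
fn lock_file(path: &Path) -> io::Result<()> {
    fs::set_permissions(path, fs::Permissions::from_mode(0o444))
}

/// Restore owner write access. 0o644 is an assumed default; a real tool
/// might record and restore each file's original mode instead.
fn unlock_file(path: &Path) -> io::Result<()> {
    fs::set_permissions(path, fs::Permissions::from_mode(0o644))
}

/// Report whether any write bit (owner, group, or other) is set.
fn is_writable(path: &Path) -> io::Result<bool> {
    let mode = fs::metadata(path)?.permissions().mode();
    Ok(mode & 0o222 != 0)
}
```

File permissions are advisory against a determined process (the owner can simply `chmod` the file back), which is why `run` restores from the snapshot regardless.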
## Roadmap (priority order)
### 1. Git hook integration
Add `testwall install-hooks` that drops a `pre-commit` hook running `testwall verify`. This blocks commits of tampered tests even if the agent bypassed file permissions during its session. Note that `git commit --no-verify` skips pre-commit hooks, so the hook is a guardrail rather than a guarantee.
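
One way the proposed `install-hooks` command might write the hook is sketched below. Nothing here exists yet: `install_pre_commit` and the hook body are hypothetical, the code is Unix-only, and it assumes the standard `.git/hooks/` directory (it ignores `core.hooksPath` and linked worktrees, where `.git` is a file).

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt; // Unix-only
use std::path::{Path, PathBuf};

/// Hook body: the commit is aborted when `testwall verify` exits nonzero.
const PRE_COMMIT: &str =
    "#!/bin/sh\n# Installed by testwall (hypothetical install-hooks)\nexec testwall verify\n";

/// Write an executable pre-commit hook under `<repo>/.git/hooks/`.
/// Refuses to overwrite a hook that is already there.
fn install_pre_commit(repo_root: &Path) -> io::Result<PathBuf> {
    let hooks_dir = repo_root.join(".git").join("hooks");
    fs::create_dir_all(&hooks_dir)?;
    let hook = hooks_dir.join("pre-commit");
    if hook.exists() {
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            "pre-commit hook already exists",
        ));
    }
    fs::write(&hook, PRE_COMMIT)?;
    // Git only runs hooks that are executable.
    fs::set_permissions(&hook, fs::Permissions::from_mode(0o755))?;
    Ok(hook)
}
```

Refusing to overwrite is a conservative choice for the sketch; chaining to an existing hook would be friendlier but needs more care.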
### 2. Config hardening (`--strict` mode)
Extend `init` to also snapshot test runner config that agents use to cheat without touching test files directly:
- `conftest.py`, `pytest.ini`, `setup.cfg`, `tox.ini`
- `jest.config.*`, `vitest.config.*`, `.babelrc`
- `Makefile`, `justfile` (if they contain test targets)
- `.cargo/config.toml`
- CI config (`.github/workflows/`, `.gitlab-ci.yml`)
Some of these are already in the default patterns, but `--strict` should be explicit and aggressive.
### 3. Multi-agent session orchestration
The novel feature. Formalize the two-agent workflow:
```
testwall session new --tests-from <branch> # Pull tests, lock, create worktree
testwall session submit # Verify + PR/merge implementation
```
Agent A writes tests on a branch. Agent B implements in an isolated worktree where test files are immutable. `session submit` is the gate.
### 4. Publishing
- `cargo publish` to crates.io (name `testwall` is available)
- `pip install testwall` via pyproject.toml entry point (if keeping Python version)
- `npm` wrapper package that downloads the binary (like `esbuild` does)
- GitHub Actions workflow for cross-platform release binaries
### 5. Polish
- `testwall diff` — show what the agent changed in test files (before/after from snapshot)
- `testwall restore` — restore test files from snapshot without running tests
- `--watch` mode for `verify` — continuous integrity monitoring during agent sessions
- JSON output mode (`--json`) for CI integration
- Configurable exclusion patterns (`--exclude`)
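
The proposed `testwall restore` would share its core with the restore step that `run` already performs. A minimal sketch of that step, assuming the snapshot directory mirrors the working tree's relative paths (the function name and signature are hypothetical):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy every listed file from the snapshot directory back into the
/// working tree, replacing whatever is currently there.
/// Returns the number of files restored.
fn restore_from_snapshot(
    snapshot_dir: &Path,
    work_dir: &Path,
    files: &[&str],
) -> io::Result<usize> {
    for rel in files {
        let src = snapshot_dir.join(rel);
        let dst = work_dir.join(rel);
        if let Some(parent) = dst.parent() {
            fs::create_dir_all(parent)?;
        }
        // Remove first: a read-only target would make a plain overwrite fail.
        match fs::remove_file(&dst) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
        fs::copy(&src, &dst)?;
    }
    Ok(files.len())
}
```

On Unix, `fs::copy` also copies the source file's permission bits, so whether the restored files end up locked depends on how the snapshot copies were stored.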
## Development
```bash
cargo build # Build
cargo test # Run unit tests
cargo run -- init # Run locally
```
## Design Decisions
- **Single file**: Kept everything in `src/main.rs` intentionally. Modularize when it gets past ~800 lines.
- **BTreeMap for files**: Deterministic ordering in manifest JSON for clean diffs.
- **Custom glob matching**: Avoids pulling in the `glob` crate for a small set of patterns. Supports `*`, `?`, `**` prefix, and `/**/` middle. If this gets more complex, switch to the `globset` crate.
- **Snapshot dir in .gitignore**: The snapshot contains copies of test files — it's ephemeral working state, not source of truth.
- **Manifest kept after accept**: Audit trail. You can see what was locked and when.
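
To give a sense of how small a custom glob matcher can stay, here is a sketch that handles `*`, `?`, and `**` as a whole path segment. It is not the matcher in `src/main.rs`; the function names are hypothetical, and it accepts `**` in any segment position, which is slightly more general than the prefix and middle forms listed above.

```rust
/// Match one path segment against one pattern segment.
/// `*` matches any run of characters, `?` matches exactly one.
fn match_segment(pat: &[char], text: &[char]) -> bool {
    match pat.split_first() {
        None => text.is_empty(),
        Some((&'*', rest)) => (0..=text.len()).any(|skip| match_segment(rest, &text[skip..])),
        Some((&'?', rest)) => !text.is_empty() && match_segment(rest, &text[1..]),
        Some((&c, rest)) => text.first() == Some(&c) && match_segment(rest, &text[1..]),
    }
}

/// Match whole segments; a `**` segment matches zero or more directories.
fn match_segments(pat: &[&str], path: &[&str]) -> bool {
    match pat.split_first() {
        None => path.is_empty(),
        Some((seg, rest)) if *seg == "**" => {
            (0..=path.len()).any(|skip| match_segments(rest, &path[skip..]))
        }
        Some((seg, rest)) => match path.split_first() {
            Some((first, tail)) => {
                let p: Vec<char> = seg.chars().collect();
                let t: Vec<char> = first.chars().collect();
                match_segment(&p, &t) && match_segments(rest, tail)
            }
            None => false,
        },
    }
}

/// Match a `/`-separated relative path against a glob pattern.
fn glob_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}
```

Splitting on `/` first means `*` can never cross a directory boundary. The backtracking is exponential in the worst case, which is acceptable for a handful of short patterns but is one more reason to move to `globset` if the pattern set grows.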