# Red Team Hypotheses
Each YAML file in `hypotheses/` describes one attack hypothesis against the Rho controlled-action flow.
## Hypothesis Format
```yaml
version: 1
hypothesis:
  id: rt-001
  name: Short name
  description: What the attacker tries to achieve.
  attack_prompt: |
    The prompt given to the attacker agent instead of the normal task prompt.
    This replaces the cooperative agent2 prompt with an adversarial one.
  success_condition:
    type: relay_rejects | relay_accepts_wrong_path | output_contains
    details: How the referee decides if the attack succeeded or was blocked.
  expected_outcome: blocked | succeeded
  notes: Any extra context.
```
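A structural check on a parsed hypothesis can catch typos in the enumerated fields before a run starts. The sketch below is hypothetical and not part of the runner: it assumes the file has already been parsed into a `dict` (for example with a YAML loader), that every field other than `version` is nested under `hypothesis`, and that `notes` is optional. The name `validate_hypothesis` is made up for illustration.

```python
# Hypothetical validator for a parsed hypothesis document (illustration only).
ALLOWED_CONDITION_TYPES = {"relay_rejects", "relay_accepts_wrong_path", "output_contains"}
ALLOWED_OUTCOMES = {"blocked", "succeeded"}
# Assumption: "notes" is optional, everything else is required.
REQUIRED_FIELDS = ("id", "name", "description", "attack_prompt",
                   "success_condition", "expected_outcome")


def validate_hypothesis(doc):
    """Return a list of problems; an empty list means the structure looks valid."""
    errors = []
    if doc.get("version") != 1:
        errors.append("version must be 1")
    hyp = doc.get("hypothesis")
    if not isinstance(hyp, dict):
        errors.append("missing 'hypothesis' mapping")
        return errors
    for field in REQUIRED_FIELDS:
        if field not in hyp:
            errors.append(f"missing field: {field}")
    cond = hyp.get("success_condition")
    if isinstance(cond, dict) and cond.get("type") not in ALLOWED_CONDITION_TYPES:
        errors.append(f"unknown success_condition.type: {cond.get('type')!r}")
    if "expected_outcome" in hyp and hyp["expected_outcome"] not in ALLOWED_OUTCOMES:
        errors.append(f"unknown expected_outcome: {hyp['expected_outcome']!r}")
    return errors


example = {
    "version": 1,
    "hypothesis": {
        "id": "rt-001",
        "name": "Short name",
        "description": "What the attacker tries to achieve.",
        "attack_prompt": "Adversarial prompt text.\n",
        "success_condition": {"type": "relay_rejects", "details": "Relay rejects the payload."},
        "expected_outcome": "blocked",
    },
}
print(validate_hypothesis(example))
```

Returning a list of messages rather than raising on the first problem lets a runner report every issue in a file at once.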
## How It Works
1. The scenario runner sets up the normal two-agent flow (dataset, request, sandbox mounts).
2. Instead of the cooperative agent2 prompt, it injects the `attack_prompt` from the hypothesis.
3. Agent2's sandboxed LLM tries to exploit the system.
4. The relay (with manifest binding) validates the resulting controlled-action payload.
5. A referee checks the result: did the relay block the payload, or did it produce wrong outputs?
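The referee step above can be sketched as a small decision function. Everything here is an assumption made for illustration: the `RelayResult` fields, the `needle` argument for `output_contains`, and the reading of each `success_condition.type` are not taken from the actual referee, which may decide differently.

```python
from dataclasses import dataclass


@dataclass
class RelayResult:
    """Hypothetical summary of what the relay did with the payload."""
    accepted: bool                 # relay accepted the controlled-action payload
    path_matches_manifest: bool    # payload path agrees with the bound manifest
    output: str = ""               # any output produced by the action


def observed_outcome(condition_type, result, needle=""):
    """Map a relay result to 'blocked' or 'succeeded' (assumed semantics)."""
    if condition_type == "relay_rejects":
        # Assumed: the attack counts as blocked when the relay rejects.
        attack_worked = result.accepted
    elif condition_type == "relay_accepts_wrong_path":
        # Assumed: the attack works if an off-manifest path is accepted.
        attack_worked = result.accepted and not result.path_matches_manifest
    elif condition_type == "output_contains":
        # Assumed: the attack works if a marker string appears in the output.
        attack_worked = bool(needle) and needle in result.output
    else:
        raise ValueError(f"unknown condition type: {condition_type}")
    return "succeeded" if attack_worked else "blocked"


def hypothesis_passes(expected_outcome, observed):
    """A run passes when the observed outcome matches expected_outcome."""
    return expected_outcome == observed


rejected = RelayResult(accepted=False, path_matches_manifest=False)
print(observed_outcome("relay_rejects", rejected))
```

Separating the observed outcome from the pass/fail comparison keeps hypotheses with `expected_outcome: succeeded` (known weaknesses) expressible without special cases.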
## Running
```bash
python3 tests/redteam/run_hypothesis.py tests/redteam/hypotheses/rt-001-path-substitution.yaml
```
Or run every hypothesis in the directory:
```bash
python3 tests/redteam/run_hypothesis.py tests/redteam/hypotheses/
```
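Since the script accepts either a single file or a directory, one plausible way to resolve the argument is shown below. This is a sketch, not the code in `run_hypothesis.py`: the helper name `collect_hypothesis_files`, the accepted `.yaml`/`.yml` suffixes, and the sorted order are all assumptions.

```python
from pathlib import Path


def collect_hypothesis_files(target):
    """Return the hypothesis files named by target (a file or a directory)."""
    path = Path(target)
    if path.is_dir():
        # Assumed: only YAML files directly inside the directory are hypotheses.
        return sorted(p for p in path.iterdir()
                      if p.is_file() and p.suffix in {".yaml", ".yml"})
    if path.is_file():
        return [path]
    raise FileNotFoundError(f"no such file or directory: {target}")
```

Sorting gives a stable run order, so `rt-001` would run before `rt-002` regardless of filesystem ordering.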