rho-cli 0.1.28 - Docs.rs

# Red Team Hypotheses

Each YAML file in `hypotheses/` describes one attack hypothesis against the Rho controlled-action flow.

## Hypothesis Format

```yaml
version: 1
hypothesis:
  id: rt-001
  name: Short name
  category: path-substitution | summary-manipulation | identity-spoofing | prompt-injection
  target: run_real_data | release_results
  description: What the attacker tries to achieve.
  attack_prompt: |
    The prompt given to the attacker agent instead of the normal task prompt.
    This replaces the cooperative agent2 prompt with an adversarial one.
  success_condition:
    type: relay_rejects | relay_accepts_wrong_path | output_contains
    details: How the referee decides if the attack succeeded or was blocked.
  expected_outcome: blocked | succeeded
  notes: Any extra context.
```

## How It Works

1. The scenario runner sets up the normal two-agent flow (dataset, request, sandbox mounts)
2. Instead of the cooperative agent2 prompt, it injects the attack hypothesis prompt
3. Agent2's sandboxed LLM tries to exploit the system
4. The relay (with manifest binding) validates the resulting controlled-action payload
5. A referee checks: did the relay block it? did it produce wrong outputs?

## Running

```bash
python3 tests/redteam/run_hypothesis.py tests/redteam/hypotheses/rt-001-path-substitution.yaml
```

Or run all:

```bash
python3 tests/redteam/run_hypothesis.py tests/redteam/hypotheses/
```