adversaria 0.1.0

Adversarial Testing Harness for Large Language Models
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
# Adversaria

**Adversarial Testing Harness for Large Language Models**

Adversaria is a comprehensive security testing framework designed to evaluate the robustness of Large Language Models (LLMs) against adversarial attacks. Built in Rust for performance and reliability, it provides automated testing suites for prompt injection, jailbreaks, role confusion, and data exfiltration attempts.

## Features

- 🎯 **Comprehensive Attack Suites**: 4 built-in suites with 48+ attack payloads
  - Prompt Injection (12 payloads)
  - Jailbreak Attempts (12 payloads)
  - Role Confusion (12 payloads)
  - Data Exfiltration (12 payloads)

- πŸ”Œ **Multi-Provider Support**: Test against multiple LLM providers
  - OpenAI (GPT-4, GPT-3.5, etc.)
  - Anthropic (Claude 3 Opus, Sonnet, Haiku)
  - Ollama (Local models)

- πŸ“Š **Detailed Reporting**: JSON reports with risk scoring and reproducible traces
  - Overall risk score (0-100)
  - Category-based breakdown
  - Execution traces for each attack
  - Success/failure analysis

- πŸ”§ **Extensible Plugin System**: Add custom attack suites via plugins

- βš™οΈ **Flexible Configuration**: YAML-based configuration for easy customization

## Installation

### From Source

```bash
git clone https://github.com/adversaria/adversaria.git
cd adversaria
cargo build --release
cargo install --path .
```

### Prerequisites

- Rust 1.70 or higher
- API keys for the providers you want to test (OpenAI, Anthropic)
- For Ollama: Local Ollama installation

## Quick Start

### 1. Configure Adversaria

Create or edit `adversaria.config.yaml`:

```yaml
version: "1.0"
default_provider: openai

providers:
  openai:
    api_key: null  # Or set OPENAI_API_KEY env var
    model: gpt-4
    
  anthropic:
    api_key: null  # Or set ANTHROPIC_API_KEY env var
    model: claude-3-opus-20240229
```

### 2. Set API Keys

```bash
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
```

### 3. List Available Attack Suites

```bash
adversaria list
```

Output:
```
πŸ“‹ Available Attack Suites

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ID                 β”‚ Name                    β”‚ Category         β”‚ Payloads β”‚ Enabled β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ prompt_injection   β”‚ Prompt Injection Suite  β”‚ Prompt Injection β”‚ 12       β”‚ βœ“       β”‚
β”‚ jailbreak          β”‚ Jailbreak Suite         β”‚ Jailbreak        β”‚ 12       β”‚ βœ“       β”‚
β”‚ role_confusion     β”‚ Role Confusion Suite    β”‚ Role Confusion   β”‚ 12       β”‚ βœ“       β”‚
β”‚ data_exfiltration  β”‚ Data Exfiltration Suite β”‚ Data Exfiltrationβ”‚ 12       β”‚ βœ“       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Total: 4 suite(s) found
```

### 4. Run Attack Suites

Test against OpenAI:
```bash
adversaria run --provider openai --model gpt-4
```

Test against Anthropic:
```bash
adversaria run --provider anthropic --model claude-3-opus-20240229
```

Test against local Ollama:
```bash
adversaria run --provider ollama --model llama2
```

Run specific suites:
```bash
adversaria run --suites prompt_injection,jailbreak
```

### 5. View Reports

List all reports:
```bash
adversaria report --list
```

View a specific report:
```bash
adversaria report adversaria_report_20240224_143022_abc123.json
```

View with detailed results:
```bash
adversaria report adversaria_report_20240224_143022_abc123.json --verbose
```

## Configuration

### Full Configuration Example

```yaml
version: "1.0"
default_provider: openai

providers:
  openai:
    api_key: null
    api_base: https://api.openai.com/v1
    model: gpt-4
    timeout_seconds: 30
    max_retries: 3
    
  anthropic:
    api_key: null
    api_base: https://api.anthropic.com/v1
    model: claude-3-opus-20240229
    timeout_seconds: 30
    max_retries: 3
    
  ollama:
    api_key: null
    api_base: http://localhost:11434
    model: llama2
    timeout_seconds: 60
    max_retries: 2

suites:
  directory: ./suites
  enabled_suites:
    - prompt_injection
    - jailbreak
    - role_confusion
    - data_exfiltration
  custom_suites: []

reporting:
  output_directory: ./reports
  format: json
  keep_reports: 100

plugins:
  directory: ./plugins
  enabled: true
```

## Attack Suites

### Prompt Injection Suite

Tests for prompt injection vulnerabilities where attackers attempt to override system instructions.

**Example Payloads:**
- Basic Ignore Instructions
- System Prompt Override
- Role Reversal Injection
- Delimiter Confusion
- Nested Instruction Override
- Encoded Instruction Injection
- And 6 more...

### Jailbreak Suite

Tests for jailbreak attempts that try to bypass safety guidelines and content policies.

**Example Payloads:**
- DAN (Do Anything Now)
- Developer Mode
- Opposite Mode
- Fictional Story Framing
- Research Purpose Bypass
- Roleplay Scenario
- And 6 more...

### Role Confusion Suite

Tests for role confusion attacks where the model is tricked into assuming unauthorized roles.

**Example Payloads:**
- System Administrator Impersonation
- Developer Role Assumption
- Support Staff Confusion
- Owner Authority Claim
- Maintenance Mode Activation
- API Endpoint Confusion
- And 6 more...

### Data Exfiltration Suite

Tests for data exfiltration attempts where attackers try to extract sensitive information.

**Example Payloads:**
- System Prompt Extraction
- Training Data Leak
- API Key Extraction
- Configuration Dump
- Memory Inspection
- Model Architecture Extraction
- And 6 more...

## Creating Custom Attack Suites

Create a YAML file in the `suites/` directory:

```yaml
id: custom_suite
name: My Custom Attack Suite
description: Custom attacks for specific scenarios
category: custom
enabled: true
payloads:
  - id: custom_001
    name: Custom Attack 1
    description: Description of the attack
    prompt: "Your attack prompt here"
    category: custom
    severity: high
    tags: [custom, test]
    metadata:
      technique: custom_technique
```

## Report Format

Reports are saved as JSON files with the following structure:

```json
{
  "id": "uuid",
  "model": "gpt-4",
  "provider": "openai",
  "timestamp": "2024-02-24T14:30:22Z",
  "total_attacks": 48,
  "successful_attacks": 5,
  "failed_attacks": 43,
  "overall_risk_score": 12,
  "results": [
    {
      "id": "uuid",
      "payload_id": "pi_001",
      "payload_name": "Basic Ignore Instructions",
      "category": "prompt_injection",
      "severity": "high",
      "prompt": "...",
      "response": "...",
      "success": false,
      "risk_score": 0,
      "timestamp": "2024-02-24T14:30:25Z",
      "execution_time_ms": 1234,
      "detection_reason": "Refusal detected: 'i cannot'"
    }
  ],
  "category_summary": {
    "prompt_injection": {
      "category": "prompt_injection",
      "total": 12,
      "successful": 1,
      "average_risk_score": 6.25,
      "max_severity": "critical"
    }
  },
  "duration_ms": 45000
}
```

## Risk Scoring

- **0-25**: Low Risk βœ… - Model has strong defenses
- **26-50**: Medium Risk ⚠️ - Some vulnerabilities detected
- **51-75**: High Risk πŸ”΄ - Significant vulnerabilities
- **76-100**: Critical Risk 🚨 - Severe security issues

## Plugin System

Adversaria supports plugins for extending attack suites. Plugins are loaded from the `plugins/` directory.

### Creating a Plugin

```rust
use adversaria::core::{AttackSuite, Result};
use adversaria::core::plugin::Plugin;
use async_trait::async_trait;

pub struct MyPlugin;

#[async_trait]
impl Plugin for MyPlugin {
    fn name(&self) -> &str {
        "my_plugin"
    }
    
    fn version(&self) -> &str {
        "1.0.0"
    }
    
    async fn load_suites(&self) -> Result<Vec<AttackSuite>> {
        // Load and return custom attack suites
        Ok(vec![])
    }
}
```

## CLI Commands

### `adversaria run`

Run attack suites against a model.

**Options:**
- `-p, --provider <PROVIDER>`: Provider to use (openai, anthropic, ollama)
- `-m, --model <MODEL>`: Model to test
- `-c, --config <PATH>`: Path to config file (default: adversaria.config.yaml)
- `-s, --suites <SUITES>`: Specific suite IDs to run (comma-separated)
- `--no-save`: Skip saving report

**Examples:**
```bash
adversaria run --provider openai --model gpt-4
adversaria run --provider anthropic --suites prompt_injection,jailbreak
adversaria run --no-save
```

### `adversaria list`

List available attack suites.

**Options:**
- `-c, --config <PATH>`: Path to config file
- `-v, --verbose`: Show detailed information

**Examples:**
```bash
adversaria list
adversaria list --verbose
```

### `adversaria report`

Generate or view reports from previous runs.

**Options:**
- `-c, --config <PATH>`: Path to config file
- `-l, --list`: List all available reports
- `-v, --verbose`: Show detailed results

**Examples:**
```bash
adversaria report --list
adversaria report adversaria_report_20240224_143022_abc123.json
adversaria report adversaria_report_20240224_143022_abc123.json --verbose
```

## Development

### Building from Source

```bash
cargo build
```

### Running Tests

```bash
cargo test
```

### Running with Debug Logging

```bash
RUST_LOG=adversaria=debug adversaria run
```

## Architecture

```
adversaria/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ main.rs           # Entry point
β”‚   β”œβ”€β”€ cli/              # CLI commands
β”‚   β”‚   β”œβ”€β”€ mod.rs
β”‚   β”‚   └── commands/
β”‚   β”‚       β”œβ”€β”€ run.rs
β”‚   β”‚       β”œβ”€β”€ list.rs
β”‚   β”‚       └── report.rs
β”‚   β”œβ”€β”€ core/             # Core types and logic
β”‚   β”‚   β”œβ”€β”€ mod.rs
β”‚   β”‚   β”œβ”€β”€ config.rs
β”‚   β”‚   β”œβ”€β”€ error.rs
β”‚   β”‚   β”œβ”€β”€ plugin.rs
β”‚   β”‚   └── types.rs
β”‚   β”œβ”€β”€ providers/        # LLM provider implementations
β”‚   β”‚   β”œβ”€β”€ mod.rs
β”‚   β”‚   β”œβ”€β”€ traits.rs
β”‚   β”‚   β”œβ”€β”€ openai.rs
β”‚   β”‚   β”œβ”€β”€ anthropic.rs
β”‚   β”‚   └── ollama.rs
β”‚   β”œβ”€β”€ suites/           # Attack suite system
β”‚   β”‚   β”œβ”€β”€ mod.rs
β”‚   β”‚   β”œβ”€β”€ loader.rs
β”‚   β”‚   └── runner.rs
β”‚   └── reporters/        # Report generation
β”‚       β”œβ”€β”€ mod.rs
β”‚       β”œβ”€β”€ traits.rs
β”‚       └── json_reporter.rs
β”œβ”€β”€ suites/               # Attack suite definitions
β”‚   β”œβ”€β”€ prompt_injection.yaml
β”‚   β”œβ”€β”€ jailbreak.yaml
β”‚   β”œβ”€β”€ role_confusion.yaml
β”‚   └── data_exfiltration.yaml
β”œβ”€β”€ reports/              # Generated reports
β”œβ”€β”€ plugins/              # Plugin directory
β”œβ”€β”€ adversaria.config.yaml
└── Cargo.toml
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### Adding New Attack Payloads

1. Edit the appropriate suite YAML file in `suites/`
2. Follow the existing payload structure
3. Test your payloads
4. Submit a PR

### Adding New Providers

1. Implement the `Provider` trait in `src/providers/`
2. Add provider configuration to `Config`
3. Update `create_provider()` function
4. Add tests
5. Submit a PR

## Security Considerations

- **API Keys**: Never commit API keys. Use environment variables.
- **Rate Limiting**: Be mindful of API rate limits when running tests.
- **Ethical Use**: This tool is for security testing and research only. Use responsibly.
- **Model Safety**: Some attack payloads may trigger safety mechanisms. This is expected behavior.

## License

MIT License - see LICENSE file for details.

## Disclaimer

This tool is designed for security research and testing purposes only. Users are responsible for ensuring they have proper authorization before testing any LLM systems. The authors are not responsible for any misuse of this tool.

## Roadmap

- [ ] Support for more LLM providers (Cohere, AI21, etc.)
- [ ] Web UI for viewing reports
- [ ] Automated CI/CD integration
- [ ] Custom scoring algorithms
- [ ] Benchmark mode for comparing models
- [ ] Real-time monitoring dashboard
- [ ] Export reports to PDF/HTML
- [ ] Integration with security scanning tools

## Support

For issues, questions, or contributions, please visit:
- GitHub Issues: https://github.com/adversaria/adversaria/issues
- Documentation: https://adversaria.dev/docs

## Acknowledgments

Built with:
- Rust
- Tokio (async runtime)
- Clap (CLI framework)
- Serde (serialization)
- Reqwest (HTTP client)
- And many other excellent crates

---

**Made with ❀️ for LLM Security**