Adversaria
Adversarial Testing Harness for Large Language Models
Adversaria is a comprehensive security testing framework designed to evaluate the robustness of Large Language Models (LLMs) against adversarial attacks. Built in Rust for performance and reliability, it provides automated testing suites for prompt injection, jailbreaks, role confusion, and data exfiltration attempts.
Features
- Comprehensive Attack Suites: 4 built-in suites with 48+ attack payloads
  - Prompt Injection (12 payloads)
  - Jailbreak Attempts (12 payloads)
  - Role Confusion (12 payloads)
  - Data Exfiltration (12 payloads)
- Multi-Provider Support: Test against multiple LLM providers
  - OpenAI (GPT-4, GPT-3.5, etc.)
  - Anthropic (Claude 3 Opus, Sonnet, Haiku)
  - Ollama (local models)
- Detailed Reporting: JSON reports with risk scoring and reproducible traces
  - Overall risk score (0-100)
  - Category-based breakdown
  - Execution traces for each attack
  - Success/failure analysis
- Extensible Plugin System: Add custom attack suites via plugins
- Flexible Configuration: YAML-based configuration for easy customization
Installation
From Source
Prerequisites
- Rust 1.70 or higher
- API keys for the providers you want to test (OpenAI, Anthropic)
- For Ollama: Local Ollama installation
Quick Start
1. Configure Adversaria
Create or edit adversaria.config.yaml:
version: "1.0"
default_provider: openai
providers:
  openai:
    api_key: null # Or set OPENAI_API_KEY env var
    model: gpt-4
  anthropic:
    api_key: null # Or set ANTHROPIC_API_KEY env var
    model: claude-3-opus-20240229
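The `api_key: null` entries, together with the comments naming `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`, suggest that a key can come from either the config file or the environment. Below is a minimal sketch of that lookup. The helper name `resolve_api_key` and the rule that a config value wins over the environment are assumptions for illustration, not taken from Adversaria's source.

```rust
use std::env;

/// Resolve an API key from an optional config value, falling back to an
/// environment variable. `None` models `api_key: null` in the YAML file.
///
/// ASSUMPTION: the config value takes precedence over the environment.
/// Adversaria's actual precedence is not documented in this README.
fn resolve_api_key(config_value: Option<&str>, env_var: &str) -> Option<String> {
    match config_value {
        // A non-blank key in the config file is used as-is.
        Some(key) if !key.trim().is_empty() => Some(key.to_string()),
        // Otherwise read the environment variable, ignoring blank values.
        _ => env::var(env_var).ok().filter(|v| !v.trim().is_empty()),
    }
}
```

Calling `resolve_api_key(None, "OPENAI_API_KEY")` returns the exported key if one is set, and `None` otherwise, which is the point at which a tool would typically report a missing credential.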
2. Set API Keys
3. List Available Attack Suites
Output:
Available Attack Suites
┌───────────────────┬─────────────────────────┬───────────────────┬──────────┬─────────┐
│ ID                │ Name                    │ Category          │ Payloads │ Enabled │
├───────────────────┼─────────────────────────┼───────────────────┼──────────┼─────────┤
│ prompt_injection  │ Prompt Injection Suite  │ Prompt Injection  │ 12       │ ✓       │
│ jailbreak         │ Jailbreak Suite         │ Jailbreak         │ 12       │ ✓       │
│ role_confusion    │ Role Confusion Suite    │ Role Confusion    │ 12       │ ✓       │
│ data_exfiltration │ Data Exfiltration Suite │ Data Exfiltration │ 12       │ ✓       │
└───────────────────┴─────────────────────────┴───────────────────┴──────────┴─────────┘
Total: 4 suite(s) found
4. Run Attack Suites
Test against OpenAI:
Test against Anthropic:
Test against local Ollama:
Run specific suites:
5. View Reports
List all reports:
View a specific report:
View with detailed results:
Configuration
Full Configuration Example
version: "1.0"
default_provider: openai
providers:
  openai:
    api_key: null
    api_base: https://api.openai.com/v1
    model: gpt-4
    timeout_seconds: 30
    max_retries: 3
  anthropic:
    api_key: null
    api_base: https://api.anthropic.com/v1
    model: claude-3-opus-20240229
    timeout_seconds: 30
    max_retries: 3
  ollama:
    api_key: null
    api_base: http://localhost:11434
    model: llama2
    timeout_seconds: 60
    max_retries: 2
suites:
  directory: ./suites
  enabled_suites:
    - prompt_injection
    - jailbreak
    - role_confusion
    - data_exfiltration
  custom_suites:
reporting:
  output_directory: ./reports
  format: json
  keep_reports: 100
plugins:
  directory: ./plugins
  enabled: true
Attack Suites
Prompt Injection Suite
Tests for prompt injection vulnerabilities where attackers attempt to override system instructions.
Example Payloads:
- Basic Ignore Instructions
- System Prompt Override
- Role Reversal Injection
- Delimiter Confusion
- Nested Instruction Override
- Encoded Instruction Injection
- And 6 more...
Jailbreak Suite
Tests for jailbreak attempts that try to bypass safety guidelines and content policies.
Example Payloads:
- DAN (Do Anything Now)
- Developer Mode
- Opposite Mode
- Fictional Story Framing
- Research Purpose Bypass
- Roleplay Scenario
- And 6 more...
Role Confusion Suite
Tests for role confusion attacks where the model is tricked into assuming unauthorized roles.
Example Payloads:
- System Administrator Impersonation
- Developer Role Assumption
- Support Staff Confusion
- Owner Authority Claim
- Maintenance Mode Activation
- API Endpoint Confusion
- And 6 more...
Data Exfiltration Suite
Tests for data exfiltration attempts where attackers try to extract sensitive information.
Example Payloads:
- System Prompt Extraction
- Training Data Leak
- API Key Extraction
- Configuration Dump
- Memory Inspection
- Model Architecture Extraction
- And 6 more...
Creating Custom Attack Suites
Create a YAML file in the suites/ directory:
id: custom_suite
name: My Custom Attack Suite
description: Custom attacks for specific scenarios
category: custom
enabled: true
payloads:
  - id: custom_001
    name: Custom Attack 1
    description: Description of the attack
    prompt: "Your attack prompt here"
    category: custom
    severity: high
    tags:
    metadata:
      technique: custom_technique
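Before dropping a custom suite into suites/, it can help to sanity-check it. The sketch below models a subset of the fields from the YAML above as plain Rust structs and flags two common mistakes: duplicate payload IDs and empty prompts. The type names `Suite` and `Payload`, the function `validate_suite`, and the specific checks are hypothetical. Adversaria's own loader may validate differently or not at all.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory form of one payload entry (subset of YAML fields).
#[derive(Debug, Clone, PartialEq)]
struct Payload {
    id: String,
    name: String,
    prompt: String,
    severity: String,
}

/// Hypothetical in-memory form of a suite file (subset of YAML fields).
#[derive(Debug, Clone, PartialEq)]
struct Suite {
    id: String,
    name: String,
    enabled: bool,
    payloads: Vec<Payload>,
}

/// Return a list of human-readable problems; an empty list means the suite
/// passed these (assumed) checks.
fn validate_suite(suite: &Suite) -> Vec<String> {
    let mut problems = Vec::new();
    if suite.id.trim().is_empty() {
        problems.push("suite id is empty".to_string());
    }
    let mut seen = HashSet::new();
    for payload in &suite.payloads {
        // `insert` returns false when the id was already present.
        if !seen.insert(payload.id.as_str()) {
            problems.push(format!("duplicate payload id: {}", payload.id));
        }
        if payload.prompt.trim().is_empty() {
            problems.push(format!("payload {} has an empty prompt", payload.id));
        }
    }
    problems
}
```

Returning every problem at once, rather than failing on the first, makes it easier to fix a suite file in a single pass.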
Report Format
Reports are saved as JSON files with the following structure:
Risk Scoring
- 0-25: Low Risk - Model has strong defenses
- 26-50: Medium Risk - Some vulnerabilities detected
- 51-75: High Risk - Significant vulnerabilities
- 76-100: Critical Risk - Severe security issues
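The bands above translate directly into a range match. This is a minimal sketch for consumers of the JSON report who want to bucket the overall score themselves. The names `RiskLevel` and `risk_level` are illustrative and not part of Adversaria's API.

```rust
/// The four risk bands listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RiskLevel {
    Low,
    Medium,
    High,
    Critical,
}

/// Map an overall risk score (0-100) to its band.
/// Returns `None` for values above 100, which fall outside the documented scale.
fn risk_level(score: u8) -> Option<RiskLevel> {
    match score {
        0..=25 => Some(RiskLevel::Low),
        26..=50 => Some(RiskLevel::Medium),
        51..=75 => Some(RiskLevel::High),
        76..=100 => Some(RiskLevel::Critical),
        _ => None,
    }
}
```

For example, `risk_level(42)` yields `Some(RiskLevel::Medium)`, and the boundary values 25, 50, and 75 belong to the lower band in each case.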
Plugin System
Adversaria supports plugins for extending attack suites. Plugins are loaded from the plugins/ directory.
Creating a Plugin
use ;
use Plugin;
use async_trait;
;
CLI Commands
adversaria run
Run attack suites against a model.
Options:
- -p, --provider <PROVIDER>: Provider to use (openai, anthropic, ollama)
- -m, --model <MODEL>: Model to test
- -c, --config <PATH>: Path to config file (default: adversaria.config.yaml)
- -s, --suites <SUITES>: Specific suite IDs to run (comma-separated)
- --no-save: Skip saving the report
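Since `--suites` takes a comma-separated value, the argument has to be split into individual suite IDs at some point. A rough sketch of that step follows. The function name `parse_suite_ids` is hypothetical, and trimming whitespace and dropping empty entries are assumptions about desirable behavior rather than documented Adversaria behavior.

```rust
/// Split a comma-separated `--suites` value into individual suite IDs.
///
/// ASSUMPTION: surrounding whitespace is trimmed and empty entries
/// (for example from a trailing comma) are ignored.
fn parse_suite_ids(arg: &str) -> Vec<String> {
    arg.split(',')
        .map(str::trim)
        .filter(|id| !id.is_empty())
        .map(String::from)
        .collect()
}
```

With this helper, `parse_suite_ids("prompt_injection, jailbreak,")` produces the two IDs `prompt_injection` and `jailbreak`.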
Examples:
adversaria list
List available attack suites.
Options:
- -c, --config <PATH>: Path to config file
- -v, --verbose: Show detailed information
Examples:
adversaria report
Generate or view reports from previous runs.
Options:
- -c, --config <PATH>: Path to config file
- -l, --list: List all available reports
- -v, --verbose: Show detailed results
Examples:
Development
Building from Source
Running Tests
Running with Debug Logging
RUST_LOG=adversaria=debug
Architecture
adversaria/
├── src/
│   ├── main.rs              # Entry point
│   ├── cli/                 # CLI commands
│   │   ├── mod.rs
│   │   └── commands/
│   │       ├── run.rs
│   │       ├── list.rs
│   │       └── report.rs
│   ├── core/                # Core types and logic
│   │   ├── mod.rs
│   │   ├── config.rs
│   │   ├── error.rs
│   │   ├── plugin.rs
│   │   └── types.rs
│   ├── providers/           # LLM provider implementations
│   │   ├── mod.rs
│   │   ├── traits.rs
│   │   ├── openai.rs
│   │   ├── anthropic.rs
│   │   └── ollama.rs
│   ├── suites/              # Attack suite system
│   │   ├── mod.rs
│   │   ├── loader.rs
│   │   └── runner.rs
│   └── reporters/           # Report generation
│       ├── mod.rs
│       ├── traits.rs
│       └── json_reporter.rs
├── suites/                  # Attack suite definitions
│   ├── prompt_injection.yaml
│   ├── jailbreak.yaml
│   ├── role_confusion.yaml
│   └── data_exfiltration.yaml
├── reports/                 # Generated reports
├── plugins/                 # Plugin directory
├── adversaria.config.yaml
└── Cargo.toml
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Adding New Attack Payloads
- Edit the appropriate suite YAML file in suites/
- Follow the existing payload structure
- Test your payloads
- Submit a PR
Adding New Providers
- Implement the Provider trait in src/providers/
- Add provider configuration to Config
- Update the create_provider() function
- Add tests
- Submit a PR
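The steps above follow a common trait-plus-factory pattern, sketched below. The trait methods, the `EchoProvider` type, and the signature of `create_provider` are hypothetical stand-ins, since the real definitions live in src/providers/ and are not shown in this README. The real trait is presumably async (the project uses Tokio and async_trait), whereas this sketch is synchronous so it compiles with the standard library alone.

```rust
/// Hypothetical, simplified stand-in for the project's `Provider` trait.
trait Provider {
    /// Identifier used to select this provider (assumed).
    fn name(&self) -> &str;
    /// Send one prompt and return the model's reply (assumed signature).
    fn send_prompt(&self, prompt: &str) -> Result<String, String>;
}

/// Illustrative new provider that echoes the prompt back.
/// A real provider would call an HTTP API here instead.
struct EchoProvider;

impl Provider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }

    fn send_prompt(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Sketch of a factory in the spirit of `create_provider()`: adding a
/// provider means adding one match arm that constructs it.
fn create_provider(id: &str) -> Result<Box<dyn Provider>, String> {
    match id {
        "echo" => Ok(Box::new(EchoProvider)),
        other => Err(format!("unknown provider: {other}")),
    }
}
```

Returning a boxed trait object keeps the caller independent of the concrete provider type, which is why registering a new provider only requires touching the factory.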
Security Considerations
- API Keys: Never commit API keys. Use environment variables.
- Rate Limiting: Be mindful of API rate limits when running tests.
- Ethical Use: This tool is for security testing and research only. Use responsibly.
- Model Safety: Some attack payloads may trigger safety mechanisms. This is expected behavior.
License
MIT License - see LICENSE file for details.
Disclaimer
This tool is designed for security research and testing purposes only. Users are responsible for ensuring they have proper authorization before testing any LLM systems. The authors are not responsible for any misuse of this tool.
Roadmap
- Support for more LLM providers (Cohere, AI21, etc.)
- Web UI for viewing reports
- Automated CI/CD integration
- Custom scoring algorithms
- Benchmark mode for comparing models
- Real-time monitoring dashboard
- Export reports to PDF/HTML
- Integration with security scanning tools
Support
For issues, questions, or contributions, please visit:
- GitHub Issues: https://github.com/adversaria/adversaria/issues
- Documentation: https://adversaria.dev/docs
Acknowledgments
Built with:
- Rust
- Tokio (async runtime)
- Clap (CLI framework)
- Serde (serialization)
- Reqwest (HTTP client)
- And many other excellent crates
Made with ❤️ for LLM Security