Perplexity and entropy analysis for adversarial suffix detection
This module provides tools to detect adversarial suffixes like those generated by AutoDAN and GCG attacks by analyzing character-level perplexity and token entropy.
§Research References
- AutoDAN - Adversarial prompts generated with a genetic algorithm
- GCG Attack (Zou et al., 2023) - Gradient-based universal attacks that produce gibberish suffixes
§Detection Approach
Adversarial suffixes typically exhibit unusual statistical properties:
- Very high perplexity: Random/gibberish character sequences
- Very low perplexity: Repeated characters or patterns
- Unusual character n-gram distributions
- Low token entropy (many repeated tokens)
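The two main signals listed above can be sketched as follows. This is a minimal illustration, not this module's implementation: `CharBigramModel` and `token_entropy` are hypothetical names, the model is fixed at bigrams (order 2) with add-one smoothing, and tokens are assumed to be whitespace-separated words.

```rust
use std::collections::{HashMap, HashSet};

/// Shannon entropy (in bits) of the whitespace-separated tokens in `text`.
/// Hypothetical helper; returns 0.0 for empty input.
pub fn token_entropy(text: &str) -> f64 {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    let mut total: usize = 0;
    for tok in text.split_whitespace() {
        *counts.entry(tok).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Character bigram model with add-one smoothing (illustrative only).
pub struct CharBigramModel {
    pair_counts: HashMap<(char, char), usize>,
    context_counts: HashMap<char, usize>,
    vocab_size: usize,
}

impl CharBigramModel {
    /// Count character bigrams in a reference corpus of "normal" text.
    pub fn train(corpus: &str) -> Self {
        let chars: Vec<char> = corpus.chars().collect();
        let mut pair_counts: HashMap<(char, char), usize> = HashMap::new();
        let mut context_counts: HashMap<char, usize> = HashMap::new();
        let vocab: HashSet<char> = chars.iter().copied().collect();
        for w in chars.windows(2) {
            *pair_counts.entry((w[0], w[1])).or_insert(0) += 1;
            *context_counts.entry(w[0]).or_insert(0) += 1;
        }
        // One extra vocabulary slot stands in for characters never seen in training.
        Self { pair_counts, context_counts, vocab_size: vocab.len() + 1 }
    }

    /// Perplexity of `text`: exp of the mean negative log-probability per bigram.
    /// Returns 1.0 when the text has fewer than two characters.
    pub fn perplexity(&self, text: &str) -> f64 {
        let chars: Vec<char> = text.chars().collect();
        if chars.len() < 2 {
            return 1.0;
        }
        let mut log_sum = 0.0_f64;
        for w in chars.windows(2) {
            let pair = *self.pair_counts.get(&(w[0], w[1])).unwrap_or(&0) as f64;
            let ctx = *self.context_counts.get(&w[0]).unwrap_or(&0) as f64;
            let p = (pair + 1.0) / (ctx + self.vocab_size as f64);
            log_sum += p.ln();
        }
        (-log_sum / (chars.len() - 1) as f64).exp()
    }
}
```

Under these assumptions, a model trained on ordinary prose assigns a noticeably higher perplexity to a gibberish suffix than to in-domain text, while a string made of one repeated token has a token entropy of zero. Any thresholds for "very high" or "very low" would need to be calibrated on real traffic.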
Structs§
- AnomalySegment - Represents an anomalous segment detected in text
- PerplexityAnalyzer - Perplexity analyzer for detecting adversarial patterns
- PerplexityConfig - Configuration for perplexity analysis
Enums§
- AnomalyType - Types of anomalies that can be detected
Constants§
- DEFAULT_NGRAM_ORDER - Default character n-gram order for perplexity calculation
- DEFAULT_WINDOW_SIZE - Default window size for sliding window analysis
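To illustrate what sliding window analysis might look like, here is a rough sketch. It is not the crate's code: `WINDOW_SIZE`, `STEP`, `char_entropy`, and `flag_windows` are hypothetical, the constant values are made up because the real defaults are not shown in this listing, and character entropy stands in for whatever score the analyzer actually computes per window.

```rust
use std::collections::HashMap;

// Assumed values for illustration; the crate's real defaults are not listed here.
const WINDOW_SIZE: usize = 16;
const STEP: usize = 8;

/// Shannon entropy (bits per character) of a slice of characters.
fn char_entropy(chars: &[char]) -> f64 {
    if chars.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<char, usize> = HashMap::new();
    for &c in chars {
        *counts.entry(c).or_insert(0) += 1;
    }
    let n = chars.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Slide a fixed-size window over `text` and return `(start, end, score)`
/// for every window whose score falls below `low` or above `high`.
/// Offsets are character indices, with `end` exclusive.
pub fn flag_windows(text: &str, low: f64, high: f64) -> Vec<(usize, usize, f64)> {
    let chars: Vec<char> = text.chars().collect();
    let mut flagged = Vec::new();
    if chars.is_empty() {
        return flagged;
    }
    let mut start = 0;
    loop {
        // The final window is truncated at the end of the text.
        let end = (start + WINDOW_SIZE).min(chars.len());
        let score = char_entropy(&chars[start..end]);
        if score < low || score > high {
            flagged.push((start, end, score));
        }
        if end == chars.len() {
            break;
        }
        start += STEP;
    }
    flagged
}
```

Overlapping windows (a step smaller than the window size) make it less likely that a short anomalous run is diluted by the normal text around it, at the cost of reporting the same region more than once; adjacent flagged windows could be merged into a single segment afterwards.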