Module perplexity

Perplexity and entropy analysis for adversarial suffix detection

This module provides tools to detect adversarial suffixes, such as those generated by AutoDAN and GCG attacks, by analyzing character-level perplexity and token entropy.

§Research References

  • AutoDAN - Genetic algorithm adversarial prompts
  • GCG Attack (Zou et al., 2023) - Gradient-based universal attacks that produce gibberish suffixes

§Detection Approach

Adversarial suffixes typically exhibit unusual statistical properties:

  • Very high perplexity: Random/gibberish character sequences
  • Very low perplexity: Repeated characters or patterns
  • Unusual character n-gram distributions
  • Low token entropy (many repeated tokens)
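
The signals listed above can be illustrated with a small, self-contained sketch. It does not use this module's API: `token_entropy` and `BigramModel` are hypothetical names introduced here, and the add-one-smoothed character bigram model is an assumption chosen for brevity, not necessarily the method `PerplexityAnalyzer` implements.

```rust
use std::collections::{HashMap, HashSet};

/// Shannon entropy (bits per token) over whitespace-separated tokens.
/// Hypothetical helper; returns 0.0 for empty input.
fn token_entropy(text: &str) -> f64 {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    let mut total = 0usize;
    for tok in text.split_whitespace() {
        *counts.entry(tok).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Character bigram model with add-one smoothing (an illustrative assumption).
struct BigramModel {
    pair_counts: HashMap<(char, char), usize>,
    context_counts: HashMap<char, usize>,
    vocab_size: usize,
}

impl BigramModel {
    /// Count character bigrams in a reference corpus of "normal" text.
    fn train(reference: &str) -> Self {
        let chars: Vec<char> = reference.chars().collect();
        let mut pair_counts: HashMap<(char, char), usize> = HashMap::new();
        let mut context_counts: HashMap<char, usize> = HashMap::new();
        for w in chars.windows(2) {
            *pair_counts.entry((w[0], w[1])).or_insert(0) += 1;
            *context_counts.entry(w[0]).or_insert(0) += 1;
        }
        let vocab: HashSet<char> = chars.iter().copied().collect();
        // Reserve one extra slot so unseen characters get non-zero probability.
        BigramModel { pair_counts, context_counts, vocab_size: vocab.len() + 1 }
    }

    /// Perplexity = exp(-mean log-probability) over the bigrams of `text`.
    /// Returns 1.0 when the text has fewer than two characters.
    fn perplexity(&self, text: &str) -> f64 {
        let chars: Vec<char> = text.chars().collect();
        if chars.len() < 2 {
            return 1.0;
        }
        let mut log_sum = 0.0;
        for w in chars.windows(2) {
            let pair = *self.pair_counts.get(&(w[0], w[1])).unwrap_or(&0) as f64;
            let ctx = *self.context_counts.get(&w[0]).unwrap_or(&0) as f64;
            let p = (pair + 1.0) / (ctx + self.vocab_size as f64);
            log_sum += p.ln();
        }
        (-log_sum / (chars.len() - 1) as f64).exp()
    }
}
```

In this sketch, gibberish scores a higher perplexity than text resembling the reference corpus, while a run of one repeated token drives `token_entropy` to zero. A real detector would need a much larger reference corpus and calibrated thresholds.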

Structs§

AnomalySegment
Represents an anomalous segment detected in text
PerplexityAnalyzer
Perplexity analyzer for detecting adversarial patterns
PerplexityConfig
Configuration for perplexity analysis

Enums§

AnomalyType
Types of anomalies that can be detected

Constants§

DEFAULT_NGRAM_ORDER
Default character n-gram order for perplexity calculation
DEFAULT_WINDOW_SIZE
Default window size for sliding window analysis
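
As a rough illustration of sliding-window analysis, the sketch below scores each fixed-size window of characters by its Shannon entropy so that an anomalous region can be localized within a longer input. The function names, the choice of entropy as the per-window score, and the window and step values in the usage note are assumptions for illustration; the actual values of `DEFAULT_WINDOW_SIZE` and `DEFAULT_NGRAM_ORDER` are not stated on this page.

```rust
use std::collections::HashMap;

/// Shannon entropy (bits per character) of a slice of characters.
/// Hypothetical helper; returns 0.0 for an empty slice.
fn char_entropy(chars: &[char]) -> f64 {
    if chars.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<char, usize> = HashMap::new();
    for &c in chars {
        *counts.entry(c).or_insert(0) += 1;
    }
    let total = chars.len() as f64;
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total;
            -p * p.log2()
        })
        .sum()
}

/// Slide a window of `window` characters across `text`, advancing by `step`,
/// and return `(start_index, entropy)` for each window.
/// Indices count characters, not bytes. Trailing characters that do not
/// align with a full step are skipped in this sketch.
fn windowed_entropy(text: &str, window: usize, step: usize) -> Vec<(usize, f64)> {
    let chars: Vec<char> = text.chars().collect();
    if chars.is_empty() || window == 0 || step == 0 {
        return Vec::new();
    }
    if chars.len() <= window {
        // Input shorter than one window: score it as a single segment.
        return vec![(0, char_entropy(&chars))];
    }
    (0..=chars.len() - window)
        .step_by(step)
        .map(|start| (start, char_entropy(&chars[start..start + window])))
        .collect()
}
```

Calling `windowed_entropy(input, 16, 8)` on a prompt that ends in a long run of one repeated character yields near-zero entropy for the final windows, while windows over ordinary prose score noticeably higher. A segment type such as `AnomalySegment` could plausibly be built from the flagged start indices, though that mapping is a guess.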