Module perplexity

oxideshield_core

Perplexity and entropy analysis for adversarial suffix detection

This module provides tools to detect adversarial suffixes, such as those generated by the AutoDAN and GCG attacks, by analyzing character-level perplexity and token entropy.

§Research References

AutoDAN - Genetic algorithm adversarial prompts
GCG Attack (Zou et al., 2023) - Gradient-based universal attacks that produce gibberish suffixes

§Detection Approach

Adversarial suffixes typically exhibit unusual statistical properties:

Very high perplexity: Random/gibberish character sequences
Very low perplexity: Repeated characters or patterns
Unusual character n-gram distributions
Low token entropy (many repeated tokens)
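
The two signals named above can be illustrated with a minimal, self-contained sketch. It does not use this module's API: `token_entropy` and `BigramModel` are hypothetical names, the bigram order and add-one smoothing are assumptions chosen for brevity, and the reference text stands in for whatever baseline the real analyzer uses.

```rust
use std::collections::{HashMap, HashSet};

/// Shannon entropy, in bits per token, of whitespace-separated tokens.
/// Hypothetical helper for illustration; not part of oxideshield_core.
fn token_entropy(text: &str) -> f64 {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    let mut total = 0usize;
    for tok in text.split_whitespace() {
        *counts.entry(tok).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Character bigram model with add-one smoothing (an assumed, simplified
/// stand-in for a character n-gram model).
struct BigramModel {
    pair_counts: HashMap<(char, char), usize>,
    prefix_counts: HashMap<char, usize>,
    vocab_size: usize,
}

impl BigramModel {
    /// Count character bigrams in a reference text of "normal" input.
    fn train(reference: &str) -> Self {
        let chars: Vec<char> = reference.chars().collect();
        let mut pair_counts = HashMap::new();
        let mut prefix_counts = HashMap::new();
        for w in chars.windows(2) {
            *pair_counts.entry((w[0], w[1])).or_insert(0) += 1;
            *prefix_counts.entry(w[0]).or_insert(0) += 1;
        }
        let vocab: HashSet<char> = chars.iter().copied().collect();
        // One extra slot acts as a bucket for characters never seen in training.
        BigramModel { pair_counts, prefix_counts, vocab_size: vocab.len() + 1 }
    }

    /// Perplexity = exp of the average negative log-probability per bigram.
    fn perplexity(&self, text: &str) -> f64 {
        let chars: Vec<char> = text.chars().collect();
        if chars.len() < 2 {
            return 1.0;
        }
        let mut log_sum = 0.0;
        for w in chars.windows(2) {
            let pair = *self.pair_counts.get(&(w[0], w[1])).unwrap_or(&0) as f64;
            let prefix = *self.prefix_counts.get(&w[0]).unwrap_or(&0) as f64;
            let p = (pair + 1.0) / (prefix + self.vocab_size as f64);
            log_sum += p.ln();
        }
        (-log_sum / (chars.len() - 1) as f64).exp()
    }
}
```

With a model trained on ordinary English, a gibberish suffix made of unseen character pairs scores a higher perplexity than a natural sentence, while a prompt that repeats one token many times drives `token_entropy` toward zero. Where to place the thresholds is a tuning decision this sketch leaves open.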

Structs§

AnomalySegment: Represents an anomalous segment detected in text
PerplexityAnalyzer: Perplexity analyzer for detecting adversarial patterns
PerplexityConfig: Configuration for perplexity analysis

Enums§

AnomalyType: Types of anomalies that can be detected

Constants§

DEFAULT_NGRAM_ORDER: Default character n-gram order for perplexity calculation
DEFAULT_WINDOW_SIZE: Default window size for sliding window analysis
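
As a rough illustration of sliding window analysis, the sketch below scans fixed-size character windows and flags those whose character entropy falls outside a band. Everything in it is an assumption made for the example: `WINDOW_SIZE`, `STEP`, `Flag`, `FlaggedWindow`, and `scan` are hypothetical names and values, not the crate's actual constants, `AnomalyType` variants, or `AnomalySegment` fields, and per-window character entropy is used as a simple stand-in for a perplexity score.

```rust
use std::collections::HashMap;

// Hypothetical values for illustration only; the crate's real defaults
// (DEFAULT_WINDOW_SIZE and friends) are not reproduced here.
const WINDOW_SIZE: usize = 16;
const STEP: usize = 8;

/// Hypothetical anomaly label; not the crate's AnomalyType.
#[derive(Debug, PartialEq)]
enum Flag {
    TooUniform,
    TooRandom,
}

/// Hypothetical record of a flagged window (character offsets, end exclusive).
#[derive(Debug)]
struct FlaggedWindow {
    start: usize,
    end: usize,
    entropy: f64,
    flag: Flag,
}

/// Shannon entropy, in bits per character, of a character slice.
fn char_entropy(chars: &[char]) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for &c in chars {
        *counts.entry(c).or_insert(0) += 1;
    }
    let n = chars.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Slide a window over the text and flag windows whose entropy is
/// below `low` or above `high`. Texts shorter than one window yield nothing.
fn scan(text: &str, low: f64, high: f64) -> Vec<FlaggedWindow> {
    let chars: Vec<char> = text.chars().collect();
    let mut out = Vec::new();
    let mut start = 0;
    while start + WINDOW_SIZE <= chars.len() {
        let end = start + WINDOW_SIZE;
        let entropy = char_entropy(&chars[start..end]);
        let flag = if entropy < low {
            Some(Flag::TooUniform)
        } else if entropy > high {
            Some(Flag::TooRandom)
        } else {
            None
        };
        if let Some(flag) = flag {
            out.push(FlaggedWindow { start, end, entropy, flag });
        }
        start += STEP;
    }
    out
}
```

A 16-character window can carry at most 4 bits of entropy per character, so a band such as `scan(text, 1.0, 3.9)` flags windows that are nearly all one character or nearly all distinct characters. Overlapping windows (a step smaller than the window) help catch an anomalous run that straddles a window boundary.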