
Module perplexity


Character-class bigram perplexity filter for detecting adversarial suffixes.

Adversarial prompt injection attacks often append high-perplexity suffix strings that look nothing like natural language. This filter scores the suffix window of each incoming prompt using character-class bigram frequencies and blocks any prompt whose score exceeds the perplexity threshold.
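
To make the scoring idea concrete, here is a minimal sketch of a character-class bigram model. It is not this module's implementation: the `CharClass` categories, the `BigramModel` type, the add-one smoothing, and training from a reference string are all assumptions chosen for illustration. Each character is mapped to a coarse class, transitions between classes are counted, and perplexity is the exponential of the average negative log-probability of the transitions in the scored text.

```rust
// Hypothetical sketch: none of these names come from the documented module.

/// Coarse character classes (an assumed partition, not the module's own).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CharClass {
    Lower,
    Upper,
    Digit,
    Space,
    Symbol,
}

const NUM_CLASSES: usize = 5;

fn classify(c: char) -> CharClass {
    if c.is_lowercase() {
        CharClass::Lower
    } else if c.is_uppercase() {
        CharClass::Upper
    } else if c.is_ascii_digit() {
        CharClass::Digit
    } else if c.is_whitespace() {
        CharClass::Space
    } else {
        CharClass::Symbol
    }
}

/// Transition counts between character classes, learned from reference text.
struct BigramModel {
    counts: [[u32; NUM_CLASSES]; NUM_CLASSES],
}

impl BigramModel {
    /// Count class-to-class transitions in a reference corpus.
    fn train(corpus: &str) -> Self {
        let mut counts = [[0u32; NUM_CLASSES]; NUM_CLASSES];
        let classes: Vec<usize> = corpus.chars().map(|c| classify(c) as usize).collect();
        for pair in classes.windows(2) {
            counts[pair[0]][pair[1]] += 1;
        }
        Self { counts }
    }

    /// P(next | prev) with add-one smoothing so unseen transitions stay nonzero.
    fn prob(&self, prev: usize, next: usize) -> f64 {
        let row_total: u32 = self.counts[prev].iter().sum();
        (self.counts[prev][next] as f64 + 1.0) / (row_total as f64 + NUM_CLASSES as f64)
    }

    /// Perplexity = exp(-mean ln P) over all transitions in `text`.
    /// Returns 1.0 when the text has fewer than two characters.
    fn perplexity(&self, text: &str) -> f64 {
        let classes: Vec<usize> = text.chars().map(|c| classify(c) as usize).collect();
        if classes.len() < 2 {
            return 1.0;
        }
        let log_sum: f64 = classes
            .windows(2)
            .map(|pair| self.prob(pair[0], pair[1]).ln())
            .sum();
        (-log_sum / (classes.len() - 1) as f64).exp()
    }
}
```

With a model trained on ordinary prose, a phrase such as `"please summarize the report"` is dominated by frequent lower-to-lower transitions and scores low, while a string such as `"x9}{Q!!zz@@#7&*(]]"` is full of rarely seen transitions and scores noticeably higher. Working at the class level keeps the table tiny (5 by 5 here) at the cost of ignoring which specific letters appear.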

Enums

PerplexityResult
Result of perplexity filter analysis.

Functions

analyze_suffix
Analyze the suffix window of a prompt for adversarial content.
bigram_perplexity
Compute character-class bigram perplexity for a string.
symbol_ratio
Compute the ratio of symbol/punctuation characters in the text.
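
The signatures of these functions are not shown on this page, so the following is only a rough sketch of how a suffix window and a symbol ratio might be combined. The names `suffix_window`, `symbol_ratio_sketch`, `check_suffix`, and `Verdict`, along with the window size and threshold parameters, are hypothetical and do not describe the real `analyze_suffix`, `symbol_ratio`, or `PerplexityResult`.

```rust
// Hypothetical sketch: names, parameters, and thresholds are illustrative only.

/// Return the last `window` characters of `prompt`, respecting UTF-8 boundaries.
fn suffix_window(prompt: &str, window: usize) -> &str {
    let total = prompt.chars().count();
    if total <= window {
        return prompt;
    }
    // Byte offset of the first character inside the window.
    let start = prompt
        .char_indices()
        .nth(total - window)
        .map(|(idx, _)| idx)
        .unwrap_or(prompt.len());
    &prompt[start..]
}

/// Fraction of characters that are neither alphanumeric nor whitespace.
/// Returns 0.0 for empty input.
fn symbol_ratio_sketch(text: &str) -> f64 {
    let total = text.chars().count();
    if total == 0 {
        return 0.0;
    }
    let symbols = text
        .chars()
        .filter(|c| !c.is_alphanumeric() && !c.is_whitespace())
        .count();
    symbols as f64 / total as f64
}

/// Assumed stand-in for a filter result; the real enum's variants are not shown.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Block { symbol_ratio: f64 },
}

/// Inspect only the tail of the prompt and block when it is mostly symbols.
fn check_suffix(prompt: &str, window: usize, max_symbol_ratio: f64) -> Verdict {
    let ratio = symbol_ratio_sketch(suffix_window(prompt, window));
    if ratio > max_symbol_ratio {
        Verdict::Block { symbol_ratio: ratio }
    } else {
        Verdict::Pass
    }
}
```

Restricting the check to a trailing window means a long, benign prompt cannot dilute a short adversarial tail. A fuller filter would presumably combine this ratio with the bigram perplexity score rather than rely on either signal alone.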