Expand description
Curated literal + regex ruleset for prompt-injection detection.
§False-positive tradeoff
This corpus legitimately documents prompt injection (it indexes security
material about Midnight and LLM tooling). A naive “contains the word ignore”
filter would flag half the security docs. Every rule here therefore requires
the imperative verb-object structure of an actual attack, not a mere
mention. For example we match ignore all previous instructions but a
sentence like “this guide explains how attackers ignore safety rules” lacks
the (previous|prior|above)\s+(instructions|...) object and does not hit.
Rules run over super::normalize()d text (already lowercased, homoglyph- and
base64-folded), so each regex is written against lowercase ASCII. Every hit’s
span is mapped back to the ORIGINAL input bytes via
super::normalize::Normalized::original_span.
The compiled ruleset is built once via std::sync::LazyLock.
Structs§
- Pattern
Match - A single rule hit, with the matched substring and its span in ORIGINAL bytes.
- Pattern
Result - The aggregate result of running the ruleset over one input.
Enums§
- Technique
- The injection technique a rule detects. Stable wire enum (
snake_case).
Functions§
- detect
- Run the curated ruleset over
normalize(input)and map every hit’s span back to the original text.