Unicode and HTML entity encoding strategies.
Constants
- ZERO_WIDTH_DEFAULTS - Recommended cycle of invisible characters for zero-width injection: [U+200B ZWSP, U+200C ZWNJ, U+200D ZWJ, U+FEFF BOM, U+034F CGJ].
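As a rough illustration of how a cycle like this can be consumed, the sketch below rotates through the five code points listed above and places one between each pair of adjacent characters. The names `ZERO_WIDTH_CYCLE` and `inject_cycled` are hypothetical and are not the crate's API; the real `zero_width_inject` may use a different signature and may only inject between letters.

```rust
/// Hypothetical stand-in for the documented constant: the five code points
/// listed in the description (ZWSP, ZWNJ, ZWJ, BOM, CGJ).
const ZERO_WIDTH_CYCLE: [char; 5] =
    ['\u{200B}', '\u{200C}', '\u{200D}', '\u{FEFF}', '\u{034F}'];

/// Inserts one invisible character between each pair of adjacent characters,
/// rotating through `cycle`. Illustrative only; assumes injection between
/// every pair of characters, not just letters.
fn inject_cycled(payload: &str, cycle: &[char]) -> String {
    let mut out = String::with_capacity(payload.len() * 4);
    for (i, ch) in payload.chars().enumerate() {
        // No separator before the first character or when the cycle is empty.
        if i > 0 && !cycle.is_empty() {
            out.push(cycle[(i - 1) % cycle.len()]);
        }
        out.push(ch);
    }
    out
}
```

Under these assumptions, `inject_cycled("abc", &ZERO_WIDTH_CYCLE)` yields `a`, U+200B, `b`, U+200C, `c`: five code points that render as `abc`.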
Functions
- bidi_inject - Bidi override wrapper: wraps `reversed_keyword` between U+202E (RIGHT-TO-LEFT OVERRIDE) and U+202C (POP DIRECTIONAL FORMATTING).
- combining_mark_inject - Inject a combining diacritical mark after each letter of `payload`.
- fullwidth_encode - Fullwidth Unicode encoding: replaces ASCII with fullwidth equivalents.
- homoglyph_encode - Homoglyph substitution: replaces select ASCII characters with visually identical Unicode characters from other scripts.
- html_entity_decimal_encode - HTML decimal entity encoding: each character becomes `&#DD;`.
- html_entity_encode - HTML entity encoding: each character becomes `&#xXX;`.
- html_entity_variants - HTML entity encoding with per-character variant rotation.
- html_entity_zero_pad - HTML entity encoding with zero-padded numeric reference: every character becomes either `&#x{:0>width$X};` (hex form) or `&#{:0>width$};` (decimal form). Leading zeros pad the number to `pad` characters.
- iis_unicode_encode - IIS/ASP percent Unicode encoding: each character becomes `%uXXXX`.
- json_key_unicode_escape - AWS WAF JSON-pointer escape: encodes every char of `key` as `\uXXXX` so that the WAF’s JSON-pointer rule (e.g. a literal match on `/id`) misses, while the backend JSON parser decodes the escape and routes the value to the original field.
- json_string_encode - JSON string-content escape: produces the escaped INTERIOR of a JSON string literal (no surrounding `"..."` quotes).
- json_unicode_alnum - Partial JSON Unicode escape: encodes ASCII alphanumeric chars as `\uXXXX` while leaving structural punctuation (quotes, operators, whitespace) bare.
- json_unicode_full - Full JSON `\uXXXX` escape: escapes EVERY character of the input (including punctuation, whitespace, and control chars). Stronger than `json_unicode_alnum`, which only touches alnum chars. Use when the WAF tokenises on punctuation boundaries that `json_unicode_alnum` leaves intact, OR when the WAF rule is a regex over the raw bytes of the keyword plus adjacent punctuation.
- json_unicode_mixed_case - Mixed-case JSON `\uXXXX` escape: alternates `\u` and `\U` plus upper/lowercase hex digits. Some WAF regexes are case-sensitive against `\u[0-9A-F]{4}`. RFC 8259 defines only the lowercase `\u` prefix (the hex digits may be in either case), so strict parsers, including JavaScript `JSON.parse` and PHP `json_decode`, accept mixed-case hex digits but reject `\U`; pick the form the backend tolerates and the WAF’s regex misses.
- letterlike_encode - Letterlike-symbols + circled-Latin selective substitution: replaces individual ASCII letters in the payload with codepoints from U+2100-214F and U+24B6-24E9 that NFKC-normalize back to the original ASCII letter. Unlike the `math_*_encode` functions, which substitute every letter from a single block, this picks the most visually distinct codepoint per letter to maximise WAF-rule mismatch while keeping the encoded string visibly identifiable.
- math_bold_encode - Mathematical Alphanumeric Symbols encoding: replaces ASCII letters and digits with their Math-Bold counterparts in the Unicode `U+1D400` block.
- math_double_struck_encode - Mathematical Double-Struck (blackboard bold) alphabet: uppercase U+1D538, lowercase U+1D552. Holes at C/H/N/P/Q/R/Z filled from the letterlike-symbols block.
- math_fraktur_encode - Mathematical Fraktur (blackletter) alphabet: uppercase U+1D504, lowercase U+1D51E. Fraktur has holes at C/H/I/R/Z, which are filled by U+212D ℭ, U+210C ℌ, U+2111 ℑ, U+211C ℜ, U+2128 ℨ.
- math_italic_encode - Mathematical Italic alphabet: same NFKC trick as `math_bold_encode` but in a different Unicode range (U+1D434 uppercase, U+1D44E lowercase). WAFs that have added detection for the bold range (U+1D400-) do not always cover italic.
- math_script_encode - Mathematical Script alphabet: uppercase U+1D49C, lowercase U+1D4B6. Script has eleven holes (U+1D49D B, U+1D4A0 E, U+1D4A1 F, U+1D4A3 H, U+1D4A4 I, U+1D4A7 L, U+1D4A8 M, U+1D4AD R, U+1D4BA e, U+1D4BC g, U+1D4C4 o), each filled from the letterlike-symbols block (U+212C SCRIPT CAPITAL B, U+2130 SCRIPT CAPITAL E, etc.) so the encoded string stays NFKC-equivalent to ASCII.
- overlong_utf8_path - Overlong UTF-8 encoding of `.` and `/` for path traversal.
- pg_chr_decompose - Postgres / Oracle CHR()-function decomposition: `CHR(N) || CHR(N) || ...` per char of every single-quoted string literal.
- script_homoglyph_encode - Cross-script Cyrillic / Greek letter substitution.
- sharp_s_encode - Sharp-s (ß U+00DF) substitution for `s`/`S`.
- sql_adjacent_string_concat - SQL adjacent-string-literal concatenation: every `'string'` literal of length ≥ 2 is rewritten as a sequence of single-character adjacent literals: `'admin'` → `'a' 'd' 'm' 'i' 'n'`.
- sql_char_decompose - SQL CHAR()-function decomposition: converts every single-quoted string literal in the payload to a `CHAR(N1,N2,...)` function call with one codepoint per argument.
- sql_concat_split - SQL string-literal CONCAT splitter: converts every single-quoted string in the payload to a `CONCAT('a','b',...)` expression with one char per argument.
- turkish_i_encode - Turkish dotless-i substitution: replaces `i`/`I` with U+0131/U+0130.
- unicode_encode - Unicode encoding: each character becomes `\uXXXX`.
- zero_width_inject - Inject zero-width / format characters between letters of `payload`.
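Several of the functions above share one idea: each character is replaced by a numeric escape of its code point. The sketch below shows four such forms (zero-padded HTML hex and decimal references, JSON `\uXXXX`, and IIS-style `%uXXXX`) using only the standard library. All four function names and signatures are hypothetical and chosen for illustration; the crate's own functions may take different parameters and handle edge cases differently. The escapes based on UTF-16 code units assume that characters outside the Basic Multilingual Plane should be emitted as surrogate pairs.

```rust
/// Hex numeric character reference per character, zero-padded to `width`
/// digits (hypothetical helper, not the crate's `html_entity_zero_pad`).
fn html_hex_entities(input: &str, width: usize) -> String {
    input
        .chars()
        .map(|c| format!("&#x{:0>width$X};", c as u32, width = width))
        .collect()
}

/// Decimal numeric character reference per character, zero-padded to `width`.
fn html_dec_entities(input: &str, width: usize) -> String {
    input
        .chars()
        .map(|c| format!("&#{:0>width$};", c as u32, width = width))
        .collect()
}

/// JSON `\uXXXX` escape for every UTF-16 code unit of the input, so
/// characters above U+FFFF become a surrogate pair.
fn json_unicode_escape(input: &str) -> String {
    input
        .encode_utf16()
        .map(|unit| format!("\\u{:04x}", unit))
        .collect()
}

/// IIS/ASP-style `%uXXXX` escape for every UTF-16 code unit of the input.
fn percent_u_escape(input: &str) -> String {
    input
        .encode_utf16()
        .map(|unit| format!("%u{:04X}", unit))
        .collect()
}
```

With these definitions, `html_hex_entities("<", 4)` produces `&#x003C;`, `html_dec_entities("<", 4)` produces `&#0060;`, and `json_unicode_escape("id")` produces `\u0069\u0064`. A `width` smaller than the number of digits leaves the value unpadded rather than truncating it.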