Function html_entity_variants 

pub fn html_entity_variants(payload: &str) -> String

HTML entity encoding with per-character variant rotation.

Cycles each character through four forms that browsers all decode. Strict WAF regexes typically anchor on &#x[0-9a-f]+; (lowercase x, required ;), which matches only the first form; the other three slip past:

  1. &#xHH; — canonical lowercase-x hex
  2. &#XHH; — uppercase-X hex (browsers accept; case-sensitive regex misses)
  3. &#DD; — decimal
  4. &#000DD; — decimal with leading zeros (HTML5 spec allows arbitrary leading zeros)

Rotation is by character index, so the function is deterministic: the same input always produces the same output (important for reproducible proptest runs).
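The four forms and the index-based rotation can be sketched as below. This is a hypothetical re-implementation for illustration, not the crate's actual code. The name html_entity_variants_sketch is invented, and three details are assumptions not stated above: the form is chosen by char index modulo 4, variant 2 uses uppercase hex digits, and variant 4 uses a fixed "000" prefix.

```rust
/// Hypothetical sketch of per-character entity rotation (not the real
/// `html_entity_variants`). Assumptions: variant = char index % 4,
/// uppercase hex digits for variant 2, fixed "000" prefix for variant 4.
fn html_entity_variants_sketch(payload: &str) -> String {
    let mut out = String::new();
    // `enumerate` over `chars()` gives a character index, not a byte index.
    for (i, ch) in payload.chars().enumerate() {
        let cp = ch as u32;
        let entity = match i % 4 {
            0 => format!("&#x{:x};", cp), // 1. canonical lowercase-x hex
            1 => format!("&#X{:X};", cp), // 2. uppercase-X hex
            2 => format!("&#{};", cp),    // 3. decimal
            _ => format!("&#000{};", cp), // 4. decimal with leading zeros
        };
        out.push_str(&entity);
    }
    out
}

fn main() {
    let encoded = html_entity_variants_sketch("<s>a");
    println!("{}", encoded); // prints "&#x3c;&#X73;&#62;&#00097;"
    // Deterministic: a second call on the same input gives the same string.
    assert_eq!(encoded, html_entity_variants_sketch("<s>a"));
}
```

Because the variant depends only on position, a proptest property such as "encoding the same input twice yields equal strings" holds without any seeding.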

Bypass mechanism: a ModSecurity regex like @rx &#x([0-9a-f]+);.*&#x([0-9a-f]+); (two canonical entities anywhere in the input) won’t match a payload of &#X3C;&#0060;&#x73;&#62;, which spells <<s> with each character in a different variant; only &#x73; has the canonical shape. The browser decodes all four; the regex anchored on the canonical form sees a different shape.
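To make the bypass concrete, the sketch below uses a hand-rolled, case-sensitive matcher as a stand-in for that rule so it needs no external crates. The helper names count_canonical_entities and rule_fires are hypothetical. Counting at least two non-overlapping matches of &#x[0-9a-f]+; approximates the two-entity rule for single-line payloads. It does not model ModSecurity transformations or regex flags.

```rust
/// Counts non-overlapping, left-to-right matches of the canonical shape
/// `&#x[0-9a-f]+;` (lowercase x, lowercase hex digits, required `;`).
/// Hypothetical stand-in for the regex; not ModSecurity's engine.
fn count_canonical_entities(s: &str) -> usize {
    let b = s.as_bytes();
    let mut i = 0;
    let mut count = 0;
    while i < b.len() {
        if b[i..].starts_with(b"&#x") {
            // Consume the run of lowercase hex digits after "&#x".
            let mut j = i + 3;
            while j < b.len() && matches!(b[j], b'0'..=b'9' | b'a'..=b'f') {
                j += 1;
            }
            // At least one digit, then a terminating ';'.
            if j > i + 3 && j < b.len() && b[j] == b';' {
                count += 1;
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    count
}

/// Approximates `&#x([0-9a-f]+);.*&#x([0-9a-f]+);` on single-line input:
/// the rule fires when two canonical entities appear in order.
fn rule_fires(s: &str) -> bool {
    count_canonical_entities(s) >= 2
}

fn main() {
    let canonical = "&#x3c;&#x73;"; // "<s" using only the canonical form
    let rotated = "&#X3C;&#0060;&#x73;&#62;"; // the payload from the description
    println!("canonical fires: {}", rule_fires(canonical)); // true
    println!("rotated fires:   {}", rule_fires(rotated)); // false: only &#x73; matches
}
```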

Context: HTML body / attribute. Equivalent to html_entity / html_entity_decimal for browser decoding; safer against canonicalising WAFs that strip the trailing ; only on the lowercase form.
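The claim that the rotated output decodes the same way as the single-form encodings can be checked with a minimal numeric-reference decoder. The function decode_numeric_refs is hypothetical and much narrower than a browser's decoder. It handles only &#DD;, &#xHH; and &#XHH; (leading zeros allowed) terminated by ;, and it does not model named references, missing-semicolon recovery, or the HTML spec's replacement rules for invalid code points.

```rust
/// Hypothetical minimal decoder for numeric character references only.
/// Accepts `&#DD;`, `&#xHH;`, `&#XHH;` with leading zeros; anything else
/// after "&#" is passed through literally.
fn decode_numeric_refs(s: &str) -> String {
    let mut out = String::new();
    let mut rest = s;
    while let Some(pos) = rest.find("&#") {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 2..];
        // An optional x or X (either case) selects hex; otherwise decimal.
        let (radix, body) = match after.as_bytes().first() {
            Some(b'x') | Some(b'X') => (16, &after[1..]),
            _ => (10, after),
        };
        let digits_len = body
            .find(|c: char| !c.is_digit(radix))
            .unwrap_or(body.len());
        let digits = &body[..digits_len];
        let tail = &body[digits_len..];
        // from_str_radix accepts leading zeros, so "0060" parses as 60.
        let decoded = if !digits.is_empty() && tail.starts_with(';') {
            u32::from_str_radix(digits, radix).ok().and_then(char::from_u32)
        } else {
            None
        };
        match decoded {
            Some(ch) => {
                out.push(ch);
                rest = &tail[1..]; // skip the ';'
            }
            None => {
                // Not a reference this sketch handles: keep "&#" as text.
                out.push_str("&#");
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let rotated = decode_numeric_refs("&#X3C;&#0060;&#x73;&#62;");
    let canonical = decode_numeric_refs("&#x3c;&#x3c;&#x73;&#x3e;");
    assert_eq!(rotated, canonical);
    println!("{}", rotated); // prints "<<s>"
}
```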