pub fn overlong_utf8_more(
payload: impl AsRef<[u8]>,
) -> Result<String, EncodeError>
Extended overlong UTF-8 encoding using 3-byte sequences, for broader coverage than the 2-byte form.
Context: iis-6. Some WAFs reject 2-byte overlongs but accept 3-byte overlongs.
RFC 3629 3-byte form: 1110xxxx 10xxxxxx 10xxxxxx, where the
payload bit groups x[0] (4 bits), x[1] and x[2] (6 bits each)
encode a 16-bit codepoint as (x[0]<<12) | (x[1]<<6) | x[2].
For an ASCII byte (codepoint ≤ 0x7F) the top four bits are zero,
so the lead byte is 0xE0; the two continuation bytes carry the
remaining bits as 0x80 | (byte >> 6) and 0x80 | (byte & 0x3F).
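The bit layout described above can be sketched as follows. This is a minimal illustration, not this crate's implementation: `overlong3` and `overlong3_percent` are hypothetical helper names, and the percent-encoded output format is an assumption, since the description does not say how the returned `String` is formatted.

```rust
/// Hypothetical helper (not this crate's API): the 3-byte overlong
/// form of a single ASCII byte, or `None` for non-ASCII input.
fn overlong3(byte: u8) -> Option<[u8; 3]> {
    if byte > 0x7F {
        return None; // this sketch only covers ASCII input
    }
    Some([
        0xE0,                 // 1110_0000: top four codepoint bits are zero
        0x80 | (byte >> 6),   // 10_00000x: only bit 6 of the input can land here
        0x80 | (byte & 0x3F), // 10_xxxxxx: low six bits of the input
    ])
}

/// Assumed output format: each of the three bytes percent-encoded.
fn overlong3_percent(byte: u8) -> Option<String> {
    overlong3(byte).map(|bytes| {
        bytes
            .iter()
            .map(|b| format!("%{:02X}", b))
            .collect::<String>()
    })
}
```

Under these assumptions, `overlong3(b'/')` yields `[0xE0, 0x80, 0xAF]` and `overlong3(b'{')` yields `[0xE0, 0x81, 0xBB]`; in both cases the second and third bytes stay inside 0x80–0xBF.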
Before the fix, the third byte was computed as 0x80 | byte, which
silently produced INVALID continuation bytes for any input
byte >= 0x40 (since 0x80 | 0x40 = 0xC0, above the valid
continuation range 0x80–0xBF). That includes @, [, \,
], ^, _, `, {, |, }, ~, all of which
appear in real-world payloads (SQL backticks, path escapes,
template-injection braces). Any decoder that checks the 10xxxxxx
continuation pattern rejected those sequences outright, so the
encoder produced garbage rather than the intended bypass for
those 11 punctuation characters.
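The difference between the two third-byte computations can be checked directly. The sketch below reconstructs the pre-fix expression from the description above. All function names are hypothetical and nothing here is taken from the crate's source.

```rust
/// A valid UTF-8 continuation byte matches 10xxxxxx, i.e. 0x80..=0xBF.
fn is_continuation(b: u8) -> bool {
    b & 0xC0 == 0x80
}

/// Third byte as the description says the pre-fix code computed it.
fn third_byte_prefix(byte: u8) -> u8 {
    0x80 | byte
}

/// Third byte with the input masked to its low six bits first.
fn third_byte_fixed(byte: u8) -> u8 {
    0x80 | (byte & 0x3F)
}

/// ASCII inputs whose pre-fix third byte is not a continuation byte.
fn broken_inputs() -> Vec<u8> {
    (0u8..=0x7F)
        .filter(|&b| !is_continuation(third_byte_prefix(b)))
        .collect()
}
```

For `{` (0x7B) the pre-fix expression gives 0xFB, while the masked version gives 0xBB. `broken_inputs()` returns exactly the bytes 0x40 through 0x7F. That range also contains the ASCII letters; whether the encoder ever emitted overlong forms for letters is not stated in the description, so that part is left open here.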