Function overlong_utf8_more 

pub fn overlong_utf8_more(
    payload: impl AsRef<[u8]>,
) -> Result<String, EncodeError>

Extended overlong UTF-8 encoding using 3-byte sequences, for broader coverage than the 2-byte form.

Context: iis-6 — some WAFs reject 2-byte overlongs but accept 3-byte overlongs.

RFC 3629 3-byte form: 1110xxxx 10xxxxxx 10xxxxxx, which encodes a 16-bit codepoint as (x[0]<<12) | (x[1]<<6) | x[2], where x[i] denotes the payload bits (the x positions) of byte i. For an ASCII byte (codepoint ≤ 0x7F) the top four codepoint bits are zero, so the lead byte is 0xE0; the two continuation bytes carry the remaining twelve bits as two 6-bit halves: 0x80 | (byte >> 6) and 0x80 | (byte & 0x3F). For example, / (0x2F) becomes E0 80 AF.
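The bit layout above can be sketched as a small helper. This is a minimal illustration, not the crate's implementation: the name `overlong3` and the `[u8; 3]` return type are assumptions made for the example, and the real function returns `Result<String, EncodeError>` in a form this page does not specify.

```rust
/// Hypothetical helper (not part of the documented API): encode one ASCII
/// byte as a 3-byte overlong UTF-8 sequence, following the layout
/// 1110xxxx 10xxxxxx 10xxxxxx.
fn overlong3(byte: u8) -> [u8; 3] {
    [
        // Lead byte 1110_0000: the top four codepoint bits are zero.
        0xE0,
        // First continuation byte: codepoint bits 11..6.
        0x80 | (byte >> 6),
        // Second continuation byte: codepoint bits 5..0, masked to six bits
        // so the result stays inside 0x80..=0xBF.
        0x80 | (byte & 0x3F),
    ]
}
```

Under these assumptions, `overlong3(b'/')` yields `[0xE0, 0x80, 0xAF]`. A strict decoder such as `std::str::from_utf8` returns an error for that sequence, which is expected: the technique only targets lenient decoders.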

Before the fix, the third byte was computed as 0x80 | byte, which silently produced an invalid continuation byte for any input byte >= 0x40 (since 0x80 | 0x40 = 0xC0, above the valid continuation range 0x80–0xBF). The affected range, 0x40–0x7F, covers every ASCII letter as well as the punctuation characters @, [, \, ], ^, _, `, {, |, }, ~, all of which appear in real-world payloads (SQL backticks, path escapes, template-injection braces). Any decoder that validates continuation bytes rejected those sequences outright, so the encoder produced garbage rather than the intended bypass for all ASCII letters and these 11 punctuation characters.
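The failure mode can be checked mechanically. The sketch below compares the pre-fix formula, as described above, with the masked form; all function names are hypothetical and exist only for this illustration.

```rust
/// Third byte as the pre-fix code computed it, per the description above.
fn prefix_third_byte(byte: u8) -> u8 {
    0x80 | byte
}

/// Third byte with the low six bits masked first.
fn fixed_third_byte(byte: u8) -> u8 {
    0x80 | (byte & 0x3F)
}

/// A valid UTF-8 continuation byte has the form 10xxxxxx.
fn is_continuation(b: u8) -> bool {
    (0x80..=0xBF).contains(&b)
}

/// ASCII inputs for which the pre-fix formula leaves the continuation range.
fn broken_inputs() -> Vec<u8> {
    (0x00u8..=0x7F)
        .filter(|&b| !is_continuation(prefix_third_byte(b)))
        .collect()
}
```

`broken_inputs()` returns exactly the 64 bytes 0x40 through 0x7F, while `fixed_third_byte` stays inside the continuation range for every ASCII input.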