Module compression

compression — request-body compression as a WAF-evasion surface.

§The attack

Many production WAFs inspect the raw request bytes, NOT the decompressed payload. The reasoning is operational: a WAF that decompresses inbound bodies pays the CPU cost of decompression on every request, so many vendors skip it, either entirely or selectively per Content-Encoding algorithm.

That choice is the seam this module exploits:

  • Content-Encoding: gzip is the universal case; nearly every WAF that decompresses anything decompresses gzip. Useful as the baseline and as a chain ingredient.
  • Content-Encoding: deflate is RFC-permitted but irregularly supported: many WAFs that handle gzip return 400 on a deflate-coded body. An origin that decodes request bodies (nginx, IIS, Apache, Node, PHP-FPM, anything using zlib) typically accepts both.
  • Content-Encoding: br (Brotli) is where the seam is widest. Brotli requires a separate decompressor (not zlib). Many WAFs ship no brotli support at all: they either return 415 (and the operator avoids br) or, worse, pass the request through uninspected because their rule engine has nothing to match against. Brotli support is otherwise widespread (Chrome 49+ and Firefox 44+ on the client side, nginx 1.11+ with the brotli module on the server side), so a brotli-capable origin is a realistic assumption. Wrap a payload in brotli and the rules that fire on the plain payload bytes never get a chance to match.

§Chained encoding

Encoding-chain attacks stack layers (e.g. gzip → base64 → URL-encoding). A WAF that normalises only a fixed number of decode passes (usually 1, sometimes 2) stops short of the original payload, while the origin’s parser stack, which decodes as many layers as Content-Type and Content-Encoding direct, reaches it. The chain function is the primitive for this attack.
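
The pass-count mismatch described above can be modelled without any real compressor. The sketch below is illustrative only: it uses hex encoding as a stand-in for a real layer, and `encode_layers` and `decode_up_to` are hypothetical names, not part of this module's API.

```rust
/// Stand-in "layer": hex-encode the bytes (used here instead of a real codec).
fn hex_encode(data: &[u8]) -> Vec<u8> {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = Vec::with_capacity(data.len() * 2);
    for &b in data {
        out.push(DIGITS[(b >> 4) as usize]);
        out.push(DIGITS[(b & 0x0f) as usize]);
    }
    out
}

/// Inverse of `hex_encode`; returns None on malformed input.
fn hex_decode(data: &[u8]) -> Option<Vec<u8>> {
    if data.len() % 2 != 0 {
        return None;
    }
    let nibble = |c: u8| (c as char).to_digit(16).map(|d| d as u8);
    let mut out = Vec::with_capacity(data.len() / 2);
    for pair in data.chunks(2) {
        out.push((nibble(pair[0])? << 4) | nibble(pair[1])?);
    }
    Some(out)
}

/// Apply `layers` encoding passes to the payload.
fn encode_layers(payload: &[u8], layers: usize) -> Vec<u8> {
    (0..layers).fold(payload.to_vec(), |cur, _| hex_encode(&cur))
}

/// Decode at most `max_passes` of the `layers` that were applied.
/// A small `max_passes` models an inspector with a fixed decode budget.
fn decode_up_to(body: &[u8], layers: usize, max_passes: usize) -> Option<Vec<u8>> {
    (0..layers.min(max_passes)).try_fold(body.to_vec(), |cur, _| hex_decode(&cur))
}
```

With three layers applied, a budget of one pass leaves the inspector looking at still-encoded bytes, while an unbounded decoder recovers the original input.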

§Pristine code

  • Every public function returns Result<_, CompressionError> — no unwrap() reachable on bad input.
  • The chain function caps at 16 layers so a misconfiguration (gzip,gzip,gzip,...) can’t run away.
  • Empty body is permitted and returns the encoder’s minimal valid stream for empty input (gzip still emits its 10-byte header, followed by an empty block and trailer; brotli likewise emits a short non-empty stream).
  • No allocation beyond what each encoder requires; the public API takes a borrowed slice, not an owned Vec.
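
A minimal sketch of how the depth cap and the no-panic contract could fit together, assuming the cap is checked before any encoding work starts. The variant name `ChainTooDeep`, its fields, and the function `validate_chain` are hypothetical; only the value 16 and the `Result`-returning style come from the text above.

```rust
/// Mirrors the documented cap of 16 layers.
const MAX_CHAIN_LAYERS: usize = 16;

/// Hypothetical error shape; the real CompressionError may differ.
#[derive(Debug, PartialEq, Eq)]
enum CompressionError {
    ChainTooDeep { requested: usize, max: usize },
}

/// Reject an over-long chain up front, taking a borrowed slice
/// and returning an error instead of panicking.
fn validate_chain(layers: &[&str]) -> Result<(), CompressionError> {
    if layers.len() > MAX_CHAIN_LAYERS {
        return Err(CompressionError::ChainTooDeep {
            requested: layers.len(),
            max: MAX_CHAIN_LAYERS,
        });
    }
    Ok(())
}
```

Checking the length first means a runaway configuration fails before any bytes are compressed.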

Structs§

CompressedBody
A compressed body with its Content-Encoding header value. The caller writes the body bytes onto the wire verbatim and sets the header — both are required, and a mismatched pairing is a debugging nightmare for the operator if we let it happen.

Enums§

Algorithm
One compression algorithm. The naming matches the HTTP Content-Encoding registry value (lowercase, no padding).
CompressionError
Errors raised by the compression-confusion API. Wraps the underlying encoder failures (rare for in-memory operations) plus the chain-depth cap.
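
As an illustration of naming that matches the registry, the sketch below maps each variant to its header token. The variant names and the `header_value` method are assumptions made for this example; the tokens `gzip`, `deflate`, and `br` are the registered Content-Encoding values.

```rust
/// Hypothetical variant set covering the three codings discussed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Algorithm {
    Gzip,
    Deflate,
    Brotli,
}

impl Algorithm {
    /// The lowercase token to place in the Content-Encoding header.
    fn header_value(self) -> &'static str {
        match self {
            Algorithm::Gzip => "gzip",
            Algorithm::Deflate => "deflate",
            Algorithm::Brotli => "br",
        }
    }
}
```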

Constants§

DECOMPRESSED_BODY_MAX_BYTES
Hard cap on decoded body size; defends against decompression bombs. A single DEFLATE layer can expand by roughly 1000:1, so a gzip body of about 10 MB can decompress to 10+ GB if read without bounds, and nested layers multiply that ratio.
MAX_CHAIN_LAYERS
Hard cap on chain layers — any longer is almost certainly a misconfiguration, and the compressed-output size would balloon from header overhead per layer. 16 is generous: real attacks use 2–3 layers.
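
One common way to enforce a decoded-size cap is to wrap the decoder's reader in `std::io::Read::take`. The sketch below assumes the decompressor is exposed as a `Read` implementation; `read_bounded` is a hypothetical helper, and the cap is passed as a parameter because the actual value of DECOMPRESSED_BODY_MAX_BYTES is not stated here.

```rust
use std::io::{self, Read};

/// Read everything from `reader`, failing if more than `max` bytes come out.
fn read_bounded<R: Read>(reader: R, max: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // Allow one byte past the cap so "exactly at the cap" and
    // "over the cap" can be told apart.
    reader.take(max.saturating_add(1)).read_to_end(&mut out)?;
    if out.len() as u64 > max {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "decoded body exceeds size cap",
        ));
    }
    Ok(out)
}
```

Because `take` stops pulling from the underlying reader once the limit is reached, an endless or highly expansive stream costs at most `max + 1` bytes of output before it is rejected.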

Functions§

chain
Apply a sequence of compression algorithms in order, producing one set of body bytes + the combined Content-Encoding header value, which lists the codings in the order they were applied.
compress
Compress body with a single algorithm. Returns the raw compressed bytes + the matching Content-Encoding header value.
decompress
Recover the original bytes from a CompressedBody — the inverse of compress / chain. Test-only and audit helper; production attack flow only needs the compress direction.
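
HTTP requires Content-Encoding to list codings in the order they were applied, so a receiver undoes them from last to first. The sketch below shows that ordering only; `content_encoding_value` and `decode_order` are hypothetical helpers, not functions of this module.

```rust
/// Build the header value for a chain, in application order.
fn content_encoding_value(layers: &[&str]) -> String {
    layers.join(", ")
}

/// The order in which a receiver must undo the layers: last applied first.
fn decode_order<'a>(layers: &[&'a str]) -> Vec<&'a str> {
    layers.iter().rev().copied().collect()
}
```

For a chain of gzip followed by br, the header value is `gzip, br` and decoding starts with br.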