Function parse_rfc822 

pub fn parse_rfc822(input: &[u8]) -> Result<Message, MessageParseError>

Parse RFC822/MIME bytes into a structured Message.

§Decoding behavior

  • Body charset. Bodies declared utf-8, us-ascii, iso-8859-1, or latin1 are decoded faithfully. Bodies in other charsets, and bodies declared utf-8 that contain invalid UTF-8 byte sequences, are passed through String::from_utf8_lossy, so invalid bytes become U+FFFD. The parser does not error on undecodable bytes; callers that need strict decode semantics should validate the input before parsing.
  • Encoded words. RFC 2047 encoded words (=?charset?Q?…?= / =?charset?B?…?=) are decoded for the same charset allowlist. Encoded words in other charsets (e.g. windows-1252, gbk, shift_jis) pass through as the raw =?…?= literal.
  • Duplicate headers. Multiple To:, Cc:, Bcc:, or Reply-To: header lines are merged into a single recipient list. RFC 5322 §3.6 forbids duplicates, but real MTAs occasionally emit them; the parser is liberal in what it accepts. Outbound rendering emits one line per category.
  • RFC 6532 (SMTPUTF8). Header lines must be ASCII-only. Messages that carry raw UTF-8 in header bodies (without RFC 2047 encoding) are rejected with MessageParseError::InvalidHeaderLine. Most senders RFC 2047-encode for compatibility, so this error rarely surfaces in practice.
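
A caller that wants strict rather than lossy body decoding can check the bytes before handing them to the parser. The following is a minimal sketch using only the standard library; strict_utf8 and lossy_utf8 are hypothetical helper names and are not part of this crate.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical pre-validation helper (not part of this crate): fail on
/// invalid UTF-8 instead of silently substituting replacement characters.
pub fn strict_utf8(body: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(body)
}

/// The lossy behavior described above, for comparison: every invalid byte
/// sequence is replaced with U+FFFD.
pub fn lossy_utf8(body: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(body)
}
```

With these helpers, the valid bytes b"caf\xC3\xA9" pass strict validation as "café", while the truncated sequence b"caf\xE9" is rejected by strict_utf8 and decodes lossily to "caf" followed by U+FFFD.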

§Returned message

The returned Message has not been promoted through outbound validation. Wrapping it via email_message::OutboundMessage::new may reject inbound-shaped messages that lack a From: header or have no recipients, both legitimate states for an inbound parse.

§Round-trip caveats

parse_rfc822 is a typed-model deserializer, not a byte-faithful re-emitter. A parse → render_rfc822 round-trip is not guaranteed to produce identical bytes:

  • Header order. Headers are emitted in a fixed canonical order (From, Sender, To, Cc, Bcc, Reply-To, Subject, Date, Message-ID, generic headers, MIME headers). Trace metadata such as Received: is preserved as a generic header but appears below the typed fields rather than at its original parse position.
  • Generic-header decoding asymmetry. RFC 2047 encoded-words are decoded for Subject and the address headers (From, Sender, To, Cc, Bcc, Reply-To). For all other headers, values are preserved literally: a header emitted as X-Note: =?utf-8?B?w6Fy?= round-trips as the literal bytes =?utf-8?B?w6Fy?=, not the decoded text ár. Auto-decoding every unstructured header would be a security regression, because opaque-bytes headers (X-Auth-Token, DKIM-Signature, Authentication-Results, ARC-*) carry data that must not be silently rewritten. Callers who know that a header holds unstructured text can opt into decoding via decode_rfc2047_phrase.
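
To make the X-Note example concrete, the sketch below decodes a single UTF-8, B-encoded word using only the standard library. It is an illustrative stand-in and not the crate's decode_rfc2047_phrase, whose signature is not shown on this page; the names decode_utf8_b_word and base64_decode are hypothetical, and the sketch ignores Q-encoding, other charsets, and strict padding checks.

```rust
/// Decode standard-alphabet base64, stopping at the first `=` pad character.
/// Returns None if a character outside the alphabet is found.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently in `buf`
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Hypothetical helper: decode one `=?utf-8?B?...?=` encoded word.
/// Any other charset or encoding yields None.
pub fn decode_utf8_b_word(word: &str) -> Option<String> {
    let inner = word.strip_prefix("=?")?.strip_suffix("?=")?;
    let mut fields = inner.splitn(3, '?');
    let charset = fields.next()?;
    let encoding = fields.next()?;
    let payload = fields.next()?;
    if !charset.eq_ignore_ascii_case("utf-8") || !encoding.eq_ignore_ascii_case("B") {
        return None;
    }
    String::from_utf8(base64_decode(payload)?).ok()
}
```

Here decode_utf8_b_word("=?utf-8?B?w6Fy?=") yields the text ár, which is the value a generic header would hold only if the caller opted into decoding.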

§Resource bounds

The parser is best-effort and bounded against adversarial input:

  • Input length. Inputs larger than MAX_INPUT_BYTES (16 MiB) are rejected outright with MessageParseError::MimeBodyParse.
  • Multipart depth. Nested multipart/* parts are limited to MAX_MULTIPART_DEPTH (100 levels). Deeper inputs would otherwise stack-overflow on the mutual recursion between the multipart body parser and the part parser.
  • Multipart fan-out. A single multipart body cannot contain more than MAX_MULTIPART_PARTS (1024) sibling parts.
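
The depth and fan-out caps can be pictured as checks made before each recursive descent. The sketch below is an assumption-laden illustration, not the crate's parser: Part, BoundsError, check_bounds, and nested are hypothetical names, the constants are local stand-ins that reuse the documented values, and the exact boundary (whether level 100 itself is accepted) is assumed.

```rust
// Local stand-ins mirroring the documented limits.
const MAX_MULTIPART_DEPTH: usize = 100;
const MAX_MULTIPART_PARTS: usize = 1024;

/// Hypothetical, simplified MIME tree.
pub enum Part {
    Leaf,
    Multipart(Vec<Part>),
}

#[derive(Debug, PartialEq)]
pub enum BoundsError {
    TooDeep,
    TooManyParts,
}

/// Walk the tree, refusing to descend past the depth cap or into a
/// multipart with too many siblings. `depth` is the number of multipart
/// levels already entered (0 at the root).
pub fn check_bounds(part: &Part, depth: usize) -> Result<(), BoundsError> {
    match part {
        Part::Leaf => Ok(()),
        Part::Multipart(children) => {
            let depth = depth + 1;
            if depth > MAX_MULTIPART_DEPTH {
                return Err(BoundsError::TooDeep);
            }
            if children.len() > MAX_MULTIPART_PARTS {
                return Err(BoundsError::TooManyParts);
            }
            // The checks run before recursing, so stack use stays bounded.
            children.iter().try_for_each(|c| check_bounds(c, depth))
        }
    }
}

/// Build a chain of `levels` nested multiparts around a single leaf.
pub fn nested(levels: usize) -> Part {
    let mut part = Part::Leaf;
    for _ in 0..levels {
        part = Part::Multipart(vec![part]);
    }
    part
}
```

Under these assumptions, check_bounds(&nested(100), 0) succeeds and check_bounds(&nested(101), 0) returns BoundsError::TooDeep.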

These caps cover the recursive parser surface. The renderer (render_rfc822 and render_rfc822_with) enforces the symmetric MAX_MULTIPART_DEPTH cap on outbound trees, including up to two frames of attachment-wrapping added by the renderer itself when inline and/or regular attachments are present (one multipart/related frame for inline parts, one multipart/mixed frame for regular parts). It returns MessageRenderError::MimeNestingTooDeep when a Body::Mime value plus those wrap frames exceeds the cap. A Body::Mime value at exactly MAX_MULTIPART_DEPTH therefore renders cleanly when no attachments are present but errors when wrapped.
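
The wrap-frame arithmetic can be written out as follows. This is a sketch of the rule as described in the paragraph above, not the renderer's actual code; wrap_frames and fits_render_cap are hypothetical names, and the constant is a local stand-in for the documented value.

```rust
// Local stand-in mirroring the documented limit.
const MAX_MULTIPART_DEPTH: usize = 100;

/// Frames the renderer adds around the body: one multipart/related frame if
/// inline attachments are present, one multipart/mixed frame if regular
/// attachments are present.
pub fn wrap_frames(has_inline: bool, has_regular: bool) -> usize {
    usize::from(has_inline) + usize::from(has_regular)
}

/// True when the body depth plus the wrap frames stays within the cap.
pub fn fits_render_cap(body_depth: usize, has_inline: bool, has_regular: bool) -> bool {
    body_depth + wrap_frames(has_inline, has_regular) <= MAX_MULTIPART_DEPTH
}
```

A body at depth 100 fits with no attachments but not with inline attachments, and a body at depth 98 is the deepest that still fits when both kinds of attachment are present.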

The kernel does not depth-cap serde::Deserialize<Body> / Deserialize<MimePart> because the recursive MimePart::Multipart { parts: Vec<Self> } shape is the data model, not a parser artifact. Callers who deserialize untrusted JSON into email_message::Body are responsible for pre-bounding the input themselves, e.g. by leaving serde_json's recursion limit at its 128-level default (that is, by not calling serde_json::de::Deserializer::disable_recursion_limit) or by applying a separate length cap. The render path enforces its own cap regardless, so an unbounded deserialize followed by render_rfc822 errors cleanly rather than overflowing the stack.

§Errors

Returns MessageParseError when headers, mailbox fields, dates, message ids, MIME metadata, or transfer-encoded bodies are malformed.