pub fn parse_rfc822(input: &[u8]) -> Result<Message, MessageParseError>
Parse RFC822/MIME bytes into a structured Message.
§Decoding behavior
- Body charset. Bodies declared utf-8, us-ascii, iso-8859-1, or latin1 are decoded faithfully. Bodies in other charsets, or bodies declared utf-8 with invalid UTF-8 byte sequences, are passed through String::from_utf8_lossy, so invalid bytes become U+FFFD. The parser does not error on undecodable bytes; users needing strict decode semantics should pre-validate.
- Encoded words. RFC 2047 encoded words (=?charset?Q?…?= / =?charset?B?…?=) are decoded for the same charset allowlist. Encoded words in other charsets (e.g. windows-1252, gbk, shift_jis) pass through as the raw =?…?= literal.
- Duplicate headers. Multiple To:, Cc:, Bcc:, or Reply-To: header lines are merged into a single recipient list. RFC 5322 §3.6 forbids duplicates, but real MTAs occasionally emit them; the parser is liberal in what it accepts. Outbound rendering emits one line per category.
- RFC 6532 (SMTPUTF8). Header lines must be ASCII-only. Input that puts UTF-8 directly in header bodies (without RFC 2047 encoding) is rejected with MessageParseError::InvalidHeaderLine. Most senders RFC 2047-encode for compatibility, so this rarely surfaces.
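The difference between the lossy decode described above and a strict pre-validation step can be sketched with the standard library alone. The helper names decode_lossy and decode_strict are hypothetical and are not part of this crate; the sketch only illustrates what String::from_utf8_lossy does to invalid bytes and how a caller might reject them up front instead.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Lossy decode: every invalid byte sequence is replaced with U+FFFD.
/// (Hypothetical helper, shown only to illustrate the std behavior.)
fn decode_lossy(body: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(body)
}

/// Strict pre-validation: returns an error instead of substituting U+FFFD.
/// (Hypothetical helper; a caller could run this before parsing.)
fn decode_strict(body: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(body)
}
```

A caller that needs strict semantics could run a check like decode_strict on the relevant bytes and refuse the message on error, rather than relying on the parser to fail.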
§Returned message
The returned Message has not been promoted through outbound
validation. Wrapping it via email_message::OutboundMessage::new
may reject inbound-shaped messages that lack a From: header or
have no recipients, both legitimate states for an inbound parse.
§Round-trip caveats
parse_rfc822 is a typed-model deserializer, not a byte-faithful
re-emitter. A parse → render_rfc822 round-trip is not guaranteed
to produce identical bytes:
- Header order. Headers are emitted in a fixed canonical order (From, Sender, To, Cc, Bcc, Reply-To, Subject, Date, Message-ID, generic headers, MIME headers). Trace metadata such as Received: is preserved as a generic header but appears below the typed fields rather than at its original parse position.
- Generic-header decoding asymmetry. RFC 2047 encoded-words are decoded for Subject and the address headers (From, Sender, To, Cc, Bcc, Reply-To). For arbitrary other headers, values are preserved literally: a header value emitted as X-Note: =?utf-8?B?w6Fy?= round-trips as the literal bytes =?utf-8?B?w6Fy?=, not the decoded text ár. Auto-decoding every unstructured header would be a security regression because opaque-bytes headers (X-Auth-Token, DKIM-Signature, Authentication-Results, ARC-*) carry data that must not be silently rewritten. Callers who know a header is unstructured-text shaped can opt into decoding via decode_rfc2047_phrase.
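To make the X-Note example concrete, the following is a minimal, standard-library-only sketch of what decoding a single utf-8 B-encoded word involves. It is not the crate's decode_rfc2047_phrase, whose signature is not shown here; the names b64_value, b64_decode, and decode_utf8_b_word are hypothetical, and the sketch handles only the utf-8 charset with the B encoding.

```rust
/// Maps one base64 alphabet character to its 6-bit value.
fn b64_value(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decodes standard base64; returns None on characters outside the alphabet.
fn b64_decode(text: &str) -> Option<Vec<u8>> {
    let data = text.trim_end_matches('=').as_bytes();
    let mut out = Vec::with_capacity(data.len() * 3 / 4);
    let (mut acc, mut bits) = (0u32, 0u32);
    for &c in data {
        // Shift in six more bits, then emit a byte once eight are available.
        acc = (acc << 6) | b64_value(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1u32 << bits) - 1;
        }
    }
    Some(out)
}

/// Decodes one `=?utf-8?B?...?=` encoded word (hypothetical helper).
/// Any other charset or encoding yields None, i.e. "leave the literal alone".
fn decode_utf8_b_word(word: &str) -> Option<String> {
    let inner = word.strip_prefix("=?")?.strip_suffix("?=")?;
    let mut fields = inner.splitn(3, '?');
    let charset = fields.next()?;
    let encoding = fields.next()?;
    let payload = fields.next()?;
    if !charset.eq_ignore_ascii_case("utf-8") || !encoding.eq_ignore_ascii_case("B") {
        return None;
    }
    String::from_utf8(b64_decode(payload)?).ok()
}
```

Applied to the literal =?utf-8?B?w6Fy?=, this sketch yields the text ár, which is the value a caller would get only after explicitly opting into decoding for that header.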
§Resource bounds
The parser is best-effort and bounded against adversarial input:
- Input length. Inputs larger than MAX_INPUT_BYTES (16 MiB) are rejected outright with MessageParseError::MimeBodyParse.
- Multipart depth. Nested multipart/* parts are limited to MAX_MULTIPART_DEPTH (100 levels). Deeper inputs would otherwise stack-overflow on the mutual recursion between the multipart body parser and the part parser.
- Multipart fan-out. A single multipart body cannot contain more than MAX_MULTIPART_PARTS (1024) sibling parts.
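The depth and fan-out caps above can be illustrated with a small recursive walk that carries its depth and bails out before recursing further. This is a sketch under stated assumptions, not the parser's actual code: the Part enum, BoundsError, check_bounds, and nested are hypothetical, the constants are redefined locally with the documented values, and whether the real parser treats each cap as inclusive is an assumption.

```rust
// Local copies of the documented values; the crate's own constants are not used here.
const MAX_MULTIPART_DEPTH: usize = 100;
const MAX_MULTIPART_PARTS: usize = 1024;

#[derive(Debug, PartialEq)]
enum BoundsError {
    TooDeep,
    TooManyParts,
}

/// Hypothetical stand-in for a MIME tree.
enum Part {
    Leaf,
    Multipart(Vec<Part>),
}

/// Walks the tree and fails as soon as either cap is exceeded.
/// `depth` counts the multipart levels already entered, so the
/// recursion itself never goes deeper than the cap allows.
fn check_bounds(part: &Part, depth: usize) -> Result<(), BoundsError> {
    match part {
        Part::Leaf => Ok(()),
        Part::Multipart(children) => {
            if depth >= MAX_MULTIPART_DEPTH {
                return Err(BoundsError::TooDeep);
            }
            if children.len() > MAX_MULTIPART_PARTS {
                return Err(BoundsError::TooManyParts);
            }
            children.iter().try_for_each(|c| check_bounds(c, depth + 1))
        }
    }
}

/// Builds `levels` nested multiparts around a single leaf.
fn nested(levels: usize) -> Part {
    (0..levels).fold(Part::Leaf, |inner, _| Part::Multipart(vec![inner]))
}
```

Checking the depth before descending is what keeps stack usage bounded: an adversarial input can cost at most MAX_MULTIPART_DEPTH frames before the walk returns an error.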
These caps cover the recursive parser surface. The renderer
(render_rfc822 and render_rfc822_with) enforces the symmetric
MAX_MULTIPART_DEPTH cap on outbound trees, including up to two
frames of attachment-wrapping added by the renderer itself when
inline and/or regular attachments are present (one
multipart/related frame for inline parts, one multipart/mixed
frame for regular parts). It returns
MessageRenderError::MimeNestingTooDeep when a Body::Mime value
plus those wrap frames exceeds the cap. A Body::Mime value at
exactly MAX_MULTIPART_DEPTH therefore renders cleanly when no
attachments are present but errors when wrapped.
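The interaction between body depth and the renderer's wrap frames is simple arithmetic, sketched below. The functions wrap_frames and fits_render_cap are hypothetical and the constant is a local copy of the documented value; the sketch assumes the cap is inclusive, as the "exactly MAX_MULTIPART_DEPTH renders cleanly" statement suggests.

```rust
// Local copy of the documented cap; not the crate's own constant.
const MAX_MULTIPART_DEPTH: usize = 100;

/// Number of wrapper frames the renderer adds: one multipart/related
/// frame for inline parts and one multipart/mixed frame for regular parts.
fn wrap_frames(has_inline: bool, has_regular: bool) -> usize {
    usize::from(has_inline) + usize::from(has_regular)
}

/// Returns true when the body depth plus wrap frames stays within the cap.
fn fits_render_cap(body_depth: usize, has_inline: bool, has_regular: bool) -> bool {
    body_depth + wrap_frames(has_inline, has_regular) <= MAX_MULTIPART_DEPTH
}
```

Under these assumptions, a caller that attaches both inline and regular attachments would need to keep the Body::Mime tree at 98 levels or fewer to stay within the cap.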
The kernel does not depth-cap serde::Deserialize<Body> /
Deserialize<MimePart> because the recursive
MimePart::Multipart { parts: Vec<Self> } shape is the data model,
not a parser artifact. Callers who deserialize untrusted JSON into
email_message::Body are responsible for pre-bounding the input
themselves (e.g. by keeping serde_json's default 128-level recursion limit,
that is, not calling serde_json::de::Deserializer::disable_recursion_limit,
or by applying a separate length cap). The render
path enforces its own cap regardless, so an unbounded deserialize
followed by render_rfc822 errors cleanly rather than overflowing
the stack.
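One way a caller might pre-bound untrusted JSON before handing it to a deserializer is a single pass over the raw bytes that checks length and bracket nesting. The sketch below is a coarse, hypothetical pre-check (json_nesting_depth and within_limits are not part of this crate or of serde_json); it does not validate the JSON, it only measures nesting while ignoring brackets inside string literals.

```rust
/// Returns the maximum bracket nesting depth of a JSON document,
/// ignoring brackets that appear inside string literals.
/// Iterative, so the check itself cannot overflow the stack.
fn json_nesting_depth(json: &[u8]) -> usize {
    let (mut depth, mut max_depth) = (0usize, 0usize);
    let (mut in_string, mut escaped) = (false, false);
    for &b in json {
        if in_string {
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => {
                depth += 1;
                max_depth = max_depth.max(depth);
            }
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    max_depth
}

/// Combined length and depth gate; both limits are chosen by the caller.
fn within_limits(json: &[u8], max_bytes: usize, max_depth: usize) -> bool {
    json.len() <= max_bytes && json_nesting_depth(json) <= max_depth
}
```

Input that fails within_limits can be rejected before any deserialization happens; input that passes still goes through the deserializer's own checks and, on output, the render cap.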
§Errors
Returns MessageParseError when headers, mailbox fields, dates,
message ids, MIME metadata, or transfer-encoded bodies are malformed.