mime-tree 0.2.0

RFC 5322/MIME parser producing a byte-range-indexed part tree

mime-tree

License: MIT OR Apache-2.0
MSRV: 1.85

RFC 5322 / MIME parser that produces a walkable, byte-range-indexed part tree. Given raw message bytes, it returns a ParsedMessage with the full MIME structure, RFC 8621-compatible body views, and on-demand body decoding.

Why this crate exists

Most MIME parsers either give back owned strings (losing the original byte positions needed for S/MIME signature verification) or expose the underlying parsing library's types in their API (locking callers to that dependency). mime-tree instead gives you (offset, length) byte ranges into your original &[u8] buffer, so you can feed the exact bytes of a signed part directly to a cryptographic verifier without copying or re-encoding. The parsed result is fully owned, lifetime-free, and Serialize + Deserialize, so it can be written to a store or sent over a message bus and read back unchanged.
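
To make the byte-range idea concrete, here is a minimal sketch of resolving an (offset, length) pair against the original buffer. The helper name slice_range is hypothetical and is not part of mime-tree's API; it only illustrates that the range yields a borrowed view of the original bytes rather than a copy.

```rust
/// Hypothetical helper (not part of mime-tree): resolve an `(offset, length)`
/// pair against the original message buffer.
///
/// Returns `None` instead of panicking when the range falls outside `raw`.
fn slice_range(raw: &[u8], range: (u32, u32)) -> Option<&[u8]> {
    let start = range.0 as usize;
    // checked_add guards against overflow on 32-bit hosts.
    let end = start.checked_add(range.1 as usize)?;
    // The returned slice borrows from `raw`; no bytes are copied.
    raw.get(start..end)
}
```

A range taken from a part's header_range or body_range could be passed to such a helper, and the resulting slice handed to a verifier as-is.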

For S/MIME sign/verify/encrypt/decrypt, see the companion crate smime-tree.

Quick example

use mime_tree::{parse, decode_body_value};

let raw: &[u8] = b"From: alice@example.com\r\n\
                   Content-Type: text/plain; charset=utf-8\r\n\
                   \r\n\
                   Hello, world!\r\n";

let msg = parse(raw).expect("parse failed");

// Walk the text_body part IDs to find plain-text parts.
for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    let decoded = decode_body_value(raw, part, None).unwrap();
    println!("{}", decoded.value);
}

Key types

ParsedMessage

The result of parse(). All fields are owned; no lifetime parameters.

| Field | Type | Description |
|---|---|---|
| part_index | ParsedPart | Root of the MIME part tree |
| text_body | Vec<String> | Part IDs of text/plain body parts (RFC 8621 §4.1.4) |
| html_body | Vec<String> | Part IDs of text/html body parts |
| attachments | Vec<String> | Part IDs of attachment parts |
| headers | Vec<ParsedHeader> | Top-level message headers |
| preview | Option<String> | First ~256 chars of text content |
| warnings | Vec<String> | Non-fatal parse warnings |

ParsedMessage implements Serialize + Deserialize — store it however you like.

ParsedPart

A single node in the MIME tree.

| Field | Type | Description |
|---|---|---|
| part_id | String | IMAP dotted-path ID: "1", "1.1", "1.2", … |
| content_type | String | Media type/subtype, e.g. "text/plain" |
| charset | Option<String> | Charset from Content-Type, if present |
| transfer_encoding | TransferEncoding | One of Identity, QuotedPrintable, Base64, SevenBit, EightBit, Binary |
| disposition | Option<String> | Content-Disposition value |
| filename | Option<String> | Filename from Content-Disposition or Content-Type |
| cid | Option<String> | Content-ID header value |
| header_range | (u32, u32) | (offset, length) of part headers in original bytes |
| body_range | (u32, u32) | (offset, length) of part body (pre-decode) in original bytes |
| children | Vec<ParsedPart> | Child parts; non-empty only for multipart/* |

Byte ranges use u32 rather than usize so the serialized representation is identical on 32-bit and 64-bit hosts. In practice, MIME messages are far smaller than the 4 GiB that a u32 can address.
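
Because every node carries its own part_id and children, the tree can be walked with plain recursion. The sketch below uses a hypothetical Part struct that mirrors only three of the ParsedPart fields listed above; find_part and parts_of_type are illustrative names, not functions exported by mime-tree.

```rust
/// Hypothetical stand-in for `ParsedPart`, reduced to the fields the walk needs.
struct Part {
    part_id: String,
    content_type: String,
    children: Vec<Part>,
}

/// Depth-first lookup of a part by its dotted-path ID (e.g. "1.2").
fn find_part<'a>(node: &'a Part, id: &str) -> Option<&'a Part> {
    if node.part_id == id {
        return Some(node);
    }
    // Recurse into children; `find_map` stops at the first match.
    node.children.iter().find_map(|child| find_part(child, id))
}

/// Collect every part whose media type equals `wanted`, in document order.
fn parts_of_type<'a>(node: &'a Part, wanted: &str, out: &mut Vec<&'a Part>) {
    if node.content_type.eq_ignore_ascii_case(wanted) {
        out.push(node);
    }
    for child in &node.children {
        parts_of_type(child, wanted, out);
    }
}
```

Media types are compared case-insensitively here because RFC 2045 defines type and subtype names as case-insensitive.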

DecodedBodyValue

Returned by decode_body_value().

| Field | Type | Description |
|---|---|---|
| value | String | Decoded, charset-converted text |
| is_truncated | bool | True if max_bytes limit was reached |
| is_encoding_problem | bool | True if charset conversion found unmappable characters |

Decoding body content

decode_body_value slices the raw bytes using a part's body_range, undoes the transfer encoding (Base64, Quoted-Printable, etc.), and converts the result from the part's charset to UTF-8 via encoding_rs. Bodies are decoded only on demand, which keeps parsing fast.

// Decode with a 64 KiB cap (pass None for unlimited).
let decoded = decode_body_value(raw, &part, Some(65_536))?;
if decoded.is_truncated {
    // body was larger than max_bytes
}
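
The slice, transfer-decode, charset-convert pipeline described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: decode_body, Decoded, decode_quoted_printable and hex_val are hypothetical names, only Quoted-Printable is handled, and UTF-8 validation stands in for the encoding_rs charset conversion.

```rust
/// Hypothetical result type mirroring the three `DecodedBodyValue` fields.
struct Decoded {
    value: String,
    is_truncated: bool,
    is_encoding_problem: bool,
}

/// Value of one ASCII hex digit, or `None` if `b` is not a hex digit.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Minimal Quoted-Printable decoder: handles `=XX` escapes and `=\r\n`
/// soft line breaks. Malformed escapes are passed through unchanged.
fn decode_quoted_printable(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'=' {
            let rest = &input[i + 1..];
            // Soft line break: drop "=\r\n" entirely.
            if rest.starts_with(b"\r\n") {
                i += 3;
                continue;
            }
            // Hex escape: "=C3" becomes the single byte 0xC3.
            if let (Some(&h), Some(&l)) = (rest.first(), rest.get(1)) {
                if let (Some(h), Some(l)) = (hex_val(h), hex_val(l)) {
                    out.push((h << 4) | l);
                    i += 3;
                    continue;
                }
            }
        }
        out.push(input[i]);
        i += 1;
    }
    out
}

/// Sketch of the three steps: slice, transfer-decode, convert to UTF-8.
/// Returns `None` if `body_range` does not fit inside `raw`.
fn decode_body(raw: &[u8], body_range: (u32, u32), max_bytes: Option<usize>) -> Option<Decoded> {
    // 1. Slice the still-encoded body out of the original buffer.
    let start = body_range.0 as usize;
    let end = start.checked_add(body_range.1 as usize)?;
    let encoded = raw.get(start..end)?;

    // 2. Undo the transfer encoding (Quoted-Printable only in this sketch).
    let mut bytes = decode_quoted_printable(encoded);

    // 3. Apply the optional cap on decoded bytes. A cap that lands inside a
    //    multi-byte sequence will be reported as an encoding problem below.
    let is_truncated = max_bytes.is_some_and(|max| bytes.len() > max);
    if let Some(max) = max_bytes {
        bytes.truncate(max);
    }

    // 4. Assumption: the charset is UTF-8. Invalid sequences are replaced
    //    with U+FFFD and flagged, standing in for a real charset conversion.
    let (value, is_encoding_problem) = match String::from_utf8(bytes) {
        Ok(s) => (s, false),
        Err(e) => (String::from_utf8_lossy(e.as_bytes()).into_owned(), true),
    };
    Some(Decoded { value, is_truncated, is_encoding_problem })
}
```

Whether the real max_bytes limit counts encoded or decoded bytes is not stated above; the sketch assumes decoded bytes.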

Design invariants

  • No JMAP dependency. This is a general-purpose MIME parser and does not depend on jmap-mail-types.
  • No S/MIME crypto. application/pkcs7-mime and application/pkcs7-signature parts are treated as opaque binary leaves. Use smime-tree for S/MIME processing.
  • Best-effort parsing. Malformed input yields a partial result plus warnings; only truly unparsable input (empty bytes, no headers) returns Err.
  • No async. Synchronous only.
  • Byte ranges, not stored bytes. The crate never retains the raw message bytes.
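
The "best-effort parsing" and "byte ranges, not stored bytes" invariants combine into a recognisable pattern: return ranges plus warnings for anything recoverable, and reserve Err for input that cannot be interpreted at all. The sketch below shows that pattern on a deliberately tiny problem, splitting headers from body at the first blank line. Split and split_message are hypothetical names, and the exact error and warning conditions are assumptions, not the crate's actual rules.

```rust
/// Hypothetical result: byte ranges into the input plus non-fatal warnings.
#[derive(Debug, PartialEq)]
struct Split {
    header_range: (u32, u32),
    body_range: (u32, u32),
    warnings: Vec<String>,
}

/// Split a message at the first blank line, recording `(offset, length)`
/// ranges instead of copying bytes.
fn split_message(raw: &[u8]) -> Result<Split, String> {
    // Hard failures: nothing to parse, or no header lines at all.
    if raw.is_empty() {
        return Err("empty input".to_string());
    }
    if raw.starts_with(b"\r\n") {
        return Err("no headers".to_string());
    }
    if raw.len() > u32::MAX as usize {
        return Err("message does not fit in u32 ranges".to_string());
    }
    let len = raw.len() as u32;

    // The header block ends at the first CRLF CRLF.
    match raw.windows(4).position(|w| w == b"\r\n\r\n") {
        Some(pos) => {
            let header_len = pos as u32 + 2; // keep the last header's CRLF
            let body_start = pos as u32 + 4; // skip the blank line
            Ok(Split {
                header_range: (0, header_len),
                body_range: (body_start, len - body_start),
                warnings: Vec::new(),
            })
        }
        // Recoverable: treat everything as headers and say so.
        None => Ok(Split {
            header_range: (0, len),
            body_range: (len, 0),
            warnings: vec!["no blank line after headers; body is empty".to_string()],
        }),
    }
}
```

The result owns no message bytes, so the caller keeps the original buffer and resolves the ranges against it when needed.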

Specification references

| RFC | Title |
|---|---|
| RFC 5322 | Internet Message Format |
| RFC 2045 | MIME Part One: Format of Internet Message Bodies |
| RFC 2046 | MIME Part Two: Media Types (multipart boundaries) |
| RFC 2047 | MIME Part Three: Encoded-Word in headers |
| RFC 2183 | Content-Disposition header |
| RFC 2231 | MIME Parameter Value and Encoded Word Extensions |
| RFC 8621 §4.1.4 | JMAP for Mail: body structure algorithm (textBody / htmlBody / attachments) |

License

Licensed under either of MIT or Apache-2.0 at your option.