mime-tree 0.2.2

RFC 5322/MIME parser producing a byte-range-indexed part tree

License: MIT OR Apache-2.0 MSRV: 1.85

RFC 5322 / MIME parser that produces a walkable, byte-range-indexed part tree. Given raw message bytes, it returns a ParsedMessage with the full MIME structure, RFC 8621-compatible body views, and on-demand body decoding.

Why this crate exists

Most MIME parsers either give back owned strings (losing the original byte positions needed for S/MIME signature verification) or expose the underlying parsing library's types in their API (locking callers to that dependency). mime-tree gives you (offset, length) byte ranges into your original &[u8] buffer — so you can feed the exact bytes of a signed part directly to a cryptographic verifier without copying or re-encoding. The parsed result is fully owned, lifetime-free, and Serialize + Deserialize, so it round-trips through any store or message bus.

For S/MIME sign/verify/encrypt/decrypt, see the companion crate smime-tree.

Quick example

use mime_tree::{parse, decode_body_value};

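// The trailing `\` continues the byte string on the next line, skipping the
// newline and leading indentation; only the explicit \r\n escapes remain.
let raw: &[u8] = b"From: alice@example.com\r\n\
                   Content-Type: text/plain; charset=utf-8\r\n\
                   \r\n\
                   Hello, world!\r\n";

let msg = parse(raw).expect("parse failed");

// text_body lists part IDs; resolve each to its ParsedPart, then decode
// that part's body on demand (None = no size limit).
for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    let decoded = decode_body_value(raw, part, None).unwrap();
    println!("{}", decoded.value);
}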

Key types

ParsedMessage

The result of parse(). All fields are owned; no lifetime parameters.

| Field | Type | Description |
|---|---|---|
| part_index | ParsedPart | Root of the MIME part tree |
| text_body | Vec<String> | Part IDs in the RFC 8621 textBody view, which prefers text/plain alternatives (RFC 8621 §4.1.4) |
| html_body | Vec<String> | Part IDs in the RFC 8621 htmlBody view, which prefers text/html alternatives |
| attachments | Vec<String> | Part IDs of attachment parts |
| headers | Vec<ParsedHeader> | Top-level message headers |
| preview | Option<String> | First ~256 characters of text content |
| warnings | Vec<String> | Non-fatal parse warnings |

ParsedMessage implements Serialize + Deserialize.

ParsedPart

A single node in the MIME tree.

| Field | Type | Description |
|---|---|---|
| part_id | String | IMAP dotted-path ID: "1", "1.1", "1.2", … |
| content_type | String | Media type/subtype, e.g. "text/plain" |
| charset | Option<String> | Charset from Content-Type, if present |
| transfer_encoding | TransferEncoding | See the TransferEncoding table below |
| disposition | Option<String> | Content-Disposition value |
| filename | Option<String> | Filename from Content-Disposition or Content-Type |
| cid | Option<String> | Content-ID header value |
| header_range | (u32, u32) | (offset, length) of the part's headers in the original bytes |
| body_range | (u32, u32) | (offset, length) of the part's body (before decoding) in the original bytes |
| children | Vec<ParsedPart> | Child parts; non-empty only for multipart/* |
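Every node carries its own dotted part_id, so a lookup such as find_by_id can be a depth-first walk over children. The sketch below illustrates that idea on a stripped-down type. `Node`, `find_by_id`, and `leaves` are hypothetical stand-ins, not the crate's ParsedPart or its actual implementation.

```rust
/// Hypothetical, simplified stand-in for a MIME tree node; the real
/// ParsedPart carries many more fields.
#[derive(Debug)]
pub struct Node {
    pub part_id: String,
    pub content_type: String,
    pub children: Vec<Node>,
}

/// Depth-first search for the node whose part_id equals `id`.
pub fn find_by_id<'a>(node: &'a Node, id: &str) -> Option<&'a Node> {
    if node.part_id == id {
        return Some(node);
    }
    node.children.iter().find_map(|child| find_by_id(child, id))
}

/// Collects (part_id, content_type) for every leaf node (no children),
/// in document order.
pub fn leaves(node: &Node, out: &mut Vec<(String, String)>) {
    if node.children.is_empty() {
        out.push((node.part_id.clone(), node.content_type.clone()));
    } else {
        for child in &node.children {
            leaves(child, out);
        }
    }
}
```

Searching by stored ID avoids assuming how the root node is numbered. The sketch relies only on each node carrying its own part_id.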

Byte ranges use u32 so the serialized representation is identical on 32-bit and 64-bit hosts. The trade-off is that offsets can only address inputs smaller than 4 GiB, a limit real-world messages stay well below.
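Because ranges are plain (offset, length) pairs, recovering the exact bytes of a part is a bounds-checked slice of the original buffer. One use is the first child of a multipart/signed entity, handed to an S/MIME verifier. This is a minimal sketch:

- `slice_range` and `part_span` are hypothetical helpers.
- `part_span` assumes the body range follows the header range in the same buffer.
- Whether that span matches the canonical signed bytes exactly depends on how the crate assigns boundary-adjacent CRLFs, which is not specified here.

```rust
/// Returns the bytes covered by an (offset, length) range, as stored in
/// header_range / body_range, or None if the range falls outside `raw`
/// (for example, when the ranges came from a different buffer).
pub fn slice_range(raw: &[u8], (offset, length): (u32, u32)) -> Option<&[u8]> {
    // u32 -> usize is lossless on 32-bit and 64-bit targets.
    let start = offset as usize;
    let end = start.checked_add(length as usize)?;
    raw.get(start..end)
}

/// Returns everything from the start of a part's headers to the end of its
/// body (hypothetical; assumes headers precede the body in `raw`).
pub fn part_span(raw: &[u8], header: (u32, u32), body: (u32, u32)) -> Option<&[u8]> {
    let start = header.0;
    let end = body.0.checked_add(body.1)?;
    if end < start {
        return None;
    }
    slice_range(raw, (start, end - start))
}
```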

TransferEncoding variants

| Variant | CTE header value(s) |
|---|---|
| Identity | none (no Content-Transfer-Encoding header); also the fallback for unknown values |
| QuotedPrintable | quoted-printable |
| Base64 | base64 |
| UUEncode | x-uuencode, x-uue, uuencode |
| SevenBit | 7bit |
| EightBit | 8bit |
| Binary | binary |

Unknown CTE values fall back to Identity and add a warning to ParsedMessage::warnings.
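The fallback rule can be sketched as a case-insensitive match, since RFC 2045 treats CTE values as case-insensitive. `Cte` and `classify_cte` are hypothetical mirrors of the table above, not the crate's types. The sketch also assumes that an absent header maps to Identity.

```rust
/// Hypothetical mirror of the TransferEncoding variants above; not the crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cte {
    Identity,
    QuotedPrintable,
    Base64,
    UUEncode,
    SevenBit,
    EightBit,
    Binary,
}

/// Maps a Content-Transfer-Encoding value (or its absence) to a variant.
/// Unknown values fall back to Identity and record a warning.
pub fn classify_cte(header: Option<&str>, warnings: &mut Vec<String>) -> Cte {
    // Assumption: a missing header means "no transfer encoding".
    let Some(raw) = header else { return Cte::Identity };
    let value = raw.trim().to_ascii_lowercase();
    match value.as_str() {
        "quoted-printable" => Cte::QuotedPrintable,
        "base64" => Cte::Base64,
        "x-uuencode" | "x-uue" | "uuencode" => Cte::UUEncode,
        "7bit" => Cte::SevenBit,
        "8bit" => Cte::EightBit,
        "binary" => Cte::Binary,
        _ => {
            warnings.push(format!("unknown Content-Transfer-Encoding {raw:?}; treating as identity"));
            Cte::Identity
        }
    }
}
```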

DecodedBodyValue

Returned by decode_body_value().

| Field | Type | Description |
|---|---|---|
| value | String | Decoded, charset-converted text |
| is_truncated | bool | True if the max_bytes limit was reached |
| is_encoding_problem | bool | True if transfer decoding or charset conversion encountered an error |

Decoding body content

decode_body_value slices the raw bytes using a part's body_range, applies transfer-encoding decode (Base64, Quoted-Printable, UUencode, etc.), and charset-converts the result to UTF-8 via encoding_rs. Decoding is on-demand — parse time is O(message size) and does not decode any bodies.

// `part` is a &ParsedPart (e.g. from find_by_id). Cap decoding at 64 KiB;
// pass None for no limit.
let decoded = decode_body_value(raw, part, Some(65_536))?;
if decoded.is_truncated {
    // body was larger than max_bytes
}
if decoded.is_encoding_problem {
    // transfer-decode or charset conversion hit an error; `value` may be partial
}
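The transfer-decode step can be illustrated with a std-only quoted-printable decoder (RFC 2045 §6.7). It honors a byte cap and reports problems the way DecodedBodyValue's flags suggest. This is a sketch, not the crate's implementation:

- `decode_qp` and `QpDecoded` are hypothetical.
- Decoding stops at raw bytes; there is no encoding_rs charset conversion.
- Whether the crate applies max_bytes before or after conversion is not stated here.

```rust
/// Hypothetical result type; field names echo DecodedBodyValue, but this
/// holds undecoded-charset bytes rather than a String.
#[derive(Debug, PartialEq)]
pub struct QpDecoded {
    pub bytes: Vec<u8>,
    pub is_truncated: bool,
    pub is_encoding_problem: bool,
}

fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'A'..=b'F' => Some(b - b'A' + 10),
        b'a'..=b'f' => Some(b - b'a' + 10),
        _ => None,
    }
}

/// Decodes `=XX` escapes and `=` soft line breaks. Malformed escapes are
/// kept literally and flagged. Output stops once `max_bytes` bytes exist
/// and input remains. (Simplification: trailing whitespace before line
/// breaks is not stripped.)
pub fn decode_qp(input: &[u8], max_bytes: Option<usize>) -> QpDecoded {
    let limit = max_bytes.unwrap_or(usize::MAX);
    let mut out = QpDecoded { bytes: Vec::new(), is_truncated: false, is_encoding_problem: false };
    let mut i = 0;
    while i < input.len() {
        if out.bytes.len() >= limit {
            out.is_truncated = true;
            break;
        }
        let b = input[i];
        if b != b'=' {
            out.bytes.push(b);
            i += 1;
            continue;
        }
        // Soft line break: "=\r\n" or "=\n" produces no output.
        match input.get(i + 1..) {
            Some([b'\r', b'\n', ..]) => { i += 3; continue; }
            Some([b'\n', ..]) => { i += 2; continue; }
            _ => {}
        }
        let hi = input.get(i + 1).copied().and_then(hex_val);
        let lo = input.get(i + 2).copied().and_then(hex_val);
        match (hi, lo) {
            (Some(hi), Some(lo)) => { out.bytes.push(hi << 4 | lo); i += 3; }
            _ => { out.is_encoding_problem = true; out.bytes.push(b'='); i += 1; }
        }
    }
    out
}
```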

Inline UUencoded blocks

Some legacy messages — especially Usenet archives and mailing-list digests from the 1990s — embed UU-encoded files inside text/plain bodies with no Content-Transfer-Encoding header. Use scan_inline_uuencode to locate and decode those blocks:

use mime_tree::{parse, scan_inline_uuencode};

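// Placeholder path: load the raw message bytes however your application does.
let bytes = std::fs::read("message.eml").expect("read message");
let raw: &[u8] = &bytes;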
let msg = parse(raw).unwrap();

for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    for block in scan_inline_uuencode(raw, part) {
        if !block.is_encoding_problem {
            println!("found {} ({} bytes, mode {:o})",
                block.filename, block.data.len(), block.mode);
        }
    }
}

InlineUUBlock fields:

| Field | Type | Description |
|---|---|---|
| begin_offset | u32 | Absolute byte offset of the begin line in raw |
| begin_length | u32 | Byte length of the entire block (through the end\n line) |
| mode | u32 | Unix permission mode from the begin line |
| filename | String | Filename from the begin line |
| data | Vec<u8> | Decoded binary payload |
| is_encoding_problem | bool | True if the block was truncated or malformed |
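Each data line of a uuencoded block starts with a length character, followed by groups of four 6-bit characters that expand to three bytes. The hypothetical `uudecode_line` helper below sketches that per-line decode. It uses std only and is not the crate's code.

```rust
/// Decodes one uuencoded data line (without its line terminator) into `out`.
/// Returns false if the line is shorter than its length character promises.
pub fn uudecode_line(line: &[u8], out: &mut Vec<u8>) -> bool {
    // Each character carries 6 bits as (c - 0x20) & 0x3F; this also maps the
    // common '`' substitute for space (0x60) to 0.
    let sextet = |c: u8| c.wrapping_sub(0x20) & 0x3F;
    let Some((&len_char, body)) = line.split_first() else { return true };
    let n = sextet(len_char) as usize; // decoded bytes on this line
    let needed = (n + 2) / 3 * 4; // encoded characters required
    if body.len() < needed {
        return false;
    }
    let start = out.len();
    for chunk in body[..needed].chunks(4) {
        let [a, b, c, d] = [chunk[0], chunk[1], chunk[2], chunk[3]].map(sextet);
        // Four 6-bit values -> three 8-bit bytes (high bits shifted out).
        out.extend_from_slice(&[a << 2 | b >> 4, b << 4 | c >> 2, c << 6 | d]);
    }
    // The last group may be padding; keep only the declared byte count.
    out.truncate(start + n);
    true
}
```

For example, the line "#0V%T" declares 3 bytes and decodes to "Cat".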

Inline yEnc blocks

Usenet binary posts from the 2000s onward typically use yEnc encoding with no Content-Transfer-Encoding header — the article body is simply text/plain with =ybegin/=yend framing embedded in it. Use scan_inline_yencode to locate and decode those blocks:

use mime_tree::{parse, scan_inline_yencode};

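// Placeholder path: load the raw message bytes however your application does.
let bytes = std::fs::read("message.eml").expect("read message");
let raw: &[u8] = &bytes;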
let msg = parse(raw).unwrap();

for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    for block in scan_inline_yencode(raw, part) {
        if !block.is_encoding_problem {
            println!("found {} ({} bytes)", block.filename, block.data.len());
        }
    }
}

A reasonable heuristic before calling it: check whether the part's body contains the byte sequence b"=ybegin ".
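When the part has no Content-Transfer-Encoding header, that pre-check needs no decoding, because the body bytes are the text as sent. The hypothetical helper below searches a part's body_range directly. It is a sketch, not part of the crate.

```rust
/// Returns true if `needle` occurs anywhere in `haystack`.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // windows(0) would panic, so an empty needle short-circuits to false.
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Cheap pre-filter: does the body slice (offset, length) of `raw` contain
/// a yEnc header marker? Out-of-range values yield false.
pub fn may_contain_yenc(raw: &[u8], (offset, length): (u32, u32)) -> bool {
    let start = offset as usize;
    let Some(end) = start.checked_add(length as usize) else { return false };
    raw.get(start..end)
        .is_some_and(|body| contains_bytes(body, b"=ybegin "))
}
```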

InlineYEncBlock fields:

| Field | Type | Description |
|---|---|---|
| begin_offset | u32 | Absolute byte offset of the =ybegin line in raw |
| begin_length | u32 | Byte length of the entire block (through the =yend\n line) |
| filename | String | Filename from =ybegin name= |
| file_size | u64 | Total file size from =ybegin size= |
| part | Option<u32> | Part number (multi-part only) |
| total_parts | Option<u32> | Total parts (multi-part only) |
| part_begin | Option<u64> | 1-based start offset in the full file (multi-part only) |
| part_end | Option<u64> | 1-based end offset in the full file (multi-part only) |
| data | Vec<u8> | Decoded binary payload |
| crc32_verified | bool | True if a CRC32 was present and matched |
| is_encoding_problem | bool | True if the block was truncated, had a bad header, or failed the CRC check |
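The yEnc rules behind data and crc32_verified are simple enough to sketch:

- Each payload byte is stored shifted by 42.
- A '=' escapes the next byte, which carries an extra shift of 64.
- The =yend line can carry a CRC-32 of the payload.

`ydecode`, `crc32`, and `crc_matches` are hypothetical std-only helpers, not the crate's API. The sketch treats any line starting with "=y" as a control line and ignores NNTP dot-stuffing.

```rust
/// Decodes yEnc data lines, skipping =ybegin/=ypart/=yend control lines.
/// Returns the payload and whether a trailing '=' escape was left dangling.
pub fn ydecode(body: &[u8]) -> (Vec<u8>, bool) {
    let mut out = Vec::new();
    let mut dangling = false;
    for line in body.split(|&b| b == b'\n') {
        let line = line.strip_suffix(b"\r").unwrap_or(line);
        if line.starts_with(b"=y") {
            continue; // control line, not payload
        }
        let mut bytes = line.iter();
        while let Some(&b) = bytes.next() {
            if b == b'=' {
                match bytes.next() {
                    // Escaped byte: undo the extra +64, then the +42.
                    Some(&e) => out.push(e.wrapping_sub(64).wrapping_sub(42)),
                    None => dangling = true,
                }
            } else {
                out.push(b.wrapping_sub(42));
            }
        }
    }
    (out, dangling)
}

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320), as used by yEnc.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if low bit set
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Compares the payload against a hex crc32= value from the =yend line.
pub fn crc_matches(data: &[u8], hex: &str) -> bool {
    u32::from_str_radix(hex.trim(), 16).is_ok_and(|want| crc32(data) == want)
}
```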

For multi-part reassembly, pass each InlineYEncBlock's fields to yencoding_multi::Assembler.
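Reassembly itself places each part's data at its 1-based part_begin and checks that the pieces tile the whole file with no gaps or overlaps. `Reassembly` below is a hypothetical std-only sketch of that idea. It is not the yencoding_multi::Assembler API.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory reassembler keyed by 1-based part_begin.
pub struct Reassembly {
    file_size: u64,
    parts: BTreeMap<u64, Vec<u8>>,
}

impl Reassembly {
    pub fn new(file_size: u64) -> Self {
        Self { file_size, parts: BTreeMap::new() }
    }

    /// Records a part; returns false if its bounds are invalid or its data
    /// length disagrees with part_end - part_begin + 1.
    pub fn add(&mut self, part_begin: u64, part_end: u64, data: Vec<u8>) -> bool {
        if part_begin == 0
            || part_end < part_begin
            || part_end > self.file_size
            || data.len() as u64 != part_end - part_begin + 1
        {
            return false;
        }
        self.parts.insert(part_begin, data);
        true
    }

    /// Returns the assembled file once the parts cover 1..=file_size exactly.
    pub fn finish(&self) -> Option<Vec<u8>> {
        let mut out = Vec::with_capacity(self.file_size as usize);
        for (&begin, data) in &self.parts {
            if begin != out.len() as u64 + 1 {
                return None; // gap or overlap
            }
            out.extend_from_slice(data);
        }
        (out.len() as u64 == self.file_size).then_some(out)
    }
}
```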

Design invariants

  • No JMAP dependency. General-purpose MIME parser; no jmap-mail-types.
  • No S/MIME crypto. application/pkcs7-mime and application/pkcs7-signature parts are treated as opaque binary leaves. Use smime-tree for S/MIME processing.
  • Best-effort parsing. Malformed input yields a partial result plus warnings; only truly unparsable input (empty bytes, no headers) returns Err.
  • No async. Synchronous only.
  • Byte ranges, not stored bytes. The crate never retains the raw message bytes; keep the original buffer to slice or decode parts later.

Specification references

| RFC | Title |
|---|---|
| RFC 5322 | Internet Message Format |
| RFC 2045 | MIME Part One: Format of Internet Message Bodies |
| RFC 2046 | MIME Part Two: Media Types (multipart boundaries) |
| RFC 2047 | MIME Part Three: Message Header Extensions for Non-ASCII Text (encoded-words in headers) |
| RFC 2183 | Content-Disposition header |
| RFC 2231 | MIME Parameter Value and Encoded Word Extensions |
| RFC 8621 §4.1.4 | JMAP for Mail: body structure algorithm |

License

Licensed under either of MIT or Apache-2.0 at your option.