Crate mailrs_clean

Email content cleanup primitives — HTML → readable text + sender heuristics.

Four entry points cover what a mail client / inbound pipeline typically needs after parsing an RFC 5322 message:

  • clean_email_html — multi-stage HTML pipeline that strips tracking pixels, hidden blocks, marketing-template chrome, and unsafe elements, then converts what’s left to a paragraph-aware plain-text view. Returns CleanResult with the cleaned text plus boolean flags the caller can fold into an importance / spam score.
  • detect_bulk_sender — RFC 2369 List-* header heuristic, used to demote mailing-list traffic in inbox sorting.
  • is_automated_sender — local-part pattern check for no-reply@, notification@, etc.
  • split_quoted_content — separate a fresh reply from its quoted ancestry so UIs can collapse old context.

Zero I/O, no async runtime — give it strings, get strings back.
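The two sender heuristics described above can be illustrated with a minimal, self-contained sketch. It does not call this crate: `looks_automated` and `has_list_headers` are hypothetical names, and the pattern list and the header representation (a slice of name/value pairs) are assumptions made for illustration, not the crate's actual signatures or rules.

```rust
/// Hypothetical helper (not part of mailrs_clean): returns true when the
/// local part of `address` starts with a common automated-sender pattern.
fn looks_automated(address: &str) -> bool {
    // Illustrative pattern list; the crate's real list may differ.
    const PATTERNS: [&str; 5] = [
        "noreply",
        "no-reply",
        "donotreply",
        "notification",
        "mailer-daemon",
    ];
    // The local part is everything before the last '@'.
    let local = match address.rsplit_once('@') {
        Some((local, _domain)) => local.to_ascii_lowercase(),
        None => return false, // no '@', so not an address we can judge
    };
    PATTERNS.iter().any(|p| local.starts_with(p))
}

/// Hypothetical helper (not part of mailrs_clean): returns true when any
/// header name begins with "List-" (case-insensitive), as the RFC 2369
/// headers such as List-Unsubscribe and List-Post do.
fn has_list_headers(headers: &[(&str, &str)]) -> bool {
    headers.iter().any(|(name, _value)| {
        // `get(..5)` yields None for short names or a non-boundary cut.
        name.get(..5)
            .map_or(false, |prefix| prefix.eq_ignore_ascii_case("list-"))
    })
}
```

Both helpers take borrowed strings and return a plain `bool`, in the same spirit as the string-in, value-out design described above; a caller would typically fold the results into its own inbox-sorting score.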

Structs§

CleanResult
Result of clean_email_html.

Functions§

clean_email_html
Clean HTML email content through a multi-stage pipeline.
detect_bulk_sender
Detect whether the sender is a bulk/automated sender, based on email headers.
is_automated_sender
Detect automated/no-reply senders.
split_quoted_content
Find the quoted-text boundary in email text. Returns (new_content, quoted_parts), where new_content is the original reply text.
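As a rough illustration of what finding a quoted-text boundary can look like, here is a minimal sketch. It is not this crate's implementation: `split_at_quote` is a hypothetical name, the `(String, String)` return type is an assumption, and the boundary rule (the first `>` line or "On ... wrote:" attribution line) is a deliberately simple stand-in for whatever split_quoted_content actually does.

```rust
/// Hypothetical helper (not the crate's split_quoted_content): splits `text`
/// at the first line that looks like the start of quoted material.
/// Returns (fresh reply, quoted remainder); the remainder is empty when no
/// boundary is found.
fn split_at_quote(text: &str) -> (String, String) {
    let lines: Vec<&str> = text.lines().collect();

    // Assumed boundary rule: a '>'-prefixed line or an "On ... wrote:" line.
    let boundary = lines
        .iter()
        .position(|line| {
            let t = line.trim();
            t.starts_with('>') || (t.starts_with("On ") && t.ends_with("wrote:"))
        })
        .unwrap_or(lines.len());

    // Everything before the boundary is the fresh reply; drop the trailing
    // blank lines that usually separate it from the quote.
    let fresh = lines[..boundary].join("\n").trim_end().to_string();
    let quoted = lines[boundary..].join("\n");
    (fresh, quoted)
}
```

A UI could render the first element directly and keep the second collapsed until the reader asks for the older context.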