Email content cleanup primitives — HTML → readable text + sender heuristics.
Four entry points cover what a mail client / inbound pipeline typically needs after parsing an RFC 5322 message:
- clean_email_html: multi-stage HTML pipeline that strips tracking pixels, hidden blocks, marketing-template chrome, and unsafe elements, then converts what’s left to a paragraph-aware plain-text view. Returns CleanResult with the cleaned text plus boolean flags the caller can fold into an importance / spam score.
- detect_bulk_sender: RFC 2369 List-* header heuristic, used to demote mailing-list traffic in inbox sorting.
- is_automated_sender: local-part pattern check for no-reply@, notification@, etc.
- split_quoted_content: separates a fresh reply from its quoted ancestry so UIs can collapse old context.
Zero I/O, no async runtime — give it strings, get strings back.
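The two sender heuristics can be illustrated with a minimal, standalone sketch. The function names `looks_automated` and `looks_bulk`, their signatures, and the pattern lists below are assumptions made for illustration; they are not this crate's API and may differ from what `is_automated_sender` and `detect_bulk_sender` actually check.

```rust
/// Hypothetical local-part check: true when the part before '@' is a
/// well-known automated mailbox name. The name list is an assumption.
fn looks_automated(address: &str) -> bool {
    const AUTOMATED: [&str; 5] = [
        "no-reply",
        "noreply",
        "donotreply",
        "notification",
        "notifications",
    ];
    // Split on the last '@'; input without '@' is not treated as an address.
    let local = match address.rsplit_once('@') {
        Some((local, _domain)) => local,
        None => return false,
    };
    // Drop a "+tag" subaddress and compare case-insensitively.
    let base = local.split('+').next().unwrap_or(local).to_ascii_lowercase();
    AUTOMATED.iter().any(|name| base == *name)
}

/// Hypothetical bulk-sender check: true when any header name is one of the
/// List-* fields defined by RFC 2369. Header names are case-insensitive.
fn looks_bulk(headers: &[(&str, &str)]) -> bool {
    const LIST_HEADERS: [&str; 6] = [
        "List-Help",
        "List-Unsubscribe",
        "List-Subscribe",
        "List-Post",
        "List-Owner",
        "List-Archive",
    ];
    headers.iter().any(|(name, _value)| {
        LIST_HEADERS
            .iter()
            .any(|known| name.eq_ignore_ascii_case(known))
    })
}
```

Both functions take borrowed strings and return a plain boolean, in keeping with the zero-I/O design described above. A caller would typically combine the two results when ranking a message rather than rejecting it on either one alone.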
Structs
- CleanResult - Result of clean_email_html.
Functions
- clean_email_html - Clean HTML email content through a multi-stage pipeline.
- detect_bulk_sender - Detect whether the sender is a bulk/automated sender, based on email headers.
- is_automated_sender - Detect automated/no-reply senders.
- split_quoted_content - Extract the quoted-text boundary from email text. Returns (new_content, quoted_parts), where new_content is the original reply text.
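The (new_content, quoted_parts) return shape can be illustrated with a minimal sketch. It assumes the simplest possible rule, that a quoted line is one starting with `>`; the name `split_reply` and its signature are hypothetical, and the crate's own boundary detection may use other signals (for example attribution lines such as "On ... wrote:"), which this sketch ignores.

```rust
/// Hypothetical splitter: lines starting with '>' are treated as quoted.
/// Each consecutive run of quoted lines becomes one entry in the second
/// element; all remaining lines are joined into the new content.
fn split_reply(text: &str) -> (String, Vec<String>) {
    let mut fresh: Vec<&str> = Vec::new();
    let mut quoted_parts: Vec<String> = Vec::new();
    let mut current: Vec<&str> = Vec::new();

    for line in text.lines() {
        if line.trim_start().starts_with('>') {
            current.push(line);
        } else {
            // A non-quoted line closes the quoted run in progress, if any.
            if !current.is_empty() {
                quoted_parts.push(current.join("\n"));
                current.clear();
            }
            fresh.push(line);
        }
    }
    // Flush a quoted run that reaches the end of the text.
    if !current.is_empty() {
        quoted_parts.push(current.join("\n"));
    }

    (fresh.join("\n").trim().to_string(), quoted_parts)
}
```

Keeping the quoted runs as separate strings lets a UI collapse each block independently, while the trimmed first element is what would be shown by default.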