August is a library for converting HTML to plain text.
The main goal of this library is to provide readable and efficient results when converting HTML emails into text, and it is designed with that in mind. For example:
- There's no way to reliably convert the output of this program back into HTML. Adding the extra markup for that impedes readability and isn’t useful in an email anyway.
- A fair bit of work is done to make sure that tables are rendered nicely. Emails often use tables for layout because CSS support is patchy.
- We try hard to get whitespace correct so you don't end up withtextlikethis or like this around element boundaries.
- Currently we don't support CSS at all.
- There are a few elements (<bdo>, <sup>, and <sub>) that we should support but don’t.
- We don’t support <ruby> and related elements. Ruby was intentionally designed to fall back gracefully, so that’s probably fine.
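
To make the whitespace point above concrete, here is a minimal sketch of collapsing whitespace across element boundaries. It is not August's actual implementation: `join_inline` is a hypothetical helper written for illustration, and it assumes the inline text nodes have already been extracted from the markup in document order.

```rust
/// Hypothetical helper (not part of August's API): joins the text of
/// adjacent inline fragments, collapsing every run of ASCII whitespace,
/// including runs that span a fragment boundary, into a single space.
fn join_inline(fragments: &[&str]) -> String {
    let mut out = String::new();
    // True when whitespace has been seen since the last emitted character.
    let mut pending_space = false;
    for fragment in fragments {
        for ch in fragment.chars() {
            if ch.is_ascii_whitespace() {
                pending_space = true;
            } else {
                // Emit at most one space, and never at the very start.
                if pending_space && !out.is_empty() {
                    out.push(' ');
                }
                pending_space = false;
                out.push(ch);
            }
        }
    }
    out
}

fn main() {
    // Text nodes as they might appear in "like <b> this</b>\n  text".
    println!("{}", join_inline(&["like ", " this", "\n  text"]));
}
```

Because the pending-space flag survives across fragments, whitespace on both sides of a tag yields one space, while fragments with no whitespace between them stay joined, as they would in a browser. Trailing whitespace is dropped in this sketch; a fuller version would need to decide that based on what follows.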
Converts HTML text into plain text
Converts a loaded markup5ever DOM into a text string
Converts HTML text into plain text, using an I/O reader & writer
Grapheme width of text