Crate august

August is a library for converting HTML to plain text.

Design

The main goal of this library is to provide readable and efficient results when converting HTML emails into text, and it is designed with that in mind. For example:

  • There's no way to reliably convert the output of this library back into HTML. Adding the extra markup needed for that impedes readability and isn’t useful in an email anyway.
  • A fair bit of work is done to make sure that tables are rendered nicely. Emails often use tables for layout because CSS support is patchy.
  • We try hard to get whitespace correct so you don't end up withtextlikethis  or  like  this around element boundaries.
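
To make the whitespace point concrete, here is a minimal, self-contained sketch of the general idea. It is not August's actual implementation: the `Fragment` type and the `join_fragments` function are hypothetical names invented for this illustration, and the sketch assumes the document has already been reduced to a flat list of text fragments, each flagged with whether a block-level boundary (such as the end of a paragraph) follows it.

```rust
/// Hypothetical type for illustration only (not part of August's API):
/// a piece of text taken from the document, plus a flag saying whether
/// a block-level element boundary follows it.
struct Fragment<'a> {
    text: &'a str,
    block_end: bool,
}

/// Joins fragments into plain text, collapsing each run of whitespace
/// into a single space and separating blocks with a newline.
fn join_fragments(fragments: &[Fragment<'_>]) -> String {
    let mut out = String::new();
    // True when whitespace has been seen since the last emitted character.
    let mut pending_space = false;

    for frag in fragments {
        for ch in frag.text.chars() {
            if ch.is_whitespace() {
                // Remember the whitespace, but do not emit it yet.
                pending_space = true;
            } else {
                // Emit at most one space, and never at the start of a line.
                if pending_space && !out.is_empty() && !out.ends_with('\n') {
                    out.push(' ');
                }
                pending_space = false;
                out.push(ch);
            }
        }
        if frag.block_end {
            // A block boundary always separates text, even when the
            // source had no whitespace between the two elements.
            if !out.is_empty() && !out.ends_with('\n') {
                out.push('\n');
            }
            pending_space = false;
        }
    }

    // Drop the newline left behind by the final block.
    out.trim_end().to_string()
}
```

With this sketch, the fragments "like  " and "  this" from adjacent inline elements join as "like this", while two adjacent paragraphs containing "with" and "text" come out on separate lines instead of running together.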

Limitations

  • Currently we don't support CSS at all.
  • There are a few elements (<bdo>, <sup>, and <sub>) that we should support but don’t.
  • We don’t support <ruby> and related elements. Ruby was intentionally designed to fall back gracefully, so that’s probably fine.

Usage

Just call the convert or convert_io functions.

Functions

convert

Converts HTML text into plain text

convert_dom

Converts a loaded markup5ever DOM into a text string

convert_dom_io

Takes a loaded markup5ever DOM and sends the converted text to an I/O writer

convert_dom_io_unstyled

Takes a loaded markup5ever DOM and sends the converted unstyled text to an I/O writer

convert_dom_unstyled

Converts a loaded markup5ever DOM into an unstyled text string

convert_io

Converts HTML text into plain text, using an I/O reader & writer

convert_unstyled

Converts HTML text into unstyled plain text

Type Definitions

Width

Grapheme width of text