Crate html2text

Convert HTML to text formats.

This crate renders HTML as text, wrapped to a specified width. The output is either plain text or text with extra annotations, which can be used (for example) to display it in a terminal which supports colours.
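To make "wrapped to a specified width" concrete, here is a minimal, std-only sketch of greedy word wrapping. It is an illustration only and not the crate's implementation: `wrap_words` is a hypothetical helper, it counts `char`s rather than true display columns, and it ignores everything HTML-specific (lists, tables, nesting).

```rust
/// Hypothetical helper, not part of html2text: greedily packs
/// whitespace-separated words into lines of at most `width` characters.
/// A single word longer than `width` is kept whole on its own line.
fn wrap_words(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Length of the line if this word were appended; the +1 is the
        // joining space, needed only when the line already has content.
        let needed = if current.is_empty() {
            word.chars().count()
        } else {
            current.chars().count() + 1 + word.chars().count()
        };
        // Start a new line when the word would overflow the current one.
        if needed > width && !current.is_empty() {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

With a width of 10, `wrap_words("Item one Item two Item three", 10)` yields the three lines "Item one", "Item two" and "Item three"; the last one is exactly 10 characters long, so it still fits.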

§Examples

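use html2text::from_read;

// A byte string literal; `&html[..]` is a `&[u8]`, which implements `Read`.
let html = b"
       <ul>
         <li>Item one</li>
         <li>Item two</li>
         <li>Item three</li>
       </ul>";
// Render the list as plain text, wrapped to at most 20 columns.
assert_eq!(from_read(&html[..], 20),
           "\
* Item one
* Item two
* Item three
");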

A couple of simple demonstration programs are included as examples:

§html2text

The simplest example uses from_read to convert HTML on stdin into plain text:

$ cargo run --example html2text < foo.html
[...]

§html2term

The html2term example uses the rich interface (from_read_rich) to implement a very simple, slightly interactive console HTML viewer:

$ cargo run --example html2term foo.html
[...]

Note that this example takes the HTML file as a parameter so that it can read keys from stdin.
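The pattern described in the note can be sketched with the standard library alone: take the document path from the command line, so that stdin remains available for key presses. This is not the code of the html2term example itself; `read_html_from_args` is a hypothetical helper written for illustration.

```rust
use std::fs;
use std::io;

/// Hypothetical helper: loads the HTML document named by the first
/// command-line argument, leaving stdin untouched for key input.
fn read_html_from_args<I>(mut args: I) -> io::Result<Vec<u8>>
where
    I: Iterator<Item = String>,
{
    // Skip the program name, as yielded first by `std::env::args()`.
    let _program = args.next();
    // The next argument is expected to be the path of the HTML file.
    let path = args.next().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "usage: <program> <file.html>")
    })?;
    fs::read(path)
}
```

A caller would pass `std::env::args()` and could then hand the returned bytes to a reader-based function such as from_read_rich as `&bytes[..]`, since `&[u8]` implements `Read`.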

Modules§

  • Configure the HTML to text translation using the Config type, which can be constructed using one of the functions in this module.
  • Module containing the Renderer interface for constructing a particular text output.

Structs§

Enums§

  • Errors from reading or rendering HTML.
  • The node-specific information distilled from the DOM.

Functions§

  • Convert a DOM tree or subtree into a render tree.
  • Reads HTML from input, and returns a String with text wrapped to width columns.
  • Reads HTML from input, and returns text wrapped to width columns. The text is returned as a Vec<TaggedLine<_>>; the annotations are vectors of RichAnnotation. The “outer” annotation comes first in the Vec.
  • Reads HTML from input, and returns text wrapped to width columns. The text is returned as a Vec<TaggedLine<_>>; the annotations are vectors of RichAnnotation. The “outer” annotation comes first in the Vec.
  • Reads HTML from input, decorates it using decorator, and returns a String with text wrapped to width columns.
  • Reads and parses HTML from input and prepares a render tree.
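The "outer annotation comes first" convention mentioned for the rich output can be illustrated with a small std-only model. The types below (`Annotation`, `TaggedSpan`) are hypothetical stand-ins invented for this sketch; they are not the crate's `RichAnnotation` or `TaggedLine`, whose actual variants and fields should be checked in their own documentation.

```rust
/// Hypothetical stand-in for an annotation attached to a run of text.
#[derive(Debug, Clone, PartialEq)]
enum Annotation {
    Link(String),
    Emphasis,
    Strong,
}

/// Hypothetical stand-in for one run of text plus its annotations,
/// stored outermost first (e.g. `<a><em>..</em></a>` gives [Link, Emphasis]).
#[derive(Debug, Clone, PartialEq)]
struct TaggedSpan {
    text: String,
    tags: Vec<Annotation>,
}

/// Wraps the text in plain-text markers. The tags are walked in reverse,
/// so the innermost annotation is applied first and the outermost one
/// ends up on the outside of the result.
fn render_span(span: &TaggedSpan) -> String {
    let mut out = span.text.clone();
    for tag in span.tags.iter().rev() {
        out = match tag {
            Annotation::Emphasis => format!("*{}*", out),
            Annotation::Strong => format!("**{}**", out),
            Annotation::Link(url) => format!("[{}]({})", out, url),
        };
    }
    out
}

/// Renders a whole line by concatenating its rendered spans.
fn render_line(spans: &[TaggedSpan]) -> String {
    spans.iter().map(render_span).collect()
}
```

Because the outer annotation is first in the vector, a consumer that wants nested markup iterates in reverse, while one that only cares about the enclosing context (for instance, "is this text inside a link?") can look at the front of the vector.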