Crate html2text

Convert HTML to text formats.

This crate renders HTML as text wrapped to a specified width. The output can be plain text, or text with extra annotations, for example to display in a terminal which supports colours.
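To make "wrapped to a specified width" concrete, here is a minimal greedy word-wrap sketch that uses only the standard library. It is an illustration of the general idea, not the crate's implementation: the function name `wrap_words` is hypothetical, and it assumes that a word longer than the width is left unbroken on its own line.

```rust
/// Greedy word wrap (illustrative only, not html2text's algorithm).
/// Packs whitespace-separated words into lines of at most `width`
/// characters. Assumption: a word longer than `width` is kept whole
/// on a line of its own rather than being split.
fn wrap_words(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Length of the current line if this word were appended to it.
        let needed = if current.is_empty() {
            word.chars().count()
        } else {
            current.chars().count() + 1 + word.chars().count()
        };
        // Start a new line when the word does not fit on this one.
        if needed > width && !current.is_empty() {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

For example, wrapping "Item one two three" to a width of 10 with this sketch yields the two lines "Item one" and "two three". Lengths are counted in `char`s here, which is a simplification; a real renderer would probably need to account for display width.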

Examples

use html2text::from_read;

let html = b"
       <ul>
         <li>Item one</li>
         <li>Item two</li>
         <li>Item three</li>
       </ul>";
assert_eq!(from_read(&html[..], 20),
           "\
* Item one
* Item two
* Item three
");

A couple of simple demonstration programs are included as examples:

html2text

The simplest example uses from_read to convert HTML on stdin into plain text:

$ cargo run --example html2text < foo.html
[...]

html2term

html2term is a very simple, slightly interactive console HTML viewer that demonstrates the rich interface (from_read_rich).

$ cargo run --example html2term foo.html
[...]

Note that this example takes the HTML file as a parameter so that it can read keys from stdin.
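The rich interface returns lines of annotated text in which the "outer" annotation comes first. The sketch below shows one way a viewer might consume such data. The types `Annotation` and `TaggedSpan` and the function `render_line` are hypothetical stand-ins invented for this illustration; they are not the crate's TaggedLine or RichAnnotation definitions, and the URL is a placeholder.

```rust
/// Hypothetical stand-in for an annotation (not the crate's RichAnnotation).
#[derive(Debug, Clone, PartialEq)]
enum Annotation {
    Emphasis,
    Strong,
    Link(String),
}

/// Hypothetical stand-in for one annotated piece of a line.
#[derive(Debug, Clone, PartialEq)]
struct TaggedSpan {
    text: String,
    /// Annotations applying to `text`, outermost first.
    tags: Vec<Annotation>,
}

/// Render a line using simple textual markers. Tags are applied from the
/// innermost (last) to the outermost (first), so the first tag in the
/// vector ends up wrapping everything else.
fn render_line(spans: &[TaggedSpan]) -> String {
    let mut out = String::new();
    for span in spans {
        let mut piece = span.text.clone();
        for tag in span.tags.iter().rev() {
            piece = match tag {
                Annotation::Emphasis => format!("*{}*", piece),
                Annotation::Strong => format!("**{}**", piece),
                Annotation::Link(url) => format!("[{}]({})", piece, url),
            };
        }
        out.push_str(&piece);
    }
    out
}
```

With the spans "See " (no tags) and "docs" (tagged Link("http://example.com") then Emphasis), this sketch produces "See [*docs*](http://example.com)": the link, listed first, is the outer wrapper. A terminal viewer would more likely map each annotation to colours or text attributes instead of markers.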

Modules

render

Module containing the Renderer interface for constructing a particular text output.

Structs

RenderNode

Common fields from a node.

RenderTable

A representation of a table render tree with metadata.

RenderTableCell

A render tree table cell.

RenderTableRow

A render tree table row.

RenderTree

The structure of an HTML document that can be rendered using a TextDecorator.

RenderedText

A rendered HTML document.

SizeEstimate

Size information/estimate

Enums

RenderNodeInfo

The node-specific information distilled from the DOM.

Functions

dom_to_render_tree

Convert a DOM tree or subtree into a render tree.

from_read

Reads HTML from input, and returns a String with text wrapped to width columns.

from_read_rich

Reads HTML from input, and returns text wrapped to width columns. The text is returned as a Vec<TaggedLine<_>>; the annotations are vectors of RichAnnotation. The "outer" annotation comes first in the Vec.

from_read_with_decorator

Reads HTML from input, decorates it using decorator, and returns a String with text wrapped to width columns.

parse

Reads and parses HTML from input and prepares a render tree.
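The idea behind from_read_with_decorator, rendering the same document differently depending on the decorator supplied, can be sketched with a much-simplified trait. Everything named here (`SimpleDecorator`, `PlainMarkers`, `NoMarkup`, `render_sentence`) is hypothetical and exists only for this illustration; the crate's actual TextDecorator trait has a different and richer interface, and the URL is a placeholder.

```rust
/// Hypothetical, much-simplified decorator interface. This is NOT the
/// crate's TextDecorator trait; it only illustrates the pattern.
trait SimpleDecorator {
    fn emphasis(&self, text: &str) -> String;
    fn link(&self, text: &str, url: &str) -> String;
}

/// Marks emphasis and links with plain-text markers.
struct PlainMarkers;

impl SimpleDecorator for PlainMarkers {
    fn emphasis(&self, text: &str) -> String {
        format!("*{}*", text)
    }
    fn link(&self, text: &str, url: &str) -> String {
        format!("[{}] <{}>", text, url)
    }
}

/// Drops all markup and keeps only the text.
struct NoMarkup;

impl SimpleDecorator for NoMarkup {
    fn emphasis(&self, text: &str) -> String {
        text.to_string()
    }
    fn link(&self, text: &str, _url: &str) -> String {
        text.to_string()
    }
}

/// Renders one fixed sentence; the decorator decides how markup appears.
fn render_sentence<D: SimpleDecorator>(decorator: &D) -> String {
    format!(
        "Read the {} {}.",
        decorator.link("manual", "https://example.com/manual"),
        decorator.emphasis("first")
    )
}
```

Passing `&PlainMarkers` gives "Read the [manual] <https://example.com/manual> *first*.", while `&NoMarkup` gives "Read the manual first.". The rendering logic stays the same and only the decorator changes, which is the design choice a decorator parameter is meant to support.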