Crate table_extract

Utility for extracting data from HTML tables.

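This library allows you to parse tables from HTML documents and iterate over their rows. There are three entry points:

Table::find_first parses the first table in the document.
Table::find_by_id parses the table with a given id attribute.
Table::find_by_headers parses a table identified by its headers.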

Each of these returns an Option<Table>, since there might not be any matching table in the HTML. Once you have a table, you can iterate over it and access the contents of each Row.

Examples

Here is a simple example that uses Table::find_first to print the cells in each row of a table:

let html = r#"
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_first(html).unwrap();
for row in &table {
    println!(
        "{} is {} years old",
        row.get("Name").unwrap_or("<name missing>"),
        row.get("Age").unwrap_or("<age missing>")
    )
}

If the document has multiple tables, we can use Table::find_by_headers to identify the one we want:

let html = r#"
    <table></table>
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_by_headers(html, &["Age"]).unwrap();
for row in &table {
    for cell in row {
        println!("Table cell: {}", cell);
    }
}
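
The header-based selection used above can be pictured with a small standard-library sketch. It is not the crate's implementation: `first_matching_table` is a hypothetical helper, and it assumes each candidate table has already been reduced to the list of its header strings. It only illustrates the idea that a table qualifies when its header row contains every requested header, in any order.

```rust
/// Hypothetical helper (not part of table_extract): returns the index of the
/// first candidate whose header row contains every header in `wanted`.
fn first_matching_table(candidates: &[Vec<&str>], wanted: &[&str]) -> Option<usize> {
    candidates.iter().position(|headers| {
        // A candidate matches only if each wanted header appears somewhere in its row.
        wanted.iter().all(|w| headers.iter().any(|h| h == w))
    })
}
```

With the two tables from the example, the empty table has no headers and is skipped, so asking for "Age" selects the second candidate. In this sketch an empty `wanted` list matches the first candidate, because there is nothing left to check.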

Structs

Iter
An iterator over the rows in a Table.
Row
A row in a Table.
Table
A parsed HTML table.

Type Aliases

Headers
A map from <th> table headers to their zero-based positions.
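
To make the header map concrete, here is a minimal sketch of how such a map could support lookups like `row.get("Name")`. It assumes the map behaves like a `HashMap<String, usize>`; the names `HeaderMap`, `index_headers`, and `cell_by_header` are hypothetical and exist only for illustration, so check the crate's own definitions before relying on any of this.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `Headers` alias:
/// header text mapped to its zero-based column position.
type HeaderMap = HashMap<String, usize>;

/// Build the map from the header cells of a table, in document order.
fn index_headers(header_cells: &[&str]) -> HeaderMap {
    header_cells
        .iter()
        .enumerate()
        .map(|(position, text)| (text.to_string(), position))
        .collect()
}

/// Resolve a header name to its position, then read that cell from the row.
/// Returns `None` if the header is unknown or the row is too short.
fn cell_by_header<'a>(headers: &HeaderMap, cells: &[&'a str], name: &str) -> Option<&'a str> {
    let position = *headers.get(name)?;
    cells.get(position).copied()
}
```

For the example table, `index_headers(&["Name", "Age"])` maps "Name" to 0 and "Age" to 1, so looking up "Age" in the row `["John", "20"]` reads the cell at position 1.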