Crate table_extract

Expand description

Utility for extracting data from HTML tables.

This library allows you to parse tables from HTML documents and iterate over their rows. There are three entry points:

Table::find_first finds the first table.
Table::find_by_id finds a table by its HTML id.
Table::find_by_headers finds a table that has certain headers.

Each of these returns an Option<Table>, since there might not be any matching table in the HTML. Once you have a table, you can iterate over it and access the contents of each Row.

§Examples

Here is a simple example that uses Table::find_first to print the cells in each row of a table:

let html = r#"
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_first(html).unwrap();
for row in &table {
    println!(
        "{} is {} years old",
        row.get("Name").unwrap_or("<name missing>"),
        row.get("Age").unwrap_or("<age missing>")
    )
}

If the document has multiple tables, we can use Table::find_by_headers to identify the one we want:

let html = r#"
    <table></table>
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_by_headers(html, &["Age"]).unwrap();
for row in &table {
    for cell in row {
        println!("Table cell: {}", cell);
    }
}

Structs§

Iter: An iterator over the rows in a Table.
Row: A row in a Table.
Table: A parsed HTML table.

Type Aliases§

Headers: A map from <th> table headers to their zero-based positions.

Crate table_extractCopy item path

§Examples

Structs§

Type Aliases§

Crate table_extract