Crate table_extract
Utility for extracting data from HTML tables.
This library allows you to parse tables from HTML documents and iterate over their rows. There are three entry points:
- Table::find_first finds the first table.
- Table::find_by_id finds a table by its HTML id.
- Table::find_by_headers finds a table that has certain headers.
Each of these returns an Option<Table>, since there might not be any matching table in the HTML. Once you have a table, you can iterate over it and access the contents of each Row.
Examples
Here is a simple example that uses Table::find_first to print the cells in each row of a table:
let html = r#"
<table>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>John</td><td>20</td></tr>
</table>
"#;
let table = table_extract::Table::find_first(html).unwrap();
for row in &table {
    println!(
        "{} is {} years old",
        row.get("Name").unwrap_or("<name missing>"),
        row.get("Age").unwrap_or("<age missing>")
    )
}
If the document has multiple tables, we can use Table::find_by_headers to identify the one we want:
let html = r#"
<table></table>
<table>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>John</td><td>20</td></tr>
</table>
"#;
let table = table_extract::Table::find_by_headers(html, &["Age"]).unwrap();
for row in &table {
    for cell in row {
        println!("Table cell: {}", cell);
    }
}
Structs
Type Definitions
- A map from <th> table headers to their zero-based positions.
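To illustrate what a map from headers to zero-based positions makes possible, here is a minimal standard-library sketch. It is not the crate's implementation: the helper names header_positions and cell_by_header are hypothetical, and the choice of HashMap<String, usize> as the map type is an assumption made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of table_extract): map each header
/// name to its zero-based column index.
fn header_positions(headers: &[&str]) -> HashMap<String, usize> {
    headers
        .iter()
        .enumerate()
        .map(|(index, name)| (name.to_string(), index))
        .collect()
}

/// Hypothetical helper: find the cell of `row` that sits under `header`.
/// Returns None if the header is unknown or the row is too short.
fn cell_by_header<'a>(
    positions: &HashMap<String, usize>,
    row: &'a [&'a str],
    header: &str,
) -> Option<&'a str> {
    positions
        .get(header)
        .and_then(|&index| row.get(index).copied())
}

fn main() {
    let positions = header_positions(&["Name", "Age"]);
    let row = ["John", "20"];

    // "Age" is the second header, so it maps to index 1.
    println!("{:?}", cell_by_header(&positions, &row, "Age")); // prints Some("20")
    // An unknown header yields None instead of panicking.
    println!("{:?}", cell_by_header(&positions, &row, "Email")); // prints None
}
```

Resolving a header to an index once and then indexing into each row keeps per-row lookups cheap, and returning an Option mirrors the way row.get("Name") is combined with unwrap_or in the examples above.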