Crate mlscraper_rust


Tool for scraping structured data from webpages automatically.

This project is inspired by the Python package mlscraper. See README.md for a comparison with the Python version and for example code.

Quick example:

let html = reqwest::blocking::get("http://quotes.toscrape.com/author/Albert-Einstein/")
    .expect("request") // Scrappy error handling for demonstration purposes
    .text()
    .expect("text");

let result = mlscraper_rust::train(
    vec![html.as_str()],
    vec![
        AttributeBuilder::new("name")
            .values(&[Some("Albert Einstein")])
            .build(),

        AttributeBuilder::new("born")
            .values(&[Some("March 14, 1879")])
            .build(),
    ],
    Default::default(),
    1
).expect("training");

// Prints `{"born": .author-born-date, "name": h3}`
println!("{:?}", result.selectors());

Modules

Functions

  • Find suitable selectors for attributes in HTML documents.
  • Same as train, but with a custom random number generator (Rng).
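
To make "finding suitable selectors" concrete, here is a minimal, std-only sketch of the underlying idea: for one attribute, keep a candidate selector only if the text it extracts equals the expected value in every training document. This is an illustration under stated assumptions, not this crate's actual algorithm or API. The `Document` type, `find_selector`, `example`, and the pre-extracted texts are all hypothetical stand-ins for a real DOM query.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed page: maps a CSS selector to the text
/// of the first element it matches. A real scraper would query the DOM.
type Document = HashMap<&'static str, &'static str>;

/// For one attribute, return the first candidate selector whose extracted
/// text equals the expected value in every training document.
/// `expected[i]` belongs to `docs[i]`; `None` means the selector should
/// match nothing in that document.
fn find_selector(
    docs: &[Document],
    expected: &[Option<&str>],
    candidates: &[&'static str],
) -> Option<&'static str> {
    candidates.iter().copied().find(|selector| {
        docs.iter()
            .zip(expected)
            .all(|(doc, want)| doc.get(selector).copied() == *want)
    })
}

/// Usage with made-up extracted texts modelled on the quick example above.
fn example() -> Option<&'static str> {
    let einstein: Document = HashMap::from([
        ("h3", "Albert Einstein"),
        (".author-born-date", "March 14, 1879"),
    ]);
    let candidates = ["h3", ".author-born-date"];
    // "h3" yields the name, so it is rejected; the date selector is kept.
    find_selector(&[einstein], &[Some("March 14, 1879")], &candidates)
}
```

Checking each candidate against every document is what lets several training pages rule out selectors that only work by coincidence on a single page.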