Crate html2json

html2json - HTML to JSON extractor using html5ever

A Rust port of cheerio-json-mapper using html5ever for HTML parsing.

§Overview

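This library extracts structured JSON data from HTML. The extraction is driven by a spec, itself written as JSON, that maps each output key to a CSS selector; the content matched by the selector becomes that key's value.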
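To make the spec idea concrete, here is a minimal standard-library sketch of the key-to-selector mapping. It is not the crate's implementation: `first_text`, `extract_flat`, and `demo` are hypothetical names invented for illustration, and the toy matcher only understands bare tag names, whereas the crate itself works with real CSS selectors.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy matcher (not part of html2json): returns the trimmed text
/// of the first `<tag>...</tag>` element. Only bare tag names are supported.
fn first_text<'a>(html: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{tag}");
    let close = format!("</{tag}>");
    let mut from = 0;
    loop {
        let start = from + html[from..].find(&open)?;
        let after = start + open.len();
        // Accept only `<tag>` or `<tag ...>`, not a longer name such as `<pre>` for `p`.
        match html[after..].chars().next() {
            Some('>') | Some(' ') => {
                let body_start = after + html[after..].find('>')? + 1;
                let body_end = body_start + html[body_start..].find(&close)?;
                return Some(html[body_start..body_end].trim());
            }
            _ => from = after,
        }
    }
}

/// Hypothetical helper: applies a flat spec (output key -> tag name) to an
/// HTML string. Keys whose tag is not found map to `None`.
fn extract_flat(html: &str, spec: &BTreeMap<&str, &str>) -> BTreeMap<String, Option<String>> {
    spec.iter()
        .map(|(key, tag)| (key.to_string(), first_text(html, tag).map(str::to_string)))
        .collect()
}

/// Example call: two output keys, each bound to a tag name.
fn demo() -> BTreeMap<String, Option<String>> {
    let html = r#"<html><body><h1>Hello</h1><p class="desc">World</p></body></html>"#;
    let spec = BTreeMap::from([("title", "h1"), ("description", "p")]);
    extract_flat(html, &spec)
}
```

The point of the sketch is the shape of the data flow: the spec names the output fields, and each field is filled by looking something up in the document. Parsing with html5ever and matching full CSS selectors are what the crate supplies in place of the toy `first_text`.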

§Basic Example

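use html2json::{extract, Spec};

// The `?` operator needs an enclosing function that returns a Result.
// Assumption: the crate's error types convert into Box<dyn std::error::Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = r#"<html><body><h1>Hello</h1><p class="desc">World</p></body></html>"#;
    // Each key in the spec names an output field; each value is a CSS selector.
    let spec_json = r#"{"title": "h1", "description": "p.desc"}"#;
    let spec: Spec = serde_json::from_str(spec_json)?;
    let result = extract(html, &spec)?;
    assert_eq!(result["title"], "Hello");
    assert_eq!(result["description"], "World");
    Ok(())
}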

Re-exports§

pub use extractor::Extractor;
pub use spec::Spec;

Modules§

dom
DOM module wrapping scraper (html5ever) for HTML parsing and CSS selector matching
extractor
Main extractor module
pipe
Pipe transformation module
spec
Spec parsing module

Functions§

extract
Extract JSON from HTML using a spec