DOM_FINDER
dom_finder is a Rust crate that provides functionality for finding elements in the Document Object Model (DOM) of HTML documents.
It lets you locate specific elements using CSS selectors.
With dom_finder, you can extract data from HTML documents and transform it before getting the result.
Currently, the functionality relies on YAML configuration.
Examples
General
use ;
const CFG_YAML: &str = r"
name: root
base_path: html
children:
  - name: results
    base_path: div.serp__results div.result
    many: true
    children:
      - name: url
        base_path: h2.result__title > a[href]
        extract: href
      - name: title
        base_path: h2.result__title
        extract: text
      - name: snippet
        base_path: a.result__snippet
        extract: html
        sanitize_policy: highlight
        pipeline: [ [ normalize_spaces ] ]
";
const HTML_DOC: &str = include_str!;
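The `pipeline` entry in the config above is a list of steps, each itself a list that starts with a procedure name. As a rough illustration of that idea only, the following standalone sketch applies such steps in order to an extracted string. It does not use dom_finder: `run_pipeline` and this `normalize_spaces` function are hypothetical stand-ins, and the whitespace-collapsing behaviour is an assumption about what a step with that name would do, not the crate's actual implementation.

```rust
/// Hypothetical stand-in for a `normalize_spaces` step (assumed behaviour):
/// collapses every run of whitespace into one space and trims both ends.
fn normalize_spaces(input: &str) -> String {
    input.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: applies pipeline steps in order. Each step is a list
/// whose first item is the procedure name, mirroring the shape of
/// `pipeline: [ [ normalize_spaces ] ]` in the YAML config.
fn run_pipeline(steps: &[Vec<&str>], input: &str) -> Result<String, String> {
    let mut value = input.to_string();
    for step in steps {
        match step.first().copied() {
            Some("normalize_spaces") => value = normalize_spaces(&value),
            Some(other) => return Err(format!("unknown step: {other}")),
            None => return Err("empty step".to_string()),
        }
    }
    Ok(value)
}

fn main() {
    let steps = vec![vec!["normalize_spaces"]];
    let raw = "  Rust   is\n\ta systems   language ";
    let cleaned = run_pipeline(&steps, raw).unwrap();
    println!("{cleaned}"); // prints "Rust is a systems language"
}
```

Keeping each step as a list leaves room for steps that take arguments after the name, which is one plausible reason for the nested-list shape in the config.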
Remove selection
use ;
use Document;
const HTML_DOC: &str = include_str!;
More examples
Features
json_cfg: optional; allows loading the config from a JSON string.
License
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.