§scrape-core
High-performance HTML parsing library with CSS selector support.
This crate provides the core functionality for parsing HTML documents and querying them using CSS selectors. It is designed to be fast, memory-efficient, and spec-compliant.
§Quick Start
```rust
use scrape_core::{Html5everParser, Parser, Soup, SoupConfig};

// Parse HTML using Soup (high-level API)
let html = "<html><body><div class=\"product\">Hello</div></body></html>";
let soup = Soup::parse(html);

// Find elements using CSS selectors
if let Ok(Some(div)) = soup.find("div.product") {
    assert_eq!(div.text(), "Hello");
}

// Or use the parser directly (low-level API)
let parser = Html5everParser;
let document = parser.parse(html).unwrap();
assert!(document.root().is_some());
```

§Features
- Fast parsing: Built on html5ever for spec-compliant HTML5 parsing
- CSS selectors: CSS selector support via the selectors crate
- Memory efficient: Arena-based allocation for DOM nodes
- SIMD acceleration: Optional SIMD support for faster byte scanning
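In CSS, a compound selector such as the div.product or li.item.active queries in this page matches an element only when every part (type, #id, .class) matches that same element. The sketch below illustrates just that rule with hypothetical SimpleElement and Compound types; it is not how scrape_core or the selectors crate implement matching, and it deliberately ignores attribute selectors, pseudo-classes, and combinators.

```rust
/// Hypothetical, simplified element (NOT scrape_core's Tag type):
/// only a tag name, an optional id, and a list of classes.
struct SimpleElement {
    tag: String,
    id: Option<String>,
    classes: Vec<String>,
}

/// A parsed compound selector such as `li.item.active` or `#list`.
#[derive(Default, Debug, PartialEq)]
struct Compound {
    tag: Option<String>,
    id: Option<String>,
    classes: Vec<String>,
}

/// Parses a compound selector built only from a type, `#id`, and `.class` parts.
fn parse_compound(sel: &str) -> Compound {
    let mut out = Compound::default();
    let bytes = sel.as_bytes();
    let mut start = 0;
    // Cut before every '.' or '#', so "li.item.active" -> "li", ".item", ".active".
    for i in 1..=bytes.len() {
        if i == bytes.len() || bytes[i] == b'.' || bytes[i] == b'#' {
            let part = &sel[start..i];
            if let Some(class) = part.strip_prefix('.') {
                out.classes.push(class.to_string());
            } else if let Some(id) = part.strip_prefix('#') {
                out.id = Some(id.to_string());
            } else if !part.is_empty() {
                out.tag = Some(part.to_string());
            }
            start = i;
        }
    }
    out
}

/// Every part present in the selector must match the same element.
fn matches(sel: &Compound, el: &SimpleElement) -> bool {
    // HTML tag names compare case-insensitively.
    sel.tag.as_deref().map_or(true, |t| t.eq_ignore_ascii_case(&el.tag))
        && sel.id.as_deref().map_or(true, |id| el.id.as_deref() == Some(id))
        && sel.classes.iter().all(|c| el.classes.iter().any(|ec| ec == c))
}
```

Combinators extend this rule across elements: for "ul > li" the element must match li and its parent must match ul, while "div li" accepts a match on any ancestor.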
§CSS Selector Support
The query engine supports most CSS3 selectors:
```rust
use scrape_core::Soup;

let html = r#"
<div class="container">
<ul id="list">
<li class="item active">One</li>
<li class="item">Two</li>
<li class="item">Three</li>
</ul>
</div>
"#;
let soup = Soup::parse(html);

// Type selector
let divs = soup.find_all("div").unwrap();
// Class selector
let items = soup.find_all(".item").unwrap();
// ID selector
let list = soup.find("#list").unwrap();
// Compound selector
let active = soup.find("li.item.active").unwrap();
// Descendant combinator
let nested = soup.find_all("div li").unwrap();
// Child combinator
let direct = soup.find_all("ul > li").unwrap();
// Attribute selectors
let with_id = soup.find_all("[id]").unwrap();
```

Re-exports§
pub use query::Filter;
pub use query::QueryError;
pub use query::QueryResult;
Modules§
- query: Query engine for finding elements in the DOM.
Structs§
- AncestorsIter: Iterator over ancestors of a node (parent, grandparent, …).
- ChildrenIter: Iterator over direct children of a node.
- DescendantsIter: Iterator over descendants in depth-first pre-order.
- Document: An HTML document containing a tree of nodes.
- Html5everParser: HTML5 spec-compliant parser using html5ever.
- Node: A node in the DOM tree.
- NodeId: A node ID in the DOM tree.
- ParseConfig: Configuration for HTML parsing behavior.
- Soup: A parsed HTML document.
- SoupConfig: Configuration options for HTML parsing.
- Tag: A reference to an element in the document.
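Document, Node, and NodeId appear here alongside three traversal iterators, and the crate describes its DOM as arena-allocated. The sketch below shows one common way such a design can work: every node lives in a single Vec, links are integer IDs rather than pointers, and traversal walks ancestors, direct children, or descendants in depth-first pre-order. All names (Arena, Kind, NodeId(usize), text) are hypothetical and are not scrape_core's actual internals; descendants also returns a Vec for brevity instead of a lazy iterator.

```rust
/// Hypothetical node ID: an index into the arena's Vec.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

/// Hypothetical node kinds; the crate's NodeKind may differ.
#[derive(Debug)]
enum Kind {
    Element(String),
    Text(String),
}

struct Node {
    kind: Kind,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// All nodes share one allocation; relationships are stored as indices.
#[derive(Default)]
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Appends a node and links it under `parent`, if any.
    fn add(&mut self, kind: Kind, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { kind, parent, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p.0].children.push(id);
        }
        id
    }

    /// Parent, grandparent, ... up to the root.
    fn ancestors(&self, id: NodeId) -> impl Iterator<Item = NodeId> + '_ {
        std::iter::successors(self.nodes[id.0].parent, move |p| self.nodes[p.0].parent)
    }

    /// Direct children in document order.
    fn children(&self, id: NodeId) -> impl Iterator<Item = NodeId> + '_ {
        self.nodes[id.0].children.iter().copied()
    }

    /// Depth-first pre-order walk of everything below `id` (excluding `id`).
    fn descendants(&self, id: NodeId) -> Vec<NodeId> {
        let mut out = Vec::new();
        // Children are pushed in reverse so the first child is popped first.
        let mut stack: Vec<NodeId> = self.nodes[id.0].children.iter().rev().copied().collect();
        while let Some(n) = stack.pop() {
            out.push(n);
            stack.extend(self.nodes[n.0].children.iter().rev().copied());
        }
        out
    }

    /// Concatenates the text nodes below `id` in document order.
    fn text(&self, id: NodeId) -> String {
        self.descendants(id)
            .into_iter()
            .filter_map(|n| match &self.nodes[n.0].kind {
                Kind::Text(t) => Some(t.as_str()),
                Kind::Element(_) => None,
            })
            .collect()
    }
}
```

Index-based links avoid reference-counted pointers and keep nodes contiguous in memory, which is the usual motivation for arena-allocated trees.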
Enums§
- Error: Errors that can occur during HTML parsing and querying.
- NodeKind: Types of nodes in the DOM tree.
- ParseError: Errors that can occur during HTML parsing.
Traits§
- Parser: A sealed trait for HTML parsers.
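A sealed trait can be used by downstream code but not implemented by it, which leaves the defining crate free to add methods later without breaking anyone. A common way to seal a trait is to give it a supertrait that lives in a private module. The sketch below shows that pattern at module scope with hypothetical names (parsers, private::Sealed, DemoParser) and a simplified parse signature; it does not claim this is exactly how scrape_core seals Parser, whose parse returns a document rather than a String.

```rust
mod parsers {
    // `private` is not visible outside `parsers`, so other code cannot
    // name `Sealed` and therefore cannot implement `Parser` itself.
    mod private {
        pub trait Sealed {}
    }

    /// Hypothetical sealed parser trait with a simplified signature.
    pub trait Parser: private::Sealed {
        fn parse(&self, input: &str) -> Result<String, String>;
    }

    /// Hypothetical stand-in parser that only rejects empty input.
    pub struct DemoParser;

    // Only code inside `parsers` can provide this impl.
    impl private::Sealed for DemoParser {}

    impl Parser for DemoParser {
        fn parse(&self, input: &str) -> Result<String, String> {
            if input.is_empty() {
                Err("empty input".to_string())
            } else {
                Ok(input.to_string())
            }
        }
    }
}
```

Callers can still import Parser and call parse on DemoParser; what they cannot do is write impl Parser for their own type.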
Type Aliases§
- ParseResult: Result type for parser operations.
- Result: Result type alias using Error.
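Both aliases follow a common Rust pattern: a crate-wide Result<T> that fixes the error type, usually paired with From conversions so the ? operator can lift a narrower error into the top-level Error. The sketch below illustrates that pattern under stated assumptions: Error, ParseError, QueryError, check_selector, and selector_len are hypothetical stand-ins, and the real crate's error variants and conversions may differ.

```rust
use std::fmt;

/// Hypothetical parse error (the crate's ParseError may have variants).
#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Hypothetical query error, e.g. for an invalid selector string.
#[derive(Debug, PartialEq)]
pub struct QueryError(pub String);

/// Hypothetical top-level error covering parsing and querying.
#[derive(Debug, PartialEq)]
pub enum Error {
    Parse(ParseError),
    Query(QueryError),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Parse(e) => write!(f, "parse error: {}", e.0),
            Error::Query(e) => write!(f, "query error: {}", e.0),
        }
    }
}

impl std::error::Error for Error {}

// These From impls are what let `?` convert narrower errors automatically.
impl From<ParseError> for Error {
    fn from(e: ParseError) -> Self {
        Error::Parse(e)
    }
}

impl From<QueryError> for Error {
    fn from(e: QueryError) -> Self {
        Error::Query(e)
    }
}

/// The alias: callers write Result<T> instead of std::result::Result<T, Error>.
pub type Result<T> = std::result::Result<T, Error>;

/// Hypothetical helper that rejects a blank selector with a QueryError.
fn check_selector(sel: &str) -> std::result::Result<&str, QueryError> {
    if sel.trim().is_empty() {
        Err(QueryError("empty selector".to_string()))
    } else {
        Ok(sel)
    }
}

/// `?` lifts QueryError into Error via the From impl above.
fn selector_len(sel: &str) -> Result<usize> {
    let sel = check_selector(sel)?;
    Ok(sel.len())
}
```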