pub struct Document { /* private fields */ }
An HTML document containing a tree of nodes.
The document owns all nodes via an arena allocator, ensuring
cache-friendly contiguous storage. Navigation is performed
using NodeId handles.
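The arena-and-handle idea can be shown with a plain Vec. This is a minimal sketch that assumes the arena is Vec-backed and that a NodeId wraps a usize index. Both are assumptions, and the Arena, alloc and Node names are hypothetical stand-ins rather than scrape_core's actual definitions:

```rust
/// Hypothetical handle: an index into the arena's backing vector.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub usize);

/// Hypothetical node payload, reduced to a name for this sketch.
#[derive(Debug)]
pub struct Node {
    pub name: String,
}

/// Minimal arena: every node lives in one contiguous Vec, and callers
/// hold copyable NodeId handles instead of references.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Stores a node and returns the handle that refers to it.
    pub fn alloc(&mut self, node: Node) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(node);
        id
    }

    /// Looks up a node; returns None for a handle outside the arena.
    pub fn get(&self, id: NodeId) -> Option<&Node> {
        self.nodes.get(id.0)
    }

    /// Mutable lookup through the same handle.
    pub fn get_mut(&mut self, id: NodeId) -> Option<&mut Node> {
        self.nodes.get_mut(id.0)
    }
}
```

Because a handle is just a copyable index, holding one does not borrow the arena, so lookups and mutations can be interleaved freely.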
§Architecture
Nodes are stored in a single contiguous Arena<Node>. Parent-child
relationships use first_child/last_child links (O(1) append),
and siblings are doubly linked via prev_sibling/next_sibling.
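The O(1) append follows from storing last_child on each parent. Linking a new child only touches the parent and its current last child, so no sibling list is walked. Here is a hypothetical sketch that keeps only the link fields. It assumes index-based NodeId handles and a child that is not yet attached anywhere. Tree, Links and create are illustrative names:

```rust
/// Hypothetical index handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(pub usize);

/// Only the link fields described above; the node payload is omitted.
#[derive(Debug, Default, Clone)]
pub struct Links {
    pub parent: Option<NodeId>,
    pub first_child: Option<NodeId>,
    pub last_child: Option<NodeId>,
    pub prev_sibling: Option<NodeId>,
    pub next_sibling: Option<NodeId>,
}

#[derive(Debug, Default)]
pub struct Tree {
    pub nodes: Vec<Links>,
}

impl Tree {
    /// Allocates a detached node and returns its handle.
    pub fn create(&mut self) -> NodeId {
        self.nodes.push(Links::default());
        NodeId(self.nodes.len() - 1)
    }

    /// Appends `child` as the last child of `parent` in constant time.
    pub fn append_child(&mut self, parent: NodeId, child: NodeId) {
        // These explicit checks are compiled out in release builds
        // (plain indexing below would still panic there on a bad id).
        debug_assert!(parent.0 < self.nodes.len(), "invalid parent id");
        debug_assert!(child.0 < self.nodes.len(), "invalid child id");

        let old_last = self.nodes[parent.0].last_child;
        {
            let c = &mut self.nodes[child.0];
            c.parent = Some(parent);
            c.prev_sibling = old_last;
            c.next_sibling = None;
        }
        match old_last {
            // Link the previous last child forward to the new one.
            Some(last) => self.nodes[last.0].next_sibling = Some(child),
            // Parent had no children: the new node is also the first child.
            None => self.nodes[parent.0].first_child = Some(child),
        }
        self.nodes[parent.0].last_child = Some(child);
    }
}
```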
§Navigation
The document provides both direct navigation methods and lazy iterators:
- parent, first_child, last_child - direct links
- children - iterate over direct children
- ancestors - iterate from parent to root
- descendants - depth-first subtree traversal
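The lazy iterators need no allocation because each step follows one stored link. In this hypothetical sketch, children follows next_sibling starting from first_child, and ancestors follows parent. A simplified Links record stands in for the real Node type, and the Children and Ancestors names are illustrative, not the library's ChildrenIter and AncestorsIter:

```rust
/// Hypothetical index handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(pub usize);

/// Simplified link record standing in for a full node.
#[derive(Debug, Default, Clone)]
pub struct Links {
    pub parent: Option<NodeId>,
    pub first_child: Option<NodeId>,
    pub next_sibling: Option<NodeId>,
}

/// Walks next_sibling links starting from a node's first child.
pub struct Children<'a> {
    nodes: &'a [Links],
    next: Option<NodeId>,
}

impl Iterator for Children<'_> {
    type Item = NodeId;
    fn next(&mut self) -> Option<NodeId> {
        let current = self.next?;
        self.next = self.nodes[current.0].next_sibling;
        Some(current)
    }
}

/// Walks parent links; the starting node itself is never yielded.
pub struct Ancestors<'a> {
    nodes: &'a [Links],
    next: Option<NodeId>,
}

impl Iterator for Ancestors<'_> {
    type Item = NodeId;
    fn next(&mut self) -> Option<NodeId> {
        let current = self.next?;
        self.next = self.nodes[current.0].parent;
        Some(current)
    }
}

pub fn children(nodes: &[Links], id: NodeId) -> Children<'_> {
    Children { nodes, next: nodes[id.0].first_child }
}

pub fn ancestors(nodes: &[Links], id: NodeId) -> Ancestors<'_> {
    Ancestors { nodes, next: nodes[id.0].parent }
}
```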
Implementations§
impl Document
pub fn new() -> Self
Creates a new empty document with default capacity.
The default capacity is 256 nodes, which is sufficient for typical HTML pages and reduces reallocations during parsing.
pub fn with_capacity(capacity: usize) -> Self
Creates a new empty document with the specified capacity.
Use this when you know the approximate number of nodes to avoid reallocations.
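If the arena is Vec-backed (an assumption), the benefit of reserving capacity comes from Vec's own guarantee: push only reallocates once the length reaches the capacity. The fill function and Node type below are hypothetical and only illustrate that behavior:

```rust
/// Hypothetical stand-in for a node record.
#[derive(Debug, Clone)]
pub struct Node {
    pub tag: &'static str,
}

/// Pushes `n` nodes into a vector that reserved room for `expected`
/// up front. Returns (capacity before pushes, capacity after pushes).
pub fn fill(expected: usize, n: usize) -> (usize, usize) {
    let mut nodes: Vec<Node> = Vec::with_capacity(expected);
    let before = nodes.capacity();
    for _ in 0..n {
        nodes.push(Node { tag: "div" });
    }
    (before, nodes.capacity())
}
```

With fill(256, 200), the capacity is unchanged after the pushes. Pushing well past the reserved amount forces the vector to grow.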
pub fn get(&self, id: NodeId) -> Option<&Node>
Returns a reference to the node with the given ID, or None if the document has no such node.
pub fn get_mut(&mut self, id: NodeId) -> Option<&mut Node>
Returns a mutable reference to the node with the given ID, or None if the document has no such node.
pub fn create_element(&mut self, name: impl Into<String>, attributes: HashMap<String, String>) -> NodeId
Creates a new element node and returns its ID.
pub fn create_text(&mut self, content: impl Into<String>) -> NodeId
Creates a new text node and returns its ID.
pub fn create_comment(&mut self, content: impl Into<String>) -> NodeId
Creates a new comment node and returns its ID.
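The three constructors can share one allocation path. This sketch assumes a node-kind enum. NodeKind, Doc and the private push helper are hypothetical names, not the library's types. The impl Into<String> parameters let callers pass either &str or String:

```rust
use std::collections::HashMap;

/// Hypothetical index handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(pub usize);

/// Hypothetical node kinds matching the three constructors above.
#[derive(Debug, PartialEq)]
pub enum NodeKind {
    Element { name: String, attributes: HashMap<String, String> },
    Text(String),
    Comment(String),
}

#[derive(Debug, Default)]
pub struct Doc {
    pub nodes: Vec<NodeKind>,
}

impl Doc {
    /// Shared path: store the node and hand back its index as a handle.
    fn push(&mut self, kind: NodeKind) -> NodeId {
        self.nodes.push(kind);
        NodeId(self.nodes.len() - 1)
    }

    pub fn create_element(
        &mut self,
        name: impl Into<String>,
        attributes: HashMap<String, String>,
    ) -> NodeId {
        self.push(NodeKind::Element { name: name.into(), attributes })
    }

    pub fn create_text(&mut self, content: impl Into<String>) -> NodeId {
        self.push(NodeKind::Text(content.into()))
    }

    pub fn create_comment(&mut self, content: impl Into<String>) -> NodeId {
        self.push(NodeKind::Comment(content.into()))
    }
}
```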
pub fn append_child(&mut self, parent_id: NodeId, child_id: NodeId)
Appends a child node to a parent.
Updates parent, first_child, last_child, and sibling links.
§Panics
Panics in debug builds if either parent_id or child_id is invalid.
pub fn nodes(&self) -> impl Iterator<Item = (NodeId, &Node)>
Returns an iterator over all nodes in the document, yielding (NodeId, &Node) pairs.
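Every node has a slot in the arena, so an iterator of this shape can be built by enumerating the backing storage and wrapping each index as a handle. This is a hypothetical sketch. In this model, the order is storage order and nodes never appended to a parent are still included. The real method may or may not behave the same way:

```rust
/// Hypothetical index handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(pub usize);

/// Hypothetical node payload.
#[derive(Debug)]
pub struct Node {
    pub name: String,
}

/// Pairs each arena slot with the handle that addresses it.
pub fn nodes(arena: &[Node]) -> impl Iterator<Item = (NodeId, &Node)> {
    arena.iter().enumerate().map(|(i, node)| (NodeId(i), node))
}
```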
pub fn first_child(&self, id: NodeId) -> Option<NodeId>
Returns the first child of a node, if it has one.
pub fn last_child(&self, id: NodeId) -> Option<NodeId>
Returns the last child of a node, if it has one.
pub fn next_sibling(&self, id: NodeId) -> Option<NodeId>
Returns the next sibling of a node, if it has one.
pub fn prev_sibling(&self, id: NodeId) -> Option<NodeId>
Returns the previous sibling of a node, if it has one.
pub fn children(&self, id: NodeId) -> ChildrenIter<'_>
Returns an iterator over children of a node.
The iterator yields children in order from first to last.
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let parent = doc.create_element("div", HashMap::new());
let child1 = doc.create_element("span", HashMap::new());
let child2 = doc.create_element("span", HashMap::new());
doc.append_child(parent, child1);
doc.append_child(parent, child2);
let children: Vec<_> = doc.children(parent).collect();
assert_eq!(children.len(), 2);
pub fn ancestors(&self, id: NodeId) -> AncestorsIter<'_>
Returns an iterator over ancestors of a node.
The iterator yields ancestors from parent to root (does not include the node itself).
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let grandparent = doc.create_element("html", HashMap::new());
let parent = doc.create_element("body", HashMap::new());
let child = doc.create_element("div", HashMap::new());
doc.append_child(grandparent, parent);
doc.append_child(parent, child);
let ancestors: Vec<_> = doc.ancestors(child).collect();
assert_eq!(ancestors.len(), 2); // parent, grandparent
pub fn descendants(&self, id: NodeId) -> DescendantsIter<'_>
Returns an iterator over descendants in depth-first pre-order.
Does not include the starting node itself.
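With parent, first_child and next_sibling links available, pre-order traversal can run without an explicit stack. The traversal descends when possible. Otherwise, it moves to the nearest next sibling of the current node or one of its ancestors. It stops when the climb returns to the starting node. This is a hypothetical sketch with illustrative Links and Descendants types. DescendantsIter is not necessarily implemented this way:

```rust
/// Hypothetical index handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(pub usize);

#[derive(Debug, Default, Clone)]
pub struct Links {
    pub parent: Option<NodeId>,
    pub first_child: Option<NodeId>,
    pub next_sibling: Option<NodeId>,
}

/// Stackless pre-order walk of the subtree below `root`.
pub struct Descendants<'a> {
    nodes: &'a [Links],
    root: NodeId,
    next: Option<NodeId>,
}

impl<'a> Descendants<'a> {
    pub fn new(nodes: &'a [Links], root: NodeId) -> Self {
        // The root itself is not yielded; start at its first child.
        Descendants { nodes, root, next: nodes[root.0].first_child }
    }
}

impl Iterator for Descendants<'_> {
    type Item = NodeId;

    fn next(&mut self) -> Option<NodeId> {
        let current = self.next?;
        let links = &self.nodes[current.0];
        self.next = if let Some(child) = links.first_child {
            // Descend first.
            Some(child)
        } else {
            // No children: find the nearest next sibling of this node or of
            // an ancestor. The root check comes before the sibling check so
            // the walk never escapes the subtree.
            let mut node = current;
            loop {
                if node == self.root {
                    break None;
                }
                let l = &self.nodes[node.0];
                if let Some(sib) = l.next_sibling {
                    break Some(sib);
                }
                match l.parent {
                    Some(p) => node = p,
                    None => break None,
                }
            }
        };
        Some(current)
    }
}
```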
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let root = doc.create_element("html", HashMap::new());
let child1 = doc.create_element("head", HashMap::new());
let child2 = doc.create_element("body", HashMap::new());
let grandchild = doc.create_element("div", HashMap::new());
doc.append_child(root, child1);
doc.append_child(root, child2);
doc.append_child(child2, grandchild);
let descendants: Vec<_> = doc.descendants(root).collect();
assert_eq!(descendants.len(), 3); // head, body, div