pub struct DocumentImpl<S: DocumentState = Queryable> { /* private fields */ }
An HTML document containing a tree of nodes.
The document owns all nodes via an arena allocator, ensuring
cache-friendly contiguous storage. Navigation is performed
using NodeId handles.
§Architecture
Nodes are stored in a single contiguous Arena<Node>. Parent-child
relationships use first_child/last_child links (O(1) append),
and siblings are doubly linked via prev_sibling/next_sibling.
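The link layout described above can be sketched as follows. This is a minimal illustration, not the crate's actual implementation: the `Arena`, `Node`, and `NodeId` definitions here are hypothetical stand-ins, with a plain `Vec` playing the role of the arena.

```rust
/// Hypothetical handle type: an index into the arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

/// Hypothetical node layout holding only the link fields.
#[derive(Default)]
struct Node {
    parent: Option<NodeId>,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
    prev_sibling: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

/// A Vec stands in for the arena; nodes are never removed, so ids stay valid.
#[derive(Default)]
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Allocates an unlinked node and returns its handle.
    fn alloc(&mut self) -> NodeId {
        self.nodes.push(Node::default());
        NodeId(self.nodes.len() - 1)
    }

    /// O(1) append: only the parent, its old last child, and the new child
    /// are touched, regardless of how many children the parent already has.
    fn append_child(&mut self, parent: NodeId, child: NodeId) {
        let old_last = self.nodes[parent.0].last_child;
        self.nodes[child.0].parent = Some(parent);
        self.nodes[child.0].prev_sibling = old_last;
        match old_last {
            Some(last) => self.nodes[last.0].next_sibling = Some(child),
            None => self.nodes[parent.0].first_child = Some(child),
        }
        self.nodes[parent.0].last_child = Some(child);
    }
}
```

Keeping a `last_child` link is what makes the append constant-time; without it, each append would have to walk the sibling chain. The sketch assumes the child is not already attached elsewhere.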
§Navigation
The document provides both direct navigation methods and lazy iterators:
- parent, first_child, last_child: direct links
- children: iterate over direct children
- ancestors: iterate from parent to root
- descendants: depth-first subtree traversal
§Typestate
Internally, the document uses the typestate pattern to enforce lifecycle
guarantees at compile time. The type parameter S tracks whether
the document is being built, is queryable, or is sealed.
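For readers unfamiliar with the pattern, here is a minimal typestate sketch. The `Doc` type and its `push` and `len` methods are hypothetical and much simpler than DocumentImpl (for instance, the real Queryable state keeps some mutation methods for backward compatibility); only the state names mirror the ones used on this page.

```rust
use std::marker::PhantomData;

// Zero-sized state markers mirroring the three lifecycle states.
struct Building;
struct Queryable;
struct Sealed;

/// Hypothetical document: the state lives only in the type parameter.
struct Doc<S> {
    nodes: Vec<String>,
    _state: PhantomData<S>,
}

impl Doc<Building> {
    fn new() -> Self {
        Doc { nodes: Vec::new(), _state: PhantomData }
    }

    /// Mutation exists only on the `Building` impl in this sketch.
    fn push(&mut self, name: &str) {
        self.nodes.push(name.to_string());
    }

    /// Takes `self` by value, so the builder handle is gone after the call.
    fn build(self) -> Doc<Queryable> {
        Doc { nodes: self.nodes, _state: PhantomData }
    }
}

impl Doc<Queryable> {
    /// Second one-way transition, again by value.
    fn seal(self) -> Doc<Sealed> {
        Doc { nodes: self.nodes, _state: PhantomData }
    }
}

// Read-only access is available in every state.
impl<S> Doc<S> {
    fn len(&self) -> usize {
        self.nodes.len()
    }
}
```

With this shape, calling `push` on the value returned by `build()` is a compile error rather than a runtime check, and the markers add no runtime cost because `PhantomData` is zero-sized.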
Implementations§
impl DocumentImpl<Building>
pub fn new() -> Self
Creates a new empty document in building state.
The default capacity is 256 nodes, which is sufficient for typical HTML pages and reduces reallocations during parsing.
pub fn with_capacity(capacity: usize) -> Self
Creates a new empty document with the specified capacity.
Use this when you know the approximate number of nodes to avoid reallocations.
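The effect of pre-sizing can be illustrated with a plain `Vec`. This assumes the arena is backed by a growable buffer that behaves like `Vec`, which this page does not confirm; `filled_without_realloc` is a hypothetical helper for the illustration.

```rust
/// Fills a pre-sized buffer and reports whether its capacity stayed the same,
/// i.e. whether filling it triggered no reallocation.
fn filled_without_realloc(capacity: usize) -> bool {
    let mut nodes: Vec<u64> = Vec::with_capacity(capacity);
    let reserved = nodes.capacity();
    for i in 0..capacity {
        nodes.push(i as u64);
    }
    // `Vec::with_capacity(n)` guarantees room for at least `n` elements,
    // so pushing exactly `n` of them never grows the buffer.
    nodes.capacity() == reserved
}
```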
pub fn create_element(&mut self, name: impl Into<String>, attributes: HashMap<String, String>) -> NodeId
Creates a new element node and returns its ID.
pub fn create_text(&mut self, content: impl Into<String>) -> NodeId
Creates a new text node and returns its ID.
pub fn create_comment(&mut self, content: impl Into<String>) -> NodeId
Creates a new comment node and returns its ID.
pub fn append_child(&mut self, parent_id: NodeId, child_id: NodeId)
Appends a child node to a parent.
Updates parent, first_child, last_child, and sibling links.
§Panics
Panics in debug builds if parent_id or child_id is invalid.
pub fn build(self) -> DocumentImpl<Queryable>
Transitions the document from Building to Queryable state.
This is a one-way transition. Once built, the document structure cannot be modified.
§Examples
use std::collections::HashMap;
use scrape_core::{Building, DocumentImpl};
let mut doc = DocumentImpl::<Building>::new();
let root = doc.create_element("div", HashMap::new());
doc.set_root(root);
// Transition to queryable state
let doc = doc.build();
impl DocumentImpl<Queryable>
pub fn new() -> Self
Creates a new empty document in queryable state.
This is a convenience method for backward compatibility. Internally, it creates a Building document and immediately transitions to Queryable.
pub fn with_capacity(capacity: usize) -> Self
Creates a new empty document with the specified capacity in queryable state.
This is a convenience method for backward compatibility.
pub fn set_root(&mut self, id: NodeId)
Sets the root node ID.
Available on Queryable for backward compatibility with tests.
pub fn create_element(&mut self, name: impl Into<String>, attributes: HashMap<String, String>) -> NodeId
Creates a new element node and returns its ID.
Available on Queryable for backward compatibility with tests.
pub fn create_text(&mut self, content: impl Into<String>) -> NodeId
Creates a new text node and returns its ID.
Available on Queryable for backward compatibility with tests.
pub fn create_comment(&mut self, content: impl Into<String>) -> NodeId
Creates a new comment node and returns its ID.
Available on Queryable for backward compatibility with tests.
pub fn append_child(&mut self, parent_id: NodeId, child_id: NodeId)
Appends a child node to a parent.
Available on Queryable for backward compatibility with tests.
Updates parent, first_child, last_child, and sibling links.
§Panics
Panics in debug builds if parent_id or child_id is invalid.
pub fn seal(self) -> DocumentImpl<Sealed>
Seals the document, preventing any future modifications.
This is a one-way transition for when you need to guarantee the document will never change.
pub fn set_index(&mut self, index: DocumentIndex)
Sets the document index.
Only available in the Queryable state, because the index would be invalidated by structural modifications.
impl<S: DocumentState> DocumentImpl<S>
impl<S: MutableState> DocumentImpl<S>
impl<S: QueryableState> DocumentImpl<S>
pub fn index(&self) -> Option<&DocumentIndex>
Returns the document index, if built.
Only available in queryable states (Queryable, Sealed).
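Restricting a method to a group of states is typically done with a marker trait bound on the impl block. The sketch below assumes that approach; `Doc`, `Index`, and `empty` are hypothetical, and only the state and trait names are taken from this page.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

struct Building;
struct Queryable;
struct Sealed;

/// Marker trait implemented only by the states that allow queries.
trait QueryableState {}
impl QueryableState for Queryable {}
impl QueryableState for Sealed {}

/// Hypothetical index: maps an `id` attribute value to a node position.
struct Index {
    by_id: HashMap<String, usize>,
}

struct Doc<S> {
    index: Option<Index>,
    _state: PhantomData<S>,
}

impl<S> Doc<S> {
    fn empty() -> Self {
        Doc { index: None, _state: PhantomData }
    }
}

impl Doc<Queryable> {
    /// Setting the index is tied to one concrete state.
    fn set_index(&mut self, index: Index) {
        self.index = Some(index);
    }

    fn seal(self) -> Doc<Sealed> {
        Doc { index: self.index, _state: PhantomData }
    }
}

/// Reading the index is shared by every state that implements the marker.
impl<S: QueryableState> Doc<S> {
    fn index(&self) -> Option<&Index> {
        self.index.as_ref()
    }
}
```

In this sketch `Doc::<Building>::empty().index()` does not compile, because `Building` does not implement `QueryableState`, while the same call works on both `Doc<Queryable>` and `Doc<Sealed>`.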
impl<S: DocumentState> DocumentImpl<S>
pub fn first_child(&self, id: NodeId) -> Option<NodeId>
Returns the first child of a node.
pub fn last_child(&self, id: NodeId) -> Option<NodeId>
Returns the last child of a node.
pub fn next_sibling(&self, id: NodeId) -> Option<NodeId>
Returns the next sibling of a node.
pub fn prev_sibling(&self, id: NodeId) -> Option<NodeId>
Returns the previous sibling of a node.
pub fn children(&self, id: NodeId) -> ChildrenIter<'_, S>
Returns an iterator over children of a node.
The iterator yields children in order from first to last.
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let parent = doc.create_element("div", HashMap::new());
let child1 = doc.create_element("span", HashMap::new());
let child2 = doc.create_element("span", HashMap::new());
doc.append_child(parent, child1);
doc.append_child(parent, child2);
let children: Vec<_> = doc.children(parent).collect();
assert_eq!(children.len(), 2);
pub fn ancestors(&self, id: NodeId) -> AncestorsIter<'_, S>
Returns an iterator over ancestors of a node.
The iterator yields ancestors from parent to root (does not include the node itself).
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let grandparent = doc.create_element("html", HashMap::new());
let parent = doc.create_element("body", HashMap::new());
let child = doc.create_element("div", HashMap::new());
doc.append_child(grandparent, parent);
doc.append_child(parent, child);
let ancestors: Vec<_> = doc.ancestors(child).collect();
assert_eq!(ancestors.len(), 2); // parent, grandparent
pub fn descendants(&self, id: NodeId) -> DescendantsIter<'_, S>
Returns an iterator over descendants in depth-first pre-order.
Does not include the starting node itself.
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let root = doc.create_element("html", HashMap::new());
let child1 = doc.create_element("head", HashMap::new());
let child2 = doc.create_element("body", HashMap::new());
let grandchild = doc.create_element("div", HashMap::new());
doc.append_child(root, child1);
doc.append_child(root, child2);
doc.append_child(child2, grandchild);
let descendants: Vec<_> = doc.descendants(root).collect();
assert_eq!(descendants.len(), 3); // head, body, div
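Depth-first pre-order over linked nodes can be done without an explicit stack by following the parent, first_child, and next_sibling links. The following is one possible sketch of that walk, not necessarily how DescendantsIter works; `Links` is a hypothetical link-only node type and ids are plain indices.

```rust
/// Hypothetical link-only node; ids are indices into the slice.
#[derive(Clone, Copy, Default)]
struct Links {
    parent: Option<usize>,
    first_child: Option<usize>,
    next_sibling: Option<usize>,
}

/// Collects the descendants of `root` in depth-first pre-order,
/// excluding `root` itself.
fn descendants(nodes: &[Links], root: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut current = nodes[root].first_child;
    while let Some(id) = current {
        out.push(id);
        // Descend first. If there is no child, move to the next sibling.
        // If there is none, climb until an ancestor below `root` has one.
        current = nodes[id].first_child.or_else(|| {
            let mut at = id;
            loop {
                if at == root {
                    // Never step to the siblings of the starting node.
                    return None;
                }
                if let Some(next) = nodes[at].next_sibling {
                    return Some(next);
                }
                match nodes[at].parent {
                    Some(p) => at = p,
                    None => return None,
                }
            }
        });
    }
    out
}
```

The check against `root` while climbing is what keeps the walk inside the subtree when it starts from a node that has siblings of its own.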
pub fn next_siblings(&self, id: NodeId) -> NextSiblingsIter<'_, S>
Returns an iterator over siblings following a node.
Does not include the node itself.
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let parent = doc.create_element("ul", HashMap::new());
let child1 = doc.create_element("li", HashMap::new());
let child2 = doc.create_element("li", HashMap::new());
let child3 = doc.create_element("li", HashMap::new());
doc.append_child(parent, child1);
doc.append_child(parent, child2);
doc.append_child(parent, child3);
let next: Vec<_> = doc.next_siblings(child1).collect();
assert_eq!(next.len(), 2); // child2, child3
pub fn prev_siblings(&self, id: NodeId) -> PrevSiblingsIter<'_, S>
Returns an iterator over siblings preceding a node.
Does not include the node itself. Iterates in reverse order (from immediate predecessor toward first sibling).
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let parent = doc.create_element("ul", HashMap::new());
let child1 = doc.create_element("li", HashMap::new());
let child2 = doc.create_element("li", HashMap::new());
let child3 = doc.create_element("li", HashMap::new());
doc.append_child(parent, child1);
doc.append_child(parent, child2);
doc.append_child(parent, child3);
let prev: Vec<_> = doc.prev_siblings(child3).collect();
assert_eq!(prev.len(), 2); // child2, child1 (reverse order)
pub fn siblings(&self, id: NodeId) -> SiblingsIter<'_, S>
Returns an iterator over all siblings of a node (excluding the node itself).
Iterates in document order from first sibling to last.
§Examples
use std::collections::HashMap;
use scrape_core::Document;
let mut doc = Document::new();
let parent = doc.create_element("ul", HashMap::new());
let child1 = doc.create_element("li", HashMap::new());
let child2 = doc.create_element("li", HashMap::new());
let child3 = doc.create_element("li", HashMap::new());
doc.append_child(parent, child1);
doc.append_child(parent, child2);
doc.append_child(parent, child3);
let siblings: Vec<_> = doc.siblings(child2).collect();
assert_eq!(siblings.len(), 2); // child1, child3
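The ordering of the three sibling iterators follows directly from the doubly linked sibling chain. The sketch below reproduces those orderings over a hypothetical `SiblingLinks` type with index-based ids; it collects into vectors for brevity, whereas the iterators documented above are lazy, and the crate may locate the first sibling differently (for example through the parent's first_child link).

```rust
use std::iter::successors;

/// Hypothetical node holding only the sibling links; ids are slice indices.
#[derive(Clone, Copy, Default)]
struct SiblingLinks {
    prev_sibling: Option<usize>,
    next_sibling: Option<usize>,
}

/// Follows `prev_sibling` links: nearest predecessor first.
fn prev_siblings(nodes: &[SiblingLinks], id: usize) -> Vec<usize> {
    successors(nodes[id].prev_sibling, |&p| nodes[p].prev_sibling).collect()
}

/// Follows `next_sibling` links: nearest successor first.
fn next_siblings(nodes: &[SiblingLinks], id: usize) -> Vec<usize> {
    successors(nodes[id].next_sibling, |&n| nodes[n].next_sibling).collect()
}

/// All siblings in document order, excluding `id` itself.
fn siblings(nodes: &[SiblingLinks], id: usize) -> Vec<usize> {
    // Preceding siblings come out nearest-first, so flip them before
    // appending the following siblings.
    let mut out = prev_siblings(nodes, id);
    out.reverse();
    out.extend(next_siblings(nodes, id));
    out
}
```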