Struct DocumentImpl 

pub struct DocumentImpl<S: DocumentState = Queryable> { /* private fields */ }

An HTML document containing a tree of nodes.

The document owns all nodes via an arena allocator, ensuring cache-friendly contiguous storage. Navigation is performed using NodeId handles.

§Architecture

Nodes are stored in a single contiguous Arena<Node>. Parent-child relationships use first_child/last_child links (O(1) append), and siblings are doubly linked via prev_sibling/next_sibling.
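
The link layout described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `Arena` here is assumed to be a plain `Vec`, `NodeId` is assumed to wrap an index, and the `alloc` helper is hypothetical.

```rust
// Hypothetical sketch: field names mirror the prose above, not the crate's private fields.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

#[derive(Debug, Default)]
struct Node {
    parent: Option<NodeId>,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
    prev_sibling: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

#[derive(Debug, Default)]
struct Arena {
    // Contiguous storage; a NodeId is an index into this Vec (assumption).
    nodes: Vec<Node>,
}

impl Arena {
    /// Allocates an unlinked node and returns its handle.
    fn alloc(&mut self) -> NodeId {
        self.nodes.push(Node::default());
        NodeId(self.nodes.len() - 1)
    }

    /// O(1) append: only the parent, its old last child, and the new child are touched.
    fn append_child(&mut self, parent: NodeId, child: NodeId) {
        let old_last = self.nodes[parent.0].last_child;
        {
            let c = &mut self.nodes[child.0];
            c.parent = Some(parent);
            c.prev_sibling = old_last;
            c.next_sibling = None;
        }
        match old_last {
            // The parent already had children: chain the new one after the old tail.
            Some(last) => self.nodes[last.0].next_sibling = Some(child),
            // First child: it is both the head and the tail of the child list.
            None => self.nodes[parent.0].first_child = Some(child),
        }
        self.nodes[parent.0].last_child = Some(child);
    }
}
```

Because the parent keeps a `last_child` link, appending never walks the sibling list, which is where the O(1) bound comes from.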

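The document provides both direct navigation methods (parent, first_child, last_child, next_sibling, prev_sibling) and lazy iterators (children, ancestors, descendants, next_siblings, prev_siblings, siblings).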

§Typestate

Internally, the document uses the typestate pattern to enforce lifecycle guarantees at compile time. The type parameter S tracks whether the document is being built, is queryable, or is sealed.
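
For readers unfamiliar with the pattern, here is a minimal sketch of a typestate lifecycle. The `Doc` type, its `node_count` field, and `add_node` are hypothetical stand-ins; only the state names and the `build`/`seal` transitions are taken from this page.

```rust
use std::marker::PhantomData;

// Zero-sized marker types, one per lifecycle state.
struct Building;
struct Queryable;
struct Sealed;

// Hypothetical stand-in for the document: a node count tagged with a state.
struct Doc<S> {
    node_count: usize,
    _state: PhantomData<S>,
}

impl Doc<Building> {
    fn new() -> Self {
        Doc { node_count: 0, _state: PhantomData }
    }

    // Only exists while building; calling it in another state is a compile error.
    fn add_node(&mut self) {
        self.node_count += 1;
    }

    // Takes `self` by value, so the Building handle cannot be used afterwards.
    fn build(self) -> Doc<Queryable> {
        Doc { node_count: self.node_count, _state: PhantomData }
    }
}

impl Doc<Queryable> {
    fn seal(self) -> Doc<Sealed> {
        Doc { node_count: self.node_count, _state: PhantomData }
    }
}

// Read-only accessors are available in every state.
impl<S> Doc<S> {
    fn len(&self) -> usize {
        self.node_count
    }
}
```

The markers carry no data, so the state tracking costs nothing at run time; a call such as `add_node` on a sealed value is rejected by the compiler because no such method exists for that type.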

Implementations§

impl DocumentImpl<Building>

pub fn new() -> Self

Creates a new empty document in building state.

The default capacity is 256 nodes, which is sufficient for typical HTML pages and reduces reallocations during parsing.

pub fn with_capacity(capacity: usize) -> Self

Creates a new empty document with the specified capacity.

Use this when you know the approximate number of nodes to avoid reallocations.
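
The effect of pre-sizing can be observed on a plain `Vec`. This sketch assumes the arena behaves like a `Vec`-backed buffer, which this page does not confirm; `grew_while_filling` is a hypothetical helper.

```rust
/// Pushes `n` items and reports whether the buffer ever had to grow.
fn grew_while_filling(mut buf: Vec<u32>, n: u32) -> bool {
    let initial = buf.capacity();
    for i in 0..n {
        buf.push(i);
    }
    // A changed capacity means at least one reallocation happened.
    buf.capacity() != initial
}
```

With `Vec::with_capacity(1000)` the helper returns false for 1000 pushes, while an empty `Vec::new()` has to grow along the way.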

pub fn set_root(&mut self, id: NodeId)

Sets the root node ID.

pub fn create_element( &mut self, name: impl Into<String>, attributes: HashMap<String, String>, ) -> NodeId

Creates a new element node and returns its ID.

pub fn create_text(&mut self, content: impl Into<String>) -> NodeId

Creates a new text node and returns its ID.

pub fn create_comment(&mut self, content: impl Into<String>) -> NodeId

Creates a new comment node and returns its ID.

pub fn append_child(&mut self, parent_id: NodeId, child_id: NodeId)

Appends a child node to a parent.

Updates parent, first_child, last_child, and sibling links.

§Panics

Panics in debug builds if parent_id or child_id is invalid.

pub fn build(self) -> DocumentImpl<Queryable>

Transitions the document from Building to Queryable state.

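This is a one-way transition: build consumes the Building document, so there is no way back to that state. Once built, the document structure is not meant to be modified; the mutation methods that remain on Queryable exist only for backward compatibility with tests.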

§Examples
use std::collections::HashMap;

use scrape_core::{Building, DocumentImpl};

let mut doc = DocumentImpl::<Building>::new();
let root = doc.create_element("div", HashMap::new());
doc.set_root(root);

// Transition to queryable state
let doc = doc.build();

impl DocumentImpl<Queryable>

pub fn new() -> Self

Creates a new empty document in queryable state.

This is a convenience method for backward compatibility. Internally, it creates a Building document and immediately transitions to Queryable.

pub fn with_capacity(capacity: usize) -> Self

Creates a new empty document with the specified capacity in queryable state.

This is a convenience method for backward compatibility.

pub fn set_root(&mut self, id: NodeId)

Sets the root node ID.

Available on Queryable for backward compatibility with tests.

pub fn create_element( &mut self, name: impl Into<String>, attributes: HashMap<String, String>, ) -> NodeId

Creates a new element node and returns its ID.

Available on Queryable for backward compatibility with tests.

pub fn create_text(&mut self, content: impl Into<String>) -> NodeId

Creates a new text node and returns its ID.

Available on Queryable for backward compatibility with tests.

pub fn create_comment(&mut self, content: impl Into<String>) -> NodeId

Creates a new comment node and returns its ID.

Available on Queryable for backward compatibility with tests.

pub fn append_child(&mut self, parent_id: NodeId, child_id: NodeId)

Appends a child node to a parent.

Available on Queryable for backward compatibility with tests.

Updates parent, first_child, last_child, and sibling links.

§Panics

Panics in debug builds if parent_id or child_id is invalid.

pub fn seal(self) -> DocumentImpl<Sealed>

Seals the document, preventing any future modifications.

This is a one-way transition for when you need to guarantee the document will never change.

pub fn set_index(&mut self, index: DocumentIndex)

Sets the document index.

Only available in the Queryable state, because the index would be invalidated by structural modifications.

impl<S: DocumentState> DocumentImpl<S>

pub fn root(&self) -> Option<NodeId>

Returns the root node ID, if any.

pub fn get(&self, id: NodeId) -> Option<&Node>

Returns a reference to the node with the given ID.

pub fn len(&self) -> usize

Returns the number of nodes in the document.

pub fn is_empty(&self) -> bool

Returns true if the document has no nodes.

pub fn nodes(&self) -> impl Iterator<Item = (NodeId, &Node)>

Returns an iterator over all nodes.

impl<S: MutableState> DocumentImpl<S>

pub fn get_mut(&mut self, id: NodeId) -> Option<&mut Node>

Returns a mutable reference to the node with the given ID.

Only available for documents in mutable states (Building).

impl<S: QueryableState> DocumentImpl<S>

pub fn index(&self) -> Option<&DocumentIndex>

Returns the document index, if built.

Only available in queryable states (Queryable, Sealed).
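
The state-bounded impl blocks above (`S: MutableState`, `S: QueryableState`) can be modelled with marker traits. The following is a sketch under stated assumptions: the trait and state names come from this page, but which states implement which trait follows only the parenthetical notes above, and the `Doc` type with its fields and `from_nodes` constructor is hypothetical.

```rust
use std::marker::PhantomData;

// State markers and capability traits (wiring assumed for illustration).
struct Building;
struct Queryable;
struct Sealed;

trait DocumentState {}
trait MutableState: DocumentState {}
trait QueryableState: DocumentState {}

impl DocumentState for Building {}
impl DocumentState for Queryable {}
impl DocumentState for Sealed {}
impl MutableState for Building {}
impl QueryableState for Queryable {}
impl QueryableState for Sealed {}

// Hypothetical stand-in: tag names as nodes, a list of positions as the index.
struct Doc<S: DocumentState> {
    nodes: Vec<String>,
    index: Option<Vec<usize>>,
    _state: PhantomData<S>,
}

// Available in every state.
impl<S: DocumentState> Doc<S> {
    fn from_nodes(nodes: Vec<String>) -> Self {
        Doc { nodes, index: None, _state: PhantomData }
    }

    fn get(&self, i: usize) -> Option<&String> {
        self.nodes.get(i)
    }
}

// Only states that implement MutableState expose mutable access.
impl<S: MutableState> Doc<S> {
    fn get_mut(&mut self, i: usize) -> Option<&mut String> {
        self.nodes.get_mut(i)
    }
}

// Only states that implement QueryableState expose the index.
impl<S: QueryableState> Doc<S> {
    fn index(&self) -> Option<&Vec<usize>> {
        self.index.as_ref()
    }
}
```

In this sketch, calling `get_mut` on a `Doc<Sealed>` or `index` on a `Doc<Building>` fails to compile, because the required trait bound is not satisfied.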

impl<S: DocumentState> DocumentImpl<S>

pub fn parent(&self, id: NodeId) -> Option<NodeId>

Returns the parent of a node.

pub fn first_child(&self, id: NodeId) -> Option<NodeId>

Returns the first child of a node.

pub fn last_child(&self, id: NodeId) -> Option<NodeId>

Returns the last child of a node.

pub fn next_sibling(&self, id: NodeId) -> Option<NodeId>

Returns the next sibling of a node.

pub fn prev_sibling(&self, id: NodeId) -> Option<NodeId>

Returns the previous sibling of a node.

pub fn children(&self, id: NodeId) -> ChildrenIter<'_, S>

Returns an iterator over children of a node.

The iterator yields children in order from first to last.

§Examples
use std::collections::HashMap;

use scrape_core::Document;

let mut doc = Document::new();
let parent = doc.create_element("div", HashMap::new());
let child1 = doc.create_element("span", HashMap::new());
let child2 = doc.create_element("span", HashMap::new());

doc.append_child(parent, child1);
doc.append_child(parent, child2);

let children: Vec<_> = doc.children(parent).collect();
assert_eq!(children.len(), 2);

pub fn ancestors(&self, id: NodeId) -> AncestorsIter<'_, S>

Returns an iterator over ancestors of a node.

The iterator yields ancestors from parent to root (does not include the node itself).

§Examples
use std::collections::HashMap;

use scrape_core::Document;

let mut doc = Document::new();
let grandparent = doc.create_element("html", HashMap::new());
let parent = doc.create_element("body", HashMap::new());
let child = doc.create_element("div", HashMap::new());

doc.append_child(grandparent, parent);
doc.append_child(parent, child);

let ancestors: Vec<_> = doc.ancestors(child).collect();
assert_eq!(ancestors.len(), 2); // parent, grandparent
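
The parent-to-root order falls out of repeatedly following the parent link. A minimal sketch of such a lazy walk, assuming a flat table of parent indices instead of the crate's Node type (the free function and the table are hypothetical, and this is not how AncestorsIter is necessarily implemented):

```rust
/// Lazily yields the ancestors of `start`, nearest first, excluding `start` itself.
/// `parents[i]` holds the parent index of node `i`, or `None` for the root.
fn ancestors(parents: &[Option<usize>], start: usize) -> impl Iterator<Item = usize> + '_ {
    // `successors` keeps applying the closure until it returns None (the root's parent).
    std::iter::successors(parents[start], move |&id| parents[id])
}
```

Nothing is computed until the iterator is consumed, so a caller that only needs the nearest matching ancestor can stop early.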

pub fn descendants(&self, id: NodeId) -> DescendantsIter<'_, S>

Returns an iterator over descendants in depth-first pre-order.

Does not include the starting node itself.

§Examples
use std::collections::HashMap;

use scrape_core::Document;

let mut doc = Document::new();
let root = doc.create_element("html", HashMap::new());
let child1 = doc.create_element("head", HashMap::new());
let child2 = doc.create_element("body", HashMap::new());
let grandchild = doc.create_element("div", HashMap::new());

doc.append_child(root, child1);
doc.append_child(root, child2);
doc.append_child(child2, grandchild);

let descendants: Vec<_> = doc.descendants(root).collect();
assert_eq!(descendants.len(), 3); // head, body, div
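
Depth-first pre-order can be produced from the parent, first-child, and next-sibling links alone, without an explicit stack. The sketch below shows one way to do it under assumptions: `Links` is a hypothetical reduced node, indices stand in for NodeId, and the result is collected eagerly for brevity where DescendantsIter is described as lazy.

```rust
#[derive(Clone, Copy, Default)]
struct Links {
    parent: Option<usize>,
    first_child: Option<usize>,
    next_sibling: Option<usize>,
}

/// Returns the descendants of `root` in depth-first pre-order, excluding `root`.
fn descendants(nodes: &[Links], root: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut current = nodes[root].first_child;
    while let Some(id) = current {
        out.push(id);
        current = match nodes[id].first_child {
            // Descend first: a node's children precede its later siblings.
            Some(child) => Some(child),
            None => {
                // Otherwise climb until some node below `root` has a next sibling.
                let mut cursor = id;
                loop {
                    if cursor == root {
                        break None; // never leave the subtree of `root`
                    }
                    if let Some(next) = nodes[cursor].next_sibling {
                        break Some(next);
                    }
                    match nodes[cursor].parent {
                        Some(p) => cursor = p,
                        None => break None,
                    }
                }
            }
        };
    }
    out
}
```

A lazy version would keep `current` as iterator state and perform one step of the loop body per call to `next`.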

pub fn next_siblings(&self, id: NodeId) -> NextSiblingsIter<'_, S>

Returns an iterator over siblings following a node.

Does not include the node itself.

§Examples
use std::collections::HashMap;

use scrape_core::Document;

let mut doc = Document::new();
let parent = doc.create_element("ul", HashMap::new());
let child1 = doc.create_element("li", HashMap::new());
let child2 = doc.create_element("li", HashMap::new());
let child3 = doc.create_element("li", HashMap::new());

doc.append_child(parent, child1);
doc.append_child(parent, child2);
doc.append_child(parent, child3);

let next: Vec<_> = doc.next_siblings(child1).collect();
assert_eq!(next.len(), 2); // child2, child3

pub fn prev_siblings(&self, id: NodeId) -> PrevSiblingsIter<'_, S>

Returns an iterator over siblings preceding a node.

Does not include the node itself. Iterates in reverse order (from immediate predecessor toward first sibling).

§Examples
use std::collections::HashMap;

use scrape_core::Document;

let mut doc = Document::new();
let parent = doc.create_element("ul", HashMap::new());
let child1 = doc.create_element("li", HashMap::new());
let child2 = doc.create_element("li", HashMap::new());
let child3 = doc.create_element("li", HashMap::new());

doc.append_child(parent, child1);
doc.append_child(parent, child2);
doc.append_child(parent, child3);

let prev: Vec<_> = doc.prev_siblings(child3).collect();
assert_eq!(prev.len(), 2); // child2, child1 (reverse order)

pub fn siblings(&self, id: NodeId) -> SiblingsIter<'_, S>

Returns an iterator over all siblings of a node (excluding the node itself).

Iterates in document order from first sibling to last.

§Examples
use std::collections::HashMap;

use scrape_core::Document;

let mut doc = Document::new();
let parent = doc.create_element("ul", HashMap::new());
let child1 = doc.create_element("li", HashMap::new());
let child2 = doc.create_element("li", HashMap::new());
let child3 = doc.create_element("li", HashMap::new());

doc.append_child(parent, child1);
doc.append_child(parent, child2);
doc.append_child(parent, child3);

let siblings: Vec<_> = doc.siblings(child2).collect();
assert_eq!(siblings.len(), 2); // child1, child3
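
The ordering difference between prev_siblings (reverse order) and siblings (document order) can be illustrated with the doubly linked sibling chain. This is a sketch, not the crate's iterators: `SiblingLinks` and both free functions are hypothetical, and indices stand in for NodeId.

```rust
struct SiblingLinks {
    prev: Option<usize>,
    next: Option<usize>,
}

/// Walks backwards from `id`: immediate predecessor first, first sibling last.
fn prev_siblings(nodes: &[SiblingLinks], id: usize) -> impl Iterator<Item = usize> + '_ {
    std::iter::successors(nodes[id].prev, move |&s| nodes[s].prev)
}

/// Yields every sibling of `id` in document order, skipping `id` itself.
fn siblings(nodes: &[SiblingLinks], id: usize) -> impl Iterator<Item = usize> + '_ {
    // Rewind to the first sibling; if `id` has no predecessor it is the first one.
    let first = prev_siblings(nodes, id).last().unwrap_or(id);
    std::iter::successors(Some(first), move |&s| nodes[s].next).filter(move |&s| s != id)
}
```

This sketch rewinds along the prev links; an implementation with access to the parent could start from the parent's first_child instead.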

Trait Implementations§

impl<S: Debug + DocumentState> Debug for DocumentImpl<S>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for DocumentImpl<Building>

fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations§

impl<S> Freeze for DocumentImpl<S>

impl<S> RefUnwindSafe for DocumentImpl<S>
where S: RefUnwindSafe,

impl<S> Send for DocumentImpl<S>
where S: Send,

impl<S> Sync for DocumentImpl<S>
where S: Sync,

impl<S> Unpin for DocumentImpl<S>
where S: Unpin,

impl<S> UnwindSafe for DocumentImpl<S>
where S: UnwindSafe,

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.