Struct Document 

pub struct Document { /* private fields */ }

An HTML document containing a tree of nodes.

The document owns all nodes via an arena allocator, ensuring cache-friendly contiguous storage. Navigation is performed using NodeId handles.

§Architecture

Nodes are stored in a single contiguous Arena<Node>. Parent-child relationships use first_child/last_child links (O(1) append), and siblings are doubly linked via prev_sibling/next_sibling.

The document provides both direct navigation methods (parent, first_child, last_child, next_sibling, prev_sibling) and lazy iterators (children, ancestors, descendants).
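The arena layout described above can be sketched roughly as follows. This is a minimal illustration, not the crate's implementation: MiniArena, MiniNode, the alloc method, and the index-based NodeId are hypothetical stand-ins, and the real Node presumably also carries element, text, or comment data.

```rust
// Hypothetical stand-ins for illustration; not the crate's actual definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize); // assumed: a handle is just an index into the arena

#[derive(Debug, Default)]
struct MiniNode {
    parent: Option<NodeId>,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
    prev_sibling: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

#[derive(Debug, Default)]
struct MiniArena {
    nodes: Vec<MiniNode>, // one contiguous allocation holds every node
}

impl MiniArena {
    /// Stores a node and returns its handle.
    fn alloc(&mut self, node: MiniNode) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(node);
        id
    }

    /// Looks up a node by handle; `None` if the handle is out of range.
    fn get(&self, id: NodeId) -> Option<&MiniNode> {
        self.nodes.get(id.0)
    }
}
```

Because handles are plain indices rather than references, nodes can link to each other freely without fighting the borrow checker.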

Implementations§

impl Document

pub fn new() -> Self

Creates a new empty document with default capacity.

The default capacity is 256 nodes, which is sufficient for typical HTML pages and reduces reallocations during parsing.

pub fn with_capacity(capacity: usize) -> Self

Creates a new empty document with the specified capacity.

Use this when you know the approximate number of nodes to avoid reallocations.
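The effect of pre-sizing can be illustrated with a standard Vec. This sketch assumes the arena grows like a Vec, which the source does not state; count_reallocations is a hypothetical helper written only for this illustration.

```rust
/// Hypothetical helper: pushes `n` items into a buffer created with
/// `initial_capacity` and counts how often its capacity changes.
fn count_reallocations(initial_capacity: usize, n: usize) -> usize {
    let mut nodes: Vec<u64> = Vec::with_capacity(initial_capacity);
    let mut reallocations = 0;
    for i in 0..n {
        let before = nodes.capacity();
        nodes.push(i as u64);
        // A capacity change means the buffer had to grow.
        if nodes.capacity() != before {
            reallocations += 1;
        }
    }
    reallocations
}
```

With a capacity at least as large as the final node count, the loop above never observes a capacity change; starting from zero, it observes at least one.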

pub fn root(&self) -> Option<NodeId>

Returns the root node ID, or None if no root has been set.

pub fn set_root(&mut self, id: NodeId)

Sets the root node ID.

pub fn get(&self, id: NodeId) -> Option<&Node>

Returns a reference to the node with the given ID, or None if no such node exists.

pub fn get_mut(&mut self, id: NodeId) -> Option<&mut Node>

Returns a mutable reference to the node with the given ID, or None if no such node exists.
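Since both accessors return an Option, callers handle a missing node explicitly instead of panicking. A minimal sketch of that pattern, using a hypothetical MiniDoc whose nodes are plain tag names and a hypothetical rename helper (neither is part of the crate):

```rust
type NodeId = usize; // hypothetical handle: an index into `nodes`

struct MiniDoc {
    nodes: Vec<String>, // hypothetical: each node is just a tag name
}

impl MiniDoc {
    fn get(&self, id: NodeId) -> Option<&String> {
        self.nodes.get(id)
    }

    fn get_mut(&mut self, id: NodeId) -> Option<&mut String> {
        self.nodes.get_mut(id)
    }
}

/// Renames a node if it exists; returns whether anything changed.
fn rename(doc: &mut MiniDoc, id: NodeId, new_name: &str) -> bool {
    match doc.get_mut(id) {
        Some(name) => {
            *name = new_name.to_string();
            true
        }
        None => false, // unknown ID: nothing to do
    }
}
```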

pub fn create_element(&mut self, name: impl Into<String>, attributes: HashMap<String, String>) -> NodeId

Creates a new element node and returns its ID.

pub fn create_text(&mut self, content: impl Into<String>) -> NodeId

Creates a new text node and returns its ID.

pub fn create_comment(&mut self, content: impl Into<String>) -> NodeId

Creates a new comment node and returns its ID.

pub fn append_child(&mut self, parent_id: NodeId, child_id: NodeId)

Appends a child node to a parent.

Updates parent, first_child, last_child, and sibling links.

§Panics

Panics in debug builds if parent_id or child_id is invalid.
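The link updates can be sketched as below. This is an assumption about how a constant-time append over such links typically works, not the crate's code: MiniDoc, MiniNode, create, and the usize handle are hypothetical, and the debug_assert! only mirrors the debug-build check described above.

```rust
type NodeId = usize; // hypothetical handle: an index into `nodes`

#[derive(Default)]
struct MiniNode {
    parent: Option<NodeId>,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
    prev_sibling: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

#[derive(Default)]
struct MiniDoc {
    nodes: Vec<MiniNode>,
}

impl MiniDoc {
    /// Creates a detached node and returns its handle.
    fn create(&mut self) -> NodeId {
        self.nodes.push(MiniNode::default());
        self.nodes.len() - 1
    }

    /// Appends `child` as the last child of `parent` without walking the list.
    fn append_child(&mut self, parent: NodeId, child: NodeId) {
        debug_assert!(parent < self.nodes.len() && child < self.nodes.len());
        let old_last = self.nodes[parent].last_child;
        self.nodes[child].parent = Some(parent);
        self.nodes[child].prev_sibling = old_last;
        self.nodes[child].next_sibling = None;
        match old_last {
            // Parent already has children: hook onto the old tail.
            Some(last) => self.nodes[last].next_sibling = Some(child),
            // First child: it is also the head of the list.
            None => self.nodes[parent].first_child = Some(child),
        }
        self.nodes[parent].last_child = Some(child);
    }
}
```

Keeping a last_child link is what makes the append O(1): the tail is reached directly rather than by following next_sibling from the first child.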

pub fn len(&self) -> usize

Returns the number of nodes in the document.

pub fn is_empty(&self) -> bool

Returns true if the document has no nodes.

pub fn nodes(&self) -> impl Iterator<Item = (NodeId, &Node)>

Returns an iterator over all nodes in the document, yielding each node's ID together with a reference to the node.

pub fn parent(&self, id: NodeId) -> Option<NodeId>

Returns the parent of a node.

pub fn first_child(&self, id: NodeId) -> Option<NodeId>

Returns the first child of a node.

pub fn last_child(&self, id: NodeId) -> Option<NodeId>

Returns the last child of a node.

pub fn next_sibling(&self, id: NodeId) -> Option<NodeId>

Returns the next sibling of a node.

pub fn prev_sibling(&self, id: NodeId) -> Option<NodeId>

Returns the previous sibling of a node.
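The direct navigation methods compose into simple loops. As a sketch, the position of a node among its siblings can be found by stepping backwards until prev_sibling returns None. MiniDoc and sibling_index are hypothetical names used only here; the stand-in stores nothing but the previous-sibling link.

```rust
type NodeId = usize; // hypothetical handle: an index into `prev`

struct MiniDoc {
    prev: Vec<Option<NodeId>>, // prev[i] = previous sibling of node i
}

impl MiniDoc {
    /// Mirrors the shape of `prev_sibling`: `None` for unknown IDs or first children.
    fn prev_sibling(&self, id: NodeId) -> Option<NodeId> {
        self.prev.get(id).copied().flatten()
    }
}

/// Counts how many siblings precede `id`.
fn sibling_index(doc: &MiniDoc, id: NodeId) -> usize {
    let mut index = 0;
    let mut cursor = doc.prev_sibling(id);
    while let Some(prev) = cursor {
        index += 1;
        cursor = doc.prev_sibling(prev);
    }
    index
}
```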

pub fn children(&self, id: NodeId) -> ChildrenIter<'_>

Returns an iterator over children of a node.

The iterator yields children in order from first to last.

§Examples
use std::collections::HashMap;

use scrape_core::dom::Document;

let mut doc = Document::new();
let parent = doc.create_element("div", HashMap::new());
let child1 = doc.create_element("span", HashMap::new());
let child2 = doc.create_element("span", HashMap::new());

doc.append_child(parent, child1);
doc.append_child(parent, child2);

let children: Vec<_> = doc.children(parent).collect();
assert_eq!(children.len(), 2);
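
A lazy children iterator over such links might look like the sketch below. It assumes the iterator yields node IDs and simply follows next_sibling from first_child; the crate's ChildrenIter may differ. MiniDoc, MiniNode, Children, and sample are hypothetical.

```rust
type NodeId = usize; // hypothetical handle: an index into `nodes`

struct MiniNode {
    first_child: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

struct MiniDoc {
    nodes: Vec<MiniNode>,
}

/// Walks a sibling list one link at a time; no allocation is needed.
struct Children<'a> {
    doc: &'a MiniDoc,
    next: Option<NodeId>,
}

impl<'a> Iterator for Children<'a> {
    type Item = NodeId;

    fn next(&mut self) -> Option<NodeId> {
        let current = self.next?;
        self.next = self.doc.nodes[current].next_sibling;
        Some(current)
    }
}

impl MiniDoc {
    fn children(&self, id: NodeId) -> Children<'_> {
        Children { doc: self, next: self.nodes[id].first_child }
    }
}

/// Builds a parent (ID 0) with children 1, 2, 3 by wiring links by hand.
fn sample() -> MiniDoc {
    MiniDoc {
        nodes: vec![
            MiniNode { first_child: Some(1), next_sibling: None },
            MiniNode { first_child: None, next_sibling: Some(2) },
            MiniNode { first_child: None, next_sibling: Some(3) },
            MiniNode { first_child: None, next_sibling: None },
        ],
    }
}
```

Calling sample().children(0) visits IDs 1, 2, 3 in that order, and a leaf such as node 3 yields nothing.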

pub fn ancestors(&self, id: NodeId) -> AncestorsIter<'_>

Returns an iterator over ancestors of a node.

The iterator yields ancestors from parent to root (does not include the node itself).

§Examples
use std::collections::HashMap;

use scrape_core::dom::Document;

let mut doc = Document::new();
let grandparent = doc.create_element("html", HashMap::new());
let parent = doc.create_element("body", HashMap::new());
let child = doc.create_element("div", HashMap::new());

doc.append_child(grandparent, parent);
doc.append_child(parent, child);

let ancestors: Vec<_> = doc.ancestors(child).collect();
assert_eq!(ancestors.len(), 2); // parent, grandparent
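
The parent-to-root order falls out of repeatedly following parent links. A minimal sketch using std::iter::successors, assuming the iterator yields node IDs; MiniDoc here is a hypothetical stand-in that stores only parent links, not the crate's type.

```rust
use std::iter::successors;

type NodeId = usize; // hypothetical handle: an index into `parents`

struct MiniDoc {
    parents: Vec<Option<NodeId>>, // parents[i] = parent of node i
}

impl MiniDoc {
    /// Yields the parent, then the grandparent, and so on up to the root.
    /// The starting node itself is not included.
    fn ancestors(&self, id: NodeId) -> impl Iterator<Item = NodeId> + '_ {
        successors(self.parents[id], move |&node| self.parents[node])
    }

    /// Depth of a node, counted as its number of ancestors.
    fn depth(&self, id: NodeId) -> usize {
        self.ancestors(id).count()
    }
}
```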

pub fn descendants(&self, id: NodeId) -> DescendantsIter<'_>

Returns an iterator over descendants in depth-first pre-order.

Does not include the starting node itself.

§Examples
use std::collections::HashMap;

use scrape_core::dom::Document;

let mut doc = Document::new();
let root = doc.create_element("html", HashMap::new());
let child1 = doc.create_element("head", HashMap::new());
let child2 = doc.create_element("body", HashMap::new());
let grandchild = doc.create_element("div", HashMap::new());

doc.append_child(root, child1);
doc.append_child(root, child2);
doc.append_child(child2, grandchild);

let descendants: Vec<_> = doc.descendants(root).collect();
assert_eq!(descendants.len(), 3); // head, body, div
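
Depth-first pre-order can be produced without a stack when parent, first_child, and next_sibling links are available. The sketch below shows one common way to do it and is not taken from the crate: go down if possible, otherwise go right, climbing until a sibling exists or the starting node is reached. All type and function names are hypothetical.

```rust
type NodeId = usize; // hypothetical handle: an index into `nodes`

struct MiniNode {
    parent: Option<NodeId>,
    first_child: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

struct MiniDoc {
    nodes: Vec<MiniNode>,
}

struct Descendants<'a> {
    doc: &'a MiniDoc,
    root: NodeId, // traversal never climbs above this node
    next: Option<NodeId>,
}

impl<'a> Iterator for Descendants<'a> {
    type Item = NodeId;

    fn next(&mut self) -> Option<NodeId> {
        let current = self.next?;
        let doc = self.doc;
        // Prefer descending into the first child.
        let mut following = doc.nodes[current].first_child;
        if following.is_none() {
            // Otherwise move right, climbing while no sibling exists.
            let mut at = current;
            while at != self.root {
                if let Some(sibling) = doc.nodes[at].next_sibling {
                    following = Some(sibling);
                    break;
                }
                match doc.nodes[at].parent {
                    Some(parent) => at = parent,
                    None => break,
                }
            }
        }
        self.next = following;
        Some(current)
    }
}

impl MiniDoc {
    fn descendants(&self, id: NodeId) -> Descendants<'_> {
        Descendants { doc: self, root: id, next: self.nodes[id].first_child }
    }
}

/// html(0) with children head(1) and body(2); body has child div(3).
fn sample() -> MiniDoc {
    MiniDoc {
        nodes: vec![
            MiniNode { parent: None, first_child: Some(1), next_sibling: None },
            MiniNode { parent: Some(0), first_child: None, next_sibling: Some(2) },
            MiniNode { parent: Some(0), first_child: Some(3), next_sibling: None },
            MiniNode { parent: Some(2), first_child: None, next_sibling: None },
        ],
    }
}
```

Stopping the climb at the starting node is what keeps the traversal inside the subtree: in the sample, descendants(2) yields only node 3 and never wanders into body's siblings.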

Trait Implementations§

impl Debug for Document

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Document

fn default() -> Self

Returns the “default value” for a type.
