mochi_rs::std::html

Struct Node

pub struct Node(/* private fields */);

A type that represents an HTML node, which can be a group of elements, a single element, or the entire HTML document.

Implementations

impl Node

pub fn new<T: AsRef<[u8]>>(buf: T) -> Result<Self>

Parse HTML into a Node. As there is no base URI specified, absolute URL resolution requires the HTML to have a <base href> tag.

pub fn new_with_uri<A: AsRef<[u8]>, B: AsRef<str>>(buf: A, base_uri: B) -> Result<Self>

Parse HTML into a Node. The given base_uri is used to resolve any URLs that occur before a <base href> tag is defined.

pub fn new_fragment<T: AsRef<[u8]>>(buf: T) -> Result<Self>

Parse an HTML fragment, assuming that it forms the body of the HTML. As with Node::new, relative URLs will not be resolved unless there is a <base href> tag.

pub fn new_fragment_with_uri<A: AsRef<[u8]>, B: AsRef<str>>(buf: A, base_uri: B) -> Result<Self>

Parse an HTML fragment, assuming that it forms the body of the HTML. As with Node::new_with_uri, the given base_uri is used to resolve any URLs that appear before a <base href> tag.

pub unsafe fn from(ptr: i32) -> Self

Get an instance from a PtrRef

§Safety

Ensure that this Ptr is of Kind::Node before converting.

pub fn close(self)

pub fn select<T: AsRef<str>>(&self, selector: T) -> Self

Find elements that match the given CSS (or jQuery) selector.

Supported selectors

Pattern	Matches	Example
`*`	any element	`*`
`tag`	elements with the given tag name	`div`
`*|E`	elements of type E in any namespace (including non-namespaced)	`*|name` finds `<fb:name>` and `<name>` elements
`ns|E`	elements of type E in the namespace ns	`fb|name` finds `<fb:name>` elements
`#id`	elements with attribute ID of “id”	`div#wrap`, `#logo`
`.class`	elements with a class name of “class”	`div.left`, `.result`
`[attr]`	elements with an attribute named “attr” (with any value)	`a[href]`, `[title]`
`[^attrPrefix]`	elements with an attribute name starting with “attrPrefix”. Use to find elements with HTML5 datasets	`[^data-]`, `div[^data-]`
`[attr=val]`	elements with an attribute named “attr”, and value equal to “val”	`img[width=500]`, `a[rel=nofollow]`
`[attr="val"]`	elements with an attribute named “attr”, and value equal to “val”	`span[hello="Cleveland"][goodbye="Columbus"]`, `a[rel="nofollow"]`
`[attr^=valPrefix]`	elements with an attribute named “attr”, and value starting with “valPrefix”	`a[href^=http:]`
`[attr$=valSuffix]`	elements with an attribute named “attr”, and value ending with “valSuffix”	`img[src$=.png]`
`[attr*=valContaining]`	elements with an attribute named “attr”, and value containing “valContaining”	`a[href*=/search/]`
`[attr~=regex]`	elements with an attribute named “attr”, and value matching the regular expression	`img[src~=(?i)\\.(png|jpe?g)]`
	The above may be combined in any order	`div.header[title]`

§Combinators

Pattern	Matches	Example
`E F`	an F element descended from an E element	`div a`, `.logo h1`
`E > F`	an F direct child of E	`ol > li`
`E + F`	an F element immediately preceded by sibling E	`li + li`, `div.head + div`
`E ~ F`	an F element preceded by sibling E	`h1 ~ p`
`E, F, G`	all matching elements E, F, or G	`a[href], div, h3`

§Pseudo selectors

Pattern	Matches	Example
`:lt(n)`	elements whose sibling index is less than n	`td:lt(3)` finds the first 3 cells of each row
`:gt(n)`	elements whose sibling index is greater than n	`td:gt(1)` finds cells after skipping the first two
`:eq(n)`	elements whose sibling index is equal to n	`td:eq(0)` finds the first cell of each row
`:has(selector)`	elements that contain at least one element matching the selector	`div:has(p)` finds divs that contain p elements; `div:has(> a)` selects div elements that have at least one direct child a element.
`:not(selector)`	elements that do not match the selector.	`div:not(.logo)` finds all divs that do not have the “logo” class; `div:not(:has(div))` finds divs that do not contain divs.
`:contains(text)`	elements that contain the specified text. The search is case insensitive. The text may appear in the found element, or any of its descendants.	`p:contains(SwiftSoup)` finds p elements containing the text “SwiftSoup”; `p:contains(hello \(there\))` finds p elements containing the text “Hello (There)”
`:matches(regex)`	elements whose text matches the specified regular expression. The text may appear in the found element, or any of its descendants.	`td:matches(\\d+)` finds table cells containing digits; `div:matches((?i)login)` finds divs containing the text, case insensitively.
`:containsOwn(text)`	elements that directly contain the specified text. The search is case insensitive. The text must appear in the found element, not any of its descendants.	`p:containsOwn(SwiftSoup)` finds p elements with own text “SwiftSoup”.
`:matchesOwn(regex)`	elements whose own text matches the specified regular expression. The text must appear in the found element, not any of its descendants.	`td:matchesOwn(\\d+)` finds table cells directly containing digits; `div:matchesOwn((?i)login)` finds divs directly containing the text, case insensitively.
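
The examples for `:lt(n)`, `:gt(n)`, and `:eq(n)` above imply a zero-based sibling index. The following is a minimal sketch of that indexing only, not the library's implementation; `IndexFilter` and `filter_siblings` are hypothetical names introduced for illustration, and plain strings stand in for table cells.

```rust
/// Hypothetical stand-in for the three index-based pseudo-selectors.
#[derive(Clone, Copy)]
enum IndexFilter {
    Lt(usize),
    Gt(usize),
    Eq(usize),
}

/// Keep the siblings whose zero-based index satisfies the filter.
fn filter_siblings<'a>(siblings: &[&'a str], filter: IndexFilter) -> Vec<&'a str> {
    siblings
        .iter()
        .enumerate()
        .filter(|(i, _)| match filter {
            IndexFilter::Lt(n) => *i < n,
            IndexFilter::Gt(n) => *i > n,
            IndexFilter::Eq(n) => *i == n,
        })
        .map(|(_, s)| *s)
        .collect()
}
```

With four cells in a row, `Lt(3)` keeps the first three, `Gt(1)` skips the first two, and `Eq(0)` keeps only the first, mirroring the `td:lt(3)`, `td:gt(1)`, and `td:eq(0)` rows of the table.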

§Structural pseudo-selectors

Pattern	Matches	Example
`:root`	The element that is the root of the document. In HTML, this is the html element
`:nth-child(an+b)`	elements that have an+b-1 siblings before them in the document tree, for any positive integer or zero value of n, and that have a parent element. For values of a and b greater than zero, this effectively divides the element’s children into groups of a elements (the last group taking the remainder) and selects the bth element of each group. For example, this allows selectors to address every other row in a table, and could be used to alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1.
`:nth-last-child(an+b)`	elements that have an+b-1 siblings after them in the document tree. Otherwise like `:nth-child()`	`tr:nth-last-child(-n+2)` selects the last two rows of a table
`:nth-of-type(an+b)`	elements that have an+b-1 siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and that have a parent element	`img:nth-of-type(2n+1)`
`:nth-last-of-type(an+b)`	elements that have an+b-1 siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and that have a parent element	`img:nth-last-of-type(2n+1)`
`:first-child`	elements that are the first child of some other element.	`div > p:first-child`
`:last-child`	elements that are the last child of some other element.	`ol > li:last-child`
`:first-of-type`	elements that are the first sibling of their type in the list of children of their parent element	`dl dt:first-of-type`
`:last-of-type`	elements that are the last sibling of their type in the list of children of their parent element	`tr > td:last-of-type`
`:only-child`	elements that have a parent element and whose parent element has no other element children
`:only-of-type`	an element that has a parent element and whose parent element has no other element children with the same expanded element name
`:empty`	elements that have no children at all
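
The `an+b` arithmetic described for `:nth-child` can be checked by hand. The sketch below is an illustration of the formula only, not how the selector engine is implemented; `matches_an_plus_b` and `select_nth_child` are hypothetical helper names. A 1-based position matches when it equals `a*n + b` for some integer `n >= 0`.

```rust
/// Returns true if the 1-based `index` equals `a*n + b` for some integer n >= 0.
fn matches_an_plus_b(a: i64, b: i64, index: i64) -> bool {
    if a == 0 {
        // With a = 0 the expression is constant, so only position b matches.
        return index == b;
    }
    let diff = index - b;
    // diff must be an exact multiple of a, and the multiplier n must not be negative.
    diff % a == 0 && diff / a >= 0
}

/// Lists the 1-based positions among `child_count` children that match `an+b`.
fn select_nth_child(child_count: i64, a: i64, b: i64) -> Vec<i64> {
    (1..=child_count)
        .filter(|&i| matches_an_plus_b(a, b, i))
        .collect()
}
```

Under these assumptions, `2n+1` over six children picks positions 1, 3, and 5, and `-n+2` picks positions 1 and 2. For the `:nth-last-*` variants the same arithmetic applies with positions counted from the end, which is why `tr:nth-last-child(-n+2)` addresses the last two rows.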

pub fn attr<T: AsRef<str>>(&self, attr: T) -> String

Get an attribute value by its key. To get an absolute URL from an attribute that may be a relative URL, prefix the key with abs:.

§Example

// Assumes that `el` is a Node
let url = el.attr("abs:src");

pub fn set_html<T: AsRef<str>>(&mut self, html: T) -> Result<()>

Set the element’s inner HTML, clearing the existing HTML.

§Notice

Internally, this operates on SwiftSoup.Element, but not on SwiftSoup.Elements, which is the type you usually get when using methods like Node::select. Either use Node::array to iterate through each element, or use Node::first/Node::last to select an element before calling this function.

pub fn set_text<T: AsRef<str>>(&mut self, text: T) -> Result<()>

Set the element’s text content, clearing any existing content.

§Notice

Internally, this operates on SwiftSoup.Element, but not on SwiftSoup.Elements, which is the type you usually get when using methods like Node::select. Either use Node::array to iterate through each element, or use Node::first/Node::last to select an element before calling this function.

pub fn prepend<T: AsRef<str>>(&mut self, html: T) -> Result<()>

Add inner HTML into this element. The given HTML will be parsed, and each node prepended to the start of the element’s children.

§Notice

Internally, this operates on SwiftSoup.Element, but not on SwiftSoup.Elements, which is the type you usually get when using methods like Node::select. Either use Node::array to iterate through each element, or use Node::first/Node::last to select an element before calling this function.

pub fn append<T: AsRef<str>>(&mut self, html: T) -> Result<()>

Add inner HTML into this element. The given HTML will be parsed, and each node appended to the end of the element’s children.

§Notice

Internally, this operates on SwiftSoup.Element, but not on SwiftSoup.Elements, which is the type you usually get when using methods like Node::select. Either use Node::array to iterate through each element, or use Node::first/Node::last to select an element before calling this function.

pub fn first(&self) -> Self

Get the first sibling of this element, which can be this element

pub fn last(&self) -> Self

Get the last sibling of this element, which can be this element

pub fn next(&self) -> Option<Node>

Get the next sibling of the element, returning None if there isn’t one.

pub fn previous(&self) -> Option<Node>

Get the previous sibling of the element, returning None if there isn’t one.

pub fn base_uri(&self) -> String

Get the base URI of this Node

pub fn body(&self) -> String

Get the document’s body element.

pub fn text(&self) -> String

Get the normalized, combined text of this element and its children. Whitespace is normalized and trimmed.

For example, given HTML <p>Hello <b>there</b> now! </p>, p.text() returns “Hello there now!”

Note that this method returns text that would be presented to a reader. The contents of data nodes (e.g. <script> tags) are not considered text. Use Node::html or Node::data to retrieve that content.
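
The whitespace handling described above can be approximated with the standard library alone. This is a rough sketch of "normalized and trimmed", not the routine the library actually uses; `normalize_whitespace` is a hypothetical helper name.

```rust
/// Collapses every run of whitespace into a single space and trims both ends.
fn normalize_whitespace(raw: &str) -> String {
    // split_whitespace drops leading and trailing whitespace and splits on any run of it.
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Applied to the raw character data of `<p>Hello <b>there</b> now! </p>`, which is `"Hello there now! "` with a trailing space, this yields `"Hello there now!"`, matching the result stated for `p.text()`.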

pub fn untrimmed_text(&self) -> String

Get the text of this element and its children. Whitespace is neither normalized nor trimmed.

The notes on Node::text also apply here.

pub fn own_text(&self) -> String

Gets the (normalized) text owned by this element only; does not get the combined text of all children.

Node::own_text only operates on a singular element, so calling it after Node::select will not work. You need to get a specific element first, through Node::array and ArrayRef::get, Node::first, or Node::last.
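
The difference between combined text and own text can be modelled on a tiny tree. The sketch below does not use the library; `MiniNode`, `combined_text`, `own_text`, and `sample_paragraph` are hypothetical names, and the whitespace collapsing is an assumption about what "normalized" means here.

```rust
/// Hypothetical miniature DOM, used only for illustration.
enum MiniNode {
    Text(String),
    Element(Vec<MiniNode>),
}

/// Collapses runs of whitespace into single spaces and trims both ends.
fn collapse(raw: &str) -> String {
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Combined text: text nodes of this element and of all its descendants.
fn combined_text(children: &[MiniNode]) -> String {
    fn walk(children: &[MiniNode], out: &mut String) {
        for child in children {
            match child {
                MiniNode::Text(t) => out.push_str(t),
                MiniNode::Element(grandchildren) => walk(grandchildren, out),
            }
        }
    }
    let mut raw = String::new();
    walk(children, &mut raw);
    collapse(&raw)
}

/// Own text: only the text nodes that are direct children.
fn own_text(children: &[MiniNode]) -> String {
    let mut raw = String::new();
    for child in children {
        if let MiniNode::Text(t) = child {
            raw.push_str(t);
        }
    }
    collapse(&raw)
}

/// Builds the children of `<p>Hello <b>there</b> now! </p>`.
fn sample_paragraph() -> Vec<MiniNode> {
    vec![
        MiniNode::Text("Hello ".to_string()),
        MiniNode::Element(vec![MiniNode::Text("there".to_string())]),
        MiniNode::Text(" now! ".to_string()),
    ]
}
```

In this model the paragraph's combined text is `"Hello there now!"`, while its own text skips the `<b>` child and comes out as `"Hello now!"`.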

pub fn data(&self) -> String

Get the combined data of this element. Data is e.g. the inside of a <script> tag.

Note that data is NOT the text of the element. Use Node::text to get the text that would be visible to a user, and Node::data for the contents of scripts, comments, CSS styles, etc.