pub struct Node(/* private fields */);
A type that represents an HTML node, which can be a group of elements, a single element, or the entire HTML document.
Implementations§
impl Node
pub fn new<T: AsRef<[u8]>>(buf: T) -> Result<Self>
Parse HTML into a Node. Because no base URI is specified, resolving relative URLs to absolute ones requires the HTML to contain a <base href> tag.
pub fn new_with_uri<A: AsRef<[u8]>, B: AsRef<str>>(buf: A, base_uri: B) -> Result<Self>
Parse HTML into a Node. The given base_uri will be used to resolve any URLs that occur before a <base href> tag is defined.
pub fn new_fragment<T: AsRef<[u8]>>(buf: T) -> Result<Self>
Parse an HTML fragment, assuming that it forms the body of the HTML document.
Similar to Node::new, relative URLs will not be resolved unless there is a <base href> tag.
pub fn new_fragment_with_uri<A: AsRef<[u8]>, B: AsRef<str>>(buf: A, base_uri: B) -> Result<Self>
Parse an HTML fragment, assuming that it forms the body of the HTML document.
Similar to Node::new_with_uri, the given base_uri is used to resolve any URLs that appear before a <base href> tag.
pub fn close(self)
pub fn select<T: AsRef<str>>(&self, selector: T) -> Self
Find elements that match the given CSS (or jQuery-style) selector.
§Supported selectors
Pattern | Matches | Example |
---|---|---|
* | any element | * |
tag | elements with the given tag name | div |
*|E | elements of type E in any namespace (including non-namespaced) | *|name finds <fb:name> and <name> elements |
ns|E | elements of type E in the namespace ns | fb|name finds <fb:name> elements |
#id | elements with an ID attribute of “id” | div#wrap, #logo |
.class | elements with a class name of “class” | div.left, .result |
[attr] | elements with an attribute named “attr” (with any value) | a[href], [title] |
[^attrPrefix] | elements with an attribute name starting with “attrPrefix”. Use to find elements with HTML5 datasets | [^data-], div[^data-] |
[attr=val] | elements with an attribute named “attr”, and value equal to “val” | img[width=500], a[rel=nofollow] |
[attr="val"] | elements with an attribute named “attr”, and value equal to “val” | span[hello="Cleveland"][goodbye="Columbus"], a[rel="nofollow"] |
[attr^=valPrefix] | elements with an attribute named “attr”, and value starting with “valPrefix” | a[href^=http:] |
[attr$=valSuffix] | elements with an attribute named “attr”, and value ending with “valSuffix” | img[src$=.png] |
[attr*=valContaining] | elements with an attribute named “attr”, and value containing “valContaining” | a[href*=/search/] |
[attr~=regex] | elements with an attribute named “attr”, and value matching the regular expression | img[src~=(?i)\\.(png|jpe?g)] |
The above may be combined in any order | div.header[title] |
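The attribute-value operators in the table above correspond to simple string tests. The following is a minimal sketch of that correspondence, not the selector engine's actual code: `AttrOp` and `attr_matches` are hypothetical names, and the case-sensitive comparison is an assumption (the underlying engine may compare differently).

```rust
/// Hypothetical model of the attribute-value operators listed above.
#[derive(Debug, Clone, Copy)]
pub enum AttrOp {
    Equals,   // [attr=val]
    Prefix,   // [attr^=valPrefix]
    Suffix,   // [attr$=valSuffix]
    Contains, // [attr*=valContaining]
}

/// Returns true if the attribute `value` satisfies `op` against `pattern`.
/// Case-sensitive comparison is an assumption made for this sketch.
pub fn attr_matches(op: AttrOp, value: &str, pattern: &str) -> bool {
    match op {
        AttrOp::Equals => value == pattern,
        AttrOp::Prefix => value.starts_with(pattern),
        AttrOp::Suffix => value.ends_with(pattern),
        AttrOp::Contains => value.contains(pattern),
    }
}
```

Under these assumptions, `attr_matches(AttrOp::Suffix, "logo.png", ".png")` models what `img[src$=.png]` tests against each image's `src` value.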
§Combinators
Pattern | Matches | Example |
---|---|---|
E F | an F element descended from an E element | div a, .logo h1 |
E > F | an F direct child of E | ol > li |
E + F | an F element immediately preceded by sibling E | li + li, div.head + div |
E ~ F | an F element preceded by sibling E | h1 ~ p |
E, F, G | all matching elements E, F, or G | a[href], div, h3 |
§Pseudo selectors
Pattern | Matches | Example |
---|---|---|
:lt(n) | elements whose sibling index is less than n | td:lt(3) finds the first 3 cells of each row |
:gt(n) | elements whose sibling index is greater than n | td:gt(1) finds cells after skipping the first two |
:eq(n) | elements whose sibling index is equal to n | td:eq(0) finds the first cell of each row |
:has(selector) | elements that contain at least one element matching the selector | div:has(p) finds divs that contain p elements; div:has(> a) selects div elements that have at least one direct child a element. |
:not(selector) | elements that do not match the selector. | div:not(.logo) finds all divs that do not have the “logo” class; div:not(:has(div)) finds divs that do not contain divs. |
:contains(text) | elements that contain the specified text. The search is case-insensitive. The text may appear in the found element, or any of its descendants. | p:contains(SwiftSoup) finds p elements containing the text “SwiftSoup”; p:contains(hello \(there\)) finds p elements containing the text “Hello (There)” |
:matches(regex) | elements whose text matches the specified regular expression. The text may appear in the found element, or any of its descendants. | td:matches(\\d+) finds table cells containing digits. div:matches((?i)login) finds divs containing the text, case insensitively. |
:containsOwn(text) | elements that directly contain the specified text. The search is case insensitive. The text must appear in the found element, not any of its descendants. | p:containsOwn(SwiftSoup) finds p elements with own text “SwiftSoup”. |
:matchesOwn(regex) | elements whose own text matches the specified regular expression. The text must appear in the found element, not any of its descendants. | td:matchesOwn(\\d+) finds table cells directly containing digits. div:matchesOwn((?i)login) finds divs containing the text, case insensitively. |
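The examples for `:lt(n)`, `:gt(n)`, and `:eq(n)` above imply that the sibling index starts at 0 (`td:eq(0)` is the first cell). The following sketch illustrates that indexing on a plain slice; `IndexFilter`, `keep_index`, and `filter_cells` are hypothetical names and do not come from the library.

```rust
/// Hypothetical model of the index-based pseudo selectors.
pub enum IndexFilter {
    Lt(usize), // :lt(n)
    Gt(usize), // :gt(n)
    Eq(usize), // :eq(n)
}

/// Returns true if an element at the zero-based `sibling_index` passes the filter.
pub fn keep_index(filter: &IndexFilter, sibling_index: usize) -> bool {
    match filter {
        IndexFilter::Lt(n) => sibling_index < *n,
        IndexFilter::Gt(n) => sibling_index > *n,
        IndexFilter::Eq(n) => sibling_index == *n,
    }
}

/// Applies the filter to one row of cells, keeping their original order.
pub fn filter_cells<'a>(cells: &[&'a str], filter: &IndexFilter) -> Vec<&'a str> {
    cells
        .iter()
        .enumerate()
        .filter(|(i, _)| keep_index(filter, *i))
        .map(|(_, cell)| *cell)
        .collect()
}
```

For a row of four cells, `IndexFilter::Lt(3)` keeps the first three and `IndexFilter::Gt(1)` skips the first two, matching the `td:lt(3)` and `td:gt(1)` examples.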
§Structural pseudo-selectors
Pattern | Matches | Example |
---|---|---|
:root | The element that is the root of the document. In HTML, this is the html element | |
:nth-child(an+b) | elements that have an+b-1 siblings before them in the document tree, for any zero or positive integer value of n, and that have a parent element. For values of a and b greater than zero, this effectively divides the element’s children into groups of a elements (the last group taking the remainder) and selects the bth element of each group. For example, this allows selectors to address every other row in a table, and could be used to alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1. | |
:nth-last-child(an+b) | elements that have an+b-1 siblings after them in the document tree. Otherwise like :nth-child() | tr:nth-last-child(-n+2) finds the last two rows of a table |
:nth-of-type(an+b) | elements that have an+b-1 siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and that have a parent element | img:nth-of-type(2n+1) |
:nth-last-of-type(an+b) | elements that have an+b-1 siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and that have a parent element | img:nth-last-of-type(2n+1) |
:first-child | elements that are the first child of some other element. | div > p:first-child |
:last-child | elements that are the last child of some other element. | ol > li:last-child |
:first-of-type | elements that are the first sibling of their type in the list of children of their parent element | dl dt:first-of-type |
:last-of-type | elements that are the last sibling of their type in the list of children of their parent element | tr > td:last-of-type |
:only-child | elements that have a parent element and whose parent element has no other element children | |
:only-of-type | elements that have a parent element and whose parent element has no other element children with the same expanded element name | |
:empty | elements that have no children at all | |
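The an+b rule used by the :nth-* selectors above reduces to a small integer test: a 1-based position matches when it equals a*n + b for some integer n >= 0. The following is a minimal sketch of that arithmetic; `matches_an_plus_b` is a hypothetical helper and not part of the library.

```rust
/// Returns true if the 1-based `position` equals `a * n + b` for some integer n >= 0.
/// Hypothetical helper that illustrates the an+b rule only.
pub fn matches_an_plus_b(a: i64, b: i64, position: i64) -> bool {
    let diff = position - b;
    if a == 0 {
        // With a = 0 the formula selects the single position b.
        return diff == 0;
    }
    // diff must be a whole multiple of a, and the multiplier n must not be negative.
    diff % a == 0 && diff / a >= 0
}
```

With this helper, `2n+1` (a = 2, b = 1) accepts positions 1, 3, 5, and so on. `-n+2` (a = -1, b = 2) accepts only positions 1 and 2, which is why `tr:nth-last-child(-n+2)` selects the last two rows when counting from the end.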
pub fn set_html<T: AsRef<str>>(&mut self, html: T) -> Result<()>
Set the element’s inner HTML, clearing the existing HTML.
§Notice
Internally, this operates on SwiftSoup.Element, but not on SwiftSoup.Elements, which is the type you usually get when using methods like Node::select. Either use Node::array to iterate through each element, or use Node::first/Node::last to select an element before calling this function.
pub fn set_text<T: AsRef<str>>(&mut self, text: T) -> Result<()>
Set the element’s text content, clearing any existing content.
§Notice
Internally, this operates on SwiftSoup.Element, but not on SwiftSoup.Elements, which is the type you usually get when using methods like Node::select. Either use Node::array to iterate through each element, or use Node::first/Node::last to select an element before calling this function.
pub fn prepend<T: AsRef<str>>(&mut self, html: T) -> Result<()>
Add inner HTML into this element. The given HTML will be parsed, and each node prepended to the start of the element’s children.
§Notice
Internally, this operates on SwiftSoup.Element, but not on SwiftSoup.Elements, which is the type you usually get when using methods like Node::select. Either use Node::array to iterate through each element, or use Node::first/Node::last to select an element before calling this function.
pub fn append<T: AsRef<str>>(&mut self, html: T) -> Result<()>
Add inner HTML into this element. The given HTML will be parsed, and each node appended to the end of the element’s children.
§Notice
Internally, this operates on SwiftSoup.Element, but not on SwiftSoup.Elements, which is the type you usually get when using methods like Node::select. Either use Node::array to iterate through each element, or use Node::first/Node::last to select an element before calling this function.
pub fn next(&self) -> Option<Node>
Get the next sibling of the element, returning None if there isn’t one.
pub fn previous(&self) -> Option<Node>
Get the previous sibling of the element, returning None if there isn’t one.
pub fn text(&self) -> String
Get the normalized, combined text of this element and its children. Whitespace is normalized and trimmed.
For example, given the HTML <p>Hello <b>there</b> now! </p>, p.text() returns “Hello there now!”.
Note that this method returns the text that would be presented to a reader. The contents of data nodes (e.g. <script> tags) are not considered text. Use Node::html or Node::data to retrieve that content.
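To make “normalized and trimmed” concrete, the sketch below collapses each run of whitespace into a single space and drops leading and trailing whitespace. It only illustrates the effect on an already-extracted string; `normalize_whitespace` is a hypothetical helper, and the library’s own normalization may differ in detail.

```rust
/// Collapses every run of whitespace into one space and trims both ends.
/// Hypothetical helper; it models the effect described above, not the library's code.
pub fn normalize_whitespace(raw: &str) -> String {
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Applied to the combined raw text of the example paragraph, `normalize_whitespace("Hello there now! ")` yields `Hello there now!`.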
pub fn untrimmed_text(&self) -> String
Get the text of this element and its children. Whitespace is neither normalized nor trimmed.
The notes on Node::text also apply here.
pub fn own_text(&self) -> String
Gets the (normalized) text owned by this element only; does not get the combined text of all children.
Node::own_text operates only on a single element, so calling it directly on the result of Node::select will not work. You need to get a specific element first, through Node::array and ArrayRef::get, Node::first, or Node::last.
pub fn data(&self) -> String
Get the combined data of this element. Data is, for example, the contents of a <script> tag.
Note that data is NOT the text of the element. Use Node::text to get the text that would be visible to a user, and Node::data for the contents of scripts, comments, CSS styles, etc.
pub fn array(&self) -> ArrayRef
Get an array of Node. This is most commonly used with Node::select to iterate through elements that match a selector.
pub fn html(&self) -> String
Get the node’s inner HTML.
For example, on <div><p></p></div>, div.html() would return <p></p>.
pub fn outer_html(&self) -> String
Get the node’s outer HTML.
For example, on <div><p></p></div>, div.outer_html() would return <div><p></p></div>.
pub fn escape(&self) -> String
Get the node’s text and escape any HTML-reserved characters to HTML entities.
For example, for a node with text Hello &<> Å å π 新 there ¾ © », this would return Hello &amp;&lt;&gt; Å å π 新 there ¾ © ».
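As an illustration of escaping HTML-reserved characters, the sketch below replaces `&`, `<`, and `>` with their entities and leaves every other character untouched. Which characters the library treats as reserved is an assumption here, and `escape_reserved` is a hypothetical helper rather than the library’s implementation.

```rust
/// Replaces `&`, `<`, and `>` with HTML entities; all other characters pass through.
/// Hypothetical helper: the exact set of escaped characters is an assumption.
pub fn escape_reserved(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            other => out.push(other),
        }
    }
    out
}
```

Handling the characters in a single pass avoids double-escaping: the `&` inside an entity that was just written is never visited again.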
pub fn unescape(&self) -> String
Get the node’s text and unescape any HTML entities to their original characters.
For example, for a node with text Hello &amp;&lt;&gt; Å å π 新 there ¾ © », this would return Hello &<> Å å π 新 there ¾ © ».
pub fn tag_name(&self) -> String
Get the name of the tag for this element. This will always be the lowercased version. For example, <DIV> and <div> would both return div.
pub fn class_name(&self) -> String
Get the literal value of this node’s class attribute. For example, on <div class="header gray"> this would return header gray.
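Because the returned value is the literal attribute string, callers that need individual class names have to split it themselves. The following is a minimal sketch of doing so on the returned string; `class_list` and `has_class` are hypothetical helpers, not methods of Node.

```rust
/// Splits a literal class attribute value into its individual class names.
/// Hypothetical helper operating on the string returned for the class attribute.
pub fn class_list(class_attr: &str) -> Vec<&str> {
    class_attr.split_whitespace().collect()
}

/// Returns true if `name` is one of the whitespace-separated classes.
pub fn has_class(class_attr: &str, name: &str) -> bool {
    class_attr.split_whitespace().any(|class| class == name)
}
```

Comparing whole tokens rather than using a substring search keeps `has_class("header gray", "head")` from matching by accident.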