pub struct Filter { /* private fields */ }
Filters to select the wanted elements of an Html tree.
The Filter structure allows you to
- remove some nodes: use the Self::comment method (to remove all comments of the form <!-- comment -->) or the Self::doctype method (to remove all doctype nodes, such as <!DOCTYPE html>).
- select some nodes, by searching for them by name (with the Self::tag_name method) or by attributes (with the Self::attribute_name and Self::attribute_value methods).
- select those nodes and their ancestors, up to a certain generation (cf. the Self::depth method).
§Examples
#![allow(unused)]
use html_filter::*;
Filter::new().comment(false).doctype(false); // Removes comments (`<!---->`) and doctype tags (`<!DOCTYPE html>`).
Filter::new().tag_name("a"); // Lists all the `<a>` tags and their content.
Filter::new().attribute_name("onClick"); // Lists all the tags with a `onClick` attribute.
Filter::new().attribute_value("id", "first-title"); // Gets the element whose `id` is `"first-title"`.
Filter::new().tag_name("li").depth(1); // Lists all the `<li>` tags and their parent (usually `ol` or `ul`).
Filter::new().none_except_text().collapse().trim().no_tags(); // Returns the text without padding
// between tags, in a single Html::Text.
§Implementations
impl Filter
Public API for Filter on node-type filters (texts, doctypes, comments, etc.)
pub const fn all(self, all: bool) -> Self
Short-hand to set the keep policy of comments, texts and doctypes at once.
- true: keep them
- false: remove them
It is equivalent to:
use html_filter::*;
assert_eq!(Filter::new().doctype(true).text(true).comment(true), Filter::new().all(true));
assert_eq!(Filter::new().doctype(false).text(false).comment(false), Filter::new().all(false));
pub const fn all_except_comment(self) -> Self
Removes the comments, and forces doctypes and texts to be kept.
See also Self::comment to set the policy for comments alone, without forcing the other node types to be kept.
§Examples
use html_filter::*;
let html = Html::parse("a <p> b <!-- c --></p> d").unwrap();
assert_eq!(html.to_filtered(&Filter::new().tag_name("p").comment(false)), "<p> b </p>");
assert_eq!(html.filter(&Filter::new().tag_name("p").all_except_comment()), "a <p> b </p> d");
pub const fn all_except_doctype(self) -> Self
Removes the doctypes, and forces comments and texts to be kept.
See also Self::doctype to set the policy for doctypes alone, without forcing the other node types to be kept.
§Examples
use html_filter::*;
let html = Html::parse("<!doctype html> a <p> b </p> d").unwrap();
assert_eq!(html.to_filtered(&Filter::new().tag_name("p").doctype(false)), "<p> b </p>");
assert_eq!(html.filter(&Filter::new().tag_name("p").all_except_doctype()), " a <p> b </p> d");
pub const fn all_except_text(self) -> Self
Removes the texts, and forces doctypes and comments to be kept.
See also Self::text to set the policy for texts alone, without forcing the other node types to be kept.
§Examples
use html_filter::*;
let html = Html::parse("<!doctype html> a <p> b <!-- c --></p> d <!-- e --> f").unwrap();
assert_eq!(
Filter::new().all_except_text(),
Filter::new().text(false).comment(true).doctype(true)
);
assert_eq!(html.to_filtered(&Filter::new().tag_name("p").text(false)), "<p><!-- c --></p>");
assert_eq!(
html.filter(&Filter::new().tag_name("p").all_except_text()),
"<!doctype html><p><!-- c --></p><!-- e -->"
);
pub const fn comment(self, comment: bool) -> Self
Sets the filter for comments
If comment is set to true (default), comments are kept.
If comment is set to false, comments are removed.
See Filter for usage information.
pub const fn doctype(self, doctype: bool) -> Self
Sets the filter for doctype tags
If doctype is set to true (default), doctype tags are kept.
If doctype is set to false, doctype tags are removed.
See Filter for usage information.
pub const fn none_except_comment(self) -> Self
Keeps only the comments
Doctypes and texts are removed, unless said otherwise by the user.
pub const fn none_except_doctype(self) -> Self
Keeps only the doctypes
Comments and texts are removed, unless said otherwise by the user.
pub const fn none_except_text(self) -> Self
Keeps only the texts
Comments and doctypes are removed, unless said otherwise by the user.
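The shorthand methods above (all, all_except_* and none_except_*) can be pictured as setting three keep flags at once. The sketch below is a self-contained toy model, not the crate's implementation: the KeepPolicy type and its plain boolean fields are hypothetical, and it assumes that "unless said otherwise by the user" means a later call simply overrides an earlier one.

```rust
/// Hypothetical stand-in for the node-type part of a filter (not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeepPolicy {
    pub comment: bool,
    pub doctype: bool,
    pub text: bool,
}

impl KeepPolicy {
    /// Everything is kept by default, mirroring the documented defaults.
    pub fn new() -> Self {
        Self { comment: true, doctype: true, text: true }
    }

    pub fn comment(mut self, keep: bool) -> Self {
        self.comment = keep;
        self
    }

    pub fn doctype(mut self, keep: bool) -> Self {
        self.doctype = keep;
        self
    }

    pub fn text(mut self, keep: bool) -> Self {
        self.text = keep;
        self
    }

    /// Sets the three flags at once.
    pub fn all(self, keep: bool) -> Self {
        self.comment(keep).doctype(keep).text(keep)
    }

    /// Keeps doctypes and comments, drops texts.
    pub fn all_except_text(self) -> Self {
        self.all(true).text(false)
    }

    /// Keeps texts, drops comments and doctypes.
    pub fn none_except_text(self) -> Self {
        self.all(false).text(true)
    }
}
```

In this model, KeepPolicy::new().none_except_text().comment(true) keeps both texts and comments, because the later call wins. Whether the real Filter resolves conflicting calls the same way is an assumption.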
pub const fn text(self, text: bool) -> Self
Filters texts
- If text is set to true (default), all texts are kept.
- If text is set to false, all texts are removed.
See Filter for usage information.
pub const fn trim(self) -> Self
Trims all texts
This includes removing the text nodes that contain only whitespace, which is very useful to drop the new lines between tags, for example:
§Examples
use html_filter::*;
let html = Html::parse(
"
<!doctype html>
<ul>
<li>First</li>
<li>Second</li>
</ul>
",
)
.unwrap();
// With trim
let filtered = html.to_filtered(&Filter::new().tag_name("ul").trim());
let (tag, child) = filtered.as_tag().unwrap();
assert_eq!(tag.as_name(), "ul");
let vec = child.as_vec().unwrap();
assert!(matches!(vec[0], Html::Tag { .. })); // first li
assert!(matches!(vec[1], Html::Tag { .. })); // second li
assert_eq!(vec.len(), 2);
// Without trim
let filtered = html.filter(&Filter::new().tag_name("ul"));
let (tag, child) = filtered.as_tag().unwrap();
assert_eq!(tag.as_name(), "ul");
let vec = child.as_vec().unwrap();
assert_eq!(vec[0], Html::Text("\n ".to_string()));
assert!(matches!(vec[1], Html::Tag { .. })); // first li
assert_eq!(vec[2], Html::Text("\n ".to_string()));
assert!(matches!(vec[3], Html::Tag { .. })); // second li
assert_eq!(vec[4], Html::Text("\n".to_string()));
assert_eq!(vec.len(), 5);
See also Self::collapse
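The effect of trimming on a list of sibling nodes can be sketched without the crate. The Node enum and trim_nodes function below are hypothetical names for a toy model; it assumes that trimming strips leading and trailing whitespace from every text node and drops the nodes that end up empty, which is how the description above reads.

```rust
/// Hypothetical, simplified node type (not the crate's `Html`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    Text(String),
    Tag(String),
}

/// Strips surrounding whitespace from text nodes and drops whitespace-only ones.
pub fn trim_nodes(nodes: Vec<Node>) -> Vec<Node> {
    nodes
        .into_iter()
        .filter_map(|node| match node {
            Node::Text(text) => {
                let trimmed = text.trim();
                if trimmed.is_empty() {
                    None // whitespace-only text disappears entirely
                } else {
                    Some(Node::Text(trimmed.to_string()))
                }
            }
            other => Some(other), // tags are left untouched
        })
        .collect()
}
```

Applied to the five children of the untrimmed ul above (three whitespace texts and two li tags), this model leaves only the two tags, matching the lengths asserted in the example.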
impl Filter
Public API for Filter on tags and attributes
pub fn attribute_name<N: Into<String>>(self, name: N) -> Self
Specifies the name of an attribute in the wanted tags.
This matches only tag attributes that don’t have any value, such as enabled in
<button enabled type="submit" />
See Filter for usage information.
pub fn attribute_value<N: Into<String>, V: Into<String>>(self, name: N, value: V) -> Self
Specifies the value of an attribute in the wanted tags.
This matches only tag attributes that have exactly the given value for the
given name. To match a single value among several space-separated values
(e.g. class names), cf. Filter::attribute_value_contains.
See Filter for usage information.
pub fn attribute_value_contains<N: Into<String>, V: Into<String>>(self, name: N, value: V) -> Self
Specifies a possible value of an attribute in the wanted tags.
This matches only tag attributes that have the given value as part of
the space-separated values inside the attribute value (cf. example
below). To match the exact value, see Filter::attribute_value.
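The difference between an exact match and a "contains" match on space-separated values can be sketched in plain Rust. The helper names below are hypothetical, and splitting on any whitespace is an assumption; the crate may split on ASCII spaces only or handle edge cases differently.

```rust
/// Exact comparison, in the spirit of `attribute_value` (hypothetical helper).
pub fn value_equals(attribute_value: &str, wanted: &str) -> bool {
    attribute_value == wanted
}

/// Token comparison, in the spirit of `attribute_value_contains` (hypothetical helper).
/// Assumes the attribute value is a whitespace-separated list of tokens.
pub fn value_contains(attribute_value: &str, wanted: &str) -> bool {
    attribute_value
        .split_whitespace()
        .any(|token| token == wanted)
}
```

With class="some_class other_class", value_contains finds some_class while value_equals does not, and a partial token such as some matches neither.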
§Examples
use html_filter::*;
let html = Html::parse(r#"<div class="some_class other_class" />"#).unwrap();
let filter = Filter::new().attribute_value_contains("class", "some_class");
if let Html::Tag { tag: Tag { name, .. }, .. } = html.filter(&filter) {
assert_eq!(name, "div");
} else {
unreachable!();
}
pub const fn collapse(self) -> Self
Collapses successive text nodes.
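As a rough picture of what collapsing means, the following self-contained sketch merges adjacent text nodes in a flat list. The Node enum and collapse_texts function are hypothetical and only model the idea; they are not taken from the crate.

```rust
/// Hypothetical, simplified node type (not the crate's `Html`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    Text(String),
    Comment(String),
}

/// Merges every run of consecutive text nodes into a single text node.
pub fn collapse_texts(nodes: Vec<Node>) -> Vec<Node> {
    let mut out: Vec<Node> = Vec::with_capacity(nodes.len());
    for node in nodes {
        if let Node::Text(next) = &node {
            if let Some(Node::Text(previous)) = out.last_mut() {
                // The previous kept node is also text: append instead of pushing.
                previous.push_str(next);
                continue;
            }
        }
        out.push(node);
    }
    out
}
```

A comment between two texts prevents them from merging in this model, which is consistent with the "before " text staying separate in the crate example.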
§Examples
use html_filter::*;
let html =
Html::parse("<div>before <!-- comment --> middle <strong>strong</strong> after</div>")
.unwrap();
// Without collapse
assert_eq!(
Html::Vec(
vec![
Html::Text("before ".into()),
Html::Comment(" comment ".into()),
Html::Text(" middle ".into()),
Html::Text("strong".into()),
Html::Text(" after".into())
]
.into()
),
html.to_filtered(&Filter::new().no_tags().text(true))
);
// With collapse
assert_eq!(
Html::Vec(
vec![
Html::Text("before ".into()),
Html::Comment(" comment ".into()),
Html::Text(" middle strong after".into()),
]
.into()
),
html.to_filtered(&Filter::new().no_tags().text(true).collapse())
);
pub const fn depth(self, depth: usize) -> Self
Specifies the depth of the desired nodes.
The depth is the number of ancestor generations to keep above each node matched by the filter. This allows you to search for a node and select not only that node, but also some of its ancestors, up to the chosen depth. For instance, a depth of 0 means you keep only the matched tag, while a depth of 1 means you keep the parent of the matched tag together with all of that parent’s children.
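One way to picture the depth is as a walk from the matched node up through its ancestors. The sketch below uses a hypothetical flat tree where each node stores the index of its parent; the Node type and kept_root function are illustrative names, and stopping at the root when the depth exceeds the number of ancestors is an assumption about the behaviour.

```rust
/// Hypothetical flat tree node: a name and the index of its parent, if any.
pub struct Node {
    pub name: &'static str,
    pub parent: Option<usize>,
}

/// Climbs at most `depth` generations from `start` and returns the index of
/// the node that would become the root of the kept subtree.
pub fn kept_root(tree: &[Node], start: usize, depth: usize) -> usize {
    let mut current = start;
    for _ in 0..depth {
        match tree[current].parent {
            Some(parent) => current = parent,
            None => break, // already at the top of the tree (assumed behaviour)
        }
    }
    current
}
```

For a main > nav > ul > li chain, starting from the li, a depth of 0 selects the li itself, 1 selects the ul and 2 selects the nav.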
§Examples
For example, let’s consider this HTML code:
use html_filter::*;
let html = Html::parse(
r#"
<main>
<nav>
<!-- Navigation menu -->
<ul>
<li href="first">First link</li>
<li href="second">Second link</li>
<li href="third">Third link</li>
</ul>
</nav>
</main>
"#,
)
.unwrap();
assert_eq!(
html.to_filtered(&Filter::new().attribute_value("href", "second").depth(0)),
r#"<li href="second">Second link</li>"#
);
assert_eq!(
html.to_filtered(&Filter::new().attribute_value("href", "second").depth(1)),
r#"<ul>
<li href="first">First link</li>
<li href="second">Second link</li>
<li href="third">Third link</li>
</ul>"#
);
assert_eq!(
html.to_filtered(&Filter::new().attribute_value("href", "second").depth(2)),
r#"<nav>
<!-- Navigation menu -->
<ul>
<li href="first">First link</li>
<li href="second">Second link</li>
<li href="third">Third link</li>
</ul>
</nav>"#
);
pub fn except_attribute_name<N: Into<String>>(self, name: N) -> Self
Specifies the name of an attribute in the tags that must be dismissed.
This matches only tag attributes that don’t have any value, such as enabled in
<button enabled type="submit" />
See Filter for usage information.
pub fn except_attribute_value<N, V>(self, name: N, value: V) -> Self
Specifies the value of an attribute in the tags that must be dismissed.
This matches only tag attributes that have exactly the given value for the
given name. To filter out on one possible value inside the attribute value,
see Filter::except_attribute_value_contains.
See Filter for usage information.
pub fn except_attribute_value_contains<N: Into<String>, V: Into<String>>(self, name: N, value: V) -> Self
Specifies a possible value of an attribute that must be dismissed.
This matches only tag attributes that have the given value as part of
the space-separated values inside the attribute value (cf. example
below). To match the exact value, see Filter::except_attribute_value.
§Examples
use html_filter::*;
let html = Html::parse(r#"<div class="some_class other_class" />"#).unwrap();
let filter = Filter::new().except_attribute_value_contains("class", "some_class");
assert_eq!(html.filter(&filter), Html::Empty);
pub fn except_tag_name<N: Into<String>>(self, name: N) -> Self
Specifies the tag name of the tags that must be dismissed.
See Filter for usage information.
Self::no_tags disables all tags, except those explicitly whitelisted.
§Example
use html_filter::*;
let html = Html::parse("<!doctype html><div><!-- comment --></div>").unwrap();
assert_eq!(
html.to_filtered(&Filter::new().no_tags()),
Html::parse("<!doctype html><!-- comment -->").unwrap()
);
let html = Html::parse("z<body>a<div>b<p>c</p>d</div>e</body>y").unwrap();
assert_eq!(
html.to_filtered(&Filter::new().no_tags().tag_name("div").collapse()),
Html::parse("<div>bd</div>").unwrap()
);