Crate kiss_xml

§KISS-XML: Keep It Super Simple XML

This Rust library provides an easy-to-use Document Object Model (DOM) for reading and writing XML files. Unlike many other XML parsers, KISS-XML simply parses the given XML to a full in-memory DOM, which you can then modify and serialize back to XML. No schemas or looping required.

This library does not aim to support all XML specifications, only the most commonly used subset of features.

§What’s included:

KISS-XML provides the basics for XML documents, including:

  • Parse XML files and strings to a DOM
  • XML elements, text, and comments
  • DOM is mutable and can be saved as a string and to files
  • XML namespaces (with and without prefixes)
  • CDATA
  • Easy to use

§What’s NOT included:

  • Schema handling
  • Document type declarations (DTDs will be preserved but not interpreted)
  • Parsing character encodings other than UTF-8
  • Typed XML data (e.g. integer attribute values)
  • Performance optimizations (the library prioritizes ease of use over speed)

If you need any of the above XML features, then this library is too simple for your needs. Try another XML parsing crate instead.

§Examples

§Parse an XML file and print it to the terminal

To parse an XML file, call the kiss_xml::parse_filepath(...) function. You can then convert the resulting document to a string with the .to_string() method or write it to a file with .write_to_filepath(...).

fn main() -> Result<(), kiss_xml::errors::KissXmlError> {
	use kiss_xml;
	let doc = kiss_xml::parse_filepath("tests/some-file.xml")?;
	println!("{}", doc.to_string());
	Ok(())
}

§Parse XML and then search the DOM for specific elements

Parsed XML content will be converted into a Document Object Model (DOM) with a single root element. A DOM is a tree-like data structure made up of XML Element, Text, and Comment nodes. You can explore the DOM element-by-element with the .elements_by_name(&str) and .first_element_by_name(&str) methods, scan the children of an element with the .child_*() methods, or do a recursive search using the .search(...) and .search_*(...) methods.

For example:

fn main() -> Result<(), kiss_xml::errors::KissXmlError> {
	use kiss_xml;
	use kiss_xml::dom::*;
	use kiss_xml::errors::*;
	let xml = r#"<?xml version="1.0" encoding="UTF-8"?>
<config>
	<name>My Settings</name>
	<sound>
		<property name="volume" value="11" />
		<property name="mixer" value="standard" />
	</sound>
</config>
"#;
	// parse XML to a document object model (DOM)
	let dom = kiss_xml::parse_str(xml)?;
	// print all sound properties
	let properties = dom.root_element()
		.first_element_by_name("sound")?
		.elements_by_name("property");
	for prop in properties {
		println!(
			"{} = {}",
			prop.get_attr("name").ok_or(DoesNotExistError::default())?,
			prop.get_attr("value").ok_or(DoesNotExistError::default())?
		);
	}
	// print children of the root element
	for e in dom.root_element().child_elements() {
		println!("child element <{}>", e.name())
	}
	// print all elements
	for e in dom.root_element().search_elements(|_| true) {
		println!("found element <{}>", e.name())
	}
	Ok(())
}
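
The difference between scanning direct children and searching recursively can be sketched without the crate. The following is a minimal illustration on a hypothetical miniature tree; MiniNode, elem, child_element_names, search_element_names, and sample_config are invented for this sketch and are not part of the kiss_xml API, and the depth-first, pre-order traversal is an assumption rather than a documented guarantee of the library.

```rust
// Hypothetical miniature DOM, for illustration only (not kiss_xml's types).
#[allow(dead_code)] // the Text and Comment payloads are not read in this sketch
#[derive(Debug)]
enum MiniNode {
    Element { name: String, children: Vec<MiniNode> },
    Text(String),
    Comment(String),
}

/// Builds an element node (helper invented for this sketch).
fn elem(name: &str, children: Vec<MiniNode>) -> MiniNode {
    MiniNode::Element { name: name.to_string(), children }
}

/// Names of the direct child elements only (no recursion).
fn child_element_names(node: &MiniNode) -> Vec<&str> {
    match node {
        MiniNode::Element { children, .. } => children
            .iter()
            .filter_map(|c| match c {
                MiniNode::Element { name, .. } => Some(name.as_str()),
                _ => None,
            })
            .collect(),
        _ => Vec::new(),
    }
}

/// Collects the names of all descendant elements whose name satisfies `pred`.
/// Assumption of this sketch: depth-first, pre-order traversal.
fn search_element_names<'a>(
    node: &'a MiniNode,
    pred: &impl Fn(&str) -> bool,
    found: &mut Vec<&'a str>,
) {
    if let MiniNode::Element { children, .. } = node {
        for child in children {
            if let MiniNode::Element { name, .. } = child {
                if pred(name.as_str()) {
                    found.push(name.as_str());
                }
            }
            // Text and Comment nodes have no children, so recursion stops there.
            search_element_names(child, pred, found);
        }
    }
}

/// A hand-built tree modelled on the <config> document above, plus one comment.
fn sample_config() -> MiniNode {
    elem("config", vec![
        MiniNode::Comment("settings".to_string()),
        elem("name", vec![MiniNode::Text("My Settings".to_string())]),
        elem("sound", vec![elem("property", vec![]), elem("property", vec![])]),
    ])
}
```

Calling child_element_names on the sample tree yields only the two top-level elements, while search_element_names with an always-true predicate also reaches the nested property elements.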

§Create and edit DOM from scratch

To modify the DOM, use the .*_mut(...) methods to get mutable references to the elements. You can insert, append, and remove elements (and other kinds of nodes) from the DOM.

For example:

fn main() -> Result<(), kiss_xml::errors::KissXmlError> {
	use kiss_xml;
	use kiss_xml::dom::*;
	use kiss_xml::errors::*;
	// make a DOM from scratch
	let mut doc = Document::new(Element::new_from_name("politicians")?);
	doc.root_element_mut().insert(0, Element::new_with_text("person", "John Adams")?);
	doc.root_element_mut().append(Element::new_with_text("person", "Hillary Clinton")?);
	doc.root_element_mut().append(Element::new_with_text("person", "Jimmy John")?);
	doc.root_element_mut().append(Element::new_with_text("person", "Nanny No-Name")?);
	// remove element by index
	let _removed_element = doc.root_element_mut().remove_element(3)?;
	// remove element(s) by use of a predicate function
	let _num_removed = doc.root_element_mut().remove_elements(|e| e.text() == "Jimmy John");
	// print first element content
	println!("First politician: {}", doc.root_element().first_element_by_name("person")?.text());
	// write to file
	doc.write_to_filepath("tests/politics.xml");
	Ok(())
}

§Get and modify text and comments

The XML DOM is made up of Node objects (trait objects implementing trait kiss_xml::dom::Node). The following example shows how to add and remove text and comment nodes in addition to element nodes.

fn main() -> Result<(), kiss_xml::errors::KissXmlError> {
	use kiss_xml;
	use kiss_xml::dom::*;
	use kiss_xml::errors::*;
	use std::collections::HashMap;
	let mut doc = kiss_xml::parse_str(
r#"<html>
	<!-- this is a comment -->
	<body>
		Content goes here
	</body>
</html>"#
	)?;
	// read and remove the first comment
	let comments = doc.root_element().children()
		.filter(|n| n.is_comment())
		.collect::<Vec<_>>();
	let first_comment = comments.first()
		.ok_or(DoesNotExistError::new("no comments in DOM"))?;
	println!("Comment: {}", first_comment.text());
	doc.root_element_mut().remove_all(&|n| n.is_comment());
	// replace content of <body> with some HTML
	doc.root_element_mut().first_element_by_name_mut("body")?.remove_all(&|_| true);
	doc.root_element_mut().first_element_by_name_mut("body")?.append_all(
		vec![
			Element::new_with_text("h1", "Chapter 1")?.boxed(),
			Comment::new("Note: there is only one chapter")?.boxed(),
			Element::new_with_children("p", vec![
				Text::new("Once upon a time, there was a little ").boxed(),
				Element::new_with_attributes_and_text::<&str,&str>(
					"a",
					HashMap::from([("href","https://en.wikipedia.org/wiki/Gnome")]),
					"gnome"
				)?.boxed(),
				Text::new(" who lived in a walnut tree...").boxed()
			])?.boxed()
		]
	);
	// print the results
	println!("{}", doc.to_string());
	// prints:
	// <html>
	//   <body>
	//     <h1>Chapter 1</h1>
	//     <!--Note: there is only one chapter-->
	//     <p>Once upon a time, there was a little <a href="https://en.wikipedia.org/wiki/Gnome">gnome</a> who lived in a walnut tree...</p>
	//   </body>
	// </html>
	Ok(())
}

§Implementation Details

§Indentation and Whitespace Handling

By default, KISS-XML produces indented XML output and disregards the whitespace characters between tags. There is one exception to this rule: if an XML element contains text, then all whitespace inside that element is preserved on parsing, and indentation is disabled when the element is serialized to an XML string.

For example, consider this code snippet:

fn ws_example_1() -> Result<(), Box<dyn std::error::Error>> {
	use kiss_xml;
	use kiss_xml::dom::*;
	let mut tree = Element::new_with_children(
		"tree", vec![Element::new_with_text("speak", "bark!")?.boxed()]
	)?;
	tree.append(Element::new_from_name("branch")?);
	println!("{tree}");
	Ok(())
}

The above code will print the following:

<tree>
  <speak>bark!</speak>
  <branch/>
</tree>

However, if you then add a text node to the “tree” element, then the output formatting will change significantly:

fn ws_example_2() -> Result<(), Box<dyn std::error::Error>> {
	use kiss_xml;
	use kiss_xml::dom::*;
	let mut tree = Element::new_with_children(
		"tree", vec![Element::new_with_text("speak", "bark!")?.boxed()]
	)?;
	tree.append(Element::new_from_name("branch")?);
	tree.append(Text::new("I'm a tree!"));
	println!("{tree}");
	Ok(())
}

The above code will print the following:

<tree><speak>bark!</speak><branch/>I'm a tree!</tree>

Likewise, suppose you parse the following XML with KISS-XML:

<tree>
  <speak>bark!</speak>
  <branch/>
  I'm a tree!
</tree>

You will find that the final Text node contains \n··I'm·a·tree!\n (where \n and · stand for newline and space characters, shown for clarity). Unlike HTML, KISS-XML does not collapse whitespace.

This behavior is based on a common (but not universal) interpretation of the official XML specification.
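
The serialization rule described in this section can be sketched in a few lines of plain Rust. This is only an illustration under stated assumptions, not the library's implementation: WsNode, el, txt, write_node, and to_xml are hypothetical names, the two-space indent is taken from the printed output above, and the choice to keep indentation switched off for all descendants of a text-bearing element is an assumption of the sketch.

```rust
// Hypothetical miniature DOM, for illustration only (not kiss_xml's types).
enum WsNode {
    Element { name: String, children: Vec<WsNode> },
    Text(String),
}

/// Builds an element node (helper invented for this sketch).
fn el(name: &str, children: Vec<WsNode>) -> WsNode {
    WsNode::Element { name: name.to_string(), children }
}

/// Builds a text node (helper invented for this sketch).
fn txt(content: &str) -> WsNode {
    WsNode::Text(content.to_string())
}

/// Writes `node` to `out`. `indent` says whether pretty-printing is still allowed here.
/// Text is written verbatim; escaping is left out to keep the sketch short.
fn write_node(node: &WsNode, depth: usize, indent: bool, out: &mut String) {
    match node {
        WsNode::Text(content) => out.push_str(content),
        WsNode::Element { name, children } => {
            if children.is_empty() {
                out.push_str(&format!("<{name}/>"));
                return;
            }
            out.push_str(&format!("<{name}>"));
            // Indentation is disabled as soon as the element has a text child.
            let has_text = children.iter().any(|c| matches!(c, WsNode::Text(_)));
            let indent_children = indent && !has_text;
            for child in children {
                if indent_children {
                    out.push('\n');
                    out.push_str(&"  ".repeat(depth + 1));
                }
                write_node(child, depth + 1, indent_children, out);
            }
            if indent_children {
                out.push('\n');
                out.push_str(&"  ".repeat(depth));
            }
            out.push_str(&format!("</{name}>"));
        }
    }
}

/// Serializes a whole tree, starting with indentation enabled.
fn to_xml(root: &WsNode) -> String {
    let mut out = String::new();
    write_node(root, 0, true, &mut out);
    out
}
```

Building the "tree" element with only the speak and branch children and passing it to to_xml gives the indented form, while adding a txt("I'm a tree!") child gives the single-line form, matching the two printed outputs above.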

§License

This library is open source, licensed under the MIT License. You may use it as-is or with modification, subject to the terms of that license.

Modules§

dom
A document object model (DOM) is a tree data structure with three different kinds of nodes: Element, Text, and Comment nodes. Element nodes can have children (a list of child nodes), while Text and Comment nodes cannot. As per the XML specification, a DOM can only have one root element.
errors
The kiss_xml::errors module holds an enum of possible error types, each of which has a corresponding implementation struct.

Functions§

attribute_escape
Escapes a subset of XML reserved characters (&, ', and ") in an attribute value into XML-compatible text, e.g. replacing "&" with "&amp;" and "'" with "&apos;"
escape
Escapes all special characters (&, <, >, ', and ") in a string into an XML-compatible string, e.g. replacing "&" with "&amp;" and "<" with "&lt;"
parse_filepath
Reads the file from the given filepath and parses it as an XML document
parse_str
Reads the XML content from the UTF-8 encoded text string and parses it as an XML document
parse_stream
Reads the XML content from the given stream reader and parses it as an XML document. Note that this function will read to EOF before returning.
text_escape
Escapes a subset of XML reserved characters (&, <, and >) in a text string into XML-compatible text, e.g. replacing "&" with "&amp;" and "<" with "&lt;"
unescape
Reverses any escaped characters (&, <, >, ', and ") in XML-compatible text to regenerate the original text, e.g. replacing "&amp;" with "&" and "&lt;" with "<"
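
To make the character subsets listed above concrete, here is a minimal standalone sketch of the escaping rules as described. It is not the crate's source code: escape_chars and the sketch_* functions are hypothetical names, only the five named entities are handled, and numeric character references are ignored.

```rust
/// Replaces each character listed in `targets` with its named XML entity.
/// Characters outside `targets` are copied unchanged.
fn escape_chars(input: &str, targets: &[char]) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        if !targets.contains(&c) {
            out.push(c);
            continue;
        }
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '\'' => out.push_str("&apos;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
    out
}

/// Text content: only &, <, and > are escaped.
fn sketch_text_escape(s: &str) -> String {
    escape_chars(s, &['&', '<', '>'])
}

/// Attribute values: only &, ', and " are escaped.
fn sketch_attribute_escape(s: &str) -> String {
    escape_chars(s, &['&', '\'', '"'])
}

/// All five special characters are escaped.
fn sketch_escape(s: &str) -> String {
    escape_chars(s, &['&', '<', '>', '\'', '"'])
}

/// Reverses the five named entities. "&amp;" is handled last so that
/// text such as "&amp;lt;" becomes "&lt;" rather than "<".
fn sketch_unescape(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&apos;", "'")
        .replace("&quot;", "\"")
        .replace("&amp;", "&")
}
```

In this sketch, escaping with sketch_escape and then calling sketch_unescape returns the original string, which is the round-trip property the descriptions above imply.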