xml_oxide
A Rust XML parser that parses any well-formed XML, as defined in the W3C specification, in a streaming fashion.
Features
- It uses roughly constant memory, even for large XML files
- It only supports UTF-8 encoding
- It is a non-validating parser
- It does not check well-formedness inside Processing Instructions, the DTD, or the DOCTYPE declaration; these are parsed as raw strings
- It may accept some documents that are not well-formed (please report these as bugs)
- Constructs that can be large (character data, CDATA sections, comments, and whitespace) are parsed in chunks to keep memory usage low
- Parsing can fail if an element tag or DOCTYPE declaration is larger than the buffer size (currently 8 KB by default)
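The memory behavior described above (a fixed read buffer, text streamed in chunks, failure on an oversized tag) can be illustrated with a simplified, std-only sketch. This is not xml_oxide's implementation: the `scan` function, the `Stats` type, and the `MAX_TAG_LEN` constant are hypothetical, and the scanner only tracks `<...>` boundaries, ignoring `>` inside attribute values, comments, and CDATA sections.

```rust
use std::io::{self, Read};

/// Hypothetical tag-size limit, mirroring the documented 8 KB default buffer.
const MAX_TAG_LEN: usize = 8 * 1024;

/// Summary produced by the sketch scanner (hypothetical type, not part of xml_oxide).
#[derive(Debug, Default, PartialEq)]
struct Stats {
    tags: usize,       // number of complete `<...>` runs seen
    text_bytes: usize, // bytes outside markup, counted but never stored
}

/// Streams `input` through a fixed 8 KB read buffer. Text between tags is only
/// counted, so memory does not grow with document size. Bytes inside a tag are
/// accumulated (up to `max_tag_len`) because a tag must be seen whole; a longer
/// tag is reported as an error. An unterminated tag at end of input is dropped.
fn scan<R: Read>(mut input: R, max_tag_len: usize) -> io::Result<Stats> {
    let mut buf = [0u8; 8192];
    let mut tag: Vec<u8> = Vec::new();
    let mut in_tag = false;
    let mut stats = Stats::default();

    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        for &b in &buf[..n] {
            if in_tag {
                if b == b'>' {
                    // A real parser would parse `tag` here; the sketch just counts it.
                    stats.tags += 1;
                    tag.clear();
                    in_tag = false;
                } else if tag.len() == max_tag_len {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "tag larger than buffer",
                    ));
                } else {
                    tag.push(b);
                }
            } else if b == b'<' {
                in_tag = true;
            } else {
                stats.text_bytes += 1;
            }
        }
    }
    Ok(stats)
}

fn main() {
    let doc = "<root><a>hello</a><b/></root>";
    let stats = scan(doc.as_bytes(), MAX_TAG_LEN).unwrap();
    println!("{:?}", stats); // prints: Stats { tags: 5, text_bytes: 5 }
}
```

Because text bytes are only counted as they stream past, memory use stays fixed regardless of input size; only one tag is held at a time, which is why an oversized tag is the failure case.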
Example Usage
In this example, StartElement and EndElement events are counted. More examples can be found in the tests directory.
- `StartElement` also includes empty-element tags; check `is_empty` to tell them apart.
- Entity references such as `&amp;` or `&lt;` come in their own event (not inside `Characters`).
- Character (numeric) references and predefined entity references are resolved. Custom entity definitions are passed through raw.
- See `sax::Event` for all available event types.
```rust
use std::fs::File;
use xml_oxide::{parser::Parser, sax::Event};

fn main() {
    println!("Starting...");

    let mut counter: usize = 0;
    let mut end_counter: usize = 0;
    let now = std::time::Instant::now();

    let f = File::open("./tests/xml_files/books.xml").unwrap();
    let mut p = Parser::start(f);

    loop {
        let res = p.read_event();

        match res {
            Ok(event) => match event {
                Event::StartDocument => {}
                Event::EndDocument => {
                    break;
                }
                Event::StartElement(el) => {
                    // You can differentiate between a start tag and an empty-element tag
                    if !el.is_empty {
                        counter += 1;
                        // print every 10000th element name
                        if counter % 10000 == 0 {
                            println!("%10000 start {}", el.name);
                        }
                    }
                }
                Event::EndElement(el) => {
                    end_counter += 1;
                    if el.name == "feed" {
                        break;
                    }
                }
                Event::Characters(_) => {}
                Event::Reference(_) => {}
                _ => {}
            },
            Err(err) => {
                println!("{}", err);
                break;
            }
        }
    }

    println!("Start event count:{}", counter);
    println!("End event count:{}", end_counter);

    let elapsed = now.elapsed();
    println!("Time elapsed: {:.2?}", elapsed);
}
```
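The reference-handling rule listed above (character and predefined entity references are resolved, custom entities are passed through raw) can be sketched in a few lines. This is a hypothetical std-only helper, not xml_oxide's API: `resolve_reference` takes the text between `&` and `;` and returns `None` for anything it cannot resolve, such as a DTD-defined entity. It does not check the XML `Char` production beyond what `char::from_u32` rejects (surrogates and values above U+10FFFF).

```rust
/// Resolves the body of an XML reference (the text between `&` and `;`).
/// Returns Some(char) for the five predefined entities and for decimal (`#65`)
/// or hexadecimal (`#x41`) character references; returns None otherwise, e.g.
/// for a custom entity, which a caller would pass through raw.
/// Hypothetical helper for illustration only.
fn resolve_reference(name: &str) -> Option<char> {
    match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "apos" => Some('\''),
        "quot" => Some('"'),
        _ => {
            // Character references start with '#'; anything else is a custom entity.
            let digits = name.strip_prefix('#')?;
            let code = if let Some(hex) = digits.strip_prefix('x') {
                // XML only allows a lowercase 'x' prefix for hex references.
                if hex.is_empty() || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                u32::from_str_radix(hex, 16).ok()?
            } else {
                if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
                    return None;
                }
                digits.parse::<u32>().ok()?
            };
            // Rejects surrogates and code points above U+10FFFF.
            char::from_u32(code)
        }
    }
}

fn main() {
    for r in ["amp", "lt", "#65", "#x263A", "copy"] {
        match resolve_reference(r) {
            Some(c) => println!("&{}; -> {:?}", r, c),
            None => println!("&{}; -> passed through raw", r),
        }
    }
}
```

Returning `Option` keeps the "pass through raw" decision with the caller, which matches the behavior described for custom entity definitions.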