Crate xml_oxide
xml_oxide
A Rust XML parser that parses any well-formed XML, as defined in the W3C specification, in a streaming way.
If you want to use the xml_sax interface to implement another parser, we can discuss how to improve the interface. Currently it is integrated into this crate.
To Do
- The Namespaces spec constrains the use of “:” in names. Provide a namespace-aware=false option so that otherwise valid XML 1.0 documents that break these constraints can still be parsed.
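To illustrate the constraint behind this item: under Namespaces in XML 1.0 a name may contain at most one “:”, with a non-empty prefix and local part, while plain XML 1.0 allows names such as a:b:c. The following is a minimal sketch of that colon rule only. The function name is hypothetical, it is not part of xml_oxide, and it does not validate name characters.

```rust
/// Hypothetical helper, not part of xml_oxide: checks only the colon rule
/// that Namespaces in XML 1.0 adds on top of plain XML 1.0 names.
fn is_namespace_conformant(name: &str) -> bool {
    let mut parts = name.split(':');
    let first = parts.next().unwrap_or("");
    match (parts.next(), parts.next()) {
        // No colon: an unprefixed name.
        (None, _) => !first.is_empty(),
        // Exactly one colon: prefix and local part must both be non-empty.
        (Some(local), None) => !first.is_empty() && !local.is_empty(),
        // Two or more colons are not allowed in a namespace-aware name.
        (Some(_), Some(_)) => false,
    }
}

fn main() {
    for name in ["book", "dc:title", "a:b:c", ":item", "item:"] {
        println!("{:>8} -> {}", name, is_namespace_conformant(name));
    }
}
```

A namespace-aware=false mode, as proposed above, would presumably skip a check of this kind and treat “:” as an ordinary name character.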
Features
- It uses roughly constant memory, even for large XML files
- Supports Namespaces in XML 1.0
- It only supports UTF-8 encoding
- It is a non-validating parser, but it performs important well-formedness checks
- Currently, it does not check well-formedness inside Processing Instructions, DTD, and DOCTYPE; it parses them as raw strings
- It may accept some documents that are not well-formed (please report these as bugs)
- Entities that can be large are parsed as chunks to keep memory usage low: Character Data, CDATA Section, Comment, Whitespace
- The read chunk size currently defaults to 8 KB and is not configurable. If an element tag or DOCTYPE declaration is larger than the buffer, the parser can temporarily allocate more memory.
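The memory behavior described in the list above can be pictured with a small standard-library sketch. It is not the crate's implementation: CHUNK_SIZE and count_markup_starts are hypothetical names, and the 8 KB value is only assumed to mirror the default mentioned above. It reads from any reader through one fixed buffer, so memory use does not grow with input size.

```rust
use std::io::{Cursor, Read};

// Assumed to mirror the 8 KB default mentioned above; the crate's real
// constant and internals are not shown here.
const CHUNK_SIZE: usize = 8 * 1024;

/// Counts '<' bytes in a reader while holding only one chunk in memory.
/// '<' is ASCII, so scanning raw bytes is safe for UTF-8 input.
fn count_markup_starts<R: Read>(mut reader: R) -> std::io::Result<usize> {
    let mut buf = [0u8; CHUNK_SIZE];
    let mut count = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        count += buf[..n].iter().filter(|&&b| b == b'<').count();
    }
    Ok(count)
}

fn main() -> std::io::Result<()> {
    // An in-memory document of about 220,000 bytes stands in for a large file.
    let doc = "<a>text</a>".repeat(20_000);
    let n = count_markup_starts(Cursor::new(doc.into_bytes()))?;
    println!("'<' bytes seen: {}", n);
    Ok(())
}
```

A real parser also has to handle tokens that straddle a chunk boundary, which is why an oversized tag or DOCTYPE declaration can force a temporary larger allocation.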
Example Usage
In this example, StartElement and EndElement events are counted. Note that you can find more examples under the tests directory.
- StartElement also includes empty tags. Check is_empty to tell them apart.
- Reference entities like &amp; or &lt; come in their own event (not in Characters).
- Character/numerical and predefined entity references are resolved. Custom entity definitions are passed through raw.
- Check sax::Event to see all available event types
use std::fs::File;

use xml_oxide::{sax::parser::Parser, sax::Event};

fn main() {
    println!("Starting...");

    let mut counter: usize = 0;
    let mut end_counter: usize = 0;

    let now = std::time::Instant::now();

    let f = File::open("./tests/xml_files/books.xml").unwrap();

    let mut p = Parser::from_reader(f);

    loop {
        let res = p.read_event();

        match res {
            Ok(event) => match event {
                Event::StartDocument => {}
                Event::EndDocument => {
                    break;
                }
                Event::StartElement(el) => {
                    // You can differentiate between a start tag and an empty-element tag
                    if !el.is_empty {
                        counter += 1;
                        // print every 10000th element name
                        if counter % 10000 == 0 {
                            println!("%10000 start {}", el.name);
                        }
                    }
                }
                Event::EndElement(el) => {
                    end_counter += 1;
                    // stop early once the closing "feed" tag is seen
                    if el.name == "feed" {
                        break;
                    }
                }
                Event::Characters(_) => {}
                Event::Reference(_) => {}
                _ => {}
            },
            Err(err) => {
                println!("{}", err);
                break;
            }
        }
    }

    println!("Start event count:{}", counter);
    println!("End event count:{}", end_counter);

    let elapsed = now.elapsed();
    println!("Time elapsed: {:.2?}", elapsed);
}
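
The notes before the example say that character and predefined entity references are resolved, while custom entities are passed through raw. As a rough illustration of what that resolution involves, here is a standalone sketch. resolve_reference is a hypothetical helper, not the crate's API, and it does not check that the resulting character is allowed by the XML Char production.

```rust
/// Hypothetical helper, not the crate's API: resolves one reference such as
/// "&amp;" or "&#x3C;" to a character. Returns None for anything else, for
/// example a custom entity, which a caller would keep as raw text.
fn resolve_reference(raw: &str) -> Option<char> {
    let body = raw.strip_prefix('&')?.strip_suffix(';')?;

    if let Some(hex) = body.strip_prefix("#x") {
        // Hexadecimal character reference, e.g. &#x3C;
        if hex.is_empty() || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        return u32::from_str_radix(hex, 16).ok().and_then(char::from_u32);
    }
    if let Some(dec) = body.strip_prefix('#') {
        // Decimal character reference, e.g. &#60;
        if dec.is_empty() || !dec.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        return dec.parse::<u32>().ok().and_then(char::from_u32);
    }

    // The five entities predefined by XML 1.0.
    match body {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "apos" => Some('\''),
        "quot" => Some('"'),
        _ => None,
    }
}

fn main() {
    for raw in ["&amp;", "&lt;", "&#60;", "&#x3C;", "&custom;"] {
        match resolve_reference(raw) {
            Some(c) => println!("{} resolves to {:?}", raw, c),
            None => println!("{} is left as raw text", raw),
        }
    }
}
```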