xml_oxide 
A Rust XML parser that parses any well-formed XML, as defined in the W3C specification, in a streaming fashion.
Features
- It uses near-constant memory, even for large XML files
- It only supports UTF-8 encoding
- It is a non-validating parser
- It does not check well-formedness inside Processing Instructions (DTD) or DOCTYPE; it parses them as raw strings
- It may accept documents that are not well-formed (please report such cases as bugs)
- Constructs that can be large are parsed in chunks to keep memory usage low: Character Data, CDATA Section, Comment, Whitespace
- If an element tag or DOCTYPE declaration is larger than the buffer size (currently 8 KB by default), parsing can fail
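Because large character data is delivered in chunks, a consumer that needs the full text of a small element has to join the chunks itself. The following is a minimal sketch of that idea, not xml_oxide's API: `ChunkEvent` and `collect_text` are hypothetical stand-ins defined here so the example stays self-contained.

```rust
/// Stand-in event type for illustration only; this is NOT xml_oxide's `Event`.
#[derive(Debug)]
enum ChunkEvent {
    StartElement(String),
    Characters(String),
    EndElement(String),
}

/// Joins the character chunks seen between the most recent start tag and
/// each end tag, returning one `(element name, text)` pair per end tag.
fn collect_text(events: &[ChunkEvent]) -> Vec<(String, String)> {
    let mut results = Vec::new();
    let mut buffer = String::new();
    for event in events {
        match event {
            // A new element begins: discard any text buffered so far.
            ChunkEvent::StartElement(_) => buffer.clear(),
            // Text may arrive in several pieces; append each piece.
            ChunkEvent::Characters(chunk) => buffer.push_str(chunk),
            // The element ends: hand over the joined text and reset the buffer.
            ChunkEvent::EndElement(name) => {
                results.push((name.clone(), std::mem::take(&mut buffer)));
            }
        }
    }
    results
}
```

Joining chunks gives up the memory advantage of chunked parsing, so this approach only makes sense for fields that are known to be small.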
Example Usage
In this example, "start element" and "end element" events are counted. More examples can be found under the tests directory.
use std::fs::File;
use std::time::Instant;

use xml_oxide::{parser::OxideParser, sax::Event};

fn main() {
    println!("Hello, world!");

    // Number of start-element and end-element events seen so far.
    let mut counter: usize = 0;
    let mut end_counter: usize = 0;

    let now = Instant::now();

    let f = File::open(
        "C:/Users/fatih/Downloads/enwiki-20211120-abstract1.xml/enwiki-20211120-abstract1.xml",
    )
    .unwrap();

    let mut p = OxideParser::start(f);

    // Pull events one at a time until the root element closes or an error occurs.
    loop {
        let res = p.read_event();

        match res {
            Ok(event) => match event {
                Event::StartDocument => {}
                Event::EndDocument => {}
                Event::StartElement(el) => {
                    counter += 1;
                    // Print progress every 10,000 start elements.
                    if counter % 10000 == 0 {
                        println!("%10000 start {}", el.name);
                    }
                }
                Event::EndElement(el) => {
                    end_counter += 1;
                    // "feed" is the root element of this file; stop once it closes.
                    if el.name == "feed" {
                        break;
                    }
                }
                Event::Characters(_) => {}
                Event::Reference(_) => {}
                _ => {}
            },
            Err(err) => {
                println!("{}", err);
                break;
            }
        }
    }

    println!("START EVENT COUNT:{}", counter);
    println!("END EVENT COUNT:{}", end_counter);

    let elapsed = now.elapsed();
    println!("Elapsed: {:.2?}", elapsed);
}
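The counting logic above can be tried without a large input file by running it over an in-memory list of events. The sketch below assumes a simplified stand-in enum (`SaxEvent`) rather than xml_oxide's own `Event` type, and the `Counts` struct and `count_events` function are hypothetical names introduced for illustration. It also tracks nesting depth, which the original example does not do.

```rust
/// Stand-in for a SAX-style event; this is NOT xml_oxide's `Event` type.
#[derive(Debug)]
enum SaxEvent {
    Start(String),
    End(String),
    Text(String),
}

/// Totals gathered from one pass over the events.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    starts: usize,
    ends: usize,
    max_depth: usize,
}

/// Counts start and end events and records the deepest nesting level reached.
fn count_events(events: &[SaxEvent]) -> Counts {
    let mut counts = Counts::default();
    let mut depth: usize = 0;
    for event in events {
        match event {
            SaxEvent::Start(_) => {
                counts.starts += 1;
                depth += 1;
                counts.max_depth = counts.max_depth.max(depth);
            }
            SaxEvent::End(_) => {
                counts.ends += 1;
                // Saturate so an unmatched end tag cannot underflow the depth.
                depth = depth.saturating_sub(1);
            }
            SaxEvent::Text(_) => {}
        }
    }
    counts
}
```

In a stream where every start event has a matching end event, `starts` and `ends` come out equal, which makes a quick sanity check on the two counters.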