Crate xml_oxide

xml_oxide

A Rust XML parser that parses, in a streaming way, any well-formed XML as defined in the W3C specification.

Features

  • It uses roughly constant memory, even for large XML files
  • It only supports UTF-8 encoding
  • It is a non-validating parser
  • It does not check well-formedness inside Processing Instructions (DTD) and DOCTYPE declarations; it returns them as raw strings
  • It may accept documents that are not well-formed (please report such cases as bugs)
  • Constructs that can be large are parsed in chunks to keep memory usage low: character data, CDATA sections, comments, and whitespace
  • Parsing can fail if an element tag or DOCTYPE declaration is larger than the buffer size (currently 8 KB by default)
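The constant-memory and buffer-size points above can be illustrated with a small, stdlib-only sketch. It is not xml_oxide's code: `count_markup`, `BUF_SIZE`, and `max_tag_len` are hypothetical names, the 8 KB buffer simply mirrors the default mentioned above, and the `<`/`>` scan is a toy that ignores comments, CDATA, and `>` inside attribute values. Input is read through one fixed-size buffer, so memory stays bounded however large the input is; the explicit `max_tag_len` limit stands in for a parser that needs a whole tag in its buffer at once.

```rust
use std::io::Read;

// Hypothetical buffer size, mirroring the 8 KB default described above.
const BUF_SIZE: usize = 8 * 1024;

/// Toy scanner (not xml_oxide's implementation): counts `<...>` markup spans
/// while holding at most BUF_SIZE bytes of input in memory at any time.
/// Returns an error if a single span grows longer than `max_tag_len` bytes.
fn count_markup<R: Read>(mut reader: R, max_tag_len: usize) -> Result<usize, String> {
    let mut buf = [0u8; BUF_SIZE];
    let mut count = 0;
    let mut in_tag = false;
    let mut tag_len = 0usize;

    loop {
        let n = reader.read(&mut buf).map_err(|e| e.to_string())?;
        if n == 0 {
            break; // end of input
        }
        // State (in_tag, tag_len) carries over between chunks, so a tag may
        // straddle a buffer boundary without being copied anywhere.
        for &b in &buf[..n] {
            if in_tag {
                tag_len += 1;
                if b == b'>' {
                    in_tag = false;
                    count += 1;
                } else if tag_len > max_tag_len {
                    return Err(format!("tag longer than {} bytes", max_tag_len));
                }
            } else if b == b'<' {
                in_tag = true;
                tag_len = 1;
            }
        }
    }
    Ok(count)
}

fn main() {
    let xml = b"<root><a/><b>text</b></root>";
    println!("{:?}", count_markup(&xml[..], BUF_SIZE)); // prints Ok(5)
    println!("{:?}", count_markup(&b"<aaaaaaaaaa>"[..], 4)); // prints Err("tag longer than 4 bytes")
}
```

Because only the small scan state survives between reads, a multi-gigabyte file costs the same memory as a tiny one; the trade-off is that anything that must be held whole (here, a tag under a length cap) becomes a hard limit.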

Example Usage

This example counts StartElement and EndElement events. More examples can be found in the tests directory.

  • StartElement also covers empty-element tags; use is_empty to tell them apart.
  • Entity references such as &amp; or &lt; arrive in their own Reference event (not in Characters).
  • Character (numeric) references and predefined entity references are resolved. Custom entity definitions are passed through raw.
  • See sax::Event for all available event types.

use std::fs::File;
use xml_oxide::{parser::Parser, sax::Event};


fn main() {
    println!("Starting...");

    let mut counter: usize = 0;
    let mut end_counter: usize = 0;

    let now = std::time::Instant::now();

    let f = File::open("./tests/xml_files/books.xml").unwrap();

    let mut p = Parser::start(f);

    loop {
        let res = p.read_event();

        match res {
            Ok(event) => match event {
                Event::StartDocument => {}
                Event::EndDocument => {
                    break;
                }
                Event::StartElement(el) => {
                    // You can differentiate between a start tag and an empty-element tag
                    if !el.is_empty {
                        counter += 1;
                        // print every 10000th element name
                        if counter % 10000 == 0 {
                            println!("%10000 start {}", el.name);
                        }
                    }
                }
                Event::EndElement(el) => {
                    end_counter += 1;
                    // Optional early exit: stop at the closing tag of a `feed` element
                    if el.name == "feed" {
                        break;
                    }
                }
                Event::Characters(_) => {}
                Event::Reference(_) => {}
                _ => {}
            },
            Err(err) => {
                println!("{}", err);
                break;
            }
        }
    }

    println!("Start event count:{}", counter);
    println!("End event count:{}", end_counter);

    let elapsed = now.elapsed();
    println!("Time elapsed: {:.2?}", elapsed);
}
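The notes before the example say that entity references arrive as separate Reference events, with character and predefined references resolved and custom entities passed through raw. Rebuilding an element's full text therefore means stitching several events together. The sketch below uses a hypothetical `TextEvent` enum rather than xml_oxide's real `sax::Event`, and assumes a reference event carries the raw reference text such as `&amp;`; if the library hands over already-resolved text, `collect_text` reduces to plain concatenation. `resolve_reference` and `collect_text` are illustrative names, not library functions.

```rust
/// Hypothetical, simplified stand-in for the text-related events; xml_oxide's
/// real `sax::Event` has more variants and may carry different payloads.
enum TextEvent<'a> {
    Characters(&'a str),
    Reference(&'a str), // assumption: raw reference text, e.g. "&amp;"
}

/// Resolves predefined entity references and numeric character references.
/// Anything else (a custom entity) is returned unchanged, i.e. passed through raw.
fn resolve_reference(raw: &str) -> String {
    let name = raw.trim_start_matches('&').trim_end_matches(';');
    let resolved = match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "quot" => Some('"'),
        "apos" => Some('\''),
        _ => {
            if let Some(hex) = name.strip_prefix("#x") {
                // Hexadecimal character reference, e.g. &#x3C;
                u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
            } else if let Some(dec) = name.strip_prefix('#') {
                // Decimal character reference, e.g. &#60;
                dec.parse::<u32>().ok().and_then(char::from_u32)
            } else {
                None
            }
        }
    };
    match resolved {
        Some(c) => c.to_string(),
        None => raw.to_string(),
    }
}

/// Concatenates text content that arrived split across several events.
fn collect_text(events: &[TextEvent]) -> String {
    let mut out = String::new();
    for ev in events {
        match ev {
            TextEvent::Characters(s) => out.push_str(s),
            TextEvent::Reference(r) => out.push_str(&resolve_reference(r)),
        }
    }
    out
}

fn main() {
    let events = [
        TextEvent::Characters("Tom "),
        TextEvent::Reference("&amp;"),
        TextEvent::Characters(" Jerry "),
        TextEvent::Reference("&#x3C;"),
        TextEvent::Characters("3 "),
        TextEvent::Reference("&custom;"),
    ];
    println!("{}", collect_text(&events)); // prints: Tom & Jerry <3 &custom;
}
```

Keeping references out of Characters lets a streaming consumer decide how to treat custom entities (expand them from its own table, keep them verbatim, or reject them) without the parser buffering a whole text node.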

Modules