1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
/*!

*xmlparser* is a low-level, pull-based, zero-allocation
[XML 1.0](https://www.w3.org/TR/xml/) parser.

## Example

```rust
for token in xmlparser::Tokenizer::from("<tagname name='value'/>") {
    println!("{:?}", token);
}
```

## Why a new library

The main idea of this library is to provide a fast, low-level and complete XML parser.

Unlike other XML parsers, this one can return tokens not with `&str`/`&[u8]` data, but
with `StrSpan` objects, which contain a position of the data in the original document.
Which can be very useful if you want to post-process tokens even more and want to return
errors with a meaningful position.

So, this is basically an XML parser framework that can be used to write parsers for XML-based formats,
like SVG and to construct a DOM.

At the time of writing the only option was `quick-xml` (v0.10), which does not support DTD and
token positions.

## Benefits

- Tokens contain `StrSpan` objects, that contains a position of the data in the original document.
- Supports basic text escaping with `xml:space` (should be invoked manually).
  A properer text escaping is very hard without a DOM construction.
- Good error processing. All error types contain position (line:column) where it occurred.
- No heap allocations.

## Limitations

- Currently, only ENTITY objects are parsed from the DOCTYPE. Other ignored.
- No tree structure validation. So an XML like `<root><child></root></child>`
  will be parsed without errors. You should check for this manually.
- Duplicated attributes is not an error. So an XML like `<item a="v1" a="v2"/>`
  will be parsed without errors. You should check for this manually.
- UTF-8 only.

## Safety

- The library should not panic. Any panic considered as a critical bug
  and should be reported.
- The library forbids unsafe code.
*/

#![doc(html_root_url = "https://docs.rs/xmlparser/0.4.0")]

#![forbid(unsafe_code)]
#![warn(missing_docs)]

#[macro_use] extern crate log;
#[macro_use] extern crate failure;


mod error;
mod stream;
mod strspan;
mod text;
mod token;
mod xml;
mod xmlchar;


pub use error::{
    Error,
    ErrorPos,
    StreamError,
};
pub use stream::{
    Reference,
    Stream,
};
pub use text::{
    TextUnescape,
    XmlSpace,
};
pub use strspan::{
    StrSpan,
};
pub use token::{
    ElementEnd,
    EntityDefinition,
    ExternalId,
    Token,
};
pub use xml::{
    Tokenizer,
    TokenType,
};
pub use xmlchar::{
    XmlByteExt,
    XmlCharExt,
};