Crate xmltok

xmltok is a low-level, pull-based, zero-allocation XML 1.0 tokenizer.

It is a fork of xmlparser with compact, lifetime-free tokens: 20 bytes per Token instead of 112.

§Example

for token in xmltok::Tokenizer::from("<tagname name='value'/>") {
    println!("{:?}", token);
}

§Why a new library?

This library is a low-level XML tokenizer that preserves the positions of its tokens. It is not intended to be used directly; if you are looking for a higher-level solution, check out roxmltree.

§Benefits

  • All tokens contain StrSpan structs which represent the position of the substring in the original document.
  • Good error reporting. Every error type contains the position (line:column) where the error occurred.
  • No heap allocations.
  • No dependencies.
  • Tiny. ~1400 LOC and ~30KiB in the release build according to cargo-bloat.
  • Supports no_std builds. To use without the standard library, disable the default features.
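
To illustrate the first two points, here is a minimal sketch of an offsets-only span and of mapping a byte offset to a line:column position. `OffsetSpan`, its field widths, and `line_col` are hypothetical names invented for this example; they are not the crate's API, and the crate's real layout may differ.

```rust
use std::convert::TryFrom;

/// Hypothetical offsets-only span: a start offset plus a length.
/// The field widths are an assumption made for this illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OffsetSpan {
    start: u32,
    len: u16,
}

impl OffsetSpan {
    /// Builds a span from a byte range, or returns `None` when the
    /// range is inverted or does not fit into the compact fields.
    pub fn new(start: usize, end: usize) -> Option<Self> {
        let len = end.checked_sub(start)?;
        Some(OffsetSpan {
            start: u32::try_from(start).ok()?,
            len: u16::try_from(len).ok()?,
        })
    }

    /// Resolves the span against the document it was created from.
    /// Returns `None` if the range is out of bounds or splits a character.
    pub fn resolve<'a>(&self, doc: &'a str) -> Option<&'a str> {
        let start = self.start as usize;
        let end = start.checked_add(self.len as usize)?;
        doc.get(start..end)
    }
}

/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns are counted in characters, not bytes.
pub fn line_col(doc: &str, offset: usize) -> (u32, u32) {
    // Clamp to the document and step back to a character boundary.
    let mut end = offset.min(doc.len());
    while !doc.is_char_boundary(end) {
        end -= 1;
    }
    let before = &doc[..end];
    let line = before.bytes().filter(|&b| b == b'\n').count() as u32 + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count() as u32 + 1;
    (line, col)
}
```

Because such a span stores no reference to the text, it needs no lifetime parameter; the cost is that the caller must pass the original document back in to read the substring.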

§Limitations

  • Currently, only ENTITY objects are parsed from the DOCTYPE. All others are ignored.
  • No tree structure validation. A document such as <root><child></root></child>, or a string without a root element, is parsed without errors. You should check for this manually. On the other hand, <a/><a/> will lead to an error.
  • Duplicate attributes are not an error. XML such as <item a="v1" a="v2"/> is parsed without errors. You should check for this manually.
  • UTF-8 only.
  • Markup tokens (declaration, processing instruction, DOCTYPE, ENTITY, element start/end, attribute) are limited to 65535 bytes each; text, CDATA and comment tokens to 4 GiB each. Documents are limited to 4 GiB. Anything longer produces a parsing error. This is the cost of the compact Token representation.
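
The manual checks mentioned above (element nesting, missing root, duplicate attributes) could be sketched roughly as follows. The `Event` enum is a hypothetical, simplified stand-in for the crate's tokens, which carry spans rather than names; `StructureError` and `check_structure` are likewise invented for this example. In practice you would first map each token to its name by resolving its span against the source text.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified events used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event<'a> {
    Open(&'a str),      // `<name`
    Attribute(&'a str), // `name="value"` inside a start tag
    EndOfStartTag,      // `>`
    Empty,              // `/>`
    Close(&'a str),     // `</name>`
}

#[derive(Debug, PartialEq, Eq)]
pub enum StructureError {
    DuplicateAttribute(String),
    MismatchedClose { expected: Option<String>, found: String },
    UnclosedElement(String),
    NoRootElement,
}

/// Checks nesting with a stack of open element names and detects
/// duplicate attributes with a per-element set of attribute names.
pub fn check_structure(events: &[Event<'_>]) -> Result<(), StructureError> {
    let mut open: Vec<&str> = Vec::new();
    let mut attrs: HashSet<&str> = HashSet::new();
    let mut seen_root = false;

    for event in events {
        match *event {
            Event::Open(name) => {
                open.push(name);
                attrs.clear(); // attribute names are scoped to one start tag
                seen_root = true;
            }
            Event::Attribute(name) => {
                if !attrs.insert(name) {
                    return Err(StructureError::DuplicateAttribute(name.to_string()));
                }
            }
            Event::EndOfStartTag => {}
            Event::Empty => {
                // `/>` closes the element opened most recently.
                open.pop();
            }
            Event::Close(name) => {
                let expected = open.pop();
                if expected != Some(name) {
                    return Err(StructureError::MismatchedClose {
                        expected: expected.map(String::from),
                        found: name.to_string(),
                    });
                }
            }
        }
    }

    if let Some(name) = open.pop() {
        return Err(StructureError::UnclosedElement(name.to_string()));
    }
    if !seen_root {
        return Err(StructureError::NoRootElement);
    }
    Ok(())
}
```

For the events corresponding to <root><child></root></child>, this sketch reports a mismatched close at </root>, since the innermost open element at that point is child.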

§Safety

  • The library must not panic. Any panic is considered a critical bug and should be reported.
  • The library forbids unsafe code.

Structs§

DetachedStrSpan
A string slice, holding offsets only.
SmallDetachedStrSpan
A string slice, holding offsets only.
StrSpan
A string slice.
Stream
A streaming XML parsing interface.
TextPos
Position in text.
Tokenizer
Tokenizer for the XML structure.

Enums§

ElementEnd
ElementEnd token.
EntityDefinition
Representation of the EntityDef value.
Error
An XML parser error.
ExternalId
Representation of the ExternalID value.
Reference
Representation of the Reference value.
StreamError
A stream parser error.
Token
An XML token.

Traits§

XmlByteExt
Extension methods for XML-subset only operations.
XmlCharExt
Extension methods for XML-subset only operations.