Expand description
§regexml
To start off, don’t worry. We’re not using regexes to parse XML in this crate. We’re not in fact dealing with XML directly at all.
If you need a Regex engine in Rust, this crate isn’t likely for you; use the regex crate instead. This crate instead implements a Regex engine compliant with varous XML-related standards, and focuses on standard-compliance rather than performance.
regexml implements a regular expression engine that’s compliant with regular
expressions as defined in appendix G of the XML Schema 1.1 standard, part 2:
https://www.w3.org/TR/xmlschema11-2/#regexs
This is the regex language that XML Schema uses so the user can define patterns as additional constraints on string data in an XML document:
https://www.w3.org/TR/xmlschema11-2/#dc-pattern
The XPath and XQuery Functions and Operators 3.1 specification defines an extension of these regular expressions for the purposes of use within the XPath and XQuery standard function library:
https://www.w3.org/TR/xpath-functions-31/#regex-syntax
regexml also implements this extension.
§Origins
The Rust source code is based on the Java implementation in Saxon HE
net.sf.saxon.regex, which implements a spec-compliant Regex engine. In turn
this code is based on an engine implemented by Apache Jakarta:
https://blog.saxonica.com/mike/2012/01/a-new-regex-engine.html
The Java code has been translated by hand into Rust. There are some differences:
-
Operationis an enum, instead of implemented using subclasses and dynamic dispatch as in the Java version. A traitOperationControlprovides dispatch to the enums. -
The
icu4xproject’sicu_crates are used to provide various unicode features, including the implementation of character classes and casing rules. EspeciallyCodePointInversionListand its associated builder proved very useful. Due to the way the regex compiler is organizedCharacterClassdoes provide a special case for character class that matches with a single character. -
The original code had no internal tests, but a lot of integration tests were provided through the qt3tests project for testing XPath and XQuery. Most of those tests have been ported into simple Rust tests, which makes this package easier to maintain and debug.
Now that the port is complete we expect this package to evolve separately wherever it may go - no 1 to 1 mapping with the original Java code is going to be maintained.
Structs§
- Regex
- A XML-style regular expression.
Enums§
- Analyze
Entry - The
Regex::analyzemethod returns an iterator over the results of analyzing a string with a regular expression. It contains these entries. - Error
- Regular expression error
- Match
Entry - A match can be a simple match, or a group match.