Function remove_invalid_xml_chars 

pub fn remove_invalid_xml_chars(input: &str) -> String

Remove characters that are invalid in XML 1.0 documents.

Per the XML 1.0 specification (Section 2.2), the only valid characters are:

  • #x9 (tab), #xA (line feed), #xD (carriage return)
  • #x20-#xD7FF
  • #xE000-#xFFFD
  • #x10000-#x10FFFF

All other characters (control characters \x00-\x08, \x0B-\x0C, \x0E-\x1F, surrogates \xD800-\xDFFF, and \xFFFE-\xFFFF) are stripped from the output.

This mirrors the character-level cleaning portion of the PHP Strings::normalize() function in sped-common.
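The filtering rule described above can be sketched as a predicate over `char` plus a filter. This is a minimal illustration of the XML 1.0 `Char` ranges, not the crate's actual source: the names `is_valid_xml_char` and `strip_invalid_xml_chars` are hypothetical and chosen for this example only.

```rust
/// Hypothetical helper: returns true if `c` falls in one of the
/// XML 1.0 (Section 2.2) valid character ranges listed above.
fn is_valid_xml_char(c: char) -> bool {
    matches!(
        c,
        '\u{9}' | '\u{A}' | '\u{D}'
            | '\u{20}'..='\u{D7FF}'
            | '\u{E000}'..='\u{FFFD}'
            | '\u{10000}'..='\u{10FFFF}'
    )
}

/// Hypothetical stand-in for the documented function: keeps only the
/// characters accepted by `is_valid_xml_char`.
fn strip_invalid_xml_chars(input: &str) -> String {
    // A Rust `&str` is always valid UTF-8, so surrogate code points
    // (#xD800-#xDFFF) can never appear as a `char` here; in practice the
    // filter only removes C0 control characters and #xFFFE / #xFFFF.
    input.chars().filter(|&c| is_valid_xml_char(c)).collect()
}
```

Under these assumptions, `strip_invalid_xml_chars("a\u{FFFE}b")` yields `"ab"`, while tabs, line feeds, carriage returns, and characters outside the Basic Multilingual Plane pass through unchanged.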

§Examples

use fiscal_core::xml_utils::remove_invalid_xml_chars;
assert_eq!(remove_invalid_xml_chars("hello\x00world"), "helloworld");
assert_eq!(remove_invalid_xml_chars("tab\there"), "tab\there");
assert_eq!(remove_invalid_xml_chars("line\nfeed"), "line\nfeed");