Function clean_html 

pub fn clean_html(html: &str) -> String

Strips Microsoft / HWP preprocessing artifacts that would otherwise confuse the HTML parser. Safe to call on arbitrary input, including multi-byte UTF-8.

Currently removes:

  • <!--StartFragment--> and <!--EndFragment--> (MS clipboard markers)
  • <o:p>…</o:p> and bare <o:p> / </o:p> (Microsoft Office o: namespace paragraph markers)
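
The removals listed above could be sketched roughly as follows. This is an illustrative re-implementation, not the crate's actual source: the names clean_html_sketch and strip_o_p_pairs are hypothetical, and the sketch assumes that a paired <o:p>…</o:p> is dropped together with its contents and that matching is case-sensitive.

```rust
/// Hypothetical helper: removes each `<o:p>…</o:p>` pair together with its
/// contents (an assumption), and drops any bare `<o:p>` that has no closing tag.
fn strip_o_p_pairs(input: &str) -> String {
    const OPEN: &str = "<o:p>";
    const CLOSE: &str = "</o:p>";

    let mut out = String::with_capacity(input.len());
    let mut rest = input;

    // `str::find` returns byte offsets at the start of an ASCII pattern,
    // so every slice below falls on a valid UTF-8 character boundary.
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        rest = match after_open.find(CLOSE) {
            // Paired element: skip the contents and the closing tag.
            Some(end) => &after_open[end + CLOSE.len()..],
            // Bare opening tag: drop only the tag itself.
            None => after_open,
        };
    }
    out.push_str(rest);
    out
}

/// Hypothetical stand-in for `clean_html`, covering only the items listed above.
fn clean_html_sketch(html: &str) -> String {
    // Remove the MS clipboard markers first.
    let without_markers = html
        .replace("<!--StartFragment-->", "")
        .replace("<!--EndFragment-->", "");

    // Remove paired and bare opening tags, then any stray closing tags.
    strip_o_p_pairs(&without_markers).replace("</o:p>", "")
}
```

Under these assumptions, clean_html_sketch("<!--StartFragment--><p>한글<o:p>&nbsp;</o:p></p><!--EndFragment-->") returns "<p>한글</p>". Because all patterns are ASCII and slicing only happens at offsets returned by find, multi-byte text passes through unchanged.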