Crate bom_strip

§bom-strip

Strip UTF-8/16/32 BOMs and stray U+FEFF code points from text.

A leading byte order mark (BOM) breaks serde_json::from_str, defeats hash-based deduplication (otherwise identical strings no longer compare or hash equal), and trips config parsers that reject unexpected leading characters. This crate provides four small functions:

  • strip_str — strip a leading U+FEFF from a &str.
  • strip_all — strip every U+FEFF in the input, not just a leading one.
  • strip_bytes — strip a leading UTF-8 / UTF-16 LE/BE / UTF-32 LE/BE BOM from a &[u8].
  • detect_bom — identify which BOM (if any) leads a &[u8].
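
To make the byte-level behavior concrete, here is a minimal std-only sketch of BOM detection and stripping. It is not this crate's implementation: `BomKind`, `sniff_bom`, and `drop_bom` are hypothetical names chosen so they are not mistaken for the real `Bom`, `detect_bom`, and `strip_bytes`, and the order of the checks is an assumption about one reasonable approach.

```rust
/// Hypothetical stand-in for the crate's `Bom` enum; the real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BomKind {
    Utf8,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
}

impl BomKind {
    /// Number of bytes the BOM occupies at the start of the input.
    fn byte_len(self) -> usize {
        match self {
            BomKind::Utf8 => 3,
            BomKind::Utf16Le | BomKind::Utf16Be => 2,
            BomKind::Utf32Le | BomKind::Utf32Be => 4,
        }
    }
}

/// Hypothetical helper: identify a leading BOM, if any.
fn sniff_bom(b: &[u8]) -> Option<BomKind> {
    // The UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE),
    // so the 4-byte patterns are tested before the 2-byte ones.
    if b.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        Some(BomKind::Utf32Le)
    } else if b.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        Some(BomKind::Utf32Be)
    } else if b.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(BomKind::Utf8)
    } else if b.starts_with(&[0xFF, 0xFE]) {
        Some(BomKind::Utf16Le)
    } else if b.starts_with(&[0xFE, 0xFF]) {
        Some(BomKind::Utf16Be)
    } else {
        None
    }
}

/// Hypothetical helper: return the input without its leading BOM,
/// or the input unchanged when no BOM is found.
fn drop_bom(b: &[u8]) -> &[u8] {
    match sniff_bom(b) {
        Some(kind) => &b[kind.byte_len()..],
        None => b,
    }
}
```

With this ordering, `sniff_bom(&[0xFF, 0xFE, 0, 0])` reports UTF-32 LE rather than UTF-16 LE. The byte sequence is inherently ambiguous (it is also a UTF-16 LE BOM followed by U+0000), so a detector has to pick one interpretation; preferring the longer match is the choice made in this sketch.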

§Example

use bom_strip::{strip_str, strip_bytes, detect_bom, Bom};

assert_eq!(strip_str("\u{FEFF}hello"), "hello");
assert_eq!(strip_bytes(&[0xEF, 0xBB, 0xBF, b'h', b'i']), &[b'h', b'i']);
assert_eq!(detect_bom(&[0xFF, 0xFE, b'a', 0]), Some(Bom::Utf16Le));
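
The deduplication problem from the introduction can be reproduced with the standard library alone. The sketch below uses hypothetical helpers (`trim_leading_bom`, `remove_every_bom`, `distinct_raw`, `distinct_after_trim`) that only approximate what `strip_str` and `strip_all` are described as doing; the crate's actual signatures and return types are not assumed here.

```rust
use std::collections::HashSet;

/// Hypothetical helper: drop one leading U+FEFF, if present.
fn trim_leading_bom(s: &str) -> &str {
    s.strip_prefix('\u{FEFF}').unwrap_or(s)
}

/// Hypothetical helper: remove every U+FEFF, allocating a new `String`.
fn remove_every_bom(s: &str) -> String {
    s.replace('\u{FEFF}', "")
}

/// Count distinct lines exactly as given.
fn distinct_raw(lines: &[&str]) -> usize {
    lines.iter().copied().collect::<HashSet<&str>>().len()
}

/// Count distinct lines after trimming a leading U+FEFF from each.
fn distinct_after_trim(lines: &[&str]) -> usize {
    lines
        .iter()
        .map(|&l| trim_leading_bom(l))
        .collect::<HashSet<&str>>()
        .len()
}
```

Given the two lines `"\u{FEFF}hello"` and `"hello"`, `distinct_raw` counts two entries because the strings differ by one invisible code point, while `distinct_after_trim` counts one. Trimming a leading mark can borrow from the input, whereas removing interior marks generally needs a new allocation, which is why the two helpers return different types in this sketch.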

Enums§

Bom
Identified BOM kind.

Functions§

detect_bom
Detect which BOM (if any) leads b.
strip_all
Strip every U+FEFF (BOM or zero-width no-break space) in s.
strip_bytes
Strip a leading BOM from b. Returns the input unchanged if no BOM is present.
strip_str
Strip a leading U+FEFF from s.