An iterator adapter that suppresses the CR (\r) of CRLF (\r\n) sequences in a
stream of bytes.
§Overview
This module provides CrlfSuppressor, an iterator adapter to filter out
CR (\r, 0x0D) when it is immediately followed by LF (\n, 0x0A), as
commonly found in Windows line endings.
It also provides an extension trait CrlfSuppressorExt so you can easily
call .crlf_suppressor() on any iterator over bytes (e.g., from
BufReader::bytes()).
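The behavior described above can be sketched with a peekable iterator. This is only an illustration of the technique, not the crate's actual implementation: the names `SketchCrlfSuppressor` and `suppress` are hypothetical, and the item type `io::Result<u8>` is assumed because that is what `Read::bytes()` yields.

```rust
use std::io::{self, Cursor, Read};
use std::iter::Peekable;

/// Hypothetical stand-in for `CrlfSuppressor`; not the crate's actual code.
pub struct SketchCrlfSuppressor<I: Iterator<Item = io::Result<u8>>> {
    inner: Peekable<I>,
}

impl<I: Iterator<Item = io::Result<u8>>> SketchCrlfSuppressor<I> {
    pub fn new(iter: I) -> Self {
        Self { inner: iter.peekable() }
    }
}

impl<I: Iterator<Item = io::Result<u8>>> Iterator for SketchCrlfSuppressor<I> {
    type Item = io::Result<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.inner.next()? {
            // Drop CR only when the very next byte is LF, then yield that LF.
            Ok(b'\r') if matches!(self.inner.peek(), Some(Ok(b'\n'))) => self.inner.next(),
            // Everything else, including a lone CR and I/O errors, passes through.
            other => Some(other),
        }
    }
}

/// Hypothetical helper: run the sketch over an in-memory byte slice.
pub fn suppress(data: &[u8]) -> io::Result<Vec<u8>> {
    SketchCrlfSuppressor::new(Cursor::new(data).bytes()).collect()
}
```

Peeking one item ahead keeps the adapter streaming: at most one byte is buffered, and a CR that is not directly followed by LF is left untouched.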
§Usage
§Basic example
use std::io::{Cursor, Error, Read};
use tpnote_lib::text_reader::CrlfSuppressorExt;
let data = b"hello\r\nworld";
let normalized: Result<Vec<u8>, Error> = Cursor::new(data)
.bytes()
.crlf_suppressor()
.collect();
let s = String::from_utf8(normalized.unwrap()).unwrap();
assert_eq!(s, "hello\nworld");
§Reading from a file
use std::fs::File;
use tpnote_lib::text_reader::read_as_string_with_crlf_suppression;
let normalized = read_as_string_with_crlf_suppression(File::open("file.txt")?)?;
println!("{}", normalized);
§Implementation details
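In UTF-8, every byte of a multi-byte code point has its high bit set: lead
bytes are in the range 0xC2..=0xF4 and continuation bytes are always in the
range 0x80..=0xBF. Since 0x0D and 0x0A are in neither range, they can only
occur as the ASCII characters CR and LF, so searching for CRLF as byte values
is safe.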
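The claim can be checked by brute force. The following sketch encodes every non-ASCII Unicode scalar value and looks for a given byte in the result; the function name `appears_in_multibyte_sequence` is hypothetical and only serves this illustration.

```rust
/// Returns true if any multi-byte UTF-8 encoding contains `needle`.
/// Hypothetical helper for illustration; not part of the crate.
pub fn appears_in_multibyte_sequence(needle: u8) -> bool {
    // Every scalar value from U+0080 upwards encodes to 2..=4 bytes.
    (0x80u32..=0x10FFFF)
        .filter_map(char::from_u32) // skips the surrogate range
        .any(|c| {
            let mut buf = [0u8; 4];
            c.encode_utf8(&mut buf).as_bytes().contains(&needle)
        })
}
```

Calling it with 0x0D or 0x0A is expected to return `false`, whereas a byte such as 0x80 does occur (for example in the encoding of U+0080).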
§See also
Structs§
- CrlfSuppressor - An iterator adapter that suppresses CR (\r, 0x0D) when followed by LF (\n, 0x0A). In a valid multi-byte UTF-8 sequence, continuation bytes must be in the range 0x80 to 0xBF. As 0x0D and 0x0A are not in this range, we can search for them in a stream of bytes.
Traits§
- CrlfSuppressorExt - Extension trait to add .crlf_suppressor() to any iterator over bytes.
- StringExt - Additional method for String suppressing \r in \r\n sequences: When no \r\n is found, no memory allocation occurs.
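One way to get the "no allocation when no \r\n is found" property is a fast path that hands back the original String. The sketch below assumes a free function taking the String by value; the name `suppress_crlf_in_string` is hypothetical and is not the crate's StringExt method.

```rust
/// Hypothetical helper illustrating the allocation-free fast path;
/// not the crate's `StringExt` method.
pub fn suppress_crlf_in_string(s: String) -> String {
    if !s.contains("\r\n") {
        // Fast path: nothing to suppress, so return the original buffer.
        return s;
    }
    // Slow path: build a new String with every CRLF replaced by LF.
    s.replace("\r\n", "\n")
}
```

Because the input is moved in and out unchanged on the fast path, the returned String still owns the same heap buffer.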
Functions§
- read_as_string_with_crlf_suppression - Reads all bytes from the given reader, suppressing CR (\r) bytes that are immediately followed by LF (\n), and returns the resulting data as a UTF-8 string.
- read_with_crlf_suppression - Reads all bytes from the given reader, suppressing CR (\r) bytes that are immediately followed by LF (\n).