An iterator adapter that suppresses the CR in CRLF (\r\n) sequences in a stream of bytes.
§Overview
This module provides CrlfSuppressor, an iterator adapter that filters out CR (\r, 0x0D) when it is immediately followed by LF (\n, 0x0A), as commonly found in Windows line endings.
It also provides the extension trait CrlfSuppressorExt, so you can call .crlf_suppressor() on any iterator over bytes (e.g., from BufReader::bytes()).
§Usage
§Basic example
use std::io::{Cursor, Error, Read};
use tpnote_lib::text_reader::CrlfSuppressorExt;
let data = b"hello\r\nworld";
let normalized: Result<Vec<u8>, Error> = Cursor::new(data)
    .bytes()
    .crlf_suppressor()
    .collect();
let s = String::from_utf8(normalized.unwrap()).unwrap();
assert_eq!(s, "hello\nworld");
§Reading from a file
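use std::fs::File;
use tpnote_lib::text_reader::read_as_string_with_crlf_suppression;

// `?` needs an enclosing function that returns a `Result`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let normalized = read_as_string_with_crlf_suppression(File::open("file.txt")?)?;
    println!("{}", normalized);
    Ok(())
}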
§Implementation details
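In UTF-8, every byte of a multi-byte code point has its high bit set: lead bytes are in the range 0xC2..=0xF4 and continuation bytes are in the range 0x80..=0xBF. Since 0x0D and 0x0A are below 0x80, they can only occur as the ASCII characters CR and LF, so searching for CRLF as byte values is safe.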
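The byte-level search described above can be sketched with a peekable iterator. This is a minimal illustration and not the crate's actual CrlfSuppressor: the names SuppressCrlf and suppress are hypothetical and exist only for this example.

```rust
use std::io::{self, Cursor, Read};
use std::iter::Peekable;

/// Hypothetical stand-in for an adapter like `CrlfSuppressor`.
/// Illustration only; the real implementation may differ.
struct SuppressCrlf<I: Iterator<Item = io::Result<u8>>> {
    inner: Peekable<I>,
}

impl<I: Iterator<Item = io::Result<u8>>> SuppressCrlf<I> {
    fn new(iter: I) -> Self {
        Self { inner: iter.peekable() }
    }
}

impl<I: Iterator<Item = io::Result<u8>>> Iterator for SuppressCrlf<I> {
    type Item = io::Result<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.inner.next()? {
            Ok(b'\r') => {
                if matches!(self.inner.peek(), Some(Ok(b'\n'))) {
                    // Drop the CR and yield the LF that follows it.
                    self.inner.next()
                } else {
                    // A lone CR is passed through unchanged.
                    Some(Ok(b'\r'))
                }
            }
            // Every other byte, and every I/O error, is forwarded as is.
            other => Some(other),
        }
    }
}

/// Hypothetical helper: runs the adapter over an in-memory byte slice.
fn suppress(input: &[u8]) -> Vec<u8> {
    SuppressCrlf::new(Cursor::new(input).bytes())
        .collect::<io::Result<Vec<u8>>>()
        .expect("reading from memory does not fail")
}

fn main() {
    assert_eq!(suppress(b"hello\r\nworld"), b"hello\nworld");
    // A CR that is not followed by LF survives.
    assert_eq!(suppress(b"a\rb"), b"a\rb");
    // Multi-byte UTF-8 input is unaffected, since all of its bytes are >= 0x80.
    assert_eq!(suppress("é\r\n".as_bytes()), "é\n".as_bytes());
}
```

Looking ahead by one byte is enough, because the decision to drop a CR depends only on the byte that immediately follows it.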
§See also
Structs§
- CrlfSuppressor - An iterator adapter that suppresses CR (\r, 0x0D) when followed by LF (\n, 0x0A). In a valid multi-byte UTF-8 sequence, continuation bytes must be in the range 0x80 to 0xBF. As 0x0D and 0x0A are not in this range, we can search for them in a stream of bytes.
Traits§
- CrlfSuppressorExt - Extension trait to add .crlf_suppressor() to any iterator over bytes.
- StringExt - Additional method for String suppressing \r in \r\n sequences: When no \r\n is found, no memory allocation occurs.
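The allocation-free fast path mentioned for StringExt can be illustrated as follows. This is a sketch under assumptions: the real method name and signature are not shown on this page, so the free function suppress_crlf below is hypothetical.

```rust
/// Hypothetical function, not the crate's `StringExt` method.
/// Returns the input untouched when it contains no CRLF sequence,
/// so the existing heap buffer is reused and nothing is allocated.
fn suppress_crlf(s: String) -> String {
    if s.contains("\r\n") {
        // Slow path: build a new string without the CR of each CRLF.
        s.replace("\r\n", "\n")
    } else {
        // Fast path: hand back the very same buffer.
        s
    }
}

fn main() {
    let unix = String::from("one\ntwo");
    let ptr_before = unix.as_ptr();
    let out = suppress_crlf(unix);
    // Same heap pointer: no new allocation was made.
    assert_eq!(out.as_ptr(), ptr_before);

    assert_eq!(suppress_crlf(String::from("one\r\ntwo")), "one\ntwo");
    // A lone CR is kept.
    assert_eq!(suppress_crlf(String::from("one\rtwo")), "one\rtwo");
}
```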
Functions§
- read_as_string_with_crlf_suppression - Reads all bytes from the given reader, suppressing CR (\r) bytes that are immediately followed by LF (\n), and returns the resulting data as a UTF-8 string.
- read_with_crlf_suppression - Reads all bytes from the given reader, suppressing CR (\r) bytes that are immediately followed by LF (\n).
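How a byte-level reader and a string-returning reader might relate can be sketched like this. The function names, the use of BufReader, and the mapping of invalid UTF-8 to an io::Error of kind InvalidData are all assumptions made for illustration. The crate's real signatures and error types may differ.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical counterpart to a byte-level reading function.
fn read_suppressing_crlf<R: Read>(reader: R) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut bytes = BufReader::new(reader).bytes().peekable();
    while let Some(byte) = bytes.next() {
        let byte = byte?;
        // Skip a CR only when the next byte is LF.
        if byte == b'\r' && matches!(bytes.peek(), Some(Ok(b'\n'))) {
            continue;
        }
        out.push(byte);
    }
    Ok(out)
}

/// Hypothetical counterpart to a string-returning reading function.
/// Assumption: invalid UTF-8 is reported as `io::ErrorKind::InvalidData`.
fn read_string_suppressing_crlf<R: Read>(reader: R) -> io::Result<String> {
    let bytes = read_suppressing_crlf(reader)?;
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    // A byte slice implements `Read`, so it can stand in for a file.
    let text = read_string_suppressing_crlf(&b"one\r\ntwo\rthree"[..]).unwrap();
    assert_eq!(text, "one\ntwo\rthree");
}
```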