pub trait AsyncReadSuperExt: AsyncBufRead {
// Provided method
fn read_utf8_boundaries_lossy<'a>(
&'a mut self,
buf: &'a mut Vec<u8>,
) -> Utf8BoundariesLossy<'a, Self>
where Self: Unpin { ... }
}
Provided Methods§
fn read_utf8_boundaries_lossy<'a>(
&'a mut self,
buf: &'a mut Vec<u8>,
) -> Utf8BoundariesLossy<'a, Self>
where
Self: Unpin,
Reads data from the async reader while respecting UTF-8 character boundaries.
This method reads data from the underlying async reader and ensures that the output
buffer contains only valid UTF-8 sequences. Any invalid UTF-8 bytes are replaced
with Unicode replacement characters (U+FFFD).
§Features
- UTF-8 Boundary Awareness: Handles incomplete UTF-8 sequences that span across multiple read operations by buffering partial characters.
- Lossy Conversion: Invalid UTF-8 bytes are replaced with replacement characters rather than causing errors.
- Efficient Processing: Valid UTF-8 data is processed without additional copying when possible.
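The boundary handling described above can be illustrated with the standard library alone. The following is a minimal sketch, not this crate's implementation: `split_incomplete_tail` and `reassemble` are hypothetical helper names, and the sketch only shows how a truncated trailing sequence can be detected and carried over to the next chunk. It relies on `std::str::from_utf8`, whose error reports `error_len() == None` when the input ends in the middle of a character.

```rust
use std::str;

/// Hypothetical helper: splits `chunk` into `(ready, tail)`, where `tail` is a
/// trailing UTF-8 sequence cut off by the end of the chunk (at most 3 bytes).
/// Definitely invalid bytes stay in `ready`; this sketch does not replace them.
fn split_incomplete_tail(chunk: &[u8]) -> (&[u8], &[u8]) {
    let mut start = 0;
    loop {
        match str::from_utf8(&chunk[start..]) {
            // Everything from `start` onward is valid: nothing to hold back.
            Ok(_) => return chunk.split_at(chunk.len()),
            Err(e) => {
                let valid_end = start + e.valid_up_to();
                match e.error_len() {
                    // `None`: the input ended in the middle of a character.
                    None => return chunk.split_at(valid_end),
                    // `Some(n)`: n bytes are invalid; skip them and keep scanning.
                    Some(n) => start = valid_end + n,
                }
            }
        }
    }
}

/// Hypothetical driver: feeds chunks one by one, carrying an incomplete tail
/// over to the next chunk so that characters are never split in the output.
fn reassemble(chunks: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut pending: Vec<u8> = Vec::new();
    for chunk in chunks {
        pending.extend_from_slice(chunk);
        let (ready, tail) = split_incomplete_tail(&pending);
        out.extend_from_slice(ready);
        pending = tail.to_vec();
    }
    out
}
```

For instance, splitting the bytes of "Hi 🦀" after the fifth byte cuts the four-byte crab in half; the first chunk then yields "Hi " with a two-byte tail held back, and the second chunk completes the character.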
§Arguments
- buf: A mutable reference to a Vec<u8> where the valid UTF-8 data will be written. The buffer is extended with new data, not replaced.
§Returns
Returns a future that resolves to io::Result<usize>, where the usize indicates the number of bytes written to the buffer. A return value of 0 indicates EOF.
§Behavior with Invalid UTF-8
- Invalid sequences: Each invalid byte is replaced with a UTF-8 replacement character (�), which is 3 bytes in UTF-8 encoding.
- Incomplete sequences: If an incomplete UTF-8 sequence is encountered at the end of available data, the method will buffer it and attempt to complete it on the next read. If EOF is reached with an incomplete sequence, each byte of the incomplete sequence is replaced with a replacement character.
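The per-byte replacement rule can be sketched with the standard library. This is an illustration under stated assumptions, not the crate's code: `lossy_per_byte` is a hypothetical function that treats its input as the whole stream (so a truncated trailing sequence counts as "EOF reached") and writes one U+FFFD for every byte in an invalid or incomplete span. This differs from `String::from_utf8_lossy`, which emits one replacement character per maximal invalid subsequence rather than per byte.

```rust
use std::str;

/// U+FFFD encodes as the three bytes EF BF BD.
const REPLACEMENT: &str = "\u{FFFD}";

/// Hypothetical sketch: copies valid UTF-8 through unchanged and writes one
/// replacement character per invalid byte. The end of `input` is treated as
/// EOF, so a truncated trailing sequence is replaced byte by byte as well.
fn lossy_per_byte(mut input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    loop {
        match str::from_utf8(input) {
            Ok(valid) => {
                out.extend_from_slice(valid.as_bytes());
                return out;
            }
            Err(e) => {
                let (valid, rest) = input.split_at(e.valid_up_to());
                out.extend_from_slice(valid);
                // `None` means a truncated sequence at the end of the input;
                // at EOF every remaining byte is replaced.
                let bad = e.error_len().unwrap_or(rest.len());
                for _ in 0..bad {
                    out.extend_from_slice(REPLACEMENT.as_bytes());
                }
                input = &rest[bad..];
            }
        }
    }
}
```

Because each replaced byte grows from 1 byte to 3, the output can be longer than the input: the 14-byte input `b"Hello \xFF\xFE World"` produces 18 bytes in this sketch.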
§Examples
§Reading valid UTF-8 data
use async_read_super_ext::AsyncReadSuperExt;
use std::io::Cursor; // tokio implements AsyncRead for std::io::Cursor
use tokio::io::BufReader;
let data = "Hello, 🦀 World!";
let mut reader = BufReader::new(Cursor::new(data.as_bytes()));
let mut output = Vec::new();
let bytes_read = reader.read_utf8_boundaries_lossy(&mut output).await?;
assert_eq!(bytes_read, data.len());
assert_eq!(String::from_utf8(output).unwrap(), data);
§Handling invalid UTF-8 bytes
use async_read_super_ext::AsyncReadSuperExt;
use std::io::Cursor; // tokio implements AsyncRead for std::io::Cursor
use tokio::io::BufReader;
// Create data with invalid UTF-8 bytes
let mut data = Vec::new();
data.extend_from_slice("Hello ".as_bytes());
data.push(0xFF); // Invalid UTF-8 byte
data.push(0xFE); // Invalid UTF-8 byte
data.extend_from_slice(" World".as_bytes());
let mut reader = BufReader::new(Cursor::new(data));
let mut output = Vec::new();
let bytes_read = reader.read_utf8_boundaries_lossy(&mut output).await?;
let result = String::from_utf8(output).unwrap();
assert!(result.contains("Hello "));
assert!(result.contains(" World"));
assert!(result.contains('\u{FFFD}')); // Replacement character
// Count replacement characters (should be 2 for the 2 invalid bytes)
let replacement_count = result.chars().filter(|&c| c == '\u{FFFD}').count();
assert_eq!(replacement_count, 2);
§Reading from a stream until EOF
use async_read_super_ext::AsyncReadSuperExt;
use std::io::Cursor; // tokio implements AsyncRead for std::io::Cursor
use tokio::io::BufReader;
let data = "Line 1\nLine 2\nLine 3";
let mut reader = BufReader::new(Cursor::new(data.as_bytes()));
let mut all_data = Vec::new();
let mut buffer = Vec::new();
loop {
buffer.clear();
let bytes_read = reader.read_utf8_boundaries_lossy(&mut buffer).await?;
if bytes_read == 0 {
break; // EOF reached
}
all_data.extend_from_slice(&buffer);
}
let result = String::from_utf8(all_data).unwrap();
assert_eq!(result, data);
§Errors
This method will return an error if the underlying reader encounters an I/O error. Invalid UTF-8 sequences do not cause errors; they are handled by replacement.
Dyn Compatibility§
This trait is not dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.