Trait AsyncReadSuperExt

pub trait AsyncReadSuperExt: AsyncBufRead {
    // Provided method
    fn read_utf8_boundaries_lossy<'a>(
        &'a mut self,
        buf: &'a mut Vec<u8>,
    ) -> Utf8BoundariesLossy<'a, Self>
       where Self: Unpin { ... }
}

Provided Methods§

fn read_utf8_boundaries_lossy<'a>(
    &'a mut self,
    buf: &'a mut Vec<u8>,
) -> Utf8BoundariesLossy<'a, Self>
where
    Self: Unpin,

Reads data from the async reader while respecting UTF-8 character boundaries.

This method reads data from the underlying async reader and ensures that the output buffer contains only valid UTF-8 sequences. Any invalid UTF-8 bytes are replaced with Unicode replacement characters (U+FFFD).

§Features
  • UTF-8 Boundary Awareness: Handles incomplete UTF-8 sequences that span multiple read operations by buffering the partial character until its remaining bytes arrive.
  • Lossy Conversion: Invalid UTF-8 bytes are replaced with replacement characters rather than causing errors.
  • Efficient Processing: Valid UTF-8 data is processed without additional copying when possible.
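
The boundary handling described above can be pictured with a small standard-library sketch. This is not the crate's implementation: `incomplete_tail_len` is a hypothetical helper introduced only for illustration, and it assumes the chunk is otherwise valid UTF-8. It reports how many trailing bytes of a chunk belong to a character whose remaining bytes have not arrived yet.

```rust
/// Hypothetical helper (not part of the crate): returns how many trailing
/// bytes of `chunk` start a multi-byte UTF-8 character that is still
/// incomplete. Assumes the rest of `chunk` is valid UTF-8.
fn incomplete_tail_len(chunk: &[u8]) -> usize {
    // A UTF-8 character is at most 4 bytes long, so only the last 3 bytes
    // can begin a character that has not fully arrived.
    let start = chunk.len().saturating_sub(3);
    for i in (start..chunk.len()).rev() {
        // Total length of the character, derived from its lead byte.
        let expected = match chunk[i] {
            0x00..=0x7F => return 0, // ASCII byte: nothing is pending
            0x80..=0xBF => continue, // continuation byte: keep looking back
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            _ => 4,
        };
        let have = chunk.len() - i;
        return if have < expected { have } else { 0 };
    }
    // Only continuation bytes were seen: the lead byte is further back,
    // so the last character is complete.
    0
}
```

A reader built along these lines would hold back the reported tail bytes and prepend them to the next chunk, so a character such as 🦀 split across two reads is still emitted whole.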
§Arguments
  • buf - A mutable reference to a Vec<u8> where the valid UTF-8 data will be written. The buffer will be extended with new data, not replaced.
§Returns

Returns a future that resolves to io::Result<usize> where the usize indicates the number of bytes written to the buffer. A return value of 0 indicates EOF.

§Behavior with Invalid UTF-8
  • Invalid sequences: Each invalid byte is replaced with the Unicode replacement character (U+FFFD), which is 3 bytes in UTF-8 encoding.
  • Incomplete sequences: If an incomplete UTF-8 sequence is encountered at the end of available data, the method will buffer it and attempt to complete it on the next read. If EOF is reached with an incomplete sequence, each byte of the incomplete sequence is replaced with a replacement character.
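
The two rules above can be sketched with the standard library alone. The function below is a hypothetical illustration, not the crate's code: `push_lossy`, its `at_eof` flag, and its return value are assumptions made for this example. It relies on `std::str::from_utf8`, whose error reports how many bytes were valid (`valid_up_to`) and whether the failure is a definite error or just a truncated character (`error_len`).

```rust
use std::str;

/// U+FFFD, which encodes to 3 bytes in UTF-8.
const REPLACEMENT: &str = "\u{FFFD}";

/// Hypothetical sketch (not the crate's implementation): appends `input`
/// to `out`, replacing each invalid byte with U+FFFD. Returns the number of
/// trailing bytes left unconsumed because they form an incomplete character.
/// When `at_eof` is true, those trailing bytes are replaced as well.
fn push_lossy(input: &[u8], at_eof: bool, out: &mut Vec<u8>) -> usize {
    let mut rest = input;
    loop {
        let err = match str::from_utf8(rest) {
            Ok(valid) => {
                out.extend_from_slice(valid.as_bytes());
                return 0;
            }
            Err(err) => err,
        };
        // Everything before the error position is valid and copied as-is.
        let (valid, after) = rest.split_at(err.valid_up_to());
        out.extend_from_slice(valid);
        // `error_len()` is `None` when the input ends inside a character.
        let bad = match err.error_len() {
            Some(len) => len,
            None if at_eof => after.len(),
            None => return after.len(),
        };
        // One replacement character per offending byte.
        for _ in 0..bad {
            out.extend_from_slice(REPLACEMENT.as_bytes());
        }
        rest = &after[bad..];
    }
}
```

Because each replaced byte grows from 1 byte to 3, the output of such a routine can be longer than its input: the 14-byte input `b"Hello \xFF\xFE World"` produces 18 bytes.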
§Examples
§Reading valid UTF-8 data
use async_read_super_ext::AsyncReadSuperExt;
use std::io::Cursor;
use tokio::io::BufReader;

let data = "Hello, 🦀 World!";
let mut reader = BufReader::new(Cursor::new(data.as_bytes()));
let mut output = Vec::new();

let bytes_read = reader.read_utf8_boundaries_lossy(&mut output).await?;

assert_eq!(bytes_read, data.len());
assert_eq!(String::from_utf8(output).unwrap(), data);
§Handling invalid UTF-8 bytes
use async_read_super_ext::AsyncReadSuperExt;
use std::io::Cursor;
use tokio::io::BufReader;

// Create data with invalid UTF-8 bytes
let mut data = Vec::new();
data.extend_from_slice("Hello ".as_bytes());
data.push(0xFF); // Invalid UTF-8 byte
data.push(0xFE); // Invalid UTF-8 byte
data.extend_from_slice(" World".as_bytes());

let mut reader = BufReader::new(Cursor::new(data));
let mut output = Vec::new();

let bytes_read = reader.read_utf8_boundaries_lossy(&mut output).await?;

let result = String::from_utf8(output).unwrap();
assert!(result.contains("Hello "));
assert!(result.contains(" World"));
assert!(result.contains('\u{FFFD}')); // Replacement character

// Count replacement characters (should be 2 for the 2 invalid bytes)
let replacement_count = result.chars().filter(|&c| c == '\u{FFFD}').count();
assert_eq!(replacement_count, 2);
§Reading from a stream until EOF
use async_read_super_ext::AsyncReadSuperExt;
use std::io::Cursor;
use tokio::io::BufReader;

let data = "Line 1\nLine 2\nLine 3";
let mut reader = BufReader::new(Cursor::new(data.as_bytes()));
let mut all_data = Vec::new();
let mut buffer = Vec::new();

loop {
    buffer.clear();
    let bytes_read = reader.read_utf8_boundaries_lossy(&mut buffer).await?;

    if bytes_read == 0 {
        break; // EOF reached
    }

    all_data.extend_from_slice(&buffer);
}

let result = String::from_utf8(all_data).unwrap();
assert_eq!(result, data);
§Errors

This method will return an error if the underlying reader encounters an I/O error. Invalid UTF-8 sequences do not cause errors; they are handled by replacement.

Dyn Compatibility§

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.

Implementors§