# async-read-super-ext
A Rust library that provides extended functionality for async readers, specifically focusing on UTF-8 boundary-aware reading operations.
## Overview
This library extends Tokio's `AsyncBufRead` trait with additional methods for reading data while respecting UTF-8 character boundaries. The main feature is `read_utf8_boundaries_lossy`, which reads data from an async source and ensures that the output contains only valid UTF-8, replacing invalid sequences with Unicode replacement characters.
## Features
- **UTF-8 Boundary Awareness**: Reads data while respecting UTF-8 character boundaries
- **Lossy Conversion**: Invalid UTF-8 sequences are replaced with replacement characters (`�`)
- **Async/Await Support**: Built on top of Tokio's async I/O primitives
- **Buffer Management**: Handles incomplete UTF-8 sequences across read boundaries
- **Minimal Copying**: Valid UTF-8 data is passed straight through to the output buffer
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
async-read-super-ext = "0.1.0"
```
## Usage
```rust
use async_read_super_ext::AsyncReadSuperExt;
use std::io::Cursor;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Example with valid UTF-8 data
    let data = "Hello, 🦀 World!";
    let mut reader = BufReader::new(Cursor::new(data.as_bytes()));
    let mut output = Vec::new();

    // Read from the buffered source into `output` as valid UTF-8
    let bytes_read = reader.read_utf8_boundaries_lossy(&mut output).await?;
    let result = String::from_utf8(output)?;
    println!("Read {} bytes: {}", bytes_read, result);
    Ok(())
}
```
### Handling Invalid UTF-8
The library gracefully handles invalid UTF-8 sequences by replacing them with Unicode replacement characters:
```rust
use async_read_super_ext::AsyncReadSuperExt;
use std::io::Cursor;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create data with invalid UTF-8 bytes
    let mut data = Vec::new();
    data.extend_from_slice("Hello ".as_bytes());
    data.push(0xFF); // Invalid UTF-8 byte
    data.push(0xFE); // Invalid UTF-8 byte
    data.extend_from_slice(" World".as_bytes());

    let mut reader = BufReader::new(Cursor::new(data));
    let mut output = Vec::new();
    let bytes_read = reader.read_utf8_boundaries_lossy(&mut output).await?;
    let result = String::from_utf8(output)?;
    println!("Read {} bytes: {}", bytes_read, result);
    // Output: "Hello �� World" (with replacement characters)
    Ok(())
}
```
### Reading Large Files
For large files and streams, call the method in a loop until it returns `0` (EOF). Each call appends only valid UTF-8 to the buffer:
```rust
use async_read_super_ext::AsyncReadSuperExt;
use tokio::fs::File;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("large_file.txt").await?;
    let mut reader = BufReader::new(file);
    let mut all_data = Vec::new();
    let mut buffer = Vec::new();

    loop {
        buffer.clear();
        let bytes_read = reader.read_utf8_boundaries_lossy(&mut buffer).await?;
        if bytes_read == 0 {
            break; // EOF
        }
        all_data.extend_from_slice(&buffer);
    }

    let content = String::from_utf8(all_data)?;
    println!("Total content length: {} characters", content.chars().count());
    Ok(())
}
```
## How It Works
The library implements a state machine that:
1. **Reads data** from the underlying async reader using `poll_fill_buf`
2. **Validates UTF-8** sequences using Rust's built-in UTF-8 validation
3. **Handles incomplete sequences** by buffering partial UTF-8 characters across read boundaries
4. **Replaces invalid bytes** with Unicode replacement characters (`U+FFFD`)
5. **Outputs valid UTF-8** data to the provided buffer
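The steps above can be sketched synchronously with the standard library alone. This is a minimal illustration, not the crate's implementation: `LossyChunkDecoder`, `feed`, and `finish` are hypothetical names, and the sketch assumes the replacement policy of `std::str::from_utf8` error reporting (one `U+FFFD` per invalid sequence). It also uses a `Vec` for the carried bytes for brevity.

```rust
use std::str;

const REPLACEMENT: &str = "\u{FFFD}";

/// Hypothetical helper (not part of the crate): turns a stream of byte
/// chunks into valid UTF-8, carrying unfinished sequences between chunks.
#[derive(Default)]
pub struct LossyChunkDecoder {
    /// Tail of a sequence that was cut off by a chunk boundary.
    carry: Vec<u8>,
}

impl LossyChunkDecoder {
    /// Decodes one chunk, appending only valid UTF-8 to `out`.
    pub fn feed(&mut self, chunk: &[u8], out: &mut Vec<u8>) {
        // Prepend whatever was left over from the previous chunk.
        let mut input = std::mem::take(&mut self.carry);
        input.extend_from_slice(chunk);
        let mut rest: &[u8] = &input;
        loop {
            match str::from_utf8(rest) {
                Ok(valid) => {
                    out.extend_from_slice(valid.as_bytes());
                    return;
                }
                Err(err) => {
                    // Everything before `valid_up_to` is known-good UTF-8.
                    let (valid, after) = rest.split_at(err.valid_up_to());
                    out.extend_from_slice(valid);
                    match err.error_len() {
                        // Definitely invalid: replace it and skip past it.
                        Some(len) => {
                            out.extend_from_slice(REPLACEMENT.as_bytes());
                            rest = &after[len..];
                        }
                        // Input ended mid-sequence: keep the tail for later.
                        None => {
                            self.carry = after.to_vec();
                            return;
                        }
                    }
                }
            }
        }
    }

    /// Call at EOF: a sequence that never completed becomes one U+FFFD.
    pub fn finish(&mut self, out: &mut Vec<u8>) {
        if !self.carry.is_empty() {
            self.carry.clear();
            out.extend_from_slice(REPLACEMENT.as_bytes());
        }
    }
}
```

Feeding the four bytes of `🦀` as two separate two-byte chunks produces no output after the first call and the complete character after the second, which is the boundary behaviour described in step 3.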
### Key Components
- **`AsyncReadSuperExt`**: Extension trait that adds the `read_utf8_boundaries_lossy` method to any `AsyncBufRead`
- **`Utf8BoundariesLossy`**: Future that implements the async reading logic
- **Internal state management**: Handles incomplete UTF-8 sequences and invalid byte replacement
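To show how an extension trait attaches a method to every buffered reader, here is a synchronous analog built on `std::io::BufRead`. The trait name `BufReadLossyExt` and the method `read_buffered_lossy` are hypothetical and exist only for this sketch; the crate's trait targets Tokio's `AsyncBufRead` and returns a future instead. The sketch also skips the carry-over of incomplete sequences, so it illustrates the trait shape only.

```rust
use std::io::{self, BufRead};

/// Hypothetical synchronous analog of an extension trait over buffered readers.
pub trait BufReadLossyExt: BufRead {
    /// Appends the currently buffered bytes to `out` as lossy UTF-8 and
    /// returns how many input bytes were consumed (0 means EOF).
    fn read_buffered_lossy(&mut self, out: &mut Vec<u8>) -> io::Result<usize> {
        // Borrow the reader's internal buffer without copying it first.
        let buf = self.fill_buf()?;
        let consumed = buf.len();
        out.extend_from_slice(String::from_utf8_lossy(buf).as_bytes());
        // Tell the reader those bytes have been handled.
        self.consume(consumed);
        Ok(consumed)
    }
}

/// Blanket implementation: every `BufRead` gets the method for free.
impl<R: BufRead + ?Sized> BufReadLossyExt for R {}
```

With the trait in scope, any `BufRead` such as `std::io::Cursor<Vec<u8>>` or `std::io::BufReader<File>` can call `read_buffered_lossy` directly, which mirrors how importing `AsyncReadSuperExt` makes `read_utf8_boundaries_lossy` available in the usage examples.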
## Performance Characteristics
- **Memory Efficient**: Uses a small fixed-size buffer (4 bytes) for handling incomplete UTF-8 sequences
- **Streaming**: Processes data incrementally without requiring the entire input in memory
- **Direct Copy**: Valid UTF-8 data is copied straight to the output buffer without further transformation
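A 4-byte buffer is enough because a UTF-8 sequence is at most four bytes long, so an unfinished one holds at most three. The sketch below illustrates that reasoning; `utf8_sequence_len` and `Carry` are hypothetical names and are not taken from the crate's source.

```rust
/// Hypothetical helper: expected length of a UTF-8 sequence given its first
/// byte, or `None` if the byte cannot start a sequence.
pub fn utf8_sequence_len(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7F => Some(1), // ASCII
        0xC2..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF4 => Some(4),
        _ => None, // continuation bytes and bytes never valid in UTF-8
    }
}

/// Hypothetical fixed-size holder for the tail of an unfinished sequence.
#[derive(Default)]
pub struct Carry {
    bytes: [u8; 4],
    len: usize,
}

impl Carry {
    /// Stores the unfinished tail; returns `false` if it would not fit.
    pub fn store(&mut self, tail: &[u8]) -> bool {
        if tail.len() > self.bytes.len() {
            return false;
        }
        self.bytes[..tail.len()].copy_from_slice(tail);
        self.len = tail.len();
        true
    }

    /// The bytes currently held.
    pub fn as_slice(&self) -> &[u8] {
        &self.bytes[..self.len]
    }
}
```

Because the holder lives inline in the struct, carrying a partial character across reads needs no heap allocation.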
## Error Handling
The library follows Rust's standard error handling patterns:
- I/O errors from the underlying reader are propagated
- Invalid UTF-8 sequences are handled gracefully with replacement characters
- The output is always valid UTF-8
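The same split between fatal I/O errors and recoverable encoding errors can be shown with the standard library. This is a synchronous sketch under stated assumptions: `read_all_lossy` and `FailingReader` are hypothetical names, and `String::from_utf8_lossy` stands in for the crate's replacement logic.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads everything, then converts lossily.
/// I/O failures surface as `Err`; bad bytes become U+FFFD instead.
pub fn read_all_lossy<R: Read>(mut reader: R) -> io::Result<String> {
    let mut raw = Vec::new();
    reader.read_to_end(&mut raw)?; // I/O errors propagate to the caller
    Ok(String::from_utf8_lossy(&raw).into_owned())
}

/// Hypothetical reader that always fails, used to show error propagation.
pub struct FailingReader;

impl Read for FailingReader {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "simulated I/O failure"))
    }
}
```

Calling `read_all_lossy` on the bytes `Hello \xFF\xFE World` returns `Ok` with two replacement characters, while `read_all_lossy(FailingReader)` returns the underlying `io::Error`.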
## Dependencies
- **tokio**: Async runtime and I/O utilities
- **pin-project-lite**: For safe pin projection in async contexts
- **tracing**: For logging and debugging support
## Compatibility
- **Rust Edition**: 2024
- **Minimum Rust Version**: Rust 1.85 or later (the first release that supports the 2024 edition)
- **Tokio Version**: Compatible with Tokio 1.x
## Testing
The library includes comprehensive tests covering:
- Valid UTF-8 input
- Invalid UTF-8 sequences
- Incomplete UTF-8 at buffer boundaries
- Large file handling
- Mixed valid/invalid content
- Edge cases (empty input, leading/trailing invalid bytes)
Run tests with:
```bash
cargo test
```
## License
This project is licensed under either of
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT License ([LICENSE-MIT](LICENSE-MIT))
at your option.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
<p align="center">
<strong>Happy coding with Rust! 🦀✨</strong>
</p>