async-read-super-ext 0.1.0

A super extension for tokio::io::AsyncRead
# async-read-super-ext

[![Rust](https://img.shields.io/badge/rust-1.85-brightgreen.svg)](https://www.rust-lang.org)
[![Crates.io](https://img.shields.io/crates/v/async-read-super-ext.svg)](https://crates.io/crates/async-read-super-ext)
[![Documentation](https://docs.rs/async-read-super-ext/badge.svg)](https://docs.rs/async-read-super-ext)
[![MIT/Apache-2 licensed](https://img.shields.io/crates/l/async-read-super-ext.svg)](LICENSE-APACHE)

A Rust library that provides extended functionality for async readers, specifically focusing on UTF-8 boundary-aware reading operations.

## Overview

This library extends Tokio's `AsyncBufRead` trait with additional methods for reading data while respecting UTF-8 character boundaries. The main feature is `read_utf8_boundaries_lossy`, which reads data from an async source and ensures that the output contains only valid UTF-8, replacing invalid sequences with Unicode replacement characters.

## Features

- **UTF-8 Boundary Awareness**: Reads data while respecting UTF-8 character boundaries
- **Lossy Conversion**: Invalid UTF-8 sequences are replaced with the Unicode replacement character (`U+FFFD`, `�`)
- **Async/Await Support**: Built on top of Tokio's async I/O primitives
- **Buffer Management**: Handles incomplete UTF-8 sequences across read boundaries
- **Low Overhead for Valid Input**: Valid UTF-8 data is copied straight into the output buffer without re-encoding
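To see why boundary awareness matters, the following std-only sketch splits the 4-byte `🦀` across two chunks, as a read boundary might. Decoding each chunk on its own produces replacement characters, while holding back the incomplete tail (located with `Utf8Error::valid_up_to`) recovers the original text. The function names are hypothetical, and this is not the crate's implementation.

```rust
// Sketch only (std library, not the crate's code): decoding each read
// chunk on its own corrupts a character split across a chunk boundary.

/// Naive: lossily decode each chunk independently.
fn decode_naive(first: &[u8], second: &[u8]) -> String {
    let mut out = String::from_utf8_lossy(first).into_owned();
    out.push_str(&String::from_utf8_lossy(second));
    out
}

/// Boundary-aware: hold back the first chunk's incomplete tail and
/// prepend it to the second chunk before decoding.
fn decode_aware(first: &[u8], second: &[u8]) -> String {
    let valid_up_to = match std::str::from_utf8(first) {
        Ok(s) => s.len(),
        Err(e) => e.valid_up_to(),
    };
    let (complete, tail) = first.split_at(valid_up_to);
    let mut carried = tail.to_vec();
    carried.extend_from_slice(second);
    let mut out = String::from_utf8_lossy(complete).into_owned();
    out.push_str(&String::from_utf8_lossy(&carried));
    out
}

fn main() {
    let text = "Hi 🦀!";
    // "🦀" is 4 bytes starting at index 3; split after its second byte.
    let (first, second) = text.as_bytes().split_at(5);
    println!("naive: {}", decode_naive(first, second));
    println!("aware: {}", decode_aware(first, second));
}
```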

## Installation

Add this to your `Cargo.toml`:

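```toml
[dependencies]
async-read-super-ext = "0.1.0"
# The examples below also need Tokio's runtime, macros, and I/O utilities.
tokio = { version = "1", features = ["full"] }
```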

## Usage

```rust
use async_read_super_ext::AsyncReadSuperExt;
use tokio::io::BufReader;
use std::io::Cursor;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Example with valid UTF-8 data
    let data = "Hello, 🦀 World!";
    let mut reader = BufReader::new(Cursor::new(data.as_bytes()));
    let mut output = Vec::new();
    
    let bytes_read = reader.read_utf8_boundaries_lossy(&mut output).await?;
    
    let result = String::from_utf8(output)?;
    println!("Read {} bytes: {}", bytes_read, result);
    
    Ok(())
}
```

### Handling Invalid UTF-8

The library gracefully handles invalid UTF-8 sequences by replacing them with Unicode replacement characters:

```rust
use async_read_super_ext::AsyncReadSuperExt;
use tokio::io::BufReader;
use std::io::Cursor;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create data with invalid UTF-8 bytes
    let mut data = Vec::new();
    data.extend_from_slice("Hello ".as_bytes());
    data.push(0xFF); // Invalid UTF-8 byte
    data.push(0xFE); // Invalid UTF-8 byte
    data.extend_from_slice(" World".as_bytes());
    
    let mut reader = BufReader::new(Cursor::new(data));
    let mut output = Vec::new();
    
    let bytes_read = reader.read_utf8_boundaries_lossy(&mut output).await?;
    
    let result = String::from_utf8(output)?;
    println!("Read {} bytes: {}", bytes_read, result);
    // `result` is "Hello �� World": each invalid byte became U+FFFD
    
    Ok(())
}
```

### Reading Large Files

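Because input is processed incrementally, you can call the method in a loop until it returns `0` (EOF) to read large files and streams. This example collects everything into one buffer for simplicity; a streaming consumer would process each chunk as it arrives: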

```rust
use async_read_super_ext::AsyncReadSuperExt;
use tokio::fs::File;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("large_file.txt").await?;
    let mut reader = BufReader::new(file);
    let mut all_data = Vec::new();
    let mut buffer = Vec::new();
    
    loop {
        buffer.clear();
        let bytes_read = reader.read_utf8_boundaries_lossy(&mut buffer).await?;
        
        if bytes_read == 0 {
            break; // EOF
        }
        
        all_data.extend_from_slice(&buffer);
    }
    
    let content = String::from_utf8(all_data)?;
    println!("Total content length: {} characters", content.chars().count());
    
    Ok(())
}
```

## How It Works

The library implements a state machine that:

1. **Reads data** from the underlying async reader using `poll_fill_buf`
2. **Validates UTF-8** sequences using Rust's built-in UTF-8 validation
3. **Handles incomplete sequences** by buffering partial UTF-8 characters across read boundaries
4. **Replaces invalid bytes** with Unicode replacement characters (`U+FFFD`)
5. **Outputs valid UTF-8** data to the provided buffer
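The steps above can be approximated synchronously with only the standard library. In this sketch, the hypothetical `LossyUtf8Decoder` keeps an incomplete trailing sequence in `pending`, uses `Utf8Error::error_len` to tell definitely invalid bytes (replaced with `U+FFFD`) from a sequence cut off by the chunk boundary (carried over to the next chunk), and emits one `U+FFFD` for leftovers at end of input. It mirrors the policy of `String::from_utf8_lossy`; whether the crate's `Utf8BoundariesLossy` future makes exactly the same choices is an assumption.

```rust
// Hypothetical synchronous sketch of the steps above, std only.
// `LossyUtf8Decoder` is an illustrative name, not the crate's API.

#[derive(Default)]
struct LossyUtf8Decoder {
    /// Incomplete trailing sequence from the previous chunk (at most 3 bytes).
    pending: Vec<u8>,
}

impl LossyUtf8Decoder {
    /// Decode one chunk, appending valid UTF-8 to `out`.
    fn push(&mut self, chunk: &[u8], out: &mut String) {
        // Prepend bytes held back from the previous chunk.
        let mut input = std::mem::take(&mut self.pending);
        input.extend_from_slice(chunk);

        let mut rest: &[u8] = &input;
        loop {
            match std::str::from_utf8(rest) {
                Ok(valid) => {
                    out.push_str(valid);
                    return;
                }
                Err(e) => {
                    // `valid_up_to` marks a prefix guaranteed to be valid UTF-8.
                    let (valid, after) = rest.split_at(e.valid_up_to());
                    out.push_str(std::str::from_utf8(valid).expect("prefix is valid"));
                    match e.error_len() {
                        // Definitely invalid bytes: replace them and continue.
                        Some(len) => {
                            out.push('\u{FFFD}');
                            rest = &after[len..];
                        }
                        // Sequence cut off by the chunk end: keep it for later.
                        None => {
                            self.pending = after.to_vec();
                            return;
                        }
                    }
                }
            }
        }
    }

    /// At end of input, any leftover partial sequence is invalid.
    fn finish(&mut self, out: &mut String) {
        if !self.pending.is_empty() {
            self.pending.clear();
            out.push('\u{FFFD}');
        }
    }
}

fn main() {
    let bytes = "Hello, 🦀 World!".as_bytes();
    let mut decoder = LossyUtf8Decoder::default();
    let mut out = String::new();
    // Feed 3-byte chunks to force splits inside the 4-byte crab.
    for chunk in bytes.chunks(3) {
        decoder.push(chunk, &mut out);
    }
    decoder.finish(&mut out);
    println!("{out}");
}
```

Feeding 3-byte chunks forces a split inside the crab, yet the output matches the input because the partial sequence is carried across the boundary.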

### Key Components

- **`AsyncReadSuperExt`**: Extension trait that adds the `read_utf8_boundaries_lossy` method to any `AsyncBufRead`
- **`Utf8BoundariesLossy`**: Future that implements the async reading logic
- **Internal state management**: Handles incomplete UTF-8 sequences and invalid byte replacement
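The extension-trait pattern behind `AsyncReadSuperExt` can be illustrated on std's synchronous `BufRead`: a trait with a default method plus a blanket `impl` gives every reader the method. This is an analogy only; `BufReadLossyExt` and `read_to_string_lossy` are hypothetical names, and the method reads to EOF rather than streaming.

```rust
// Sketch of the extension-trait pattern with a blanket impl, shown on the
// synchronous std `BufRead` instead of Tokio's `AsyncBufRead`.
// `BufReadLossyExt` and `read_to_string_lossy` are hypothetical names.
use std::io::{self, BufRead, Cursor, Read};

trait BufReadLossyExt: BufRead {
    /// Reads everything remaining and appends it to `out` as valid UTF-8,
    /// replacing invalid sequences with U+FFFD. Returns bytes consumed.
    fn read_to_string_lossy(&mut self, out: &mut String) -> io::Result<usize> {
        let mut raw = Vec::new();
        let n = self.read_to_end(&mut raw)?;
        out.push_str(&String::from_utf8_lossy(&raw));
        Ok(n)
    }
}

// Blanket impl: every `BufRead` gets the method for free.
impl<R: BufRead + ?Sized> BufReadLossyExt for R {}

fn main() -> io::Result<()> {
    let mut reader = Cursor::new(b"Hello \xFF World".to_vec());
    let mut out = String::new();
    let n = reader.read_to_string_lossy(&mut out)?;
    println!("Read {n} bytes: {out}");
    Ok(())
}
```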

## Performance Characteristics

- **Memory Efficient**: Uses a small fixed-size buffer (4 bytes) for handling incomplete UTF-8 sequences
- **Streaming**: Processes data incrementally without requiring the entire input in memory
- **Direct Copy**: Valid UTF-8 data is copied directly to the output buffer without additional processing
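The 4-byte figure follows from UTF-8 itself: the lead byte announces a sequence length of 1 to 4 bytes, so an incomplete sequence waiting for the next read is at most 3 bytes long. A small std-only check (the helper `sequence_len` is hypothetical and classifies by bit pattern only):

```rust
// Sketch: a UTF-8 lead byte announces the sequence length (1 to 4 bytes),
// so an incomplete sequence carried between reads never needs more than
// 3 held-back bytes, and a 4-byte buffer always suffices.

/// Expected sequence length for a lead byte, or None for bytes that cannot
/// start a sequence (continuation bytes 0x80..=0xBF and 0xF8..=0xFF).
/// Classification is by bit pattern only; a full validator must also
/// reject 0xC0, 0xC1 and 0xF5..=0xF7.
fn sequence_len(lead: u8) -> Option<usize> {
    match lead {
        0x00..=0x7F => Some(1),
        0xC0..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF7 => Some(4),
        _ => None,
    }
}

fn main() {
    for c in ['A', 'é', '€', '🦀'] {
        let mut buf = [0u8; 4]; // enough for any char
        let encoded = c.encode_utf8(&mut buf);
        println!(
            "{c:?}: {} bytes, lead byte says {:?}",
            encoded.len(),
            sequence_len(encoded.as_bytes()[0])
        );
    }
}
```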

## Error Handling

The library follows Rust's standard error handling patterns:

- I/O errors from the underlying reader are propagated
- Invalid UTF-8 sequences are handled gracefully with replacement characters
- The output is always valid UTF-8
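A std-only sketch of these two paths, using a hypothetical `read_lossy` helper and a `FailingReader` stand-in: an I/O failure comes back as `Err`, while malformed bytes are never an error and become `U+FFFD` instead.

```rust
// Sketch (std only, hypothetical helper `read_lossy`): I/O failures surface
// as `Err`, while invalid UTF-8 never does; it becomes U+FFFD instead.
use std::io::{self, Read};

fn read_lossy<R: Read>(mut reader: R) -> io::Result<String> {
    let mut raw = Vec::new();
    reader.read_to_end(&mut raw)?; // I/O errors propagate to the caller
    Ok(String::from_utf8_lossy(&raw).into_owned()) // always valid UTF-8
}

/// A reader that always fails, standing in for a broken socket or file.
struct FailingReader;

impl Read for FailingReader {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "connection lost"))
    }
}

fn main() {
    // 0xC3 starts a 2-byte sequence, but '(' (0x28) is not a continuation byte.
    match read_lossy(&b"ok \xC3\x28 bytes"[..]) {
        Ok(text) => println!("decoded: {text}"),
        Err(e) => println!("I/O error: {e}"),
    }
    match read_lossy(FailingReader) {
        Ok(text) => println!("decoded: {text}"),
        Err(e) => println!("I/O error: {e}"),
    }
}
```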

## Dependencies

- **tokio**: Async runtime and I/O utilities
- **pin-project-lite**: For safe pin projection in async contexts
- **tracing**: For logging and debugging support

## Compatibility

- **Rust Edition**: 2024
- **Minimum Rust Version**: 1.85 (the first release to support the 2024 edition)
- **Tokio Version**: Compatible with Tokio 1.x

## Testing

The library includes comprehensive tests covering:

- Valid UTF-8 input
- Invalid UTF-8 sequences
- Incomplete UTF-8 at buffer boundaries
- Large file handling
- Mixed valid/invalid content
- Edge cases (empty input, leading/trailing invalid bytes)
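The edge cases above could be written as a table-driven test. This sketch computes expectations with std's `String::from_utf8_lossy`, assuming (not confirmed) that the crate applies the same replacement policy, including a single `U+FFFD` for a sequence truncated at end of input. `expected_lossy` is a hypothetical helper.

```rust
// Table-driven edge cases, ASSUMING the crate follows the same replacement
// policy as std's `String::from_utf8_lossy` (an assumption, not confirmed).
// `expected_lossy` is a hypothetical helper for building expected outputs.

fn expected_lossy(input: &[u8]) -> String {
    String::from_utf8_lossy(input).into_owned()
}

fn main() {
    let cases = [
        ("empty input", &b""[..], ""),
        ("leading invalid byte", &b"\xFFabc"[..], "\u{FFFD}abc"),
        ("trailing invalid byte", &b"abc\xFF"[..], "abc\u{FFFD}"),
        // A 4-byte lead plus two continuation bytes, then EOF.
        ("truncated 4-byte char at EOF", &b"ab\xF0\x9F\xA6"[..], "ab\u{FFFD}"),
        // 0xFE is never valid; E2 82 AC is a complete euro sign.
        ("mixed valid/invalid", &b"a\xFE\xE2\x82\xACb"[..], "a\u{FFFD}\u{20AC}b"),
    ];
    for (name, input, expected) in cases {
        assert_eq!(expected_lossy(input), expected, "case: {name}");
        println!("{name}: ok");
    }
}
```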

Run tests with:

```bash
cargo test
```

## License

This project is licensed under either of

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT License ([LICENSE-MIT](LICENSE-MIT))

at your option.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

<p align="center">
<strong>Happy coding with Rust! 🦀✨</strong>
</p>