async-regex
Empower regex with streaming capabilities!
A high-performance library that brings the power of regex pattern matching to streaming data. This crate extends the standard read_until
functionality to support multi-byte patterns using regex, making it perfect for parsing protocols, log files, and other structured data streams.
Why async-regex? This crate empowers regex with streaming capabilities, bringing the robust pattern matching of the regex crate to streaming data processing!
Features
- Regex-Powered: Built on the robust regex crate for reliable pattern matching
- Streaming Support: Process data as it arrives without loading everything into memory
- High Performance: Optimized implementations with comprehensive benchmarks
- Pure Rust Implementation: Entirely written in safe Rust with zero unsafe code
- Well Tested: Extensive test coverage
- Well Documented: Comprehensive documentation and examples
- Memory Efficient: Zero-copy parsing and minimal allocations
- Async & Sync APIs: Both async and synchronous versions available
- Multi-byte Patterns: Matches multi-byte patterns, unlike the standard read_until, which supports only a single-byte delimiter
- Protocol Parsing: Perfect for HTTP, custom protocols, and structured data streams
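To make the single-byte limitation concrete, here is a minimal sketch that uses only the standard library and does not touch async-regex at all. `std::io::BufRead::read_until` takes exactly one `u8` as its delimiter, so it cannot be asked to stop at a sequence such as `\r\n\r\n`. The helper name `read_to_byte` and the sample request are illustrative assumptions.

```rust
use std::io::{BufRead, Cursor};

/// Illustrative helper (not part of async-regex): reads up to and including
/// the first occurrence of the single byte `delim`, or to EOF if it is absent.
fn read_to_byte(input: &[u8], delim: u8) -> Vec<u8> {
    let mut reader = Cursor::new(input);
    let mut buf = Vec::new();
    // `read_until` accepts exactly one byte as its delimiter.
    reader
        .read_until(delim, &mut buf)
        .expect("reading from an in-memory cursor should not fail");
    buf
}

fn main() {
    let request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\nbody";
    // We would like to stop at the blank line "\r\n\r\n" that ends the headers,
    // but a single-byte delimiter stops at the first '\n' instead.
    let got = read_to_byte(request, b'\n');
    assert_eq!(got, b"GET / HTTP/1.1\r\n");
    println!("{:?}", String::from_utf8_lossy(&got));
}
```

Stopping at the end of the header block would require looping over `read_until` and inspecting the accumulated bytes by hand, which is the gap a pattern-based reader is meant to close.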
Use Cases
Perfect for:
- HTTP Protocol Parsing: Find headers like "Content-Length:" or "Authorization:" in streaming HTTP data
- Log File Processing: Parse structured logs with regex patterns as they're being written
- Network Protocol Parsing: Handle custom protocols with complex pattern matching
- Data Pipeline Processing: Process large files without loading everything into memory
- Real-time Data Analysis: Find patterns in streaming sensor data or metrics
- Async Web Applications: Parse request/response data efficiently
- File Format Parsing: Parse structured files like CSV, JSON, or custom formats
- Any streaming scenario where you need regex pattern matching on data that arrives incrementally
Quick Start
Async Regex Pattern Search
use read_until_pattern_async;
use Cursor;
use Runtime;
let rt = new.unwrap;
rt.block_on;
Complex Regex Pattern Matching
use read_until_pattern_async;
use Cursor;
use Runtime;
let rt = new.unwrap;
rt.block_on;
Sync Regex Pattern Search
use read_until_pattern;
use Cursor;
let mut reader = new;
let mut buffer = Vec new;
// Find timestamp using regex
let = read_until_pattern.unwrap;
assert_eq!;
assert_eq!;
Performance
This crate is optimized for high-performance streaming pattern search with regex:
Streaming Performance Benefits
- Memory Efficient: Process large files without loading everything into memory
- Regex-Powered: Leverages the robust and fast regex crate for pattern matching
- Async Optimized: Minimal overhead for async operations (~10% compared to sync)
- Zero-Copy Operations: Efficient data handling with minimal allocations
Performance Characteristics
Benchmarks were run on a MacBook Pro (2019) with an 8-core Intel Core i9 @ 2.4 GHz and 32 GB RAM.
Simple Pattern Matching
- Small data (500 bytes): ~9.3µs per operation (async), ~9.1µs (sync)
- Medium data (5KB): ~9.4µs per operation (async), ~9.1µs (sync)
- Large data (50KB): ~10.3µs per operation (async), ~10.1µs (sync)
Regex Pattern Matching
- Small data (500 bytes): ~481µs per operation (regex patterns)
- Medium data (5KB): ~519µs per operation (regex patterns)
- Large data (50KB): ~835µs per operation (regex patterns)
Complex Pattern Matching
- Small data (500 bytes): ~428µs per operation (complex regex)
- Medium data (5KB): ~431µs per operation (complex regex)
- Large data (50KB): ~468µs per operation (complex regex)
Pattern Position Performance
- Pattern at start: ~7.1µs per operation
- Pattern at middle: ~7.3µs per operation
- Pattern at end: ~7.3µs per operation
Performance Notes
- Memory usage: Constant memory usage regardless of input size
- Pattern complexity: Performance scales mainly with regex complexity rather than input size
- Async overhead: ~10% performance cost for async operations vs sync; in the simple-pattern figures above the gap is about 2-3%
- Consistent performance: Pattern position has minimal impact on performance
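The constant-memory point can be illustrated with a small standard-library sketch. It is not the crate's implementation, and it searches for a literal byte sequence rather than a regex. The hypothetical helper `count_occurrences` scans a stream chunk by chunk and carries over only the last `needle.len() - 1` bytes, so memory use depends on the chunk size and the needle length, not on the total input size.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Hypothetical helper (not part of async-regex): counts occurrences of the
/// literal `needle` in a stream, overlapping matches included, while holding
/// at most one chunk plus `needle.len() - 1` carried-over bytes in memory.
fn count_occurrences<R: BufRead>(reader: &mut R, needle: &[u8]) -> io::Result<usize> {
    assert!(!needle.is_empty(), "needle must not be empty");
    let keep = needle.len() - 1; // tail bytes that could begin a split match
    let mut window: Vec<u8> = Vec::new();
    let mut count = 0;
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            return Ok(count); // EOF
        }
        let chunk_len = chunk.len();
        window.extend_from_slice(chunk);
        reader.consume(chunk_len);
        // Every match found here ends inside the chunk that was just appended,
        // so each occurrence is counted exactly once.
        count += window.windows(needle.len()).filter(|w| *w == needle).count();
        // Keep only the tail that might be the start of a match split across chunks.
        let drop_to = window.len().saturating_sub(keep);
        window.drain(..drop_to);
    }
}

fn main() -> io::Result<()> {
    let log: &[u8] = b"ERROR a\nok\nERROR b\nERROR";
    // A 3-byte buffer forces every "ERROR" to be split across chunks.
    let mut reader = BufReader::with_capacity(3, Cursor::new(log));
    let n = count_occurrences(&mut reader, b"ERROR")?;
    assert_eq!(n, 3);
    println!("found {n} occurrences");
    Ok(())
}
```

A regex-based reader cannot always bound its carry-over this tightly, because a pattern such as `a.*b` has no fixed maximum match length; the sketch only shows the idea for fixed-length needles.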
Why Streaming Matters
- Large Files: Process multi-gigabyte files without memory issues
- Real-time Data: Handle continuous data streams efficiently
- Network Protocols: Parse data as it arrives over the network
- Resource Efficiency: Lower memory footprint and better resource utilization
Empowering Regex with Streaming
This crate bridges the gap between regex and streaming data processing!
The Problem:
- regex crate: Powerful pattern matching, but requires complete in-memory data
- tokio::io::AsyncBufReadExt::read_until: Great for streaming, but only single-byte delimiters
- Standard libraries: No built-in way to use regex patterns on streaming data
Our Solution:
- Regex-powered streaming: Use any regex pattern on streaming data
- Multi-byte patterns: Find complex patterns like "HTTP/1.1" or email addresses
- Memory efficient: Process data as it arrives, not all at once
- Async & sync: Both streaming paradigms supported
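The core difficulty with multi-byte patterns on a stream is that a match can straddle two reads. The following sketch shows one way to handle that using only `std::io::BufRead`. It is an assumption-laden illustration, not the algorithm async-regex uses: the helper `read_until_bytes` is hypothetical, it matches a literal byte sequence instead of a regex, and it accumulates everything it reads into `to`.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

/// Hypothetical helper (not part of async-regex): appends bytes from `reader`
/// to `to` until the literal sequence `delim` has been read or EOF is reached.
/// Returns the number of bytes appended.
fn read_until_bytes<R: BufRead>(
    reader: &mut R,
    delim: &[u8],
    to: &mut Vec<u8>,
) -> io::Result<usize> {
    assert!(!delim.is_empty(), "delimiter must not be empty");
    let start = to.len();
    loop {
        let available = reader.fill_buf()?;
        if available.is_empty() {
            return Ok(to.len() - start); // EOF without a match
        }
        let chunk_len = available.len();
        let old_len = to.len();
        // Re-scan the last `delim.len() - 1` bytes already collected, so a
        // delimiter split across two chunks is still found.
        let search_from = old_len.saturating_sub(delim.len() - 1).max(start);
        to.extend_from_slice(available);
        let found = to[search_from..]
            .windows(delim.len())
            .position(|w| w == delim);
        if let Some(pos) = found {
            let end = search_from + pos + delim.len(); // just past the delimiter
            to.truncate(end);
            // Consume only the part of this chunk that belongs to the result.
            reader.consume(end - old_len);
            return Ok(end - start);
        }
        reader.consume(chunk_len);
    }
}

fn main() -> io::Result<()> {
    let data: &[u8] = b"GET / HTTP/1.1\r\n\r\nbody";
    // A 4-byte buffer forces "\r\n\r\n" to straddle two chunks.
    let mut reader = BufReader::with_capacity(4, Cursor::new(data));
    let mut head = Vec::new();
    let n = read_until_bytes(&mut reader, b"\r\n\r\n", &mut head)?;
    assert_eq!(n, 18);
    assert_eq!(head, b"GET / HTTP/1.1\r\n\r\n");
    // Bytes after the delimiter remain available to the next read.
    let mut rest = Vec::new();
    reader.read_to_end(&mut rest)?;
    assert_eq!(rest, b"body");
    Ok(())
}
```

Consuming only the bytes up to the end of the match, rather than the whole chunk, is what leaves the remainder of the stream intact for whatever reads next.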
Perfect for:
- Protocol parsing: HTTP headers, custom protocols, structured data
- Log processing: Parse logs as they're written with regex patterns
- Data pipelines: Process large files with complex pattern matching
- Real-time systems: Handle streaming data with regex power
When to Use Our Solution vs Other Libraries
Use Case | Our Solution | regex crate | tokio::io::AsyncBufRead |
---|---|---|---|
Regex patterns on streaming data | ✅ Perfect! | ❌ In-memory only | ❌ Single-byte only |
Multi-byte pattern matching | ✅ Regex-powered | ✅ Full regex support | ❌ Single-byte only |
Streaming data processing | ✅ Memory efficient | ❌ Loads all data | ✅ Memory efficient |
Complex pattern matching | ✅ Full regex support | ✅ Full regex support | ❌ Single-byte only |
Async I/O | ✅ Native async | ❌ Sync only | ✅ Native async |
Large file processing | ✅ Streaming | ❌ Memory intensive | ⚠️ Limited patterns |
Protocol parsing | ✅ Perfect | ❌ Not suitable | ⚠️ Limited patterns |
Key Insight: This crate combines the power of regex with the efficiency of streaming, making it perfect for processing large files or continuous data streams with complex pattern matching requirements.
API Reference
Async Functions (Regex-Powered Streaming)
read_until_pattern_async<R>(reader: &mut R, pattern: &str, to: &mut Vec<u8>) -> Result<(Vec<u8>, usize)>
- Find regex pattern in async stream, returns matched substring and total bytes read
- Where R: AsyncBufRead + Unpin
read_while_any_async<R>(reader: &mut R, check_set: &[u8], to: &mut Vec<u8>) -> Result<(u8, usize)>
- Read while any byte in check_set matches, returns stop byte and count
- Where R: AsyncBufRead + Unpin
Sync Functions (Regex-Powered Streaming)
read_until_pattern<R>(reader: &mut R, pattern: &str, to: &mut Vec<u8>) -> Result<(Vec<u8>, usize)>
- Find regex pattern in sync stream, returns matched substring and total bytes read
- Where R: BufRead
read_while_any<R>(reader: &mut R, check_set: &[u8], to: &mut Vec<u8>) -> Result<(u8, usize)>
- Read while any byte in check_set matches, returns stop byte and count
- Where R: BufRead
Utility Functions
find_pattern(haystack: &[u8], needle: &Regex) -> Option<(usize, usize)>
- Direct regex pattern search in byte slice, returns (start, length)
- Uses compiled regex for maximum performance
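As a small illustration of the (start, length) return shape, the sketch below uses a hypothetical `find_literal` that searches for a literal byte sequence with the standard library only. It stands in for a regex search and is not the crate's `find_pattern`; the sample header bytes are made up for the example.

```rust
/// Hypothetical stand-in for a regex search: finds the first occurrence of the
/// literal `needle` and reports it as (start, length), the same shape the
/// text above describes for `find_pattern`.
fn find_literal(haystack: &[u8], needle: &[u8]) -> Option<(usize, usize)> {
    if needle.is_empty() || needle.len() > haystack.len() {
        return None;
    }
    haystack
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|start| (start, needle.len()))
}

fn main() {
    let data = b"Host: example.com\r\nContent-Length: 42\r\n";
    if let Some((start, len)) = find_literal(data, b"Content-Length:") {
        // A (start, length) pair maps to the slice range start..start + len.
        let matched = &data[start..start + len];
        let after = &data[start + len..];
        println!("matched: {}", String::from_utf8_lossy(matched));
        println!("after:   {:?}", String::from_utf8_lossy(after));
    }
}
```

Returning a length rather than an end index means the end of the match is `start + length`, which is also the offset at which unconsumed data begins.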
Testing
Run tests:
Run benchmarks:
Contributing
Contributions are welcome! This crate aims to make regex pattern matching accessible for streaming data. Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Summary
async-regex brings streaming capabilities to the regex crate, making it possible to use complex regex patterns on data streams without loading everything into memory. Perfect for protocol parsing, log processing, and any scenario where you need regex power on streaming data.