hyperscan-tokio
Blazing-fast multi-pattern regex matching for Rust, bringing the power of Vectorscan (Intel Hyperscan's modern fork) to the async ecosystem.
Why hyperscan-tokio?
- 🚀 Extreme Performance: Scan gigabytes per second using SIMD acceleration
- 🔍 Multi-Pattern Matching: Compile thousands of patterns into a single automaton
- ⚡ Async-First: Built for Tokio from the ground up
- 🎯 Zero-Copy: Minimal allocations with Bytes integration
- 🔄 Hot Reloading: Swap pattern databases without downtime
- 🛡️ Production Ready: Battle-tested VectorScan engine with safe Rust API
Performance
scanning_throughput/1024 time: [2.1 µs 2.2 µs 2.3 µs]
thrpt: [434 MiB/s 455 MiB/s 476 MiB/s]
scanning_throughput/1048576 time: [45.2 µs 45.8 µs 46.4 µs]
thrpt: [21.5 GiB/s 21.9 GiB/s 22.1 GiB/s]
vs_alternatives/hyperscan_tokio time: [12.3 µs 12.5 µs 12.7 µs]
vs_alternatives/rust_regex time: [891.2 µs 895.7 µs 900.1 µs]
71.6x faster than regex crate for multi-pattern matching
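The speedup figure follows directly from the two median timings in the vs_alternatives group. As a minimal sketch of that arithmetic (the function name `speedup` is illustrative and not part of this crate):

```rust
/// Ratio of a baseline time to a candidate time; both must use the same unit.
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

fn main() {
    // Median estimates from the benchmark output above, in microseconds.
    let rust_regex_us = 895.7;
    let hyperscan_tokio_us = 12.5;

    // 895.7 / 12.5 is roughly 71.66, reported above as 71.6x.
    println!("{:.2}x", speedup(rust_regex_us, hyperscan_tokio_us));
}
```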
Quick Start
use *;
use std::sync::Arc;
async
Chimera Support (PCRE with Capture Groups)
hyperscan-tokio offers two ways to use PCRE-compatible patterns with capture groups:
Option 1: Rust-native implementation (Recommended)
Uses the regex crate internally and needs no special system dependencies:
hyperscan-tokio = { version = "0.1", features = ["chimera"] }
Option 2: FFI-based Chimera
Requires VectorScan built with Chimera support:
# Build from source automatically
hyperscan-tokio = { version = "0.1", features = ["vendored", "chimera-ffi"] }
# Or use system VectorScan with Chimera (requires manual build)
hyperscan-tokio = { version = "0.1", features = ["system", "chimera-ffi"] }
⚠️ Important: Standard Homebrew/package manager installations of VectorScan do NOT include Chimera. See CHIMERA_SETUP.md for detailed setup instructions.
Advanced Features
Worker Pool for Maximum Throughput
let pool = builder
.num_workers
.core_affinity // Pin workers to CPU cores
.build?;
// Process millions of log lines
let jobs: = log_lines.into_iter
.map
.collect;
let results = pool.scan_batch.await?;
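The idea behind batch scanning on a worker pool can be sketched with the standard library alone. This is not the crate's API: `scan_line` and `scan_batch` are hypothetical names, the pattern set is a plain list of byte literals standing in for a compiled database, and core pinning is left out. The sketch only shows how a batch is split across workers while results stay in input order.

```rust
use std::thread;

/// Hypothetical stand-in for a compiled pattern set: returns the ids (indices)
/// of all literals that occur somewhere in `line`.
fn scan_line(patterns: &[&[u8]], line: &[u8]) -> Vec<usize> {
    patterns
        .iter()
        .enumerate()
        .filter(|(_, p)| !p.is_empty() && line.windows(p.len()).any(|w| w == **p))
        .map(|(id, _)| id)
        .collect()
}

/// Scan `lines` on up to `num_workers` threads, preserving input order.
fn scan_batch(patterns: &[&[u8]], lines: &[Vec<u8>], num_workers: usize) -> Vec<Vec<usize>> {
    let n = num_workers.max(1);
    // One contiguous chunk per worker; never zero, since `chunks(0)` panics.
    let chunk_len = ((lines.len() + n - 1) / n).max(1);

    thread::scope(|s| {
        // Spawn every worker first so the chunks are scanned in parallel.
        let handles: Vec<_> = lines
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|line| scan_line(patterns, line))
                        .collect::<Vec<_>>()
                })
            })
            .collect();

        // Join in spawn order so results line up with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let patterns: [&[u8]; 2] = [b"ERROR", b"timeout"];
    let lines = vec![
        b"ok".to_vec(),
        b"ERROR: timeout".to_vec(),
        b"timeout".to_vec(),
    ];
    let results = scan_batch(&patterns, &lines, 2);
    println!("{results:?}");
}
```

Contiguous chunks keep the bookkeeping simple; a real pool would more likely pull jobs from a shared queue so that one slow chunk does not stall the batch.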
Hot-Reloadable Patterns
let reloadable = new;
// In another task, reload patterns without stopping scanning
spawn;
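One common way to get reload-without-downtime semantics is to keep the current database behind an `Arc` and swap the pointer, so in-flight scans finish on the snapshot they started with. The sketch below illustrates that pattern with the standard library only; `PatternDb` and `Reloadable` here are hypothetical types written for this example, and the crate's actual implementation may differ.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical compiled pattern set; stands in for a real database.
struct PatternDb {
    version: u32,
    literals: Vec<Vec<u8>>,
}

impl PatternDb {
    /// True if any literal occurs in `data`.
    fn matches(&self, data: &[u8]) -> bool {
        self.literals
            .iter()
            .any(|p| !p.is_empty() && data.windows(p.len()).any(|w| w == p.as_slice()))
    }
}

/// Holds the current database; the lock is held only long enough to
/// clone or replace the `Arc`, never for the duration of a scan.
struct Reloadable {
    current: RwLock<Arc<PatternDb>>,
}

impl Reloadable {
    fn new(db: PatternDb) -> Self {
        Self { current: RwLock::new(Arc::new(db)) }
    }

    /// Take a snapshot; it stays valid even if a reload happens afterwards.
    fn snapshot(&self) -> Arc<PatternDb> {
        Arc::clone(&*self.current.read().expect("lock poisoned"))
    }

    /// Publish a new database; existing snapshots are unaffected.
    fn reload(&self, db: PatternDb) {
        *self.current.write().expect("lock poisoned") = Arc::new(db);
    }
}

fn main() {
    let shared = Reloadable::new(PatternDb { version: 1, literals: vec![b"foo".to_vec()] });

    let old = shared.snapshot();
    shared.reload(PatternDb { version: 2, literals: vec![b"bar".to_vec()] });
    let new = shared.snapshot();

    // The old snapshot still reflects the patterns it was created with.
    println!("v{} matches bar: {}", old.version, old.matches(b"bar"));
    println!("v{} matches bar: {}", new.version, new.matches(b"bar"));
}
```

The old database is freed once the last snapshot referring to it is dropped, which is what lets a reload proceed without waiting for running scans.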
Streaming Mode
let stream_scanner = new?;
// Scan data as it arrives
let match_stream = stream_scanner
.scan_stream
.await?;
while let Some = match_stream.next.await
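Streaming mode matters because a match can straddle the boundary between two chunks. The following sketch shows the underlying idea for a single literal, using only the standard library: the matcher carries over the last few bytes of the stream so boundary-spanning matches are still found, and it reports end offsets relative to the start of the stream. `StreamMatcher` is a hypothetical type for illustration; a real streaming engine keeps compact automaton state rather than a byte tail.

```rust
/// Finds one literal in a byte stream delivered in arbitrary chunks.
struct StreamMatcher {
    pattern: Vec<u8>,
    /// Last `pattern.len() - 1` bytes seen so far (fewer early in the stream).
    tail: Vec<u8>,
    /// Total number of bytes fed before the current chunk.
    consumed: usize,
}

impl StreamMatcher {
    fn new(pattern: &[u8]) -> Self {
        assert!(!pattern.is_empty(), "pattern must not be empty");
        Self { pattern: pattern.to_vec(), tail: Vec::new(), consumed: 0 }
    }

    /// Feed the next chunk; returns exclusive end offsets of new matches,
    /// measured from the start of the whole stream.
    fn feed(&mut self, chunk: &[u8]) -> Vec<usize> {
        // Prepend the carried-over tail so boundary-spanning matches are visible.
        let mut buf = std::mem::take(&mut self.tail);
        let carried = buf.len();
        buf.extend_from_slice(chunk);

        // Stream offset of buf[0].
        let base = self.consumed - carried;
        let ends = buf
            .windows(self.pattern.len())
            .enumerate()
            .filter(|(_, w)| *w == self.pattern.as_slice())
            .map(|(i, _)| base + i + self.pattern.len())
            .collect();

        // The tail is shorter than the pattern, so no match is reported twice.
        self.consumed += chunk.len();
        let keep = buf.len().min(self.pattern.len() - 1);
        self.tail = buf[buf.len() - keep..].to_vec();
        ends
    }
}

fn main() {
    let mut matcher = StreamMatcher::new(b"needle");
    // The first "needle" is split across the two chunks.
    println!("{:?}", matcher.feed(b"hay nee")); // []
    println!("{:?}", matcher.feed(b"dle hay needle")); // [10, 21]
}
```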
Use Cases
- Log Analysis: Scan millions of log entries per second for security patterns
- Data Loss Prevention: Find sensitive data in real-time streams
- Web Application Firewall: Match attack patterns at line speed
- Content Filtering: High-performance content moderation
- Network Security: Deep packet inspection at 10 Gbps and above
Building from Source
Requires VectorScan development files:
# Ubuntu/Debian
# macOS
# Build
Architecture
hyperscan-tokio
├── hyperscan-tokio-sys/ # Low-level FFI bindings
│ └── Safe wrappers around VectorScan C API
└── src/ # High-level async Rust API
├── scanner.rs # Core scanning functionality
├── database.rs # Pattern compilation
├── worker_pool.rs # Parallel scanning
└── stream.rs # Streaming mode
Benchmarks
Run the benchmarks with cargo bench.
Contributing
Contributions are welcome! Please read our Contributing Guide for details.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
- VectorScan - The underlying regex engine
- Intel Hyperscan - The original project
- The Rust community for an excellent async ecosystem
Built with ❤️ for the Rust community. Making enterprise-grade regex performance accessible to everyone.