precursor 0.2.3

Pre-protocol payload tagging, similarity clustering, and packet/firmware triage CLI.

# Regex Acceleration and Offload

Last updated: February 14, 2026

## Goal

Increase pattern matching throughput for high-volume payload streams while keeping
Precursor output semantics unchanged.

## Practical options

### 1) CPU SIMD acceleration (recommended first)

- **Hyperscan** and its portable fork **Vectorscan** provide high-throughput, SIMD-accelerated matching of large regex sets.
- Best fit as an optional prefilter or alternate regex engine for compatible patterns.
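- Important caveat: these engines intentionally support only a subset of PCRE syntax. Unsupported
  constructs include backreferences and capturing sub-expressions, arbitrary lookaround assertions,
  recursion, conditional patterns, atomic groups, and possessive quantifiers.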
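Because of that caveat, patterns have to be sorted into an accelerated subset and a PCRE2 fallback set before compilation. A minimal Python sketch of such a check, assuming patterns arrive as PCRE-style source strings. `UNSUPPORTED_CONSTRUCTS`, `classify`, and `partition` are hypothetical names, not Precursor code. The raw-text scan is a heuristic: it can misfire on escaped characters or character classes, but every misfire sends a pattern to PCRE2, which is the safe direction. A production path would rely on the accelerated engine's own compile errors instead.

```python
import re
from typing import Dict, List, Tuple

# Heuristic markers for constructs the Hyperscan developer reference lists as
# unsupported. Raw-text matching can over-report (e.g. inside character
# classes or after an escaped backslash); over-reporting only routes a
# pattern to PCRE2, so errors land on the safe side.
UNSUPPORTED_CONSTRUCTS = {
    "backreference": re.compile(r"\\[1-9]|\\g|\\k[<{']|\(\?P="),
    "lookaround": re.compile(r"\(\?<?[=!]"),
    "conditional": re.compile(r"\(\?\("),
    "recursion_or_subroutine": re.compile(r"\(\?(?:R|[+-]?\d|&|P>)"),
    "atomic_group": re.compile(r"\(\?>"),
    "possessive_quantifier": re.compile(r"[*+?}]\+"),
    "match_reset": re.compile(r"\\K"),
}


def classify(pattern: str) -> List[str]:
    """Names of suspected-unsupported constructs; empty means 'try accelerated'."""
    return sorted(name for name, rx in UNSUPPORTED_CONSTRUCTS.items()
                  if rx.search(pattern))


def partition(patterns: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split patterns into an accelerated subset and a PCRE2 fallback map
    (pattern -> reasons), preserving input order."""
    accelerated: List[str] = []
    fallback: Dict[str, List[str]] = {}
    for pattern in patterns:
        reasons = classify(pattern)
        if reasons:
            fallback[pattern] = reasons
        else:
            accelerated.append(pattern)
    return accelerated, fallback
```

For example, `partition([r"GET /admin", r"(\w+)=\1", r"foo(?=bar)"])` keeps `GET /admin` on the accelerated side and routes the other two patterns to PCRE2 with `backreference` and `lookaround` as the recorded reasons.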

### 2) NIC/DPU/GPU-style regex offload (longer-term)

- DPDK `rte_regexdev` provides an abstraction for hardware regex acceleration devices.
- This path is feasible but requires device-specific integration and adds deployment complexity.
- Some earlier offload products have an uncertain lifecycle (see the NVIDIA BlueField discussion
  under References); validate long-term vendor support before committing to a production architecture.

## Suggested implementation plan

1. Add a regex engine abstraction in code (`pcre2` default, accelerated engine optional).
   - Implemented scaffold: `--regex-engine pcre2|vectorscan`.
   - Currently, `vectorscan` mode emits compatibility diagnostics and still executes matching through the PCRE2 fallback path.
2. Start with a safe compatibility subset:
   - compile simple, wildcard, and Sigma-generated patterns into the accelerated engine
   - fall back to PCRE2 for unsupported patterns
3. Add CI benchmarks that compare:
   - `pcre2` baseline
   - accelerated mode
   - mixed compatibility fallback mode
4. Expose engine selection in the CLI:
   - `--regex-engine pcre2|vectorscan` (flag scaffolded in step 1)
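The engine abstraction in step 1 and the current `vectorscan` scaffold behavior could be modeled roughly as below. This is an illustrative Python sketch, not Precursor's implementation: Python's `re` stands in for PCRE2, and `Match`, `Pcre2Engine`, `VectorscanScaffold`, `make_engine`, the `is_compatible` callback, and the diagnostic text are all hypothetical. A shared `Match` record keeps downstream output independent of the engine. Note that Hyperscan reports only match end offsets unless start-of-match tracking (`HS_FLAG_SOM_LEFTMOST`) is requested, so a real accelerated engine may need that flag or a PCRE2 confirm pass to populate `start`.

```python
import re
import sys
from dataclasses import dataclass
from typing import Callable, List, Optional, TextIO


@dataclass(frozen=True)
class Match:
    """Engine-independent match record (hypothetical shape)."""
    pattern_id: int
    start: int
    end: int


class Pcre2Engine:
    """Baseline engine. Python's `re` stands in for PCRE2 in this sketch."""

    name = "pcre2"

    def __init__(self, patterns: List[str]) -> None:
        # Compile as bytes patterns so they can scan raw payload bytes.
        self._compiled = [re.compile(p.encode()) for p in patterns]

    def scan(self, payload: bytes) -> List[Match]:
        # Results are grouped by pattern ID, then by position within each pattern.
        return [
            Match(pid, m.start(), m.end())
            for pid, rx in enumerate(self._compiled)
            for m in rx.finditer(payload)
        ]


class VectorscanScaffold:
    """Models the current scaffold: report patterns that would not compile
    under Vectorscan, then run every pattern on the PCRE2 fallback path."""

    name = "vectorscan"

    def __init__(self, patterns: List[str],
                 is_compatible: Callable[[str], bool], diag: TextIO) -> None:
        for pid, pattern in enumerate(patterns):
            if not is_compatible(pattern):
                print(f"regex-engine: pattern {pid} is not Vectorscan-compatible; "
                      "falling back to PCRE2", file=diag)
        self._fallback = Pcre2Engine(patterns)

    def scan(self, payload: bytes) -> List[Match]:
        return self._fallback.scan(payload)


def make_engine(name: str, patterns: List[str],
                is_compatible: Callable[[str], bool] = lambda p: True,
                diag: Optional[TextIO] = None):
    """Dispatch on a `--regex-engine`-style value."""
    if name == "pcre2":
        return Pcre2Engine(patterns)
    if name == "vectorscan":
        sink = diag if diag is not None else sys.stderr
        return VectorscanScaffold(patterns, is_compatible, sink)
    raise ValueError(f"unknown regex engine: {name!r}")
```

A later mixed mode would replace the blanket fallback with a split: compatible patterns compiled into Vectorscan, the rest kept on PCRE2, with pattern IDs preserved so both result lists merge into one.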

## Fit with Precursor

Accelerated regex matching aligns well with pre-protocol triage workloads, which combine:

- broad, high-recall pattern sets
- high packet/log volume
- need for deterministic JSON output contracts

Similarity hashing and protocol inference stages can remain unchanged while regex
front-end throughput is improved.
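Because the goal is to keep output semantics unchanged, the CI comparison in step 3 of the plan could gate on output equivalence as well as speed. A minimal Python sketch, assuming each engine mode is exposed as a function from payload bytes to `(pattern_id, start, end)` tuples. `canonical_json`, `find_mismatches`, `time_scan`, the two stand-in scanners, and the JSON field names are hypothetical and do not describe Precursor's actual output schema. Canonicalizing before comparison means a mode that emits the same matches in a different order still agrees with the baseline.

```python
import json
import re
import time
from typing import Callable, Iterable, List, Tuple

Hit = Tuple[int, int, int]  # (pattern_id, start, end)
Scanner = Callable[[bytes], List[Hit]]

PATTERNS = [re.compile(rb"ELF"), re.compile(rb"GET /\S*")]


def canonical_json(hits: Iterable[Hit]) -> str:
    """Deterministic JSON: hits sorted, keys sorted, compact separators."""
    records = [{"pattern_id": p, "start": s, "end": e} for p, s, e in sorted(hits)]
    return json.dumps(records, sort_keys=True, separators=(",", ":"))


def baseline_scan(payload: bytes) -> List[Hit]:
    # Stand-in for the pcre2 baseline mode.
    return [(pid, m.start(), m.end())
            for pid, rx in enumerate(PATTERNS)
            for m in rx.finditer(payload)]


def reordered_scan(payload: bytes) -> List[Hit]:
    # Stand-in for a candidate mode that finds the same hits but emits them
    # in a different order.
    return list(reversed(baseline_scan(payload)))


def find_mismatches(baseline: Scanner, candidate: Scanner,
                    corpus: List[bytes]) -> List[int]:
    """Indices of corpus payloads whose canonical output differs."""
    return [i for i, payload in enumerate(corpus)
            if canonical_json(baseline(payload)) != canonical_json(candidate(payload))]


def time_scan(scan: Scanner, corpus: List[bytes], repeat: int = 5) -> float:
    """Best-of-`repeat` wall-clock seconds for one pass over the corpus."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        for payload in corpus:
            scan(payload)
        best = min(best, time.perf_counter() - t0)
    return best
```

A CI job would fail on any non-empty `find_mismatches` result and record `time_scan` figures for the baseline, accelerated, and mixed modes side by side.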

## References

- Hyperscan developer reference (PCRE subset and unsupported constructs):
  - https://intel.github.io/hyperscan/dev-reference/compilation.html
- Vectorscan project README (portable Hyperscan fork and architecture support):
  - https://github.com/VectorCamp/vectorscan
- DPDK regex device API (hardware regex abstraction layer):
  - https://doc.dpdk.org/guides/prog_guide/regexdev.html
- NVIDIA BlueField DOCA RegEx lifecycle discussion:
  - https://forums.developer.nvidia.com/t/bluefield-and-regex-support/303845