pub fn preprocess(
input: &[u8],
limits: &Limits,
) -> HedlResult<PreprocessedInput>
Preprocess raw input bytes into lines.
This handles:
- UTF-8 validation
- BOM skipping
- CRLF normalization
- Bare CR rejection
- Control character validation (SIMD-optimized)
- Size and line length limits
- Line boundary detection (SIMD-accelerated with memchr)
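The steps listed above can be sketched with the standard library alone. This is an illustrative approximation, not the crate's implementation: `SketchError`, `SketchLimits`, and `preprocess_sketch` are hypothetical names, the real `Limits` fields and error variants are not shown here, and the order of checks and the choice to permit tabs are assumptions.

```rust
/// Hypothetical error type for this sketch (not the crate's real error type).
#[derive(Debug, PartialEq)]
enum SketchError {
    TooLarge,
    InvalidUtf8,
    BareCr,
    ControlChar,
    LineTooLong,
}

/// Hypothetical limits; the fields of the real `Limits` are assumed.
struct SketchLimits {
    max_bytes: usize,
    max_line_len: usize,
}

/// UTF-8 byte order mark.
const BOM: &[u8] = &[0xEF, 0xBB, 0xBF];

fn preprocess_sketch(
    input: &[u8],
    limits: &SketchLimits,
) -> Result<Vec<String>, SketchError> {
    // Size limit first, so oversized input is rejected before any scanning.
    if input.len() > limits.max_bytes {
        return Err(SketchError::TooLarge);
    }
    // BOM skipping.
    let input = input.strip_prefix(BOM).unwrap_or(input);
    // UTF-8 validation.
    let text = std::str::from_utf8(input).map_err(|_| SketchError::InvalidUtf8)?;

    let mut lines = Vec::new();
    // Line boundary detection: each piece keeps its '\n' terminator, if any.
    for raw in text.split_inclusive('\n') {
        // CRLF normalization: strip "\r\n" or a lone "\n" terminator.
        let line = match raw.strip_suffix("\r\n") {
            Some(l) => l,
            None => raw.strip_suffix('\n').unwrap_or(raw),
        };
        // Bare CR rejection: any CR still present was not part of a CRLF pair.
        if line.contains('\r') {
            return Err(SketchError::BareCr);
        }
        // Control character validation (tab is allowed here by assumption).
        if line.chars().any(|c| c.is_control() && c != '\t') {
            return Err(SketchError::ControlChar);
        }
        // Line length limit, measured in bytes.
        if line.len() > limits.max_line_len {
            return Err(SketchError::LineTooLong);
        }
        lines.push(line.to_string());
    }
    Ok(lines)
}
```

Checking for a bare CR only after stripping the terminator means a trailing `\r` with no following `\n` is rejected instead of being silently treated as a line ending.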
Performance
Uses SIMD-accelerated newline scanning via memchr for 4-20x
faster preprocessing on large files (> 1 MB).
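To make the newline-scanning step concrete, here is a minimal scalar sketch of collecting line boundaries from a byte buffer. The function name `line_ranges` is hypothetical and this is not the crate's code. With the `memchr` crate available, the byte-by-byte loop could be replaced by `memchr::memchr_iter(b'\n', input)`, which yields the same newline offsets using SIMD where the platform supports it.

```rust
use std::ops::Range;

/// Returns the byte range of each line, excluding the '\n' terminator.
/// Hypothetical helper for illustration only.
fn line_ranges(input: &[u8]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start = 0;
    // Scalar stand-in for a SIMD-accelerated newline search.
    for (i, &b) in input.iter().enumerate() {
        if b == b'\n' {
            ranges.push(start..i);
            start = i + 1;
        }
    }
    // A final line without a trailing newline still counts as a line.
    if start < input.len() {
        ranges.push(start..input.len());
    }
    ranges
}
```

Returning ranges instead of copied strings keeps the scan allocation-light, which matters most on the large inputs the performance note refers to.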