fast_whitespace_collapse
A high-performance Rust crate for collapsing consecutive spaces and tabs into a single space.
Uses SIMD (u8x16) via the wide crate for efficient processing.
Automatically falls back to a scalar implementation if SIMD is unavailable.
Features
- Collapses multiple spaces and tabs into a single space.
- Preserves newlines and non-whitespace characters.
- Uses SIMD (u8x16) when supported to process 16 bytes at a time.
- Falls back to a fast scalar implementation if SIMD is unavailable.
- Ensures valid UTF-8 output.
- SIMD requires AVX2, SSE2, or NEON instruction sets.
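The behaviour listed above can be sketched with a plain scalar loop. This is only an illustration of the semantics, not the crate's code: the function name `collapse_spaces_and_tabs` is hypothetical, and the sketch assumes that a lone tab also becomes a space and that leading or trailing whitespace is collapsed rather than trimmed.

```rust
/// Hypothetical stand-in used for illustration; not the crate's implementation.
/// Replaces each run of spaces and/or tabs with one space and leaves every
/// other character, including newlines, untouched.
fn collapse_spaces_and_tabs(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_run = false; // true while inside a run of spaces/tabs

    for ch in input.chars() {
        if ch == ' ' || ch == '\t' {
            // Emit a single space for the first character of the run only.
            if !in_run {
                out.push(' ');
                in_run = true;
            }
        } else {
            out.push(ch);
            in_run = false;
        }
    }
    out
}
```

With this sketch, `"a  \t b"` becomes `"a b"`, while a newline between two runs is kept and resets the run.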
Installation
Add this to your Cargo.toml:
[dependencies]
fast_whitespace_collapse = "0.1"
Or run the following command:
cargo add fast_whitespace_collapse
Controlling SIMD Support
By default, SIMD acceleration is enabled. You can control it via Cargo features:
🔹 Disable SIMD for Embedded Targets
🔹 Explicitly Enable SIMD
Usage
use fast_whitespace_collapse::collapse_whitespace;

let input = "This is \t a test.";
let output = collapse_whitespace(input);
assert_eq!(output, "This is a test.");
Performance
- Processes text using SIMD (u8x16), handling 16 bytes in parallel.
- Falls back to scalar processing when SIMD is unavailable.
- Handles large inputs efficiently while maintaining valid UTF-8 output.
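To make the "16 bytes at a time" idea concrete, here is a scalar sketch that walks the input in 16-byte blocks and copies any block without a space or tab in one step. It does not use the wide crate and is not the crate's SIMD code; `collapse_in_blocks` and `push_collapsed` are hypothetical names, and a real SIMD version would replace the per-block check with a vector comparison.

```rust
/// Illustrative only: a scalar loop arranged in 16-byte blocks.
fn collapse_in_blocks(input: &str) -> String {
    const LANES: usize = 16; // number of bytes in a u8x16 vector
    let bytes = input.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut in_run = false; // carries run state across block boundaries

    let mut chunks = bytes.chunks_exact(LANES);
    for chunk in &mut chunks {
        if !chunk.iter().any(|&b| b == b' ' || b == b'\t') {
            // Fast path: nothing to collapse, copy the whole block.
            out.extend_from_slice(chunk);
            in_run = false;
        } else {
            push_collapsed(chunk, &mut out, &mut in_run);
        }
    }
    // The final 0..15 bytes that did not fill a block.
    push_collapsed(chunks.remainder(), &mut out, &mut in_run);

    // Only ASCII bytes (0x20, 0x09) were dropped or replaced, so multi-byte
    // sequences are copied intact and the result is still valid UTF-8.
    String::from_utf8(out).expect("output remains valid UTF-8")
}

/// Byte-by-byte collapse used for mixed blocks and the tail.
fn push_collapsed(bytes: &[u8], out: &mut Vec<u8>, in_run: &mut bool) {
    for &b in bytes {
        if b == b' ' || b == b'\t' {
            if !*in_run {
                out.push(b' ');
                *in_run = true;
            }
        } else {
            out.push(b);
            *in_run = false;
        }
    }
}
```

Keeping `in_run` outside the block loop is what lets a run of whitespace that straddles two blocks still collapse to one space.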
Benchmark Results
Comparison with Other Approaches
| Method | Time |
|---|---|
| Regex approach | 11.289 µs |
| collapse crate | 1.2624 µs |
| Iterative approach | 629.60 ns |
| Iterative bytes | 428.00 ns |
| fast_whitespace_collapse crate | 388.73 ns |
fast_whitespace_collapse outperforms other methods, achieving the lowest execution time.
Benchmark executed on Apple M1 Pro (NEON SIMD enabled).
🔹 Run Your Own Benchmark
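One way to get a rough timing on your own machine, without a benchmark framework, is a small loop around `std::time::Instant`. The helper below is a hypothetical sketch, not the harness that produced the table above: it does no warm-up or statistical analysis, and it assumes the function under test takes a `&str` and returns a `String`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Average wall-clock time per call of `f` over `iterations` runs.
/// `iterations` must be greater than zero.
fn average_time<F: FnMut(&str) -> String>(input: &str, iterations: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the call or its input.
        black_box(f(black_box(input)));
    }
    start.elapsed() / iterations
}
```

Any collapsing function with a matching signature can be passed as `f`, so the same loop can time several approaches on the same input. Build in release mode before comparing numbers.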
Compatibility
fast_whitespace_collapse supports multiple architectures:
- x86_64: Uses SIMD (SSE2, AVX2) for maximum performance.
- ARM (aarch64, M1/M2/M3): Uses NEON SIMD.
- Other: Falls back to a scalar implementation.
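As a quick check of which row of this list applies to a build target, the mapping can be written with the standard `cfg!` macro. This only mirrors the list above; the function name `expected_backend` is hypothetical, and the crate's real selection logic (through the wide crate and its Cargo features) may differ.

```rust
/// Returns the SIMD family the compatibility list associates with the
/// architecture this code is compiled for.
fn expected_backend() -> &'static str {
    if cfg!(target_arch = "x86_64") {
        "SSE2/AVX2"
    } else if cfg!(target_arch = "aarch64") {
        "NEON"
    } else {
        "scalar"
    }
}
```

`cfg!` is evaluated at compile time, so the answer reflects the target the binary was built for, not the features of the CPU it later runs on.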
Examples
Basic Usage
use fast_whitespace_collapse::collapse_whitespace;
assert_eq!;
assert_eq!;
assert_eq!;
Unicode Support
assert_eq!; // Japanese
assert_eq!; // Chinese
assert_eq!; // Emojis
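Byte-level processing is safe for text like this because of a property of UTF-8: the bytes 0x20 (space) and 0x09 (tab) never occur inside a multi-byte sequence. The small check below illustrates that property. The function name is hypothetical and is not part of the crate.

```rust
/// True when every space/tab byte in `text` is a real space/tab character,
/// i.e. scanning bytes finds exactly the same matches as scanning chars.
fn byte_matches_equal_char_matches(text: &str) -> bool {
    let byte_hits = text.bytes().filter(|&b| b == b' ' || b == b'\t').count();
    let char_hits = text.chars().filter(|&c| c == ' ' || c == '\t').count();
    byte_hits == char_hits
}
```

Since the two counts always agree for valid UTF-8, dropping or replacing those bytes cannot split a Japanese, Chinese, or emoji character.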
Handling Newlines
assert_eq!;
Tests
Run tests with:
cargo test
License
This project is licensed under the MIT License.