base122-fast
Base122 is a binary-to-text encoding scheme designed to be significantly more space-efficient than Base64. It incurs only ~14% overhead (compared to Base64's 33%) while remaining valid UTF-8.
base122-fast is a high-performance implementation optimized for modern CPUs, featuring no_std support.
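The overhead figures above follow from how many payload bits each output byte carries. The sketch below assumes Base122 carries 7 payload bits per output byte and Base64 carries 6 (with padding to a multiple of 4). The function names are hypothetical and not part of this crate's API, and the Base122 figure is a lower bound because the last group may need an extra byte.

```rust
/// Base64 output length with padding: 4 output bytes per 3 input bytes.
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Lower bound on Base122 output length, assuming 7 payload bits
/// per output byte (hypothetical helper, not the crate's API).
fn base122_min_len(n: usize) -> usize {
    (n * 8 + 6) / 7
}

/// Relative size overhead of an encoding, in percent.
fn overhead_percent(input: usize, output: usize) -> f64 {
    (output as f64 / input as f64 - 1.0) * 100.0
}
```

For a 1,000,000-byte input this gives about 14.3% overhead for Base122 and 33.3% for Base64.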
Key Features
- High Throughput: Multi-Gbps encoding and decoding on a single core (see Performance).
- no_std: Suitable for embedded systems and WASM (requires alloc).
- SIMD Within A Register (SWAR): Processes multiple bytes at a time in 64-bit registers.
- Safety: While leveraging unsafe for performance, it is rigorously tested for round-trip integrity.
Performance
The following throughput was measured on high-entropy binary data using an AMD Ryzen 5 5600 (single core).
| Data Size | Encode (Throughput) | Decode (Throughput) |
|---|---|---|
| 16 B | 301.9 MiB/s | 335.9 MiB/s |
| 64 B | 883.6 MiB/s | 834.2 MiB/s |
| 1 KiB | 2.15 GiB/s | 1.63 GiB/s |
| 64 KiB | 2.40 GiB/s | 1.64 GiB/s |
| 1 MiB | 814.6 MiB/s | 678.9 MiB/s |
For large payloads (≥1 MiB), the implementation sustains approximately 6.8 Gbps encoding and 5.7 Gbps decoding (814.6 MiB/s and 678.9 MiB/s in the table above).
Note: Run cargo bench to reproduce these results. The benchmarks evaluate encoding, decoding, and round-trip integrity on random byte streams.
Quick Start
Add to Cargo.toml:
[dependencies]
base122-fast = "0.1"
Usage
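use base122_fast::{decode, encode};
// Assumes encode takes a byte slice and decode takes the encoded string by reference.
// Encoding
let data = b"hello world";
let encoded_str = encode(data);
// Decoding
let decoded_vec = decode(&encoded_str).expect("valid Base122 input");
assert_eq!(decoded_vec, data);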
Implementation Details
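Base122 emits 7 bits of input per output byte where possible, using 122 of the 128 single-byte UTF-8 (ASCII) values. The remaining six codes (\x00, \x0A, \x0D, \x22, \x26, \x5C: NUL, line feed, carriage return, double quote, ampersand, backslash) are treated as illegal and are handled via an escape mechanism.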
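To make the mapping concrete, here is a minimal sketch of the first step: splitting the input into 7-bit groups and flagging the ones that would need escaping. It assumes most-significant-bit-first ordering and zero padding of the final group. The names are hypothetical, and this is not the crate's actual (optimized) code path.

```rust
/// The six 7-bit values that may not appear as a plain output byte.
const ILLEGAL: [u8; 6] = [0x00, 0x0A, 0x0D, 0x22, 0x26, 0x5C];

/// Returns true if a 7-bit value needs the escape mechanism.
fn is_illegal(v: u8) -> bool {
    ILLEGAL.contains(&v)
}

/// Split `data` into 7-bit groups, most significant bit first
/// (assumed bit order). The final group is zero-padded on the right.
fn seven_bit_groups(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity((data.len() * 8 + 6) / 7);
    let mut acc: u32 = 0; // bit accumulator
    let mut nbits: u32 = 0; // number of unconsumed bits in `acc`
    for &byte in data {
        acc = (acc << 8) | byte as u32;
        nbits += 8;
        while nbits >= 7 {
            nbits -= 7;
            out.push(((acc >> nbits) & 0x7F) as u8);
        }
        acc &= (1 << nbits) - 1; // keep only the unconsumed bits
    }
    if nbits > 0 {
        out.push(((acc << (7 - nbits)) & 0x7F) as u8);
    }
    out
}
```

Of the 128 possible 7-bit values, `is_illegal` rejects six, which leaves the 122 that give the encoding its name.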
Optimization Strategy
High throughput is achieved by processing 64-bit chunks. When a chunk contains no illegal bytes, a branchless fast-path is taken. By utilizing unsafe pointer arithmetic and pre-calculating output capacities, the hot loop avoids bounds checking and reallocations.
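The fast-path test described above can be sketched with a classic SWAR trick: XOR the word with a byte broadcast to all eight lanes, then use the "has zero byte" bit pattern to see whether any lane matched. This is an illustrative sketch with hypothetical names, assuming each byte lane of the 64-bit word holds one candidate output value. The crate's real hot loop may be organized differently.

```rust
const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane
const ILLEGAL: [u8; 6] = [0x00, 0x0A, 0x0D, 0x22, 0x26, 0x5C];

/// Returns true if any byte lane of `word` equals one of the six
/// illegal codes. The six checks are OR-ed together, so no branch
/// depends on the data until the final comparison.
#[inline]
fn has_illegal_byte(word: u64) -> bool {
    let mut hit: u64 = 0;
    for &b in &ILLEGAL {
        // Lanes equal to `b` become 0x00 after the XOR.
        let x = word ^ (LO * b as u64);
        // Non-zero exactly when some lane of `x` is zero.
        hit |= x.wrapping_sub(LO) & !x & HI;
    }
    hit != 0
}
```

A caller would take the fast path when `has_illegal_byte` returns false and fall back to per-byte escaping otherwise.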
License
MIT License.