base122-fast
A high-performance Base122 implementation in Rust.
Base122 is a binary-to-text encoding scheme designed to be significantly more space-efficient than Base64. It incurs only ~14% overhead (compared to Base64's 33%) while remaining valid UTF-8.
Performance
This crate is built for throughput and relies on several low-level optimizations:
- SWAR (SIMD Within A Register): Processes 64-bit words using bitwise masks to detect illegal characters across multiple bytes simultaneously, minimizing per-byte overhead.
- Branchless Fast-Paths: Bypasses escape-character logic for segments that contain no illegal bytes.
- Zero-Copy Strategy: Uses direct pointer arithmetic and pre-allocated buffers to minimize heap allocations.
- Unsafe Intrinsics: Uses `unsafe` Rust for unchecked memory access and optimized bit manipulation.
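The SWAR idea can be sketched in safe Rust. This is a minimal illustration, not the crate's actual code: the names `has_illegal_byte`, `LO`, `HI`, and `ILLEGAL` are hypothetical, and the sketch assumes eight candidate output bytes have already been packed into one `u64`. It uses the classic "does this word contain a zero byte" bit trick, applied once per illegal value.

```rust
// Hypothetical helper names; a sketch of SWAR detection, not the crate's code.
const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane

// The six illegal ASCII codes listed in this README.
const ILLEGAL: [u8; 6] = [0x00, 0x0A, 0x0D, 0x22, 0x26, 0x5C];

/// Returns true if any of the eight bytes in `word` is an illegal code.
fn has_illegal_byte(word: u64) -> bool {
    let mut hit = 0u64;
    for &b in &ILLEGAL {
        // XOR with the broadcast value turns every matching byte into 0x00.
        let x = word ^ (LO * b as u64);
        // Classic zero-byte test: the result is non-zero iff some byte of x is zero.
        hit |= x.wrapping_sub(LO) & !x & HI;
    }
    hit != 0
}
```

A caller could load eight bytes with `u64::from_le_bytes`, take the fast path when `has_illegal_byte` returns `false`, and fall back to per-byte handling otherwise.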
Benchmarks
The following throughput was measured on uniform random binary data using an AMD Ryzen 5 5600 (single core).
| Data Size | Encode (MiB/s) | Decode (MiB/s) |
|---|---|---|
| 16 B | 331 | 318 |
| 64 B | 672 | 626 |
| 1 KiB | 1135 | 1070 |
| 64 KiB | 1089 | 871 |
| 1 MiB | 518 | 573 |
| 16 MiB | 533 | 597 |
For large payloads (≥1 MiB), the implementation sustains roughly 518-533 MiB/s encoding and 573-597 MiB/s decoding, which corresponds to about 4.3-4.5 Gbit/s and 4.8-5.0 Gbit/s respectively.
Note: Run `cargo bench` to reproduce these results. The benchmarks evaluate encoding, decoding, and round-trip integrity on random byte streams.
Quick Start
Add to Cargo.toml:
[]
= "0.1"
Encode
use encode;
let data = b"hello world";
let encoded = encode;
println!;
Decode
use ;
let data = b"hello world";
let encoded = encode;
let decoded = decode.expect;
assert_eq!;
Implementation Details
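Base122 splits binary data into 7-bit groups and writes each group as a single-byte UTF-8 character where possible. Six of the 128 possible values (\x00, \x0A, \x0D, \x22, \x26, \x5C) are considered illegal and are handled via an escape mechanism, which leaves 122 values that can be written directly.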
Encoding Logic
- Read 7-byte input chunks.
- Split chunks into eight 7-bit groups.
- Write groups directly if they are "safe."
- If a group collides with an illegal byte, it triggers a two-byte escape sequence.
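The chunking steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names `split_chunk` and `is_illegal` are hypothetical, and the most-significant-bit-first ordering of the groups is an assumption rather than something this README specifies. The escape encoding itself is left out because its byte layout is not described here.

```rust
// Hypothetical names; bit order (MSB first) is an assumption for illustration.

/// Splits one 7-byte chunk (56 bits) into eight 7-bit groups.
fn split_chunk(chunk: [u8; 7]) -> [u8; 8] {
    // Pack the 7 bytes into the low 56 bits of a u64, first byte highest.
    let mut bits: u64 = 0;
    for &b in &chunk {
        bits = (bits << 8) | (b as u64);
    }
    // Peel off eight 7-bit groups, starting from the top of the 56 bits.
    let mut groups = [0u8; 8];
    for (i, g) in groups.iter_mut().enumerate() {
        let shift = 7 * (7 - i);
        *g = ((bits >> shift) & 0x7F) as u8;
    }
    groups
}

/// True if a 7-bit group collides with one of the six illegal codes
/// and would therefore need the two-byte escape sequence.
fn is_illegal(group: u8) -> bool {
    matches!(group, 0x00 | 0x0A | 0x0D | 0x22 | 0x26 | 0x5C)
}
```

For a chunk of seven `0xFF` bytes every group is `0x7F` and can be written directly, while an all-zero chunk produces eight groups that all need escaping.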
Optimization Strategy
High throughput comes from processing input in 64-bit chunks. When a chunk contains no illegal bytes, a branchless fast-path is taken. Because the implementation uses unsafe pointer arithmetic and pre-calculates output capacities, the hot loop avoids bounds checks and reallocations.
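One way to pre-calculate an output capacity is sketched below. It is a conservative bound derived only from the rules stated in this README (each 7-bit group becomes either one byte or a two-byte escape), not the crate's actual formula, which may well be tighter. The names `group_count`, `worst_case_capacity`, and `alloc_output` are hypothetical.

```rust
// Hypothetical helpers; a conservative sketch, not the crate's real sizing logic.

/// Number of 7-bit groups needed to hold `input_len` bytes: ceil(8 * len / 7).
/// Overflow for extremely large inputs is ignored in this sketch.
fn group_count(input_len: usize) -> usize {
    (input_len * 8 + 6) / 7
}

/// Upper bound on encoded size, assuming every group could need
/// a two-byte escape sequence.
fn worst_case_capacity(input_len: usize) -> usize {
    2 * group_count(input_len)
}

/// Allocates the output buffer once so the encode loop never reallocates.
fn alloc_output(input_len: usize) -> Vec<u8> {
    Vec::with_capacity(worst_case_capacity(input_len))
}
```

With no escapes the output is `group_count(len)` bytes, which is where the roughly 14% overhead (8/7) comes from; for an 11-byte input that is 13 bytes, with a worst-case reservation of 26.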
License
This project is licensed under the MIT License.