base122-fast 0.1.0

High-performance Base122 encoding (4+ Gbps) with lower overhead (~14%) than Base64.

base122-fast

A high-performance Base122 implementation in Rust.

Base122 is a binary-to-text encoding scheme designed to be significantly more space-efficient than Base64. It incurs only ~14% overhead (compared to Base64's 33%) while remaining valid UTF-8.
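The overhead figures follow from how many payload bits each output byte carries: 7 for Base122 and 6 for Base64. A small sketch of that arithmetic (the helper name `overhead_percent` is illustrative and not part of this crate):

```rust
/// Size overhead, in percent, of an encoding that emits `out_bits`
/// for every `in_bits` of payload. Illustrative helper only.
fn overhead_percent(in_bits: f64, out_bits: f64) -> f64 {
    (out_bits / in_bits - 1.0) * 100.0
}

fn main() {
    // Base122: 7 payload bits per 8-bit output byte.
    println!("Base122: {:.1}%", overhead_percent(7.0, 8.0));
    // Base64: 6 payload bits per 8-bit output character.
    println!("Base64:  {:.1}%", overhead_percent(6.0, 8.0));
}
```

This ignores end-of-input padding, which matters only for very short payloads.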

Performance

This crate is engineered for high throughput and relies on several low-level optimizations:

  • SWAR (SIMD Within A Register): Processes 64-bit words using bitwise masks to detect illegal characters in all eight bytes at once, minimizing per-byte overhead.
  • Branchless Fast-Paths: Bypasses escape-character logic for segments that need no escaping.
  • Zero-Copy Strategy: Uses direct pointer arithmetic and pre-allocated buffers to minimize heap allocations.
  • Unsafe Intrinsics: Uses unsafe Rust for unchecked memory access and optimized bit manipulation.
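To illustrate the SWAR idea, here is a minimal sketch of how eight bytes packed into a `u64` can be tested for illegal values without a per-byte loop. It uses the well-known "has zero byte" bit trick; the names `has_zero_byte` and `has_illegal_byte` are hypothetical, and the crate's actual masks and code may differ.

```rust
const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane

/// The six values Base122 treats as illegal.
const ILLEGAL: [u8; 6] = [0x00, 0x0A, 0x0D, 0x22, 0x26, 0x5C];

/// True if at least one of the eight byte lanes of `v` is zero.
fn has_zero_byte(v: u64) -> bool {
    (v.wrapping_sub(LO) & !v & HI) != 0
}

/// True if any byte lane of `word` equals one of the illegal values.
fn has_illegal_byte(word: u64) -> bool {
    let mut found = false;
    for &c in &ILLEGAL {
        // Broadcast `c` to all lanes; XOR turns matching lanes into zero.
        found |= has_zero_byte(word ^ (LO * c as u64));
    }
    found
}

fn main() {
    let clean = u64::from_le_bytes(*b"abcdefgh");
    let dirty = u64::from_le_bytes(*b"abc\"efgh"); // contains 0x22
    println!("{} {}", has_illegal_byte(clean), has_illegal_byte(dirty));
}
```

The loop over the six constants has a fixed trip count, so a compiler can unroll it into straight-line bitwise operations.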

Benchmarks

The following throughput was measured on uniform random binary data using an AMD Ryzen 5 5600 (single core).

Data Size   Encode (MiB/s)   Decode (MiB/s)
16 B        331              318
64 B        672              626
1 KiB       1135             1070
64 KiB      1089             871
1 MiB       518              573
16 MiB      533              597

For large payloads (≥1 MiB), the implementation sustains approximately 4.3 Gbps encoding and 4.8 Gbps decoding (518 and 573 MiB/s at 1 MiB).
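For reference, the conversion from the table's units to Gbps can be sketched as follows. It assumes binary mebibytes (2^20 bytes) and decimal gigabits (10^9 bits); if the figures are read as decimal megabytes instead, the results come out about 5% lower. The function name is illustrative.

```rust
/// Converts MiB/s (2^20 bytes per second) to Gbps (10^9 bits per second).
fn mib_per_s_to_gbps(mib_per_s: f64) -> f64 {
    mib_per_s * 1_048_576.0 * 8.0 / 1e9
}

fn main() {
    // Large-payload rows from the benchmark table.
    for (label, enc, dec) in [("1 MiB", 518.0, 573.0), ("16 MiB", 533.0, 597.0)] {
        println!(
            "{label}: encode {:.2} Gbps, decode {:.2} Gbps",
            mib_per_s_to_gbps(enc),
            mib_per_s_to_gbps(dec)
        );
    }
}
```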

Note: Run cargo bench to reproduce these results. The benchmarks evaluate encoding, decoding, and round-trip integrity on random byte streams.

Quick Start

Add to Cargo.toml:

[dependencies]
base122-fast = "0.1"

Encode

use base122_fast::encode;

let data = b"hello world";
let encoded = encode(data);
println!("{}", encoded);

Decode

use base122_fast::{encode, decode};

let data = b"hello world";
let encoded = encode(data);
let decoded = decode(&encoded).expect("decoding failed");
assert_eq!(decoded, data);

Implementation Details

Base122 represents binary data using 122 of the 128 single-byte UTF-8 (ASCII) code points. The remaining six (\x00 NUL, \x0A line feed, \x0D carriage return, \x22 double quote, \x26 ampersand, \x5C backslash) are considered illegal and are handled via an escape mechanism.

Encoding Logic

  1. Read 7-byte input chunks.
  2. Split chunks into eight 7-bit groups.
  3. Write each group directly as a single byte if it is "safe," i.e. not one of the six illegal values.
  4. If a group collides with an illegal byte, emit a two-byte escape sequence instead.
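The steps above can be sketched as a plain, unoptimized reference encoder. This is an illustration only: the escape layout (a two-byte UTF-8 sequence `110sss1x 10xxxxxx`, where `sss` is the index of the illegal value and the `x` bits carry the following 7-bit group) is modeled on the original Base122 proposal and is an assumption about this crate. The names `BitReader`, `encode_reference`, and `SHORTENED` are hypothetical.

```rust
/// The six 7-bit values that must be escaped.
const ILLEGAL: [u8; 6] = [0x00, 0x0A, 0x0D, 0x22, 0x26, 0x5C];
/// Tag used instead of an illegal index when the escaped group is the last one.
const SHORTENED: u8 = 0b111;

/// Reads 7-bit groups MSB-first, zero-padding the final group.
struct BitReader<'a> {
    data: &'a [u8],
    bit_pos: usize,
}

impl<'a> BitReader<'a> {
    fn next7(&mut self) -> Option<u8> {
        let total = self.data.len() * 8;
        if self.bit_pos >= total {
            return None;
        }
        let mut group = 0u8;
        for pos in self.bit_pos..self.bit_pos + 7 {
            let bit = if pos < total {
                (self.data[pos / 8] >> (7 - pos % 8)) & 1
            } else {
                0 // padding past the end of the input
            };
            group = (group << 1) | bit;
        }
        self.bit_pos += 7;
        Some(group)
    }
}

/// Scalar reference encoder under the assumed escape layout.
fn encode_reference(data: &[u8]) -> Vec<u8> {
    let mut reader = BitReader { data, bit_pos: 0 };
    let mut out = Vec::with_capacity(data.len() * 8 / 7 + 2);
    while let Some(group) = reader.next7() {
        match ILLEGAL.iter().position(|&c| c == group) {
            // Safe group: emit it as a single ASCII byte.
            None => out.push(group),
            // Illegal group: fold it and the next group into two bytes.
            Some(index) => {
                let (tag, payload) = match reader.next7() {
                    Some(next) => (index as u8, next),
                    None => (SHORTENED, group), // nothing follows
                };
                out.push(0b1100_0010 | (tag << 2) | (payload >> 6));
                out.push(0b1000_0000 | (payload & 0b0011_1111));
            }
        }
    }
    out
}

fn main() {
    let encoded = encode_reference(b"hello world");
    println!("11 bytes in, {} bytes out", encoded.len());
}
```

Under this layout an escape packs two 7-bit groups into two output bytes, so escaping does not change the 8:7 size ratio except for a possible extra byte at the very end of the input.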

Optimization Strategy

High throughput is achieved by processing 64-bit chunks. When a chunk contains no illegal bytes, a branchless fast-path is taken. By utilizing unsafe pointer arithmetic and pre-calculating output capacities, the hot loop avoids bounds checking and reallocations.
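A rough sketch of what such a fast path could look like, written in safe Rust for clarity: seven input bytes are spread into eight 7-bit lanes, and the result is accepted only if no lane needs escaping. The function names, the 128-bit lookup mask, and the capacity bound are assumptions for illustration and are not taken from the crate's source.

```rust
/// Bit i is set when the 7-bit value i must be escaped.
const ILLEGAL_MASK: u128 = (1u128 << 0x00)
    | (1u128 << 0x0A)
    | (1u128 << 0x0D)
    | (1u128 << 0x22)
    | (1u128 << 0x26)
    | (1u128 << 0x5C);

/// Upper bound on output size, assuming one output byte per 7-bit group
/// plus at most one extra byte for an escape at the very end.
fn max_encoded_len(input_len: usize) -> usize {
    (input_len * 8 + 6) / 7 + 1
}

/// Spreads 56 input bits into eight 7-bit lanes, most significant first.
fn spread(chunk: &[u8; 7]) -> [u8; 8] {
    let mut v: u64 = 0;
    for &b in chunk {
        v = (v << 8) | b as u64;
    }
    let mut lanes = [0u8; 8];
    for (i, lane) in lanes.iter_mut().enumerate() {
        *lane = ((v >> (49 - 7 * i)) & 0x7F) as u8;
    }
    lanes
}

/// Returns the eight output bytes if none needs escaping, else `None`
/// so the caller can fall back to the slower escaping path.
fn encode_chunk_fast(chunk: &[u8; 7]) -> Option<[u8; 8]> {
    let lanes = spread(chunk);
    // Accumulate the mask bit selected by each lane; no per-lane branch.
    let mut hit = 0u128;
    for &lane in &lanes {
        hit |= (ILLEGAL_MASK >> lane) & 1;
    }
    if hit == 0 { Some(lanes) } else { None }
}

fn main() {
    println!("{:?}", encode_chunk_fast(&[0xFF; 7])); // every lane is 0x7F
    println!("{:?}", encode_chunk_fast(&[0x00; 7])); // every lane is illegal
    println!("capacity for 1 KiB: {}", max_encoded_len(1024));
}
```

Allocating `max_encoded_len` bytes up front is what would let a hot loop write without growing the buffer; an optimized version would presumably replace the safe indexing here with unchecked writes.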

License

This project is licensed under the MIT License.