base-d

A universal, multi-dictionary encoding library and CLI tool for Rust. Encode binary data using numerous dictionaries including RFC standards, ancient scripts, emoji, playing cards, Matrix-style Japanese, and more.
Overview
base-d is a flexible encoding framework that goes far beyond traditional base64. It supports:
- Numerous built-in dictionaries - From RFC 4648 standards to hieroglyphics, emoji, Matrix-style base256, and a 1024-character CJK dictionary
- 3 encoding modes - Mathematical, chunked (RFC-compliant), and byte-range
- Auto-detection - Automatically identify which dictionary was used to encode data
- Compression support - Built-in gzip, zstd, brotli, lz4, snappy, and lzma compression with configurable levels
- Hashing support - 26 hash algorithms: cryptographic (SHA-256, BLAKE3, Ascon, etc.), CRC checksums, and xxHash including xxHash3 (pure Rust, no OpenSSL)
- Custom dictionaries - Define your own via TOML configuration
- Streaming support - Memory-efficient processing for large files
- Library + CLI - Use programmatically or from the command line
- High performance - Optimized with fast lookup tables and efficient memory allocation
- Special encodings - Matrix-style base256 that works like hex (1:1 byte mapping)
Key Features
Multiple Encoding Modes
- Mathematical Base Conversion - Treats data as a large number, works with any dictionary size
- Chunked Mode - RFC 4648 compatible (base64, base32, base16)
- Byte Range Mode - Direct 1:1 byte-to-emoji mapping (base100)
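As a rough illustration of the byte-range idea, the sketch below maps each byte to the character at a fixed offset inside a 256-code-point Unicode range, and maps it back by subtracting the offset. The start code point U+1F300 and the function names are assumptions for illustration only. They are not base100's or base-d's actual mapping.

```rust
/// Hypothetical start of a 256-code-point range (U+1F300..=U+1F3FF).
/// The real base100 dictionary uses its own emoji range; this value is an assumption.
const RANGE_START: u32 = 0x1F300;

/// Byte-range encoding: byte `b` becomes the character at `RANGE_START + b`.
fn encode_byte_range(data: &[u8]) -> String {
    data.iter()
        .map(|&b| {
            char::from_u32(RANGE_START + u32::from(b))
                .expect("every offset in the range is a valid Unicode scalar value")
        })
        .collect()
}

/// Decoding subtracts the offset; characters outside the range are rejected.
fn decode_byte_range(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let offset = (c as u32).checked_sub(RANGE_START)?;
            if offset <= 0xFF {
                Some(offset as u8)
            } else {
                None
            }
        })
        .collect()
}
```

Every input byte becomes exactly one character, and decoding is a single subtraction per character. Characters above U+FFFF still take 4 bytes each in UTF-8.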
Performance
- SIMD-Accelerated - Runtime AVX2/SSSE3 (x86_64) and NEON (ARM) detection
- Specialized SIMD - Hardcoded lookup tables for RFC dictionaries (base64, base32, base16)
- LUT SIMD - Runtime lookup tables for arbitrary dictionaries
- ~500 MiB/s base64 encode, ~7.4 GiB/s base64 decode with specialized SIMD
- Streaming Mode - Process multi-GB files with constant 4KB memory usage
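The lookup-table idea behind the LUT entries above can be sketched in scalar Rust. You build a 256-entry reverse table once per dictionary, then decode each symbol with a single array index instead of a search. This is a hypothetical, ASCII-only illustration, not base-d's code. SIMD variants generally perform this kind of table translation on many bytes per instruction.

```rust
/// Marker for bytes that are not part of the dictionary.
const INVALID: u8 = 0xFF;

/// Builds a 256-entry reverse lookup table for an ASCII dictionary.
/// ASCII has only 128 symbols, so stored values never collide with INVALID.
fn build_decode_lut(dictionary: &str) -> Option<[u8; 256]> {
    if !dictionary.is_ascii() {
        return None; // this sketch only handles single-byte symbols
    }
    let mut lut = [INVALID; 256];
    for (value, byte) in dictionary.bytes().enumerate() {
        if lut[byte as usize] != INVALID {
            return None; // duplicate symbol
        }
        lut[byte as usize] = value as u8;
    }
    Some(lut)
}

/// Translates each encoded byte to its digit value, failing on unknown symbols.
fn symbols_to_values(lut: &[u8; 256], encoded: &[u8]) -> Option<Vec<u8>> {
    encoded
        .iter()
        .map(|&b| match lut[b as usize] {
            INVALID => None,
            v => Some(v),
        })
        .collect()
}
```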
Extensive Dictionary Collection
- Standards: base64, base32, base16, base58 (Bitcoin), base85 (Git)
- Ancient Scripts: Egyptian hieroglyphics, Sumerian cuneiform, Elder Futhark runes
- Game Pieces: Playing cards, mahjong tiles, domino tiles, chess pieces
- Esoteric: Alchemical symbols, zodiac signs, weather symbols, musical notation
- Emoji: Face emoji, animal emoji, base100 (256 emoji range)
- Custom: Define your own dictionaries in TOML
Advanced Capabilities
- Dictionary Detection - Automatically identify encoding format without prior knowledge
- Compression Pipeline - Compress before encoding with gzip, zstd, brotli, lz4, snappy, or lzma
- User Configuration - Load custom dictionaries from ~/.config/base-d/dictionaries.toml
- Project-Local Config - Override dictionaries per-project with ./dictionaries.toml
- Three Independent Algorithms - Choose the right mode for your use case
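One simple way to approach dictionary detection is to score each candidate by the fraction of input characters it contains, then rank the candidates. The sketch below does this, and breaks ties in favor of the smaller, more specific alphabet. The heuristic, the `Candidate` type, and the function names are all hypothetical. base-d's actual detection and confidence scoring may work quite differently.

```rust
/// A hypothetical detection candidate: a dictionary name and its symbol set.
struct Candidate {
    name: &'static str,
    alphabet: &'static str,
}

/// Fraction of input characters (ignoring '=' padding) found in the alphabet.
fn score(input: &str, alphabet: &str) -> f64 {
    let chars: Vec<char> = input.chars().filter(|&c| c != '=').collect();
    if chars.is_empty() {
        return 0.0;
    }
    let hits = chars.iter().filter(|&&c| alphabet.contains(c)).count();
    hits as f64 / chars.len() as f64
}

/// Ranks candidates by score (highest first); ties prefer the smaller alphabet.
fn rank<'a>(input: &str, candidates: &'a [Candidate]) -> Vec<(&'a str, f64)> {
    let mut ranked: Vec<(&'a str, f64, usize)> = candidates
        .iter()
        .map(|c| (c.name, score(input, c.alphabet), c.alphabet.chars().count()))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.2.cmp(&b.2)));
    ranked.into_iter().map(|(name, s, _)| (name, s)).collect()
}
```

The tie-break matters because every hex string is also valid base64. Without it, "48656c6c6f" would be just as likely to be reported as base64.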
Quick Start
Install base-d (once published) or build it from source, then try the main workflows:
- List all available dictionaries with base-d --list
- Encode with playing cards (the default), RFC 4648 base32, Bitcoin base58, Egyptian hieroglyphics, emoji faces, or Matrix-style base256
- Enter the Matrix (live streaming random Matrix code)
- Auto-detect the dictionary and decode, optionally showing the top candidates with confidence scores
- Transcode between dictionaries (decode from one, encode to another)
- Compress and encode (supported: gzip, zstd, brotli, lz4, snappy, lzma), with base64 as the default encoding
- Decompress and decode, or output the raw compressed binary
- Process files, including compressing large files efficiently
- Hash files (supported: md5, sha256, sha512, blake3, ascon, k12, crc32, xxhash64, xxhash3, and more), optionally with a custom seed or, for XXH3 only, a secret
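To illustrate one of the checksums listed above (crc32), here is a dependency-free bitwise CRC-32 sketch. It assumes the common IEEE variant (reflected polynomial 0xEDB88320, as used by zip and gzip). Which CRC-32 variant base-d uses is not stated in this README, and the function name is hypothetical.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
/// Table-driven implementations are faster; this version favors clarity.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32; // standard initial value
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc // final XOR
}
```

The standard check value for this variant is crc32(b"123456789") == 0xCBF43926.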
Installation
Usage
As a Library
Add base-d (version "0.1") to the [dependencies] section of your Cargo.toml.
Basic Encoding/Decoding
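The following is a minimal, dependency-free sketch of an encode/decode round trip in the chunked mode for power-of-two dictionaries such as base64, base32, and base16. It is not base-d's API. The names encode_chunked and decode_chunked are hypothetical, and padding follows RFC 4648: whole groups of lcm(8, bits) / bits symbols.

```rust
/// Greatest common divisor, used to size the padding group.
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Bits carried by each symbol of a power-of-two dictionary (e.g. 64 -> 6).
fn bits_per_symbol(dict: &[char]) -> u32 {
    assert!(dict.len() >= 2 && dict.len() <= 256 && dict.len().is_power_of_two());
    dict.len().trailing_zeros()
}

/// Emits one symbol per `bits` input bits, then pads with '='.
fn encode_chunked(data: &[u8], dict: &[char]) -> String {
    let bits = bits_per_symbol(dict);
    let mut out = String::new();
    let (mut acc, mut n) = (0u32, 0u32); // bit accumulator and number of pending bits
    for &byte in data {
        acc = (acc << 8) | u32::from(byte);
        n += 8;
        while n >= bits {
            n -= bits;
            out.push(dict[((acc >> n) & ((1 << bits) - 1)) as usize]);
        }
    }
    if n > 0 {
        // Left-align the leftover bits and zero-fill the rest of the symbol.
        out.push(dict[((acc << (bits - n)) & ((1 << bits) - 1)) as usize]);
    }
    let b = bits as usize;
    let group = 8 * b / gcd(8, b) / b; // symbols per padded group
    while out.chars().count() % group != 0 {
        out.push('=');
    }
    out
}

/// Drops padding, maps symbols back to values, and emits a byte per 8 bits.
fn decode_chunked(text: &str, dict: &[char]) -> Option<Vec<u8>> {
    let bits = bits_per_symbol(dict);
    let mut out = Vec::new();
    let (mut acc, mut n) = (0u32, 0u32);
    for c in text.trim_end_matches('=').chars() {
        let value = dict.iter().position(|&d| d == c)? as u32;
        acc = (acc << bits) | value;
        n += bits;
        if n >= 8 {
            n -= 8;
            out.push((acc >> n) as u8);
        }
    }
    Some(out) // any leftover (< 8) bits are the encoder's zero fill
}
```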
Streaming for Large Files
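Streaming keeps memory flat by processing input in fixed-size pieces instead of loading the whole file. The sketch below shows the pattern with hex, using a hypothetical stream_hex function over any std::io Read/Write pair. It is not base-d's streaming API, and the 4096-byte buffer is only a choice made for illustration.

```rust
use std::io::{self, Read, Write};

/// Streams `input` to `output` as lowercase hex using fixed-size buffers,
/// so memory use does not grow with input size. Returns bytes consumed.
fn stream_hex<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut buf = [0u8; 4096];
    let mut encoded = Vec::with_capacity(buf.len() * 2);
    let mut total = 0u64;
    loop {
        let n = match input.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        encoded.clear();
        for &byte in &buf[..n] {
            encoded.push(HEX[usize::from(byte >> 4)]);
            encoded.push(HEX[usize::from(byte & 0x0F)]);
        }
        output.write_all(&encoded)?;
        total += n as u64;
    }
    output.flush()?;
    Ok(total)
}
```

Hex encodes each byte independently, so any buffer boundary is safe. A chunked encoding like base64 would instead need a buffer size that is a multiple of 3, so that padding appears only at the very end of the stream.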
Custom Dictionaries
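A custom dictionary is, at its core, an ordered list of distinct symbols. The sketch below shows checks worth running before using one. The `CustomDictionary` type is hypothetical, not base-d's dictionary type. In this sketch, mathematical mode accepts any size of two or more, while chunked (bit-grouping) mode additionally needs a power-of-two size.

```rust
use std::collections::HashSet;

/// A hypothetical in-memory dictionary: an ordered list of distinct symbols.
#[derive(Debug)]
struct CustomDictionary {
    symbols: Vec<char>,
}

impl CustomDictionary {
    /// Parses a dictionary string, rejecting too-small or duplicate-containing input.
    fn new(chars: &str) -> Result<Self, String> {
        let symbols: Vec<char> = chars.chars().collect();
        if symbols.len() < 2 {
            return Err("a dictionary needs at least two symbols".into());
        }
        let mut seen = HashSet::new();
        for &c in &symbols {
            if !seen.insert(c) {
                return Err(format!("duplicate symbol {c:?}"));
            }
        }
        Ok(Self { symbols })
    }

    /// Bit-chunking needs each symbol to carry a whole number of bits.
    fn supports_chunked(&self) -> bool {
        self.symbols.len().is_power_of_two()
    }
}
```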
Loading User Configurations
The DictionariesConfig type loads dictionary definitions. load_with_overrides layers three sources, in this order:
1. Built-in dictionaries
2. ~/.config/base-d/dictionaries.toml
3. ./dictionaries.toml
Alternatively, load_from_file loads dictionaries from a specific file.
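The layered loading above can be modeled as merging name-to-dictionary maps in order. This assumes, as the list order and the "override" wording suggest, that a later source replaces an entry of the same name from an earlier one. The function and the sample entries are hypothetical, not DictionariesConfig's implementation.

```rust
use std::collections::HashMap;

/// Merges dictionary layers in order; a later layer replaces any entry
/// with the same name from an earlier layer.
fn merge_layers<'a>(layers: &[HashMap<&'a str, &'a str>]) -> HashMap<&'a str, &'a str> {
    let mut merged = HashMap::new();
    for layer in layers {
        for (&name, &chars) in layer {
            merged.insert(name, chars);
        }
    }
    merged
}
```

For example, if the user config redefines "cards" and the project file adds a new dictionary, the result keeps every built-in entry, uses the user's "cards", and includes the project's addition.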
As a CLI Tool
Encode and decode data using any dictionary defined in dictionaries.toml:
- List available dictionaries (base-d --list)
- Encode from stdin (the default dictionary is "cards") or from a file
- Encode or decode with a specific dictionary, for example decoding playing cards
- Transcode between any two dictionaries directly, with no intermediate piping; an example transcode to hex outputs 48656c6c6f (the bytes of "Hello")
- Use stream mode to process large files memory-efficiently
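The sample output 48656c6c6f is the hex encoding of the bytes of "Hello". Conceptually, transcoding decodes the input to raw bytes and re-encodes them, all in memory. The sketch below assumes base64 input to make that concrete. The source dictionary of the example is not specified, and the function names are hypothetical.

```rust
/// Decodes standard (RFC 4648) base64, ignoring trailing '=' padding.
fn decode_base64(text: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let (mut acc, mut n) = (0u32, 0u32); // bit accumulator and pending bit count
    for b in text.trim_end_matches('=').bytes() {
        let value = ALPHABET.iter().position(|&a| a == b)? as u32;
        acc = (acc << 6) | value;
        n += 6;
        if n >= 8 {
            n -= 8;
            out.push((acc >> n) as u8);
        }
    }
    Some(out)
}

/// Encodes bytes as lowercase hex.
fn encode_hex(data: &[u8]) -> String {
    data.iter().map(|b| format!("{b:02x}")).collect()
}

/// Transcoding is decode-then-encode; no intermediate text is produced.
fn transcode_base64_to_hex(text: &str) -> Option<String> {
    decode_base64(text).map(|bytes| encode_hex(&bytes))
}
```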
Custom Dictionaries
Add your own dictionaries to dictionaries.toml:
[]
# Your custom 16-character dictionary
= "ππππ€£πππ
ππππππππ₯°π"
# Chess pieces (12 characters)
= "♔♕♖♗♘♙♚♛♜♝♞♟"
Or create custom dictionaries in ~/.config/base-d/dictionaries.toml to use across all projects. See Custom Dictionaries Guide for details.
Built-in Dictionaries
base-d includes 35 pre-configured dictionaries organized into several categories:
- RFC 4648 Standards: base16, base32, base32hex, base64, base64url
- Bitcoin & Blockchain: base58, base58flickr
- High-Density Encodings: base62, base85, ascii85, z85, base256_matrix (Matrix-style), base1024
- Human-Oriented: base32_crockford, base32_zbase
- Ancient Scripts: hieroglyphs, cuneiform, runic
- Game Pieces: cards, domino, mahjong, chess
- Esoteric Symbols: alchemy, zodiac, weather, music, arrows
- Emoji: emoji_faces, emoji_animals, base100
- Other: dna, binary, hex, base64_math, hex_math
Run base-d --list to see all available dictionaries with their encoding modes.
For a complete reference with examples and use cases, see DICTIONARIES.md.
How It Works
base-d supports three encoding algorithms:
- Mathematical Base Conversion (default) - Treats binary data as a single large number and converts it to the target base. Works with any dictionary size.
- Bit-Chunking - Groups bits into fixed-size chunks for RFC 4648 compatibility (base64, base32, base16).
- Byte Range - Direct 1:1 byte-to-character mapping using a Unicode range (like base100). Each byte maps to exactly one emoji, so the output has one character per input byte.
For a detailed explanation of all modes with examples, see ENCODING_MODES.md.
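The mathematical mode can be sketched as repeated long division of the input, read as one big-endian number, by the dictionary size, with decoding as the reverse multiply-accumulate. Keeping leading zero bytes as leading copies of the first symbol is an assumption borrowed from Bitcoin's base58. base-d's handling may differ, and the function names are hypothetical.

```rust
/// Treats `data` as one big-endian number and repeatedly divides it by the
/// dictionary size; the remainders, reversed, are the output digits.
fn encode_math(data: &[u8], dict: &[char]) -> String {
    let base = dict.len() as u32;
    assert!(base >= 2);
    let zeros = data.iter().take_while(|&&b| b == 0).count();
    let mut num: Vec<u8> = data[zeros..].to_vec(); // big-endian base-256 digits
    let mut digits = Vec::new();
    while !num.is_empty() {
        // One pass of long division of `num` by `base`, in place.
        let mut rem = 0u32;
        for d in num.iter_mut() {
            let cur = (rem << 8) | u32::from(*d);
            *d = (cur / base) as u8;
            rem = cur % base;
        }
        digits.push(dict[rem as usize]);
        let lead = num.iter().take_while(|&&b| b == 0).count();
        num.drain(..lead); // drop quotient bytes that became zero
    }
    std::iter::repeat(dict[0])
        .take(zeros)
        .chain(digits.into_iter().rev())
        .collect()
}

/// Reverses `encode_math` by multiply-accumulating digits into base 256.
fn decode_math(text: &str, dict: &[char]) -> Option<Vec<u8>> {
    let base = dict.len() as u32;
    let chars: Vec<char> = text.chars().collect();
    let zeros = chars.iter().take_while(|&&c| c == dict[0]).count();
    let mut num: Vec<u8> = Vec::new(); // little-endian base-256 digits
    for &c in &chars[zeros..] {
        let mut carry = dict.iter().position(|&d| d == c)? as u32;
        for byte in num.iter_mut() {
            let cur = u32::from(*byte) * base + carry;
            *byte = (cur & 0xFF) as u8;
            carry = cur >> 8;
        }
        while carry > 0 {
            num.push((carry & 0xFF) as u8);
            carry >>= 8;
        }
    }
    let mut out = vec![0u8; zeros];
    out.extend(num.iter().rev());
    Some(out)
}
```

Because each division pass walks the whole number, this approach is quadratic in input length. It suits short inputs better than multi-gigabyte streams. It also works unchanged for sizes like the 12-symbol chess dictionary, where bit-chunking cannot apply.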
License
MIT OR Apache-2.0
Documentation
Core Concepts
- Dictionary Reference - Complete guide to all built-in dictionaries
- Custom Dictionaries - Create and load your own dictionaries
- Encoding Modes - Mathematical vs chunked vs byte range encoding
- Base1024 - High-density CJK encoding
Features
- Hashing - Hash algorithms including SHA, BLAKE, CRC, and xxHash
- Compression - gzip, zstd, brotli, lz4, snappy, lzma support
- Detection - Auto-detect encoding format
- Streaming - Memory-efficient processing for large files
Performance
- SIMD Optimizations - AVX2/SSSE3/NEON acceleration
- Benchmarking - Running and interpreting benchmarks
- Performance Guide - Benchmarks and optimization tips
Matrix Mode
- Matrix Mode - Live Matrix-style visualization
- Neo Mode - --neo flag deep dive
Reference
- API Reference - Library API documentation
- Hexadecimal Explained - Why hex is special
- Roadmap - Planned features and development phases
- CI/CD Setup - GitHub Actions workflow documentation
Contributing
Contributions are welcome! Please see ROADMAP.md for planned features.