§base-d
A universal, multi-dictionary encoding library for Rust.
Encode binary data using numerous dictionaries including RFC standards, ancient scripts, emoji, playing cards, and more. Supports three encoding modes: radix (true base conversion), RFC 4648 chunked encoding, and direct byte-range mapping.
§Quick Start
```rust
use base_d::{DictionaryRegistry, Dictionary, encode, decode};

// Load built-in dictionaries
let config = DictionaryRegistry::load_default()?;
let base64_config = config.get_dictionary("base64").unwrap();

// Create dictionary
let chars: Vec<char> = base64_config.chars.chars().collect();
let padding = base64_config.padding.as_ref().and_then(|s| s.chars().next());
let mut builder = Dictionary::builder()
    .chars(chars)
    .mode(base64_config.effective_mode());
if let Some(p) = padding {
    builder = builder.padding(p);
}
let dictionary = builder.build()?;

// Encode and decode
let data = b"Hello, World!";
let encoded = encode(data, &dictionary);
let decoded = decode(&encoded, &dictionary)?;
assert_eq!(data, &decoded[..]);
```
§Features
- 33 Built-in Dictionaries: RFC standards, emoji, ancient scripts, and more
- 3 Encoding Modes: Radix, chunked (RFC-compliant), byte-range
- Streaming Support: Memory-efficient processing for large files
- Custom Dictionaries: Define your own via TOML configuration
- User Configuration: Load dictionaries from ~/.config/base-d/dictionaries.toml
- SIMD Acceleration: AVX2/SSSE3 on x86_64, NEON on aarch64 (enabled by default)
§Cargo Features
- `simd` (default): Enable SIMD acceleration for encoding and decoding. Disable it with `--no-default-features` for scalar-only builds.
§Encoding Modes
§Radix Base Conversion
True base conversion: the input bytes are treated as one large number, which is then rewritten in the dictionary's base. Works with any dictionary size.
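The following is a std-only sketch of radix conversion. It is not base-d's implementation. The helper names `radix_encode` and `radix_decode` are hypothetical.

A plain number would lose any leading zero bytes in the input. This sketch assumes they are preserved as leading copies of the first symbol, the way Base58 does it.

```rust
/// Radix-encode `data`: read the bytes as one big-endian unsigned integer and
/// rewrite it in base `alphabet.len()`. Leading zero bytes become leading
/// copies of `alphabet[0]` (a Base58-style assumption of this sketch).
fn radix_encode(data: &[u8], alphabet: &[char]) -> String {
    let base = alphabet.len() as u64;
    assert!(base >= 2, "alphabet needs at least two symbols");
    let zeros = data.iter().take_while(|&&b| b == 0).count();
    // Digits in `base`, least significant first.
    let mut digits: Vec<u64> = Vec::new();
    for &byte in &data[zeros..] {
        // number = number * 256 + byte, carried through the digit vector.
        let mut carry = byte as u64;
        for d in digits.iter_mut() {
            let v = *d * 256 + carry;
            *d = v % base;
            carry = v / base;
        }
        while carry > 0 {
            digits.push(carry % base);
            carry /= base;
        }
    }
    let mut out: String = std::iter::repeat(alphabet[0]).take(zeros).collect();
    out.extend(digits.iter().rev().map(|&d| alphabet[d as usize]));
    out
}

/// Inverse of `radix_encode`; returns `None` on a symbol outside the alphabet.
fn radix_decode(text: &str, alphabet: &[char]) -> Option<Vec<u8>> {
    let base = alphabet.len() as u64;
    let symbols: Vec<char> = text.chars().collect();
    let zeros = symbols.iter().take_while(|&&c| c == alphabet[0]).count();
    // Bytes of the number, least significant first.
    let mut bytes: Vec<u8> = Vec::new();
    for &c in &symbols[zeros..] {
        // number = number * base + digit, carried through the byte vector.
        let mut carry = alphabet.iter().position(|&a| a == c)? as u64;
        for b in bytes.iter_mut() {
            let v = *b as u64 * base + carry;
            *b = (v & 0xff) as u8;
            carry = v >> 8;
        }
        while carry > 0 {
            bytes.push((carry & 0xff) as u8);
            carry >>= 8;
        }
    }
    let mut out = vec![0u8; zeros];
    out.extend(bytes.iter().rev());
    Some(out)
}
```

With the 8-emoji alphabet used in the example that follows, this sketch turns `b"Hi"` (0x4869 = 18537 = 44151 in octal) into `😃😃😁😄😁`. base-d's exact output is not confirmed here. The schoolbook conversion in this sketch is quadratic in input length.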
```rust
use base_d::{Dictionary, EncodingMode, encode};

let chars: Vec<char> = "😀😁😂🤣😃😄😅😆".chars().collect();
let dictionary = Dictionary::builder()
    .chars(chars)
    .mode(EncodingMode::Radix)
    .build()?;
let encoded = encode(b"Hi", &dictionary);
```
§Chunked Mode (RFC 4648)
Encodes fixed-size bit groups (6 bits per character for base64, 5 for base32), compatible with standard base64/base32.
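Below is a rough sketch of chunked encoding for any power-of-two alphabet. It assumes the RFC 4648 layout: most significant bits first, a zero-filled final group, and padding up to a whole block. `chunked_encode` and its parameters are hypothetical names, not base-d's API.

```rust
/// Greatest common divisor, used to size the padding block.
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Chunked (RFC 4648-style) encoding for an alphabet of 2, 4, ..., 256 symbols.
fn chunked_encode(data: &[u8], alphabet: &[char], pad: Option<char>) -> String {
    let size = alphabet.len();
    assert!(
        size.is_power_of_two() && (2..=256).contains(&size),
        "alphabet size must be a power of two in 2..=256"
    );
    let bits = size.trailing_zeros(); // 6 for base64, 5 for base32, 4 for base16
    let mask = (size - 1) as u32;
    let mut out = String::new();
    let mut emitted = 0usize;
    let mut buffer = 0u32; // pending input bits, right-aligned
    let mut buffered = 0u32; // number of pending bits in `buffer`
    for &byte in data {
        buffer = (buffer << 8) | byte as u32;
        buffered += 8;
        // Emit one symbol per complete group, most significant bits first.
        while buffered >= bits {
            buffered -= bits;
            out.push(alphabet[((buffer >> buffered) & mask) as usize]);
            emitted += 1;
        }
        buffer &= (1u32 << buffered) - 1; // keep only bits not yet emitted
    }
    // A final partial group is left-aligned and zero-filled.
    if buffered > 0 {
        out.push(alphabet[((buffer << (bits - buffered)) & mask) as usize]);
        emitted += 1;
    }
    // Pad to a whole block of lcm(8, bits) / bits = 8 / gcd(8, bits) symbols.
    if let Some(p) = pad {
        let block = 8 / gcd(8, bits as usize);
        while emitted % block != 0 {
            out.push(p);
            emitted += 1;
        }
    }
    out
}
```

The number of bits per symbol follows from the alphabet size (64 → 6, 32 → 5, 16 → 4). A padded block therefore holds 4 symbols for base64 and 8 for base32. For base16 every byte fills exactly two symbols, so no padding ever appears.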
```rust
use base_d::{Dictionary, EncodingMode, encode};

let chars: Vec<char> = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    .chars()
    .collect();
let dictionary = Dictionary::builder()
    .chars(chars)
    .mode(EncodingMode::Chunked)
    .padding('=')
    .build()?;
let encoded = encode(b"Hello", &dictionary);
assert_eq!(encoded, "SGVsbG8=");
```
§Byte Range Mode
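Direct 1:1 mapping from each byte to one character, counted from a starting codepoint. There is no expansion in character count: every input byte yields exactly one output character. Each emoji still occupies 4 bytes in UTF-8.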
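Here is a sketch of the byte-range idea. It assumes the mapping is simply `start + byte` over a 256-codepoint window. That is consistent with the single `start_codepoint` used in the example that follows, but it is not confirmed here. The function names are hypothetical.

```rust
/// Map each byte to the character at `start + byte` (one char per byte).
/// Returns `None` if any resulting codepoint is not a valid `char`.
fn byte_range_encode(data: &[u8], start: u32) -> Option<String> {
    data.iter()
        .map(|&b| start.checked_add(b as u32).and_then(char::from_u32))
        .collect()
}

/// Inverse mapping; returns `None` for characters outside the 256-codepoint window.
fn byte_range_decode(text: &str, start: u32) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            (c as u32)
                .checked_sub(start)
                .and_then(|off| (off < 256).then(|| off as u8))
        })
        .collect()
}
```

With a start of 127991 (U+1F3F7), `b"Hi"` becomes two characters but eight UTF-8 bytes. Codepoints above U+FFFF take four bytes each.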
```rust
use base_d::{Dictionary, EncodingMode, encode};

let dictionary = Dictionary::builder()
    .mode(EncodingMode::ByteRange)
    .start_codepoint(127991) // U+1F3F7
    .build()?;
let data = b"Hi";
let encoded = encode(data, &dictionary);
assert_eq!(encoded.chars().count(), 2); // 1:1 mapping
```
§Streaming
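The internals of `StreamingEncoder` are not documented on this page. The sketch below is a hard-coded base64 encoder over `std::io::Read` and `Write`. It illustrates the main constraint a chunked streaming encoder must satisfy, assuming the streamed output should match one-shot encoding:

- Only whole 3-byte groups are encoded per read.
- Leftover bytes carry over to the next read.
- Padding appears only at the very end.

`encode_groups` and `stream_base64` are hypothetical names.

```rust
use std::io::{self, Read, Write};

const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Base64-encode `data` into `out`; only a final partial group gets padding.
fn encode_groups(data: &[u8], out: &mut String) {
    for group in data.chunks(3) {
        let n = group.len();
        let v = (group[0] as u32) << 16
            | (*group.get(1).unwrap_or(&0) as u32) << 8
            | *group.get(2).unwrap_or(&0) as u32;
        // n input bytes yield n + 1 symbols; the rest of the 4 are padding.
        for i in 0..4 {
            if i <= n {
                out.push(B64[((v >> (18 - 6 * i)) & 0x3f) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
}

/// Encode all of `input` to `output`, holding one read buffer plus at most
/// two leftover bytes in memory at a time.
fn stream_base64<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut buf = [0u8; 4096]; // not a multiple of 3, so leftovers do occur
    let mut pending: Vec<u8> = Vec::new();
    loop {
        let n = match input.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        pending.extend_from_slice(&buf[..n]);
        // Encode only whole 3-byte groups; keep the remainder for the next read.
        let whole = pending.len() - pending.len() % 3;
        let mut text = String::new();
        encode_groups(&pending[..whole], &mut text);
        output.write_all(text.as_bytes())?;
        pending.drain(..whole);
    }
    // The final 0-2 bytes are encoded (and padded) exactly once.
    let mut text = String::new();
    encode_groups(&pending, &mut text);
    output.write_all(text.as_bytes())?;
    output.flush()
}
```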
For large files, use streaming to avoid loading the entire file into memory:
```rust
use base_d::{DictionaryRegistry, StreamingEncoder};
use std::fs::File;

let config = DictionaryRegistry::load_default()?;
let dictionary_config = config.get_dictionary("base64").unwrap();
// ... create dictionary from config
let mut input = File::open("large_file.bin")?;
let output = File::create("encoded.txt")?;
let mut encoder = StreamingEncoder::new(&dictionary, output);
encoder.encode(&mut input)?;
```
Re-exports§
- pub use convenience::CompressEncodeResult;
- pub use convenience::HashEncodeResult;
- pub use convenience::compress_encode;
- pub use convenience::compress_encode_with;
- pub use convenience::hash_encode;
- pub use convenience::hash_encode_with;
Modules§
- bench - Benchmarking utilities for comparing encoding paths.
- convenience - Convenience functions for common encoding patterns.
- prelude - Convenient re-exports for common usage.
- schema - Schema encoding types and traits for building custom frontends.
- word - Word-based encoding using radix conversion.
- word_alternating - Alternating word-based encoding for PGP-style biometric word lists.
- wordlists - Built-in word lists for word-based encoding.
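The `word_alternating` module targets PGP-style word lists. In the PGP word list, bytes at even positions come from a list of two-syllable words and bytes at odd positions from a list of three-syllable words. Because of this alternation, a swapped or dropped word can be detected.

Below is a minimal sketch of that alternation. It assumes each list has 256 entries indexed by byte value. The function names and the synthetic test lists are hypothetical, and the real module's API may differ.

```rust
/// Encode each byte as a word, choosing the list by the byte's position.
/// Assumption: every list has 256 entries, one per byte value.
fn alternating_encode<'a>(data: &[u8], lists: &[Vec<&'a str>]) -> Vec<&'a str> {
    data.iter()
        .enumerate()
        .map(|(i, &b)| lists[i % lists.len()][b as usize])
        .collect()
}

/// Decode words back to bytes. A word that is not in the list expected at its
/// position (for example after a swap or an omission) yields `None`.
fn alternating_decode(words: &[&str], lists: &[Vec<&str>]) -> Option<Vec<u8>> {
    words
        .iter()
        .enumerate()
        .map(|(i, w)| {
            let list = &lists[i % lists.len()];
            list.iter().position(|x| x == w).map(|p| p as u8)
        })
        .collect()
}
```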
Structs§
- AlternatingWordDictionary - A word dictionary that alternates between multiple sub-dictionaries.
- CompressionConfig - Configuration for a compression algorithm.
- Dictionary - Represents an encoding dictionary with its characters and configuration.
- DictionaryBuilder - Builder for constructing a Dictionary with flexible configuration.
- DictionaryConfig - Configuration for a single dictionary loaded from TOML.
- DictionaryDetector - Detector for automatically identifying which dictionary was used to encode data.
- DictionaryMatch - A match result from dictionary detection.
- DictionaryNotFoundError - Error when a dictionary is not found.
- DictionaryRegistry - Collection of dictionary configurations loaded from TOML files.
- Settings - Global settings for base-d.
- StreamingDecoder - Streaming decoder for processing large amounts of encoded data efficiently.
- StreamingEncoder - Streaming encoder for processing large amounts of data efficiently.
- WordDictionary - A word-based dictionary for encoding binary data as word sequences.
- WordDictionaryBuilder - Builder for constructing a WordDictionary with flexible configuration.
- XxHashConfig - Configuration for xxHash algorithms.
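`DictionaryDetector` identifies which dictionary produced an input, but its scoring method is not described on this page. One plausible approach, sketched below purely as an assumption, has two steps:

- Score each candidate alphabet by the fraction of input symbols it contains.
- Break ties in favor of the smaller alphabet, since it is the more specific explanation.

`detect` and its tuple-based candidate list are hypothetical.

```rust
use std::collections::HashSet;

/// Return the best-scoring (name, score) among candidate alphabets, or `None`
/// if the input is empty or no candidate contains any of its symbols.
fn detect<'a>(input: &str, candidates: &[(&'a str, &str)]) -> Option<(&'a str, f64)> {
    // Assumption: whitespace and trailing '=' padding carry no signal.
    let symbols: Vec<char> = input
        .trim()
        .trim_end_matches('=')
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect();
    if symbols.is_empty() {
        return None;
    }
    candidates
        .iter()
        .map(|&(name, alphabet)| {
            let set: HashSet<char> = alphabet.chars().collect();
            let hits = symbols.iter().filter(|&&c| set.contains(&c)).count();
            (name, hits as f64 / symbols.len() as f64, set.len())
        })
        .filter(|&(_, score, _)| score > 0.0)
        // Higher score wins; on a tie, the smaller alphabet ranks higher.
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap().then(b.2.cmp(&a.2)))
        .map(|(name, score, _)| (name, score))
}
```

A real detector would likely also check length and padding rules and try a trial decode. This sketch only looks at symbol membership.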
Enums§
- CompressionAlgorithm - Supported compression algorithms.
- DecodeError - Errors that can occur during decoding.
- DetectedMode - Detected stele mode based on JSON structure.
- DictionaryType - Dictionary type: character-based or word-based.
- EncodingMode - Encoding strategy for converting binary data to text.
- HashAlgorithm - Supported hash algorithms.
- SchemaCompressionAlgo - Compression algorithms for schema encoding.
Functions§
- compress - Compress data using the specified algorithm and level.
- decode - Decodes a string back to binary data using the specified dictionary.
- decode_schema - Decode schema format to JSON: framed → display96 → [decompress] → binary → IR → JSON.
- decode_stele - Decode stele format to JSON: stele → IR → JSON.
- decode_stele_path - Decode stele path mode to JSON.
- decompress - Decompress data using the specified algorithm.
- detect_dictionary - Convenience function to detect the dictionary used for an input.
- detect_stele_mode - Auto-detect the best stele mode for the given JSON structure.
- encode - Encodes binary data using the specified dictionary.
- encode_markdown_stele - Encode a markdown document to stele format: markdown → IR → stele.
- encode_markdown_stele_ascii - Encode a markdown document to ASCII inline stele format.
- encode_markdown_stele_light - Encode markdown to stele with field tokenization only (no value dictionary).
- encode_markdown_stele_markdown - Encode a markdown document to a markdown-like inline stele format. Uses #1-#6 for headers and -1/-2 for lists, and preserves markdown syntax patterns.
- encode_markdown_stele_readable - Encode markdown to stele without tokenization (human-readable).
- encode_schema - Encode JSON to schema format: JSON → IR → binary → [compress] → display96 → framed.
- encode_stele - Encode JSON to stele format: JSON → IR → stele.
- encode_stele_ascii - Encode JSON to ASCII inline stele format.
- encode_stele_light - Encode JSON to stele with field tokenization only (no value dictionary).
- encode_stele_minified
- encode_stele_path - Encode JSON to stele path mode (one line per leaf value).
- encode_stele_readable - Encode JSON to stele without tokenization (human-readable field names).
- find_closest_dictionary - Find the closest matching dictionary name.
- hash - Compute a hash of data using the specified algorithm. Uses the default configuration (seed = 0, no secret).
- hash_with_config - Compute a hash of data using the specified algorithm with a custom configuration.
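`find_closest_dictionary` suggests a dictionary name when a lookup misses. A common technique for this, assumed here rather than taken from base-d, is Levenshtein edit distance with a cutoff. `levenshtein`, `closest`, and `max_distance` are hypothetical names.

```rust
/// Classic two-row dynamic-programming Levenshtein distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // Substitution, deletion, insertion.
            cur[j + 1] = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Closest known name within `max_distance` edits; ties keep the first name.
fn closest<'a>(query: &str, names: &[&'a str], max_distance: usize) -> Option<&'a str> {
    names
        .iter()
        .map(|&n| (levenshtein(query, n), n))
        .filter(|&(d, _)| d <= max_distance)
        .min_by_key(|&(d, _)| d)
        .map(|(_, n)| n)
}
```

The cutoff keeps the suggestion helpful: a query with no name within `max_distance` edits gets no suggestion rather than an arbitrary one.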