Crate base_d

§base-d

A universal, multi-dictionary encoding library for Rust.

Encode binary data using numerous dictionaries including RFC standards, ancient scripts, emoji, playing cards, and more. Supports three encoding modes: radix (true base conversion), RFC 4648 chunked encoding, and direct byte-range mapping.

§Quick Start

use base_d::{DictionaryRegistry, Dictionary, encode, decode};

// Load built-in dictionaries
let config = DictionaryRegistry::load_default()?;
let base64_config = config.get_dictionary("base64").unwrap();

// Create dictionary
let chars: Vec<char> = base64_config.chars.chars().collect();
let padding = base64_config.padding.as_ref().and_then(|s| s.chars().next());
let mut builder = Dictionary::builder()
    .chars(chars)
    .mode(base64_config.effective_mode());
if let Some(p) = padding {
    builder = builder.padding(p);
}
let dictionary = builder.build()?;

// Encode and decode
let data = b"Hello, World!";
let encoded = encode(data, &dictionary);
let decoded = decode(&encoded, &dictionary)?;
assert_eq!(data, &decoded[..]);

§Features

33 Built-in Dictionaries: RFC standards, emoji, ancient scripts, and more
3 Encoding Modes: Radix, chunked (RFC-compliant), byte-range
Streaming Support: Memory-efficient processing for large files
Custom Dictionaries: Define your own via TOML configuration
User Configuration: Load dictionaries from ~/.config/base-d/dictionaries.toml
SIMD Acceleration: AVX2/SSSE3 on x86_64, NEON on aarch64 (enabled by default)

§Cargo Features

simd (default): Enable SIMD acceleration for encoding/decoding. Disable with --no-default-features for scalar-only builds.

§Encoding Modes

§Radix Base Conversion

True base conversion: the input bytes are treated as a single large number and rewritten in the base given by the dictionary size. Works with any dictionary size.

use base_d::{Dictionary, EncodingMode, encode};

let chars: Vec<char> = "😀😁😂🤣😃😄😅😆".chars().collect();
let dictionary = Dictionary::builder()
    .chars(chars)
    .mode(EncodingMode::Radix)
    .build()?;

let encoded = encode(b"Hi", &dictionary);
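To make the "large number" idea concrete, here is a minimal standalone sketch of radix conversion by repeated long division. It does not use base-d and is not its implementation; the names `to_radix_digits` and `encode_radix` are hypothetical, and details such as how leading zero bytes are preserved are deliberately left out.

```rust
/// Illustrative only: rewrite `data`, read as one big-endian integer,
/// as digits in `base` (most significant digit first).
/// Hypothetical helper, not part of base-d.
fn to_radix_digits(data: &[u8], base: u32) -> Vec<u32> {
    assert!(base >= 2, "base must be at least 2");
    let base = u64::from(base);
    let mut number = data.to_vec(); // big-endian working copy
    let mut digits = Vec::new();

    // Each pass divides the whole number by `base` in place;
    // the remainder of that division is the next digit.
    while number.iter().any(|&b| b != 0) {
        let mut remainder: u64 = 0;
        for byte in number.iter_mut() {
            let acc = (remainder << 8) | u64::from(*byte);
            *byte = (acc / base) as u8; // acc < base * 256, so this fits in u8
            remainder = acc % base;
        }
        digits.push(remainder as u32);
    }

    digits.reverse(); // digits were produced least significant first
    digits
}

/// Illustrative only: map each digit to the symbol at that index.
fn encode_radix(data: &[u8], alphabet: &[char]) -> String {
    to_radix_digits(data, alphabet.len() as u32)
        .into_iter()
        .map(|d| alphabet[d as usize])
        .collect()
}
```

With an 8-symbol alphabet, `b"Hi"` is the number 0x4869 = 18537, which is 44151 in base 8, so the sketch emits the symbols at indices 4, 4, 1, 5, 1. Because the whole input is one number, the cost grows quadratically with input length, and all-zero or leading-zero input needs separate handling that this sketch omits.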

§Chunked Mode (RFC 4648)

Fixed-size bit groups, compatible with standard base64/base32.

use base_d::{Dictionary, EncodingMode, encode};

let chars: Vec<char> = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    .chars().collect();
let dictionary = Dictionary::builder()
    .chars(chars)
    .mode(EncodingMode::Chunked)
    .padding('=')
    .build()?;

let encoded = encode(b"Hello", &dictionary);
assert_eq!(encoded, "SGVsbG8=");
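The fixed-size bit grouping can be sketched without the library. The following standalone function follows the RFC 4648 base64 layout (3 input bytes become four 6-bit symbols, with `=` padding); `encode_chunked_base64` is a hypothetical name and this is not base-d's code path.

```rust
/// Standard base64 alphabet from RFC 4648.
const BASE64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Illustrative only: chunked encoding with 6-bit groups and '=' padding.
/// Hypothetical helper, not part of base-d.
fn encode_chunked_base64(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);

    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into the top of a 24-bit group.
        let mut group: u32 = 0;
        for (i, &b) in chunk.iter().enumerate() {
            group |= u32::from(b) << (16 - 8 * i);
        }

        // n input bytes produce n + 1 symbols; the rest is padding.
        let symbols = chunk.len() + 1;
        for i in 0..4 {
            if i < symbols {
                let index = (group >> (18 - 6 * i)) & 0x3F;
                out.push(BASE64[index as usize] as char);
            } else {
                out.push('=');
            }
        }
    }

    out
}
```

Unlike radix conversion, each 3-byte chunk is encoded independently, which is why this scheme only fits dictionaries whose size is a power of two and why it works well with streaming input.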

§Byte Range Mode

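Direct 1:1 byte-to-emoji mapping. Each input byte becomes exactly one output character, so there is zero overhead in character count (the UTF-8 byte length still grows, because each emoji takes several bytes).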

use base_d::{Dictionary, EncodingMode, encode};

let dictionary = Dictionary::builder()
    .mode(EncodingMode::ByteRange)
    .start_codepoint(127991)  // U+1F3F7
    .build()?;

let data = b"Hi";
let encoded = encode(data, &dictionary);
assert_eq!(encoded.chars().count(), 2);  // 1:1 mapping
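The mapping itself is simple enough to sketch by hand: add the byte value to the starting code point. The functions below are hypothetical standalone helpers, written under the assumption that the mode is a plain offset into a contiguous code point range; they are not base-d's API.

```rust
use std::convert::TryFrom;

/// Illustrative only: map each byte to the code point `start_codepoint + byte`.
/// Returns `None` if any resulting value is not a valid Unicode scalar.
/// Hypothetical helper, not part of base-d.
fn encode_byte_range(data: &[u8], start_codepoint: u32) -> Option<String> {
    data.iter()
        .map(|&b| char::from_u32(start_codepoint.checked_add(u32::from(b))?))
        .collect()
}

/// Illustrative only: invert the mapping by subtracting the start code point.
/// Returns `None` if a character falls outside the 256-value range.
fn decode_byte_range(text: &str, start_codepoint: u32) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let offset = u32::from(c).checked_sub(start_codepoint)?;
            u8::try_from(offset).ok()
        })
        .collect()
}
```

With a start of 127991 (U+1F3F7), `b"Hi"` maps to U+1F43F and U+1F460: two characters, but 8 bytes once stored as UTF-8.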

§Streaming

For large files, use streaming to avoid loading the entire file into memory:

use base_d::{DictionaryRegistry, StreamingEncoder};
use std::fs::File;

let config = DictionaryRegistry::load_default()?;
let dictionary_config = config.get_dictionary("base64").unwrap();

// ... create dictionary from config

let mut input = File::open("large_file.bin")?;
let output = File::create("encoded.txt")?;

let mut encoder = StreamingEncoder::new(&dictionary, output);
encoder.encode(&mut input)?;
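The general pattern behind a streaming encoder is a fixed-size buffer that is filled, encoded, and written out in a loop. The sketch below shows that pattern using only the standard library, with hexadecimal swapped in as the per-chunk encoding to keep it short; `stream_encode_hex` is a hypothetical name, and nothing here reflects how `StreamingEncoder` is implemented internally.

```rust
use std::io::{self, Read, Write};

/// Illustrative only: encode `input` to `output` as lowercase hex using a
/// fixed 4 KiB buffer, so memory use does not depend on the input size.
/// Returns the number of input bytes processed.
/// Hypothetical helper, not part of base-d.
fn stream_encode_hex<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut buf = [0u8; 4096];
    let mut total = 0u64;

    loop {
        let n = match input.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };

        // Encode only the bytes that were actually read.
        let mut encoded = Vec::with_capacity(n * 2);
        for &b in &buf[..n] {
            encoded.push(HEX[(b >> 4) as usize]);
            encoded.push(HEX[(b & 0x0F) as usize]);
        }
        output.write_all(&encoded)?;
        total += n as u64;
    }

    output.flush()?;
    Ok(total)
}
```

Hex encodes every byte independently, so short reads are harmless here. An encoding that groups several bytes per symbol block (such as base64's 3-byte groups) would additionally need to carry leftover bytes from one read to the next.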

Re-exports§

pub use convenience::CompressEncodeResult;
pub use convenience::HashEncodeResult;
pub use convenience::compress_encode;
pub use convenience::compress_encode_with;
pub use convenience::hash_encode;
pub use convenience::hash_encode_with;

Modules§

bench: Benchmarking utilities for comparing encoding paths.
convenience: Convenience functions for common encoding patterns.
prelude: Convenient re-exports for common usage.
schema: Schema encoding types and traits for building custom frontends
word: Word-based encoding using radix conversion.
word_alternating: Alternating word-based encoding for PGP-style biometric word lists.
wordlists: Built-in word lists for word-based encoding.

Structs§

AlternatingWordDictionary: A word dictionary that alternates between multiple sub-dictionaries.
CompressionConfig: Configuration for a compression algorithm.
Dictionary: Represents an encoding dictionary with its characters and configuration.
DictionaryBuilder: Builder for constructing a Dictionary with flexible configuration.
DictionaryConfig: Configuration for a single dictionary loaded from TOML.
DictionaryDetector: Detector for automatically identifying which dictionary was used to encode data.
DictionaryMatch: A match result from dictionary detection.
DictionaryNotFoundError: Error when a dictionary is not found
DictionaryRegistry: Collection of dictionary configurations loaded from TOML files.
Settings: Global settings for base-d.
StreamingDecoder: Streaming decoder for processing large amounts of encoded data efficiently.
StreamingEncoder: Streaming encoder for processing large amounts of data efficiently.
WordDictionary: A word-based dictionary for encoding binary data as word sequences.
WordDictionaryBuilder: Builder for constructing a WordDictionary with flexible configuration.
XxHashConfig: Configuration for xxHash algorithms.

Enums§

CompressionAlgorithm: Supported compression algorithms.
DecodeError: Errors that can occur during decoding.
DetectedMode: Detected stele mode based on JSON structure
DictionaryType: Dictionary type: character-based or word-based.
EncodingMode: Encoding strategy for converting binary data to text.
HashAlgorithm: Supported hash algorithms.
SchemaCompressionAlgo: Compression algorithms for schema encoding

Functions§

compress: Compress data using the specified algorithm and level.
decode: Decodes a string back to binary data using the specified dictionary.
decode_schema: Decode schema format to JSON: framed → display96 → [decompress] → binary → IR → JSON
decode_stele: Decode stele format to JSON: stele → IR → JSON
decode_stele_path: Decode stele path mode to JSON
decompress: Decompress data using the specified algorithm.
detect_dictionary: Convenience function to detect dictionary from input.
detect_stele_mode: Auto-detect the best stele mode for the given JSON structure
encode: Encodes binary data using the specified dictionary.
encode_markdown_stele: Encode markdown document to stele format: markdown → IR → stele
encode_markdown_stele_ascii: Encode markdown document to ASCII inline stele format
encode_markdown_stele_light: Encode markdown to stele with field tokenization only (no value dictionary)
encode_markdown_stele_markdown: Encode markdown document to markdown-like inline stele format. Uses #1-#6 for headers, -1/-2 for lists, and preserves markdown syntax patterns.
encode_markdown_stele_readable: Encode markdown to stele without tokenization (human-readable)
encode_schema: Encode JSON to schema format: JSON → IR → binary → [compress] → display96 → framed
encode_stele: Encode JSON to stele format: JSON → IR → stele
encode_stele_ascii: Encode JSON to ASCII inline stele format
encode_stele_light: Encode JSON to stele with field tokenization only (no value dictionary)
encode_stele_minified
encode_stele_path: Encode JSON to stele path mode (one line per leaf value)
encode_stele_readable: Encode JSON to stele without tokenization (human-readable field names)
find_closest_dictionary: Find the closest matching dictionary name
hash: Compute hash of data using the specified algorithm. Uses default configuration (seed = 0, no secret).
hash_with_config: Compute hash of data using the specified algorithm with custom configuration.
