Crate base_d

§base-d

A universal, multi-dictionary encoding library for Rust.

Encode binary data with a wide range of dictionaries, including RFC standards, ancient scripts, emoji, playing cards, and more. Three encoding modes are supported: radix (true base conversion), RFC 4648 chunked encoding, and direct byte-range mapping.

§Quick Start

use base_d::{DictionaryRegistry, Dictionary, encode, decode};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load built-in dictionaries
    let config = DictionaryRegistry::load_default()?;
    let base64_config = config.get_dictionary("base64").unwrap();

    // Build a Dictionary from the base64 configuration
    let chars: Vec<char> = base64_config.chars.chars().collect();
    let padding = base64_config.padding.as_ref().and_then(|s| s.chars().next());
    let mut builder = Dictionary::builder()
        .chars(chars)
        .mode(base64_config.effective_mode());
    if let Some(p) = padding {
        builder = builder.padding(p);
    }
    let dictionary = builder.build()?;

    // Encode and decode, then check the round trip
    let data = b"Hello, World!";
    let encoded = encode(data, &dictionary);
    let decoded = decode(&encoded, &dictionary)?;
    assert_eq!(data, &decoded[..]);
    Ok(())
}

§Features

  • 33 Built-in Dictionaries: RFC standards, emoji, ancient scripts, and more
  • 3 Encoding Modes: Radix, chunked (RFC-compliant), byte-range
  • Streaming Support: Memory-efficient processing for large files
  • Custom Dictionaries: Define your own via TOML configuration
  • User Configuration: Load dictionaries from ~/.config/base-d/dictionaries.toml
  • SIMD Acceleration: AVX2/SSSE3 on x86_64, NEON on aarch64 (enabled by default)

§Cargo Features

  • simd (default): Enable SIMD acceleration for encoding/decoding. Disable with --no-default-features for scalar-only builds.

§Encoding Modes

§Radix Base Conversion

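True base conversion: the input bytes are treated as one large number, which is then rewritten in base N, where N is the number of characters in the dictionary. Works with any dictionary size.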

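use base_d::{Dictionary, EncodingMode, encode};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Eight emoji: each output symbol is one base-8 digit
    let chars: Vec<char> = "😀😁😂🤣😃😄😅😆".chars().collect();
    let dictionary = Dictionary::builder()
        .chars(chars)
        .mode(EncodingMode::Radix)
        .build()?;

    let encoded = encode(b"Hi", &dictionary);
    println!("{encoded}");
    Ok(())
}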
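To make the idea of radix mode concrete, here is a minimal, self-contained sketch of base conversion by repeated long division. It is not base-d's implementation: the function names radix_encode and radix_decode are hypothetical, and the rule that each leading 0x00 byte becomes the dictionary's first symbol (so leading zeros survive a round trip, as in Base58) is an assumption; base-d may handle leading zeros differently.

```rust
/// Hypothetical sketch: encode bytes as one big-endian number in base `alphabet.len()`.
/// Assumption: each leading 0x00 byte is emitted as alphabet[0] so it is not lost.
fn radix_encode(data: &[u8], alphabet: &[char]) -> String {
    let base = alphabet.len() as u32;
    assert!(base >= 2, "alphabet needs at least two symbols");
    let zeros = data.iter().take_while(|&&b| b == 0).count();
    let mut num: Vec<u8> = data[zeros..].to_vec(); // base-256 digits, most significant first
    let mut digits: Vec<usize> = Vec::new(); // output digits, least significant first
    while !num.is_empty() {
        // One pass of long division: num = num / base, remainder is the next digit.
        let mut rem: u32 = 0;
        let mut quotient = Vec::with_capacity(num.len());
        for &byte in &num {
            let acc = (rem << 8) | u32::from(byte);
            let q = acc / base; // always < 256 because rem < base
            rem = acc % base;
            if !quotient.is_empty() || q != 0 {
                quotient.push(q as u8); // skip leading zero digits of the quotient
            }
        }
        digits.push(rem as usize);
        num = quotient;
    }
    let mut out: String = std::iter::repeat(alphabet[0]).take(zeros).collect();
    out.extend(digits.iter().rev().map(|&d| alphabet[d]));
    out
}

/// Hypothetical inverse: returns None if a symbol is not in the alphabet.
fn radix_decode(text: &str, alphabet: &[char]) -> Option<Vec<u8>> {
    let base = alphabet.len() as u32;
    let symbols: Vec<char> = text.chars().collect();
    let zeros = symbols.iter().take_while(|&&c| c == alphabet[0]).count();
    let mut num: Vec<u8> = Vec::new(); // base-256 accumulator, most significant first
    for &c in &symbols[zeros..] {
        let digit = alphabet.iter().position(|&a| a == c)? as u32;
        // num = num * base + digit, propagating carries from the least significant byte
        let mut carry = digit;
        for byte in num.iter_mut().rev() {
            let acc = u32::from(*byte) * base + carry;
            *byte = (acc & 0xFF) as u8;
            carry = acc >> 8;
        }
        while carry > 0 {
            num.insert(0, (carry & 0xFF) as u8);
            carry >>= 8;
        }
    }
    let mut out = vec![0u8; zeros];
    out.extend(num);
    Some(out)
}
```

With the eight-emoji dictionary above, b"Hi" is 0x4869 = 18537 = 0o44151, so this sketch produces five base-8 digits. Because the whole input is one number, the cost of each division pass grows with the input length, which is why radix mode suits short payloads better than large files.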

§Chunked Mode (RFC 4648)

Fixed-size bit groups, compatible with standard base64/base32.

use base_d::{Dictionary, EncodingMode, encode};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Standard RFC 4648 base64 alphabet with '=' padding
    let chars: Vec<char> = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
        .chars().collect();
    let dictionary = Dictionary::builder()
        .chars(chars)
        .mode(EncodingMode::Chunked)
        .padding('=')
        .build()?;

    let encoded = encode(b"Hello", &dictionary);
    assert_eq!(encoded, "SGVsbG8=");
    Ok(())
}
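The fixed-size bit grouping behind chunked mode can be shown with a small standalone base64 encoder. This is a sketch of RFC 4648 base64 itself, not base-d's code path (which, per the feature list, may use SIMD); the names STD_ALPHABET and chunked_base64 are hypothetical. Every 3 input bytes (24 bits) are split into four 6-bit indices, and a short final group is zero-filled with the missing symbols replaced by '='.

```rust
/// RFC 4648 standard base64 alphabet (64 symbols, 6 bits each).
const STD_ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper: chunked (RFC 4648) base64 encoding with '=' padding.
fn chunked_base64(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Zero-fill a short final group so it can be treated as 24 bits.
        let mut group = [0u8; 3];
        group[..chunk.len()].copy_from_slice(chunk);
        let bits = (u32::from(group[0]) << 16) | (u32::from(group[1]) << 8) | u32::from(group[2]);
        // 1 byte -> 2 symbols, 2 bytes -> 3 symbols, 3 bytes -> 4 symbols
        let symbols = chunk.len() + 1;
        for i in 0..4 {
            if i < symbols {
                let index = (bits >> (18 - 6 * i)) & 0x3F; // next 6-bit group
                out.push(STD_ALPHABET[index as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}
```

For b"Hello" this yields "SGVsbG8=": "Hel" fills one full group, and the 2-byte tail "lo" gives three symbols plus one '='. Since groups never depend on each other, the output for a long input is just the concatenation of the per-group outputs.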

§Byte Range Mode

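Direct 1:1 mapping from each byte to a code point, starting at the configured start code point. There is no overhead in character count: each input byte becomes exactly one output character (in UTF-8, each emoji in this example's range still occupies 4 bytes).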

use base_d::{Dictionary, EncodingMode, encode};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dictionary = Dictionary::builder()
        .mode(EncodingMode::ByteRange)
        .start_codepoint(127991)  // U+1F3F7
        .build()?;

    let data = b"Hi";
    let encoded = encode(data, &dictionary);
    assert_eq!(encoded.chars().count(), 2);  // 1:1 mapping
    Ok(())
}
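Byte-range mapping amounts to adding an offset to each byte. The sketch below is a standalone illustration, not base-d's API: byte_range_encode and byte_range_decode are hypothetical names, and it assumes the mapping is byte b to code point start + b, consistent with the 1:1 description above. It returns None if an offset would land on an invalid scalar value (for example a surrogate).

```rust
/// Hypothetical sketch: map each byte b to the code point `start + b`.
/// Returns None if any resulting value is not a valid Unicode scalar value.
fn byte_range_encode(data: &[u8], start: u32) -> Option<String> {
    data.iter()
        .map(|&b| char::from_u32(start + u32::from(b)))
        .collect()
}

/// Hypothetical inverse: subtract `start` and require the result to fit in a byte.
fn byte_range_decode(text: &str, start: u32) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let offset = (c as u32).checked_sub(start)?;
            u8::try_from(offset).ok()
        })
        .collect()
}
```

With a start of U+1F3F7, 'H' (0x48) maps to U+1F43F. The result has two characters for two input bytes, but eight bytes once serialized as UTF-8, because every code point in this range needs four UTF-8 bytes.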

§Streaming

For large files, use streaming to avoid loading the entire file into memory:

use base_d::{DictionaryRegistry, StreamingEncoder};
use std::fs::File;

let config = DictionaryRegistry::load_default()?;
let dictionary_config = config.get_dictionary("base64").unwrap();

// ... create dictionary from config

let mut input = File::open("large_file.bin")?;
let output = File::create("encoded.txt")?;

let mut encoder = StreamingEncoder::new(&dictionary, output);
encoder.encode(&mut input)?;
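One detail a chunked streaming encoder has to get right is buffer alignment: if input is encoded in blocks whose length is a multiple of 3, the per-block base64 outputs concatenate into exactly the one-shot result, and '=' padding can only appear after the last read. The sketch below shows that technique with plain std::io traits. It is a hypothetical helper (stream_base64 and encode_group are illustrative names), not how base-d's StreamingEncoder is implemented, and it hard-codes the standard base64 alphabet.

```rust
use std::io::{self, Read, Write};

const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode one group of 1 to 3 bytes as four base64 symbols, padding with '='.
fn encode_group(chunk: &[u8], out: &mut Vec<u8>) {
    let mut group = [0u8; 3];
    group[..chunk.len()].copy_from_slice(chunk);
    let bits = (u32::from(group[0]) << 16) | (u32::from(group[1]) << 8) | u32::from(group[2]);
    for i in 0..4 {
        if i <= chunk.len() {
            out.push(ALPHABET[((bits >> (18 - 6 * i)) & 0x3F) as usize]);
        } else {
            out.push(b'=');
        }
    }
}

/// Hypothetical streaming helper: only whole 3-byte groups are encoded
/// mid-stream; leftover bytes from a short read wait for the next read.
fn stream_base64<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut buf = [0u8; 3 * 1024];
    let mut pending = 0usize; // carried-over bytes (always 0, 1, or 2)
    loop {
        let n = match input.read(&mut buf[pending..]) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let filled = pending + n;
        let full = filled / 3 * 3; // largest multiple of 3 we can encode now
        let mut encoded = Vec::with_capacity(full / 3 * 4);
        for group in buf[..full].chunks(3) {
            encode_group(group, &mut encoded);
        }
        output.write_all(&encoded)?;
        buf.copy_within(full..filled, 0); // keep the remainder for the next round
        pending = filled - full;
    }
    if pending > 0 {
        // Only the final, partial group may produce '=' padding.
        let mut tail = Vec::new();
        encode_group(&buf[..pending], &mut tail);
        output.write_all(&tail)?;
    }
    output.flush()
}
```

Memory use stays bounded by the fixed 3 KiB buffer regardless of input size, and short reads from files or sockets are handled by carrying at most two bytes into the next iteration.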

Re-exports§

pub use convenience::CompressEncodeResult;
pub use convenience::HashEncodeResult;
pub use convenience::compress_encode;
pub use convenience::compress_encode_with;
pub use convenience::hash_encode;
pub use convenience::hash_encode_with;

Modules§

bench
Benchmarking utilities for comparing encoding paths.
convenience
Convenience functions for common encoding patterns.
prelude
Convenient re-exports for common usage.
schema
Schema encoding types and traits for building custom frontends
word
Word-based encoding using radix conversion.
word_alternating
Alternating word-based encoding for PGP-style biometric word lists.
wordlists
Built-in word lists for word-based encoding.

Structs§

AlternatingWordDictionary
A word dictionary that alternates between multiple sub-dictionaries.
CompressionConfig
Configuration for a compression algorithm.
Dictionary
Represents an encoding dictionary with its characters and configuration.
DictionaryBuilder
Builder for constructing a Dictionary with flexible configuration.
DictionaryConfig
Configuration for a single dictionary loaded from TOML.
DictionaryDetector
Detector for automatically identifying which dictionary was used to encode data.
DictionaryMatch
A match result from dictionary detection.
DictionaryNotFoundError
Error when a dictionary is not found
DictionaryRegistry
Collection of dictionary configurations loaded from TOML files.
Settings
Global settings for base-d.
StreamingDecoder
Streaming decoder for processing large amounts of encoded data efficiently.
StreamingEncoder
Streaming encoder for processing large amounts of data efficiently.
WordDictionary
A word-based dictionary for encoding binary data as word sequences.
WordDictionaryBuilder
Builder for constructing a WordDictionary with flexible configuration.
XxHashConfig
Configuration for xxHash algorithms.
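DictionaryDetector and DictionaryMatch are described only as identifying which dictionary produced some input. One simple way such detection could work is to score each candidate by the share of input characters that belong to its alphabet. The sketch below illustrates that heuristic; the Candidate type, the detect function, and the tie-breaking rule (prefer the smaller alphabet when coverage is equal) are all hypothetical and not taken from base-d, whose actual scoring is not documented on this page.

```rust
use std::collections::HashSet;

/// Hypothetical candidate: a dictionary name plus the set of symbols it uses.
struct Candidate {
    name: &'static str,
    symbols: HashSet<char>,
}

impl Candidate {
    fn new(name: &'static str, alphabet: &str) -> Self {
        Candidate { name, symbols: alphabet.chars().collect() }
    }
}

/// Score each candidate by the fraction of input characters (ignoring '=' padding
/// and whitespace) found in its alphabet. Ties prefer the smaller alphabet, since a
/// smaller set that still covers the input is the more specific match.
fn detect<'a>(input: &str, candidates: &'a [Candidate]) -> Option<(&'a str, f64)> {
    let chars: Vec<char> = input
        .chars()
        .filter(|c| *c != '=' && !c.is_whitespace())
        .collect();
    if chars.is_empty() {
        return None;
    }
    candidates
        .iter()
        .map(|cand| {
            let hits = chars.iter().filter(|&&c| cand.symbols.contains(&c)).count();
            (cand, hits as f64 / chars.len() as f64)
        })
        .max_by(|(a, score_a), (b, score_b)| {
            // Scores are finite ratios, so partial_cmp never returns None here.
            score_a
                .partial_cmp(score_b)
                .unwrap()
                .then(b.symbols.len().cmp(&a.symbols.len()))
        })
        .map(|(cand, score)| (cand.name, score))
}
```

A pure coverage score cannot tell apart inputs that fit several alphabets (hex digits are also valid base64), so a real detector would likely combine it with other signals such as length or padding rules.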

Enums§

CompressionAlgorithm
Supported compression algorithms.
DecodeError
Errors that can occur during decoding.
DetectedMode
Detected stele mode based on JSON structure
DictionaryType
Dictionary type: character-based or word-based.
EncodingMode
Encoding strategy for converting binary data to text.
HashAlgorithm
Supported hash algorithms.
SchemaCompressionAlgo
Compression algorithms for schema encoding

Functions§

compress
Compress data using the specified algorithm and level.
decode
Decodes a string back to binary data using the specified dictionary.
decode_schema
Decode schema format to JSON: framed → display96 → [decompress] → binary → IR → JSON
decode_stele
Decode stele format to JSON: stele → IR → JSON
decode_stele_path
Decode stele path mode to JSON
decompress
Decompress data using the specified algorithm.
detect_dictionary
Convenience function to detect dictionary from input.
detect_stele_mode
Auto-detect the best stele mode for the given JSON structure
encode
Encodes binary data using the specified dictionary.
encode_markdown_stele
Encode markdown document to stele format: markdown → IR → stele
encode_markdown_stele_ascii
Encode markdown document to ASCII inline stele format
encode_markdown_stele_light
Encode markdown to stele with field tokenization only (no value dictionary)
encode_markdown_stele_markdown
Encode markdown document to markdown-like inline stele format. Uses #1-#6 for headers, -1/-2 for lists, and preserves markdown syntax patterns.
encode_markdown_stele_readable
Encode markdown to stele without tokenization (human-readable)
encode_schema
Encode JSON to schema format: JSON → IR → binary → [compress] → display96 → framed
encode_stele
Encode JSON to stele format: JSON → IR → stele
encode_stele_ascii
Encode JSON to ASCII inline stele format
encode_stele_light
Encode JSON to stele with field tokenization only (no value dictionary)
encode_stele_minified
encode_stele_path
Encode JSON to stele path mode (one line per leaf value)
encode_stele_readable
Encode JSON to stele without tokenization (human-readable field names)
find_closest_dictionary
Find the closest matching dictionary name
hash
Compute hash of data using the specified algorithm. Uses default configuration (seed = 0, no secret).
hash_with_config
Compute hash of data using the specified algorithm with custom configuration.
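find_closest_dictionary is documented only as finding "the closest matching dictionary name", for example to suggest a fix for a mistyped name. A common way to implement that kind of suggestion is Levenshtein edit distance with a cutoff. The sketch below shows the technique; levenshtein and closest_name are hypothetical helpers, and both the edit-distance metric and the max_distance cutoff are assumptions rather than base-d's documented behavior.

```rust
/// Levenshtein edit distance over chars, using two rolling DP rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect(); // distance from "" to b[..j]
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1; b.len() + 1]; // curr[0] = i + 1 deletions
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitute (or match)
                .min(prev[j + 1] + 1) // delete from a
                .min(curr[j] + 1); // insert into a
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical helper: the known name with the smallest edit distance,
/// or None if nothing is within `max_distance` edits.
fn closest_name<'a>(query: &str, names: &[&'a str], max_distance: usize) -> Option<&'a str> {
    names
        .iter()
        .map(|&name| (name, levenshtein(query, name)))
        .filter(|&(_, d)| d <= max_distance)
        .min_by_key(|&(_, d)| d)
        .map(|(name, _)| name)
}
```

For example, "bas64" is one insertion away from "base64" but three edits from "base32" or "base58", so base64 is suggested; the cutoff keeps unrelated input from producing a misleading suggestion.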