Crate base_d


§base-d

A universal, multi-dictionary encoding library for Rust.

Encode binary data using numerous dictionaries including RFC standards, ancient scripts, emoji, playing cards, and more. Supports three encoding modes: radix (true base conversion), RFC 4648 chunked encoding, and direct byte-range mapping.

§Quick Start

use base_d::{DictionaryRegistry, Dictionary, encode, decode};

// Load built-in dictionaries
let config = DictionaryRegistry::load_default()?;
let base64_config = config.get_dictionary("base64").unwrap();

// Create dictionary
let chars: Vec<char> = base64_config.chars.chars().collect();
let padding = base64_config.padding.as_ref().and_then(|s| s.chars().next());
let mut builder = Dictionary::builder()
    .chars(chars)
    .mode(base64_config.effective_mode());
if let Some(p) = padding {
    builder = builder.padding(p);
}
let dictionary = builder.build()?;

// Encode and decode
let data = b"Hello, World!";
let encoded = encode(data, &dictionary);
let decoded = decode(&encoded, &dictionary)?;
assert_eq!(data, &decoded[..]);

§Features

  • 33 Built-in Dictionaries: RFC standards, emoji, ancient scripts, and more
  • 3 Encoding Modes: Radix, chunked (RFC-compliant), byte-range
  • Streaming Support: Memory-efficient processing for large files
  • Custom Dictionaries: Define your own via TOML configuration
  • User Configuration: Load dictionaries from ~/.config/base-d/dictionaries.toml
  • SIMD Acceleration: AVX2/SSSE3 on x86_64, NEON on aarch64 (enabled by default)

§Cargo Features

  • simd (default): Enable SIMD acceleration for encoding/decoding. Disable with --no-default-features for scalar-only builds.

§Encoding Modes

§Radix Base Conversion

True base conversion: the input bytes are treated as a single large number and rewritten in base N, where N is the dictionary size. Works with any dictionary size.

use base_d::{Dictionary, EncodingMode, encode};

let chars: Vec<char> = "😀😁😂🤣😃😄😅😆".chars().collect();
let dictionary = Dictionary::builder()
    .chars(chars)
    .mode(EncodingMode::Radix)
    .build()?;

let encoded = encode(b"Hi", &dictionary);
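
To make "treating data as a large number" concrete, here is a minimal std-only sketch of radix conversion by repeated long division. It is not base-d's implementation: `radix_encode` is a hypothetical helper, and its handling of leading zero bytes (one zero digit per zero byte, as in base58) is an assumption of this sketch, not something the crate documents here.

```rust
/// Radix-encode `data`: read the bytes as one big-endian integer and write
/// that integer in base `alphabet.len()`. Hypothetical helper, not base-d API.
fn radix_encode(data: &[u8], alphabet: &[char]) -> String {
    assert!(alphabet.len() >= 2, "alphabet needs at least two symbols");
    let base = alphabet.len() as u64;

    // Assumption of this sketch: each leading zero byte becomes one leading
    // zero digit (base58-style), because zeros add nothing to the integer.
    let zeros = data.iter().take_while(|&&b| b == 0).count();
    let mut num: Vec<u8> = data[zeros..].to_vec();
    let mut digits: Vec<char> = Vec::new();

    // Long division of the big number by `base`; every pass yields the next
    // least-significant digit as the remainder.
    while num.iter().any(|&b| b != 0) {
        let mut rem: u64 = 0;
        for byte in num.iter_mut() {
            let acc = (rem << 8) | u64::from(*byte);
            *byte = (acc / base) as u8; // quotient fits: acc < base * 256
            rem = acc % base;
        }
        digits.push(alphabet[rem as usize]);
    }

    // Digits were collected least-significant first, so append the zero
    // digits and reverse everything into reading order.
    digits.extend(std::iter::repeat(alphabet[0]).take(zeros));
    digits.iter().rev().collect()
}
```

Because the only requirement is an alphabet of at least two symbols, the same function works for the eight-emoji dictionary above as well as for a decimal or octal digit set.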

§Chunked Mode (RFC 4648)

Splits the input into fixed-size bit groups, one per output character. The output is compatible with standard base64/base32.

use base_d::{Dictionary, EncodingMode, encode};

let chars: Vec<char> = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    .chars().collect();
let dictionary = Dictionary::builder()
    .chars(chars)
    .mode(EncodingMode::Chunked)
    .padding('=')
    .build()?;

let encoded = encode(b"Hello", &dictionary);
assert_eq!(encoded, "SGVsbG8=");
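
The bit grouping behind chunked mode can be sketched in plain Rust as follows. This is an illustration of the RFC 4648 scheme, not base-d's code: `chunked_encode` and `gcd` are hypothetical helpers, and the sketch assumes a power-of-two alphabet of at most 256 symbols.

```rust
/// Greatest common divisor, used to find the padded group length.
fn gcd(a: u32, b: u32) -> u32 {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// RFC 4648-style encoding: emit one symbol per `bits`-wide group of input
/// bits. Hypothetical helper, not base-d API.
fn chunked_encode(data: &[u8], alphabet: &[char], padding: Option<char>) -> String {
    assert!(alphabet.len().is_power_of_two(), "alphabet size must be 2^k");
    let bits = alphabet.len().trailing_zeros(); // 6 for base64, 5 for base32
    assert!((1..=8).contains(&bits), "sketch supports 2..=256 symbols");
    let mask = (1u32 << bits) - 1;

    let mut out = String::new();
    let mut symbols = 0usize;
    let mut buffer: u32 = 0; // unread bits, right-aligned
    let mut buffered_bits: u32 = 0;

    for &byte in data {
        buffer = (buffer << 8) | u32::from(byte);
        buffered_bits += 8;
        // Emit every complete group currently in the buffer.
        while buffered_bits >= bits {
            buffered_bits -= bits;
            out.push(alphabet[((buffer >> buffered_bits) & mask) as usize]);
            symbols += 1;
        }
        buffer &= (1u32 << buffered_bits) - 1; // drop bits already emitted
    }

    // A trailing partial group is zero-filled on the right.
    if buffered_bits > 0 {
        out.push(alphabet[((buffer << (bits - buffered_bits)) & mask) as usize]);
        symbols += 1;
    }

    // Pad to a whole group: lcm(8, bits) / bits symbols (4 for base64).
    if let Some(pad) = padding {
        let group = (8 / gcd(8, bits)) as usize;
        while symbols % group != 0 {
            out.push(pad);
            symbols += 1;
        }
    }
    out
}
```

With the base64 alphabet and `'='` padding this reproduces the `"SGVsbG8="` result shown above. With a 32-symbol alphabet it produces base32 output, because only the group width changes.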

§Byte Range Mode

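Direct 1:1 mapping from each byte to a single character (emoji in the example below), starting at start_codepoint. No base conversion is performed, so the output has exactly one character per input byte.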

use base_d::{Dictionary, EncodingMode, encode};

let dictionary = Dictionary::builder()
    .mode(EncodingMode::ByteRange)
    .start_codepoint(127991)  // U+1F3F7
    .build()?;

let data = b"Hi";
let encoded = encode(data, &dictionary);
assert_eq!(encoded.chars().count(), 2);  // 1:1 mapping
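
A 1:1 mapping of this kind can be sketched with the standard library alone. The code below assumes that byte `b` maps to the code point `start_codepoint + b`. That offset rule is an inference from the builder's `start_codepoint` parameter, not something this page states, and `byte_range_encode` / `byte_range_decode` are hypothetical helpers.

```rust
/// Map each byte to the character at `start_codepoint + byte`.
/// Returns `None` if any resulting value is not a valid Unicode scalar.
/// Hypothetical helper, not base-d API.
fn byte_range_encode(data: &[u8], start_codepoint: u32) -> Option<String> {
    data.iter()
        .map(|&b| {
            start_codepoint
                .checked_add(u32::from(b))
                .and_then(char::from_u32)
        })
        .collect()
}

/// Inverse mapping: subtract the start code point and require the offset
/// to fit in one byte. Returns `None` for characters outside the range.
fn byte_range_decode(text: &str, start_codepoint: u32) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            u32::from(c)
                .checked_sub(start_codepoint)
                .and_then(|offset| u8::try_from(offset).ok())
        })
        .collect()
}
```

Note that "one character per byte" counts characters, not storage. With a start of U+1F3F7, every output character occupies four bytes in UTF-8, so `b"Hi"` becomes a two-character, eight-byte string.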

§Streaming

For large files, use streaming to avoid loading the entire file into memory:

use base_d::{DictionaryRegistry, StreamingEncoder};
use std::fs::File;

let config = DictionaryRegistry::load_default()?;
let dictionary_config = config.get_dictionary("base64").unwrap();

// ... create `dictionary` from `dictionary_config` (see Quick Start)

let mut input = File::open("large_file.bin")?;
let output = File::create("encoded.txt")?;

let mut encoder = StreamingEncoder::new(&dictionary, output);
encoder.encode(&mut input)?;
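
The general pattern behind a streaming encoder is a bounded read/encode/write loop. The sketch below shows that pattern with `std::io` only. It is not how `StreamingEncoder` is implemented: `stream_encode` and its `encode_chunk` callback are hypothetical, and the caller is assumed to pick a `chunk_len` at which chunks can be encoded independently (for example, a multiple of 3 bytes for unpadded base64).

```rust
use std::io::{self, Read, Write};

/// Encode `input` to `output` one fixed-size chunk at a time, so memory use
/// is bounded by `chunk_len`. Hypothetical helper, not base-d API.
fn stream_encode<R, W, F>(
    mut input: R,
    mut output: W,
    chunk_len: usize,
    encode_chunk: F,
) -> io::Result<()>
where
    R: Read,
    W: Write,
    F: Fn(&[u8]) -> String,
{
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    let mut buf = vec![0u8; chunk_len];
    loop {
        // `read` may return fewer bytes than requested, so keep reading
        // until the chunk is full or the input ends.
        let mut filled = 0;
        while filled < buf.len() {
            let n = input.read(&mut buf[filled..])?;
            if n == 0 {
                break; // end of input
            }
            filled += n;
        }
        if filled == 0 {
            break;
        }
        output.write_all(encode_chunk(&buf[..filled]).as_bytes())?;
        if filled < buf.len() {
            break; // the short final chunk has been written
        }
    }
    output.flush()
}
```

Filling each chunk completely before encoding matters: a short read in the middle of the input would otherwise shift the chunk boundaries and change the encoded output.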

Re-exports§

pub use convenience::CompressEncodeResult;
pub use convenience::HashEncodeResult;
pub use convenience::compress_encode;
pub use convenience::compress_encode_with;
pub use convenience::hash_encode;
pub use convenience::hash_encode_with;

Modules§

bench
Benchmarking utilities for comparing encoding paths.
convenience
Convenience functions for common encoding patterns.
prelude
Convenient re-exports for common usage.
schema
Schema encoding types and traits for building custom frontends

Structs§

CompressionConfig
Configuration for a compression algorithm.
Dictionary
Represents an encoding dictionary with its characters and configuration.
DictionaryBuilder
Builder for constructing a Dictionary with flexible configuration.
DictionaryConfig
Configuration for a single dictionary loaded from TOML.
DictionaryDetector
Detector for automatically identifying which dictionary was used to encode data.
DictionaryMatch
A match result from dictionary detection.
DictionaryNotFoundError
Error when a dictionary is not found
DictionaryRegistry
Collection of dictionary configurations loaded from TOML files.
Settings
Global settings for base-d.
StreamingDecoder
Streaming decoder for processing large amounts of encoded data efficiently.
StreamingEncoder
Streaming encoder for processing large amounts of data efficiently.
XxHashConfig
Configuration for xxHash algorithms.

Enums§

CompressionAlgorithm
Supported compression algorithms.
DecodeError
Errors that can occur during decoding.
EncodingMode
Encoding strategy for converting binary data to text.
HashAlgorithm
Supported hash algorithms.
SchemaCompressionAlgo
Compression algorithms for schema encoding

Functions§

compress
Compress data using the specified algorithm and level.
decode
Decodes a string back to binary data using the specified dictionary.
decode_fiche
Decode fiche format to JSON: fiche → IR → JSON
decode_fiche_path
Decode fiche path mode to JSON
decode_schema
Decode schema format to JSON: framed → display96 → [decompress] → binary → IR → JSON
decompress
Decompress data using the specified algorithm.
detect_dictionary
Convenience function to detect dictionary from input.
encode
Encodes binary data using the specified dictionary.
encode_fiche
Encode JSON to fiche format: JSON → IR → fiche
encode_fiche_light
Encode JSON to fiche with field tokenization only (no value dictionary)
encode_fiche_minified
encode_fiche_path
Encode JSON to fiche path mode (one line per leaf value)
encode_fiche_readable
Encode JSON to fiche without tokenization (human-readable field names)
encode_schema
Encode JSON to schema format: JSON → IR → binary → [compress] → display96 → framed
find_closest_dictionary
Find the closest matching dictionary name
hash
Compute hash of data using the specified algorithm. Uses default configuration (seed = 0, no secret).
hash_with_config
Compute hash of data using the specified algorithm with custom configuration.