datacard-rs 1.0.0

Generic binary card format library with checksums and pluggable format traits
DataCard

BytePunch-Compressed CML Documents (.card file format)

DataCard represents the .card file format: CML documents compressed with BytePunch for efficient storage and transmission.

Overview

DataCard bridges CML (Content Markup Language) and BytePunch compression into a single file format optimized for:

  • Storage: 40-70% smaller than raw CML
  • Network: Efficient over-the-wire transmission
  • Bundling: Cards can be spooled into DataSpool archives

Workflow

CML Document (XML)
  → BytePunch Compression (dictionary-based)
    → .card file (binary, optimized)
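BytePunch's actual encoding is not described in this README. The sketch below is only a toy illustration of the general idea behind dictionary-based compression: frequent phrases are replaced by one-byte tokens, and the same dictionary reverses the substitution. The names `toy_compress` and `toy_decompress` and the token scheme (bytes 0x80 and above) are assumptions made for this example, not part of bytepunch or datacard.

```rust
/// Toy dictionary coder: each dictionary phrase becomes one byte >= 0x80.
/// Input must be ASCII so token bytes never collide with literal bytes.
/// Returns None for non-ASCII input or a dictionary with more than 128 entries.
pub fn toy_compress(input: &str, dict: &[&str]) -> Option<Vec<u8>> {
    if !input.is_ascii() || dict.len() > 128 {
        return None;
    }
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // Pick the longest dictionary phrase that matches at position i.
        let best = dict
            .iter()
            .enumerate()
            .filter(|(_, p)| !p.is_empty() && bytes[i..].starts_with(p.as_bytes()))
            .max_by_key(|(_, p)| p.len());
        match best {
            Some((idx, phrase)) => {
                out.push(0x80 + idx as u8);
                i += phrase.len();
            }
            None => {
                out.push(bytes[i]);
                i += 1;
            }
        }
    }
    Some(out)
}

/// Reverse `toy_compress` using the same dictionary.
/// Returns None if a token has no dictionary entry.
pub fn toy_decompress(data: &[u8], dict: &[&str]) -> Option<String> {
    let mut out = Vec::new();
    for &b in data {
        if b >= 0x80 {
            out.extend_from_slice(dict.get((b - 0x80) as usize)?.as_bytes());
        } else {
            out.push(b);
        }
    }
    String::from_utf8(out).ok()
}
```

Decompressing with a different dictionary would silently produce different text, which is why the same dictionary must be used in both directions.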

Installation

Add to your Cargo.toml:

[dependencies]
datacard = "0.1.0"
bytepunch = "0.1.0"
sam-cml = "0.1.0"

Quick Start

use datacard::Card;
use bytepunch::Dictionary;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load dictionary (profile-specific)
    let dict = Dictionary::from_file("dictionaries/code-api.json")?;

    // Create card from CML
    let cml_xml = r#"<cml version="0.1">
        <metadata>
            <title>Example Document</title>
        </metadata>
        <section id="intro">
            <heading level="1">Introduction</heading>
            <para>Sample content</para>
        </section>
    </cml>"#;

    let card = Card::from_cml(cml_xml, &dict)?;

    // Save to file
    card.save("example.card")?;

    // Load from file
    let loaded = Card::load("example.card")?;

    // Decompress back to CML
    let decompressed_cml = loaded.to_cml(&dict)?;
    assert_eq!(cml_xml, decompressed_cml);

    println!("Original: {} bytes", cml_xml.len());
    println!("Compressed: {} bytes", card.size());
    println!("Compression ratio: {:.1}%",
             (1.0 - card.size() as f32 / cml_xml.len() as f32) * 100.0);

    Ok(())
}

API Reference

Card

Represents a BytePunch-compressed CML document.

pub struct Card {
    pub data: Vec<u8>,
}

Methods

from_cml(cml_xml: &str, dictionary: &Dictionary) -> Result<Self>

Create a card from CML XML string.

Arguments:

  • cml_xml - CML document as XML string
  • dictionary - BytePunch dictionary for compression

Returns: Compressed Card

Example:

let dict = Dictionary::from_file("code-api.json")?;
let card = Card::from_cml("<cml>...</cml>", &dict)?;

to_cml(&self, dictionary: &Dictionary) -> Result<String>

Decompress card back to CML XML.

Arguments:

  • dictionary - BytePunch dictionary for decompression (must match compression dict)

Returns: Decompressed CML as XML string

Example:

let cml = card.to_cml(&dict)?;

load<P: AsRef<Path>>(path: P) -> Result<Self>

Load card from file.

Arguments:

  • path - File path to .card file

Returns: Loaded Card

Example:

let card = Card::load("document.card")?;

save<P: AsRef<Path>>(&self, path: P) -> Result<()>

Save card to file.

Arguments:

  • path - Output file path

Example:

card.save("document.card")?;

size(&self) -> usize

Get compressed size in bytes.

Returns: Size of compressed data

Example:

println!("Card size: {} bytes", card.size());

from_bytes(data: Vec<u8>) -> Self

Create card from raw compressed data.

Arguments:

  • data - Pre-compressed BytePunch data

Returns: Card wrapping the data

Example:

let card = Card::from_bytes(compressed_bytes);

as_bytes(&self) -> &[u8]

Get raw compressed data.

Returns: Slice of compressed bytes

Example:

let bytes = card.as_bytes();
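To make the byte-level methods concrete, here is a minimal stand-in for the parts of `Card` that do not involve BytePunch. It is a sketch built only from the signatures listed above, not the crate's source: it returns `std::io::Result` instead of the crate's own `Result` type, and it assumes `save` and `load` write and read the raw bytes without any validation.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the documented `Card` struct (same public field).
pub struct Card {
    pub data: Vec<u8>,
}

impl Card {
    /// Wrap already-compressed bytes without copying.
    pub fn from_bytes(data: Vec<u8>) -> Self {
        Card { data }
    }

    /// Borrow the compressed bytes.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data
    }

    /// Compressed size in bytes.
    pub fn size(&self) -> usize {
        self.data.len()
    }

    /// Assumption: saving writes the raw bytes unchanged.
    pub fn save<P: AsRef<Path>>(&self, path: P) -> io::Result<()> {
        fs::write(path, &self.data)
    }

    /// Assumption: loading reads the whole file with no format check.
    pub fn load<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        Ok(Card { data: fs::read(path)? })
    }
}
```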

File Format

DataCard files (.card) contain BytePunch-compressed CML:

┌─────────────────────────────────────┐
│ Magic: "BP01" (4 bytes)             │  BytePunch format identifier
├─────────────────────────────────────┤
│ Version: 1 (1 byte)                 │  Format version
├─────────────────────────────────────┤
│ Profile Length: N (1 byte)          │  Length of profile name
├─────────────────────────────────────┤
│ Profile: "code:api" (N bytes)       │  Dictionary profile used
├─────────────────────────────────────┤
│ Compressed Content (variable)       │  Dictionary-compressed CML
└─────────────────────────────────────┘
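The layout above can be read with a few slice operations. The following is a sketch under stated assumptions, not the crate's parser: `CardHeader` and `parse_header` are hypothetical names, the profile name is assumed to be UTF-8, and the version byte is returned without being checked.

```rust
use std::str;

/// Header fields as laid out in the diagram above.
#[derive(Debug, PartialEq)]
pub struct CardHeader<'a> {
    pub version: u8,
    pub profile: &'a str,
    /// Everything after the profile name: the compressed content.
    pub content: &'a [u8],
}

/// Split a `.card` byte buffer into header fields and compressed content.
/// Returns None if the magic is wrong, the buffer is too short,
/// or the profile name is not valid UTF-8.
pub fn parse_header(bytes: &[u8]) -> Option<CardHeader<'_>> {
    // Magic: "BP01" (4 bytes)
    let rest = bytes.strip_prefix(b"BP01")?;
    // Version (1 byte)
    let (&version, rest) = rest.split_first()?;
    // Profile length N (1 byte), then N bytes of profile name
    let (&profile_len, rest) = rest.split_first()?;
    let profile_len = profile_len as usize;
    if rest.len() < profile_len {
        return None;
    }
    let (profile, content) = rest.split_at(profile_len);
    Some(CardHeader {
        version,
        profile: str::from_utf8(profile).ok()?,
        content,
    })
}
```

Because the profile length is a single byte, a profile name can be at most 255 bytes long.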

Use Cases

1. Rust Standard Library Documentation

Convert rustdoc JSON to CML cards:

use datacard::Card;
use bytepunch::Dictionary;
use sam_cml::CmlDocument;

// Load code-api dictionary
let dict = Dictionary::from_file("dictionaries/code-api.json")?;

// Parse rustdoc, convert to CML, compress to card
let rustdoc_json = std::fs::read_to_string("std.json")?;
let cml_doc = convert_rustdoc_to_cml(&rustdoc_json)?;
let cml_xml = cml_doc.to_xml()?;
let card = Card::from_cml(&cml_xml, &dict)?;

card.save("std.card")?;

Result: 3,309 Rust stdlib cards, 40-70% smaller than the raw CML.

2. Legal Document Archive

Compress legal documents with a profile-specific dictionary:

let dict = Dictionary::from_file("dictionaries/legal.json")?;

for doc in legal_documents {
    let cml = doc.to_cml()?;
    let card = Card::from_cml(&cml, &dict)?;
    card.save(format!("legal/{}.card", doc.id))?;
}

3. Network Transmission

Send cards over the network:

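use datacard::Card;
use tokio::io::AsyncWriteExt;

// `stream` is assumed to be an open async writer, such as a tokio::net::TcpStream.
let card = Card::load("document.card")?;
let bytes = card.as_bytes();

// Send with a 4-byte big-endian length prefix (the cast assumes the card is under 4 GiB)
stream.write_u32(bytes.len() as u32).await?;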
stream.write_all(bytes).await?;
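The receiving side has to undo the same framing. The sketch below uses blocking `std::io` so that it stays self-contained; `write_frame`, `read_frame`, and the `max_len` guard are hypothetical helpers for this example. It relies on the fact that tokio's `write_u32` writes the integer in big-endian order.

```rust
use std::convert::TryFrom;
use std::io::{self, Read, Write};

/// Write one frame: 4-byte big-endian length, then the payload.
pub fn write_frame<W: Write>(mut w: W, payload: &[u8]) -> io::Result<()> {
    // Refuse payloads whose length does not fit in the u32 prefix.
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Read one frame written by `write_frame`.
/// `max_len` rejects oversized length prefixes before allocating.
pub fn read_frame<R: Read>(mut r: R, max_len: u32) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

The bytes returned by `read_frame` can then be handed to `Card::from_bytes` on the receiving end.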

4. Bundle into DataSpools

Combine multiple cards into an indexed spool:

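use datacard::Card;
use dataspool::{SpoolBuilder, PersistentVectorStore};

// `cards`, `doc_id`, `embedding`, and `DocumentRef` are assumed to be in scope.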

let mut spool = SpoolBuilder::new("archive.spool")?;
let mut db = PersistentVectorStore::new("archive.db")?;

for card_file in cards {
    let card = Card::load(&card_file)?;
    let entry = spool.add_card(card.as_bytes())?;

    // Store metadata and embeddings in .db
    db.add_document_ref(
        &doc_id,
        DocumentRef::Spool {
            spool_path: "archive.spool".into(),
            offset: entry.offset,
            length: entry.length,
        },
        &embedding,
    )?;
}

spool.finalize()?;

Result: Single .spool file + SQLite .db → bundled into .engram
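The offset and length stored for each card are what make the spool randomly accessible. The DataSpool on-disk format is not specified in this README, so the following in-memory toy only illustrates the indexing idea: cards are appended back to back, and each entry records where its bytes start and how long they are. `ToySpool` and `Entry` are hypothetical names, not dataspool types.

```rust
use std::convert::TryFrom;

/// Location of one card inside the spool buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Entry {
    pub offset: u64,
    pub length: u64,
}

/// In-memory stand-in for a spool: cards stored back to back.
#[derive(Default)]
pub struct ToySpool {
    buf: Vec<u8>,
}

impl ToySpool {
    /// Append a card and return where it was stored.
    pub fn add_card(&mut self, card: &[u8]) -> Entry {
        let entry = Entry {
            offset: self.buf.len() as u64,
            length: card.len() as u64,
        };
        self.buf.extend_from_slice(card);
        entry
    }

    /// Fetch a card by its entry; None if the range is out of bounds.
    pub fn read(&self, entry: Entry) -> Option<&[u8]> {
        let start = usize::try_from(entry.offset).ok()?;
        let len = usize::try_from(entry.length).ok()?;
        self.buf.get(start..start.checked_add(len)?)
    }
}
```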

Compression Ratios

Typical compression ratios (size reduction relative to raw CML) by content type:

Content Type       Dictionary   Compression
Rust API docs      code:api     60-70%
Legal documents    legal        40-50%
Bookstack wikis    bookstack    50-60%
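The percentages here follow the same formula as the Quick Start example: the fraction of the original size that compression removed. A small helper makes the arithmetic explicit. `size_reduction_percent` is a hypothetical name for this sketch, not a datacard function.

```rust
/// Percentage by which `compressed` is smaller than `original`.
/// Returns None when `original` is zero, since the ratio is undefined.
pub fn size_reduction_percent(original: usize, compressed: usize) -> Option<f64> {
    if original == 0 {
        return None;
    }
    Some((1.0 - compressed as f64 / original as f64) * 100.0)
}
```

For example, a 1,000-byte CML document that compresses to a 250-byte card has a 75% reduction. A negative result means the card came out larger than its input.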

Integration

DataCard integrates with BytePunch (dictionary compression), sam-cml (CML documents), and DataSpool (card archives).

Performance

Benchmarks on Rust stdlib documentation (3,309 cards):

Operation               Time    Throughput
Compress CML → Card     ~2ms    500 docs/sec
Decompress Card → CML   ~1ms    1000 docs/sec
Save to file            <1ms    I/O bound
Load from file          <1ms    I/O bound
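The benchmark harness behind these numbers is not shown. One way to take comparable measurements on your own data is a small timing loop like the sketch below, where `measure` is a hypothetical helper and the closure would wrap a call such as `Card::from_cml`.

```rust
use std::time::{Duration, Instant};

/// Run `op` once per item and return (mean time per item, items per second).
/// Returns None for an empty input, where neither figure is defined.
pub fn measure<T, F: FnMut(&T)>(items: &[T], mut op: F) -> Option<(Duration, f64)> {
    if items.is_empty() {
        return None;
    }
    let start = Instant::now();
    for item in items {
        op(item);
    }
    let elapsed = start.elapsed();
    let mean = elapsed / items.len() as u32;
    // Throughput is the reciprocal of the mean latency: ~2ms per doc is ~500 docs/sec.
    let per_sec = items.len() as f64 / elapsed.as_secs_f64();
    Some((mean, per_sec))
}
```

A single pass like this is noisy, so for publishable figures a dedicated benchmarking tool with warm-up and repeated runs is a better choice.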

Error Handling

use datacard::{Card, CardError};

match Card::load("missing.card") {
    Ok(card) => println!("Loaded {} bytes", card.size()),
    Err(CardError::Io(e)) => eprintln!("File error: {}", e),
    Err(CardError::BytePunch(e)) => eprintln!("Decompression error: {}", e),
    Err(CardError::InvalidFormat) => eprintln!("Invalid card format"),
}
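The match above implies an error enum with at least three variants. The sketch below shows one plausible shape for such a type using only the standard library. It is an assumption, not the crate's definition: the payload of `BytePunch` is shown as a `String` placeholder, and `read_checked` is a hypothetical helper that demonstrates how `?` converts an `io::Error` into `CardError::Io`.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in error type mirroring the variants matched above.
#[derive(Debug)]
pub enum CardError {
    Io(io::Error),
    /// Assumption: the real variant wraps a bytepunch error type.
    BytePunch(String),
    InvalidFormat,
}

impl fmt::Display for CardError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CardError::Io(e) => write!(f, "I/O error: {}", e),
            CardError::BytePunch(msg) => write!(f, "BytePunch error: {}", msg),
            CardError::InvalidFormat => write!(f, "invalid card format"),
        }
    }
}

impl Error for CardError {}

/// Lets `?` turn an io::Error into CardError::Io automatically.
impl From<io::Error> for CardError {
    fn from(e: io::Error) -> Self {
        CardError::Io(e)
    }
}

/// Hypothetical helper: read a file and reject it unless it starts with "BP01".
pub fn read_checked(path: &Path) -> Result<Vec<u8>, CardError> {
    let bytes = fs::read(path)?;
    if !bytes.starts_with(b"BP01") {
        return Err(CardError::InvalidFormat);
    }
    Ok(bytes)
}
```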

Development

# Clone repo
git clone https://github.com/Blackfall-Labs/datacard-rs
cd datacard-rs

# Build
cargo build

# Run tests
cargo test

# Build release
cargo build --release

History

DataCard was extracted from the SAM project where it was used to compress 3,309 Rust standard library documentation cards for offline AI knowledge retrieval.

Original implementation: crates/sam-core/examples/build_stdlib_engram_cards.rs

License

MIT - See LICENSE for details.

Author

Magnus Trent <magnus@blackfall.dev>

Links