# DataCard


**BytePunch-Compressed CML Documents (.card file format)**

DataCard represents the `.card` file format: CML documents compressed with BytePunch for efficient storage and transmission.

## Overview


DataCard bridges [CML (Content Markup Language)](https://github.com/manifest-humanity/content-markup-language) and [BytePunch compression](https://github.com/Blackfall-Labs/bytepunch-rs) into a single file format optimized for:

- **Storage**: 40-70% smaller than raw CML
- **Network**: Efficient over-the-wire transmission
- **Bundling**: Cards can be spooled into [DataSpool](https://github.com/Blackfall-Labs/dataspool-rs) archives

## Workflow


```text
CML Document (XML)
  → BytePunch Compression (dictionary-based)
    → .card file (binary, optimized)
```

## Installation


Add to your `Cargo.toml`:

```toml
[dependencies]
datacard = "0.1.0"
bytepunch = "0.1.0"
sam-cml = "0.1.0"
```

## Quick Start


```rust
use datacard::Card;
use bytepunch::Dictionary;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load dictionary (profile-specific)
    let dict = Dictionary::from_file("dictionaries/code-api.json")?;

    // Create card from CML
    let cml_xml = r#"<cml version="0.1">
        <metadata>
            <title>Example Document</title>
        </metadata>
        <section id="intro">
            <heading level="1">Introduction</heading>
            <para>Sample content</para>
        </section>
    </cml>"#;

    let card = Card::from_cml(cml_xml, &dict)?;

    // Save to file
    card.save("example.card")?;

    // Load from file
    let loaded = Card::load("example.card")?;

    // Decompress back to CML
    let decompressed_cml = loaded.to_cml(&dict)?;
    assert_eq!(cml_xml, decompressed_cml);

    println!("Original: {} bytes", cml_xml.len());
    println!("Compressed: {} bytes", card.size());
    println!("Compression ratio: {:.1}%",
             (1.0 - card.size() as f32 / cml_xml.len() as f32) * 100.0);

    Ok(())
}
```

## API Reference


### `Card`


Represents a BytePunch-compressed CML document.

```rust
pub struct Card {
    pub data: Vec<u8>,
}
```

#### Methods


##### `from_cml(cml_xml: &str, dictionary: &Dictionary) -> Result<Self>`


Create a card from CML XML string.

**Arguments:**
- `cml_xml` - CML document as XML string
- `dictionary` - BytePunch dictionary for compression

**Returns:** Compressed `Card`

**Example:**
```rust
let dict = Dictionary::from_file("code-api.json")?;
let card = Card::from_cml("<cml>...</cml>", &dict)?;
```

##### `to_cml(&self, dictionary: &Dictionary) -> Result<String>`


Decompress card back to CML XML.

**Arguments:**
- `dictionary` - BytePunch dictionary for decompression (must match compression dict)

**Returns:** Decompressed CML as XML string

**Example:**
```rust
let cml = card.to_cml(&dict)?;
```

##### `load<P: AsRef<Path>>(path: P) -> Result<Self>`


Load card from file.

**Arguments:**
- `path` - File path to `.card` file

**Returns:** Loaded `Card`

**Example:**
```rust
let card = Card::load("document.card")?;
```

##### `save<P: AsRef<Path>>(&self, path: P) -> Result<()>`


Save card to file.

**Arguments:**
- `path` - Output file path

**Example:**
```rust
card.save("document.card")?;
```

##### `size(&self) -> usize`


Get compressed size in bytes.

**Returns:** Size of compressed data

**Example:**
```rust
println!("Card size: {} bytes", card.size());
```

##### `from_bytes(data: Vec<u8>) -> Self`


Create card from raw compressed data.

**Arguments:**
- `data` - Pre-compressed BytePunch data

**Returns:** `Card` wrapping the data

**Example:**
```rust
let card = Card::from_bytes(compressed_bytes);
```

##### `as_bytes(&self) -> &[u8]`


Get raw compressed data.

**Returns:** Slice of compressed bytes

**Example:**
```rust
let bytes = card.as_bytes();
```

## File Format


DataCard files (`.card`) contain BytePunch-compressed CML:

```text
┌─────────────────────────────────────┐
│ Magic: "BP01" (4 bytes)             │  BytePunch format identifier
├─────────────────────────────────────┤
│ Version: 1 (1 byte)                 │  Format version
├─────────────────────────────────────┤
│ Profile Length: N (1 byte)          │  Length of profile name
├─────────────────────────────────────┤
│ Profile: "code:api" (N bytes)       │  Dictionary profile used
├─────────────────────────────────────┤
│ Compressed Content (variable)       │  Dictionary-compressed CML
└─────────────────────────────────────┘
```
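
The layout above can be read with a few slice operations. The following is a minimal sketch using only the standard library. `CardHeader` and `parse_card_header` are hypothetical names for illustration and are not part of the `datacard` API. It also assumes the profile name is UTF-8, which the diagram does not state.

```rust
/// Header fields in the order shown in the layout diagram.
/// Hypothetical type for illustration; not part of the datacard crate.
#[derive(Debug, PartialEq)]
pub struct CardHeader<'a> {
    pub version: u8,
    pub profile: &'a str,
    /// Everything after the profile name: the dictionary-compressed CML.
    pub payload: &'a [u8],
}

/// Splits raw `.card` bytes into header fields and compressed payload.
/// Errors are plain strings to keep the sketch short.
pub fn parse_card_header(bytes: &[u8]) -> Result<CardHeader<'_>, String> {
    // Magic (4 bytes) + version (1 byte) + profile length (1 byte).
    if bytes.len() < 6 {
        return Err("card is shorter than the fixed 6-byte header".to_string());
    }
    if !bytes.starts_with(b"BP01") {
        return Err("missing BP01 magic".to_string());
    }
    let version = bytes[4];
    let profile_len = bytes[5] as usize;
    let profile_end = 6 + profile_len;
    if bytes.len() < profile_end {
        return Err("profile name is truncated".to_string());
    }
    // Assumption: the profile name is UTF-8 text such as "code:api".
    let profile = std::str::from_utf8(&bytes[6..profile_end])
        .map_err(|e| format!("profile is not valid UTF-8: {}", e))?;
    Ok(CardHeader {
        version,
        profile,
        payload: &bytes[profile_end..],
    })
}
```

Reading the profile name this way lets a caller pick the matching dictionary before attempting decompression, for example by passing `card.as_bytes()` to the helper.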

## Use Cases


### 1. Rust Standard Library Documentation


Convert rustdoc JSON to CML cards:

```rust
use datacard::Card;
use bytepunch::Dictionary;
use sam_cml::CmlDocument;

// Load code-api dictionary
let dict = Dictionary::from_file("dictionaries/code-api.json")?;

// Parse rustdoc, convert to CML, compress to card
let rustdoc_json = std::fs::read_to_string("std.json")?;
let cml_doc = convert_rustdoc_to_cml(&rustdoc_json)?;
let cml_xml = cml_doc.to_xml()?;
let card = Card::from_cml(&cml_xml, &dict)?;

card.save("std.card")?;
```

Result: 3,309 Rust stdlib cards, 40-70% smaller than the source CML.

### 2. Legal Document Archive


Compress legal documents with a profile-specific dictionary:

```rust
let dict = Dictionary::from_file("dictionaries/legal.json")?;

for doc in legal_documents {
    let cml = doc.to_cml()?;
    let card = Card::from_cml(&cml, &dict)?;
    card.save(format!("legal/{}.card", doc.id))?;
}
```

### 3. Network Transmission


Send cards over the network:

```rust
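use tokio::io::AsyncWriteExt;

// `stream` is assumed to be an open async writer, e.g. `tokio::net::TcpStream`.
let card = Card::load("document.card")?;
let bytes = card.as_bytes();

// Send a 4-byte big-endian length prefix, then the card bytes.
// `try_from` rejects cards over u32::MAX bytes instead of silently truncating.
stream.write_u32(u32::try_from(bytes.len())?).await?;
stream.write_all(bytes).await?;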
```
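
The receiving side has to undo the same framing. Below is a minimal synchronous sketch of both directions using only `std::io`. `write_frame`, `read_frame`, and the `max_len` limit are hypothetical helpers for illustration, not part of `datacard`. The prefix is big-endian to match what tokio's `write_u32` emits.

```rust
use std::convert::TryFrom; // already in the prelude on edition 2021
use std::io::{self, Read, Write};

/// Writes one frame: a 4-byte big-endian length, then the card bytes.
pub fn write_frame<W: Write>(writer: &mut W, card_bytes: &[u8]) -> io::Result<()> {
    let len = u32::try_from(card_bytes.len()).map_err(|_| {
        io::Error::new(io::ErrorKind::InvalidInput, "card exceeds u32::MAX bytes")
    })?;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(card_bytes)
}

/// Reads one frame written by `write_frame`.
/// `max_len` caps the allocation so a corrupt or hostile prefix cannot
/// make the receiver reserve gigabytes of memory.
pub fn read_frame<R: Read>(reader: &mut R, max_len: u32) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);
    if len > max_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "frame length exceeds the configured limit",
        ));
    }
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

The bytes returned by `read_frame` can be handed to `Card::from_bytes` to rebuild the card on the receiving end.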

### 4. Bundle into DataSpools


Combine multiple cards into an indexed spool:

```rust
use dataspool::{SpoolBuilder, PersistentVectorStore};

let mut spool = SpoolBuilder::new("archive.spool")?;
let mut db = PersistentVectorStore::new("archive.db")?;

for card_file in cards {
    let card = Card::load(&card_file)?;
    let entry = spool.add_card(card.as_bytes())?;

    // Store metadata and embeddings in .db
    db.add_document_ref(
        &doc_id,
        DocumentRef::Spool {
            spool_path: "archive.spool".into(),
            offset: entry.offset,
            length: entry.length,
        },
        &embedding,
    )?;
}

spool.finalize()?;
```

Result: a single `.spool` file plus a SQLite `.db`, which can then be bundled into an `.engram` archive.

## Compression Ratios


Typical size reduction by content type (percentage saved relative to raw CML):

| Content Type | Dictionary | Size reduction |
|--------------|------------|----------------|
| Rust API docs | `code:api` | 60-70% |
| Legal documents | `legal` | 40-50% |
| Bookstack wikis | `bookstack` | 50-60% |

## Integration


DataCard integrates with:

- **[CML](https://github.com/manifest-humanity/content-markup-language)** - Source document format
- **[BytePunch](https://github.com/Blackfall-Labs/bytepunch-rs)** - Compression algorithm
- **[DataSpool](https://github.com/Blackfall-Labs/dataspool-rs)** - Bundling system
- **[Engram](https://github.com/manifest-humanity/engram)** - Signed archives

## Performance


Benchmarks on Rust stdlib documentation (3,309 cards):

| Operation | Time per document | Throughput |
|-----------|------|------------|
| Compress CML → Card | ~2ms | 500 docs/sec |
| Decompress Card → CML | ~1ms | 1000 docs/sec |
| Save to file | <1ms | I/O bound |
| Load from file | <1ms | I/O bound |
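
The throughput column is the reciprocal of the per-document time (about 2 ms per document is about 500 documents per second). To reproduce figures like these on your own corpus, a rough timing harness could look like the sketch below. `measure` and `docs_per_second` are hypothetical helpers, and a proper benchmark framework would give more reliable numbers than this wall-clock average.

```rust
use std::time::{Duration, Instant};

/// Runs `op` repeatedly and returns the mean wall-clock time per call.
/// `iterations` is clamped to at least 1 so the division is always valid.
pub fn measure<F: FnMut()>(iterations: u32, mut op: F) -> Duration {
    let runs = iterations.max(1);
    let start = Instant::now();
    for _ in 0..runs {
        op();
    }
    start.elapsed() / runs
}

/// Converts a per-document duration into documents per second.
/// Expects a non-zero duration; a zero duration yields infinity.
pub fn docs_per_second(per_doc: Duration) -> f64 {
    1.0 / per_doc.as_secs_f64()
}
```

In practice the closure passed to `measure` would wrap a call such as `Card::from_cml(&cml_xml, &dict)`, with the dictionary loaded once outside the timed loop.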

## Error Handling


```rust
use datacard::{Card, CardError};

match Card::load("missing.card") {
    Ok(card) => println!("Loaded {} bytes", card.size()),
    Err(CardError::Io(e)) => eprintln!("File error: {}", e),
    Err(CardError::BytePunch(e)) => eprintln!("Decompression error: {}", e),
    Err(CardError::InvalidFormat) => eprintln!("Invalid card format"),
}
```

## Development


```bash
# Clone repo
git clone https://github.com/Blackfall-Labs/datacard-rs
cd datacard-rs

# Build
cargo build

# Run tests
cargo test

# Build release
cargo build --release
```

## History


DataCard was extracted from the [SAM project](https://github.com/Blackfall-Labs/sam) where it was used to compress 3,309 Rust standard library documentation cards for offline AI knowledge retrieval.

Original implementation: `crates/sam-core/examples/build_stdlib_engram_cards.rs`

## License


MIT - See [LICENSE](LICENSE) for details.

## Author


Magnus Trent <magnus@blackfall.dev>

## Links


- **GitHub:** https://github.com/Blackfall-Labs/datacard-rs
- **Docs:** https://docs.rs/datacard
- **Crates.io:** https://crates.io/crates/datacard
- **BytePunch:** https://github.com/Blackfall-Labs/bytepunch-rs
- **DataSpool:** https://github.com/Blackfall-Labs/dataspool-rs
- **CML:** https://github.com/manifest-humanity/content-markup-language