# DataCard
**BytePunch-Compressed CML Documents (.card file format)**
DataCard implements the `.card` file format: CML documents compressed with BytePunch for compact storage and transmission.
## Overview
DataCard bridges [CML (Content Markup Language)](https://github.com/manifest-humanity/content-markup-language) and [BytePunch compression](https://github.com/Blackfall-Labs/bytepunch-rs) into a single file format optimized for:
- **Storage**: 40-70% smaller than raw CML
- **Network**: Efficient over-the-wire transmission
- **Bundling**: Cards can be spooled into [DataSpool](https://github.com/Blackfall-Labs/dataspool-rs) archives
## Workflow
```text
CML Document (XML)
→ BytePunch Compression (dictionary-based)
→ .card file (binary, optimized)
```
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
datacard = "0.1.0"
bytepunch = "0.1.0"
sam-cml = "0.1.0"
```
## Quick Start
```rust
use datacard::Card;
use bytepunch::Dictionary;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load dictionary (profile-specific)
    let dict = Dictionary::from_file("dictionaries/code-api.json")?;

    // Create card from CML
    let cml_xml = r#"<cml version="0.1">
<metadata>
<title>Example Document</title>
</metadata>
<section id="intro">
<heading level="1">Introduction</heading>
<para>Sample content</para>
</section>
</cml>"#;
    let card = Card::from_cml(cml_xml, &dict)?;

    // Save to file
    card.save("example.card")?;

    // Load from file
    let loaded = Card::load("example.card")?;

    // Decompress back to CML and check the round trip is lossless
    let decompressed_cml = loaded.to_cml(&dict)?;
    assert_eq!(cml_xml, decompressed_cml);

    println!("Original: {} bytes", cml_xml.len());
    println!("Compressed: {} bytes", card.size());
    println!(
        "Compression ratio: {:.1}%",
        (1.0 - card.size() as f32 / cml_xml.len() as f32) * 100.0
    );
    Ok(())
}
```
## API Reference
### `Card`
Represents a BytePunch-compressed CML document.
```rust
pub struct Card {
pub data: Vec<u8>,
}
```
#### Methods
##### `from_cml(cml_xml: &str, dictionary: &Dictionary) -> Result<Self>`
Create a card from CML XML string.
**Arguments:**
- `cml_xml` - CML document as XML string
- `dictionary` - BytePunch dictionary for compression
**Returns:** Compressed `Card`
**Example:**
```rust
let dict = Dictionary::from_file("code-api.json")?;
let card = Card::from_cml("<cml>...</cml>", &dict)?;
```
##### `to_cml(&self, dictionary: &Dictionary) -> Result<String>`
Decompress card back to CML XML.
**Arguments:**
- `dictionary` - BytePunch dictionary for decompression (must be the same dictionary that was used to compress the card)
**Returns:** Decompressed CML as XML string
**Example:**
```rust
let cml = card.to_cml(&dict)?;
```
##### `load<P: AsRef<Path>>(path: P) -> Result<Self>`
Load card from file.
**Arguments:**
- `path` - File path to `.card` file
**Returns:** Loaded `Card`
**Example:**
```rust
let card = Card::load("document.card")?;
```
##### `save<P: AsRef<Path>>(&self, path: P) -> Result<()>`
Save card to file.
**Arguments:**
- `path` - Output file path
**Example:**
```rust
card.save("document.card")?;
```
##### `size(&self) -> usize`
Get compressed size in bytes.
**Returns:** Size of compressed data
**Example:**
```rust
println!("Card size: {} bytes", card.size());
```
##### `from_bytes(data: Vec<u8>) -> Self`
Create card from raw compressed data.
**Arguments:**
- `data` - Pre-compressed BytePunch data
**Returns:** `Card` wrapping the data
**Example:**
```rust
let card = Card::from_bytes(compressed_bytes);
```
##### `as_bytes(&self) -> &[u8]`
Get raw compressed data.
**Returns:** Slice of compressed bytes
**Example:**
```rust
let bytes = card.as_bytes();
```
## File Format
DataCard files (`.card`) contain BytePunch-compressed CML:
```text
┌─────────────────────────────────────┐
│ Magic: "BP01" (4 bytes)             │ BytePunch format identifier
├─────────────────────────────────────┤
│ Version: 1 (1 byte)                 │ Format version
├─────────────────────────────────────┤
│ Profile Length: N (1 byte)          │ Length of profile name
├─────────────────────────────────────┤
│ Profile: "code:api" (N bytes)       │ Dictionary profile used
├─────────────────────────────────────┤
│ Compressed Content (variable)       │ Dictionary-compressed CML
└─────────────────────────────────────┘
```
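To make the layout concrete, here is a minimal sketch of reading the header fields straight from a byte slice. It follows only the diagram above; `CardHeader` and `parse_header` are hypothetical names for illustration and are not part of the `datacard` API, and treating the profile name as UTF-8 is an assumption the diagram does not state.

```rust
/// Header fields taken from the layout diagram.
/// Hypothetical type for this sketch, not part of the datacard crate.
#[derive(Debug, PartialEq)]
pub struct CardHeader {
    pub version: u8,
    pub profile: String,
    /// Byte offset at which the compressed content begins.
    pub content_offset: usize,
}

/// Parses magic, version, profile length, and profile name from `bytes`.
pub fn parse_header(bytes: &[u8]) -> Result<CardHeader, String> {
    // Fixed part: magic (4 bytes) + version (1 byte) + profile length (1 byte).
    if bytes.len() < 6 {
        return Err("truncated header".to_string());
    }
    if &bytes[0..4] != b"BP01" {
        return Err("bad magic, expected BP01".to_string());
    }
    let version = bytes[4];
    let profile_len = bytes[5] as usize;
    let content_offset = 6 + profile_len;

    // `get` returns None when the slice is shorter than the declared length.
    let profile_bytes = bytes
        .get(6..content_offset)
        .ok_or_else(|| "truncated profile name".to_string())?;

    // Assumption: the profile name is UTF-8 text.
    let profile = std::str::from_utf8(profile_bytes)
        .map_err(|e| format!("profile name is not UTF-8: {}", e))?
        .to_string();

    Ok(CardHeader { version, profile, content_offset })
}
```

Because the profile length is a single byte, a profile name in this layout can be at most 255 bytes long. Everything from `content_offset` onward would be the dictionary-compressed payload.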
## Use Cases
### 1. Rust Standard Library Documentation
Convert rustdoc JSON to CML cards:
```rust
use datacard::Card;
use bytepunch::Dictionary;
use sam_cml::CmlDocument;
// Load code-api dictionary
let dict = Dictionary::from_file("dictionaries/code-api.json")?;
// Parse rustdoc, convert to CML, compress to card.
// `convert_rustdoc_to_cml` is not listed in this crate's API; supply your own converter.
let rustdoc_json = std::fs::read_to_string("std.json")?;
let cml_doc = convert_rustdoc_to_cml(&rustdoc_json)?;
let cml_xml = cml_doc.to_xml()?;
let card = Card::from_cml(&cml_xml, &dict)?;
card.save("std.card")?;
```
Result: 3,309 Rust stdlib cards, 40-70% smaller than the raw CML.
### 2. Legal Document Archive
Compress legal documents with profile-specific dictionary:
```rust
let dict = Dictionary::from_file("dictionaries/legal.json")?;
for doc in legal_documents {
    let cml = doc.to_cml()?;
    let card = Card::from_cml(&cml, &dict)?;
    card.save(format!("legal/{}.card", doc.id))?;
}
```
### 3. Network Transmission
Send cards over the network:
```rust
use tokio::io::AsyncWriteExt;
let card = Card::load("document.card")?;
let bytes = card.as_bytes();
// Send with a 4-byte length prefix (tokio's `write_u32` is big-endian)
stream.write_u32(bytes.len() as u32).await?;
stream.write_all(bytes).await?;
```
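The receiving side has to undo the same framing. The following is a minimal sketch of both directions using the blocking `std::io` traits rather than tokio, so that it stands alone; the prefix is big-endian to match tokio's `write_u32`. The names `write_frame`, `read_frame`, and the `max_len` guard are hypothetical and not part of any crate mentioned here.

```rust
use std::io::{self, Read, Write};

/// Writes `payload` preceded by a 4-byte big-endian length prefix.
pub fn write_frame<W: Write>(mut writer: W, payload: &[u8]) -> io::Result<()> {
    // Reject payloads whose length cannot be represented in the prefix.
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload does not fit in a u32 length prefix",
        ));
    }
    let len = payload.len() as u32;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Reads one length-prefixed frame and returns its payload.
/// `max_len` is a hypothetical cap so a corrupt or hostile prefix
/// cannot trigger a huge allocation.
pub fn read_frame<R: Read>(mut reader: R, max_len: u32) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix);
    if len > max_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "frame exceeds the configured maximum length",
        ));
    }
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

On the receiver, the returned bytes could then be handed to `Card::from_bytes` and decompressed with the matching dictionary.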
### 4. Bundle into DataSpools
Combine multiple cards into indexed spool:
```rust
use dataspool::{SpoolBuilder, PersistentVectorStore};
let mut spool = SpoolBuilder::new("archive.spool")?;
let mut db = PersistentVectorStore::new("archive.db")?;
for card_file in cards {
    let card = Card::load(&card_file)?;
    let entry = spool.add_card(card.as_bytes())?;
    // Store metadata and embeddings in .db
    db.add_document_ref(
        &doc_id,
        DocumentRef::Spool {
            spool_path: "archive.spool".into(),
            offset: entry.offset,
            length: entry.length,
        },
        &embedding,
    )?;
}
spool.finalize()?;
```
Result: Single `.spool` file + SQLite `.db` → bundled into `.engram`
## Compression Ratios
Typical compression ratios by content type:
| Content type | Profile | Compression ratio |
|--------------|---------|-------------------|
| Rust API docs | `code:api` | 60-70% |
| Legal documents | `legal` | 40-50% |
| Bookstack wikis | `bookstack` | 50-60% |
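Assuming these figures use the same definition as the Quick Start (one minus compressed size over original size, as a percentage), the ratio can be computed with a small helper like the sketch below. `size_reduction_percent` is a hypothetical name, not part of the `datacard` API.

```rust
/// Size reduction as a percentage: (1 - compressed / original) * 100.
/// Returns `None` for an empty original, where the ratio is undefined.
/// A negative result means the "compressed" form is larger than the input.
pub fn size_reduction_percent(original_len: usize, compressed_len: usize) -> Option<f64> {
    if original_len == 0 {
        return None;
    }
    Some((1.0 - compressed_len as f64 / original_len as f64) * 100.0)
}
```

For example, a 1,000-byte CML document that compresses to 350 bytes gives a reduction of 65%, which would fall in the `code:api` range above.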
## Integration
DataCard integrates with:
- **[CML](https://github.com/manifest-humanity/content-markup-language)** - Source document format
- **[BytePunch](https://github.com/Blackfall-Labs/bytepunch-rs)** - Compression algorithm
- **[DataSpool](https://github.com/Blackfall-Labs/dataspool-rs)** - Bundling system
- **[Engram](https://github.com/manifest-humanity/engram)** - Signed archives
## Performance
Benchmarks on Rust stdlib documentation (3,309 cards):
| Operation | Time per document | Throughput |
|-----------|-------------------|------------|
| Compress CML → Card | ~2ms | 500 docs/sec |
| Decompress Card → CML | ~1ms | 1000 docs/sec |
| Save to file | <1ms | I/O bound |
| Load from file | <1ms | I/O bound |
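The throughput column is the reciprocal of the per-document time (1 s / 2 ms = 500 docs/sec). To check figures like these on your own data, a rough timing harness along the following lines may help. It is a sketch built only on `std::time`; `mean_latency` and `docs_per_sec` are hypothetical helpers, and this is not the benchmark setup used to produce the table.

```rust
use std::time::{Duration, Instant};

/// Converts a mean per-document latency into documents per second.
pub fn docs_per_sec(per_doc: Duration) -> f64 {
    1.0 / per_doc.as_secs_f64()
}

/// Runs `op` `iterations` times and returns the mean wall-clock time per call.
pub fn mean_latency<F: FnMut()>(iterations: u32, mut op: F) -> Duration {
    assert!(iterations > 0, "iterations must be at least 1");
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    // Duration implements Div<u32>, so this yields the average per call.
    start.elapsed() / iterations
}
```

In practice `op` would wrap a call such as `Card::from_cml(&cml_xml, &dict)`. For publishable numbers a dedicated benchmarking tool is usually a better fit, since this loop does no warm-up or outlier handling.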
## Error Handling
```rust
use datacard::{Card, CardError};
match Card::load("missing.card") {
    Ok(card) => println!("Loaded {} bytes", card.size()),
    Err(CardError::Io(e)) => eprintln!("File error: {}", e),
    Err(CardError::BytePunch(e)) => eprintln!("Decompression error: {}", e),
    Err(CardError::InvalidFormat) => eprintln!("Invalid card format"),
}
```
## Development
```bash
# Clone repo
git clone https://github.com/Blackfall-Labs/datacard-rs
cd datacard-rs
# Build
cargo build
# Run tests
cargo test
# Build release
cargo build --release
```
## History
DataCard was extracted from the [SAM project](https://github.com/Blackfall-Labs/sam) where it was used to compress 3,309 Rust standard library documentation cards for offline AI knowledge retrieval.
Original implementation: `crates/sam-core/examples/build_stdlib_engram_cards.rs`
## License
MIT - See [LICENSE](LICENSE) for details.
## Author
Magnus Trent <magnus@blackfall.dev>
## Links
- **GitHub:** https://github.com/Blackfall-Labs/datacard-rs
- **Docs:** https://docs.rs/datacard
- **Crates.io:** https://crates.io/crates/datacard
- **BytePunch:** https://github.com/Blackfall-Labs/bytepunch-rs
- **DataSpool:** https://github.com/Blackfall-Labs/dataspool-rs
- **CML:** https://github.com/manifest-humanity/content-markup-language