DataCard
BytePunch-Compressed CML Documents (.card file format)
DataCard represents the .card file format: CML documents compressed with BytePunch for efficient storage and transmission.
Overview
DataCard bridges CML (Content Markup Language) and BytePunch compression into a single file format optimized for:
- Storage: 40-70% smaller than raw CML
- Network: Efficient over-the-wire transmission
- Bundling: Cards can be spooled into DataSpool archives
Workflow
```
CML Document (XML)
  → BytePunch Compression (dictionary-based)
  → .card file (binary, optimized)
```
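"Dictionary-based" compression can be illustrated with a toy sketch. This is not BytePunch's actual algorithm or byte format, which this README does not describe. The sketch replaces frequent substrings, such as CML tags, with one-byte tokens. It assumes ASCII input and at most 128 dictionary entries. `toy_compress` and `toy_decompress` are hypothetical names.

```rust
/// Toy dictionary coder: NOT BytePunch's real algorithm or token format.
/// Assumes ASCII input (literal bytes < 0x80) and at most 128 dictionary entries;
/// a dictionary hit is emitted as the single byte 0x80 | entry_index.
fn toy_compress(input: &str, dict: &[&str]) -> Vec<u8> {
    assert!(input.is_ascii(), "toy coder only handles ASCII input");
    assert!(dict.len() <= 128, "toy coder supports at most 128 entries");
    let bytes = input.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        // Pick the longest dictionary entry that matches at position i.
        let best = dict
            .iter()
            .enumerate()
            .filter(|(_, entry)| !entry.is_empty() && bytes[i..].starts_with(entry.as_bytes()))
            .max_by_key(|(_, entry)| entry.len());
        match best {
            Some((idx, entry)) => {
                out.push(0x80 | idx as u8); // token byte
                i += entry.len();
            }
            None => {
                out.push(bytes[i]); // literal ASCII byte
                i += 1;
            }
        }
    }
    out
}

/// Reverse of `toy_compress`; must be given the same dictionary.
fn toy_decompress(data: &[u8], dict: &[&str]) -> String {
    let mut out = String::new();
    for &b in data {
        if b & 0x80 != 0 {
            out.push_str(dict[(b & 0x7f) as usize]);
        } else {
            out.push(b as char);
        }
    }
    out
}
```

The tokens are only indices into the dictionary, so decoding requires the same dictionary. In this toy, that explains why `to_cml` must be given the dictionary that was used for compression.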
Installation
Add to your Cargo.toml:
```toml
[dependencies]
datacard = "0.1.0"
= "0.1.0"
= "0.1.0"
```
Quick Start
```rust
use datacard::Card;
use bytepunch::Dictionary;
```
API Reference
Card
Represents a BytePunch-compressed CML document.
Methods
from_cml(cml_xml: &str, dictionary: &Dictionary) -> Result<Self>
Create a card from CML XML string.
Arguments:
- cml_xml - CML document as an XML string
- dictionary - BytePunch dictionary used for compression
Returns: Compressed Card
Example:
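```rust
// The dictionary file path is illustrative
let dict = Dictionary::from_file("dictionaries/code-api.json")?;
let card = Card::from_cml(cml_xml, &dict)?;
```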
to_cml(&self, dictionary: &Dictionary) -> Result<String>
Decompress card back to CML XML.
Arguments:
- dictionary - BytePunch dictionary for decompression (must be the same dictionary that was used to compress the card)
Returns: Decompressed CML as XML string
Example:
```rust
let cml = card.to_cml(&dict)?;
```
load<P: AsRef<Path>>(path: P) -> Result<Self>
Load card from file.
Arguments:
- path - Path to the .card file
Returns: Loaded Card
Example:
```rust
let card = Card::load("document.card")?;
```
save<P: AsRef<Path>>(&self, path: P) -> Result<()>
Save card to file.
Arguments:
- path - Output file path
Example:
```rust
card.save("document.card")?;
```
size(&self) -> usize
Get compressed size in bytes.
Returns: Size of compressed data
Example:
```rust
println!("Compressed size: {} bytes", card.size());
```
from_bytes(data: Vec<u8>) -> Self
Create card from raw compressed data.
Arguments:
- data - Pre-compressed BytePunch data
Returns: Card wrapping the data
Example:
```rust
let card = Card::from_bytes(compressed_data); // compressed_data: Vec<u8>
```
as_bytes(&self) -> &[u8]
Get raw compressed data.
Returns: Slice of compressed bytes
Example:
```rust
let bytes = card.as_bytes();
```
File Format
DataCard files (.card) contain BytePunch-compressed CML:
```
┌─────────────────────────────────────┐
│ Magic: "BP01" (4 bytes)             │  BytePunch format identifier
├─────────────────────────────────────┤
│ Version: 1 (1 byte)                 │  Format version
├─────────────────────────────────────┤
│ Profile Length: N (1 byte)          │  Length of profile name
├─────────────────────────────────────┤
│ Profile: "code:api" (N bytes)       │  Dictionary profile used
├─────────────────────────────────────┤
│ Compressed Content (variable)       │  Dictionary-compressed CML
└─────────────────────────────────────┘
```
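The following std-only sketch reads and writes this header layout. It assumes the fields are exactly as drawn, with no checksum, padding, or multi-byte integers. `CardHeader`, `parse_card`, and `write_card` are hypothetical names and are not part of the datacard API. For real files, use `Card::load` and `Card::save`.

```rust
/// Header fields of a .card file, following the layout diagram (assumed exact).
#[derive(Debug, PartialEq)]
struct CardHeader<'a> {
    version: u8,
    profile: &'a str,
    content: &'a [u8],
}

const MAGIC: &[u8; 4] = b"BP01";

/// Split raw .card bytes into version, profile name, and compressed content.
fn parse_card(data: &[u8]) -> Result<CardHeader<'_>, String> {
    // 4 magic bytes + 1 version byte + 1 profile-length byte = 6-byte minimum.
    if data.len() < 6 || !data.starts_with(MAGIC) {
        return Err("missing BP01 magic".to_string());
    }
    let version = data[4];
    let profile_len = data[5] as usize;
    let profile_end = 6 + profile_len;
    if data.len() < profile_end {
        return Err("truncated profile name".to_string());
    }
    let profile = std::str::from_utf8(&data[6..profile_end])
        .map_err(|_| "profile name is not UTF-8".to_string())?;
    // Everything after the profile name is the dictionary-compressed CML.
    Ok(CardHeader { version, profile, content: &data[profile_end..] })
}

/// Assemble header + content into .card bytes.
fn write_card(version: u8, profile: &str, content: &[u8]) -> Vec<u8> {
    assert!(profile.len() <= 255, "profile length must fit in one byte");
    let mut out = Vec::with_capacity(6 + profile.len() + content.len());
    out.extend_from_slice(MAGIC);
    out.push(version);
    out.push(profile.len() as u8);
    out.extend_from_slice(profile.as_bytes());
    out.extend_from_slice(content);
    out
}
```

The single length byte limits profile names to 255 bytes, so `write_card` asserts on that limit.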
Use Cases
1. Rust Standard Library Documentation
Convert rustdoc JSON to CML cards:
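```rust
use datacard::Card;
use bytepunch::Dictionary;
use CmlDocument; // CmlDocument is provided by the CML crate

// Load the code-api dictionary (paths in this example are illustrative)
let dict = Dictionary::from_file("dictionaries/code-api.json")?;

// Parse rustdoc JSON, convert it to CML, and compress it into a card
let rustdoc_json = std::fs::read_to_string("target/doc/std.json")?;
// convert_rustdoc_to_cml is a project-specific helper, not part of the datacard API above
let cml_doc = convert_rustdoc_to_cml(&rustdoc_json)?;
let cml_xml = cml_doc.to_xml()?;
let card = Card::from_cml(&cml_xml, &dict)?;
card.save("cards/std.card")?;
```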
Result: 3,309 Rust stdlib cards, each 40-70% smaller than its raw CML.
2. Legal Document Archive
Compress legal documents with profile-specific dictionary:
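```rust
// The dictionary path is illustrative
let dict = Dictionary::from_file("dictionaries/legal.json")?;
for doc in legal_documents {
    // `doc.cml_xml` and `doc.name` are hypothetical fields
    let card = Card::from_cml(&doc.cml_xml, &dict)?;
    card.save(format!("legal/{}.card", doc.name))?;
}
```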
3. Network Transmission
Send cards over the network:
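```rust
use tokio::io::AsyncWriteExt;

let card = Card::load("document.card")?;
let bytes = card.as_bytes();

// Send a 4-byte length prefix, then the card bytes
stream.write_u32(bytes.len() as u32).await?;
stream.write_all(bytes).await?;
```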
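The receiver has to undo that framing. The following blocking sketch uses only `std::io`. It assumes the same 4-byte big-endian length prefix that tokio's `write_u32` produces. `write_framed` and `read_framed` are hypothetical helpers.

```rust
use std::io::{self, Read, Write};

/// Write `payload` preceded by its length as a 4-byte big-endian u32.
fn write_framed<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload exceeds u32::MAX bytes",
        ));
    }
    let len = payload.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Read one frame written by `write_framed`: length prefix, then exactly that many bytes.
fn read_framed<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

The length prefix comes from the network, so a real receiver should reject values above a sensible maximum before allocating the buffer. The received bytes can then be wrapped with `Card::from_bytes`.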
4. Bundle into DataSpools
Combine multiple cards into indexed spool:
```rust
use ;
let mut spool = new?;
let mut db = new?;
for card_file in cards
spool.finalize?;
```
Result: a single .spool file plus a SQLite .db file, which are then bundled into an .engram archive.
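A toy in-memory version illustrates the "indexed spool" idea. Card bytes are appended back to back, and an (offset, length) index allows direct lookup of any card. This is not DataSpool's on-disk format or API. `ToySpool` and its methods are hypothetical.

```rust
/// Toy in-memory spool: NOT the DataSpool format. Card bytes are concatenated
/// and an (offset, length) entry is recorded for each card.
struct ToySpool {
    data: Vec<u8>,
    index: Vec<(usize, usize)>, // (offset, length) per appended card
}

impl ToySpool {
    fn new() -> Self {
        ToySpool { data: Vec::new(), index: Vec::new() }
    }

    /// Append one card's compressed bytes and return its entry id.
    fn append(&mut self, card_bytes: &[u8]) -> usize {
        let offset = self.data.len();
        self.data.extend_from_slice(card_bytes);
        self.index.push((offset, card_bytes.len()));
        self.index.len() - 1
    }

    /// Fetch a card's bytes by entry id without scanning the other cards.
    fn get(&self, id: usize) -> Option<&[u8]> {
        let &(offset, len) = self.index.get(id)?;
        Some(&self.data[offset..offset + len])
    }
}
```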
Compression Ratios
Typical compression ratios by content type:
| Content Type | Dictionary profile | Size reduction |
|---|---|---|
| Rust API docs | code:api | 60-70% |
| Legal documents | legal | 40-50% |
| Bookstack wikis | bookstack | 50-60% |
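Here the percentages are read as size reduction relative to raw CML, matching "40-70% smaller" in the overview. Under that assumption, the following helper computes the figure from two byte counts, for example the raw CML length and `Card::size()`. The helper is hypothetical and not part of datacard.

```rust
/// Percentage by which the compressed card is smaller than the raw CML.
/// Example: 60.0 means the card is 40% of the original size.
fn percent_smaller(raw_len: usize, compressed_len: usize) -> f64 {
    assert!(raw_len > 0, "raw size must be non-zero");
    (1.0 - compressed_len as f64 / raw_len as f64) * 100.0
}
```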
Integration
DataCard integrates with:
- CML - Source document format
- BytePunch - Compression algorithm
- DataSpool - Bundling system
- Engram - Signed archives
Performance
Benchmarks on Rust stdlib documentation (3,309 cards):
| Operation | Time per card | Throughput |
|---|---|---|
| Compress CML → Card | ~2ms | 500 docs/sec |
| Decompress Card → CML | ~1ms | 1000 docs/sec |
| Save to file | <1ms | I/O bound |
| Load from file | <1ms | I/O bound |
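The throughput column follows from the per-card time. The estimate below assumes the ~2 ms average holds for every card and that cards are compressed one after another on a single core. Under those assumptions, compressing the full 3,309-card stdlib set takes roughly 6.6 seconds:

$$
\text{throughput} = \frac{1}{t_{\text{card}}} = \frac{1}{2\,\text{ms}} = 500\ \text{cards/s},
\qquad
3309 \times 2\,\text{ms} = 6618\,\text{ms} \approx 6.6\,\text{s}
$$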
Error Handling
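```rust
use datacard::Card;

match Card::load("document.card") {
    Ok(card) => println!("Loaded card: {} bytes", card.size()),
    // Assumes the crate's error type implements Display
    Err(e) => eprintln!("Failed to load card: {}", e),
}
```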
Development
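```bash
# Clone repo
git clone https://github.com/Blackfall-Labs/datacard-rs
cd datacard-rs

# Build
cargo build

# Run tests
cargo test

# Build release
cargo build --release
```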
History
DataCard was extracted from the SAM project where it was used to compress 3,309 Rust standard library documentation cards for offline AI knowledge retrieval.
Original implementation: crates/sam-core/examples/build_stdlib_engram_cards.rs
License
MIT - See LICENSE for details.
Author
Magnus Trent <magnus@blackfall.dev>
Links
- GitHub: https://github.com/Blackfall-Labs/datacard-rs
- Docs: https://docs.rs/datacard
- Crates.io: https://crates.io/crates/datacard
- BytePunch: https://github.com/Blackfall-Labs/bytepunch-rs
- DataSpool: https://github.com/Blackfall-Labs/dataspool-rs
- CML: https://github.com/manifest-humanity/content-markup-language