# DataCard Format Specification v1.0
**Last Updated:** 2025-12-21
**Status:** Draft
## Overview
DataCard (`.card`) is a binary file format for storing BytePunch-compressed CML documents with metadata.
## Design Goals
1. **Self-describing** - Magic header and version for identification
2. **Minimal overhead** - Typically 20-50 bytes of metadata
3. **Fast parsing** - Simple binary structure
4. **Extensible** - Version field and flags for future features
5. **Validated** - Optional checksums for integrity
6. **Spool-compatible** - Works standalone or in DataSpool archives
## Binary Structure
```
┌──────────────────────────────────────────┐
│ Header (8 bytes)                         │
├──────────────────────────────────────────┤
│ Metadata (variable, length-prefixed)     │
├──────────────────────────────────────────┤
│ Payload (BytePunch compressed CML)       │
├──────────────────────────────────────────┤
│ Footer (4 bytes, optional)               │
└──────────────────────────────────────────┘
```
## Header (8 bytes)
```rust
struct CardHeader {
    magic: [u8; 4], // "CARD" (0x43 0x41 0x52 0x44)
    major: u8,      // Major version (1)
    minor: u8,      // Minor version (0)
    flags: u16,     // Feature flags (little-endian)
}
```
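As a sketch of how a reader might decode these 8 bytes, the function below parses a header from a fixed-size array using only the standard library. `parse_header` and `HeaderError` are hypothetical names; the field order and the little-endian `flags` come from the struct above.

```rust
/// Mirrors the `CardHeader` struct from the spec.
#[derive(Debug, PartialEq)]
struct CardHeader {
    magic: [u8; 4],
    major: u8,
    minor: u8,
    flags: u16,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum HeaderError {
    InvalidMagic,
    UnsupportedVersion(u8),
}

/// Decode the 8-byte header: 4 magic bytes, major, minor, then a little-endian u16.
fn parse_header(bytes: &[u8; 8]) -> Result<CardHeader, HeaderError> {
    let magic = [bytes[0], bytes[1], bytes[2], bytes[3]];
    if &magic != b"CARD" {
        return Err(HeaderError::InvalidMagic);
    }
    let major = bytes[4];
    if major != 1 {
        return Err(HeaderError::UnsupportedVersion(major));
    }
    Ok(CardHeader {
        magic,
        major,
        minor: bytes[5],
        flags: u16::from_le_bytes([bytes[6], bytes[7]]),
    })
}
```

Like the reading algorithm later in this document, the sketch rejects only an unknown major version and accepts any minor version.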
### Flags (16 bits)
```
Bit 0: HAS_CHECKSUM - Footer contains CRC32 checksum
Bit 1: HAS_TIMESTAMP - Metadata includes creation timestamp
Bits 2-15: Reserved (must be 0)
```
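The flag bits can be expressed as masks. A minimal sketch: the constant names follow the flag names above, while `KNOWN_FLAGS`, `reserved_bits_clear`, and `has_checksum` are hypothetical. The spec does not say whether a reader should reject or ignore nonzero reserved bits, so this sketch only detects them.

```rust
/// Flag bits defined in v1.0.
const HAS_CHECKSUM: u16 = 1 << 0;
const HAS_TIMESTAMP: u16 = 1 << 1;
/// Every bit v1.0 defines; bits 2-15 are reserved.
const KNOWN_FLAGS: u16 = HAS_CHECKSUM | HAS_TIMESTAMP;

/// True when no reserved bit is set.
fn reserved_bits_clear(flags: u16) -> bool {
    flags & !KNOWN_FLAGS == 0
}

/// True when the footer carries a CRC32.
fn has_checksum(flags: u16) -> bool {
    flags & HAS_CHECKSUM != 0
}
```

In Rust, `&` binds more tightly than `==` and `!=`, so `flags & HAS_CHECKSUM != 0` means `(flags & HAS_CHECKSUM) != 0`. C and C++ parse the same expression as `flags & (HAS_CHECKSUM != 0)`, so ports to those languages need explicit parentheses.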
## Metadata (variable length)
```rust
struct Metadata {
    length: u32,   // Metadata JSON length (little-endian)
    json: Vec<u8>, // UTF-8 encoded JSON
}
```
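A minimal sketch of the length-prefixed encoding and its inverse. `encode_metadata` and `decode_metadata` are hypothetical helper names; the sketch treats the JSON as an opaque string and does not enforce the 64 KB metadata limit.

```rust
/// Encode a metadata block: u32 little-endian length, then the UTF-8 JSON bytes.
fn encode_metadata(json: &str) -> Vec<u8> {
    let bytes = json.as_bytes();
    assert!(bytes.len() <= u32::MAX as usize, "metadata too long for a u32 length prefix");
    let mut out = Vec::with_capacity(4 + bytes.len());
    out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    out.extend_from_slice(bytes);
    out
}

/// Recover the JSON text; None if the block is truncated or not valid UTF-8.
fn decode_metadata(block: &[u8]) -> Option<&str> {
    if block.len() < 4 {
        return None;
    }
    let len = u32::from_le_bytes([block[0], block[1], block[2], block[3]]) as usize;
    let json = block.get(4..4 + len)?;
    std::str::from_utf8(json).ok()
}
```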
### Metadata JSON Schema
**Required fields:**
- `id` (string) - Document identifier (e.g., "std::vec::Vec")
- `compressed_size` (u64) - Payload size in bytes
**Optional fields:**
- `profile` (string) - CML profile (e.g., "code:api", "legal:constitution")
- `original_size` (u64) - Original CML size before compression
- `created` (u64) - Unix timestamp in milliseconds
- `dict_version` (string) - BytePunch dictionary version used
**Example:**
```json
{
  "id": "std::vec::Vec",
  "profile": "code:api",
  "compressed_size": 1234,
  "original_size": 5678,
  "created": 1703001234567,
  "dict_version": "cml-core-v1"
}
```
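In Rust, the schema maps naturally onto a struct in which the optional fields become `Option`. The sketch below uses plain std types; a real implementation would more likely derive deserialization with a JSON library such as `serde_json` (which the writing algorithm in this document already uses). `CardMetadata` and `timestamp_matches_flags` are hypothetical, and treating `HAS_TIMESTAMP` as set exactly when `created` is present is an assumption: the spec only says the flag means the metadata includes a timestamp.

```rust
/// In-memory view of the metadata JSON; optional fields become `Option`.
struct CardMetadata {
    id: String,
    compressed_size: u64,
    profile: Option<String>,
    original_size: Option<u64>,
    created: Option<u64>, // Unix time in milliseconds
    dict_version: Option<String>,
}

const HAS_TIMESTAMP: u16 = 1 << 1;

impl CardMetadata {
    /// Assumption: HAS_TIMESTAMP is set if and only if `created` is present.
    fn timestamp_matches_flags(&self, flags: u16) -> bool {
        (flags & HAS_TIMESTAMP != 0) == self.created.is_some()
    }
}
```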
## Payload (variable length)
Raw BytePunch-compressed data. Size is specified in `metadata.compressed_size`.
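Because the header and the length prefix have fixed sizes, the payload's position follows directly from the metadata length. A small sketch of that arithmetic for a standalone card; `payload_range` and `card_len` are hypothetical names.

```rust
use std::ops::Range;

/// Payload bytes start after the 8-byte header, the 4-byte length prefix, and the JSON.
fn payload_range(meta_len: u32, compressed_size: u64) -> Range<u64> {
    let start = 8 + 4 + meta_len as u64;
    start..start + compressed_size
}

/// Total card size, including the 4-byte CRC32 footer when HAS_CHECKSUM is set.
fn card_len(meta_len: u32, compressed_size: u64, has_checksum: bool) -> u64 {
    let footer = if has_checksum { 4 } else { 0 };
    payload_range(meta_len, compressed_size).end + footer
}
```

For example, a 36-byte metadata JSON places the payload at offset 48 (0x30).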
## Footer (4 bytes, optional)
```rust
struct CardFooter {
    crc32: u32, // CRC32 checksum (little-endian)
}
```
CRC32 is computed over the following bytes, in file order:
1. Header (8 bytes)
2. Metadata length prefix (4 bytes) and JSON
3. Payload

The footer itself is not included. It is present only if `flags & HAS_CHECKSUM != 0`.
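The spec does not name a CRC-32 variant. The sketch below assumes the common IEEE 802.3 variant used by zlib and PNG (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF) and feeds the covered regions in file order. `crc32_update` and `card_crc32` are hypothetical names; production code would more likely use a table-driven implementation or a crate such as `crc32fast`.

```rust
/// Bitwise CRC-32 update with the reflected IEEE polynomial (assumed variant).
fn crc32_update(mut crc: u32, data: &[u8]) -> u32 {
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    crc
}

/// CRC over header, metadata length prefix + JSON, and payload, in that order.
fn card_crc32(header: &[u8; 8], meta_json: &[u8], payload: &[u8]) -> u32 {
    let meta_len = (meta_json.len() as u32).to_le_bytes();
    let mut crc = !0u32; // initial value 0xFFFFFFFF
    for part in [&header[..], &meta_len[..], meta_json, payload].iter() {
        crc = crc32_update(crc, part);
    }
    !crc // final XOR with 0xFFFFFFFF
}
```

Feeding the regions incrementally gives the same result as hashing one concatenated buffer, so a reader does not need to copy the card to verify it.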
## File Extension
- **Standalone files:** `.card`
- **In spools:** Embedded in `.spool` files
## Size Limits
- **Metadata:** 64 KB maximum (u16 length would suffice, but u32 for future-proofing)
- **Payload:** 4 GB maximum (`compressed_size` must fit in a u32, even though the JSON schema types it as u64)
- **Total file:** 4 GB maximum (practical limit)
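A reader can enforce these limits before allocating buffers; otherwise a corrupt or hostile length prefix could request a roughly 4 GB allocation. A sketch assuming "64 KB" means 64 × 1024 bytes and "4 GB" means the largest value a u32 can hold; `check_limits` and `LimitError` are hypothetical names.

```rust
/// Assumed interpretations of the documented limits.
const MAX_METADATA_LEN: u32 = 64 * 1024;
const MAX_PAYLOAD_LEN: u64 = u32::MAX as u64;

#[derive(Debug, PartialEq)]
enum LimitError {
    MetadataTooLarge(u32),
    PayloadTooLarge(u64),
}

/// Validate the metadata length prefix and `compressed_size` before reading either.
fn check_limits(meta_len: u32, compressed_size: u64) -> Result<(), LimitError> {
    if meta_len > MAX_METADATA_LEN {
        return Err(LimitError::MetadataTooLarge(meta_len));
    }
    if compressed_size > MAX_PAYLOAD_LEN {
        return Err(LimitError::PayloadTooLarge(compressed_size));
    }
    Ok(())
}
```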
## Implementation Notes
### Encoding
- All multi-byte integers are **little-endian**
- Metadata JSON is UTF-8 encoded
- No padding or alignment requirements
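The reading algorithm below calls `read_u32_le` and `read_bytes` without defining them. One way to write them with only `std::io`, assuming a blocking `Read` source; `read_exact` turns a truncated card into an `UnexpectedEof` error instead of a short read.

```rust
use std::io::{self, Read};

/// Read a little-endian u32 (the metadata length prefix or the CRC32 footer).
fn read_u32_le<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Read exactly `len` bytes; fails with UnexpectedEof if the input is shorter.
fn read_bytes<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```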
### Reading Algorithm
```rust
use std::io::Read;

fn read_card(reader: &mut impl Read) -> Result<Card> {
    // 1. Read and validate header
    let header = read_header(reader)?;
    if &header.magic != b"CARD" { return Err(InvalidMagic); }
    if header.major != 1 { return Err(UnsupportedVersion); }
    // 2. Read metadata, enforcing the 64 KB limit before allocating
    let meta_len = read_u32_le(reader)?;
    if meta_len > 64 * 1024 { return Err(MetadataTooLarge); }
    let meta_json = read_bytes(reader, meta_len as usize)?;
    let metadata = parse_json(&meta_json)?;
    // 3. Read exactly `compressed_size` payload bytes
    let payload = read_bytes(reader, metadata.compressed_size as usize)?;
    // 4. Validate checksum if HAS_CHECKSUM (bit 0) is set
    if header.flags & 0x01 != 0 {
        let checksum = read_u32_le(reader)?;
        validate_crc32(&header, &meta_json, &payload, checksum)?;
    }
    Ok(Card { metadata, payload })
}
```
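The streaming reader trusts `metadata.compressed_size`. When a standalone card is already in memory, the layout allows a cross-check: the bytes left after the metadata, minus the footer, should equal `compressed_size`. The sketch below splits a buffer into its parts without parsing JSON. `split_card` and `CardParts` are hypothetical, and it assumes the buffer holds exactly one card; inside a spool, the spool's offset and length would bound the slice instead. The caller would still parse `meta_json`, compare `compressed_size` with `payload.len()`, and verify the CRC.

```rust
/// Pieces of one standalone card held in memory.
#[derive(Debug)]
struct CardParts<'a> {
    major: u8,
    minor: u8,
    flags: u16,
    meta_json: &'a [u8],
    payload: &'a [u8],
    checksum: Option<u32>,
}

/// Assumption: `bytes` is exactly one card, so the payload is whatever remains.
fn split_card<'a>(bytes: &'a [u8]) -> Result<CardParts<'a>, &'static str> {
    if bytes.len() < 12 { return Err("truncated header"); }
    if &bytes[..4] != b"CARD" { return Err("invalid magic"); }
    if bytes[4] != 1 { return Err("unsupported major version"); }
    let flags = u16::from_le_bytes([bytes[6], bytes[7]]);
    let meta_len = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize;
    let meta_end = 12usize
        .checked_add(meta_len)
        .filter(|&end| end <= bytes.len())
        .ok_or("truncated metadata")?;
    // HAS_CHECKSUM (bit 0) means the last 4 bytes are the CRC32 footer.
    let footer_len = if flags & 0x0001 != 0 { 4 } else { 0 };
    if bytes.len() - meta_end < footer_len { return Err("truncated footer"); }
    let payload_end = bytes.len() - footer_len;
    let checksum = if footer_len == 4 {
        let f = &bytes[payload_end..];
        Some(u32::from_le_bytes([f[0], f[1], f[2], f[3]]))
    } else {
        None
    };
    Ok(CardParts {
        major: bytes[4],
        minor: bytes[5],
        flags,
        meta_json: &bytes[12..meta_end],
        payload: &bytes[meta_end..payload_end],
        checksum,
    })
}
```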
### Writing Algorithm
```rust
use std::io::Write;

fn write_card(writer: &mut impl Write, metadata: &Metadata, payload: &[u8]) -> Result<()> {
    // 1. Write header
    writer.write_all(b"CARD")?;
    writer.write_all(&[1, 0])?;             // major, minor
    writer.write_all(&0u16.to_le_bytes())?; // flags = 0: no checksum footer
    // 2. Write metadata (metadata.compressed_size must equal payload.len())
    let meta_json = serde_json::to_vec(metadata)?;
    writer.write_all(&(meta_json.len() as u32).to_le_bytes())?;
    writer.write_all(&meta_json)?;
    // 3. Write payload
    writer.write_all(payload)?;
    Ok(())
}
```
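The writer above always writes `flags = 0`, so it never produces the optional CRC32 footer. A sketch of a variant that can set `HAS_CHECKSUM` and append the footer. `write_card_with_checksum` is a hypothetical name; it takes the metadata JSON already serialized so that it depends only on std, and it assumes the zlib/IEEE CRC-32 variant because the spec does not name one.

```rust
use std::io::{self, Write};

const HAS_CHECKSUM: u16 = 0x0001;

/// Reflected CRC-32 with the IEEE polynomial (assumed variant).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = (crc >> 1) ^ (0xEDB8_8320 & (crc & 1).wrapping_neg());
        }
    }
    !crc
}

/// Write a v1.0 card from pre-serialized metadata, optionally with the CRC footer.
fn write_card_with_checksum<W: Write>(
    writer: &mut W,
    meta_json: &[u8],
    payload: &[u8],
    checksum: bool,
) -> io::Result<()> {
    let flags = if checksum { HAS_CHECKSUM } else { 0 };
    // Buffer the card so the CRC covers exactly the bytes that are written.
    let mut buf = Vec::with_capacity(12 + meta_json.len() + payload.len() + 4);
    buf.extend_from_slice(b"CARD");
    buf.extend_from_slice(&[1, 0]); // major, minor
    buf.extend_from_slice(&flags.to_le_bytes());
    buf.extend_from_slice(&(meta_json.len() as u32).to_le_bytes());
    buf.extend_from_slice(meta_json);
    buf.extend_from_slice(payload);
    if checksum {
        let crc = crc32(&buf);
        buf.extend_from_slice(&crc.to_le_bytes());
    }
    writer.write_all(&buf)
}
```

Buffering keeps the sketch short but holds the whole card in memory; a streaming writer could instead update the CRC incrementally as each region is written.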
## Example Files
### Minimal Card (no checksum)
```
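00000000: 43 41 52 44 01 00 00 00 24 00 00 00 7b 22 69 64 |CARD....$...{"id|
00000010: 22 3a 22 74 65 73 74 22 2c 22 63 6f 6d 70 72 65 |":"test","compre|
00000020: 73 73 65 64 5f 73 69 7a 65 22 3a 31 32 33 34 7d |ssed_size":1234}|
00000030: [BytePunch compressed data...]
```
Breakdown:
- `43 41 52 44` - "CARD" magic
- `01 00` - Version 1.0
- `00 00` - Flags: 0
- `24 00 00 00` - Metadata length: 36 bytes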
- `7b...7d` - Metadata JSON: `{"id":"test","compressed_size":1234}`
- Remaining bytes - Compressed payload
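To double-check the offsets in a dump like this one, the sketch below rebuilds the example card's bytes up to the point where the payload begins; it returns the 48 bytes (0x30) that precede the payload. `example_card_prefix` is a hypothetical name.

```rust
/// Rebuild the minimal example card up to the start of its payload.
fn example_card_prefix() -> Vec<u8> {
    let json = br#"{"id":"test","compressed_size":1234}"#;
    let mut bytes = Vec::new();
    bytes.extend_from_slice(b"CARD");                             // magic
    bytes.extend_from_slice(&[0x01, 0x00]);                       // version 1.0
    bytes.extend_from_slice(&0u16.to_le_bytes());                 // flags: none
    bytes.extend_from_slice(&(json.len() as u32).to_le_bytes());  // metadata length
    bytes.extend_from_slice(json);                                // metadata JSON
    bytes // the payload would start at this offset
}
```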
## Compatibility
### With DataSpool
Cards can be concatenated into `.spool` files. The spool format maintains:
- Card boundaries (offset + length)
- Individual card metadata
- Random access to any card
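Random access needs only each card's offset and length. This document does not define the spool index, so `SpoolEntry` and `card_bytes` below are hypothetical; the sketch shows the bounds check a reader would want before slicing one card out of a spool held in memory.

```rust
/// Hypothetical index entry locating one card inside a `.spool` file.
#[derive(Clone, Copy, Debug)]
struct SpoolEntry {
    offset: u64,
    length: u64,
}

/// Return one card's bytes, or None if the entry points outside the spool.
fn card_bytes(spool: &[u8], entry: SpoolEntry) -> Option<&[u8]> {
    let end = entry.offset.checked_add(entry.length)?;
    if end > spool.len() as u64 {
        return None;
    }
    // Both bounds are now <= spool.len(), so the casts cannot truncate.
    Some(&spool[entry.offset as usize..end as usize])
}
```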
### With BytePunch
- Payload uses standard BytePunch compression
- Dictionary is NOT embedded (stored separately or in engram manifest)
- Decompression requires matching dictionary version
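Because the dictionary is not embedded, a decoder has to locate it from the card's `dict_version` before decompressing. A sketch under an assumed storage model, with dictionaries held in an in-memory map keyed by version string. `find_dictionary` and `DictError` are hypothetical; the spec does not say what should happen when the optional `dict_version` field is absent, so the sketch reports that case separately and leaves the policy to the caller.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum DictError {
    /// The card's metadata has no `dict_version`.
    Unspecified,
    /// No dictionary with this version is available.
    Missing(String),
}

/// Select the dictionary named by the card's `dict_version`.
fn find_dictionary<'a>(
    dictionaries: &'a HashMap<String, Vec<u8>>,
    dict_version: Option<&str>,
) -> Result<&'a [u8], DictError> {
    let version = dict_version.ok_or(DictError::Unspecified)?;
    dictionaries
        .get(version)
        .map(|d| d.as_slice())
        .ok_or_else(|| DictError::Missing(version.to_string()))
}
```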
## Version History
- **v1.0** (2025-12-21) - Initial specification
- Basic header, metadata, payload structure
- Optional CRC32 checksums
- Up to 4GB payloads
## Future Considerations
- **v1.1** - Compression algorithm field in metadata
- **v2.0** - Embedded dictionary support
- **v2.0** - Multiple compression backends
- **v2.0** - Signature/encryption support