datacard-rs 1.0.0

Generic binary card format library with checksums and pluggable format traits
# DataCard Format Specification v1.0


**Last Updated:** 2025-12-21
**Status:** Draft

## Overview


DataCard (`.card`) is a binary file format for storing BytePunch-compressed CML documents with metadata.

## Design Goals


1. **Self-describing** - Magic header and version for identification
2. **Minimal overhead** - Typically 20-50 bytes of metadata
3. **Fast parsing** - Simple binary structure
4. **Extensible** - Version field and flags for future features
5. **Validated** - Optional checksums for integrity
6. **Spool-compatible** - Works standalone or in DataSpool archives

## Binary Structure


```
┌──────────────────────────────────────────┐
│ Header (8 bytes)                         │
├──────────────────────────────────────────┤
│ Metadata (variable, length-prefixed)     │
├──────────────────────────────────────────┤
│ Payload (BytePunch compressed CML)       │
├──────────────────────────────────────────┤
│ Footer (4 bytes, optional)               │
└──────────────────────────────────────────┘
```

## Header (8 bytes)


```rust
struct CardHeader {
    magic: [u8; 4],    // "CARD" (0x43 0x41 0x52 0x44)
    major: u8,         // Major version (1)
    minor: u8,         // Minor version (0)
    flags: u16,        // Feature flags (little-endian)
}
```
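The 8-byte layout above can be sketched as a pair of encode/decode helpers using only the standard library. The method names `to_bytes` and `from_bytes` are hypothetical and chosen for illustration; they are not taken from the datacard-rs API.

```rust
/// In-memory form of the 8-byte header described above.
#[derive(Debug, PartialEq, Clone, Copy)]
struct CardHeader {
    magic: [u8; 4],
    major: u8,
    minor: u8,
    flags: u16,
}

impl CardHeader {
    /// Serialize to the on-disk layout; `flags` is written little-endian.
    fn to_bytes(&self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[0..4].copy_from_slice(&self.magic);
        out[4] = self.major;
        out[5] = self.minor;
        out[6..8].copy_from_slice(&self.flags.to_le_bytes());
        out
    }

    /// Parse 8 bytes; returns `None` if the magic is not "CARD".
    /// (Hypothetical helper: a real reader would likely return a typed error.)
    fn from_bytes(bytes: &[u8; 8]) -> Option<CardHeader> {
        if bytes[0..4] != *b"CARD" {
            return None;
        }
        let mut magic = [0u8; 4];
        magic.copy_from_slice(&bytes[0..4]);
        Some(CardHeader {
            magic,
            major: bytes[4],
            minor: bytes[5],
            flags: u16::from_le_bytes([bytes[6], bytes[7]]),
        })
    }
}
```

Because the struct is serialized field by field, the sketch does not depend on Rust's in-memory struct layout or on the host's endianness.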

### Flags (16 bits)


```
Bit 0: HAS_CHECKSUM  - Footer contains CRC32 checksum
Bit 1: HAS_TIMESTAMP - Metadata includes creation timestamp
Bits 2-15: Reserved (must be 0)
```
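A minimal sketch of how the flag bits might be tested in code. The constant names mirror the table above; `RESERVED_MASK` and the helper functions are hypothetical, and whether a reader should reject a file with reserved bits set is an assumption, since the specification only says they must be 0.

```rust
/// Bit 0: footer contains a CRC32 checksum.
const HAS_CHECKSUM: u16 = 1 << 0;
/// Bit 1: metadata includes a creation timestamp.
const HAS_TIMESTAMP: u16 = 1 << 1;
/// Bits 2-15 (hypothetical helper constant): everything not defined above.
const RESERVED_MASK: u16 = !(HAS_CHECKSUM | HAS_TIMESTAMP);

fn has_checksum(flags: u16) -> bool {
    flags & HAS_CHECKSUM != 0
}

fn has_timestamp(flags: u16) -> bool {
    flags & HAS_TIMESTAMP != 0
}

/// True when all reserved bits are zero, as the spec requires.
fn reserved_bits_clear(flags: u16) -> bool {
    flags & RESERVED_MASK == 0
}
```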

## Metadata (variable length)


```rust
struct Metadata {
    length: u32,         // Metadata JSON length (little-endian)
    json: Vec<u8>,       // UTF-8 encoded JSON
}
```
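The length-prefixed framing can be sketched with the standard library alone. The function names are hypothetical, the JSON is treated as opaque bytes, and the 64 KB cap is an assumed reading of the Size Limits section (taken here as 65,536 bytes).

```rust
use std::io::{self, Read, Write};

/// Assumed cap on the metadata JSON, from the Size Limits section.
const MAX_METADATA_LEN: usize = 64 * 1024;

/// Write the u32 little-endian length prefix followed by the JSON bytes.
fn write_metadata<W: Write>(writer: &mut W, json: &[u8]) -> io::Result<()> {
    if json.len() > MAX_METADATA_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "metadata exceeds 64 KB",
        ));
    }
    writer.write_all(&(json.len() as u32).to_le_bytes())?;
    writer.write_all(json)
}

/// Read the length prefix, bound-check it, then read exactly that many bytes.
fn read_metadata<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;
    if len > MAX_METADATA_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "metadata length exceeds 64 KB",
        ));
    }
    let mut json = vec![0u8; len];
    reader.read_exact(&mut json)?;
    Ok(json)
}
```

Checking the length before allocating keeps a corrupt or hostile length prefix from triggering a multi-gigabyte allocation.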

### Metadata JSON Schema


**Required fields:**
- `id` (string) - Document identifier (e.g., "std::vec::Vec")
- `compressed_size` (u64) - Payload size in bytes

**Optional fields:**
- `profile` (string) - CML profile (e.g., "code:api", "legal:constitution")
- `original_size` (u64) - Original CML size before compression
- `created` (u64) - Unix timestamp in milliseconds
- `dict_version` (string) - BytePunch dictionary version used

**Example:**
```json
{
  "id": "std::vec::Vec",
  "profile": "code:api",
  "compressed_size": 1234,
  "original_size": 5678,
  "created": 1703001234567,
  "dict_version": "cml-core-v1"
}
```

## Payload (variable length)


Raw BytePunch-compressed data. Size is specified in `metadata.compressed_size`.

## Footer (4 bytes, optional)


```rust
struct CardFooter {
    crc32: u32,  // CRC32 checksum (little-endian)
}
```

CRC32 is computed over:
1. Header (8 bytes)
2. Metadata length + JSON
3. Payload

Only present if `flags & HAS_CHECKSUM != 0`.
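
The checksum coverage listed above can be sketched as follows. The specification does not name a CRC32 variant; this sketch assumes the common IEEE 802.3 CRC-32 (reflected polynomial `0xEDB88320`, as used by zlib and PNG). The function names are hypothetical, and a production implementation would more likely use a table-driven routine or an existing crate.

```rust
/// Bitwise CRC-32 update step (assumed variant: reflected polynomial 0xEDB88320).
/// `crc` is the running register, not the finalized checksum.
fn crc32_update(mut crc: u32, data: &[u8]) -> u32 {
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    crc
}

/// Checksum over header, metadata length prefix, metadata JSON, and payload,
/// in the order they appear in the file.
fn card_crc32(header: &[u8; 8], meta_json: &[u8], payload: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32; // standard initial value
    crc = crc32_update(crc, header);
    crc = crc32_update(crc, &(meta_json.len() as u32).to_le_bytes());
    crc = crc32_update(crc, meta_json);
    crc = crc32_update(crc, payload);
    !crc // standard final inversion
}
```

Feeding the sections incrementally gives the same result as hashing the concatenated bytes, so the footer can be computed while streaming.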

## File Extension


- **Standalone files:** `.card`
- **In spools:** Embedded in `.spool` files

## Size Limits


- **Metadata:** 64 KB maximum (a u16 length prefix would suffice; u32 is used for future-proofing)
- **Payload:** 4 GB maximum (`compressed_size` must fit in a u32, even though the metadata schema types the field as u64)
- **Total file:** 4 GB maximum (practical limit)
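
A writer might enforce these limits before emitting any bytes. The sketch below is one way to do that; the names are hypothetical, and the exact byte values (65,536 for 64 KB, `u32::MAX` for 4 GB) are assumptions about how the limits are meant to be read.

```rust
/// Assumed metadata cap: 64 KB.
const MAX_METADATA_LEN: u64 = 64 * 1024;
/// Assumed payload cap: the largest value a u32 can hold.
const MAX_PAYLOAD_LEN: u64 = u32::MAX as u64;

#[derive(Debug, PartialEq)]
enum SizeError {
    MetadataTooLarge(u64),
    PayloadTooLarge(u64),
}

/// Check both section sizes against the limits above.
fn check_sizes(meta_len: u64, payload_len: u64) -> Result<(), SizeError> {
    if meta_len > MAX_METADATA_LEN {
        return Err(SizeError::MetadataTooLarge(meta_len));
    }
    if payload_len > MAX_PAYLOAD_LEN {
        return Err(SizeError::PayloadTooLarge(payload_len));
    }
    Ok(())
}
```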

## Implementation Notes


### Encoding

- All multi-byte integers are **little-endian**
- Metadata JSON is UTF-8 encoded
- No padding or alignment requirements

### Reading Algorithm


```rust
use std::io::Read;

// Sketch: `read_header`, `read_u32_le`, `read_bytes`, `parse_json`, and
// `validate_crc32` are helper functions that are not shown here.
fn read_card(reader: &mut dyn Read) -> Result<Card> {
    // 1. Read and validate the 8-byte header
    let header = read_header(reader)?;
    if header.magic != *b"CARD" { return Err(InvalidMagic); }
    if header.major != 1 { return Err(UnsupportedVersion); }

    // 2. Read metadata: u32 little-endian length, then that many bytes of JSON
    let meta_len = read_u32_le(reader)?;
    let meta_json = read_bytes(reader, meta_len)?;
    let metadata = parse_json(&meta_json)?;

    // 3. Read payload; its size comes from the metadata
    let payload = read_bytes(reader, metadata.compressed_size)?;

    // 4. Validate checksum if the HAS_CHECKSUM flag (bit 0) is set
    if header.flags & 0x01 != 0 {
        let checksum = read_u32_le(reader)?;
        validate_crc32(&header, &meta_json, &payload, checksum)?;
    }

    Ok(Card { metadata, payload })
}
```

### Writing Algorithm


```rust
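use std::io::Write;

// Sketch: `Metadata` is assumed to implement `serde::Serialize`, and `Result`
// is assumed to use an error type that converts from both `std::io::Error`
// and `serde_json::Error`.
fn write_card(writer: &mut dyn Write, metadata: &Metadata, payload: &[u8]) -> Result<()> {
    // 1. Write header
    writer.write_all(b"CARD")?;
    writer.write_all(&[1, 0])?;              // major, minor
    writer.write_all(&0u16.to_le_bytes())?;  // flags: none set, so no footer

    // 2. Write metadata: u32 little-endian length prefix, then JSON
    let meta_json = serde_json::to_vec(metadata)?;
    writer.write_all(&(meta_json.len() as u32).to_le_bytes())?;
    writer.write_all(&meta_json)?;

    // 3. Write payload; its length must equal metadata.compressed_size
    writer.write_all(payload)?;

    Ok(())
}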
```

## Example Files


### Minimal Card (no checksum)


```
00000000: 43 41 52 44 01 00 00 00  24 00 00 00 7b 22 69 64  |CARD....$...{"id|
00000010: 22 3a 22 74 65 73 74 22  2c 22 63 6f 6d 70 72 65  |":"test","compre|
00000020: 73 73 65 64 5f 73 69 7a  65 22 3a 31 32 33 34 7d  |ssed_size":1234}|
00000030: [BytePunch compressed data...]
```

Breakdown:
- `43 41 52 44` - "CARD" magic
- `01 00` - Version 1.0
- `00 00` - Flags: 0
- `24 00 00 00` - Metadata length: 36 bytes
- `7b...7d` - Metadata JSON: `{"id":"test","compressed_size":1234}`
- Remaining bytes - Compressed payload
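
The byte layout in this example can be reproduced with a short sketch that assembles a checksum-free card in memory. `minimal_card` and `hex_line` are hypothetical helper names, and the payload is passed in as opaque bytes because BytePunch compression is outside the scope of this format.

```rust
/// Assemble header + metadata + payload with no flags set (so no footer).
fn minimal_card(meta_json: &[u8], payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 4 + meta_json.len() + payload.len());
    out.extend_from_slice(b"CARD");                  // magic
    out.extend_from_slice(&[1, 0]);                  // version 1.0
    out.extend_from_slice(&0u16.to_le_bytes());      // flags: 0
    // The length prefix is derived from the JSON bytes, never hard-coded.
    out.extend_from_slice(&(meta_json.len() as u32).to_le_bytes());
    out.extend_from_slice(meta_json);
    out.extend_from_slice(payload);
    out
}

/// Format bytes as space-separated lowercase hex, like a hex dump row.
fn hex_line(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Calling `hex_line(&minimal_card(meta, payload)[..12])` prints the header and length prefix in the same form as the first row of the dump, which makes it easy to compare a generated file against the example.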

## Compatibility


### With DataSpool


Cards can be concatenated into `.spool` files. The spool format maintains:
- Card boundaries (offset + length)
- Individual card metadata
- Random access to any card
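
The idea of tracking card boundaries by offset and length can be illustrated with a small in-memory sketch. This is not the DataSpool on-disk layout, which is not described here; `SpoolEntry`, `build_spool`, and `card_at` are hypothetical names used only to show how an offset/length index gives random access to concatenated cards.

```rust
/// Hypothetical index entry: where one card starts and how long it is.
#[derive(Debug, PartialEq, Clone, Copy)]
struct SpoolEntry {
    offset: u64,
    length: u64,
}

/// Concatenate serialized cards and record each card's boundaries.
fn build_spool(cards: &[Vec<u8>]) -> (Vec<u8>, Vec<SpoolEntry>) {
    let mut data = Vec::new();
    let mut index = Vec::with_capacity(cards.len());
    for card in cards {
        index.push(SpoolEntry {
            offset: data.len() as u64,
            length: card.len() as u64,
        });
        data.extend_from_slice(card);
    }
    (data, index)
}

/// Slice out one card without scanning the others; `None` if out of range.
fn card_at<'a>(data: &'a [u8], entry: &SpoolEntry) -> Option<&'a [u8]> {
    let start = entry.offset as usize;
    let end = start.checked_add(entry.length as usize)?;
    data.get(start..end)
}
```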

### With BytePunch


- Payload uses standard BytePunch compression
- Dictionary is NOT embedded (stored separately or in engram manifest)
- Decompression requires matching dictionary version

## Version History


- **v1.0** (2025-12-21) - Initial specification
  - Basic header, metadata, payload structure
  - Optional CRC32 checksums
  - Up to 4 GB payloads

## Future Considerations


- **v1.1** - Compression algorithm field in metadata
- **v2.0** - Embedded dictionary support
- **v2.0** - Multiple compression backends
- **v2.0** - Signature/encryption support