Module format

Defines the physical binary layout of Parcode V4 files.

This module specifies the on-disk representation of Parcode files, including the global header, chunk structure, and metadata encoding. Understanding this format is essential for implementing readers in other languages or debugging file corruption.

§File Format Overview (V4)

Parcode V4 uses a “bottom-up” layout strategy where children are written before their parents. This enables streaming writes and allows the root chunk to contain a complete table of contents for the entire file.

§High-Level Structure

┌──────────────────────────────────┐
│ Chunk 0 (Leaf)                   │
├──────────────────────────────────┤
│ Chunk 1 (Leaf)                   │
├──────────────────────────────────┤
│ ...                              │
├──────────────────────────────────┤
│ Chunk N (Parent)                 │
├──────────────────────────────────┤
│ Root Chunk                       │
├──────────────────────────────────┤
│ Global Header (26 bytes)         │
└──────────────────────────────────┘
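The bottom-up order can be sketched as follows. This is a minimal, single-threaded sketch, not the crate's writer: chunks are treated as opaque, already-encoded byte strings, and `append_chunk`, `append_global_header`, `write_bottom_up` and the `encode_root` callback are hypothetical names used only for illustration. The point it shows is that a child's (offset, length) is known the moment it is appended, so the parent written after it can embed that position.

```rust
/// Hypothetical helper (not the crate's API): appends one already-encoded
/// chunk and returns its (offset, length) so a parent can reference it later.
fn append_chunk(out: &mut Vec<u8>, chunk: &[u8]) -> (u64, u64) {
    let offset = out.len() as u64;
    out.extend_from_slice(chunk);
    (offset, chunk.len() as u64)
}

/// Hypothetical helper: appends the 26-byte global header pointing at the root.
fn append_global_header(out: &mut Vec<u8>, root_offset: u64, root_length: u64) {
    out.extend_from_slice(b"PAR4"); // magic
    out.extend_from_slice(&4u16.to_le_bytes()); // version
    out.extend_from_slice(&root_offset.to_le_bytes()); // root_offset
    out.extend_from_slice(&root_length.to_le_bytes()); // root_length
    out.extend_from_slice(&0u32.to_le_bytes()); // checksum, reserved (0)
}

/// Writes the leaf chunks first, then a root built from their positions,
/// then the global header. `encode_root` is a hypothetical callback standing
/// in for whatever serializes the root payload and its children table.
fn write_bottom_up<F>(leaves: &[Vec<u8>], encode_root: F) -> Vec<u8>
where
    F: FnOnce(&[(u64, u64)]) -> Vec<u8>,
{
    let mut out = Vec::new();
    // Children first: each position is known as soon as the chunk is appended.
    let refs: Vec<(u64, u64)> = leaves
        .iter()
        .map(|chunk| append_chunk(&mut out, chunk))
        .collect();
    // The root is encoded last, so it can embed the position of every child.
    let root = encode_root(&refs);
    let (root_offset, root_length) = append_chunk(&mut out, &root);
    // The fixed-size header closes the file and points back at the root.
    append_global_header(&mut out, root_offset, root_length);
    out
}
```

For two leaves of 3 and 2 bytes, the callback receives the positions `(0, 3)` and `(3, 2)`, and the header at the end of the file points at the root written at offset 5.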

§Chunk Anatomy

Each chunk is self-contained and consists of three parts:

┌─────────────────────────────────────────────────────────┐
│ Compressed Payload (Variable Length)                    │
│   - Contains the actual data (serialized with bincode)  │
│   - May be compressed (LZ4, etc.) based on MetaByte     │
├─────────────────────────────────────────────────────────┤
│ Children Table (Optional, only if is_chunkable = true)  │
│   - Array of ChildRef structures (16 bytes each)        │
│   - Count stored as u32 LE (4 bytes) at the end         │
├─────────────────────────────────────────────────────────┤
│ MetaByte (1 byte)                                       │
│   - Bit 0: is_chunkable (has children)                  │
│   - Bits 1-3: compression_method (0-7)                  │
│   - Bits 4-7: Reserved for future use                   │
└─────────────────────────────────────────────────────────┘
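The MetaByte bit layout above can be packed and unpacked with plain shifts and masks. `MetaSketch` is a hypothetical stand-in, not the crate's `MetaByte` type; it mirrors only the documented bits and makes no claim about which numeric value corresponds to which compression algorithm.

```rust
/// Hypothetical stand-in for the crate's `MetaByte`; it only mirrors the bit
/// layout documented above and may not match the real type's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MetaSketch {
    is_chunkable: bool,     // bit 0
    compression_method: u8, // bits 1-3, so 0..=7
}

impl MetaSketch {
    /// Packs the flags into one byte, leaving the reserved bits 4-7 at zero.
    fn to_byte(self) -> u8 {
        assert!(self.compression_method <= 7, "compression_method must fit in 3 bits");
        (self.is_chunkable as u8) | (self.compression_method << 1)
    }

    /// Unpacks a byte; this sketch simply ignores the reserved bits 4-7.
    fn from_byte(byte: u8) -> Self {
        MetaSketch {
            is_chunkable: (byte & 0b0000_0001) != 0,
            compression_method: (byte >> 1) & 0b0000_0111,
        }
    }
}
```

For example, a chunkable chunk using compression method 3 is stored as `0b0000_0111`.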

§Reading a Chunk

To read a chunk at offset O with length L:

  1. Read the MetaByte at O + L - 1
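  2. If is_chunkable, read the child count C at O + L - 5 (u32 LE)
  3. If is_chunkable, read the children table: C ChildRef entries of 16 bytes each, occupying the 16 * C bytes immediately before the count, i.e. starting at O + L - 5 - 16 * C
  4. The payload starts at O and ends just before the children table (or just before the MetaByte if the chunk has no children), so its length is L - 5 - 16 * C for chunkable chunks and L - 1 otherwise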
  5. Decompress the payload based on the compression method
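The five steps above can be sketched over the L bytes of a single chunk. This sketch assumes, for illustration only, that each 16-byte ChildRef is an `offset: u64 LE` followed by a `length: u64 LE`; the documentation does not specify the field order, so `ChildRefSketch` and `split_chunk` are hypothetical names rather than the crate's API. Decompression (step 5) is left to the caller because the mapping from method number to codec is not documented here.

```rust
/// ASSUMPTION: this sketch treats each 16-byte ChildRef as `offset: u64 LE`
/// followed by `length: u64 LE`. The real `ChildRef` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ChildRefSketch {
    offset: u64,
    length: u64,
}

const CHILD_REF_SIZE: usize = 16;

fn read_u64_le(buf: &[u8], at: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&buf[at..at + 8]);
    u64::from_le_bytes(b)
}

/// Splits the L bytes of one chunk into (still-compressed payload,
/// compression method, child references). Returns None if the chunk is too
/// short for the sizes its trailer claims.
fn split_chunk(chunk: &[u8]) -> Option<(&[u8], u8, Vec<ChildRefSketch>)> {
    // Step 1: the MetaByte is the last byte of the chunk.
    let meta = *chunk.last()?;
    let is_chunkable = (meta & 1) != 0;
    let compression_method = (meta >> 1) & 0b111;
    let mut payload_end = chunk.len() - 1;
    let mut children = Vec::new();

    if is_chunkable {
        // Step 2: the u32 LE child count occupies the 4 bytes before the MetaByte.
        let count_at = payload_end.checked_sub(4)?;
        let mut c = [0u8; 4];
        c.copy_from_slice(&chunk[count_at..count_at + 4]);
        let count = u32::from_le_bytes(c) as usize;

        // Step 3: `count` fixed-size entries end exactly where the count begins.
        let table_at = count_at.checked_sub(count.checked_mul(CHILD_REF_SIZE)?)?;
        for i in 0..count {
            let at = table_at + i * CHILD_REF_SIZE;
            children.push(ChildRefSketch {
                offset: read_u64_le(chunk, at),
                length: read_u64_le(chunk, at + 8),
            });
        }
        payload_end = table_at;
    }

    // Step 4: everything before the table (or before the MetaByte) is payload.
    // Step 5, decompressing according to `compression_method`, is left to the caller.
    Some((&chunk[..payload_end], compression_method, children))
}
```

Using checked arithmetic means a corrupted child count yields `None` instead of a panic or an out-of-range slice.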

The global header is always located at the end of the file and has a fixed size of 26 bytes:

Offset | Size | Field         | Description
-------|------|---------------|----------------------------------------
0      | 4    | magic         | Magic bytes: "PAR4" (0x50 0x41 0x52 0x34)
4      | 2    | version       | Format version (u16 LE, currently 4)
6      | 8    | root_offset   | Absolute offset of root chunk (u64 LE)
14     | 8    | root_length   | Total length of root chunk (u64 LE)
22     | 4    | checksum      | Reserved for CRC32 (u32 LE, currently 0)
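A reader can locate and validate the header from the file's last 26 bytes using only the offsets in the table. The sketch below is illustrative: `HeaderSketch` and `parse_global_header` are hypothetical names, and the check that the root chunk ends before the header is an extra sanity check this sketch adds, not something the format documentation requires.

```rust
const GLOBAL_HEADER_SIZE: usize = 26;
const MAGIC: &[u8] = b"PAR4";
const SUPPORTED_VERSION: u16 = 4;

/// Hypothetical mirror of the header fields in the table above.
#[derive(Debug, PartialEq, Eq)]
struct HeaderSketch {
    version: u16,
    root_offset: u64,
    root_length: u64,
    checksum: u32,
}

/// Copies N bytes starting at `at` into a fixed-size array.
fn le_bytes<const N: usize>(buf: &[u8], at: usize) -> [u8; N] {
    let mut b = [0u8; N];
    b.copy_from_slice(&buf[at..at + N]);
    b
}

/// Reads the global header from the last 26 bytes of `file`, checking the
/// magic bytes, the version, and that the root chunk lies before the header.
fn parse_global_header(file: &[u8]) -> Result<HeaderSketch, String> {
    let header_start = file
        .len()
        .checked_sub(GLOBAL_HEADER_SIZE)
        .ok_or("file is shorter than the global header")?;
    let h = &file[header_start..];

    if &h[0..4] != MAGIC {
        return Err("bad magic bytes".to_string());
    }
    let header = HeaderSketch {
        version: u16::from_le_bytes(le_bytes(h, 4)),
        root_offset: u64::from_le_bytes(le_bytes(h, 6)),
        root_length: u64::from_le_bytes(le_bytes(h, 14)),
        checksum: u32::from_le_bytes(le_bytes(h, 22)), // currently always 0
    };
    if header.version != SUPPORTED_VERSION {
        return Err(format!("unsupported version {}", header.version));
    }
    // Sanity check added by this sketch: the root must end before the header.
    let root_end = header
        .root_offset
        .checked_add(header.root_length)
        .ok_or("root range overflows")?;
    if root_end > header_start as u64 {
        return Err("root chunk extends past the header".to_string());
    }
    Ok(header)
}
```

The returned `root_offset` and `root_length` are exactly the O and L needed to read the root chunk as described in the previous section.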

§Design Rationale

§Why Bottom-Up Layout?

  • Streaming Writes: Children can be written as soon as they’re ready, without knowing the final file size
  • Parallel Execution: Multiple threads can produce chunks concurrently; the only coordination required is the mutex guarding the sequential writer
  • Self-Describing Root: The root chunk contains all necessary metadata to navigate the entire file

§Why MetaByte at the End?

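  • Backward Reading: When navigating from a parent to its children, the reader already knows each child's offset and length, so it can read the child's last byte (the MetaByte) first and determine the chunk structure before touching the payload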
  • Alignment: Placing metadata at the end avoids alignment issues with the payload

§Why Fixed-Size ChildRef?

  • Random Access: Fixed-size references enable O(1) indexing into the children array
  • Simplicity: No need for variable-length encoding or delimiters
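The O(1) claim can be made concrete: once the child count C is known (step 2 of reading a chunk), the i-th ChildRef sits at a position computable from the chunk length alone, with no scan of earlier entries. `child_ref_position` is a hypothetical helper; it returns a position relative to the chunk start, so the absolute file offset is O plus the result.

```rust
/// Hypothetical helper: byte position, relative to the chunk start, of the
/// i-th 16-byte ChildRef in a chunkable chunk of length `chunk_len`.
fn child_ref_position(chunk_len: usize, child_count: usize, i: usize) -> Option<usize> {
    const CHILD_REF_SIZE: usize = 16;
    const TRAILER: usize = 4 + 1; // u32 child count + MetaByte
    if i >= child_count {
        return None;
    }
    // The table ends where the count begins, so its start is fixed by C alone.
    let table_start = chunk_len
        .checked_sub(TRAILER)?
        .checked_sub(child_count.checked_mul(CHILD_REF_SIZE)?)?;
    Some(table_start + i * CHILD_REF_SIZE)
}
```

For a 55-byte chunk with 3 children, the table starts at byte 2 and the third entry (i = 2) starts at byte 34.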

§Compatibility

  • Endianness: All multi-byte integers use little-endian encoding
  • Alignment: No special alignment requirements (can be read from any offset)
  • Version Detection: Readers should check the magic bytes and version before parsing

Structs§

ChildRef
Represents a reference to a child chunk stored within a parent chunk. This allows the reader to locate dependencies without deserializing the payload.
GlobalHeader
The Global Header, located at the very end of the file (the tail), points to the Root Chunk, which is the entry point for the graph.
MetaByte
Configuration flags for a specific chunk, stored in the last byte.

Constants§

GLOBAL_HEADER_SIZE
The fixed size of the Global Header. Magic(4) + Version(2) + RootOffset(8) + RootLength(8) + Checksum(4) = 26
MAGIC_BYTES
Magic bytes identifying the file format: “PAR4”.