Module magic

Magic bytes and version constants.

Defines the file signature (HEXZ) and format version that identify a valid snapshot file.

Critical Note: The magic bytes never change. They permanently identify a file as a Hexz file across all versions, and altering them would make all existing files unreadable.

The format version constant, however, can and will change to indicate incompatible format updates. Readers must check this version and reject files they cannot decode.

File signature, magic bytes, and header size constants for Hexz snapshots.

This module defines the fundamental constants that identify and structure Hexz snapshot files (.hxz). These values form the first line of defense against file corruption and format misidentification, enabling readers to quickly reject invalid files before attempting deserialization.

§File Format Overview

Every Hexz snapshot file has the following fixed structure:

┌─────────────────────────────────────────────────────────────┐
│ Byte 0-3: Magic Bytes ("HEXZ")                              │
├─────────────────────────────────────────────────────────────┤
│ Byte 4-4095: Header (bincode-serialized Header)             │
│   - version: u32                                             │
│   - block_size: u32                                          │
│   - index_offset: u64                                        │
│   - compression: CompressionType                             │
│   - features: FeatureFlags                                   │
│   - optional fields (encryption, parent, etc.)               │
├─────────────────────────────────────────────────────────────┤
│ Byte 4096+: Compressed block data                            │
│ ...                                                          │
│ Index pages                                                  │
│ Master index (at header.index_offset)                        │
└─────────────────────────────────────────────────────────────┘
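The layout above reduces to a couple of offset comparisons. The following is a minimal sketch rather than part of the crate's API: the two constants take their values from the diagram, while `Region` and `region_at` are hypothetical names introduced only for illustration.

```rust
/// Values taken from the layout diagram; redefined locally so the sketch
/// is self-contained (assumption: not the crate's own definitions).
const MAGIC_LEN: u64 = 4;
const HEADER_SIZE: u64 = 4096;

/// Hypothetical classification of a byte offset within a snapshot file.
#[derive(Debug, PartialEq)]
enum Region {
    /// Bytes 0-3: the "HEXZ" signature.
    Magic,
    /// Bytes 4-4095: serialized header plus zero padding.
    Header,
    /// Bytes 4096 and up: block data, index pages, and the master index.
    Body,
}

/// Maps a byte offset to the fixed region it falls in.
fn region_at(offset: u64) -> Region {
    if offset < MAGIC_LEN {
        Region::Magic
    } else if offset < HEADER_SIZE {
        Region::Header
    } else {
        Region::Body
    }
}
```

Only the first two boundaries are fixed. Where the index pages and master index sit within the body depends on `header.index_offset`, so the sketch does not try to distinguish them.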

§Magic Bytes Rationale

The 4-byte signature HEXZ (ASCII: 0x48 0x45 0x58 0x5A) serves multiple purposes:

§Immediate Format Validation

Readers can detect non-Hexz files with a single 4-byte read before attempting any deserialization, preventing crashes or misinterpretation:

let mut magic = [0u8; 4];
file.read_exact(&mut magic)?;
if &magic != MAGIC_BYTES {
    return Err(Error::InvalidMagic { found: magic });
}

§Corruption Detection

If the magic bytes are corrupted, the file is likely unrecoverable and should be rejected immediately rather than attempting to parse garbage data.

§File Type Identification

Operating systems and tools (e.g., file(1)) can identify .hxz files by checking for the HEXZ signature at offset 0, even if the file extension is wrong.

§Endianness Independence

The ASCII signature avoids byte-order ambiguity. Unlike a numeric magic number (e.g., 0x4845585A), the byte sequence is identical on little-endian and big-endian systems.
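The difference can be seen by encoding the numeric form in both byte orders. This is an illustrative sketch: `MAGIC_AS_U32` and `numeric_magic_encodings` are hypothetical names, and `MAGIC_BYTES` is redefined locally with the type the comparisons in this module suggest.

```rust
/// The signature as a byte string: the same four bytes on every platform.
/// (Local redefinition for the sketch; the type is an assumption.)
const MAGIC_BYTES: &[u8; 4] = b"HEXZ";

/// The same value expressed as an integer, for comparison only.
const MAGIC_AS_U32: u32 = 0x4845_585A;

/// Returns the bytes a writer would emit for the numeric form,
/// first in big-endian order, then in little-endian order.
fn numeric_magic_encodings() -> ([u8; 4], [u8; 4]) {
    (MAGIC_AS_U32.to_be_bytes(), MAGIC_AS_U32.to_le_bytes())
}
```

Only the big-endian encoding reproduces `HEXZ`. A writer that stored the integer in little-endian order would put `ZXEH` on disk, which is the ambiguity a byte-string signature avoids.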

§Header Size Calculation

The header is fixed at 4096 bytes (4 KB) for several reasons:

§Alignment Benefits

  • Page alignment: Matches the most common OS page size (4096 bytes on x86 and on most ARM configurations)
  • Block alignment: Compatible with storage devices that use 4 KB blocks
  • DMA efficiency: Hardware I/O transfers work optimally on page boundaries

§Padding Strategy

The actual serialized Header is typically 200-500 bytes. The remaining space is zero-padded, providing:

  • Forward compatibility: New header fields can be added without changing the header size, preserving alignment properties
  • Metadata expansion: Optional fields (encryption params, signatures) fit within the fixed 4096-byte envelope
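On the writer side, the padding strategy might be sketched as follows. `build_header_region` is a hypothetical helper, not a function from the crate, and the constants are redefined locally with assumed types. The serialized header bytes are taken as given, so no serializer is involved.

```rust
// Local redefinitions for the sketch (assumption: the types mirror the crate's).
const MAGIC_BYTES: &[u8; 4] = b"HEXZ";
const HEADER_SIZE: usize = 4096;

/// Lays out magic bytes, the serialized header, and zero padding in a single
/// HEADER_SIZE buffer. Returns None if the serialized header does not fit.
fn build_header_region(serialized_header: &[u8]) -> Option<Vec<u8>> {
    let end = MAGIC_BYTES.len().checked_add(serialized_header.len())?;
    if end > HEADER_SIZE {
        return None;
    }
    // Starting from an all-zero buffer makes the padding zero by construction.
    let mut region = vec![0u8; HEADER_SIZE];
    region[..MAGIC_BYTES.len()].copy_from_slice(MAGIC_BYTES);
    region[MAGIC_BYTES.len()..end].copy_from_slice(serialized_header);
    Some(region)
}
```

Because the buffer always has the same length, a header that grows from 200 to 500 bytes leaves the offset of the first block unchanged.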

§Read Performance

Fixed-size headers enable predictable I/O patterns:

// Single aligned read for header
let mut header_buf = vec![0u8; HEADER_SIZE];
file.read_exact(&mut header_buf)?;
let header: Header = bincode::deserialize(&header_buf[4..])?;

§Backward Compatibility Guarantee

These constants are immutable across all Hexz versions:

  • MAGIC_BYTES must always be b"HEXZ" (changing this creates a new file format)
  • HEADER_SIZE must always be 4096 (changing this breaks offset calculations)

The FORMAT_VERSION constant, however, can and will change to indicate format evolution. Version checking logic is in crate::format::version.

§Security Considerations

§Magic Byte Spoofing

An attacker could create a malicious file with valid magic bytes but corrupted or adversarial header data. Defenses include:

  • Version checking: Reject unknown versions (see crate::format::version::check_version)
  • Checksum verification: Validate block checksums before decompression
  • Bounds checking: Ensure all offsets/lengths are within file size
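The bounds-checking defense could look roughly like this. It is a sketch under stated assumptions: `BoundsError` and `check_region` are hypothetical names, and the rule that every data region starts at or after the header is inferred from the layout diagram rather than taken from the crate.

```rust
// Local redefinition for the sketch.
const HEADER_SIZE: u64 = 4096;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum BoundsError {
    /// The file is too short to contain the fixed header region.
    TruncatedFile,
    /// The region overlaps the header, overflows, or runs past the end of the file.
    OutOfRange,
}

/// Checks that `len` bytes starting at `offset` lie after the header
/// and entirely within a file of `file_len` bytes.
fn check_region(offset: u64, len: u64, file_len: u64) -> Result<(), BoundsError> {
    if file_len < HEADER_SIZE {
        return Err(BoundsError::TruncatedFile);
    }
    // checked_add guards against wraparound from adversarial header values.
    let end = offset.checked_add(len).ok_or(BoundsError::OutOfRange)?;
    if offset < HEADER_SIZE || end > file_len {
        return Err(BoundsError::OutOfRange);
    }
    Ok(())
}
```

A reader would apply a check of this kind to `header.index_offset` and to each block offset and length before seeking or allocating.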

§Header Parsing Robustness

The bincode deserializer must handle truncated, oversized, or malformed headers gracefully. Always deserialize with size limits:

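// bincode 1.x Options API; fixint encoding and trailing bytes match bincode::deserialize
use bincode::Options;
let opts = bincode::DefaultOptions::new().with_fixint_encoding().allow_trailing_bytes();
let header: Header = opts.with_limit(HEADER_SIZE as u64).deserialize(&header_buf[4..])?;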

§File Type Registration

For integration with system file-type databases:

§MIME Type (Proposed)

application/x-hexz-snapshot

§Magic Database Entry (/etc/magic)

0       string  HEXZ            Hexz snapshot file
>4      ulelong x               \b, version %d
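For reference, the two rules above behave roughly like the following sketch. `describe` is a hypothetical function. Like the magic entry itself, it assumes the version is stored as a fixed-width little-endian `u32` at offset 4, which holds only if `version` is the first serialized header field and the encoding is fixed-width.

```rust
/// Mirrors the two magic-database rules: match "HEXZ" at offset 0,
/// then read an unsigned little-endian 32-bit value at offset 4.
fn describe(bytes: &[u8]) -> Option<String> {
    if bytes.get(..4)? != b"HEXZ" {
        return None;
    }
    let mut description = String::from("Hexz snapshot file");
    // The continuation rule only applies when bytes 4..8 are present.
    if let Some(raw) = bytes.get(4..8) {
        let version = u32::from_le_bytes(raw.try_into().ok()?);
        description.push_str(&format!(", version {}", version));
    }
    Some(description)
}
```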

§File Extension

The conventional extension is .hxz, though the format does not require it.

§Examples

§Validating Magic Bytes

use hexz_core::format::magic::MAGIC_BYTES;

let file_header = b"HEXZ..."; // First bytes of a file
assert_eq!(&file_header[..4], MAGIC_BYTES);

§Header Offset Calculation

use hexz_core::format::magic::HEADER_SIZE;

// First compressed block starts immediately after header
let first_block_offset = HEADER_SIZE;
assert_eq!(first_block_offset, 4096);

§File Format Detection

use std::fs::File;
use std::io::Read;
use std::path::Path;
use hexz_core::format::magic::MAGIC_BYTES;

fn is_hexz_file(path: &Path) -> std::io::Result<bool> {
    let mut file = File::open(path)?;
    let mut magic = [0u8; 4];
    file.read_exact(&mut magic)?;
    Ok(&magic == MAGIC_BYTES)
}
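Note that `read_exact` returns an `UnexpectedEof` error for inputs shorter than four bytes, so the function above reports such files as I/O errors rather than as non-Hexz files. If that distinction matters, a variant along these lines may be preferable. This is a sketch: `has_hexz_magic` is a hypothetical name, and it accepts any `Read` implementation so it can be exercised without touching the filesystem.

```rust
use std::io::{self, Read};

// Local redefinition for the sketch (assumption: the type mirrors the crate's).
const MAGIC_BYTES: &[u8; 4] = b"HEXZ";

/// Returns Ok(false), rather than an error, when the input ends
/// before four bytes have been read. Other I/O errors are propagated.
fn has_hexz_magic<R: Read>(mut reader: R) -> io::Result<bool> {
    let mut magic = [0u8; 4];
    match reader.read_exact(&mut magic) {
        Ok(()) => Ok(&magic == MAGIC_BYTES),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

Passing an opened `File` works the same way as passing a byte slice, since both implement `Read`.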

§Reader Implementation

use std::fs::File;
use std::io::Read;
use hexz_core::format::magic::{MAGIC_BYTES, HEADER_SIZE, FORMAT_VERSION};
use hexz_core::format::header::Header;
use hexz_core::error::Error;

fn read_header(file: &mut File) -> Result<Header, Error> {
    // Read full header region (magic + serialized header)
    let mut buf = vec![0u8; HEADER_SIZE];
    file.read_exact(&mut buf)?;

    // Validate magic bytes
    if &buf[0..4] != MAGIC_BYTES {
        return Err(Error::InvalidMagic {
            found: buf[0..4].try_into().unwrap(),
        });
    }

    // Deserialize header (bytes 4..4096)
    let header: Header = bincode::deserialize(&buf[4..])?;

    // Validate version
    if header.version != FORMAT_VERSION {
        return Err(Error::UnsupportedVersion {
            found: header.version,
            supported: FORMAT_VERSION,
        });
    }

    Ok(header)
}

§Constants

  • FORMAT_VERSION: Format version number for snapshots written by this build.
  • HEADER_SIZE: Size of the fixed header region at the start of snapshot files.
  • MAGIC_BYTES: File signature identifying Hexz snapshot files.