Magic bytes and version constants.
Defines the file signature (HEXZ) and format version that identify
a valid snapshot file.
Critical Note: The magic bytes never change—they permanently identify this as a Hexz file across all versions. Altering them would make all existing files unreadable.
The format version constant, however, can and will change to indicate incompatible format updates. Readers must check this version and reject files they cannot decode.
File signature, magic bytes, and header size constants for Hexz snapshots.
This module defines the fundamental constants that identify and structure
Hexz snapshot files (.hxz). These values form the first line of defense
against file corruption and format misidentification, enabling readers to
quickly reject invalid files before attempting deserialization.
§File Format Overview
Every Hexz snapshot file has the following fixed structure:
┌──────────────────────────────────────────────────────────────┐
│ Byte 0-3:     Magic Bytes ("HEXZ")                           │
├──────────────────────────────────────────────────────────────┤
│ Byte 4-4095:  Header (bincode-serialized Header)             │
│                 - version: u32                               │
│                 - block_size: u32                            │
│                 - index_offset: u64                          │
│                 - compression: CompressionType               │
│                 - features: FeatureFlags                     │
│                 - optional fields (encryption, parent, etc.) │
├──────────────────────────────────────────────────────────────┤
│ Byte 4096+:   Compressed block data                          │
│               ...                                            │
│               Index pages                                    │
│               Master index (at header.index_offset)          │
└──────────────────────────────────────────────────────────────┘
§Magic Bytes Rationale
The 4-byte signature HEXZ (ASCII: 0x48 0x45 0x58 0x5A) serves multiple purposes:
§Immediate Format Validation
Readers can detect non-Hexz files with a single 4-byte read before attempting any deserialization, preventing crashes or misinterpretation:
let mut magic = [0u8; 4];
file.read_exact(&mut magic)?;
if &magic != MAGIC_BYTES {
    return Err(Error::InvalidMagic { found: magic });
}
§Corruption Detection
If the magic bytes are corrupted, the file is likely unrecoverable and should be rejected immediately rather than attempting to parse garbage data.
§File Type Identification
Operating systems and tools (e.g., file(1)) can identify .hxz files
by searching for the HEXZ signature, even if the file extension is wrong.
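A signature check of this kind can be written against any byte source, not only a file on disk. The sketch below is illustrative only: `looks_like_hexz` is a hypothetical helper, and the local `MAGIC_BYTES` constant stands in for `hexz_core::format::magic::MAGIC_BYTES`, assumed to be `b"HEXZ"` as described above. It treats inputs shorter than four bytes as "not Hexz" instead of surfacing an I/O error.

```rust
use std::io::{self, ErrorKind, Read};

/// Local stand-in for `hexz_core::format::magic::MAGIC_BYTES` (assumed value).
const MAGIC_BYTES: &[u8; 4] = b"HEXZ";

/// Hypothetical helper: returns `true` if the source starts with the signature.
/// The file name or extension is never consulted.
fn looks_like_hexz<R: Read>(mut source: R) -> io::Result<bool> {
    let mut magic = [0u8; 4];
    match source.read_exact(&mut magic) {
        Ok(()) => Ok(&magic == MAGIC_BYTES),
        // Fewer than 4 bytes available: cannot be a Hexz snapshot.
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

Because the helper is generic over `Read`, it accepts a `File`, a byte slice, or an in-memory `Cursor` alike.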
§Endianness Independence
The ASCII signature avoids byte-order ambiguity. Unlike a numeric magic number
(e.g., 0x4845585A), the byte sequence is identical on little-endian and
big-endian systems.
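The byte-order point can be checked directly. In this sketch, `NUMERIC_MAGIC` and `numeric_magic_on_disk` are hypothetical names used only for illustration, and the local `MAGIC_BYTES` is an assumed stand-in for the crate constant.

```rust
/// Local stand-in for `hexz_core::format::magic::MAGIC_BYTES` (assumed value).
const MAGIC_BYTES: &[u8; 4] = b"HEXZ";

/// The same signature written as a hypothetical 32-bit numeric magic number.
const NUMERIC_MAGIC: u32 = 0x4845_585A;

/// Bytes a writer would emit for the numeric form, depending on byte order.
fn numeric_magic_on_disk(big_endian: bool) -> [u8; 4] {
    if big_endian {
        NUMERIC_MAGIC.to_be_bytes()
    } else {
        NUMERIC_MAGIC.to_le_bytes()
    }
}
```

A big-endian writer would produce the bytes `HEXZ`, while a little-endian writer would produce `ZXEH`. A byte-string constant has no such ambiguity.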
§Header Size Calculation
The header is fixed at 4096 bytes (4 KB) for several reasons:
§Alignment Benefits
- Page alignment: Matches common OS page size (4096 bytes on x86/ARM)
- Block alignment: Compatible with 4KB block storage devices
- DMA efficiency: Hardware I/O transfers work optimally on page boundaries
§Padding Strategy
The actual serialized Header is typically 200-500 bytes. The
remaining space is zero-padded, providing:
- Forward compatibility: New header fields can be added without changing the header size, preserving alignment properties
- Metadata expansion: Optional fields (encryption params, signatures) fit within the fixed 4096-byte envelope
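On the writer side, the padding strategy might look roughly like the following. This is a sketch under stated assumptions, not the crate's actual writer: `build_header_region` is a hypothetical helper, `payload` is taken to be the already-serialized header, and the two constants are local stand-ins for those in `hexz_core::format::magic`.

```rust
/// Local stand-ins for the constants in `hexz_core::format::magic` (assumed values).
const MAGIC_BYTES: &[u8; 4] = b"HEXZ";
const HEADER_SIZE: usize = 4096;

/// Hypothetical helper: lays out magic + serialized header + zero padding.
/// `payload` is the already-serialized header; returns `None` if it does not fit.
fn build_header_region(payload: &[u8]) -> Option<Vec<u8>> {
    let used = MAGIC_BYTES.len() + payload.len();
    if used > HEADER_SIZE {
        return None;
    }
    // Start from an all-zero region so every unused byte is zero padding.
    let mut region = vec![0u8; HEADER_SIZE];
    region[..MAGIC_BYTES.len()].copy_from_slice(MAGIC_BYTES);
    region[MAGIC_BYTES.len()..used].copy_from_slice(payload);
    Some(region)
}
```

With a 300-byte payload, bytes 304 through 4095 stay zero, which leaves room for later header fields without moving the first data block.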
§Read Performance
Fixed-size headers enable predictable I/O patterns:
// Single aligned read for header
let mut header_buf = vec![0u8; HEADER_SIZE];
file.read_exact(&mut header_buf)?;
let header: Header = bincode::deserialize(&header_buf[4..])?;
§Backward Compatibility Guarantee
These constants are immutable across all Hexz versions:
- MAGIC_BYTES must always be b"HEXZ" (changing this creates a new file format)
- HEADER_SIZE must always be 4096 (changing this breaks offset calculations)
The FORMAT_VERSION constant, however, can and will change to indicate
format evolution. Version checking logic is in crate::format::version.
§Security Considerations
§Magic Byte Spoofing
An attacker could create a malicious file with valid magic bytes but corrupted or adversarial header data. Defenses include:
- Version checking: Reject unknown versions (see crate::format::version::check_version)
- Checksum verification: Validate block checksums before decompression
- Bounds checking: Ensure all offsets/lengths are within file size
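The bounds-checking defense can be sketched as a small predicate applied to every offset and length read from untrusted header or index data. `range_in_bounds` is a hypothetical helper, not part of the crate, and the local `HEADER_SIZE` is an assumed stand-in for the crate constant.

```rust
/// Local stand-in for `hexz_core::format::magic::HEADER_SIZE` (assumed value).
const HEADER_SIZE: u64 = 4096;

/// Hypothetical check: a byte range taken from untrusted header data must lie
/// after the header region and entirely within the file.
fn range_in_bounds(offset: u64, len: u64, file_len: u64) -> bool {
    // checked_add guards against overflow from adversarial offset/length values.
    match offset.checked_add(len) {
        Some(end) => offset >= HEADER_SIZE && end <= file_len,
        None => false,
    }
}
```

A reader would typically run a check like this on `index_offset` before seeking, so that a forged header cannot direct reads past the end of the file or back into the header region.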
§Header Parsing Robustness
The bincode deserializer must handle truncated, oversized, or malformed headers gracefully. Always deserialize with size limits:
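// Assumes bincode 1.x, matching the bincode::deserialize calls used elsewhere on this page.
use bincode::Options;
let options = bincode::DefaultOptions::new().with_fixint_encoding().allow_trailing_bytes();
let header: Header = options.with_limit(HEADER_SIZE as u64).deserialize(&header_buf[4..])?;
§File Type Registration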
For integration with system file-type databases:
§MIME Type (Proposed)
application/x-hexz-snapshot
§Magic Database Entry (/etc/magic)
0       string    HEXZ    Hexz snapshot file
>4      ulelong   x       \b, version %d
§File Extension
The conventional extension is .hxz, though the format does not require it.
§Examples
§Validating Magic Bytes
use hexz_core::format::magic::MAGIC_BYTES;
let file_header = b"HEXZ..."; // First bytes of a file
assert_eq!(&file_header[..4], MAGIC_BYTES);
§Header Offset Calculation
use hexz_core::format::magic::HEADER_SIZE;
// First compressed block starts immediately after header
let first_block_offset = HEADER_SIZE;
assert_eq!(first_block_offset, 4096);
§File Format Detection
use std::fs::File;
use std::io::Read;
use std::path::Path;
use hexz_core::format::magic::MAGIC_BYTES;
// Note: a file shorter than 4 bytes yields Err(UnexpectedEof), not Ok(false).
fn is_hexz_file(path: &Path) -> std::io::Result<bool> {
    let mut file = File::open(path)?;
    let mut magic = [0u8; 4];
    file.read_exact(&mut magic)?;
    Ok(&magic == MAGIC_BYTES)
}
§Reader Implementation
use std::fs::File;
use std::io::Read;
use hexz_core::format::magic::{MAGIC_BYTES, HEADER_SIZE, FORMAT_VERSION};
use hexz_core::format::header::Header;
use hexz_core::error::Error;
fn read_header(file: &mut File) -> Result<Header, Error> {
    // Read full header region (magic + serialized header)
    let mut buf = vec![0u8; HEADER_SIZE];
    file.read_exact(&mut buf)?;
    // Validate magic bytes
    if &buf[0..4] != MAGIC_BYTES {
        return Err(Error::InvalidMagic {
            found: buf[0..4].try_into().unwrap(),
        });
    }
    // Deserialize header (bytes 4..4096)
    let header: Header = bincode::deserialize(&buf[4..])?;
    // Validate version
    if header.version != FORMAT_VERSION {
        return Err(Error::UnsupportedVersion {
            found: header.version,
            supported: FORMAT_VERSION,
        });
    }
    Ok(header)
}
Constants§
- FORMAT_VERSION - Format version number for snapshots written by this build.
- HEADER_SIZE - Size of the fixed header region at the start of snapshot files.
- MAGIC_BYTES - File signature identifying Hexz snapshot files.