// hexz_core/format/magic.rs

//! Magic bytes, format version, and header size constants for Hexz snapshots.
//!
//! This module defines the fundamental constants that identify and structure
//! Hexz snapshot files (`.hxz`). These values form the first line of defense
//! against file corruption and format misidentification, enabling readers to
//! quickly reject invalid files before attempting deserialization.
//!
//! # File Format Overview
//!
//! Every Hexz snapshot file has the following fixed structure:
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────┐
//! │ Byte 0-3: Magic Bytes ("HEXZ")                              │
//! ├─────────────────────────────────────────────────────────────┤
//! │ Byte 4-4095: Header (bincode-serialized Header)             │
//! │   - version: u32                                            │
//! │   - block_size: u32                                         │
//! │   - index_offset: u64                                       │
//! │   - compression: CompressionType                            │
//! │   - features: FeatureFlags                                  │
//! │   - optional fields (encryption, parent, etc.)              │
//! ├─────────────────────────────────────────────────────────────┤
//! │ Byte 4096+: Compressed block data                           │
//! │ ...                                                         │
//! │ Index pages                                                 │
//! │ Master index (at header.index_offset)                       │
//! └─────────────────────────────────────────────────────────────┘
//! ```
//!
//! # Magic Bytes Rationale
//!
//! The 4-byte signature `HEXZ` (ASCII: 0x48 0x45 0x58 0x5A) serves multiple purposes:
//!
//! ## Immediate Format Validation
//!
//! Readers can detect non-Hexz files with a single 4-byte read before
//! attempting any deserialization, preventing crashes or misinterpretation:
//!
//! ```rust,ignore
//! let mut magic = [0u8; 4];
//! file.read_exact(&mut magic)?;
//! if &magic != MAGIC_BYTES {
//!     return Err(Error::InvalidMagic { found: magic });
//! }
//! ```
//!
//! ## Corruption Detection
//!
//! If the magic bytes are corrupted, the file is likely unrecoverable and
//! should be rejected immediately instead of being parsed as garbage data.
//!
//! ## File Type Identification
//!
//! Operating systems and tools (e.g., `file(1)`) can identify `.hxz` files
//! by checking for the `HEXZ` signature at offset 0, even if the file
//! extension is wrong.
//!
//! ## Endianness Independence
//!
//! The ASCII signature avoids byte-order ambiguity. Unlike a numeric magic number
//! (e.g., `0x4845585A`), the byte sequence is identical on little-endian and
//! big-endian systems.
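//!
//! As a minimal, self-contained sketch of this point, the same four bytes
//! decode to different integers depending on the byte order a reader assumes,
//! which is why comparing raw bytes is the safer check. The local
//! `MAGIC_BYTES` constant mirrors the one defined in this module, and
//! `magic_as_u32s` is a hypothetical helper, not part of the crate:
//!
//! ```rust
//! /// Local copy of the signature so the example stands alone.
//! const MAGIC_BYTES: &[u8; 4] = b"HEXZ";
//!
//! /// Interprets the signature as a big-endian and as a little-endian `u32`.
//! fn magic_as_u32s() -> (u32, u32) {
//!     (
//!         u32::from_be_bytes(*MAGIC_BYTES),
//!         u32::from_le_bytes(*MAGIC_BYTES),
//!     )
//! }
//! ```
//!
//! Here `magic_as_u32s()` yields `(0x4845585A, 0x5A584548)`: one byte
//! sequence, two numeric readings.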
//!
//! # Header Size Calculation
//!
//! The header is fixed at 4096 bytes (4 KB) for several reasons:
//!
//! ## Alignment Benefits
//!
//! - **Page alignment**: Matches the most common OS page size (4096 bytes on x86 and many ARM systems)
//! - **Block alignment**: Compatible with 4 KB block storage devices
//! - **DMA efficiency**: Hardware I/O transfers work best on page-aligned boundaries
//!
//! ## Padding Strategy
//!
//! The actual serialized [`Header`] is typically 200-500 bytes. The
//! remaining space is zero-padded, providing:
//!
//! - **Forward compatibility**: New header fields can be added without changing
//!   the header size, preserving alignment properties
//! - **Metadata expansion**: Optional fields (encryption params, signatures) fit
//!   within the fixed 4096-byte envelope
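//!
//! The padding scheme above can be sketched on the write side as follows.
//! This is an illustration under stated assumptions rather than the crate's
//! actual writer: `build_header_region` is a hypothetical helper, the two
//! constants are local copies of the ones in this module, and `serialized`
//! stands for the already-encoded [`Header`] bytes.
//!
//! ```rust
//! // Local copies so the example stands alone.
//! const MAGIC_BYTES: &[u8; 4] = b"HEXZ";
//! const HEADER_SIZE: usize = 4096;
//!
//! /// Lays out magic bytes, serialized header, and zero padding.
//! /// Returns `None` if the serialized header does not fit in the region.
//! fn build_header_region(serialized: &[u8]) -> Option<Vec<u8>> {
//!     let end = MAGIC_BYTES.len().checked_add(serialized.len())?;
//!     if end > HEADER_SIZE {
//!         return None;
//!     }
//!     // Start from an all-zero region so the tail is already padded.
//!     let mut region = vec![0u8; HEADER_SIZE];
//!     region[..MAGIC_BYTES.len()].copy_from_slice(MAGIC_BYTES);
//!     region[MAGIC_BYTES.len()..end].copy_from_slice(serialized);
//!     Some(region)
//! }
//! ```
//!
//! Rejecting oversized input up front, instead of growing the region, keeps
//! the first data block at offset 4096 regardless of how large the header is.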
//!
//! ## Read Performance
//!
//! Fixed-size headers enable predictable I/O patterns:
//!
//! ```rust,ignore
//! // Single aligned read for header
//! let mut header_buf = vec![0u8; HEADER_SIZE];
//! file.read_exact(&mut header_buf)?;
//! // Skip the 4 magic bytes; the trailing zero padding is ignored
//! let header: Header = bincode::deserialize(&header_buf[4..])?;
//! ```
//!
//! # Backward Compatibility Guarantee
//!
//! Two of the constants in this module are **immutable** across all Hexz versions:
//!
//! - `MAGIC_BYTES` must always be `b"HEXZ"` (changing this creates a new file format)
//! - `HEADER_SIZE` must always be `4096` (changing this breaks offset calculations)
//!
//! The [`FORMAT_VERSION`] constant, however, **can and will change** to indicate
//! format evolution. Version checking logic is in [`crate::format::version`].
//!
//! # Security Considerations
//!
//! ## Magic Byte Spoofing
//!
//! An attacker could create a malicious file with valid magic bytes but corrupted
//! or adversarial header data. Defenses include:
//!
//! - **Version checking**: Reject unknown versions (see [`crate::format::version::check_version`])
//! - **Checksum verification**: Validate block checksums before decompression
//! - **Bounds checking**: Ensure all offsets/lengths are within file size
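//!
//! The bounds-checking defense can be sketched as a small predicate. This is
//! a hypothetical helper (`region_in_bounds` is not part of the crate). It
//! assumes, per the layout above, that only the header lives below offset
//! 4096, and it uses a local `u64` copy of the header size for file-offset
//! arithmetic:
//!
//! ```rust
//! /// Local `u64` copy of the header size, for offset arithmetic.
//! const HEADER_SIZE: u64 = 4096;
//!
//! /// Returns `true` if `[offset, offset + len)` starts at or after the end
//! /// of the header region and lies entirely inside a file of `file_len` bytes.
//! fn region_in_bounds(offset: u64, len: u64, file_len: u64) -> bool {
//!     if offset < HEADER_SIZE {
//!         return false;
//!     }
//!     // `checked_add` rejects attacker-chosen values that would wrap around.
//!     match offset.checked_add(len) {
//!         Some(end) => end <= file_len,
//!         None => false,
//!     }
//! }
//! ```
//!
//! A reader could apply the same check to `header.index_offset` and to every
//! block offset taken from the index before seeking.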
//!
//! ## Header Parsing Robustness
//!
//! The bincode deserializer must handle truncated, oversized, or malformed headers
//! gracefully. Always deserialize with size limits:
//!
//! ```rust,ignore
//! use bincode::Options;
//! // bincode 1.x `Options` API (assumed, to match `bincode::deserialize` used elsewhere)
//! let opts = bincode::DefaultOptions::new().with_fixint_encoding().allow_trailing_bytes();
//! let header: Header = opts.with_limit(HEADER_SIZE as u64).deserialize(&header_buf[4..])?;
//! ```
//!
//! # File Type Registration
//!
//! For integration with system file-type databases:
//!
//! ## MIME Type (Proposed)
//!
//! ```text
//! application/x-hexz-snapshot
//! ```
//!
//! ## Magic Database Entry (`/etc/magic`)
//!
//! ```text
//! 0       string  HEXZ            Hexz snapshot file
//! >4      ulelong x               \b, version %u
//! ```
//!
//! ## File Extension
//!
//! The conventional extension is `.hxz`, though the format does not require it.
//!
//! # Examples
//!
//! ## Validating Magic Bytes
//!
//! ```
//! use hexz_core::format::magic::MAGIC_BYTES;
//!
//! let file_header = b"HEXZ..."; // First bytes of a file
//! assert_eq!(&file_header[..4], MAGIC_BYTES);
//! ```
//!
//! ## Header Offset Calculation
//!
//! ```
//! use hexz_core::format::magic::HEADER_SIZE;
//!
//! // First compressed block starts immediately after header
//! let first_block_offset = HEADER_SIZE;
//! assert_eq!(first_block_offset, 4096);
//! ```
//!
//! ## File Format Detection
//!
//! ```rust,ignore
//! use std::fs::File;
//! use std::io::Read;
//! use std::path::Path;
//! use hexz_core::format::magic::MAGIC_BYTES;
//!
//! // Note: a file shorter than 4 bytes yields `Err(UnexpectedEof)`, not `Ok(false)`.
//! fn is_hexz_file(path: &Path) -> std::io::Result<bool> {
//!     let mut file = File::open(path)?;
//!     let mut magic = [0u8; 4];
//!     file.read_exact(&mut magic)?;
//!     Ok(&magic == MAGIC_BYTES)
//! }
//! ```
//!
//! ## Reader Implementation
//!
//! ```rust,ignore
//! use std::fs::File;
//! use std::io::Read;
//! use hexz_core::format::magic::{MAGIC_BYTES, HEADER_SIZE, FORMAT_VERSION};
//! use hexz_core::format::header::Header;
//! use hexz_core::error::Error;
//!
//! fn read_header(file: &mut File) -> Result<Header, Error> {
//!     // Read full header region (magic + serialized header)
//!     let mut buf = vec![0u8; HEADER_SIZE];
//!     file.read_exact(&mut buf)?;
//!
//!     // Validate magic bytes
//!     if &buf[0..4] != MAGIC_BYTES {
//!         return Err(Error::InvalidMagic {
//!             found: buf[0..4].try_into().unwrap(),
//!         });
//!     }
//!
//!     // Deserialize header (bytes 4..4096)
//!     let header: Header = bincode::deserialize(&buf[4..])?;
//!
//!     // Validate version (simplified: accepts only the version this build writes)
//!     if header.version != FORMAT_VERSION {
//!         return Err(Error::UnsupportedVersion {
//!             found: header.version,
//!             supported: FORMAT_VERSION,
//!         });
//!     }
//!
//!     Ok(header)
//! }
//! ```
//!
//! [`Header`]: crate::format::header::Header

/// File signature identifying Hexz snapshot files.
///
/// This 4-byte constant (`HEXZ` in ASCII, 0x48 0x45 0x58 0x5A in hex) appears
/// at the beginning of every valid `.hxz` file. Readers must validate this
/// signature before attempting to parse the rest of the header.
///
/// # Rationale
///
/// - **ASCII-readable**: Easy to identify in hex dumps (`48 45 58 5A` = "HEXZ")
/// - **Low collision probability**: Unlikely to appear at offset 0 in non-Hexz files
/// - **Endian-neutral**: Byte sequence is identical regardless of CPU byte order
/// - **Mnemonic**: "HEXZ" identifies the project and format
///
/// # Validation Example
///
/// ```
/// use hexz_core::format::magic::MAGIC_BYTES;
///
/// let file_start = b"HEXZ\x01\x00\x00\x00..."; // First bytes of a file
/// if &file_start[..4] == MAGIC_BYTES {
///     println!("Valid Hexz file");
/// } else {
///     eprintln!("Not a Hexz file");
/// }
/// ```
///
/// # Error Handling
///
/// If magic bytes do not match, the file is either:
/// - Not a Hexz snapshot (e.g., wrong file type)
/// - Corrupted (e.g., truncated, damaged sectors)
/// - Generated by incompatible software (e.g., future format with different signature)
///
/// In all cases, reject the file immediately without attempting deserialization.
pub const MAGIC_BYTES: &[u8; 4] = b"HEXZ";

/// Format version number for snapshots written by this build.
///
/// This constant is written to the `version` field of new snapshot headers and
/// defines the on-disk format structure. When the format changes incompatibly
/// (e.g., new index layout, different serialization), this value is incremented.
///
/// # Current Version
///
/// **Version 1**: Initial Hexz format with:
/// - Two-level index (master index + paginated block metadata)
/// - bincode serialization for headers and indices
/// - LZ4 and Zstd compression support
/// - Optional AES-256-GCM encryption
/// - Thin provisioning via parent snapshot references
/// - Dual streams (disk + memory)
///
/// # Version History
///
/// | Version | Hexz Release | Key Changes |
/// |---------|--------------|-------------|
/// | 1       | 0.1.0        | Initial format |
///
/// # Version Checking
///
/// This constant defines what version is **written**. For logic on what versions
/// are **readable**, see [`crate::format::version::MIN_SUPPORTED_VERSION`] and
/// [`crate::format::version::MAX_SUPPORTED_VERSION`].
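///
/// The written-versus-readable distinction can be sketched as a range check.
/// The values below are assumptions for illustration only (the real bounds
/// live in [`crate::format::version`]), and `is_readable` is a hypothetical
/// helper, not the crate's API:
///
/// ```rust
/// // Assumed values: a build that only knows format version 1.
/// const MIN_SUPPORTED_VERSION: u32 = 1;
/// const MAX_SUPPORTED_VERSION: u32 = 1;
///
/// /// Returns `true` if a snapshot with this header version can be read.
/// fn is_readable(version: u32) -> bool {
///     (MIN_SUPPORTED_VERSION..=MAX_SUPPORTED_VERSION).contains(&version)
/// }
/// ```
///
/// With a range like this, a later build could raise the maximum to accept
/// newer snapshots while still reading version 1 files.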
///
/// # Examples
///
/// ```
/// use hexz_core::format::magic::FORMAT_VERSION;
/// use hexz_core::format::header::Header;
///
/// let header = Header::default();
/// assert_eq!(header.version, FORMAT_VERSION);
/// assert_eq!(FORMAT_VERSION, 1);
/// ```
pub const FORMAT_VERSION: u32 = 1;

/// Size of the fixed header region at the start of snapshot files.
///
/// Every Hexz snapshot begins with a 4096-byte (4 KB) header containing:
/// - Magic bytes (4 bytes)
/// - Serialized [`Header`] structure (variable, typically 200-500 bytes)
/// - Zero-padding to fill remaining space
///
/// This size is **immutable** across all format versions. Changing it would
/// break offset calculations for existing readers.
///
/// # Alignment Properties
///
/// The 4096-byte size provides:
///
/// - **Page alignment**: Matches the 4 KB OS page size used on x86 and most ARM configurations
/// - **Block alignment**: Compatible with 4 KB physical sector size (Advanced Format drives)
/// - **Cache efficiency**: Entire header fits in a single 4 KB page, so one TLB entry covers it
/// - **DMA optimization**: Hardware I/O controllers transfer page-aligned data efficiently
///
/// # Layout Within Header Region
///
/// ```text
/// Offset | Size  | Contents
/// -------|-------|--------------------------------------------------
/// 0      | 4     | Magic bytes (b"HEXZ")
/// 4      | ~400  | Serialized Header (bincode format)
/// ~404   | ~3692 | Zero-padding (reserved for future extensions)
/// 4096   | ...   | Start of compressed block data
/// ```
///
/// # Forward Compatibility
///
/// The large fixed size allows future format versions to add header fields
/// without changing the header size or breaking alignment. New fields are
/// serialized within the existing 4096-byte envelope.
///
/// # Performance Characteristics
///
/// Reading the header requires a single I/O operation:
///
/// - **SSD latency**: ~50-100 μs (single 4 KB read)
/// - **HDD latency**: ~5-10 ms (seek + single 4 KB read)
/// - **Network latency**: Depends on RTT, but a single request avoids extra round-trips
///
/// # Size Rationale
///
/// Why not 512 bytes (classic sector size)?
/// - Serialized headers with encryption/compression metadata can exceed 512 bytes
/// - Future extensions would require a format version bump
///
/// Why not 8192 bytes (2 pages)?
/// - Wastes space for typical headers (~400 bytes serialized)
/// - Doubles the bytes transferred per header read, which costs most on
///   bandwidth-limited backends (e.g., network storage)
///
/// 4096 bytes balances space efficiency and forward compatibility.
///
/// # Examples
///
/// ## Reading the Header
///
/// ```rust,ignore
/// use std::fs::File;
/// use std::io::Read;
/// use hexz_core::format::magic::HEADER_SIZE;
///
/// let mut file = File::open("snapshot.hxz")?;
/// let mut header_buf = vec![0u8; HEADER_SIZE];
/// file.read_exact(&mut header_buf)?;
///
/// // header_buf now contains magic bytes + serialized header + padding
/// ```
///
/// ## Calculating Block Offsets
///
/// ```
/// use hexz_core::format::magic::HEADER_SIZE;
///
/// // First block is written immediately after header
/// let first_block_offset = HEADER_SIZE;
/// assert_eq!(first_block_offset, 4096);
///
/// // Second block offset (assuming first block is 2048 bytes compressed)
/// let second_block_offset = HEADER_SIZE + 2048;
/// assert_eq!(second_block_offset, 6144);
/// ```
///
/// [`Header`]: crate::format::header::Header
pub const HEADER_SIZE: usize = 4096;