Module binary_file_format


§Binary File Format Value Object

This module defines the binary file format specification for the Adaptive Pipeline system. It provides a standardized format for storing processed files with complete recovery metadata and integrity verification.

§Overview

The binary file format provides:

  • File Recovery: Complete metadata for recovering original files
  • Integrity Verification: Checksums and validation for processed files
  • Processing History: Complete record of processing steps applied
  • Version Management: Format versioning for backward compatibility
  • Compression Support: Efficient storage of processed data

§Architecture

The format follows a structured binary layout:

  • Magic Bytes: File format identification
  • Version Header: Format version information
  • Metadata Section: Processing metadata and recovery information
  • Data Section: Actual processed file data
  • Integrity Section: Checksums and validation data

§Key Features

§File Recovery

  • Original Filename: Preserve original file names
  • File Size: Track original and processed file sizes
  • Processing Steps: Record all processing operations applied
  • Restoration Metadata: Information needed for complete recovery

§Integrity Verification

  • Checksums: Multiple checksum algorithms for verification
  • Validation: Comprehensive validation of file integrity
  • Error Detection: Detect corruption and processing errors
  • Recovery Verification: Verify recovered files match originals

§Format Versioning

  • Version Management: Support for multiple format versions
  • Backward Compatibility: Maintain compatibility with older versions
  • Migration Support: Automatic migration between format versions
  • Feature Evolution: Support for new features in future versions

§Usage Examples

§Creating a Binary File

§Reading and Validating a Binary File

§File Recovery Process

§File Format Specification

§Binary Layout

The .adapipe file format uses a binary layout made up of the components described below.

§Header Components

  • Magic Bytes: 8 bytes - “ADAPIPE\0” (0x4144415049504500)
  • Format Version: 2 bytes - Current version number
  • Header Length: 4 bytes - Length of JSON header in bytes
  • JSON Header: Variable length - Metadata and processing information
  • Processed Data: Variable length - Actual processed file content
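As an illustration of the components above, the following is a minimal sketch of writing and splitting the fixed-size fields. It assumes the fields appear in the order listed and that integers are little-endian; neither is stated in this documentation. The `encode` and `decode` helpers are hypothetical and are not part of this module's API.

```rust
/// Magic bytes from the module docs: "ADAPIPE\0".
const MAGIC_BYTES: [u8; 8] = *b"ADAPIPE\0";

/// Fixed-size fields: 8 (magic) + 2 (version) + 4 (header length).
const PREFIX_LEN: usize = 14;

/// Hypothetical helper: concatenate magic, version, header length,
/// JSON header, and processed data.
/// Assumption: little-endian integers, fields in the listed order.
fn encode(version: u16, json_header: &[u8], data: &[u8]) -> Result<Vec<u8>, String> {
    // The header length field is only 4 bytes wide.
    if json_header.len() > u32::MAX as usize {
        return Err("JSON header too large for a 4-byte length field".to_string());
    }
    let mut out = Vec::with_capacity(PREFIX_LEN + json_header.len() + data.len());
    out.extend_from_slice(&MAGIC_BYTES);
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(&(json_header.len() as u32).to_le_bytes());
    out.extend_from_slice(json_header);
    out.extend_from_slice(data);
    Ok(out)
}

/// Hypothetical helper: split a buffer into (version, JSON header, data).
fn decode(buf: &[u8]) -> Result<(u16, &[u8], &[u8]), String> {
    if buf.len() < PREFIX_LEN {
        return Err("buffer shorter than the fixed-size fields".to_string());
    }
    if !buf.starts_with(&MAGIC_BYTES) {
        return Err("invalid magic bytes".to_string());
    }
    let version = u16::from_le_bytes([buf[8], buf[9]]);
    let header_len = u32::from_le_bytes([buf[10], buf[11], buf[12], buf[13]]) as usize;
    // Guard against a declared length that runs past the end of the buffer.
    let header_end = PREFIX_LEN
        .checked_add(header_len)
        .filter(|end| *end <= buf.len())
        .ok_or_else(|| "declared header length exceeds buffer".to_string())?;
    Ok((version, &buf[PREFIX_LEN..header_end], &buf[header_end..]))
}
```

The JSON header is treated as opaque bytes here; parsing it would require a JSON library, which this sketch deliberately leaves out.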

§JSON Header Structure

§Processing Steps

§Supported Operations

  • Compression: Various compression algorithms (brotli, gzip, lz4)
  • Encryption: Encryption algorithms (AES-256-GCM, ChaCha20-Poly1305)
  • Validation: Checksum and integrity validation
  • Transformation: Custom data transformations

§Step Parameters

Each processing step can include parameters:

  • Compression Level: Compression quality/speed tradeoff
  • Encryption Keys: Key derivation and management information
  • Algorithm Options: Algorithm-specific configuration
  • Custom Parameters: Application-specific parameters
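A rough sketch of how a step and its parameters might be modeled is shown here. The `ProcessingStep` summary in this module notes that steps are stored in the order they were applied and undone in reverse order, which the sketch reflects. The `Step` and `StepKind` types, the `restoration_order` function, and the parameter names are hypothetical stand-ins, not the module's actual `ProcessingStep` and `ProcessingStepType` definitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the kinds of operations listed above.
#[derive(Debug, Clone, PartialEq)]
enum StepKind {
    Compression,
    Encryption,
    Validation,
    Transformation,
}

/// Hypothetical stand-in for one recorded processing step.
#[derive(Debug, Clone, PartialEq)]
struct Step {
    kind: StepKind,
    algorithm: String,
    /// Free-form parameters, e.g. a compression level.
    parameters: BTreeMap<String, String>,
}

impl Step {
    fn new(kind: StepKind, algorithm: &str, params: &[(&str, &str)]) -> Self {
        Step {
            kind,
            algorithm: algorithm.to_string(),
            parameters: params
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        }
    }
}

/// Steps are stored in the order applied; restoring the original
/// file walks them from last to first.
fn restoration_order(applied: &[Step]) -> Vec<&Step> {
    applied.iter().rev().collect()
}
```

For a file that was compressed and then encrypted, `restoration_order` yields the encryption step first, so the data is decrypted before it is decompressed.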

§Integrity Verification

§Checksum Algorithms

  • SHA-256: Primary checksum algorithm
  • Blake3: High-performance alternative
  • CRC32: Fast integrity checking
  • Custom: Support for custom checksum algorithms

§Verification Process

  1. Format Validation: Verify magic bytes and version
  2. Header Validation: Validate JSON header structure
  3. Data Integrity: Verify processed data checksum
  4. Recovery Verification: Verify recovered data matches original
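The four steps above can be sketched as a sequence of checks that stops at the first failure. To stay within the standard library, this sketch uses CRC32 (one of the listed algorithms) rather than SHA-256, and replaces real JSON parsing with a crude shape check. The supported version number `1`, the `verify` and `recovered_matches` functions, and the error strings are assumptions made for illustration.

```rust
const MAGIC_BYTES: [u8; 8] = *b"ADAPIPE\0";
/// Assumed value; the real CURRENT_FORMAT_VERSION is not given here.
const SUPPORTED_VERSION: u16 = 1;

/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Runs steps 1 to 3 in order and reports the first step that fails.
fn verify(
    magic: &[u8],
    version: u16,
    json_header: &[u8],
    data: &[u8],
    expected_crc: u32,
) -> Result<(), &'static str> {
    // 1. Format validation: magic bytes and version.
    if magic != &MAGIC_BYTES[..] || version != SUPPORTED_VERSION {
        return Err("format validation");
    }
    // 2. Header validation: stand-in check only (UTF-8, object braces).
    //    A real implementation would parse the JSON.
    let text = std::str::from_utf8(json_header).map_err(|_| "header validation")?;
    let text = text.trim();
    if !(text.starts_with('{') && text.ends_with('}')) {
        return Err("header validation");
    }
    // 3. Data integrity: checksum of the processed data.
    if crc32(data) != expected_crc {
        return Err("data integrity");
    }
    Ok(())
}

/// 4. Recovery verification: the restored bytes must hash to the
///    checksum recorded for the original file.
fn recovered_matches(recovered: &[u8], original_crc: u32) -> bool {
    crc32(recovered) == original_crc
}
```

Ordering the checks from cheapest to most expensive means a file with the wrong magic bytes is rejected before any checksum is computed over its data.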

§Error Handling

§Format Errors

  • Invalid Magic Bytes: File is not in .adapipe format
  • Unsupported Version: Format version not supported
  • Corrupt Header: JSON header is malformed or corrupt
  • Invalid Data: Processed data is corrupt or invalid

§Recovery Errors

  • Missing Steps: Required processing steps are missing
  • Invalid Parameters: Processing parameters are invalid
  • Checksum Mismatch: Data integrity verification failed
  • Recovery Failure: Unable to recover original data
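One way the error categories above could be represented is a single enum with one variant per failure mode. This is a sketch only: the `AdapipeError` type, its variant payloads, and its messages are hypothetical and do not describe the crate's actual error types.

```rust
use std::fmt;

/// Hypothetical error type mirroring the failure modes listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
enum AdapipeError {
    // Format errors
    InvalidMagicBytes,
    UnsupportedVersion(u16),
    CorruptHeader(String),
    InvalidData,
    // Recovery errors
    MissingSteps,
    InvalidParameters(String),
    ChecksumMismatch { expected: String, actual: String },
    RecoveryFailure(String),
}

impl AdapipeError {
    /// True for errors detected while reading the file itself,
    /// false for errors raised while restoring the original data.
    fn is_format_error(&self) -> bool {
        matches!(
            self,
            Self::InvalidMagicBytes
                | Self::UnsupportedVersion(_)
                | Self::CorruptHeader(_)
                | Self::InvalidData
        )
    }
}

impl fmt::Display for AdapipeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidMagicBytes => write!(f, "file is not in .adapipe format"),
            Self::UnsupportedVersion(v) => write!(f, "format version {} is not supported", v),
            Self::CorruptHeader(why) => write!(f, "corrupt JSON header: {}", why),
            Self::InvalidData => write!(f, "processed data is corrupt or invalid"),
            Self::MissingSteps => write!(f, "required processing steps are missing"),
            Self::InvalidParameters(why) => write!(f, "invalid processing parameters: {}", why),
            Self::ChecksumMismatch { expected, actual } => {
                write!(f, "checksum mismatch: expected {}, got {}", expected, actual)
            }
            Self::RecoveryFailure(why) => write!(f, "unable to recover original data: {}", why),
        }
    }
}

impl std::error::Error for AdapipeError {}
```

Keeping format and recovery errors distinguishable lets a caller tell "this is not an .adapipe file" apart from "this file is valid but cannot be restored".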

§Performance Considerations

§File Size Optimization

  • Efficient Encoding: Compact binary encoding for metadata
  • Compression: Built-in compression for processed data
  • Minimal Overhead: Minimal format overhead

§Processing Performance

  • Streaming: Support for streaming processing of large files
  • Parallel Processing: Parallel processing of file chunks
  • Memory Efficiency: Efficient memory usage during processing

§Security Considerations

§Data Protection

  • Encryption: Strong encryption for sensitive data
  • Key Management: Secure key derivation and management
  • Integrity: Comprehensive integrity verification

§Attack Prevention

  • Format Validation: Prevent malformed file attacks
  • Size Limits: Prevent resource exhaustion attacks
  • Checksum Verification: Prevent data tampering
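To illustrate the size-limit point, the sketch below rejects an oversized declared header length before allocating any memory for it. The 1 MiB cap, the little-endian length field, and the `read_bounded_header` function are assumptions chosen for the example, not values taken from this module.

```rust
use std::io::{self, Read};

/// Assumed upper bound on the JSON header size (1 MiB).
const MAX_HEADER_LEN: u32 = 1024 * 1024;

/// Hypothetical helper: read a 4-byte length followed by that many
/// header bytes, refusing lengths above MAX_HEADER_LEN.
/// Assumption: the length field is little-endian.
fn read_bounded_header<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 4];
    reader.read_exact(&mut len_bytes)?;
    let len = u32::from_le_bytes(len_bytes);

    // Check the untrusted length before allocating the buffer.
    if len > MAX_HEADER_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "declared header length exceeds limit",
        ));
    }

    let mut header = vec![0u8; len as usize];
    // Fails with UnexpectedEof if the input is shorter than declared.
    reader.read_exact(&mut header)?;
    Ok(header)
}
```

Without the cap, a four-byte field could make a reader attempt an allocation of up to 4 GiB for a file that is only a few bytes long.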

§Version Management

§Format Versioning

  • Semantic Versioning: Use semantic versioning for format versions
  • Backward Compatibility: Maintain compatibility with older versions
  • Migration: Automatic migration between format versions

§Feature Evolution

  • New Algorithms: Support for new compression/encryption algorithms
  • Enhanced Metadata: Extended metadata capabilities
  • Performance Improvements: Optimizations in new versions

§Integration

The binary file format integrates with:

  • File Processor: Used by file processor for creating processed files
  • Storage Systems: Store processed files in various storage systems
  • Recovery Systems: Recover original files from processed files
  • Validation Systems: Validate file integrity and format compliance

§Future Enhancements

Planned enhancements include:

  • Streaming Support: Enhanced streaming capabilities
  • Compression Improvements: Better compression algorithms
  • Metadata Extensions: Extended metadata capabilities
  • Performance Optimizations: Further performance improvements

Structs§

ChunkFormat
Format for individual chunks in the file
FileHeader
File header for Adaptive Pipeline processed files (.adapipe format)
ProcessingStep
A single processing step that was applied to the file. Steps are stored in the order they were applied and must be undone in reverse order.

Enums§

ProcessingStepType
Types of processing steps

Constants§

CURRENT_FORMAT_VERSION
Current file format version
MAGIC_BYTES
Magic bytes to identify our file format: “ADAPIPE\0”