§Binary File Format Value Object
This module defines the binary file format specification for the Adaptive Pipeline system. It provides a standardized format for storing processed files with complete recovery metadata and integrity verification.
§Overview
The binary file format provides:
- File Recovery: Complete metadata for recovering original files
- Integrity Verification: Checksums and validation for processed files
- Processing History: Complete record of processing steps applied
- Version Management: Format versioning for backward compatibility
- Compression Support: Efficient storage of processed data
§Architecture
The format follows a structured binary layout:
- Magic Bytes: File format identification
- Version Header: Format version information
- Metadata Section: Processing metadata and recovery information
- Data Section: Actual processed file data
- Integrity Section: Checksums and validation data
§Key Features
§File Recovery
- Original Filename: Preserve original file names
- File Size: Track original and processed file sizes
- Processing Steps: Record all processing operations applied
- Restoration Metadata: Information needed for complete recovery
§Integrity Verification
- Checksums: Multiple checksum algorithms for verification
- Validation: Comprehensive validation of file integrity
- Error Detection: Detect corruption and processing errors
- Recovery Verification: Verify recovered files match originals
§Format Versioning
- Version Management: Support for multiple format versions
- Backward Compatibility: Maintain compatibility with older versions
- Migration Support: Automatic migration between format versions
- Feature Evolution: Support for new features in future versions
§Usage Examples
§Creating a Binary File
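A std-only sketch of assembling the container layout described in the format specification below. The magic bytes match the crate's MAGIC_BYTES constant; the version value, byte order (little-endian), and helper names are assumptions, not the crate's actual API:

```rust
// Illustrative sketch of the .adapipe on-disk layout using only std.
// The version value and little-endian byte order are assumptions.
const MAGIC_BYTES: &[u8; 8] = b"ADAPIPE\0";
const FORMAT_VERSION: u16 = 1; // assumed; see CURRENT_FORMAT_VERSION

/// Assemble magic bytes, version, header length, JSON header, and data.
fn write_container(json_header: &str, data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC_BYTES);                               // 8 bytes
    out.extend_from_slice(&FORMAT_VERSION.to_le_bytes());             // 2 bytes
    out.extend_from_slice(&(json_header.len() as u32).to_le_bytes()); // 4 bytes
    out.extend_from_slice(json_header.as_bytes());                    // variable
    out.extend_from_slice(data);                                      // variable
    out
}
```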
§Reading and Validating a Binary File
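A minimal sketch of the validation side, assuming the same fixed-size prefix (8-byte magic, 2-byte version, 4-byte header length) in little-endian order; `validate_header` is an illustrative helper, not a crate function:

```rust
// Illustrative validation of the fixed-size prefix: magic bytes (8),
// format version (2), header length (4). Byte order assumed little-endian.
const MAGIC_BYTES: &[u8; 8] = b"ADAPIPE\0";

/// Check the magic bytes and return the declared format version.
fn validate_header(bytes: &[u8]) -> Result<u16, &'static str> {
    if bytes.len() < 14 {
        return Err("file too short for an .adapipe header");
    }
    if &bytes[..8] != &MAGIC_BYTES[..] {
        return Err("invalid magic bytes: not an .adapipe file");
    }
    Ok(u16::from_le_bytes([bytes[8], bytes[9]]))
}
```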
§File Recovery Process
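Recovery undoes the recorded steps in the reverse of the order they were applied, e.g. compress-then-encrypt is recovered by decrypt-then-decompress. A sketch with a toy enum standing in for the crate's `ProcessingStep`:

```rust
// Toy stand-in for the crate's ProcessingStep, for ordering only.
#[derive(Debug, PartialEq)]
enum Step {
    Compress,
    Encrypt,
}

/// Return the steps in the order recovery must undo them.
fn recovery_order(applied: &[Step]) -> Vec<&Step> {
    applied.iter().rev().collect()
}
```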
§File Format Specification
§Binary Layout
The .adapipe file format uses the following binary layout:
§Header Components
- Magic Bytes: 8 bytes - "ADAPIPE\0" (0x4144415049504500)
- Format Version: 2 bytes - Current version number
- Header Length: 4 bytes - Length of JSON header in bytes
- JSON Header: Variable length - Metadata and processing information
- Processed Data: Variable length - Actual processed file content
§JSON Header Structure
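As a rough illustration, a header of this general shape could carry the recovery metadata; every field name here is hypothetical, and the actual schema is defined by the FileHeader struct listed below:

```json
{
  "format_version": 1,
  "original_filename": "report.pdf",
  "original_size": 1048576,
  "processing_steps": [
    { "step_type": "compression", "algorithm": "brotli", "parameters": { "level": "6" } },
    { "step_type": "encryption", "algorithm": "aes-256-gcm", "parameters": {} }
  ],
  "checksum": { "algorithm": "sha256", "value": "…" }
}
```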
§Processing Steps
§Supported Operations
- Compression: Various compression algorithms (brotli, gzip, lz4)
- Encryption: Encryption algorithms (AES-256-GCM, ChaCha20-Poly1305)
- Validation: Checksum and integrity validation
- Transformation: Custom data transformations
§Step Parameters
Each processing step can include parameters:
- Compression Level: Compression quality/speed tradeoff
- Encryption Keys: Key derivation and management information
- Algorithm Options: Algorithm-specific configuration
- Custom Parameters: Application-specific parameters
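A sketch of how such a step record might carry its parameters as key/value strings; the field names are assumed, not those of the crate's `ProcessingStep`:

```rust
use std::collections::HashMap;

// Illustrative step record carrying algorithm parameters as key/value
// strings; field names are assumptions, not the crate's ProcessingStep.
struct ProcessingStep {
    step_type: String,
    algorithm: String,
    parameters: HashMap<String, String>,
}

/// Build a compression step with a level parameter (quality/speed tradeoff).
fn compression_step(level: u8) -> ProcessingStep {
    let mut parameters = HashMap::new();
    parameters.insert("level".to_string(), level.to_string());
    ProcessingStep {
        step_type: "compression".to_string(),
        algorithm: "brotli".to_string(),
        parameters,
    }
}
```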
§Integrity Verification
§Checksum Algorithms
- SHA-256: Primary checksum algorithm
- Blake3: High-performance alternative
- CRC32: Fast integrity checking
- Custom: Support for custom checksum algorithms
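Of these, CRC32 is small enough to sketch with the standard library alone (IEEE polynomial in reflected form); SHA-256 and Blake3 would come from dedicated crates:

```rust
/// Bitwise CRC-32 (IEEE, reflected), suitable for fast integrity checks.
/// Production code would use a table-driven or hardware-accelerated crate.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all-ones when the low bit is set, else zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}
```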
§Verification Process
- Format Validation: Verify magic bytes and version
- Header Validation: Validate JSON header structure
- Data Integrity: Verify processed data checksum
- Recovery Verification: Verify recovered data matches original
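The checks above can be sketched as a fail-fast sequence (header validation and recovery verification elided; offsets simplified, checksum pluggable, so this is illustrative rather than the crate's routine):

```rust
// Fail-fast verification sketch: format first, then data integrity.
const MAGIC_BYTES: &[u8; 8] = b"ADAPIPE\0";

fn verify_container(
    bytes: &[u8],
    expected: u32,
    checksum: fn(&[u8]) -> u32,
) -> Result<(), &'static str> {
    // 1. Format validation: magic bytes present
    if bytes.len() < 8 || &bytes[..8] != &MAGIC_BYTES[..] {
        return Err("invalid magic bytes");
    }
    // 2. Data integrity: checksum over the remaining content
    if checksum(&bytes[8..]) != expected {
        return Err("checksum mismatch");
    }
    Ok(())
}
```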
§Error Handling
§Format Errors
- Invalid Magic Bytes: File is not in .adapipe format
- Unsupported Version: Format version not supported
- Corrupt Header: JSON header is malformed or corrupt
- Invalid Data: Processed data is corrupt or invalid
§Recovery Errors
- Missing Steps: Required processing steps are missing
- Invalid Parameters: Processing parameters are invalid
- Checksum Mismatch: Data integrity verification failed
- Recovery Failure: Unable to recover original data
§Performance Considerations
§File Size Optimization
- Efficient Encoding: Compact binary encoding for metadata
- Compression: Built-in compression for processed data
- Minimal Overhead: Minimal format overhead
§Processing Performance
- Streaming: Support for streaming processing of large files
- Parallel Processing: Parallel processing of file chunks
- Memory Efficiency: Efficient memory usage during processing
§Security Considerations
§Data Protection
- Encryption: Strong encryption for sensitive data
- Key Management: Secure key derivation and management
- Integrity: Comprehensive integrity verification
§Attack Prevention
- Format Validation: Prevent malformed file attacks
- Size Limits: Prevent resource exhaustion attacks
- Checksum Verification: Prevent data tampering
§Version Management
§Format Versioning
- Semantic Versioning: Use semantic versioning for format versions
- Backward Compatibility: Maintain compatibility with older versions
- Migration: Automatic migration between format versions
§Feature Evolution
- New Algorithms: Support for new compression/encryption algorithms
- Enhanced Metadata: Extended metadata capabilities
- Performance Improvements: Optimizations in new versions
§Integration
The binary file format integrates with:
- File Processor: Used by file processor for creating processed files
- Storage Systems: Store processed files in various storage systems
- Recovery Systems: Recover original files from processed files
- Validation Systems: Validate file integrity and format compliance
§Future Enhancements
Planned enhancements include:
- Streaming Support: Enhanced streaming capabilities
- Compression Improvements: Better compression algorithms
- Metadata Extensions: Extended metadata capabilities
- Performance Optimizations: Further performance improvements
Structs§
- ChunkFormat - Format for individual chunks in the file
- FileHeader - File header for Adaptive Pipeline processed files (.adapipe format)
- ProcessingStep - A single processing step that was applied to the file. Steps are stored in the order they were applied and must be undone in reverse order during recovery
Enums§
- ProcessingStepType - Types of processing steps
Constants§
- CURRENT_FORMAT_VERSION - Current file format version
- MAGIC_BYTES - Magic bytes to identify our file format: "ADAPIPE\0"