Archive processor for sanitizing files inside .zip, .tar, and .tar.gz archives.
§Architecture
┌───────────────────────┐
│ Archive (zip/tar/gz)  │
└────────┬──────────────┘
         │ for each entry
         ▼
┌─────────────────────────────────────────────┐
│ 1. Match entry filename → FileTypeProfile   │
│ 2. Try ProcessorRegistry (structured)       │
│ 3. Fallback: StreamScanner (streaming)      │
└────────┬────────────────────────────────────┘
         │ sanitized bytes
         ▼
┌───────────────────────┐
│ Rebuilt archive       │
│ (same format, meta    │
│  preserved)           │
└───────────────────────┘
§Memory Efficiency
Archives are processed entry-by-entry. Each entry is piped
through either a structured processor (which must buffer the full
entry) or the StreamScanner
(which processes in configurable chunks). This means the maximum
memory footprint is proportional to the largest single entry
that uses a structured processor. Files without a profile match
are streamed through the scanner without buffering the whole entry.
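The routing between the two paths can be pictured with a minimal sketch. Everything here is hypothetical: `Route`, `match_profile`, and `route_entry` are illustrative names, and the extension-based rule is an assumption, since the real FileTypeProfile matching and ProcessorRegistry lookup are not shown in this documentation. The sketch also omits the case where a structured processor is tried and then falls back to the scanner.

```rust
// Hypothetical stand-ins; not the crate's real FileTypeProfile,
// ProcessorRegistry, or StreamScanner APIs.
#[derive(Debug, PartialEq)]
enum Route {
    /// Entry is buffered in full and handed to a structured processor.
    Structured(&'static str),
    /// Entry is streamed through the scanner in chunks.
    Streaming,
}

/// Step 1: map an entry filename to a profile name by extension (assumed rule).
fn match_profile(entry_name: &str) -> Option<&'static str> {
    let ext = entry_name.rsplit_once('.')?.1.to_ascii_lowercase();
    match ext.as_str() {
        "json" => Some("json"),
        "yaml" | "yml" => Some("yaml"),
        _ => None,
    }
}

/// Steps 2 and 3: use a structured processor when a profile matched,
/// otherwise fall back to the streaming scanner.
fn route_entry(entry_name: &str) -> Route {
    match match_profile(entry_name) {
        Some(profile) => Route::Structured(profile),
        None => Route::Streaming,
    }
}
```

Under these assumptions, `route_entry("conf/settings.JSON")` takes the buffered path, while `route_entry("logs/app.log")` and an extensionless name such as `README` are streamed.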
For very large individual files inside archives, the streaming
scanner path keeps only chunk_size + overlap_size bytes in memory.
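To illustrate why a chunk plus an overlap is enough, here is a rough sketch of a chunked scanner that replaces a fixed byte pattern. It is not the StreamScanner implementation: `scan_stream`, `needle`, and `mask` are hypothetical names, and the only assumption carried over from the text is that the working buffer never exceeds chunk_size + overlap_size bytes. The overlap holds back the tail of each chunk so that a pattern cut by a chunk boundary is still seen whole.

```rust
use std::io::{self, Read, Write};

/// Replaces every occurrence of `needle` with `mask` while copying `input`
/// to `output`. Returns the peak length of the working buffer.
fn scan_stream<R: Read, W: Write>(
    mut input: R,
    mut output: W,
    needle: &[u8],
    mask: &[u8],
    chunk_size: usize,
    overlap_size: usize,
) -> io::Result<usize> {
    // The overlap must be able to hold a needle that straddles a chunk boundary.
    assert!(chunk_size > 0 && !needle.is_empty() && overlap_size + 1 >= needle.len());
    let mut buf: Vec<u8> = Vec::with_capacity(chunk_size + overlap_size);
    let mut peak: usize = 0;
    loop {
        // Read the next chunk directly behind the bytes held back last round.
        let held = buf.len();
        buf.resize(held + chunk_size, 0);
        peak = peak.max(buf.len());
        let n = input.read(&mut buf[held..])?;
        buf.truncate(held + n);
        let eof = n == 0;
        // Before EOF, keep the last `overlap_size` bytes for the next round.
        let limit = if eof { buf.len() } else { buf.len().saturating_sub(overlap_size) };
        let mut i = 0;
        while i < limit {
            if buf[i..].starts_with(needle) {
                output.write_all(mask)?;
                i += needle.len();
            } else {
                // Byte-at-a-time output keeps the sketch short, not fast.
                output.write_all(&buf[i..i + 1])?;
                i += 1;
            }
        }
        buf.drain(..i);
        if eof {
            return Ok(peak);
        }
    }
}
```

With a 6-byte needle, a `chunk_size` of 4, and an `overlap_size` of 5, the buffer in this sketch never grows past 9 bytes regardless of input length.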
§Thread Safety
ArchiveProcessor is Send + Sync. The underlying
MappingStore provides lock-free
reads for dedup consistency.
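As an illustration of what Send + Sync allows, the following sketch shares one store between several threads and checks that they all receive the same replacement for the same value. `ToyStore`, `replacement_for`, and `sanitize_concurrently` are hypothetical names. The store uses a `Mutex` purely for brevity, so it does not reproduce the lock-free reads attributed to MappingStore.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for a shared mapping store. It locks on every
/// access, unlike the lock-free reads described for the real MappingStore.
#[derive(Default)]
struct ToyStore {
    map: Mutex<HashMap<String, String>>,
}

impl ToyStore {
    /// Returns the same replacement for the same original value on every call.
    fn replacement_for(&self, original: &str) -> String {
        let mut map = self.map.lock().unwrap();
        let next_id = map.len() + 1;
        map.entry(original.to_string())
            .or_insert_with(|| format!("<redacted-{}>", next_id))
            .clone()
    }
}

/// Compiles only if `T` can be shared across threads.
fn assert_send_sync<T: Send + Sync>() {}

/// Looks up the same value from several threads and collects the results.
fn sanitize_concurrently(store: &ToyStore, value: &str, threads: usize) -> Vec<String> {
    assert_send_sync::<ToyStore>();
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| s.spawn(move || store.replacement_for(value)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

The `assert_send_sync` helper is a common compile-time check; the same call with `ArchiveProcessor` as the type argument would be expected to compile, given the guarantee stated above.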
§Metadata Preservation
- Tar: modification time, permissions (mode), uid/gid, and username/groupname are copied from the source entry.
- Zip: modification time, compression method, and unix permissions are preserved.
- Symlinks, directories, and other non-regular entries are passed through unchanged.
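The rules above can be sketched with a simplified, hypothetical entry model. `Entry`, `EntryKind`, and `rebuild_entry` are illustrative names rather than the crate's types, and the field set mirrors only the tar metadata listed above. Regular files get new contents with their metadata copied from the source, and every other kind of entry is returned unchanged.

```rust
/// Hypothetical, simplified view of a tar entry; not the crate's real types.
#[derive(Debug, Clone, PartialEq)]
enum EntryKind {
    Regular,
    Directory,
    Symlink,
}

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    kind: EntryKind,
    mtime: u64,
    mode: u32,
    uid: u64,
    gid: u64,
    username: String,
    groupname: String,
    data: Vec<u8>,
}

/// Rebuilds one entry: regular files are sanitized and keep the source
/// metadata, all other entry kinds pass through unchanged.
fn rebuild_entry(source: &Entry, sanitize: impl Fn(&[u8]) -> Vec<u8>) -> Entry {
    match source.kind {
        EntryKind::Regular => Entry {
            data: sanitize(&source.data),
            // Every remaining field is copied from the source entry.
            ..source.clone()
        },
        _ => source.clone(),
    }
}
```

A real tar writer would also have to update the size field in the header when sanitization changes the content length; the sketch sidesteps this by deriving the size from `data`.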
Structs§
- ArchiveFilter - A compiled glob-based entry filter for archive processing.
- ArchiveProcessor - Processes archives by sanitizing each contained file and rebuilding the archive with the same format and preserved metadata.
- ArchiveProgress - Progress snapshot emitted while processing archive entries.
- ArchiveStats - Statistics collected while processing an archive.
Enums§
Constants§
- DEFAULT_MAX_ARCHIVE_DEPTH - Default maximum nesting depth for recursive archive processing.
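A nesting limit of this kind is typically enforced by threading a depth counter through the recursion. The sketch below is only an assumption about how such a guard might look. `Node`, `DepthError`, and `count_files` are hypothetical, the limit is passed in as a parameter because the constant's value is not stated here, and returning an error is one possible policy among several.

```rust
/// Hypothetical in-memory model: an entry is a plain file or a nested archive.
enum Node {
    File(Vec<u8>),
    Archive(Vec<Node>),
}

#[derive(Debug, PartialEq)]
enum DepthError {
    TooDeep { limit: usize },
}

/// Counts files, descending into nested archives for at most `max_depth`
/// archive levels. The outermost archive is at depth 0.
fn count_files(node: &Node, depth: usize, max_depth: usize) -> Result<usize, DepthError> {
    match node {
        Node::File(_) => Ok(1),
        Node::Archive(children) => {
            if depth >= max_depth {
                return Err(DepthError::TooDeep { limit: max_depth });
            }
            let mut total = 0;
            for child in children {
                total += count_files(child, depth + 1, max_depth)?;
            }
            Ok(total)
        }
    }
}
```

In this model, an archive containing one file and one inner archive is accepted with a limit of 2 and rejected with a limit of 1.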