Module archive


Archive processor for sanitizing files inside .zip, .tar, and .tar.gz archives.

§Architecture

┌───────────────────────┐
│  Archive (zip/tar/gz) │
└────────┬──────────────┘
         │  for each entry
         ▼
┌─────────────────────────────────────────────┐
│  1. Match entry filename → FileTypeProfile  │
│  2. Try ProcessorRegistry (structured)      │
│  3. Fallback: StreamScanner (streaming)     │
└────────┬────────────────────────────────────┘
         │  sanitized bytes
         ▼
┌───────────────────────┐
│  Rebuilt archive      │
│  (same format, meta   │
│   preserved)          │
└───────────────────────┘
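The per-entry decision in the diagram can be sketched as follows. This is a minimal illustration, not the crate's implementation: `Registry`, `Route`, `redact_json`, and `stream_scan` are hypothetical stand-ins, and matching on the file extension is an assumption about how a filename maps to a profile.

```rust
use std::collections::HashMap;

/// Hypothetical structured processor: it sees the whole entry at once and
/// returns `None` when it cannot handle the content.
type StructuredFn = fn(&[u8]) -> Option<Vec<u8>>;

/// Hypothetical registry keyed by file extension (an assumption; the real
/// profile matching is not described on this page).
pub struct Registry {
    by_extension: HashMap<&'static str, StructuredFn>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { by_extension: HashMap::new() }
    }

    pub fn register(&mut self, ext: &'static str, f: StructuredFn) {
        self.by_extension.insert(ext, f);
    }

    fn lookup(&self, name: &str) -> Option<StructuredFn> {
        let ext = name.rsplit_once('.')?.1;
        self.by_extension.get(ext).copied()
    }
}

/// Which path produced the sanitized bytes.
#[derive(Debug, PartialEq)]
pub enum Route {
    Structured,
    Streaming,
}

/// Toy structured processor: replaces the literal "secret" in UTF-8 content.
pub fn redact_json(data: &[u8]) -> Option<Vec<u8>> {
    let text = std::str::from_utf8(data).ok()?;
    Some(text.replace("secret", "[REDACTED]").into_bytes())
}

/// Toy streaming fallback: masks ASCII digits one byte at a time.
fn stream_scan(data: &[u8]) -> Vec<u8> {
    data.iter()
        .map(|b| if b.is_ascii_digit() { b'#' } else { *b })
        .collect()
}

/// Steps 1-3 of the diagram: match by name, try the structured processor,
/// and fall back to the streaming scanner when there is no match or the
/// structured processor declines the entry.
pub fn sanitize_entry(reg: &Registry, name: &str, data: &[u8]) -> (Route, Vec<u8>) {
    if let Some(process) = reg.lookup(name) {
        if let Some(out) = process(data) {
            return (Route::Structured, out);
        }
    }
    (Route::Streaming, stream_scan(data))
}
```

In this sketch an entry named `config.json` takes the structured route once `redact_json` is registered for `json`, while `notes.txt` has no match and goes to the streaming fallback.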

§Memory Efficiency

Archives are processed entry by entry. Each entry goes through either a structured processor, which must buffer the full entry, or the StreamScanner, which processes it in configurable chunks. Peak memory use is therefore proportional to the largest single entry handled by a structured processor. Entries without a profile match are streamed through the scanner and are never buffered whole.

For very large individual files inside archives, the streaming scanner path keeps only chunk_size + overlap_size bytes in memory.
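To illustrate why a chunked scanner needs only chunk_size + overlap_size bytes, here is a minimal sketch of overlap-based scanning over `std::io::Read`. It is not the StreamScanner itself: `stream_redact` is a hypothetical function, it looks for a single literal pattern, and it assumes the overlap only has to cover a match that straddles two chunks (pattern length minus one).

```rust
use std::io::{self, Read, Write};

/// Streams `src` to `dst`, replacing every occurrence of `needle` with `mask`.
/// Returns the number of replacements made.
pub fn stream_redact<R: Read, W: Write>(
    mut src: R,
    mut dst: W,
    needle: &[u8],
    mask: &[u8],
    chunk_size: usize,
) -> io::Result<usize> {
    assert!(chunk_size > 0 && !needle.is_empty());
    // Assumption: the overlap covers the longest match that can straddle chunks.
    let overlap_size = needle.len() - 1;
    // The only data buffer: chunk_size + overlap_size bytes, reused throughout.
    let mut buf = vec![0u8; chunk_size + overlap_size];
    let mut carried = 0; // unscanned tail kept from the previous chunk
    let mut replaced = 0;
    loop {
        let n = src.read(&mut buf[carried..carried + chunk_size])?;
        let filled = carried + n;
        let eof = n == 0;
        // Before EOF, hold back the last overlap_size bytes: a match may start there.
        let safe_end = if eof { filled } else { filled.saturating_sub(overlap_size) };
        let (mut i, mut run_start) = (0, 0);
        while i < safe_end {
            if buf[i..filled].starts_with(needle) {
                dst.write_all(&buf[run_start..i])?; // bytes before the match
                dst.write_all(mask)?;
                i += needle.len();
                run_start = i;
                replaced += 1;
            } else {
                i += 1;
            }
        }
        dst.write_all(&buf[run_start..i])?;
        if eof {
            return Ok(replaced);
        }
        // Move the unscanned tail to the front; it is at most overlap_size bytes.
        buf.copy_within(i..filled, 0);
        carried = filled - i;
    }
}
```

With a 4-byte chunk and the pattern `SECRET`, every match in `aaSECRETbbSECRET` crosses a chunk boundary, yet both are found because the held-back tail is rescanned together with the next chunk.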

§Thread Safety

ArchiveProcessor is Send + Sync. The underlying MappingStore provides lock-free reads for dedup consistency.
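In practice, Send + Sync means one processor can be shared across worker threads behind an `Arc`. The sketch below shows the pattern with a stand-in type: `Processor` and its `process` method are hypothetical and do not reflect the real ArchiveProcessor API.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for a shareable processor; the real type is not reproduced here.
pub struct Processor {
    prefix: String,
}

impl Processor {
    pub fn new(prefix: &str) -> Self {
        Processor { prefix: prefix.to_string() }
    }

    /// Takes `&self`, so many threads can call it concurrently.
    pub fn process(&self, name: &str) -> String {
        format!("{}{}", self.prefix, name)
    }
}

/// Compiles only if `T` is Send + Sync; a cheap compile-time check.
fn assert_send_sync<T: Send + Sync>() {}

/// Processes each name on its own thread, sharing a single processor.
/// Results come back in input order because handles are joined in order.
pub fn process_in_parallel(p: Arc<Processor>, names: Vec<String>) -> Vec<String> {
    assert_send_sync::<Processor>();
    let handles: Vec<_> = names
        .into_iter()
        .map(|name| {
            let p = Arc::clone(&p);
            thread::spawn(move || p.process(&name))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```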

§Metadata Preservation

  • Tar: modification time, permissions (mode), uid/gid, and username/groupname are copied from the source entry.
  • Zip: modification time, compression method, and unix permissions are preserved.
  • Symlinks, directories, and other non-regular entries are passed through unchanged.
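The rules above amount to "copy the metadata, transform only regular file contents". A minimal sketch with hypothetical types (`TarMeta`, `Body`, `Entry`, and `rebuild_entry` are illustrative and not part of this module):

```rust
/// Hypothetical subset of the tar header fields listed above.
#[derive(Clone, Debug, PartialEq)]
pub struct TarMeta {
    pub mtime: u64,
    pub mode: u32,
    pub uid: u64,
    pub gid: u64,
    pub username: Option<String>,
    pub groupname: Option<String>,
}

/// Hypothetical entry payload; only `Regular` carries bytes to sanitize.
#[derive(Clone, Debug, PartialEq)]
pub enum Body {
    Regular(Vec<u8>),
    Directory,
    Symlink(String),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub path: String,
    pub meta: TarMeta,
    pub body: Body,
}

/// Builds the output entry: metadata is copied as-is, regular file contents
/// go through `sanitize`, and every other entry kind is passed through.
pub fn rebuild_entry(src: &Entry, sanitize: impl Fn(&[u8]) -> Vec<u8>) -> Entry {
    let body = match &src.body {
        Body::Regular(bytes) => Body::Regular(sanitize(bytes)),
        other => other.clone(),
    };
    Entry {
        path: src.path.clone(),
        meta: src.meta.clone(),
        body,
    }
}
```

A real archive writer would also have to update the recorded size when sanitizing changes an entry's length; the sketch leaves that out.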

Structs§

ArchiveFilter
A compiled glob-based entry filter for archive processing.
ArchiveProcessor
Processes archives by sanitizing each contained file and rebuilding the archive with the same format and preserved metadata.
ArchiveProgress
Progress snapshot emitted while processing archive entries.
ArchiveStats
Statistics collected while processing an archive.

Enums§

ArchiveFormat

Constants§

DEFAULT_MAX_ARCHIVE_DEPTH
Default maximum nesting depth for recursive archive processing.
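
A depth limit of this kind guards against deeply nested archives. The sketch below shows one way such a limit can be enforced during recursion. The value 3, the `Node` tree, and returning an error when the limit is hit are all assumptions made for illustration; this page states neither the constant's value nor what the processor does when the limit is exceeded.

```rust
/// Hypothetical limit for this sketch only; not the value of
/// DEFAULT_MAX_ARCHIVE_DEPTH.
pub const MAX_DEPTH_FOR_SKETCH: usize = 3;

/// Toy model of nested archives: an archive holds files and other archives.
pub enum Node {
    File(Vec<u8>),
    Archive(Vec<Node>),
}

#[derive(Debug, PartialEq)]
pub struct DepthExceeded {
    pub limit: usize,
}

/// Counts files reachable from `node`, refusing to open any archive that
/// sits at `max_depth` or deeper. The outermost archive is at depth 0.
pub fn count_files(node: &Node, depth: usize, max_depth: usize) -> Result<usize, DepthExceeded> {
    match node {
        Node::File(_) => Ok(1),
        Node::Archive(children) => {
            if depth >= max_depth {
                return Err(DepthExceeded { limit: max_depth });
            }
            let mut total = 0;
            for child in children {
                total += count_files(child, depth + 1, max_depth)?;
            }
            Ok(total)
        }
    }
}
```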