safe_unzip

Zip extraction that won't ruin your day.

The Problem

Zip files can contain malicious paths that escape the extraction directory:

# Python 3.12, tested January 2025 — STILL VULNERABLE
import zipfile
zipfile.ZipFile("evil.zip").extractall("/var/uploads")
# Extracts ../../etc/cron.d/pwned → /etc/cron.d/pwned 💀

This is Zip Slip, and it still affects Python in 2025.

"But didn't Python fix this?"

Sort of. Python added warnings and ZipInfo.filename sanitization in 2014. In Python 3.12+, there's a filter parameter:

# The "safe" way — but who knows this exists?
zipfile.ZipFile("evil.zip").extractall("/var/uploads", filter="data")

The problem: the safe option is opt-in. The default is still vulnerable. Most developers don't read the docs carefully enough to discover filter="data".

safe_unzip makes security the default, not an afterthought.

The Solution

use safe_unzip::extract_file;

extract_file("/var/uploads", "evil.zip")?;
// Err(PathEscape { entry: "../../etc/cron.d/pwned", ... })

# Python bindings — same safety
from safe_unzip import extract_file

extract_file("/var/uploads", "evil.zip")
# Raises: PathEscapeError

Security is the default. No special flags, no opt-in safety. Every path is validated. Malicious archives are rejected, not extracted.

Why Not Just Use `zip` / `zipfile`?

Because archive extraction is a security boundary, and most libraries treat it as a convenience function.

Library	Default Behavior	Safe Option
Python `zipfile`	Vulnerable	`filter="data"` (opt-in, obscure)
Rust `zip`	Vulnerable	Manual path validation
`safe_unzip`	Safe by default	N/A — always safe

If you're extracting untrusted archives, you need a library designed for that threat model.

Who Should Use This

Backend services handling user-uploaded zip files
CI/CD systems unpacking third-party artifacts
SaaS platforms with file import features
Forensics / malware analysis pipelines
Anything running as a privileged user

If your zip files only come from trusted sources you control, the standard zip crate is fine. If users can upload archives, use safe_unzip.

Features

Zip Slip Protection — Path traversal attacks blocked via path_jail
Zip Bomb Protection — Configurable limits on size, file count, and path depth
Strict Size Enforcement — Catches files that decompress larger than declared
Filename Sanitization — Blocks control characters and Windows reserved names
Symlink Handling — Skip or reject symlinks (no symlink-based escapes)
Secure Overwrite — Removes symlinks before overwriting to prevent symlink attacks
Overwrite Policies — Error, skip, or overwrite existing files
Filter Callback — Extract only the files you want
Two-Pass Mode — Validate everything before writing anything
Permission Stripping — Removes setuid/setgid bits on Unix

Installation

Rust:

[dependencies]
safe_unzip = "0.1"

Python:

pip install safe-unzip

Python Bindings

The Python bindings are thin wrappers over the Rust implementation via PyO3. This means:

✅ Identical security guarantees — same code path, same validation
✅ Identical limits — same defaults (1GB total, 10K files, 100MB per file)
✅ Identical semantics — same error conditions, same behavior
✅ No re-implementation — Python calls Rust directly, no logic duplication

Security reviewers: the Python API is a direct binding, not a port.

Quick Start

use safe_unzip::extract_file;

// Extract with safe defaults
let report = extract_file("/var/uploads", "archive.zip")?;
println!("Extracted {} files ({} bytes)", 
    report.files_extracted, 
    report.bytes_written
);

Usage Examples

Basic Extraction

use safe_unzip::Extractor;

let report = Extractor::new("/var/uploads")?
    .extract_file("archive.zip")?;

Create Destination if Missing

use safe_unzip::Extractor;

// Extractor::new() errors if destination doesn't exist (catches typos)
// Extractor::new_or_create() creates it automatically
let report = Extractor::new_or_create("/var/uploads/new_folder")?
    .extract_file("archive.zip")?;

// The convenience functions (extract_file, extract) also create automatically
use safe_unzip::extract_file;
extract_file("/var/uploads/new_folder", "archive.zip")?;

Custom Limits (Prevent Zip Bombs)

use safe_unzip::{Extractor, Limits};

let report = Extractor::new("/var/uploads")?
    .limits(Limits {
        max_total_bytes: 500 * 1024 * 1024,  // 500 MB total
        max_file_count: 1_000,                // Max 1000 files
        max_single_file: 50 * 1024 * 1024,   // 50 MB per file
        max_path_depth: 10,                   // No deeper than 10 levels
    })
    .extract_file("archive.zip")?;

Filter by Extension

use safe_unzip::Extractor;

// Only extract images
let report = Extractor::new("/var/uploads")?
    .filter(|entry| {
        entry.name.ends_with(".png") || 
        entry.name.ends_with(".jpg") ||
        entry.name.ends_with(".gif")
    })
    .extract_file("archive.zip")?;

println!("Extracted {} images, skipped {} other files",
    report.files_extracted,
    report.entries_skipped
);

Overwrite Policies

use safe_unzip::{Extractor, OverwritePolicy};

// Skip files that already exist
let report = Extractor::new("/var/uploads")?
    .overwrite(OverwritePolicy::Skip)
    .extract_file("archive.zip")?;

// Or overwrite them
let report = Extractor::new("/var/uploads")?
    .overwrite(OverwritePolicy::Overwrite)
    .extract_file("archive.zip")?;

// Default: Error if file exists
let report = Extractor::new("/var/uploads")?
    .overwrite(OverwritePolicy::Error)  // This is the default
    .extract_file("archive.zip")?;

Symlink Policies

use safe_unzip::{Extractor, SymlinkPolicy};

// Default: silently skip symlinks
let report = Extractor::new("/var/uploads")?
    .symlinks(SymlinkPolicy::Skip)
    .extract_file("archive.zip")?;

// Or reject archives containing symlinks
let report = Extractor::new("/var/uploads")?
    .symlinks(SymlinkPolicy::Error)
    .extract_file("archive.zip")?;

Extraction Modes

Mode	Speed	On Failure	Use When
`Streaming` (default)	Fast (1 pass)	Partial files remain	Speed matters; you'll clean up on error
`ValidateFirst`	Slower (2 passes)	No files if validation fails	Can't tolerate partial state

⚠️ Neither mode is truly atomic. If extraction fails mid-write (e.g., disk full), partial files remain regardless of mode. ValidateFirst only prevents writes when validation fails (bad paths, limits exceeded), not when I/O fails during extraction.

use safe_unzip::{Extractor, ExtractionMode};

// Two-pass extraction:
// 1. Validate ALL entries (no disk writes)
// 2. Extract (only if validation passed)
let report = Extractor::new("/var/uploads")?
    .mode(ExtractionMode::ValidateFirst)
    .extract_file("untrusted.zip")?;

Use ValidateFirst when you can't tolerate partial state from malicious archives. Use Streaming (default) when speed matters and you can clean up on error.

Extracting from Memory

use safe_unzip::Extractor;
use std::io::Cursor;

let zip_bytes: Vec<u8> = download_zip_somehow();
let cursor = Cursor::new(zip_bytes);

let report = Extractor::new("/var/uploads")?
    .extract(cursor)?;

Security Model

Threat	Attack Vector	Defense
Zip Slip	Entry named `../../etc/cron.d/pwned`	`path_jail` validates every path
Zip Bomb (size)	42KB → 4PB expansion	`max_total_bytes` limit + streaming enforcement
Zip Bomb (count)	1 million empty files	`max_file_count` limit
Zip Bomb (lying)	Declared 1KB, decompresses to 1GB	Strict size reader detects mismatch
Symlink Escape	Symlink to `/etc/passwd`	Skip or reject symlinks
Symlink Overwrite	Create symlink, then overwrite target	Symlinks removed before overwrite
Path Depth	`a/b/c/.../1000levels`	`max_path_depth` limit
Invalid Filename	Control chars, `CON`, `NUL`	Filename sanitization
Overwrite	Replace sensitive files	`OverwritePolicy::Error` default
Setuid	Create setuid executables	Permission bits stripped

Default Limits

Limit	Default	Description
`max_total_bytes`	1 GB	Total uncompressed size
`max_file_count`	10,000	Number of files
`max_single_file`	100 MB	Largest single file
`max_path_depth`	50	Directory nesting depth

Error Handling

use safe_unzip::{extract_file, Error};

match extract_file("/var/uploads", "archive.zip") {
    Ok(report) => {
        println!("Success: {} files", report.files_extracted);
    }
    Err(Error::PathEscape { entry, detail }) => {
        eprintln!("Blocked path traversal in '{}': {}", entry, detail);
    }
    Err(Error::TotalSizeExceeded { limit, would_be }) => {
        eprintln!("Archive too large: {} bytes (limit: {})", would_be, limit);
    }
    Err(Error::FileTooLarge { entry, size, limit }) => {
        eprintln!("File '{}' too large: {} bytes (limit: {})", entry, size, limit);
    }
    Err(Error::FileCountExceeded { limit }) => {
        eprintln!("Too many files (limit: {})", limit);
    }
    Err(Error::AlreadyExists { path }) => {
        eprintln!("File already exists: {}", path);
    }
    Err(Error::InvalidFilename { entry }) => {
        eprintln!("Invalid filename: {}", entry);
    }
    Err(e) => {
        eprintln!("Extraction failed: {}", e);
    }
}

Limitations

Format Limitations

Zip format only — Tar/gzip support planned for v0.2
Requires seekable input — No stdin streaming (zip format requires reading the central directory at the end of the file)
No password-protected zips — Use the zip crate directly for encrypted archives

Extraction Behavior

Partial state in Streaming mode — If extraction fails mid-way, already-extracted files remain on disk. Use ExtractionMode::ValidateFirst to validate before writing.
Filters not applied during validation — In ValidateFirst mode, limits are checked against ALL entries. Filtered entries still count toward limits. This is conservative: validation may reject archives that would succeed with filtering.

Security Scope

These threats are not fully addressed (by design or complexity):

Limitation	Reason
Case-insensitive collisions	On Windows/macOS, `File.txt` and `file.txt` map to the same file. We don't track extracted names to detect this.
Unicode normalization	`café` (NFC) vs `café` (NFD) appear identical but are different bytes. Full normalization requires ICU.
TOCTOU race conditions	Between path validation and file creation, a symlink could theoretically be created. Mitigated by secure overwrite, but not fully atomic.
Sparse file attacks	Not applicable to zip format.
Hard links	Zip format doesn't support hard links.
Device files	Zip format doesn't support special device files.

Filename Restrictions

These filenames are rejected for security:

Control characters (including null bytes)
Backslashes (\) — prevents Windows path separator confusion
Paths longer than 1024 bytes
Path components longer than 255 bytes
Windows reserved names: CON, PRN, AUX, NUL, COM1-9, LPT1-9

License

MIT OR Apache-2.0

safe_unzip 0.1.1