safe_unzip
Zip extraction that won't ruin your day.
The Problem
Zip files can contain malicious paths that escape the extraction directory:
# Python 3.12, tested January 2025 — STILL VULNERABLE
# Extracts ../../etc/cron.d/pwned → /etc/cron.d/pwned 💀
This is Zip Slip, and it still affects Python in 2025.
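To make the attack concrete, here is a minimal sketch of the kind of lexical check that stops a traversal entry. It is not safe_unzip's implementation (which delegates to path_jail); the function name `resolve_entry` is hypothetical, and the check never touches the filesystem, so it does nothing about symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `entry` onto `dest`, refusing any entry that could land outside `dest`.
/// Hypothetical helper for illustration; purely lexical, so symlinks are not detected.
fn resolve_entry(dest: &Path, entry: &str) -> Option<PathBuf> {
    let mut out = dest.to_path_buf();
    let mut pushed = false;
    for component in Path::new(entry).components() {
        match component {
            Component::Normal(part) => {
                out.push(part);
                pushed = true;
            }
            // `.` adds nothing to the path.
            Component::CurDir => {}
            // `..`, a leading `/`, or a Windows drive prefix all reject the entry.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    // An empty entry name resolves to nothing useful, so reject it too.
    if pushed { Some(out) } else { None }
}
```

Rejecting every `..` is stricter than strictly necessary (`a/../b` stays inside the destination), but it keeps the rule easy to audit.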
"But didn't Python fix this?"
Sort of. Python added warnings and ZipInfo.filename sanitization in 2014. In Python 3.12+, there's a filter parameter:
# The "safe" way — but who knows this exists?
The problem: the safe option is opt-in. The default is still vulnerable. Most developers don't read the docs carefully enough to discover filter="data".
safe_unzip makes security the default, not an afterthought.
The Solution
use extract_file;
extract_file?;
// Err(PathEscape { entry: "../../etc/cron.d/pwned", ... })
# Python bindings — same safety
# Raises: PathEscapeError
Security is the default. No special flags, no opt-in safety. Every path is validated. Malicious archives are rejected, not extracted.
Why Not Just Use zip / zipfile?
Because archive extraction is a security boundary, and most libraries treat it as a convenience function.
| Library | Default Behavior | Safe Option |
|---|---|---|
| Python zipfile | Vulnerable | filter="data" (opt-in, obscure) |
| Rust zip | Vulnerable | Manual path validation |
| safe_unzip | Safe by default | N/A (always safe) |
If you're extracting untrusted archives, you need a library designed for that threat model.
Who Should Use This
- Backend services handling user-uploaded zip files
- CI/CD systems unpacking third-party artifacts
- SaaS platforms with file import features
- Forensics / malware analysis pipelines
- Anything running as a privileged user
If your zip files only come from trusted sources you control, the standard zip crate is fine. If users can upload archives, use safe_unzip.
Features
- Zip Slip Protection — Path traversal attacks blocked via path_jail
- Zip Bomb Protection — Configurable limits on size, file count, and path depth
- Strict Size Enforcement — Catches files that decompress larger than declared
- Filename Sanitization — Blocks control characters and Windows reserved names
- Symlink Handling — Skip or reject symlinks (no symlink-based escapes)
- Secure Overwrite — Removes symlinks before overwriting to prevent symlink attacks
- Overwrite Policies — Error, skip, or overwrite existing files
- Filter Callback — Extract only the files you want
- Two-Pass Mode — Validate everything before writing anything
- Permission Stripping — Removes setuid/setgid bits on Unix
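Permission stripping, the last feature in the list, comes down to a bit mask on the Unix mode stored in the archive. The sketch below is an illustration under that assumption, not the crate's code; `strip_special_bits` and `SPECIAL_BITS` are hypothetical names.

```rust
/// setuid (0o4000) and setgid (0o2000) bits of a Unix file mode.
const SPECIAL_BITS: u32 = 0o6000;

/// Returns `mode` with setuid and setgid cleared; every other bit is kept.
/// Hypothetical helper for illustration only.
fn strip_special_bits(mode: u32) -> u32 {
    mode & !SPECIAL_BITS
}
```

An entry stored as `0o4755` (setuid root binaries typically look like this) would be written as plain `0o755`.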
Installation
Rust:
[dependencies]
safe_unzip = "0.1"
Python:
Python Bindings
The Python bindings are thin wrappers over the Rust implementation via PyO3. This means:
- ✅ Identical security guarantees — same code path, same validation
- ✅ Identical limits — same defaults (1GB total, 10K files, 100MB per file)
- ✅ Identical semantics — same error conditions, same behavior
- ✅ No re-implementation — Python calls Rust directly, no logic duplication
Security reviewers: the Python API is a direct binding, not a port.
Quick Start
use extract_file;
// Extract with safe defaults
let report = extract_file?;
println!;
Usage Examples
Basic Extraction
use Extractor;
let report = new?
.extract_file?;
Create Destination if Missing
use Extractor;
// Extractor::new() errors if destination doesn't exist (catches typos)
// Extractor::new_or_create() creates it automatically
let report = new_or_create?
.extract_file?;
// The convenience functions (extract_file, extract) also create automatically
use extract_file;
extract_file?;
Custom Limits (Prevent Zip Bombs)
use ;
let report = new?
.limits
.extract_file?;
Filter by Extension
use Extractor;
// Only extract images
let report = new?
.filter
.extract_file?;
println!;
Overwrite Policies
use ;
// Skip files that already exist
let report = new?
.overwrite
.extract_file?;
// Or overwrite them
let report = new?
.overwrite
.extract_file?;
// Default: Error if file exists
let report = new?
.overwrite(OverwritePolicy::Error) // This is the default
.extract_file?;
Symlink Policies
use ;
// Default: silently skip symlinks
let report = new?
.symlinks
.extract_file?;
// Or reject archives containing symlinks
let report = new?
.symlinks
.extract_file?;
Extraction Modes
| Mode | Speed | On Failure | Use When |
|---|---|---|---|
| Streaming (default) | Fast (1 pass) | Partial files remain | Speed matters; you'll clean up on error |
| ValidateFirst | Slower (2 passes) | No files if validation fails | Can't tolerate partial state |
⚠️ Neither mode is truly atomic. If extraction fails mid-write (e.g., disk full), partial files remain regardless of mode. ValidateFirst only prevents writes when validation fails (bad paths, limits exceeded), not when I/O fails during extraction.
use ;
// Two-pass extraction:
// 1. Validate ALL entries (no disk writes)
// 2. Extract (only if validation passed)
let report = new?
.mode(ExtractionMode::ValidateFirst)
.extract_file?;
Use ValidateFirst when you can't tolerate partial state from malicious archives. Use Streaming (default) when speed matters and you can clean up on error.
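The difference between the two modes can be sketched with a toy model. Everything here is hypothetical: `Entry`, `Mode`, `validate`, and `extract` are stand-ins, the "disk" is an in-memory map, and the only validation rule is a `..` check. The real crate validates far more, but the control flow is the point.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an archive entry.
struct Entry {
    name: &'static str,
    data: &'static [u8],
}

/// Toy version of the two extraction modes.
enum Mode {
    Streaming,
    ValidateFirst,
}

/// Toy validation rule: reject any name containing a `..` component.
fn validate(entry: &Entry) -> Result<(), String> {
    if entry.name.split('/').any(|part| part == "..") {
        Err(format!("path escape: {}", entry.name))
    } else {
        Ok(())
    }
}

/// "Extracts" into an in-memory map that stands in for the disk.
fn extract(
    entries: &[Entry],
    mode: Mode,
    disk: &mut BTreeMap<String, Vec<u8>>,
) -> Result<(), String> {
    if let Mode::ValidateFirst = mode {
        // Pass 1: check every entry before anything is written.
        for entry in entries {
            validate(entry)?;
        }
    }
    // Pass 2 (the only pass in Streaming mode): validate and write one entry at a time.
    for entry in entries {
        validate(entry)?;
        disk.insert(entry.name.to_string(), entry.data.to_vec());
    }
    Ok(())
}
```

With entries `a.txt` followed by `../evil`, the Streaming run fails after `a.txt` has already been written, while the ValidateFirst run fails with nothing written.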
Extracting from Memory
use Extractor;
use Cursor;
let zip_bytes: = download_zip_somehow;
let cursor = new;
let report = new?
.extract?;
Security Model
| Threat | Attack Vector | Defense |
|---|---|---|
| Zip Slip | Entry named ../../etc/cron.d/pwned | path_jail validates every path |
| Zip Bomb (size) | 42KB → 4PB expansion | max_total_bytes limit + streaming enforcement |
| Zip Bomb (count) | 1 million empty files | max_file_count limit |
| Zip Bomb (lying) | Declared 1KB, decompresses to 1GB | Strict size reader detects mismatch |
| Symlink Escape | Symlink to /etc/passwd | Skip or reject symlinks |
| Symlink Overwrite | Create symlink, then overwrite target | Symlinks removed before overwrite |
| Path Depth | a/b/c/.../1000levels | max_path_depth limit |
| Invalid Filename | Control chars, CON, NUL | Filename sanitization |
| Overwrite | Replace sensitive files | OverwritePolicy::Error default |
| Setuid | Create setuid executables | Permission bits stripped |
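The "Zip Bomb (lying)" row relies on counting bytes as they are decompressed rather than trusting the declared size. A minimal sketch of that idea, assuming a wrapper around any `Read` source; `LimitedReader` is a hypothetical name, not the crate's type:

```rust
use std::io::{self, Read};

/// Wraps a reader and fails as soon as it yields more than `limit` bytes in total.
/// Hypothetical type for illustration only.
struct LimitedReader<R> {
    inner: R,
    remaining: u64,
}

impl<R: Read> LimitedReader<R> {
    fn new(inner: R, limit: u64) -> Self {
        Self { inner, remaining: limit }
    }
}

impl<R: Read> Read for LimitedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n as u64 > self.remaining {
            // More data arrived than the entry declared: stop instead of writing it out.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "entry exceeds declared size",
            ));
        }
        self.remaining -= n as u64;
        Ok(n)
    }
}
```

Passing the declared uncompressed size as `limit` turns a lying header into an I/O error partway through the entry, before the bulk of the data reaches the disk.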
Default Limits
| Limit | Default | Description |
|---|---|---|
| max_total_bytes | 1 GB | Total uncompressed size |
| max_file_count | 10,000 | Number of files |
| max_single_file | 100 MB | Largest single file |
| max_path_depth | 50 | Directory nesting depth |
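As a rough illustration of how these four limits interact, the sketch below mirrors the table in a plain struct and checks declared entry metadata against it. The struct is a hypothetical stand-in, not the crate's own limits type. It also assumes binary units for GB/MB and counts every path component (file name included) toward depth; the crate may define both differently.

```rust
/// Hypothetical mirror of the limits table; not the crate's actual type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Limits {
    max_total_bytes: u64,
    max_file_count: usize,
    max_single_file: u64,
    max_path_depth: usize,
}

impl Default for Limits {
    fn default() -> Self {
        Self {
            max_total_bytes: 1024 * 1024 * 1024, // 1 GB, assuming binary units
            max_file_count: 10_000,
            max_single_file: 100 * 1024 * 1024, // 100 MB, assuming binary units
            max_path_depth: 50,
        }
    }
}

/// Checks `(path, declared uncompressed size)` pairs against the limits.
fn check_limits(limits: &Limits, entries: &[(&str, u64)]) -> Result<(), String> {
    if entries.len() > limits.max_file_count {
        return Err(format!("too many files: {}", entries.len()));
    }
    let mut total: u64 = 0;
    for (path, size) in entries {
        if *size > limits.max_single_file {
            return Err(format!("{path}: file too large ({size} bytes)"));
        }
        // Assumption: depth counts every non-empty component, including the file name.
        let depth = path.split('/').filter(|part| !part.is_empty()).count();
        if depth > limits.max_path_depth {
            return Err(format!("{path}: nested too deeply ({depth} levels)"));
        }
        total = total.saturating_add(*size);
        if total > limits.max_total_bytes {
            return Err(format!("total size exceeds {} bytes", limits.max_total_bytes));
        }
    }
    Ok(())
}
```

A check like this only sees declared sizes, so it complements byte counting during decompression rather than replacing it.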
Error Handling
use ;
match extract_file
Limitations
Format Limitations
- Zip format only: Tar/gzip support planned for v0.2
- Requires seekable input: No stdin streaming (zip format requires reading the central directory at the end of the file)
- No password-protected zips: Use the zip crate directly for encrypted archives
Extraction Behavior
- Partial state in Streaming mode: If extraction fails midway, already-extracted files remain on disk. Use ExtractionMode::ValidateFirst to validate before writing.
- Filters not applied during validation: In ValidateFirst mode, limits are checked against ALL entries. Filtered entries still count toward limits. This is conservative: validation may reject archives that would succeed with filtering.
Security Scope
These threats are not fully addressed (by design or complexity):
| Limitation | Reason |
|---|---|
| Case-insensitive collisions | On Windows/macOS, File.txt and file.txt map to the same file. We don't track extracted names to detect this. |
| Unicode normalization | café (NFC) vs café (NFD) appear identical but are different bytes. Full normalization requires ICU. |
| TOCTOU race conditions | Between path validation and file creation, a symlink could theoretically be created. Mitigated by secure overwrite, but not fully atomic. |
| Sparse file attacks | Not applicable to zip format. |
| Hard links | Zip format doesn't support hard links. |
| Device files | Zip format doesn't support special device files. |
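Since case-insensitive collisions are left to the caller, one option is to pre-scan entry names before extraction. The sketch below is an assumption about how a caller might do that, not something safe_unzip provides. `find_case_collision` is a hypothetical helper, and `to_lowercase` only approximates the case folding that Windows and macOS filesystems actually apply.

```rust
use std::collections::HashMap;

/// Returns the first pair of entry names that differ only by case, if any.
/// Hypothetical caller-side helper; exact duplicates are not reported.
fn find_case_collision<'a>(names: &[&'a str]) -> Option<(&'a str, &'a str)> {
    let mut seen: HashMap<String, &'a str> = HashMap::new();
    for &name in names {
        // Approximation: real filesystems use their own case-folding tables.
        let key = name.to_lowercase();
        match seen.get(&key).copied() {
            Some(earlier) if earlier != name => return Some((earlier, name)),
            Some(_) => {}
            None => {
                seen.insert(key, name);
            }
        }
    }
    None
}
```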
Filename Restrictions
These filenames are rejected for security:
- Control characters (including null bytes)
- Backslashes (\), to prevent Windows path separator confusion
- Paths longer than 1024 bytes
- Path components longer than 255 bytes
- Windows reserved names: CON, PRN, AUX, NUL, COM1-9, LPT1-9
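The rules above can be sketched as a single validation function. This is an illustration of the listed restrictions, not the crate's code: `NameError`, `is_windows_reserved`, and `validate_name` are hypothetical names, and treating `NUL.txt` as reserved (matching on the part before the first dot) is an assumption based on how Windows behaves.

```rust
/// Hypothetical error type covering the restrictions listed above.
#[derive(Debug, PartialEq)]
enum NameError {
    ControlChar,
    Backslash,
    PathTooLong,
    ComponentTooLong,
    ReservedName,
}

/// True for CON, PRN, AUX, NUL, COM1-9, LPT1-9 (case-insensitive).
/// Assumption: an extension does not help, so `NUL.txt` is reserved too.
fn is_windows_reserved(component: &str) -> bool {
    let stem = component.split('.').next().unwrap_or("").to_ascii_uppercase();
    match stem.as_str() {
        "CON" | "PRN" | "AUX" | "NUL" => true,
        s if s.len() == 4 && (s.starts_with("COM") || s.starts_with("LPT")) => {
            matches!(s.as_bytes()[3], b'1'..=b'9')
        }
        _ => false,
    }
}

/// Applies the checks in order and reports the first violation.
fn validate_name(name: &str) -> Result<(), NameError> {
    if name.chars().any(|c| c.is_control()) {
        return Err(NameError::ControlChar);
    }
    if name.contains('\\') {
        return Err(NameError::Backslash);
    }
    if name.len() > 1024 {
        return Err(NameError::PathTooLong);
    }
    for component in name.split('/') {
        if component.len() > 255 {
            return Err(NameError::ComponentTooLong);
        }
        if is_windows_reserved(component) {
            return Err(NameError::ReservedName);
        }
    }
    Ok(())
}
```

Lengths are measured in bytes (`str::len`), matching the byte limits in the list.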
License
MIT OR Apache-2.0