# s-zip

```
███████╗      ███████╗██╗██████╗
██╔════╝      ╚══███╔╝██║██╔══██╗
███████╗█████╗  ███╔╝ ██║██████╔╝
╚════██║╚════╝ ███╔╝  ██║██╔═══╝
███████║      ███████╗██║██║
╚══════╝      ╚══════╝╚═╝╚═╝
```
s-zip is a streaming ZIP reader and writer designed for backend systems that need
to process large archives with minimal memory usage.
The focus is not on end-user tooling, but on providing a reliable ZIP building block for servers, batch jobs, and data pipelines.
## Why s-zip?
Most ZIP libraries assume small files or in-memory buffers.
s-zip is built around streaming from day one.
- Constant memory usage
- Suitable for very large files
- Works well in containers and memory-constrained environments
- Designed for backend and data-processing workloads
## Key Features
- Streaming ZIP writer (no full buffering)
- Arbitrary writer support (File, Vec, network streams, etc.)
- Streaming ZIP reader with minimal memory footprint
- ZIP64 support for files >4GB
- Multiple compression methods: DEFLATE, Zstd (optional)
- Predictable memory usage: ~2-5 MB constant with 1MB buffer threshold
- High performance: Zstd 3x faster than DEFLATE with 11-27x better compression
- Rust safety guarantees
- Backend-friendly API
## Non-goals
- Not a CLI replacement for zip/unzip
- Not focused on desktop or interactive usage
- Not optimized for small-file convenience
## Typical Use Cases
- Generating large ZIP exports on the server
- Packaging reports or datasets
- Data pipelines and batch jobs
- Infrastructure tools that require ZIP as an intermediate format
## Performance Highlights
Based on comprehensive benchmarks (see BENCHMARK_RESULTS.md):
| Metric | DEFLATE level 6 | Zstd level 3 | Improvement |
|---|---|---|---|
| Speed (1MB) | 610 MiB/s | 2.0 GiB/s | 3.3x faster ⚡ |
| File Size (1MB compressible) | 3.16 KB | 281 bytes | 11x smaller 🗜️ |
| File Size (10MB compressible) | 29.97 KB | 1.12 KB | 27x smaller 🗜️ |
| Memory Usage | 2-5 MB constant | 2-5 MB constant | Same ✓ |
| CPU Usage | Moderate | Low-Moderate | Better ✓ |
Key Benefits:
- ✅ No temp files - Direct streaming saves disk I/O
- ✅ ZIP64 support for files >4GB
- ✅ Zstd compression: faster + smaller than DEFLATE
- ✅ Constant memory usage regardless of archive size
## Quick Start
Add this to your `Cargo.toml`:

```toml
[dependencies]
s-zip = "0.3"

# Optional: Enable Zstd compression support
# s-zip = { version = "0.3", features = ["zstd-support"] }
```
## Optional Features

- `zstd-support`: Enables Zstd compression (method 93) for reading and writing ZIP files with better compression ratios. This adds the `zstd` crate as a dependency.
## Reading a ZIP file

```rust
use s_zip::StreamingZipReader;
```
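A minimal sketch of the reading flow. Only the `StreamingZipReader` type name comes from this README; the `open` constructor and the `next_entry`/`name` calls below are hypothetical placeholders, so check the crate docs and `examples/` for the actual API.

```rust
use s_zip::StreamingZipReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical constructor and iteration API (names are placeholders):
    // open the archive, then stream entries one at a time so only a small
    // buffer is resident, matching the streaming design described above.
    let mut reader = StreamingZipReader::open("archive.zip")?;
    while let Some(entry) = reader.next_entry()? {
        println!("entry: {}", entry.name());
    }
    Ok(())
}
```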
## Writing a ZIP file

```rust
use s_zip::StreamingZipWriter;
```
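A minimal end-to-end skeleton assembled from calls that appear elsewhere in this README (`StreamingZipWriter::new(path)` and `finish()`). Whether `new` returns a `Result` and how individual entries are added are assumptions; see the crate's `examples/` for the entry API.

```rust
use s_zip::StreamingZipWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // File-backed writer: compression streams to disk with the constant
    // ~2-5 MB memory profile described above.
    let mut writer = StreamingZipWriter::new("export.zip")?;

    // ... add files via the crate's entry API (not shown here) ...

    // Write the central directory and flush the archive.
    writer.finish()?;
    Ok(())
}
```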
## Custom compression level

```rust
use s_zip::StreamingZipWriter;

// The exact arguments are assumed here; see the crate docs for the signature.
let mut writer = StreamingZipWriter::with_compression("output.zip", 9)?; // Max compression
// ... add files ...
writer.finish()?;
```
## Using Zstd compression (requires `zstd-support` feature)

```rust
use s_zip::StreamingZipWriter;
```

Note: Zstd compression provides better compression ratios than DEFLATE but may have slower decompression on some systems. The reader will automatically detect and decompress Zstd-compressed entries when the `zstd-support` feature is enabled.
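A rough sketch of selecting Zstd for new entries. This README confirms that Zstd is exposed as ZIP method 93 behind the `zstd-support` feature, but not the constructor name, so `with_zstd` and its level argument below are placeholders only.

```rust
use s_zip::StreamingZipWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder constructor: ask the writer to compress entries with
    // Zstd (ZIP method 93) at level 3. The real call may differ.
    let mut writer = StreamingZipWriter::with_zstd("logs.zip", 3)?;

    // ... add entries as in the DEFLATE examples ...

    writer.finish()?;
    Ok(())
}
```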
Using Arbitrary Writers (Advanced)
NEW in v0.3.0: s-zip now supports writing to any type that implements Write + Seek, not just files. This enables:
- In-memory ZIP creation (Vec, Cursor)
- Network streaming (TCP streams with buffering)
- Custom storage backends (S3, databases, etc.)
use StreamingZipWriter;
use Cursor;
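A short sketch of the in-memory path. The `from_writer()` constructor is named in the table below; whether it returns a `Result` and whether `finish()` hands back the underlying writer are assumptions, so adapt the last two lines to the crate's real API.

```rust
use std::io::Cursor;

use s_zip::StreamingZipWriter;

fn build_small_zip_in_memory() -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    // Back the writer with an in-memory cursor. Per the warning below, the
    // whole compressed archive accumulates in this Vec, so keep it small.
    let mut writer = StreamingZipWriter::from_writer(Cursor::new(Vec::new()))?;

    // ... add entries just like with a file-backed writer ...

    // Assumption: `finish()` returns the underlying writer so the bytes can
    // be recovered; the real return type may differ.
    let cursor = writer.finish()?;
    Ok(cursor.into_inner())
}
```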
⚠️ IMPORTANT - Memory Usage by Writer Type:

| Writer Type | Memory Usage | Best For |
|---|---|---|
| File (`StreamingZipWriter::new(path)`) | ✅ ~2-5 MB constant | Large files, production use |
| Network streams (TCP, pipes) | ✅ ~2-5 MB constant | Streaming over network |
| `Vec`/`Cursor` (`from_writer()`) | ⚠️ ENTIRE ZIP IN RAM | Small archives only (<100 MB) |
⚠️ Critical Warning for Vec/Cursor:

When using `Vec<u8>` or `Cursor<Vec<u8>>` as the writer, the entire compressed ZIP file will be stored in memory. While the compressor still uses only ~2-5 MB for its internal buffer, the final output accumulates in the `Vec`. Only use this for small archives or when you have sufficient RAM.
Recommended approach for large files:

- Use `StreamingZipWriter::new(path)` to write to disk (constant ~2-5 MB memory)
- Use network streams for real-time transmission
- Reserve `Vec<u8>`/`Cursor` for small temporary ZIPs (<100 MB)
The implementation uses a 1 MB buffer threshold to periodically flush compressed data to the writer, keeping compression memory low (~2-5 MB) for all writer types. However, in-memory writers like `Vec<u8>` will still accumulate the full output.

See `examples/arbitrary_writer.rs` for more examples.
## Supported Compression Methods
| Method | Description | Default | Feature Flag | Best For |
|---|---|---|---|---|
| DEFLATE (8) | Standard ZIP compression | ✓ | Always available | Text, source code, JSON, XML, CSV, XLSX |
| Stored (0) | No compression | - | Always available | Already compressed files (JPG, PNG, MP4, PDF) |
| Zstd (93) | Modern compression algorithm | - | `zstd-support` | All text/data files, logs, databases |
## Compression Method Selection Guide
Use DEFLATE (default) when:
- ✅ Maximum compatibility required (all ZIP tools support it)
- ✅ Working with: text files, source code, JSON, XML, CSV, HTML, XLSX
- ✅ Standard ZIP format compliance needed
Use Zstd when:
- ⚡ Best performance: 3.3x faster compression, 11-27x better compression ratio
- ✅ Working with: server logs, database dumps, repetitive data, large text files
- ✅ Backend/internal systems (no need for legacy tool compatibility)
- ✅ Processing large volumes of data
Use Stored (no compression) when:
- ✅ Files are already compressed: JPEG, PNG, GIF, MP4, MOV, PDF, ZIP, GZ
- ✅ Need fastest possible archive creation
- ✅ CPU resources are limited
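Taken together, these rules reduce to a simple extension-based default. The helper below is a self-contained illustration, not part of the s-zip API; the extension list and the `needs_legacy_compat` flag are illustrative choices.

```rust
/// Hypothetical helper (not part of s-zip): pick a compression method for a
/// file based on its extension, following the guide above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Method {
    Stored,  // already-compressed media and archives
    Deflate, // maximum compatibility with older ZIP tools
    Zstd,    // best speed/ratio for backend-only archives
}

fn choose_method(file_name: &str, needs_legacy_compat: bool) -> Method {
    let ext = file_name
        .rsplit('.')
        .next()
        .unwrap_or("")
        .to_ascii_lowercase();
    match ext.as_str() {
        // Already compressed: recompressing wastes CPU for ~0% gain.
        "jpg" | "jpeg" | "png" | "gif" | "mp4" | "mov" | "pdf" | "zip" | "gz" => Method::Stored,
        // Text-like data: pick by the compatibility requirement.
        _ if needs_legacy_compat => Method::Deflate,
        _ => Method::Zstd,
    }
}

fn main() {
    assert_eq!(choose_method("photo.jpg", false), Method::Stored);
    assert_eq!(choose_method("report.csv", true), Method::Deflate);
    assert_eq!(choose_method("server.log", false), Method::Zstd);
}
```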
## Performance Benchmarks
s-zip includes comprehensive benchmarks to compare compression methods:
```bash
# Run all benchmarks with Zstd support
cargo bench --features zstd-support

# Or run individual benchmark suites (replace <name> with a bench target)
cargo bench --bench <name> --features zstd-support
```
Benchmarks measure:
- Compression speed: Write throughput for different compression methods and levels
- Decompression speed: Read throughput for various compressed formats
- Data patterns: Highly compressible text, random data, and mixed workloads
- File sizes: From 1KB to 10MB to test scaling characteristics
- Multiple entries: Performance with 100+ files in a single archive
Results are saved to target/criterion/ with HTML reports showing detailed statistics, comparisons, and performance graphs.
### Quick Comparison Results

#### File Size (1MB Compressible Data)
| Method | Compressed Size | Ratio | Speed |
|---|---|---|---|
| DEFLATE level 6 | 3.16 KB | 0.31% | ~610 MiB/s |
| DEFLATE level 9 | 3.16 KB | 0.31% | ~494 MiB/s |
| Zstd level 3 | 281 bytes | 0.03% | ~2.0 GiB/s ⚡ |
| Zstd level 10 | 358 bytes | 0.03% | ~370 MiB/s |
Key Insights:
- ✅ Zstd level 3 is 11x smaller and 3.3x faster than DEFLATE on repetitive data
- ✅ For 10MB data: Zstd = 1.12 KB vs DEFLATE = 29.97 KB (27x better!)
- ✅ Random data: all methods stay at ~100% of the original size (incompressible data is handled efficiently)
- ✅ Memory: ~2-5 MB constant regardless of file size
- ✅ CPU: Zstd level 3 uses less CPU than DEFLATE level 9
💡 Recommendation: Use Zstd level 3 for best performance and compression. Only use DEFLATE when compatibility with older tools is required.
📊 Full Analysis: See BENCHMARK_RESULTS.md for detailed performance data including:
- Complete speed benchmarks (1KB to 10MB)
- Memory profiling
- CPU usage analysis
- Multiple compression levels comparison
- Random vs compressible data patterns
## License
MIT License - see LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Author
Ton That Vu - @KSD-CO