# s3-unspool
s3-unspool is a Rust crate for fast, streaming extraction of large ZIP
archives from S3 into S3 prefixes.
This crate README focuses on the Rust API surface and the behavior a library
consumer needs to know before embedding s3-unspool.
The crate is designed for large archives and environments with limited local
storage. It reads the source ZIP with ranged S3 GetObject calls, and streams
extracted files directly into S3 PutObject requests or a local directory. The
crate also includes zip helpers that stream a local directory or existing S3
prefix into a local or S3 ZIP and embed the catalog used by later incremental
extracts.
For CLI usage, benchmark tooling, repository layout, and economics discussion, see the project README.
## Install
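Assuming the crate is published on crates.io under its repository name, a `Cargo.toml` entry along these lines should work (the version number is a placeholder; check crates.io for the current release):

```toml
[dependencies]
s3-unspool = "<version>"
```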
## Examples
These examples show the most common S3 workflows. Local endpoint combinations
are available through the same crate and through the s3-unspool CLI.
### Extract an S3 ZIP to a Destination Prefix
Use `sync_zip_to_s3` when the ZIP already exists in S3. The source archive is
read with ranged S3 requests, and missing or changed entries are streamed
directly into the destination prefix.
```rust
use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
// The s3-unspool import path and argument shape below are illustrative;
// check the crate docs for the exact signature.
use s3_unspool::{sync_zip_to_s3, SyncOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = aws_config::defaults(BehaviorVersion::latest()).load().await;
    let client = Client::new(&config);

    // Stream s3://src-bucket/archives/site.zip into s3://dest-bucket/restored/.
    sync_zip_to_s3(
        &client,
        "src-bucket",
        "archives/site.zip",
        "dest-bucket",
        "restored/",
        SyncOptions::default(),
    )
    .await?;
    Ok(())
}
```
### Extract Selected ZIP Entries
Use `SyncOptions::with_selection()` when only part of an archive should be restored.
Selection patterns use gitignore-style syntax and are matched against normalized
ZIP paths before source range planning, so s3-unspool only plans the source
blocks the selection requires. Exclude-only selections restore every
non-excluded ZIP entry.
```rust
use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
// Import paths, constructors, and patterns below are illustrative;
// check the crate docs for the exact API.
use s3_unspool::{sync_zip_to_s3, SyncOptions, UnzipSelection};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = aws_config::defaults(BehaviorVersion::latest()).load().await;
    let client = Client::new(&config);

    // Restore only image entries, skipping thumbnails.
    let selection = UnzipSelection::new(["assets/images/**", "!assets/images/thumbs/**"]);
    let options = SyncOptions::default().with_selection(selection);

    sync_zip_to_s3(
        &client,
        "src-bucket",
        "archives/site.zip",
        "dest-bucket",
        "restored/",
        options,
    )
    .await?;
    Ok(())
}
```
Selected extracts cannot be combined with `delete_extra_objects()`, because
unselected destination objects are outside the restore scope.
### Upload a Directory as a Cataloged ZIP
Use `upload_directory_zip_to_s3` when you want the crate to produce the source
ZIP and embed the catalog used by later incremental extracts. Empty local
directories are written as ZIP directory entries.
```rust
use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
// The import path and argument shape below are illustrative;
// check the crate docs for the exact signature.
use s3_unspool::upload_directory_zip_to_s3;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = aws_config::defaults(BehaviorVersion::latest()).load().await;
    let client = Client::new(&config);

    // Zip ./site into s3://dest-bucket/archives/site.zip with an embedded catalog.
    upload_directory_zip_to_s3(
        &client,
        "./site",
        "dest-bucket",
        "archives/site.zip",
        Default::default(),
    )
    .await?;
    Ok(())
}
```
### Upload an S3 Prefix as a Cataloged ZIP
Use `zip_s3_prefix_to_s3` when the source files already live in S3 and should be
snapshotted into a ZIP object without local object storage. Use
`zip_s3_prefix_to_file` when the ZIP destination should be a local file instead.
```rust
use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
// The import path and argument shape below are illustrative;
// check the crate docs for the exact signature.
use s3_unspool::zip_s3_prefix_to_s3;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = aws_config::defaults(BehaviorVersion::latest()).load().await;
    let client = Client::new(&config);

    // Snapshot s3://src-bucket/site/ into s3://dest-bucket/archives/site.zip.
    zip_s3_prefix_to_s3(
        &client,
        "src-bucket",
        "site/",
        "dest-bucket",
        "archives/site.zip",
        Default::default(),
    )
    .await?;
    Ok(())
}
```
## Behavior
The high-level extraction contract is:
- Reads source ZIP data with ranged S3 `GetObject` requests.
- Lists the destination prefix once with `ListObjectsV2`.
- Uses listed destination ETags instead of per-object `HeadObject` calls.
- Uploads missing files with `If-None-Match: *`.
- Uploads changed files with `If-Match: <listed destination ETag>`.
- Can selectively extract gitignore-style ZIP path patterns with
  `UnzipSelection`; selection is applied before source range planning.
- Optionally deletes destination objects that are not present in the ZIP.
- Supports Stored, Deflate, and Zstandard (method 93) ZIP entries when default
  features are enabled.
- Uploads generated source ZIPs with S3 multipart upload.
- Uploads existing S3 prefixes into generated ZIPs without local object storage.
- Preserves ZIP directory entries and zero-byte S3 folder marker objects.
- Emits optional progress events to handlers configured on upload options.
- Can force the fallback extract-and-hash path with
  `SyncOptions::force_hash_comparison()`.
- Can fail fast on destination write races with
  `SyncOptions::fail_on_conflict()` or `LocalZipSyncOptions::fail_on_conflict()`.
- Keeps source ZIP blocks in a bounded memory window and replays cached blocks
  across destination `PutObject` retries when they are still resident.
- Exposes `SyncOptions::with_put_concurrency()` and
  `SyncOptions::with_put_retry_policy()` for destination write backoff,
  including shared throttling for S3 `SlowDown`.
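The per-entry write decision implied by the first few bullets can be sketched as pure logic. The names below are illustrative, not crate API: each ZIP entry's content MD5 is compared against the ETag listed for the destination key, missing keys get a conditional create, and mismatched keys get a conditional overwrite.

```rust
use std::collections::HashMap;

/// Illustrative action plan for one ZIP entry; not part of the s3-unspool API.
#[derive(Debug, PartialEq)]
enum Action {
    PutIfNoneMatch,     // key missing: upload with `If-None-Match: *`
    PutIfMatch(String), // content changed: upload with `If-Match: <listed ETag>`
    Skip,               // unchanged: no destination request at all
}

/// Decide what to do for one entry. `zip_md5` is the entry's content MD5;
/// `listing` maps destination key -> ETag from the single ListObjectsV2 pass.
fn plan(key: &str, zip_md5: &str, listing: &HashMap<String, String>) -> Action {
    match listing.get(key) {
        None => Action::PutIfNoneMatch,
        Some(etag) if etag == zip_md5 => Action::Skip,
        Some(etag) => Action::PutIfMatch(etag.clone()),
    }
}

fn main() {
    let mut listing = HashMap::new();
    listing.insert("a.txt".to_string(), "aaa".to_string());
    listing.insert("b.txt".to_string(), "old".to_string());

    assert_eq!(plan("a.txt", "aaa", &listing), Action::Skip);
    assert_eq!(plan("b.txt", "new", &listing), Action::PutIfMatch("old".into()));
    assert_eq!(plan("c.txt", "ccc", &listing), Action::PutIfNoneMatch);
}
```

The conditional headers make each write safe against races: a concurrent writer changes the listed ETag, so the stale `If-Match` upload fails instead of silently clobbering.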
## Required S3 Permissions
Extraction needs:
| Scope | Permission | Why |
|---|---|---|
| Source ZIP object | `s3:GetObject` | Read ZIP metadata and ranged source bytes. |
| Destination bucket | `s3:ListBucket` | List destination keys and ETags once. |
| Destination prefix | `s3:PutObject` | Write missing and changed objects. |
| Destination prefix | `s3:GetObject` | Authorize conditional overwrites with `If-Match`. |
| Destination prefix | `s3:DeleteObject` | Only needed when `delete_extra_objects()` is enabled. |
S3-prefix upload additionally needs `s3:ListBucket` for the source bucket,
`s3:GetObject` for the source prefix, and multipart-upload write permissions for
the destination ZIP object.
The destination `s3:GetObject` permission is required even though
s3-unspool does not issue per-file destination `HeadObject` requests or read
destination object bodies. S3 authorizes `PutObject` requests with
`If-Match: <etag>` against object-read permission; without destination
`s3:GetObject`, changed files are rejected with `AccessDenied`.
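Put together, an extraction policy might look like the following sketch. Bucket names and prefixes are placeholders, and the `s3:DeleteObject` statement is omitted since it is only needed when `delete_extra_objects()` is enabled:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::src-bucket/archives/site.zip"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::dest-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::dest-bucket/restored/*"
    }
  ]
}
```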
## Advanced Usage
Use `sync_zip_to_s3_with_clients` when source ranged reads and destination
streaming writes should use different S3 client configuration. This is useful
for high-concurrency extraction, where a destination request body can pause
while waiting for planned ZIP bytes. Configure the destination client with AWS
SDK upload stalled-stream protection relaxed or disabled, and keep download
stalled-stream protection enabled for source reads.
Use `inspect_s3_zip` to read source ZIP size and file count before choosing
memory settings. `adaptive_source_get_concurrency` and `AdaptiveSourceWindow`
can then derive scheduler settings for memory-bounded runtimes such as Lambda.
ZIPs created by `upload_directory_zip_to_s3` include an embedded catalog at
`.s3-unspool/catalog.v1.json`. The catalog stores each file path and MD5 digest
so later extracts can skip unchanged files before decompressing them.
Generated ZIPs use Deflate for regular file entries by default. With default
features enabled, use `with_compression(ZipCompression::Zstd)` on the relevant
ZIP option type to write Zstandard (method 93) entries. Use
`default-features = false` to compile without Zstd support. Zstd-in-ZIP support
is not universal in OS-native ZIP tools, so prefer Deflate when broad
compatibility matters.
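For example, opting out of the Zstd feature is a dependency-level setting; an entry along these lines should work (the version number is a placeholder):

```toml
[dependencies]
s3-unspool = { version = "<version>", default-features = false }
```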
Directory markers are preserved explicitly. ZIP directory entries such as
`assets/empty/` extract to zero-byte S3 objects with trailing-slash keys, and
empty local directories or zero-byte S3 keys ending in `/` upload as ZIP
directory entries. Nonzero S3 objects ending in `/` are rejected as ambiguous.
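The marker rules above reduce to a small decision function. This is an illustrative sketch, not crate API:

```rust
#[derive(Debug, PartialEq)]
enum EntryKind {
    DirectoryMarker, // zero-byte, trailing-slash key <-> ZIP directory entry
    File,
}

/// Classify an S3 key by the rules above: zero-byte trailing-slash keys
/// round-trip as directory entries; nonzero trailing-slash objects are
/// ambiguous and rejected.
fn classify(key: &str, size: u64) -> Result<EntryKind, String> {
    if key.ends_with('/') {
        if size == 0 {
            Ok(EntryKind::DirectoryMarker)
        } else {
            Err(format!("ambiguous nonzero object with trailing-slash key: {key}"))
        }
    } else {
        Ok(EntryKind::File)
    }
}

fn main() {
    assert_eq!(classify("assets/empty/", 0), Ok(EntryKind::DirectoryMarker));
    assert_eq!(classify("assets/logo.png", 1024), Ok(EntryKind::File));
    assert!(classify("assets/bad/", 10).is_err());
}
```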
## Assumptions
- Destination objects are written with single-part `PutObject` requests, not
  multipart upload.
- Destination ETags are MD5 hashes of object content.
- Multipart destination objects and SSE-C destination ETags are out of scope
  for comparison.
- S3-prefix upload rejects nonzero objects whose keys end in `/`.
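The ETag assumption can be made concrete: a single-part `PutObject` ETag is a plain MD5 (32 hex characters), while a multipart ETag carries a `-<part count>` suffix and is not a content MD5, which is why multipart destinations fall outside the comparison scope. An illustrative shape check, not crate API:

```rust
/// True when an ETag has the shape of a single-part MD5 digest
/// (32 hex characters, optionally quoted as S3 returns it).
fn is_plain_md5_etag(etag: &str) -> bool {
    let e = etag.trim_matches('"');
    e.len() == 32 && e.bytes().all(|b| b.is_ascii_hexdigit())
}

fn main() {
    // Single-part ETag: comparable against a catalog MD5.
    assert!(is_plain_md5_etag("\"d41d8cd98f00b204e9800998ecf8427e\""));
    // Multipart ETag ("<md5-of-part-md5s>-<parts>"): not a content MD5.
    assert!(!is_plain_md5_etag("\"9b2cf535f27731c974343645a3985328-5\""));
}
```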
The CLI and Lambda harness live in the repository workspace, but they are not
included in the published s3-unspool crate.