s3-unspool

s3-unspool is a Rust crate for fast, streaming extraction of large ZIP archives from S3 into S3 prefixes, using S3 conditional writes.

This crate README focuses on the Rust API surface and the behavior a library consumer needs to know before embedding s3-unspool.

The crate is designed for large archives and environments with limited local storage. It reads the source ZIP with ranged S3 GetObject calls and streams extracted files directly into S3 PutObject requests or a local directory. The crate also includes ZIP helpers that stream a local directory or an existing S3 prefix into a local or S3-hosted ZIP and embed the catalog used by later incremental extracts.

For CLI usage, benchmark tooling, repository layout, and economics discussion, see the project README.

Install

cargo add s3-unspool

Examples

These examples show the most common S3 workflows. Combinations involving local sources or destinations are available through the same crate and through the s3-unspool CLI.

Extract an S3 ZIP to a Destination Prefix

Use sync_zip_to_s3 when the ZIP already exists in S3. The source archive is read with ranged S3 requests, and missing or changed entries are streamed directly into the destination prefix.

use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
use s3_unspool::{S3Object, S3Prefix, SyncOptions, sync_zip_to_s3};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    let extract = SyncOptions::new(
        S3Object::parse("s3://my-bucket/releases/site.zip")?,
        S3Prefix::parse("s3://my-bucket/www/")?,
    )
    .delete_extra_objects();

    let report = sync_zip_to_s3(&client, extract).await?;
    println!("changed files: {}", report.summary.uploaded_changed);

    Ok(())
}

Extract Selected ZIP Entries

Use SyncOptions::with_selection() when only part of an archive should be restored. Selection patterns use gitignore-style syntax and are matched against normalized ZIP paths before source range planning, so s3-unspool only plans the source blocks the selection requires. Exclude-only selections restore every non-excluded ZIP entry.

use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
use s3_unspool::{S3Object, S3Prefix, SyncOptions, UnzipSelection, sync_zip_to_s3};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    let extract = SyncOptions::new(
        S3Object::parse("s3://my-bucket/releases/site.zip")?,
        S3Prefix::parse("s3://my-bucket/www/")?,
    )
    .with_selection(
        UnzipSelection::new()
            .include("index.md")
            .include("docs/**/*.md")
            .exclude("docs/drafts/**"),
    );

    let report = sync_zip_to_s3(&client, extract).await?;
    println!("processed entries: {}", report.summary.zip_files);

    Ok(())
}

Selected extracts cannot be combined with delete_extra_objects(), because unselected destination objects are outside the restore scope.

Upload a Directory as a Cataloged ZIP

Use upload_directory_zip_to_s3 when you want the crate to produce the source ZIP and embed the catalog used by later incremental extracts. Empty local directories are written as ZIP directory entries.

use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
use s3_unspool::{S3Object, UploadOptions, upload_directory_zip_to_s3};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    let upload = UploadOptions::new(
        "./site",
        S3Object::parse("s3://my-bucket/releases/site.zip")?,
    );
    let report = upload_directory_zip_to_s3(&client, upload).await?;

    println!("uploaded {} files", report.files);

    Ok(())
}

Upload an S3 Prefix as a Cataloged ZIP

Use zip_s3_prefix_to_s3 when the source files already live in S3 and should be snapshotted into a ZIP object without staging anything on local disk. Use zip_s3_prefix_to_file when the ZIP destination should be a local file instead; a sketch of that variant follows the example below.

use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
use s3_unspool::{S3Object, S3Prefix, S3PrefixUploadOptions, zip_s3_prefix_to_s3};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    let upload = S3PrefixUploadOptions::new(
        S3Prefix::parse("s3://my-bucket/www/")?,
        S3Object::parse("s3://my-bucket/releases/site.zip")?,
    );
    let report = zip_s3_prefix_to_s3(&client, upload).await?;

    println!("uploaded {} files", report.files);

    Ok(())
}
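
For the local-file variant, a hypothetical sketch of zip_s3_prefix_to_file. Only the function name comes from the crate; the argument shape and the report field used here are assumptions, so check the crate docs for the actual options type.

use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
use s3_unspool::{S3Prefix, zip_s3_prefix_to_file};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    // Assumed signature: source prefix plus a local destination path. The real
    // function may take a dedicated options type instead.
    let report = zip_s3_prefix_to_file(
        &client,
        S3Prefix::parse("s3://my-bucket/www/")?,
        "./site.zip",
    )
    .await?;

    println!("zipped {} files", report.files);

    Ok(())
}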

Behavior

At a high level, the crate:

  • Reads source ZIP data with ranged S3 GetObject requests.
  • Lists the destination prefix once with ListObjectsV2.
  • Uses listed destination ETags instead of per-object HeadObject calls.
  • Uploads missing files with If-None-Match: *.
  • Uploads changed files with If-Match: <listed destination ETag>.
  • Can selectively extract gitignore-style ZIP path patterns with UnzipSelection; selection is applied before source range planning.
  • Optionally deletes destination objects that are not present in the ZIP.
  • Supports Stored, Deflate, and Zstandard method 93 ZIP entries when default features are enabled.
  • Uploads generated source ZIPs with S3 multipart upload.
  • Uploads existing S3 prefixes into generated ZIPs without staging objects on local disk.
  • Preserves ZIP directory entries and zero-byte S3 folder marker objects.
  • Emits optional progress events to handlers configured on upload options.
  • Can force the fallback extract-and-hash path with SyncOptions::force_hash_comparison().
  • Can fail fast on destination write races with SyncOptions::fail_on_conflict() or LocalZipSyncOptions::fail_on_conflict().
  • Keeps source ZIP blocks in a bounded memory window and replays cached blocks across destination PutObject retries when they are still resident.
  • Exposes SyncOptions::with_put_concurrency() and SyncOptions::with_put_retry_policy() for destination write backoff, including shared throttling for S3 SlowDown; see the sketch after this list.
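
Several of the tuning options above can be combined on SyncOptions. A minimal sketch, assuming force_hash_comparison() and fail_on_conflict() take no arguments and with_put_concurrency() takes a plain request count; with_put_retry_policy() is omitted because its policy type is not shown here.

use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
use s3_unspool::{S3Object, S3Prefix, SyncOptions, sync_zip_to_s3};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    // Builder calls below follow the names in the list above; the argument to
    // with_put_concurrency() is assumed to be a plain count. Check the crate
    // docs for the exact signatures.
    let extract = SyncOptions::new(
        S3Object::parse("s3://my-bucket/releases/site.zip")?,
        S3Prefix::parse("s3://my-bucket/www/")?,
    )
    .force_hash_comparison()
    .fail_on_conflict()
    .with_put_concurrency(16);

    let report = sync_zip_to_s3(&client, extract).await?;
    println!("changed files: {}", report.summary.uploaded_changed);

    Ok(())
}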

Required S3 Permissions

Extraction needs:

Scope               Permission       Why
Source ZIP object   s3:GetObject     Read ZIP metadata and ranged source bytes.
Destination bucket  s3:ListBucket    List destination keys and ETags once.
Destination prefix  s3:PutObject     Write missing and changed objects.
Destination prefix  s3:GetObject     Authorize conditional overwrites with If-Match.
Destination prefix  s3:DeleteObject  Only needed when delete_extra_objects() is enabled.

S3-prefix upload additionally needs s3:ListBucket for the source bucket, s3:GetObject for the source prefix, and multipart-upload write permissions for the destination ZIP object.

The destination s3:GetObject permission is required even though s3-unspool does not issue per-file destination HeadObject requests or read destination object bodies. S3 checks object-read permission when it authorizes a PutObject request that carries If-Match: <etag>, so without destination s3:GetObject, overwrites of changed files are rejected with AccessDenied.

Advanced Usage

Use sync_zip_to_s3_with_clients when source ranged reads and destination streaming writes should use different S3 client configuration. This is useful for high-concurrency extraction, where a destination request body can pause while waiting for planned ZIP bytes. Configure the destination client with AWS SDK upload stalled-stream protection relaxed or disabled, and keep download stalled-stream protection enabled for source reads.
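
A sketch of the two-client setup. The stalled-stream configuration uses the AWS SDK for Rust API; the argument order of sync_zip_to_s3_with_clients (source client, destination client, options) is an assumption.

use aws_config::BehaviorVersion;
use aws_config::stalled_stream_protection::StalledStreamProtectionConfig;
use aws_sdk_s3::Client;
use s3_unspool::{S3Object, S3Prefix, SyncOptions, sync_zip_to_s3_with_clients};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let shared = aws_config::load_defaults(BehaviorVersion::latest()).await;

    // The source client keeps the SDK's default stalled-stream protection for
    // ranged source reads.
    let source_client = Client::new(&shared);

    // The destination client disables stalled-stream protection so a PutObject
    // body may pause while waiting for planned ZIP bytes.
    let destination_config = aws_sdk_s3::config::Builder::from(&shared)
        .stalled_stream_protection(StalledStreamProtectionConfig::disabled())
        .build();
    let destination_client = Client::from_conf(destination_config);

    let extract = SyncOptions::new(
        S3Object::parse("s3://my-bucket/releases/site.zip")?,
        S3Prefix::parse("s3://my-bucket/www/")?,
    );

    // Assumed argument order: source client, destination client, options.
    let report =
        sync_zip_to_s3_with_clients(&source_client, &destination_client, extract).await?;
    println!("changed files: {}", report.summary.uploaded_changed);

    Ok(())
}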

Use inspect_s3_zip to read source ZIP size and file count before choosing memory settings. adaptive_source_get_concurrency and AdaptiveSourceWindow can then derive scheduler settings for memory-bounded runtimes such as Lambda.
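
A hypothetical sketch of the inspection pass. Only the inspect_s3_zip name comes from the crate; the call signature and the assumption that its report can be debug-printed are illustrative.

use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
use s3_unspool::{S3Object, inspect_s3_zip};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    // Assumed signature: client plus source object. The report is debug-printed
    // here on the assumption that it derives Debug; its size and file-count
    // fields are not spelled out in this sketch.
    let report = inspect_s3_zip(
        &client,
        S3Object::parse("s3://my-bucket/releases/site.zip")?,
    )
    .await?;
    println!("{report:?}");

    Ok(())
}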

ZIPs created by upload_directory_zip_to_s3 include an embedded catalog at .s3-unspool/catalog.v1.json. The catalog stores each file path and MD5 digest so later extracts can skip unchanged files before decompressing them.

Generated ZIPs use Deflate for regular file entries by default. With default features enabled, use with_compression(ZipCompression::Zstd) on the relevant ZIP option type to write Zstandard method 93 entries. Use default-features = false to compile without Zstd support. Zstd-in-ZIP support is not universal in OS-native ZIP tools, so prefer Deflate when broad compatibility matters.
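
For example, switching a directory upload to Zstandard entries might look like the sketch below. It assumes with_compression() is available on UploadOptions and that ZipCompression is exported at the crate root, per the description above.

use aws_config::BehaviorVersion;
use aws_sdk_s3::Client;
use s3_unspool::{S3Object, UploadOptions, ZipCompression, upload_directory_zip_to_s3};

#[tokio::main]
async fn main() -> s3_unspool::Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    // Writes Zstandard method 93 entries; requires the crate's default features
    // (Zstd support) to be enabled.
    let upload = UploadOptions::new(
        "./site",
        S3Object::parse("s3://my-bucket/releases/site.zip")?,
    )
    .with_compression(ZipCompression::Zstd);

    let report = upload_directory_zip_to_s3(&client, upload).await?;
    println!("uploaded {} files", report.files);

    Ok(())
}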

Directory markers are preserved explicitly. ZIP directory entries such as assets/empty/ extract to zero-byte S3 objects with trailing-slash keys, and empty local directories or zero-byte S3 keys ending in / upload as ZIP directory entries. Non-empty S3 objects whose keys end in / are rejected as ambiguous.

Assumptions

  • Destination objects are written with single-part PutObject requests, not multipart upload.
  • Destination ETags are MD5 hashes of object content.
  • Multipart destination objects and SSE-C destination ETags are out of scope for comparison.
  • S3-prefix upload rejects non-empty objects whose keys end in /.

The CLI and Lambda harness live in the repository workspace, but they are not included in the published s3-unspool crate.