§sulfite
§Overview
sulfite is a high-level S3 client built on the AWS SDK for Rust, aimed at better ease of use, reliability, and bandwidth saturation (>50 Gbps).
The name: sulfite (SO3^2-) is an anion, implying a companion to some other cation (the application), and is commonly used as a preservative in wines and dried fruits (preserve to S3). It’s S3 with an O in the middle, a play on oxidation.
§Motivation
The AWS SDK is a little too low-level for users to easily take advantage of concurrency and parallelism. The main challenges are:
- You need to orchestrate the parallel multipart download & upload for large files yourself.
- The built-in retry settings are too basic (limited to HTTP status codes, with none for bytestream errors); this crate allows installing higher-level retries.
- The async API fits poorly with the filesystem for high-throughput operations, particularly when streaming small chunks from/to disk.
To address them, we provide implementations for the parallel multipart download & upload, and higher-level retries. We also make sure the on-disk file is adequately buffered to avoid async-sync overhead.
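To make the orchestration problem concrete, here is a minimal sketch of the first step of a parallel multipart download: splitting an object into byte ranges that can be fetched concurrently. The names `PartRange`, `plan_part_ranges`, and `range_header` are hypothetical and are not part of sulfite's API; the sketch only assumes that each part is requested with an HTTP `Range: bytes=start-end` header, whose end offset is inclusive.

```rust
/// Inclusive byte range for one part of an object.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartRange {
    pub start: u64,
    pub end: u64, // inclusive, matching HTTP Range semantics
}

/// Splits an object of `object_size` bytes into consecutive ranges of at
/// most `part_size` bytes each. The last range may be shorter.
pub fn plan_part_ranges(object_size: u64, part_size: u64) -> Vec<PartRange> {
    assert!(part_size > 0, "part_size must be non-zero");
    let mut ranges = Vec::new();
    let mut start = 0u64;
    while start < object_size {
        // Exclusive end of this part, clamped to the object size.
        let end_exclusive = start.saturating_add(part_size).min(object_size);
        ranges.push(PartRange { start, end: end_exclusive - 1 });
        start = end_exclusive;
    }
    ranges
}

/// Formats a range as the value of an HTTP `Range` header.
pub fn range_header(range: PartRange) -> String {
    format!("bytes={}-{}", range.start, range.end)
}
```

Each planned range would then be handed to its own download task, and the results written to the matching offset of the destination file. An empty object yields no ranges, so callers need to handle that case separately.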
Low-level access: S3Client exposes the underlying AWS SDK client as the public field inner, so you can call SDK operations not covered by this crate.
§Testing
Integration tests use LocalStack for S3.
- Local: Start LocalStack, then run the ignored integration tests:
    docker run --rm -it -p 4566:4566 -e SERVICES=s3 localstack/localstack
    cargo test -p sulfite --test localstack -- --ignored
- CI: GitHub Actions runs LocalStack as a service container and runs the same tests (see .github/workflows/ci.yml).
§Structs
- CommonPrefixInfo: A common prefix from a list_objects_v2 response (delimiter-based “directory”).
- ListObjectsV2PageIter: Page-by-page iterator for list_objects_v2. Yields one page at a time; retries are applied per page request, so a failure on one page does not invalidate the iterator. MaxKeys is not set (SDK default, typically 1000 keys per page).
- ObjectInfo: Metadata for an S3 object (from HEAD or LIST).
- RetryConfig: Configuration for retry behavior (max retries, strategy, and which client status codes to retry). Use RetryConfig::default for default retry behavior (no high-level retries).
- S3Client
- S3ClientConfig: Configuration for the underlying AWS S3 client (region, endpoint, credentials, timeouts).
§Enums
- RetryStrategy: Configuration for how failed operations are retried.
- S3Error
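As an illustration of what a retry strategy bounded by a maximum retry count can look like, the sketch below implements capped exponential backoff. It is an assumption for illustration, not a description of the variants RetryStrategy actually offers; `backoff_delay` and `retry_with_backoff` are hypothetical names, and the loop is synchronous for brevity, whereas an async client would await a timer instead of blocking the thread.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// capped at `cap`. Hypothetical helper, not part of sulfite.
pub fn backoff_delay(base: Duration, attempt: u32, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

/// Runs `op` until it succeeds or `max_retries` retries have been used.
/// With `max_retries = 0` the operation is attempted exactly once.
pub fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    cap: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                sleep(backoff_delay(base, attempt, cap));
                attempt += 1;
            }
        }
    }
}
```

A production version would usually also check whether the error is retriable before sleeping and add jitter to the delay so that many clients do not retry in lockstep.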
§Constants
- DEFAULT_READ_TIMEOUT: Default read timeout in seconds for the underlying HTTP client (boto default).
- DEFAULT_RETRIABLE_CLIENT_STATUS_CODES: Default HTTP status codes treated as retriable client errors (408 Request Timeout, 429 Too Many Requests). Error code SlowDown is also retried.
- DEFAULT_RETRIABLE_CLIENT_STATUS_CODES_STR: Comma-separated default for CLI; must match DEFAULT_RETRIABLE_CLIENT_STATUS_CODES.
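The requirement that the comma-separated string matches the list of codes, and the rule that SlowDown is retried regardless of status, can be sketched as follows. The constants below are local stand-ins whose values (408, 429) come from the descriptions above; their names, their types, and the helpers `parse_status_codes` and `is_retriable_client_error` are assumptions for illustration, not the crate's definitions.

```rust
use std::num::ParseIntError;

// Local stand-ins for the crate's constants; the u16 element type is a guess.
pub const RETRIABLE_CODES: [u16; 2] = [408, 429];
pub const RETRIABLE_CODES_STR: &str = "408,429";

/// Parses a comma-separated list of HTTP status codes such as "408,429".
pub fn parse_status_codes(s: &str) -> Result<Vec<u16>, ParseIntError> {
    s.split(',').map(|token| token.trim().parse::<u16>()).collect()
}

/// Returns true if a failed request should be retried: either its status
/// code is in `retriable`, or the service error code is `SlowDown`.
pub fn is_retriable_client_error(
    status: u16,
    error_code: Option<&str>,
    retriable: &[u16],
) -> bool {
    retriable.contains(&status) || error_code == Some("SlowDown")
}
```

A unit test asserting that `parse_status_codes(RETRIABLE_CODES_STR)` equals `RETRIABLE_CODES` is one cheap way to keep two such constants from drifting apart.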
§Functions
- generate_random_hex: Returns a string of n_digits random lowercase hex characters (e.g. for temp file suffixes).
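For readers curious how such a helper can work, here is a standard-library-only sketch. It is not the crate's implementation: the function name `random_hex_sketch` is hypothetical, and it borrows randomness from the randomly seeded keys of `std::collections::hash_map::RandomState`, which is adequate for temp file suffixes but not for anything security-sensitive.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Returns `n_digits` lowercase hex characters.
/// Illustrative sketch only; not cryptographically secure.
pub fn random_hex_sketch(n_digits: usize) -> String {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(n_digits);
    while out.len() < n_digits {
        // Every RandomState carries different keys, so the hash of empty
        // input differs from one instance to the next.
        let mut word = RandomState::new().build_hasher().finish();
        // A 64-bit word provides up to 16 hex digits, 4 bits at a time.
        for _ in 0..16 {
            if out.len() == n_digits {
                break;
            }
            out.push(HEX[(word & 0xf) as usize] as char);
            word >>= 4;
        }
    }
    out
}
```

For example, `format!("download.{}.tmp", random_hex_sketch(8))` would produce a temp file name with an 8-character suffix, which keeps concurrent downloads to the same directory from colliding in practice.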