Crate sulfite


§sulfite

§Overview

sulfite is a high-level S3 client built on the AWS SDK for Rust. It aims for better ease of use, reliability, and bandwidth saturation (>50 Gbps).

The name: sulfite (SO3^2-) is an anion, which implies a companion cation (the application). Sulfites are commonly used as preservatives in wines and dried fruits (here, preserving data to S3). The formula is also S3 with an O in the middle, a play on oxidation.

§Motivation

The AWS SDK is somewhat low-level, which makes it hard for users to take advantage of concurrency and parallelism. In particular:

  1. You need to orchestrate parallel multipart download and upload for large files yourself.
  2. The built-in retry settings are too basic: they are limited to HTTP status codes and do not cover bytestream errors.
  3. The async API does not fit well with the filesystem for high-throughput operations, especially when streaming small chunks to or from disk.

To address these, we provide implementations of parallel multipart download and upload, and allow installing higher-level retries. We also make sure on-disk file I/O is adequately buffered to avoid async-sync overhead.
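As a rough illustration of what orchestrating a parallel multipart transfer involves, the sketch below splits an object into inclusive byte ranges that could each be fetched by a separate ranged GET. The names PartRange and plan_part_ranges are hypothetical and are not part of sulfite's API; how the crate actually plans and schedules parts is not described here.

```rust
/// Inclusive byte range of one part, in the form used by an HTTP
/// `Range: bytes=start-end` header. Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartRange {
    pub start: u64,
    pub end: u64, // inclusive
}

/// Split an object of `object_size` bytes into consecutive ranges of at most
/// `part_size` bytes. The last range may be shorter than the others.
pub fn plan_part_ranges(object_size: u64, part_size: u64) -> Vec<PartRange> {
    assert!(part_size > 0, "part_size must be non-zero");
    let mut ranges = Vec::new();
    let mut start = 0u64;
    while start < object_size {
        // Clamp the exclusive end to the object size, then make it inclusive.
        let end = start.saturating_add(part_size).min(object_size) - 1;
        ranges.push(PartRange { start, end });
        start = end + 1;
    }
    ranges
}
```

Each range can then be handed to its own task, with the results written to the matching offset of the destination file. An empty object yields no ranges.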

Low-level access: S3Client exposes the underlying AWS SDK client as the public inner field, so you can call SDK operations not covered by this crate.

§Testing

Integration tests use LocalStack for S3.

  • Local: Start LocalStack, then run the ignored integration tests:
    docker run --rm -it -p 4566:4566 -e SERVICES=s3 localstack/localstack
    cargo test -p sulfite --test localstack -- --ignored
  • CI: GitHub Actions runs LocalStack as a service container and runs the same tests (see .github/workflows/ci.yml).

Structs§

CommonPrefixInfo
A common prefix from a list_objects_v2 response (delimiter-based “directory”).
ListObjectsV2PageIter
Page-by-page iterator for list_objects_v2. Yields one page at a time; retries are applied per page request, so a failure on one page does not invalidate the iterator. MaxKeys is not set (SDK default, typically 1000 keys per page).
ObjectInfo
Metadata for an S3 object (from HEAD or LIST).
RetryConfig
Configuration for retry behavior (max retries, strategy, and which client status codes to retry). Use RetryConfig::default for default retry behavior (no high-level retries).
S3Client
S3ClientConfig
Configuration for the underlying AWS S3 client (region, endpoint, credentials, timeouts).

Enums§

RetryStrategy
Configuration for how failed operations are retried.
S3Error

Constants§

DEFAULT_READ_TIMEOUT
Default read timeout in seconds for the underlying HTTP client (same value as the boto default).
DEFAULT_RETRIABLE_CLIENT_STATUS_CODES
Default HTTP status codes treated as retriable client errors (408 Request Timeout, 429 Too Many Requests). Error code SlowDown is also retried.
DEFAULT_RETRIABLE_CLIENT_STATUS_CODES_STR
Comma-separated default for CLI; must match DEFAULT_RETRIABLE_CLIENT_STATUS_CODES.
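To make the retriable-error rule above concrete, here is a minimal sketch of how such a check and the comma-separated CLI form might fit together. The function names and the constant below are hypothetical stand-ins, not sulfite's actual items; only the codes 408 and 429 and the SlowDown error code come from the descriptions above.

```rust
/// Stand-in for the documented defaults: 408 Request Timeout, 429 Too Many Requests.
const RETRIABLE_CLIENT_STATUS_CODES: [u16; 2] = [408, 429];

/// Returns true if a failed request looks retriable under the documented defaults:
/// either the S3 error code is `SlowDown`, or the HTTP status is in the list.
/// Server-side (5xx) handling is outside the scope of this sketch.
pub fn is_retriable_client_error(status: u16, error_code: Option<&str>) -> bool {
    error_code == Some("SlowDown") || RETRIABLE_CLIENT_STATUS_CODES.contains(&status)
}

/// Parse a comma-separated list such as "408,429" into status codes,
/// tolerating whitespace around each entry.
pub fn parse_status_codes(s: &str) -> Result<Vec<u16>, std::num::ParseIntError> {
    s.split(',').map(|token| token.trim().parse::<u16>()).collect()
}
```

Parsing the string form and comparing it with the numeric list in a unit test is one way to keep the two constants in sync.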

Functions§

generate_random_hex
Returns a string of n_digits random lowercase hex characters (e.g. for temp file suffixes).
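For readers curious how such a helper could work, the following is a std-only sketch that produces n_digits lowercase hex characters. It is not sulfite's implementation: the name random_hex_sketch is hypothetical, and the randomness comes from the per-instance keys of std's RandomState, which is adequate for temp file suffixes but not for anything security-sensitive.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Return `n_digits` random lowercase hex characters (hypothetical helper).
pub fn random_hex_sketch(n_digits: usize) -> String {
    let mut out = String::with_capacity(n_digits + 16);
    let mut counter = 0u64;
    while out.len() < n_digits {
        // Each new RandomState is keyed differently, so hashing a counter
        // yields a fresh pseudo-random 64-bit word per iteration.
        let mut hasher = RandomState::new().build_hasher();
        hasher.write_u64(counter);
        // 16 zero-padded lowercase hex digits per 64-bit word.
        out.push_str(&format!("{:016x}", hasher.finish()));
        counter += 1;
    }
    // Drop any surplus digits from the final word.
    out.truncate(n_digits);
    out
}
```

A typical use would be appending the result to a temporary file name so that concurrent downloads do not collide.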