S3 backend for the ObjectStore trait.
S3Store wraps aws-sdk-s3. The SDK owns SigV4, retries, connection
pooling, and timeout policy; this module owns the URL → SDK config
translation, the error-code classifier ([classify]), and the
hand-rolled multipart download orchestrator that the SDK does not
provide.
§Key composition
S3Store does not auto-prepend the RemoteUrl prefix. Trait
keys are matched by byte prefix, per the contract on
ObjectStore::list; the URL prefix is a repository concern and is
composed by callers that build keys like
<prefix>/refs/.../<sha>.bundle.
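As a rough illustration of caller-side composition, the sketch below joins a prefix and a relative key with exactly one separator. The names `compose_key` and `bundle_key`, and the `ref_path` parameter standing in for the elided middle segment, are hypothetical and do not come from this module.

```rust
/// Joins a repository prefix and a relative key with exactly one `/`.
/// Hypothetical caller-side helper; the store itself never prepends anything.
fn compose_key(prefix: &str, relative: &str) -> String {
    let prefix = prefix.trim_end_matches('/');
    let relative = relative.trim_start_matches('/');
    if prefix.is_empty() {
        relative.to_string()
    } else {
        format!("{prefix}/{relative}")
    }
}

/// Builds a key of the shape `<prefix>/refs/<ref_path>/<sha>.bundle`.
/// `ref_path` is an assumed placeholder for the middle of the key.
fn bundle_key(prefix: &str, ref_path: &str, sha: &str) -> String {
    compose_key(prefix, &format!("refs/{ref_path}/{sha}.bundle"))
}
```

Keeping this logic outside the store means the same backend can serve several repositories under different prefixes without the trait needing to know about any of them.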
§Conditional writes
put_if_absent uses
If-None-Match: "*". S3 returns either 412 (PreconditionFailed)
when the key already exists or 409 (ConditionalRequestConflict)
when two PUTs race. Both collapse to Ok(false).
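The collapse of both statuses can be pictured with a small status-to-result mapping. This is a minimal sketch, not the module's classifier: the `PutError` type and `classify_put_if_absent` function are hypothetical, and treating any 2xx as `Ok(true)` is an assumption.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum PutError {
    /// Any status this sketch does not treat as "created" or "lost the race".
    Unexpected(u16),
}

/// Maps the HTTP status of a PUT sent with `If-None-Match` to a
/// `put_if_absent`-style result.
fn classify_put_if_absent(status: u16) -> Result<bool, PutError> {
    match status {
        // Assumed: any 2xx means this call created the object.
        200..=299 => Ok(true),
        // 412: key already exists. 409: two PUTs raced. Either way the
        // object was not created by this call.
        412 | 409 => Ok(false),
        other => Err(PutError::Unexpected(other)),
    }
}
```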
§Size limits
AWS caps a single PutObject body at [SINGLE_PUT_LIMIT_BYTES]
(5 GiB) and a multipart upload at [S3_MAX_PARTS] (10 000) parts;
the per-object ceiling is 5 TiB. The helper auto-promotes uploads
above [super::multipart::MULTIPART_PUT_THRESHOLD] onto the
multipart path, so callers do not have to reason about the 5 GiB
single-PUT cutoff. The upload path is not resumable across
process death — see the README “Known limitations” section.
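The arithmetic behind these limits can be sketched as follows. The constant values for the three AWS limits come from the text above; `MAX_OBJECT_BYTES`, `UploadPlan`, `plan_upload`, and the `threshold` and `preferred_part` parameters are hypothetical names, and the real promotion threshold is not stated here, so it is passed in rather than guessed.

```rust
const GIB: u64 = 1 << 30;
const TIB: u64 = 1 << 40;
const SINGLE_PUT_LIMIT_BYTES: u64 = 5 * GIB;
const S3_MAX_PARTS: u64 = 10_000;
const MAX_OBJECT_BYTES: u64 = 5 * TIB;

#[derive(Debug, PartialEq)]
enum UploadPlan {
    SinglePut,
    Multipart { part_size: u64, parts: u64 },
    TooLarge,
}

/// Integer division rounding up; `b` must be non-zero.
fn ceil_div(a: u64, b: u64) -> u64 {
    (a + b - 1) / b
}

/// Picks an upload path for an object of `len` bytes.
/// `threshold` stands in for the multipart promotion threshold and
/// `preferred_part` for the usual part size; both are assumptions.
fn plan_upload(len: u64, threshold: u64, preferred_part: u64) -> UploadPlan {
    if len > MAX_OBJECT_BYTES {
        return UploadPlan::TooLarge;
    }
    // The threshold can never usefully exceed the single-PUT cap.
    let threshold = threshold.min(SINGLE_PUT_LIMIT_BYTES);
    if len <= threshold {
        return UploadPlan::SinglePut;
    }
    // Grow the part size when the preferred one would need more than
    // S3_MAX_PARTS parts.
    let min_part = ceil_div(len, S3_MAX_PARTS);
    let part_size = preferred_part.max(min_part);
    UploadPlan::Multipart { part_size, parts: ceil_div(len, part_size) }
}
```

At the 5 TiB ceiling the part size has to grow to roughly 525 MiB to stay within 10 000 parts, which is why a fixed small part size cannot cover the full range.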
§Atomic get_to_file
Both the small-object and multipart download paths write to a sibling
tempfile::NamedTempFile and rename on success so a partial
failure cannot leave a corrupt destination for the unbundle step.
Every GET carries If-Match: <etag> derived from the preceding
HeadObject call. If the object is overwritten between head and
the body download, S3 returns 412 and get_to_file retries once
with a fresh head/ETag. After one retry the 412 propagates as
ObjectStoreError::PreconditionFailed.
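The control flow described above (head, conditional GET, one retry on 412, then rename) might look roughly like this. It is a standard-library-only sketch: the `head` and `get` closures stand in for the real S3 calls, `GetError` is a hypothetical error type, and a fixed `.partial` sibling path replaces tempfile::NamedTempFile, so unlike the real code it does not clean up the temp file on failure.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

#[derive(Debug)]
enum GetError {
    /// The object changed between head and body download (HTTP 412).
    PreconditionFailed,
    Io(io::Error),
}

impl From<io::Error> for GetError {
    fn from(e: io::Error) -> Self {
        GetError::Io(e)
    }
}

/// `head` returns the current ETag; `get` downloads the body conditioned
/// on that ETag. Both are stand-ins for the real S3 requests.
fn get_to_file<H, G>(dest: &Path, mut head: H, mut get: G) -> Result<(), GetError>
where
    H: FnMut() -> Result<String, GetError>,
    G: FnMut(&str) -> Result<Vec<u8>, GetError>,
{
    // Sibling path, so the final rename stays on one filesystem.
    let tmp = dest.with_extension("partial");
    for attempt in 0..2 {
        let etag = head()?;
        match get(&etag) {
            Ok(body) => {
                let mut file = fs::File::create(&tmp)?;
                file.write_all(&body)?;
                file.sync_all()?;
                // The destination only ever appears fully written.
                fs::rename(&tmp, dest)?;
                return Ok(());
            }
            // First 412: loop once more with a fresh head/ETag.
            Err(GetError::PreconditionFailed) if attempt == 0 => continue,
            // Second 412, or any other error: propagate.
            Err(e) => return Err(e),
        }
    }
    // Not reached in practice; the second attempt always returns above.
    Err(GetError::PreconditionFailed)
}
```

Because the body is written to the sibling path first, a failed or abandoned download never touches `dest`.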
§HTTP transport tuning
aws-sdk-s3’s default HTTP client keeps idle pooled connections
indefinitely, so a pooled connection to a rotated VIP would wedge
an in-flight request until the OS-level TCP retransmit timeout
fires (~15 minutes on Linux). S3Store::from_remote_url installs
a custom HTTP client built via aws_smithy_http_client::Builder
with [POOL_IDLE_TIMEOUT] bounded to 30 s, so a rotation costs at
most one short-circuited request rather than minutes of wedged
transfer. Tracking issue: #26.
Pool-idle alone does not bound a hot pooled connection (one that
was used within the last 30 s but has since become stuck), and the
412 retry in ObjectStore::get_to_file reacts to a deliberate
server response, so forcing a fresh connection there does not help.
Instead, the SDK’s aws_config::timeout::TimeoutConfig is given
a [READ_TIMEOUT] so a stuck request fails fast and the SDK’s
internal retry layer can pick a fresh connection. connect_timeout is
left at the SDK default (3.1 s, already aggressive). Tracking
issue: #26.
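To make the division of labour between the two knobs concrete, here is a toy model of a pool checkout. It does not use the SDK or smithy types: `PooledConn`, `Checkout`, and `checkout` are hypothetical, and the 30 s constant is taken from the figure quoted above.

```rust
use std::time::Duration;

/// Assumed value, taken from the "30 s" figure in the text.
const POOL_IDLE_TIMEOUT: Duration = Duration::from_secs(30);

/// Hypothetical model of one pooled connection.
struct PooledConn {
    /// Time since the connection last carried a request.
    idle_for: Duration,
    /// Whether the peer behind it has silently gone away.
    peer_dead: bool,
}

#[derive(Debug, PartialEq)]
enum Checkout {
    /// Idle too long: the pool drops it and a fresh connection is dialed.
    Evicted,
    /// Recently used and healthy.
    Reused,
    /// Recently used but stuck: pool-idle never fires, so only a
    /// request-level read timeout can fail the request.
    ReusedNeedsReadTimeout,
}

fn checkout(conn: &PooledConn) -> Checkout {
    if conn.idle_for >= POOL_IDLE_TIMEOUT {
        Checkout::Evicted
    } else if conn.peer_dead {
        Checkout::ReusedNeedsReadTimeout
    } else {
        Checkout::Reused
    }
}
```

The third case is the "hot pooled connection" gap: the idle timer keeps being reset by use, so a second, per-request bound is needed.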
Note: smithy’s read_timeout resolves the HTTP connector future at
“response-headers received.” That bounds:
- Uploads in full — the connector future cannot resolve until
the request body is sent and the response status arrives, so a
stuck upload trips at [READ_TIMEOUT]. put_body therefore
overrides the timeout per-operation so large bundle uploads are
not cut off at 30 s.
- Downloads only up to time-to-first-byte. Once response
headers arrive the future resolves; subsequent body-chunk reads
are not bounded by read_timeout. A peer that wedges mid-body on
a GET (e.g. a stuck multipart range) is still subject to the
pool-idle / TCP-keepalive layers, but not to READ_TIMEOUT.
Lesson #2 in docs/development/lessons_learned.md covers this.
TCP keepalive (the second knob suggested in #27) is not wired
on the S3 path: aws-smithy-http-client 1.1.12’s public Builder
/ ConnectorBuilder API exposes pool_idle_timeout but does not
expose tcp_keepalive. The dominant DNS-rotation failure in #26 is
pool reuse of a dead VIP, which pool_idle_timeout already fixes;
the gap relative to the Azure backend (which uses reqwest and
gets keepalive for free) is documented in CHANGELOG.md.
§Multipart-upload lifetime safety
S3 retains uncompleted multipart uploads indefinitely without an
explicit lifecycle rule, so a future dropped between
CreateMultipartUpload and CompleteMultipartUpload orphans the
upload-id and bills the caller for the staged parts (issues #169,
#171). S3Store::start_multipart_upload therefore hands back a
[MultipartUploadGuard] that owns the upload-id and best-effort
issues AbortMultipartUpload on Drop; finish_multipart_upload
is the only call site that may disarm the guard.
Future contributors must not introduce an early ?-return
between obtaining the upload-id and constructing the
[MultipartUploadGuard] inside start_multipart_upload, nor
between start_multipart_upload and the matching
finish_multipart_upload: a bare upload-id outside the guard
reintroduces the leak the guard exists to prevent. (The
ok_or_else for a missing upload-id field on the SDK response is
benign — there is no upload-id to abort.)
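The guard pattern can be illustrated with a standard-library-only sketch. The field names, the `AbortLog` stand-in for the S3 client, and the `upload` function are all hypothetical; the real guard issues AbortMultipartUpload against S3, whereas this one only records the upload-id it would have aborted.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records which upload-ids were aborted; stands in for the S3 client.
type AbortLog = Rc<RefCell<Vec<String>>>;

struct MultipartUploadGuard {
    upload_id: String,
    armed: bool,
    log: AbortLog,
}

impl MultipartUploadGuard {
    fn new(upload_id: String, log: AbortLog) -> Self {
        Self { upload_id, armed: true, log }
    }

    /// Consumes the guard without aborting. Meant to be called only
    /// after the upload has been completed successfully.
    fn disarm(mut self) {
        self.armed = false;
        // `self` is dropped here; Drop sees `armed == false`.
    }
}

impl Drop for MultipartUploadGuard {
    fn drop(&mut self) {
        if self.armed {
            // Best-effort abort; here it is only recorded.
            self.log.borrow_mut().push(self.upload_id.clone());
        }
    }
}

/// Hypothetical upload flow with one early-return point.
fn upload(log: &AbortLog, fail_midway: bool) -> Result<(), String> {
    let guard = MultipartUploadGuard::new("upload-1".to_string(), Rc::clone(log));
    if fail_midway {
        // Early return: the guard is dropped and the abort fires.
        return Err("part upload failed".to_string());
    }
    guard.disarm();
    Ok(())
}
```

Because the guard is constructed before any fallible step, every early return after that point runs the abort, which is the property the rule above protects.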
Azure has no equivalent need — uncommitted blocks auto-expire after
seven days (azure.rs).
§Stdout discipline
Per .claude/rules/protocol-stdout.md, this module never writes to
stdout. Diagnostics go through tracing (which the helper binaries
configure to write to stderr).
Structs§
- S3Store: Production ObjectStore backed by aws-sdk-s3.