Azure Blob Storage backend for the ObjectStore trait.
AzureStore wraps azure_storage_blob. Like the S3 backend, this
module owns the URL → SDK config translation, the error-code
classifier ([classify]), and the credential resolution plumbing.
Unlike S3, the SDK already does parallel range downloads inside
BlobClient::download(), so there is no hand-rolled multipart
orchestrator (asymmetric with S3 by design).
§Authentication
The official azure_storage_blob 0.12 crate currently exposes only
Arc<dyn TokenCredential> (Entra ID) on its constructors. Azurite
does not implement Entra ID without an --oauth basic HTTPS setup,
and many production accounts still authenticate with shared keys.
To bridge both, we install our own [auth::SharedKeySigningPolicy]
as a per-try azure_core::http::policies::Policy and pass None
for the SDK’s credential parameter. The SDK then forwards every
request through our policy, which signs the request using the Azure
Storage shared-key v2 scheme. Tracking issue:
Azure/azure-sdk-for-rust#2975.
Resolution order for ?credential=<NAME> in the URL:
1. AZSTORE_<NAME>_KEY: base64 account key → shared-key signing.
2. AZSTORE_<NAME>_CONNECTION_STRING: connection string with AccountName= / AccountKey= → shared-key signing.
3. AZSTORE_<NAME>_SAS: SAS query string appended verbatim to every outgoing request URL.
When no ?credential= flag is set we fall back to
azure_identity::DeveloperToolsCredential (env, workload identity,
managed identity, Azure CLI, …).
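The resolution order above can be sketched as a small pure function. This is a minimal illustration, not the module's actual code: `ResolvedCredential`, `resolve_credential`, and the injected `lookup` closure are hypothetical names, and returning an error when a named credential has no matching variable is an assumption the text does not confirm.

```rust
/// Which authentication path a `?credential=<NAME>` flag resolves to.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum ResolvedCredential {
    AccountKey(String),
    ConnectionString(String),
    Sas(String),
    /// No `?credential=` flag: fall back to the developer-tools chain.
    DeveloperTools,
}

/// Checks the three environment variables in the documented order.
/// `lookup` stands in for an environment read so the logic stays testable.
fn resolve_credential<F>(name: Option<&str>, lookup: F) -> Result<ResolvedCredential, String>
where
    F: Fn(&str) -> Option<String>,
{
    let Some(name) = name else {
        return Ok(ResolvedCredential::DeveloperTools);
    };
    if let Some(key) = lookup(&format!("AZSTORE_{name}_KEY")) {
        return Ok(ResolvedCredential::AccountKey(key));
    }
    if let Some(conn) = lookup(&format!("AZSTORE_{name}_CONNECTION_STRING")) {
        return Ok(ResolvedCredential::ConnectionString(conn));
    }
    if let Some(sas) = lookup(&format!("AZSTORE_{name}_SAS")) {
        return Ok(ResolvedCredential::Sas(sas));
    }
    // Assumption: a named credential with no matching variable is an error.
    Err(format!("no AZSTORE_{name}_* variable is set"))
}
```

In production the closure would typically be `|k| std::env::var(k).ok()`; passing it in keeps the precedence rules testable without mutating the process environment.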
§Conditional writes
put_if_absent uses
If-None-Match: "*" (the SDK’s
BlockBlobClientUploadOptions::with_if_not_exists convenience).
Azure returns 409 (BlobAlreadyExists) or 412
(ConditionNotMet) for the contention case; both collapse to
Ok(false).
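As a rough sketch of that collapse, assuming the outcome is decided from the raw HTTP status (the real classifier may inspect the Azure error code instead), a hypothetical helper could look like this:

```rust
/// Maps the HTTP status of a conditional upload onto the `put_if_absent`
/// result. Hypothetical helper; the function name and the string error
/// type are illustrative assumptions.
fn put_if_absent_outcome(status: u16) -> Result<bool, String> {
    match status {
        // 201 Created: the blob did not exist and this writer won.
        201 => Ok(true),
        // 409 BlobAlreadyExists / 412 ConditionNotMet: another writer won.
        409 | 412 => Ok(false),
        other => Err(format!("unexpected status {other} from conditional put")),
    }
}
```

Treating both contention statuses as `Ok(false)` lets callers handle "someone else created it first" as a normal result rather than an error.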
§Atomic get_to_file
Identical to the S3 path: head → tempfile → download(if_match) →
persist. The SDK’s download() aggregates parallel range fetches
internally, so no per-chunk semaphore here. A single retry with a
fresh ETag covers the head-then-GET race (412 mid-download).
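The single-retry rule can be illustrated independently of the SDK. In the sketch below, `head` and `download` are closures standing in for the real SDK calls, and `DownloadError` and `download_consistent` are hypothetical names; the tempfile and persist steps are left out to keep the focus on the retry.

```rust
/// Simplified error type for the sketch (hypothetical).
#[derive(Debug, PartialEq)]
enum DownloadError {
    /// HTTP 412: the blob changed between the head and the download.
    PreconditionFailed,
    Other(String),
}

/// Fetches a body pinned to an ETag, retrying exactly once with a fresh
/// ETag if the blob changed after the first head.
fn download_consistent<H, D>(mut head: H, mut download: D) -> Result<Vec<u8>, DownloadError>
where
    H: FnMut() -> Result<String, DownloadError>,
    D: FnMut(&str) -> Result<Vec<u8>, DownloadError>,
{
    let etag = head()?;
    match download(&etag) {
        Err(DownloadError::PreconditionFailed) => {
            // Lost the head-then-GET race: re-read the ETag and try once more.
            let fresh = head()?;
            download(&fresh)
        }
        other => other,
    }
}
```

A second 412 is returned to the caller unchanged, so a blob that is being rewritten continuously cannot trap the reader in a loop.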
§copy(src, dst)
azure_storage_blob 0.12 does not expose a BlobClient::copy_from_url
method (only BlockBlobClient::upload_blob_from_url, which requires
a SAS-tokened source URL or an x-ms-copy-source-authorization
header — neither integrates cleanly with our credential model). We
implement copy as a stream-through-tempfile round trip:
get_to_file writes src to a NamedTempFile, then put_path
uploads it to dst. Both legs already stream — get_to_file
consumes the SDK’s chunked download into the file without buffering
the body, and put_path switches to our explicit
stage_block + commit_block_list orchestrator (see
AzureStore::multipart_put_path) once the body crosses
[super::multipart::MULTIPART_PUT_THRESHOLD]. Peak in-flight bytes
are bounded by
[super::multipart::MULTIPART_PUT_MAX_CONCURRENCY] ×
[super::multipart::MULTIPART_PUT_PART_SIZE] regardless of blob
size, which matters for manage doctor’s duplicate-bundle
quarantine path (crate::manage::doctor::Doctor::evict_losing_bundle)
— that path can copy multi-GiB bundles. Zero-byte lock files still
round-trip fast: get_to_file short-circuits the GET on size == 0
and put_path issues a single zero-byte Put Blob. Body is
preserved; user metadata is not propagated, matching the S3 backend’s
CopyObject path which similarly carries only body bytes.
This is asymmetric with the S3 backend, which uses CopyObject for
a true server-side copy — Azure’s equivalent (Copy Blob,
Put Blob From URL) requires a SAS-signed source URL or an
x-ms-copy-source-authorization header that the 0.12 SDK does not
ergonomically expose. The download+reupload path is the safe,
correct fallback until the SDK closes that gap.
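To make the memory bound concrete, here is a small sketch of the upload-path decision and the peak in-flight arithmetic. The three constant values are placeholders chosen for illustration; the real values live in super::multipart and are not stated here. Treating "crosses the threshold" as `size >= threshold` is also an assumption.

```rust
const MIB: u64 = 1024 * 1024;

// Placeholder values for illustration only; see super::multipart for the
// real constants.
const MULTIPART_PUT_THRESHOLD: u64 = 64 * MIB;
const MULTIPART_PUT_PART_SIZE: u64 = 8 * MIB;
const MULTIPART_PUT_MAX_CONCURRENCY: u64 = 4;

/// How a body of a given size would be uploaded (hypothetical type).
#[derive(Debug, PartialEq)]
enum UploadPlan {
    /// One Put Blob call, including the zero-byte case.
    SinglePut,
    /// stage_block for each part, then a single commit_block_list.
    StagedBlocks { blocks: u64 },
}

fn plan_upload(size: u64) -> UploadPlan {
    if size < MULTIPART_PUT_THRESHOLD {
        UploadPlan::SinglePut
    } else {
        UploadPlan::StagedBlocks {
            blocks: size.div_ceil(MULTIPART_PUT_PART_SIZE),
        }
    }
}

/// Upper bound on buffered bytes, independent of the blob size.
fn peak_in_flight_bytes() -> u64 {
    MULTIPART_PUT_MAX_CONCURRENCY * MULTIPART_PUT_PART_SIZE
}
```

With these placeholder values a 3 GiB bundle becomes 384 staged blocks, while at most 32 MiB is in flight at any moment.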
§A note on Range and zero-byte blobs
A Range request against a zero-byte blob returns HTTP 416. We
never issue Range requests directly (BlobClient::download()
owns that), but the zero-size short-circuit in get_to_file
skips the SDK download call entirely for a known-empty blob,
so the 416 case never arises.
§Size limits
Azure caps a block blob at 50 000 committed blocks (~4.75 TiB at
the SDK’s default block size) and a single Put Blob body at
5000 MiB; above [super::multipart::MULTIPART_PUT_THRESHOLD] the
helper switches to explicit stage_block + commit_block_list,
so callers do not have to reason about the single-call cutoff.
The upload path is not resumable across process death — see
the README “Known limitations” section.
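The limits above reduce to simple arithmetic. The sketch below assumes a 100 MiB block size, which is only a guess consistent with the ~4.75 TiB figure; the helper names are hypothetical.

```rust
const MIB: u64 = 1024 * 1024;

/// Service limit on committed blocks per block blob.
const MAX_COMMITTED_BLOCKS: u64 = 50_000;
/// Service limit on a single Put Blob body.
const MAX_SINGLE_PUT_BYTES: u64 = 5000 * MIB;
/// Assumption: a 100 MiB block size, matching the ~4.75 TiB figure.
const ASSUMED_BLOCK_SIZE: u64 = 100 * MIB;

/// Largest blob that fits in the block limit at a given block size.
fn max_blob_bytes(block_size: u64) -> u64 {
    MAX_COMMITTED_BLOCKS * block_size
}

/// True if a blob of `size` bytes stays within the committed-block limit.
fn fits_in_block_limit(size: u64, block_size: u64) -> bool {
    size.div_ceil(block_size) <= MAX_COMMITTED_BLOCKS
}

/// True if the body is small enough for a single Put Blob call.
fn fits_in_single_put(size: u64) -> bool {
    size <= MAX_SINGLE_PUT_BYTES
}
```

At the assumed block size, `max_blob_bytes(ASSUMED_BLOCK_SIZE)` is 5 242 880 000 000 bytes, roughly 4.77 TiB.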
§HTTP transport tuning
azure_core 0.35’s default transport keeps idle pooled connections
forever and never sets TCP keepalive, so a pooled connection to a
rotated VIP would hang an in-flight request until the OS-level TCP
retransmit timeout fires (~15 minutes on Linux). AzureStore
installs a custom reqwest::Client, wrapped in a Transport and set on
ClientOptions::transport, with four bounds:
- [POOL_IDLE_TIMEOUT] (30 s): drops idle pooled connections before a typical DNS rotation makes them stale.
- [TCP_KEEPALIVE] (30 s): detects a dead-but-not-closed TCP session in seconds rather than the 2-hour Linux default; covers hot pooled connections that pool-idle alone cannot.
- [CONNECT_TIMEOUT] (10 s): bounds a fresh-connect attempt to a dead VIP rather than waiting on the OS connect timeout.
- [READ_TIMEOUT] (30 s): per-read timeout that resets after a successful read, so a stuck transfer fails fast without limiting total body size.
Together these cap a DNS-rotation hang at tens of seconds rather
than minutes. The custom transport leaves
ClientOptions::per_try_policies (where the shared-key signing
lives) untouched — the SDK pipeline runs per-try policies
independently of the transport. Tracking issue: #26.
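For orientation, the four bounds map onto reqwest's builder roughly as follows. This is a configuration sketch rather than the module's actual code: it assumes a recent reqwest 0.12 release (one that provides `ClientBuilder::read_timeout`), `build_http_client` is a hypothetical name, and wiring the client into ClientOptions::transport is omitted.

```rust
use std::time::Duration;

const POOL_IDLE_TIMEOUT: Duration = Duration::from_secs(30);
const TCP_KEEPALIVE: Duration = Duration::from_secs(30);
const CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
const READ_TIMEOUT: Duration = Duration::from_secs(30);

/// Builds a reqwest client carrying the four bounds (hypothetical helper).
fn build_http_client() -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        // Evict idle pooled connections before they can go stale.
        .pool_idle_timeout(POOL_IDLE_TIMEOUT)
        // Probe established sockets so dead peers are noticed quickly.
        .tcp_keepalive(TCP_KEEPALIVE)
        // Bound the time spent connecting to an unreachable address.
        .connect_timeout(CONNECT_TIMEOUT)
        // Per-read timeout; resets after every successful read.
        .read_timeout(READ_TIMEOUT)
        .build()
}
```

A whole-request `timeout` is deliberately absent from the sketch, since it would cap the total transfer time and therefore the size of body that can be moved.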
§Stdout discipline
Per .claude/rules/protocol-stdout.md, this module never writes to
stdout. Diagnostics go through tracing (which the helper binaries
configure to write to stderr).
Modules§
- auth: Credential resolution and the shared-key / SAS signing policies for the Azure Blob backend.
Structs§
- AzureStore: Production ObjectStore backed by azure_storage_blob.