# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [0.2.2] - 2026-05-24
Maintenance release. No library or CLI source changes — dependency
bumps and CI housekeeping only.
### Changed
- Bumped `aws-config` from 1.8.16 to 1.8.17.
- Bumped `aws-sdk-s3` from 1.132.0 to 1.133.0.
- Bumped `serde_json` from 1.0.149 to 1.0.150.
- Bumped `astral-tokio-tar` to 0.6.2 and cleaned the corresponding
`cargo deny` ignores.
- Bumped pinned GitHub Actions in the `actions` Dependabot group.
## [0.2.1] - 2026-05-17
First release cut end-to-end by the CI release pipeline. 0.2.0 was
manually published to crates.io to bootstrap Trusted Publisher
registration and so has no matching GitHub Release artifacts (no
pre-built binaries, no `.deb` / `.rpm` / `.apk` packages, no SLSA
provenance). 0.2.1 ships byte-identical library and CLI source plus
the full automated-release artifact set.
### Fixed
- Release workflow preflight version-parity check read every
`cargo metadata` `dependencies[]` entry, including
`cli/Cargo.toml`'s path-only dev-dep that opts the integration
tests into the library's `test-util` feature. The dev-dep has no
`version =` field, so the requirement came back as `*` and the
strip-and-compare against the tag version failed. The check now
filters the query to normal-kind dependencies only, materialises the
result as a JSON array, validates cardinality first, and emits
case-specific error messages for missing / multiple / drifted entries.
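
The filter-then-validate order in this fix can be sketched in Rust as follows. This is an illustration only: the changelog does not show the workflow's real implementation, and the `Dep` struct, `ParityError` enum, and `check_parity` function are hypothetical names. The sketch assumes the `cargo metadata` convention that normal dependencies carry a null `kind`, and that a dependency without a `version =` field reports the requirement `*`.

```rust
/// Hypothetical, simplified view of one `cargo metadata` `dependencies[]` entry.
#[derive(Debug, Clone)]
pub struct Dep {
    pub name: String,
    /// `None` for normal dependencies; `Some("dev")` or `Some("build")` otherwise.
    pub kind: Option<String>,
    /// Version requirement as cargo reports it, e.g. "^0.2.1" or "*".
    pub req: String,
}

/// One variant per failure case, so each can carry its own message.
#[derive(Debug, PartialEq, Eq)]
pub enum ParityError {
    Missing,
    Multiple(usize),
    Drifted { found: String, expected: String },
}

pub fn check_parity(deps: &[Dep], lib_name: &str, tag_version: &str) -> Result<(), ParityError> {
    // Keep normal-kind entries only, so a version-less dev-dep never
    // reaches the version comparison.
    let normal: Vec<&Dep> = deps
        .iter()
        .filter(|d| d.name == lib_name && d.kind.is_none())
        .collect();

    // Validate cardinality before looking at any version string.
    match normal.as_slice() {
        [] => Err(ParityError::Missing),
        [dep] => {
            // Strip leading requirement operators such as `^` or `=`.
            let found = dep.req.trim_start_matches(|c: char| !c.is_ascii_digit());
            if found == tag_version {
                Ok(())
            } else {
                Err(ParityError::Drifted {
                    found: found.to_string(),
                    expected: tag_version.to_string(),
                })
            }
        }
        many => Err(ParityError::Multiple(many.len())),
    }
}
```

With this shape, a path-only dev-dep whose requirement is `*` is dropped by the filter instead of failing the comparison.
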
## [0.2.0] - 2026-05-16
### Added
- **Rustdoc check wired into `make pre-commit` and `make ci`.** New
`make doc-check` target (and `_pc-doc-check` / `_ci-doc-check`
wrappers) so the same rustdoc gate the CI `docs` job enforces runs
locally before commit. The dedicated `docs` job in
`.github/workflows/ci.yml` now delegates to `make _ci-doc-check`,
making the Makefile target the single source of truth. Fixed three
pre-existing rustdoc lint failures: a broken intra-doc link in
`src/manage/doctor.rs` and a redundant explicit link target in
`src/object_store/s3.rs` (both visible to the existing CI command),
plus a broken intra-doc link in `src/object_store/mock.rs` that was
hidden until the consolidated `make doc-check` command added
`--all-features`, which activates the `test-util`-gated mock
`ObjectStore` so its doc comments are also checked.
- **Environment-variables reference.** New
`docs/environment-variables.md` page consolidating every env var read
by the helper binaries, LFS agent, management CLI, and test suites
(`GIT_REMOTE_OBJECT_STORE_*`, `AZSTORE_<ALIAS>_*`, the AWS SDK
provider-chain vars, `RUN_LARGE_BODY_TESTS`, the `LIVE_*` shellspec
gates, and `GIT_DIR`). Linked from README, getting-started, and
storage-engines. Also documents env vars that look applicable but are
not honored (`RUST_LOG`, `AZURE_STORAGE_*`). A new
`.claude/rules/environment-variables.md` rule plus checklist updates
in `fix-issue` and `audit` keep this index in sync as env vars are
added or removed. `tests/env_var_doc_sync.rs` enforces the sync
mechanically: it scans every `pub` / `pub(crate)` `const ENV_*`
declaration under `src/` and fails if the literal value is missing
from the doc, so a forgotten row trips `cargo test` rather than
shipping stale.
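
The mechanical sync check in the environment-variables entry above could look roughly like the sketch below. It is not the code in `tests/env_var_doc_sync.rs`: the function names `env_const_values` and `missing_from_doc` are hypothetical, the sketch works on in-memory strings rather than walking `src/`, and it assumes one declaration per line with a plain string-literal value.

```rust
/// Extracts the string-literal value of every `pub` / `pub(crate)`
/// `const ENV_*` declaration in `source` (one declaration per line assumed).
pub fn env_const_values(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| {
            let line = line.trim_start();
            // Only public and crate-public constants are in scope.
            let rest = line
                .strip_prefix("pub(crate) const ENV_")
                .or_else(|| line.strip_prefix("pub const ENV_"))?;
            // The literal is the first quoted string after the `=`.
            let (_, after_eq) = rest.split_once('=')?;
            let start = after_eq.find('"')? + 1;
            let len = after_eq[start..].find('"')?;
            Some(after_eq[start..start + len].to_string())
        })
        .collect()
}

/// Returns every env-var name declared in `source` that `doc` never mentions.
pub fn missing_from_doc(source: &str, doc: &str) -> Vec<String> {
    env_const_values(source)
        .into_iter()
        .filter(|value| !doc.contains(value.as_str()))
        .collect()
}
```

A test built on this would read the Rust sources and the Markdown page, then assert that `missing_from_doc` returns an empty list.
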
### Changed
- **Audit-tier cleanups from the batch fix-2026-05-15 pass (#221).**
Strengthened tombstone-payload assertions in two packchain delete
tests (parse the body via `serde_json::Value` and assert the
embedded `sha`, matching the existing manage-branch template) and
added a sibling test that isolates the listing-mismatch fallback
branch from the unparseable-chain branch in
`try_write_baseline_tombstone`. Tightened the SAS huge-TTL test
to pin the actual error wording. Named the Azure / S3
list-pagination magic numbers (`AZURITE_DEFAULT_MAXRESULTS`,
`S3_DEFAULT_MAXKEYS`) so a future emulator default-page-size bump
produces a meaningful diff. Deduplicated the baseline-tombstone
writer (new `try_write_baseline_tombstone` in `packchain::gc`
replaces the near-identical helpers in `packchain::push` and
`manage::branch`), the baseline-tombstone key prefix (new
`pub(crate) BASELINE_TOMBSTONE_KEY_FRAGMENT` and
`baseline_tombstone_listing_prefix` composer), the config-entries
apply scaffolding (`apply_config_entries` for `config_set_many`
and `config_add_many`), the grace-hours resolver
(`resolve_grace_hours` — deliberately without the `Some(0)` clamp
that `resolve_lock_ttl_seconds` uses, since `--grace-hours 0` is
a legitimate force-mode operator intent), and the TTL saturation
idiom (`saturating_duration_seconds`). Reused the bundle
header-line read buffer across calls and simplified the URL
boolean parser. No behaviour change beyond the strengthened
test assertions.
- **Closed the path-B coverage gap in `build_blob_sas_url` (#224).**
Added `build_blob_sas_url_expiry_overflow_returns_error_not_panic`
alongside the existing huge-TTL test so the
`OffsetDateTime::checked_add` overflow path (`"SAS expiry
overflow"` wording) is regression-guarded — previously only the
`i64::try_from` overflow path (`"SAS ttl too large"`) had a test,
and a naïve `matches!(err, ObjectStoreError::Other(_))` assertion
passed through either path without distinguishing them. Surfaced
during the #221 audit pass.
- **Homebrew formula publishes to the shared `dekobon/homebrew-tap`
repository.** Previously the release workflow pushed to a dedicated
`dekobon/homebrew-git-remote-object-store` tap that never got
created, so every release skipped the tap push with a warning in
the runner log. The formula now lands in `dekobon/homebrew-tap`
alongside `host-identity` and other `dekobon` tools, so end users
can install with `brew tap dekobon/tap && brew install
git-remote-object-store`. The `HOMEBREW_TAP_TOKEN` secret must now
grant `contents: write` on `dekobon/homebrew-tap`. Two related
hardenings on the release workflow's tap-push step: an unreachable
tap (set-but-misconfigured PAT) is now a hard failure rather than a
silent warn-and-skip, and the push retries up to five times with
`git fetch && git rebase origin/main` between attempts so a sibling
pipeline landing a commit on the shared tap's `main` between our
clone and push no longer aborts our release. Token-unset still
skips gracefully for the pre-public-flip window. Updated in
`docs/development/cutting-a-release.md`.
- **Corrected the `ENVIRONMENT` sections of the helper man pages.**
`man/git-remote-s3+https.1` and `man/git-remote-az+https.1` previously
listed variables the helpers do not actually honor — `RUST_LOG`, the
Azure CLI `AZURE_STORAGE_*` / `AZURE_TENANT_ID` / `AZURE_CLIENT_*`
family. Replaced them with what the code actually reads: the project
`GIT_REMOTE_OBJECT_STORE_ALLOW_HTTP` / `_VERBOSE` /
`_LOCK_TTL_SECONDS` variables, the Azure `AZSTORE_<ALIAS>_*` scheme,
and the most common AWS provider-chain variables. Each page now
points at `docs/environment-variables.md` for the complete
reference.
- **`doctor` flag renamed and now honors the lock-TTL env var (#178, #183, #192).**
The flag is renamed `doctor --lock-ttl` → `doctor --lock-ttl-seconds`
to match `compact --lock-ttl-seconds` (breaking; no compat shim per
`AGENTS.md`). The type changed from `u64` with a compile-time default
to `Option<u64>` that defers to `lock_ttl_from_env()` when unset.
`doctor --delete-stale-locks` now agrees with the push / compact /
delete-branch consumers about what "stale" means under any
`GIT_REMOTE_OBJECT_STORE_LOCK_TTL_SECONDS` value, removing the
data-race vector previously caused by the env-blind 60s default.
Doc-comments at `src/protocol/push.rs::DEFAULT_LOCK_TTL_SECONDS` and
the env-vars index updated to match.
- **`delete-branch` man page now documents
`GIT_REMOTE_OBJECT_STORE_LOCK_TTL_SECONDS`.** The subcommand reads the
variable through `lock_ttl_from_env()` at `src/manage/branch.rs`, but
the doc-comment on `Command::DeleteBranch` (and so the generated
`man/git-remote-object-store-delete-branch.1`) did not surface it.
Updated the doc-comment and regenerated; the wording matches the
`compact --lock-ttl-seconds` description ("falling back to 60s").
- **`tracing_init` doc comments now state the reload is one-way.**
`option verbosity 2+` can raise the subscriber to `info` but the
protocol provides no inverse to lower it. The module and `raise_to_info`
doc comments described the reload as a generic "flip", which implied
bidirectional control.
- **Unified configuration-value vocabulary (#187).** URL booleans
(`?zip=`, `?bundle_uri=`) and `GIT_REMOTE_OBJECT_STORE_ALLOW_HTTP`
now share one parser. Accepted truthy tokens: `1`, `true`, `yes`,
`on` (case-insensitive). Accepted falsy tokens: `0`, `false`,
`no`, `off`. Previously the URL flags accepted only case-sensitive
`1|true|0|false` and the env var accepted only the literal `"1"`.
Documented the case-sensitivity policy for string-typed URL flags
(`engine`, `addressing`, `credential`, `profile`, `region`) in
`docs/getting-started.md`.
- **Single-source MSRV (#191).** Workspace `Cargo.toml` declares
`[workspace.package] rust-version = "1.94"`; `cli/Cargo.toml` and
`xtask/Cargo.toml` inherit via `rust-version.workspace = true`.
CI and release workflows derive the toolchain from
`cargo metadata` instead of duplicating the literal value. Removes
the two "keep in sync" comments that previously acknowledged the
drift risk.
- **`GcOpts`, `CompactOpts`, `DoctorOpts` default-value fields are
now `Option<u64>` (#185, #189).** `grace_hours`, `gc_grace_hours`,
and `lock_ttl_seconds` on the public opts structs changed type
from `u64` to `Option<u64>`. `None` defers to the matching
env-var helper (`grace_hours_from_env` / `lock_ttl_from_env`).
On `CompactOpts`, `Some(0)` is clamped to the env-or-default
value at the library boundary so it no longer silently disables
per-ref locking (#208). On `DoctorOpts` the clamp is deliberately
NOT applied — doctor only compares lock ages, it never acquires
a lock, so `--lock-ttl-seconds 0` is an operator-deliberate
"treat every lock as stale" request and is honoured. Breaking
change for any out-of-tree library consumer that constructed
these structs by literal.
- **`bundle::unbundle` / `git::unbundle` / `git::unbundle_at`
signature change (#195).** The unused `ref_name: &RefName`
parameter was removed from all three. Drops a wasted
`RefName.clone()` from the parallel-fetch hot path. Breaking
change for any out-of-tree caller of these `pub` async functions.
- **`ENV_GC_GRACE_HOURS` and `grace_hours_from_env` visibility
downgraded to `pub(crate)` (#185).** Brings them into line with
`ENV_LOCK_TTL_SECONDS` and `lock_ttl_from_env`. The env-var name
remains the public contract (documented in
`docs/environment-variables.md`); the constant import path is
no longer part of the crate's public API.
- **`BackendError` variants now route storage-side wording through
`container_word(kind)` (#193).** `UnknownStoredEngine` and
`EngineMismatch` gained a `kind: BackendKind` field; their
`Display` strings switch between "bucket" and "container" based
on the backend. `NotAuthorized`'s previously-unused `kind` is now
load-bearing. Azure operators no longer see "bucket" in fatal
backend errors against Azure containers. `validate_format` gained
a `BackendKind` parameter — breaking for out-of-tree callers.
- **`ManageError::StaleSnapshot` now distinguishes Deleted vs
ResidueOnly causes (#199).** The variant changed from `String`
to `{ entity: String, reason: StaleReason }`. `Display`
branches on the reason so the typed error matches the
stdout message the doctor wrote one line earlier.
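
The asymmetric handling of `Some(0)` in the `Option<u64>` opts entry above (#185, #189, #208) is easy to misread, so here is a minimal sketch of the two resolution policies. The function names `resolve_clamped` and `resolve_unclamped` and the `env_value` parameter are hypothetical; the real helpers read the environment themselves, and how they treat an env value of 0 is not stated in this changelog.

```rust
/// Fallback named in the changelog ("falling back to 60s").
pub const DEFAULT_LOCK_TTL_SECONDS: u64 = 60;

/// Stand-in for `lock_ttl_from_env`: `env_value` is the already-parsed
/// variable, or `None` when it is unset. (Hypothetical signature.)
fn env_or_default(env_value: Option<u64>) -> u64 {
    env_value.unwrap_or(DEFAULT_LOCK_TTL_SECONDS)
}

/// Compact-style resolution: an explicit 0 is clamped to the
/// env-or-default value so it cannot disable per-ref locking.
pub fn resolve_clamped(flag: Option<u64>, env_value: Option<u64>) -> u64 {
    match flag {
        Some(n) if n > 0 => n,
        _ => env_or_default(env_value),
    }
}

/// Doctor-style resolution: an explicit 0 is honored as
/// "treat every lock as stale"; only `None` defers to the environment.
pub fn resolve_unclamped(flag: Option<u64>, env_value: Option<u64>) -> u64 {
    flag.unwrap_or_else(|| env_or_default(env_value))
}
```

Under these assumptions, `resolve_clamped(Some(0), Some(120))` yields 120 while `resolve_unclamped(Some(0), Some(120))` yields 0.
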
### Fixed
- **Reader-vs-writer race on `GIT_REMOTE_OBJECT_STORE_LOCK_TTL_SECONDS`
closed.** `EnvGuard` (test-only) now upgrades to a per-key `RwLock`
write lock and `test_util::env_var_read_lock` exposes the read side.
`protocol::push::lock_ttl_from_env` — which is read indirectly by
every test that drives `push_one` — acquires the read lock under
`cfg(any(test, feature = "test-util"))`, so push tests on parallel
threads now serialise against the env-mutating tests
(`lock_ttl_env_override_*`, `resolved_lock_ttl_honors_env_*`) instead
of racing them. The mutating test recurses into the same read path
from its own thread; a thread-local writer-key set lets the reader
fast-path skip the lock so the test does not deadlock on its own
write guard. Release binaries pay nothing — the cfg gate is off when
`test-util` is disabled.
- **Env-mutating tests no longer leak on panic (#220).** Tests that
toggled process-global env vars (`GIT_REMOTE_OBJECT_STORE_LOCK_TTL_SECONDS`,
`GIT_REMOTE_OBJECT_STORE_GC_GRACE_HOURS`, `GIT_REMOTE_OBJECT_STORE_VERBOSE`,
`GIT_REMOTE_OBJECT_STORE_ALLOW_HTTP`, the per-test `AZSTORE_AUTH_TEST_*`
fixtures) used a paired `set_var` / `remove_var` pattern that leaked
the env var to subsequent tests when an assertion between the two
panicked. Introduced `EnvGuard` in `git_remote_object_store::test_util`
— an RAII guard that holds a per-key serialization mutex and restores
the prior value on drop, including on unwind. Replaces the bespoke
`ENV_LOCK_TTL_TEST_MUTEX` (cross-module serialization is now folded
into the guard's per-key registry) and the panic-vulnerable
`with_allow_http_env` closure in `tests/url_parsing.rs`.
- **`doctor` no longer silently skips future-stamped locks (#223).**
`scan_stale_locks` and `delete_stale_lock_if_still_stale` both
computed `age = now - last_modified` via
`Duration::try_from(...).ok()` and treated the `Err` branch
(negative age, i.e. `last_modified` in the future) as "not stale" —
silently filtering the lock out at every TTL, even
`--lock-ttl-seconds 0`. Now the negative-age branch is explicit:
emit a `warn!` naming the key and the skew magnitude, then include
the lock ONLY when the operator opts in via TTL=0 ("treat every
lock as stale"); any positive TTL still excludes it because a
future-stamped lock is not "older than" any positive threshold.
- **Cap bundle-header reads to prevent OOM (#194).** `BundleHeader::read`
now enforces per-line (16 KiB) and total-header (64 MiB) byte budgets
via `BufRead::take(...).read_until(b'\n', ...)`. A malformed bundle
whose first byte sequence has no `\n` until EOF (or many GB between
newlines) previously allocated unboundedly into a single `String`
before validation. Reached by every fetch path — pack delta from
any bucket the operator can read.
- **Bound `apply_delta` output per op to prevent OOM (#206).**
`src/packchain/read.rs::apply_delta` now checks `out.len() >
dst_size_usize` inside the opcode loop. Mirrors git's
`patch-delta.c` `size -= cp_size` invariant. A 1 GiB delta of
`0x80` opcodes could previously push the intermediate `Vec<u8>`
to ~64 TiB before the post-loop size check fired.
- **Reject silently-truncated ranged GETs (#207).** Real S3 and
Azure return HTTP 206 with the body truncated when
`range.start < body.len() <= range.end`. The S3/Azure backends
now run a post-flight length check and surface the truncation as
`ObjectStoreError::RangeNotSatisfiable`. Aligns the mock with the
real backends and surfaces pack-store corruption (truncated pack
file vs stale `chain.json`) that previously fed short data to
the decoder.
- **`bundle_uri_presign_ttl` capped at 7 days (#219).** New
`MAX_BUNDLE_URI_PRESIGN_TTL_SECONDS = 604_800` constant; URL parser
rejects larger values with `BundleUriPresignTtlTooLarge`. The
Azure SAS builder's `time::Duration::seconds_f64` previously
panicked on `ttl > i64::MAX` seconds; replaced with a panic-free
`i64::try_from` path that returns `ObjectStoreError`. Matches the
AWS SigV4 ceiling so behavior is consistent across backends.
- **Reject URL-special bytes in bundle-URI ref names (#213).**
`is_safe_for_bundle_uri_emission` now rejects `=`, `#`, `%`, `&`,
`;`, `,`, `?` in ref names. Previously only `=` was blocked
(wire framing); `#` truncated the URL at the fragment and `%XX`
let intermediaries re-encode the path. Refs with disallowed
bytes warn-and-skip via the existing path.
- **LFS install / debug toggles are now idempotent (#198, #210).**
Re-running `git-lfs-object-store install`, `enable-debug`, or
`disable-debug` no longer accumulates duplicate git-config
entries or fails with `ConfigKeyNotSet`. New `git::config_set`
and `git::config_unset_if_present` helpers underpin the
rewrite; legacy duplicate entries from older binary versions
are collapsed on the next idempotent write.
- **LFS agent installs SIGPIPE mask in main (#216).**
`git-lfs-object-store` now calls `install_sigpipe_mask`
before entering the REPL, matching the helper binaries. The
existing `is_broken_pipe()` clean-exit arm was previously
unreachable in production — git-lfs closing stdout killed
the agent with SIGPIPE instead of producing a graceful exit.
- **LFS agent honors `GIT_REMOTE_OBJECT_STORE_VERBOSE` (#180).**
The agent's non-debug REPL path now delegates to
`protocol::tracing_init::init`, sharing the single-knob
verbosity policy with the helper binaries and management CLI.
The `enable-debug` path is untouched (its `debug` floor and
file destination are its contract).
- **Management CLI no longer honors `RUST_LOG` (#179).**
`init_tracing` now delegates to `protocol::tracing_init::init`
instead of `EnvFilter::try_from_default_env`. All three binaries
now share one verbosity policy: `GIT_REMOTE_OBJECT_STORE_VERBOSE`
is the only env var that affects startup level. Matches the
documented policy in `docs/environment-variables.md`.
- **Packchain helper-protocol delete no longer races concurrent
fetch (#203).** The packchain-engine helper-protocol delete path
(`git push :refs/heads/foo` on a packchain remote) now writes a
baseline tombstone before sweeping per-ref artefacts, deferring
the `<full_at>.bundle` delete to `gc sweep`. Mirrors the existing
pattern from #134 (compact/force-push), #143 (delete-branch on
packchain refs), and #157 (bundle force-push). A fetcher that
resolved the bundle SHA from a stale `chain.json` before the
delete now still finds the bundle on the bucket through the
grace window. Bundle-engine deletes (helper-protocol or `git-remote-object-store
delete-branch`) remain synchronous: there is no chain reference
to orphan, and operators rely on `git push :ref` to remove the
bundle promptly.
- **Surface non-UTF-8 Azure credential env vars (#218).**
`resolve_alias` in `src/object_store/azure/auth.rs` now
distinguishes `VarError::NotPresent` (continue chain) from
`VarError::NotUnicode` (surface as
``env var `<NAME>` is set but its value is not valid UTF-8``).
Previously a corrupted env value silently fell through to
"credential alias has no env var set".
- **Bundle-engine contention message names delete (#217).**
`push_one`'s lock-contention error now says "Another client
may be pushing or deleting", matching the packchain engine's
wording. The same code path handles both `Push` and `Delete`
arms since #133. Test strengthened from `.contains` to
byte-exact match.
- **Removed false claim from LFS man page (#181).** The agent
honors credential env vars only; the "lock TTL" half of the claim was
wrong (the LFS agent reads no lock-TTL env var and takes no per-ref locks).
- **Pinned Azure `x-ms-date` format (#174).** Replaced
`time::format_description::well_known::Rfc2822` plus
`str::replace("+0000", "GMT")` with an explicit
`format_description!` matching the RFC 1123 shape Azure documents.
The previous approach coupled Azure-auth correctness to the exact
byte emission of the `time` crate's RFC 2822 formatter — a future
minor-version change (e.g., `+0000` → `+00:00`) would silently
turn the replace into a no-op and break every signed request.
Added a byte-exact unit test pinning the wire format to
`"Sun, 06 Nov 1994 08:49:37 GMT"` so a regression on the
format description trips a focused test rather than only a
network-level Azurite test downstream.
- **Cross-platform pread for multipart uploads (#176).** Replaced
`src/object_store/multipart.rs`'s unguarded
`use std::os::unix::fs::FileExt;` and `read_exact_at` call with
a cross-platform `pread_exact` helper:
`#[cfg(unix)]` delegates to `FileExt::read_exact_at`;
`#[cfg(windows)]` uses `FileExt::seek_read` in a short-read loop
that returns `UnexpectedEof` on a premature zero-byte read so
the existing S3/Azure abort-on-truncation path still fires.
Restores reachability of the
`x86_64-pc-windows-msvc` / `aarch64-pc-windows-msvc` release
targets advertised in `.github/workflows/release.yml`. CI does
not yet exercise the Windows leg; a follow-up to add a Windows
runner to `ci.yml` is appropriate.
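
A cross-platform positional read along the lines of the #176 entry above might be shaped like this. It is a sketch written from the entry's description, not the code in `src/object_store/multipart.rs`: the signature is an assumption, and the retry on `Interrupted` is an addition that the entry does not mention.

```rust
use std::fs::File;
use std::io;

/// Unix: the standard library already offers an exact positional read.
#[cfg(unix)]
pub fn pread_exact(file: &File, buf: &mut [u8], offset: u64) -> io::Result<()> {
    use std::os::unix::fs::FileExt;
    file.read_exact_at(buf, offset)
}

/// Windows: `seek_read` may return short reads, so loop until the buffer
/// is full and report a premature zero-byte read as `UnexpectedEof`.
#[cfg(windows)]
pub fn pread_exact(file: &File, buf: &mut [u8], offset: u64) -> io::Result<()> {
    use std::os::windows::fs::FileExt;
    let mut filled = 0usize;
    while filled < buf.len() {
        match file.seek_read(&mut buf[filled..], offset + filled as u64) {
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "failed to fill whole buffer",
                ))
            }
            Ok(n) => filled += n,
            // Assumption: retry interrupted reads, as `read_exact` does.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

Both branches report a short file as `UnexpectedEof`, which is what lets a caller keep a single abort-on-truncation path.
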
### Changed
- **Deduped gc mark/sweep output (#175).** Extracted the operator-
facing "gc mark" and "gc sweep" output lines into a new
`pub(crate)` helper `manage::gc_output` taking
`&mut impl Write`. `Gc::run` and `Compact::run_gc` both delegate
to the helper, eliminating the four duplicated `println!` /
`writeln!` format strings that previously lived in both files.
Pluralisation now uses the `if n == 1 { "pack" } else { "packs" }`
pattern (matching `fmt_partial_delete`) instead of the inline
`"pack(s)"` / `"tombstone(s)"` / `"object(s)"` tokens. Byte-exact
unit tests pin the output for zero / singular / plural / deferred
/ all-singular / all-plural / mixed-counter cases. The
`gc.rs::Gc::run` writer is now plain `&mut std::io::stdout()` so
the lock is not held across the `mark`/`sweep` await points.
- **Bundled review-loop cleanups F-001/F-003/F-006/F-008/F-009/F-010 (#177).**
Six independent code-quality findings from the iterative
review-loop pass landed in a single commit:
  - F-010: dropped stale `#[allow(dead_code)]` on
    `PathIndex::from_json_bytes` (reached via
    `manifest::load_path_index`).
  - F-006: removed the `GIT_REMOTE_S3_VERBOSE` alias claim from
    the `protocol::tracing_init` module doc; only the canonical
    env var is honoured (per `AGENTS.md`: no compatibility
    aliases).
  - F-003: introduced `HmacKey` (pre-decoded bytes with a
    redacting `Debug`). `SharedKeySigningPolicy` and
    `SasSigningKey` now store the decoded key; the per-request
    base64 decode in `hmac_sha256_base64` is gone.
    `compute_authorization` takes `&HmacKey`. (Follow-up in the
    same batch: replaced the manual `SasSigningKey::Debug` with
    `derive(Debug)` since the inner `HmacKey` already redacts.)
  - F-001: `build_blob_sas_url` rejects `\n`/`\r` (and other ASCII
    control bytes) in `container` / `blob_path` so a literal
    newline cannot shift fields in the SAS string-to-sign.
    `auth::header_str` now applies the same
    trim-and-unfold-newline transform `canonicalized_headers`
    already uses, so both string-to-sign feeds sanitise
    consistently.
  - F-008: `GcOpts.mark_only` / `sweep_only` booleans replaced
    with `enum GcMode { Default, MarkOnly, SweepOnly }`. The CLI
    parser keeps the two flags; the new `gc_mode_from_flags`
    translates them at the boundary and rejects the conflicting
    combination with a clear error (instead of the previous
    silent no-op).
  - F-009: LFS oid validation moved to the `run.rs` REPL
    boundary. `Agent::upload` / `download` now take `&LfsOid`.
    On validation failure the run loop emits a `complete` wire
    event with an empty `oid` field and the raw rejected value
    folded into `error.message`. `parse_oid` and `OpError::oid`
    are gone from `agent.rs`.
- **Split `Doctor::list_and_handle_stale_locks` (#167).** Extracted
the per-key HEAD-recheck + delete loop into a free
`delete_stale_lock_if_still_stale` helper returning a
`DeleteOutcome` enum, and lifted the stale-scan filter into a
`scan_stale_locks` helper. `list_and_handle_stale_locks` is now a
~40-line orchestrator covering scan, report, and outcome
aggregation. Operator-visible output text and the `tracing` call
shape are preserved byte-for-byte. Added unit coverage for each
`DeleteOutcome` variant through `MockStore`.
- **Consolidated `ObjectStore` test decorators (#166).** A new
`delegate_to_inner_impl!` macro under
`object_store::test_support` (test-only) emits the
`#[async_trait::async_trait] impl ObjectStore` block alongside
per-method forwarders to `self.inner`, so each per-test decorator
collapses from ~80 lines of hand-written forwarders to ~15 lines:
the struct, one or more overrides, and a `forward:` clause naming
the methods to delegate. Migrated `PostHeadHookStore`,
`PostListDeleteStore`, `PostDeleteHookStore`, `PostListHookStore`,
`PostGetHookStore`, both `EvolvingChainStore`s (fetch + read),
`VanishingChainStore`, and `CountingStore`. Behavior preserved
byte-for-byte; no production code touched.
- **Push reuses the pre-lock tombstone set under the per-ref lock
(#165).** `protocol::push::prepare_push` now calls
`packchain::gc::tombstoned_bundle_keys` once and stashes the result
on `PushReadyState`; `perform_push_under_lock` passes the cached
set through `bundles_for_ref`'s new `cached_hidden` parameter
instead of re-listing `<prefix>/gc/` and re-fetching every
baseline tombstone. Sound because all tombstone writers for a
given ref (`defer_prior_bundle_via_tombstone`,
`compact::tombstone_prior_baseline_bundle`,
`manage::branch::write_baseline_tombstone_for_orphan`) serialize
through the same per-ref lock — no new tombstone for this ref can
land between the pre-lock and under-lock calls inside one
`push_one` invocation. Halves the tombstone-listing cost on every
push (one round-trip pair instead of two) and removes a redundant
per-tombstone `get_bytes` fan-out from the lock's critical path.
- **packchain gc mark scales with parallel chain.json fetches.**
`packchain::gc::list_referenced_packs` now fetches `chain.json`
bodies via `futures::stream::buffer_unordered` bounded by
`MAX_FETCH_CONCURRENCY` (= 8), mirroring the shape
`packchain::list::list_refs` already used. After #89 widened the
mark phase's listing prefix from `<prefix>/refs/heads/` to
`<prefix>/refs/`, the candidate set spans heads + tags + notes +
any other namespace the packchain engine writes under `refs/`;
the previous sequential per-ref `get_bytes` made GC wall-clock
scale linearly in total ref count instead of overlapping fetches
the way list already did. The bodies stream-fold into the
referenced-set as each fetch completes, so parse work overlaps the
next batch's fetch latency and no intermediate `Vec<Bytes>` is
held. Fail-closed semantics on parse errors and transport errors
are preserved — the mark phase still aborts rather than
tombstoning live packs against a partial referenced set (#97).
- **Documentation positioning shift.** The project no longer documents
itself as a Rust port of, or maintains any compatibility contract
with, `awslabs/git-remote-s3` (Python). The on-bucket key layout,
URL grammar, locking semantics, error wording, helper-protocol
output bytes, LFS JSON events, and management-CLI shape are all this
project's own decisions, free to evolve. README, `AGENTS.md`,
crate-level docs, `src/url.rs::StorageEngine::Bundle`, and the
lessons-learned guide were rewritten to drop the "upstream Python
tool" framing; every in-source `git_remote_s3/...py[:LINE]` citation
comment was stripped from `src/`, `cli/`, and `tests/`. The
`bucket-compat` issue label is no longer used. The `awslabs` /
`bgahagan` / `nicolas-graves` projects are still credited in the
README as inspiration; nothing else.
Binary names (`git-remote-s3-https`, `git-remote-az-https`, …) and
URL scheme prefixes (`s3+https://`, `az+https://`) are unchanged.
Renaming them is a breaking change and is deliberately out of scope
for this positioning shift; any future rename will be tracked on its
own issue.
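
For the F-008 finding listed under #177 above, the flag-to-enum translation at the CLI boundary could be sketched as below. The enum matches the variants named in the entry; the function signature, the `String` error type, and the `--mark-only` / `--sweep-only` flag spellings in the message are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GcMode {
    Default,
    MarkOnly,
    SweepOnly,
}

/// Translates the two CLI booleans into one mode, rejecting the
/// combination that a pair of booleans could not rule out.
pub fn gc_mode_from_flags(mark_only: bool, sweep_only: bool) -> Result<GcMode, String> {
    match (mark_only, sweep_only) {
        (false, false) => Ok(GcMode::Default),
        (true, false) => Ok(GcMode::MarkOnly),
        (false, true) => Ok(GcMode::SweepOnly),
        // Flag names here are assumed, not taken from the CLI.
        (true, true) => Err("--mark-only and --sweep-only are mutually exclusive".to_string()),
    }
}
```

Because the match is exhaustive over both booleans, the conflicting pair has to be handled explicitly and cannot fall through as a silent no-op.
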
### Fixed
- **S3 multipart uploads abort on future drop (#169, #171).** A new
RAII `MultipartUploadGuard` in `src/object_store/s3.rs` owns the
`upload_id` returned by `CreateMultipartUpload` and, while armed,
fires a best-effort `AbortMultipartUpload` from its `Drop` impl via
`tokio::spawn`. `multipart_put_bytes`, `multipart_put_path`, and
`multipart_copy` all hand the guard end-to-end through
`finish_multipart_upload`, which disarms it after a successful
`CompleteMultipartUpload` or after the inline awaited abort on a
per-part error. If `CompleteMultipartUpload` itself fails, the
function `?`-returns with the guard still armed and `Drop` fires
the abort on a detached task. A caller that drops the upload
future mid-stream (cancellation, panic, the losing arm of a
`select!`) no longer orphans the upload-id and is no longer billed
for the parts already uploaded. Azure's commit-list model has no
equivalent need: uncommitted blocks auto-expire after seven days,
so this is an S3-only fix. A `Drop` that runs outside any tokio
runtime warn-logs and returns cleanly rather than panicking.
- **Live packchain GC test tracks the baseline-tombstone sweep (#164).**
`mark_then_sweep_after_grace_deletes_orphans` in
`cli/tests/common/packchain_live.rs` asserted that a force-push +
`mark` + `sweep` cycle reclaimed exactly one tombstone and two
objects (pack + idx). Since #134 / commit `21a9ccd`, a force-push
also writes a *baseline tombstone* via
`force_push_baseline_cleanup` so an in-flight fetch reading the
prior `chain.json` can still download the bundle through the
operator-configured grace window; `sweep` walks both tombstone
namespaces, so the real outcome is two swept tombstones and three
deleted objects (pack + idx + prior baseline bundle). The test
failed against live S3 / Azure backends (`integration-s3`,
`integration-azure`) under the default zero-grace assertion. The
assertions and the post-sweep absence check now cover the prior
baseline bundle as well, and the post-condition checks were
extracted into an `assert_not_found` helper to keep the scenario
under clippy's per-function ceiling.
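
The armed/disarmed guard pattern behind the multipart-abort fix (#169, #171) above can be shown without any async machinery. This sketch is not `MultipartUploadGuard`: `UploadGuard` and `AbortLog` are hypothetical, and a shared log of upload ids stands in for the detached `AbortMultipartUpload` call the real guard spawns.

```rust
use std::sync::{Arc, Mutex};

/// Records abort requests; a stand-in for the real abort call.
pub type AbortLog = Arc<Mutex<Vec<String>>>;

pub struct UploadGuard {
    upload_id: String,
    armed: bool,
    aborts: AbortLog,
}

impl UploadGuard {
    /// Guards start armed: dropping one requests an abort.
    pub fn new(upload_id: impl Into<String>, aborts: AbortLog) -> Self {
        Self { upload_id: upload_id.into(), armed: true, aborts }
    }

    /// Call after the upload completed (or was aborted inline) so that
    /// the drop at the end of this method does nothing.
    pub fn disarm(mut self) {
        self.armed = false;
    }
}

impl Drop for UploadGuard {
    fn drop(&mut self) {
        if self.armed {
            // Best effort: a poisoned lock is ignored rather than panicking in drop.
            if let Ok(mut log) = self.aborts.lock() {
                log.push(self.upload_id.clone());
            }
        }
    }
}
```

Any exit path that skips `disarm`, including an early `?` return or a dropped future, leaves the guard armed and therefore triggers the abort.
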
### Added
- **Cross-backend integration coverage for the best-effort zip-artifact
upload (#142).** A new `ZipPutFaultStore` decorator in
`cli/tests/common/zip_fault.rs` wraps any `Arc<dyn ObjectStore>` and
injects a one-shot `Network` error on `put_path(<zip-key>, …)`,
letting the bundle, `HEAD`, and `FORMAT` writes go through to the
real backend while the zip-only put fails. The shared scenario
`push_with_zip_put_fault_succeeds_and_omits_zip` then drives a
`?zip=1` push and asserts the issue #127 contract end-to-end: helper
exits `ok refs/heads/main\n\n`, the bundle key is durable on the
backend, the zip key is absent, and the fault fired exactly once
(so a regression that quietly retried under the swallow path would
still surface). A symmetric happy-path scenario
`push_with_zip_uploads_artifact` covers the no-fault case (bundle +
zip both present at their documented keys) on S3 (RustFS); the
Azure mirror is deferred to #161 because the hyphen-laden
`codepipeline-artifact-revision-summary` user-metadata key is
rejected by Azure's "valid C# identifier" rule, causing every
Azure `?zip=1` push to take the swallow path. Sibling of the
`MockStore` unit pin
`perform_push_under_lock_succeeds_when_zip_upload_fails` in
`src/protocol/push.rs`.
- **Cross-backend integration coverage for delete-path protection and
lock serialization (#141).** New shellspec cases pin the delete-path
guards introduced in #128 / #130 (PROTECTED# marker rejects an
empty-source push for both engines) and the under-lock delete
serialization from #133 (pre-seeded fresh `LOCK#.lock` produces a
`failed to acquire ref lock` refusal and leaves the bundle in place),
plus a packchain mirror of the force-push-with-PROTECTED# refusal from
#129. Adds `assert_chain_present` / `assert_chain_absent` to
`spec/support/bucket_assertions.sh` and new
`spec/integration/{s3,az}/packchain_protected_spec.sh` files that
drive the packchain engine end-to-end via `?engine=packchain` against
rustfs and Azurite, so any real listing-semantics regression surfaces
here instead of only against MockStore.
- `cargo xtask install` workspace automation that runs `cargo install
--path cli` and creates the four `+`-form helper symlinks
(`git-remote-s3+https`, `git-remote-s3+http`, `git-remote-az+https`,
`git-remote-az+http`) alongside the cargo-installed hyphenated
binaries. Replaces the manual `for s in s3+https …; do ln -sf …;
done` loop the README used to ship. Re-runs are idempotent, the
task refuses to clobber any existing regular file or directory at a
`+`-form path, and `--bin-dir` / `--no-install` / `--dry-run` flags
cover custom layouts and pre-flight previews. Lives in a new
`xtask/` workspace member, wired up via a `cargo xtask` alias in
`.cargo/config.toml`. (#25)
- Annotated-tag refs whose chain ends at a tree or blob now push
and fetch correctly across both the `bundle` and `packchain`
engines. The pack carries the tag chain plus the leaf object — for
tree-tipped chains, the full recursive blob closure is included
via an explicit depth-first walk fed to `count::objects` with
`ObjectExpansion::AsIs` (gix-pack's `TreeContents` expansion is
documented for commits/tags only, so we don't rely on it for bare
tree input). Bare-tree and bare-blob refs (a ref pointing directly
at a tree or blob with no tag wrapper) are supported as a natural
extension of the same dispatch. Force-push is the only way to
convert a commit-tipped ref to a non-commit-tipped ref (and
vice-versa); a non-force kind change is rejected as
not-an-ancestor (#80).
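
The tag-chain handling in the #80 entry above can be pictured with a small sketch. It is not the project's `peel_tag_chain`: the in-memory `Object` model, the `peel` and `kind_change_requires_force` names, and the `MAX_TAG_DEPTH` limit are all assumptions made for illustration, and the real code operates on gix objects.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory object model; the real helper works on gix objects.
#[derive(Debug, Clone, PartialEq)]
enum Object {
    Tag { target: String },
    Commit,
    Tree,
    Blob,
}

/// Kind of the object a tag chain finally points at.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LeafKind {
    Commit,
    Tree,
    Blob,
}

/// Result of peeling: the tag objects (outermost first) plus the leaf.
#[derive(Debug, PartialEq)]
struct Peeled {
    tag_chain: Vec<String>,
    leaf: String,
    kind: LeafKind,
}

/// Follow tag -> target links until a non-tag object is reached.
/// Returns `None` for a missing object or a chain longer than
/// `MAX_TAG_DEPTH` (an illustrative limit, not a project value).
fn peel(odb: &HashMap<String, Object>, tip: &str) -> Option<Peeled> {
    const MAX_TAG_DEPTH: usize = 64;
    let mut tag_chain = Vec::new();
    let mut cur = tip.to_string();
    loop {
        let obj = odb.get(&cur)?;
        let kind = match obj {
            Object::Tag { target } => {
                if tag_chain.len() == MAX_TAG_DEPTH {
                    return None;
                }
                let next = target.clone();
                tag_chain.push(cur);
                cur = next;
                continue;
            }
            Object::Commit => LeafKind::Commit,
            Object::Tree => LeafKind::Tree,
            Object::Blob => LeafKind::Blob,
        };
        return Some(Peeled { tag_chain, leaf: cur, kind });
    }
}

/// A ref may only switch between commit-tipped and non-commit-tipped
/// through a force push; this reports whether such a switch is happening.
fn kind_change_requires_force(old: LeafKind, new: LeafKind) -> bool {
    (old == LeafKind::Commit) != (new == LeafKind::Commit)
}
```

In this sketch a tag-of-tag-of-tree peels to a two-element `tag_chain` and a `LeafKind::Tree` leaf. A pack built along these lines would carry the chain objects plus the leaf's closure.
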
### Changed
- **Breaking** (on-bucket schema): `path-index.json` field `commit`
is renamed to `tip` and now stores the unpeeled chain.tip OID
instead of the underlying commit SHA. The schema version is
bumped from 1 to 2; per the project's greenfield policy
(`AGENTS.md`) no read-side migration is provided, so stale v=1
files in older buckets are treated as absent and re-emitted on
the next push. Tree-tipped chains (annotated tag of tree, bare
tree ref) now have a `path-index.json`; blob-tipped chains do
not, since there is no tree to walk (#80).
- `parse_bundle_key` now rejects bundle keys whose extracted ref
path fails `gix-validate`'s ref-name check (`..` traversal,
control characters, `.lock` suffixes). Mirrors the packchain-side
hardening from #72 — both engines now validate ref paths before
emitting them in the `list` response (#73).
- `Doctor::run` now delegates to `run_into<W: Write>` with an
injectable writer, making the full doctor output (report, fixer
prompts, stale-lock scan) unit-testable without spawning the
management binary (#74).
- `make shellspec-live-s3` and `make shellspec-live-azure` now run
every implemented storage engine in turn (`bundle`, `packchain`)
instead of bundle only. The Makefile knob `ENGINE=<name>` is
replaced with `ENGINES="<name> ..."`; pass `ENGINES=bundle` (or
`ENGINES=packchain`) to scope a run to a single engine. Empty
`ENGINES` is rejected at the target boundary instead of silently
no-opping.
- Stale `Phase N / not yet implemented` doc-comments across the
`packchain` module rewritten to reflect shipped reality: push
(#63), fetch (#64), `read_blob` (#65), GC (#66), and compaction
(#67) are described as implemented; references to "Phase 5 GC"
replaced with `manage gc`.
- **Breaking** (env var): `GIT_REMOTE_S3_LOCK_TTL_SECONDS` renamed
to `GIT_REMOTE_OBJECT_STORE_LOCK_TTL_SECONDS`, dropping the
legacy `GIT_REMOTE_S3_` prefix and aligning with
`GIT_REMOTE_OBJECT_STORE_ALLOW_HTTP`. Hard rename with no
read-both shim — operators who set the old name in CI or
shell config must update to the new name (#90).
- **Breaking** (env var): `GIT_REMOTE_S3_GC_GRACE_HOURS` renamed
to `GIT_REMOTE_OBJECT_STORE_GC_GRACE_HOURS`, mirroring #90.
Hard rename, no read-both shim. (#91)
- **Breaking** (env var): the `GIT_REMOTE_S3_VERBOSE` upstream-compat
alias is removed; only `GIT_REMOTE_OBJECT_STORE_VERBOSE` is read.
AGENTS.md disclaims any awslabs/git-remote-s3 parity, so the shim
was misleading. (#93)
- README dependency status text drops the hardcoded `gix 0.82`
reference in favour of version-neutral wording, so future
bumps do not require documentation churn (#88).
- `Sha40` gains a `from_oid(&gix_hash::oid)` constructor that pre-sizes
the buffer and skips the lowercase-hex re-validation that
`Sha40::try_new(oid.to_string())` performed. Used on the `walk_tree`
blob path (once per tree entry on every push), the `path-index.json`
tip, the pack-trailer SHA, and the push local-tip — every production
call site that already had an oid in hand. Test fixtures still build
from `&str` literals via `try_new`. (#95)
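
As a rough illustration of the `Sha40::from_oid` entry (#95), the sketch below contrasts a validating constructor with one that hex-encodes raw bytes into a pre-sized buffer. It is an assumption-laden stand-in: the real constructor takes `&gix_hash::oid`, whereas this version takes a plain `[u8; 20]` to stay self-contained, and the `Option` return type of `try_new` and the `from_raw` name are hypothetical.

```rust
/// Hypothetical stand-in for the project's `Sha40` newtype.
#[derive(Debug, PartialEq, Eq)]
struct Sha40(String);

impl Sha40 {
    /// Validating constructor: accepts exactly 40 lowercase hex characters.
    fn try_new(s: impl Into<String>) -> Option<Self> {
        let s = s.into();
        let ok = s.len() == 40
            && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        ok.then(|| Sha40(s))
    }

    /// Non-validating constructor: encoding 20 raw bytes always yields
    /// 40 lowercase hex characters, so no re-check is needed.
    fn from_raw(raw: &[u8; 20]) -> Self {
        const HEX: &[u8; 16] = b"0123456789abcdef";
        let mut out = String::with_capacity(40);
        for &b in raw {
            out.push(HEX[(b >> 4) as usize] as char);
            out.push(HEX[(b & 0x0f) as usize] as char);
        }
        Sha40(out)
    }
}
```

Both constructors produce equal values for the same object id. The second avoids the intermediate `to_string()` allocation and the per-character validation pass.
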
### Fixed
- Delete paths now perform a post-sweep `head(<prefix>/<ref>/PROTECTED#)`
probe as belt-and-suspenders surveillance for #151. The primary defence
remains the per-ref lock: `delete-branch` (#158), the helper-protocol
bundle and packchain delete handlers (#125, #133), and `protect` /
`unprotect` (#159) all acquire `<prefix>/<ref>/LOCK#.lock`, so a
`protect` cannot land a marker between the under-lock listing and the
sweep. The post-sweep probe surfaces a structured `error!` if the
marker is ever observed — that would indicate a lock-contract violation
(a bypass, bucket inconsistency, or misbehaving sibling tool) — and is
a no-op on the happy path. Pinned by a regression test that asserts
`protect` returns `LockContended` while delete-branch holds the lock,
proving the race window is mechanically closed.
- Azure `?zip=1` pushes now land the zip artifact (#161). The
zip-only `put_path` previously attached a
`codepipeline-artifact-revision-summary` user-metadata entry whose
hyphenated key is invalid on Azure (metadata names must be valid
C# identifiers). Azure rejected the upload, and the issue #127
best-effort swallow path hid the failure, so every `?zip=1` push
to an Azure remote reported success while silently omitting
`<prefix>/<ref>/repo.zip`. `perform_push_under_lock` now gates the
metadata on `BackendKind::S3` (where AWS CodePipeline consumes it);
Azure pushes attach only `Content-Disposition`. The deferred
happy-path mirror left open by the #142 cross-backend coverage
(`push_with_zip_uploads_artifact` against Azurite) is now wired up.
- Bundle-engine `git push :<ref>` now serializes against concurrent
pushes by listing and sweeping under the per-ref lock, eliminating a
silent false-success race window (#133).
- Bundle delete now rejects the operation when a `PROTECTED#` marker is
present in the under-lock listing, even when the entry count happens
to match expected — closes a count-match TOCTOU bypass of branch
protection (#128).
- Force-push protection check now runs under the per-ref lock in both
bundle and packchain engines, preventing a concurrent `protect` from
being raced by an in-flight `git push --force` (#129).
- Protocol push now treats the optional `repo.zip` artifact upload as
best-effort once the bundle, HEAD, and FORMAT are durable, mirroring
the prior-bundle delete (#121) — a transient store error on the zip
no longer reports the push as failed while the git data is already
live (#127).
- `manage delete-branch` now re-lists the branch immediately before the
deletion loop so objects from a concurrent push landing during the
confirmation prompt are swept; `PROTECTED#` is re-checked on the
fresh listing, `NotFound` is tolerated during the sweep, and empty
fresh listings report "already deleted" instead of silent
success (#139).
- Pinned the TOCTOU window between the initial protection check and the
deletion loop in `ManageBranch::delete` with an explicit regression
test; the post-prompt re-list introduced by #139 closes the
race (#131).
- `manage protect` now re-verifies the branch still has user data
immediately before writing the `PROTECTED#` marker; a concurrent
`delete-branch` no longer leaves an orphan marker that would block
future operations on a recreated branch (#137).
- `manage delete-branch` no longer short-circuits on the first per-key
delete failure; the loop now sweeps every listed key, then returns
`ManageError::PartialDelete` naming exactly the keys whose deletes
failed so a retry can converge. `NotFound` mid-sweep continues to be
tolerated (#122).
- `manage doctor` re-HEADs each stale-listed lock immediately before
deleting it so a fresh, active lock at the same key is not silently
revoked when the initial bucket listing has gone stale during
interactive prompts; skipped locks are surfaced in the doctor
report (#132).
- `manage doctor fix-head` now re-verifies the operator's chosen branch
still has user data on the bucket before writing `HEAD`; a concurrent
push or delete-branch between the snapshot listing and the HEAD write
no longer recreates the invalid-HEAD condition the doctor was trying
to fix. Returns the new typed `ManageError::StaleSnapshot` (#138).
- `packchain::delete_remote_ref_packchain` now honors the `PROTECTED#`
marker before sweeping a ref, refusing protected deletes with the
canonical wire-format message and closing the lockless-`protect`
TOCTOU window by running the check on the under-lock listing (#130).
- `packchain compact` and force-push no longer immediately delete the
prior baseline bundle; the bundle is now claimed by a
`gc/baseline-tomb-*` tombstone and reclaimed by `manage gc sweep`
after the same grace window that protects segment packs. Closes a
race where a concurrent fetch that already read the prior
`chain.json` failed with `BaselineMissing` (#134).
- `manage gc sweep` re-derives the live-referenced pack set per
tombstone instead of caching a once-per-sweep snapshot, closing a
race where a concurrent push committing `chain.json` mid-sweep
(notably a force-revert that aliases an existing pack key via
deterministic gix pack emission) could leave a permanently dangling
chain reference (#140).
- `manage gc mark` lists packs first, then chains, eliminating a
false-positive orphan tombstone when a concurrent push uploads a
pack and commits its chain between mark's two listings (#135).
- `packchain::read_blob` now transparently retries `PackMissing`
failures caused by a concurrent `manage gc sweep` deleting
compacted-away packs, reloading `chain.json` between attempts. After
exhausting the bounded retry schedule (3 retries, ~2.6s worst case),
the call surfaces the new typed
`PackchainError::ConcurrentGcRetriesExhausted` so callers can
distinguish a vigorous compact+sweep cycle from a permanent bucket
inconsistency (#136).
- `is_bundle_candidate` no longer drops bundle keys whose ref name
contains the substrings `.zip` or `LOCKS`; the predicate is now a
positive `<sha>.bundle` final-segment check (#109).
- `packchain::compact` no longer reports failure when the prior baseline
bundle delete fails after `chain.json` is already durable; the cleanup
is best-effort and orphan keys are logged at WARN for manual cleanup
(#113).
- Bundle-engine `perform_push_under_lock` no longer reports failure when
the prior-bundle delete fails after the new bundle is already durable;
the cleanup is best-effort and the orphan key is logged at WARN for
manual cleanup, matching the `compact` / `force_push_baseline_cleanup`
pattern (#121).
- Management `delete-branch` now refuses to delete a branch that has a
`PROTECTED#` marker, matching the helper-protocol delete path (#110).
- `GIT_REMOTE_OBJECT_STORE_LOCK_TTL_SECONDS=0` no longer silently
disables per-ref locking; zero now falls back to the default TTL
(#112).
- `packchain read_blob` now bounds the last-entry pack fetch at
`MAX_RANGE_BYTES` (1 GiB) and rejects entries whose implied range
exceeds the cap with a typed `MalformedPackEntry` error, instead of
buffering the entire pack body (#115).
- Snapshot classifier now uses the canonical exact-equality
`keys::is_protected_marker_segment` helper, so future `PROTECTED#`-
prefixed keys are not misclassified as the protection marker (#111).
- `is_protected` no longer uses a byte-prefix `list()` scan for the
`PROTECTED#` marker; it now does an exact `head()` check (cheaper and
resistant to future `PROTECTED#`-prefixed sibling keys) (#119).
- `packchain gc sweep --force` no longer deletes packs that became live
between mark and sweep; the live-pack re-check now always runs and
`--force` only skips the grace window (#117).
- `packchain` fetch and compact now validate `ChainSegment.pack` format
before deriving bucket keys, so a crafted `chain.json` cannot drive
bucket GETs at arbitrary keys (#120).
- `packchain` push and compact now write `chain.json` before
`path-index.json`; a crash between them is detected by the reader and
surfaced as the new typed `TransientChainPathIndexMismatch` instead of
the misleading `BlobNotInChain` (#114).
- `packchain` delete refspec acquires the per-ref lock before sweeping
ref keys, preventing a concurrent push from losing mutual exclusion
when its `LOCK#.lock` was erased by an unrelated delete (#116).
- Per-ref lock can no longer be stolen by stale-recovery while a
long-running critical section (notably `packchain compact`) is still
in flight. Locks now carry a background heartbeat that refreshes the
key every `ttl/3` (#118).
- `packchain` delete now probes `chain.json` under the per-ref lock, so
a concurrent deleter cannot mask the documented "not found" wire
error (#125).
- `doctor` now reports `<…>.bundle` keys whose stem is not 40 lowercase
hex chars (push silently filters them; doctor lists each key with its
ref-path and a manual-deletion hint) (#124).
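
The positive `<sha>.bundle` final-segment check from #109, and the malformed-stem report from #124, might look roughly like the sketch below. The function bodies are assumptions: only the `is_bundle_candidate` name and the "40 lowercase hex chars" rule come from the entries above, and `has_malformed_bundle_stem` is a hypothetical helper name.

```rust
/// True when `s` is exactly 40 lowercase hex characters.
fn is_lower_hex40(s: &str) -> bool {
    s.len() == 40 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Stem of the final path segment if that segment ends in `.bundle`.
fn bundle_stem(key: &str) -> Option<&str> {
    let last = key.rsplit('/').next().unwrap_or(key);
    last.strip_suffix(".bundle")
}

/// Positive check: only the final segment is inspected, so ref names
/// containing `.zip` or `LOCKS` no longer disqualify a key.
fn is_bundle_candidate(key: &str) -> bool {
    bundle_stem(key).map_or(false, is_lower_hex40)
}

/// Keys a doctor-style report would list: `.bundle` keys whose stem is
/// not a 40-character lowercase hex SHA.
fn has_malformed_bundle_stem(key: &str) -> bool {
    bundle_stem(key).map_or(false, |stem| !is_lower_hex40(stem))
}
```

Under these assumptions a key such as `repo/refs/heads/fix.zip-LOCKS/<40 hex>.bundle` is accepted, while `repo/refs/heads/main/ABC.bundle` is filtered from push and flagged for the operator.
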
### Added
- `cargo xtask man` generates Unix manpages for every shipped binary
(clap-derived for the management CLI, hand-authored troff stubs for
the four helper-protocol shims and `git-lfs-object-store`). The
`man/` directory is checked in and packaged under
`$prefix/share/man/man1/` (#123).
### Changed
- Renamed `AuditReport` child types for consistency:
`OrphanReport` → `OrphanSummary` and `BranchAuditRow` → `BranchRow` so
the per-row sibling types share the `*Row` suffix and the `*Report`
suffix is reserved for the top-level container (#104).
- Renamed manage-side `ManageCompactOpts` to `CompactOpts` so it matches
the `Doctor`/`DoctorOpts` and `Gc`/`GcOpts` sibling pattern (#105).
- Renamed internal LFS wire-payload struct `EventError` to
`ErrorPayload` so the `*Error` suffix stays reserved for real Rust
error types (#107).
- Renamed `bundle::BundleHeader::parse` to `bundle::BundleHeader::read`
for naming alignment with `std::fs::read` (#106).
- Refactored key-builder helpers: `crate::keys::join` now takes
`Option<&str>` for consistency with `bundle_key` and every
`packchain::keys::*` builder; the redundant `_with_prefix` and
verb-mismatched `parse_*` names were renamed to
`pack_key_from_relative` and `sha_from_pack_key`; the
`optional_prefix` shim is gone (#103).
- Shell scripts under `spec/` and `utils/` use `UPPERCASE` variables per
`.claude/rules/bash.md` (#108).
- `packchain gc` no longer tombstones (and after grace, deletes) packs
reachable only from chains under non-`refs/heads/` namespaces.
`list_referenced_packs` previously listed `<prefix>/refs/heads/`,
so chains under `refs/tags/`, `refs/notes/`, `refs/pull/`, etc.
were invisible to the mark phase and their packs were treated as
orphans. The listing prefix is now `<prefix>/refs/`; the existing
`is_chain_json_key` and `parse_pack_key_sha` filters remain
sufficient to reject sibling artefacts. This is a data-loss-class
fix; no on-bucket layout change. (#89)
- The packchain helper-protocol `list` command now surfaces refs in
every `refs/*` namespace, not only `refs/heads/`. `list_refs`
previously scanned `<prefix>/refs/heads/` only, so tags
(`refs/tags/`), notes (`refs/notes/`), and other namespaces were
invisible to `git ls-remote` / `git clone` against the packchain
engine. Mirrors the gc.rs fix landed for #89. (#82)
- `OFS_DELTA` recursion in `packchain::read::decode_entry` now
consumes the same `MAX_DELTA_DEPTH` budget as `REF_DELTA`. The
guard previously lived in `read_object_from_chain`, which only
the `REF_DELTA` branch re-entered, so a long pure-`OFS_DELTA`
chain in a malformed or attacker-controlled pack could
stack-overflow the reader. Security/DoS fix; no on-bucket
format change. (#83)
- `packchain compact` now deletes the previous baseline bundle
after the new `chain.json` is durable, instead of leaking it
forever. The old `<sha>.bundle` lived outside the `packs/`
namespace that GC scans, so each compact silently leaked
bucket storage. Delete runs under the same per-ref lock and
tolerates `NotFound` for idempotency. (#84)
- `walk_tree` (the recursive helper backing `extract_path_index`)
is now bounded by an ancestor-set cycle detector. A corrupted
or adversarial ODB whose tree references itself directly or
transitively previously caused unbounded recursion and a
stack overflow. The detector tracks the per-descent ancestor
set so legitimate shared subtrees at distinct paths still
walk correctly; cycles abort with the new typed
`PackchainError::TreeCycle { oid }`. (#81)
- Engine diagnostics list every supported engine. The
`UnknownStoredEngine` and `?engine=` parse errors previously
said "this client only supports `bundle`", omitting the
`packchain` engine. Both wordings are now driven from a
single `StorageEngine::ALL` source so future variants update
the message automatically. (#85)
- `git push :protected-branch` now reports a protection-specific
refusal that names the management CLI's `unprotect` workflow,
instead of misreporting the situation as multi-bundle
corruption and pointing users at `doctor`. The `PROTECTED#`
marker is detected before the generic fallback; the
multi-bundle error path is preserved for genuine
corruption. (#86)
- Push-refusal tests now pin exact wire bytes (including the
trailing `?` recoverable-error marker) at the call sites
flagged in #87, and the protected-ref refusal test asserts
the would-be local-tip bundle was not uploaded on refusal.
Pure test tightening; no production code change. (#87)
- `delete_remote_ref` now distinguishes the protected-marker case
from genuine multi-bundle corruption with a last-segment
equality check on `keys::PROTECTED_MARKER_SEGMENT`, replacing
the previous substring match. Protects against a future bucket
schema where the literal could appear outside the marker
segment. The shared marker constant is also reused by
`is_bundle_candidate`, `is_protected`, `manage::branch`, and
`manage::snapshot`. (#94)
- `git fetch --depth=N` from a shallow clone now correctly deepens the
local repository. The helper previously merged new shallow boundaries
with the prior `.git/shallow`, leaving the original tip in the file;
git treats every entry in `.git/shallow` as hard parentless via
`shallow.c::register_shallow` grafts, so the newly-installed parent
commits stayed hidden and `git log` still showed only the tip. The
helper now prunes any prior boundary whose parents are present in the
ODB before writing, and unlinks the file when no boundaries remain
(matching git's own `prune_shallow` semantics). Affects both bundle
and packchain engines and all storage backends — the issue was first
observed against real Azure but reproduces on every tier. New
shellspec coverage exercises re-shallow, deepen-to-full-history, and
successive-deepen flows (#78).
- `doctor` bundle-shape report no longer misclassifies packchain
bookkeeping directories (`packs/`, `gc/`) and LFS storage
(`lfs/`) as bare refs. Refs with a `chain.json` manifest now
report "Ok" instead of "No bundles" (#75).
- Pushing an annotated tag now works against both engines. The
packchain engine previously crashed at push time
(`Expected object of kind commit but got tag`) because
`gix::Repository::rev_walk` was called with the unpeeled tag-OID.
The bundle engine appeared to succeed but emitted a pack
containing only commit-reachable objects, so a fetch-back of the
tag could not resolve the ref. Both engines now peel the
resolved spec to its underlying commit and append the tag chain
(annotated tag, or tag-of-tag) verbatim into the emitted pack
via a second `count::objects` pass with
`ObjectExpansion::AsIs`. Branch and lightweight-tag pushes are
unaffected. Tag refs whose target is a tree or blob were
initially rejected; #80 lifted that restriction (#79).
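
The ancestor-set cycle detector described in the `walk_tree` entry (#81) can be sketched as follows. This is a simplified model, not the project's code: the `Odb` map, the `Entry` enum, and the `MissingTree` variant are hypothetical, and only the `TreeCycle { oid }` shape is taken from the entry above.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum WalkError {
    TreeCycle { oid: String },
    MissingTree { oid: String },
}

/// A tree entry points either at a blob or at a child tree.
enum Entry {
    Blob(String),
    Tree(String),
}

/// Hypothetical object database: tree oid -> (entry name, entry).
type Odb = HashMap<String, Vec<(String, Entry)>>;

/// Depth-first walk collecting (path, blob sha) pairs. `ancestors` holds
/// only the trees on the current descent, so a subtree shared by two
/// sibling paths is fine, while a tree that contains itself is rejected.
fn walk_tree(
    odb: &Odb,
    oid: &str,
    prefix: &str,
    ancestors: &mut HashSet<String>,
    out: &mut Vec<(String, String)>,
) -> Result<(), WalkError> {
    if !ancestors.insert(oid.to_string()) {
        return Err(WalkError::TreeCycle { oid: oid.to_string() });
    }
    let entries = odb
        .get(oid)
        .ok_or_else(|| WalkError::MissingTree { oid: oid.to_string() })?;
    for (name, entry) in entries {
        let path = if prefix.is_empty() {
            name.clone()
        } else {
            format!("{prefix}/{name}")
        };
        match entry {
            Entry::Blob(sha) => out.push((path, sha.clone())),
            Entry::Tree(child) => walk_tree(odb, child, &path, ancestors, out)?,
        }
    }
    // Leaving this subtree: it is no longer an ancestor of what follows.
    ancestors.remove(oid);
    Ok(())
}
```

Removing the oid on the way back up is what distinguishes this from a plain visited set, which would wrongly reject a tree reachable at two distinct paths.
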
### Removed
- **Breaking** (Rust API): `GitError::TagTargetUnsupported` variant
removed. The variant existed only to reject tag-of-tree /
tag-of-blob pushes deferred from #79; #80 implements full support
for those cases, so the rejection path no longer fires.
Downstream code that exhaustively matched `GitError` and had a
branch for `TagTargetUnsupported` will need to drop the branch
(the kind dispatch now lives inside the new `PeeledTip` enum
returned by `peel_tag_chain`).
- **Breaking** (Rust API): `ProtocolError::EngineNotImplemented`
variant removed. The variant was a leftover from packchain Phase 1
scaffolding and was never constructed once push (#63), fetch (#64),
`read_blob` (#65), GC (#66), and compaction (#67) shipped. Both
`StorageEngine` variants (`bundle`, `packchain`) cover the full
protocol surface, so the variant could not fire. Downstream code
that exhaustively matched `ProtocolError` and had a branch for
`EngineNotImplemented` will need to drop the branch (the branch was
dead code anyway).
### Added
- Live-cloud shellspec tier expanded to engine parity. New
`spec/live_s3_spec.sh` mirrors `spec/live_az_spec.sh`'s structure
with unit-level coverage of `spec/support/live_s3.sh` (URL grammar,
`aws` argv composition, `clear_prefix` safety guard) — runs as part
of the default `make shellspec` suite, no cloud calls. New
`spec/live/{s3,az}/manage_cli_spec.sh` and
`spec/live/{s3,az}/shallow_fetch_spec.sh` port the integration-tier
`manage_cli` and `shallow_fetch` scenarios to the live tier so the
management CLI and shallow-fetch paths are exercised against real
AWS / Azure SDK chains.
- `assert_ls_remote_ref_present` and `assert_ls_remote_sha` helpers
in `spec/support/git_scenarios.sh` provide engine-agnostic pre- and
post-conditions for tests where the bundle-format-only assertions
(`assert_bundle_count`, `assert_bundle_sha_for_ref`) are gated
behind `live_engine_is_bundle` and would otherwise pass vacuously
under packchain. Applied retroactively to `core_spec.sh`
delete-branch and `force_push_spec.sh` force-push tests for both
engines.
- `script(1)` added to the live-tier tools list in
`spec/live/README.md` (only required for `manage_cli_spec.sh`'s
pty-allocated `delete-branch` confirmation prompt).
- `packchain` `bundle-uri` presigned URLs (issue #76, completes
the deferred follow-up from #71): a new
`?bundle_uri_presign_ttl=<seconds>` URL flag asks the helper to
emit per-ref signed URLs (S3 SigV4 / Azure service-blob SAS)
instead of canonical bucket URLs, so private-bucket users can
also benefit from `bundle-uri`-accelerated clones. The TTL
parses to `Option<NonZeroU64>` so `=0` is rejected at the URL
boundary. New `ObjectStore::presigned_get_url(key, ttl)` trait
method drives the presigning per backend; the default impl
returns `ObjectStoreError::Unsupported` so backends without a
presigning model (`MockStore` in tests, Azure `TokenCredential`
/ SAS-env-var paths) inherit a clean error without a stub.
S3 presigning uses `aws_sdk_s3::presigning::PresigningConfig`;
Azure SAS is a hand-built `sv=2022-11-02` service-blob signature
in `src/object_store/azure/sas.rs` (storage-key-signed; user-
delegation SAS is out of scope per #76). Live round-trip tests
exercise SigV4 against RustFS and SAS against Azurite — both
emit the expected `X-Amz-Signature` / `sig` query parameters
and the URL fetches the body via plain `reqwest::get` with no
further auth. The `BundleUriError::PresigningUnsupported`
variant is removed.
- `packchain` live integration tests against RustFS and Azurite
(issue #69, completes the live-coverage gap for Phases 2–5 of
#52): two new test binaries
(`cli/tests/packchain_live_s3.rs`,
`cli/tests/packchain_live_azure.rs`) drive a backend-agnostic
scenario module (`cli/tests/common/packchain_live.rs`) against
fresh-per-test buckets / containers. Scenarios cover Phase 2
(first push lays down `chain.json` + `path-index.json` +
`<tip>.bundle` + `packs/<sha>.{pack,idx}` + `FORMAT` + `HEAD`;
incremental push appends a chain segment newest-first; force
push collapses to a single segment); Phase 3 (fetch into an
empty repo lands the tip; chain-walk fetch installs every
segment in dependency order); Phase 4 (`read_blob` returns
byte-equal content, and the cache survives an `.idx` deletion
between calls — pinning `PackIndexCache` reuse without
instrumenting the store); Phase 5 (`mark` writes a tombstone
for orphan packs, `sweep` with `grace_hours = 0` deletes them
through the production grace-comparison path). CI runs both
suites in the existing integration-test jobs.
- `packchain` `bundle-uri` capability (issue #71): packchain remotes
can now advertise the git remote-helper `bundle-uri` capability,
letting `git clone` fetch the baseline bundle from a public bucket
or CDN-fronted endpoint in parallel before the helper protocol
negotiates only the incremental tail. Opt in with `?bundle_uri=1`
on a `?engine=packchain` URL; bundle-engine remotes ignore the
flag (their bundle filenames rotate per push, so a stable URL
would race the next push). The `bundle-uri` command response
emits one entry per ref (`bundle.<ref>.uri=<url>` +
`bundle.<ref>.creationToken=<full_at>`), letting clients cache
the bundle across clones until `full_at` advances (force push or
compact). Per-ref parse failures warn-and-skip; a corrupt chain
on one branch does not blackhole the others. Default emission
is canonical bucket URLs (works against public-read buckets,
S3-compatible CDNs, and Azure containers with anonymous-read
access); private buckets opt in to per-ref presigned URLs via
the `?bundle_uri_presign_ttl=<seconds>` flag (issue #76).
- `packchain` `compact` subcommand (issue #67, completes Phase 5
of #52): new `git-remote-object-store compact <remote>` rewrites
a packchain ref's `chain.json` to a single-segment chain at the
current tip, with a fresh baseline pack and bundle. Old segment
packs become orphans for `gc` to reap on the next mark/sweep
cycle. Flags: `--ref <name>` to target a single branch (default
scans every ref via the audit and prompts for confirmation),
`--force` to bypass the segments-/bytes-since-`full_at`
heuristic, `--with-gc` to chain mark+sweep after a successful
compact, `--lock-ttl-seconds <N>` to extend the per-ref lock TTL
for large repos (resolves Open Q4 from #52). Implementation uses
the local-clone-then-repack approach: downloads the entire chain
into a tempdir-backed bare repo, runs `build_baseline_pack` at
the current tip, regenerates `path-index.json`, builds a fresh
baseline bundle, uploads, and atomically commits the new
`chain.json`. New `packchain::compact` library API and
`manage::compact::Compact` runner.
- `packchain` doctor extensions (issue #68): the management
`doctor` subcommand now emits a `=== Packchain ===` section
whenever the resolved engine is `packchain`. The section reports
orphan pack count and bytes, pending tombstones (run id, marked
timestamp, age, orphan count) sorted oldest-first, per-branch
segment / byte totals with a `[recommend compact]` flag when
either threshold is exceeded, and dangling chain references
(chain.json segments pointing at packs missing from the bucket)
surfaced as ERRORS. New public `packchain::audit` module with
`audit`, `AuditReport`, `OrphanReport`, `TombstoneRow`,
`BranchAuditRow`, `DanglingRow`, and the threshold constants
`COMPACT_SEGMENTS_THRESHOLD` (>20 segments) and
`COMPACT_BYTES_THRESHOLD` (>100 MiB). Bundle-engine remotes see
the existing report unchanged.
- Operator guide for `gc` (issue #70): a "Garbage collection"
section in `docs/getting-started.md` covers when to run, the
default mark+sweep flow, a cron-friendly weekly schedule with
crontab and GitHub Actions samples, `--grace-hours` and
`GIT_REMOTE_S3_GC_GRACE_HOURS` (renamed to
`GIT_REMOTE_OBJECT_STORE_GC_GRACE_HOURS` in #91) tuning, the `--force`
re-check-skip semantics, and how to read the per-phase output.
- `packchain` storage engine — Phase 5 partial (orphan-pack garbage
collection) of issue #52: new `git-remote-object-store gc <remote>`
subcommand and `git_remote_object_store::packchain::gc` library
module. Two-phase mark-and-sweep design: phase 1 lists every
`<prefix>/refs/heads/*/chain.json`, derives the orphan pack set
(in `packs/` but not referenced by any chain), and writes a
tombstone at `<prefix>/gc/tombstones-<run_id>-<rfc3339>.json`.
Phase 2 walks tombstones older than `--grace-hours` (default 24,
env override `GIT_REMOTE_S3_GC_GRACE_HOURS`, renamed to
`GIT_REMOTE_OBJECT_STORE_GC_GRACE_HOURS` in #91), re-derives the
current orphan set to skip packs re-referenced between phases,
deletes `.pack` + `.idx` idempotently, and removes the tombstone.
Mark fails closed on a corrupt `chain.json` so a parse error never
tombstones live packs. `--mark-only` and `--sweep-only` separate
the phases for cron scheduling; `--force` skips both grace and
re-check (operator-asserted safe). Sources of orphans handled:
force push, lost-race push, aborted push, branch deletion, and
(future) compaction. (#66, sub-issue of #52, partial — `compact`
subcommand and `doctor` orphan-reporting extensions deferred to
follow-ups.)
- `packchain::gc` public surface: `mark`, `sweep`, `MarkOpts`,
`MarkOutcome`, `SweepOpts`, `SweepOutcome`, `DEFAULT_GRACE_HOURS`,
`ENV_GC_GRACE_HOURS`, `grace_hours_from_env` for library consumers
that drive GC programmatically (CI agents, scheduled lambdas).
- `manage::gc::Gc` runner that the CLI's `gc` subcommand wraps,
matching the existing `Doctor` / `ManageBranch` shape so a
non-interactive frontend can drive the same flow.
- `packchain` storage engine — Phase 4 (direct file access) of issue
#52: new public `read_blob(remote, ref_name, path, &cache)` library
API fetches a single file at a ref's tip without cloning or running
git. The lookup walks `chain.json` + `path-index.json` to resolve
the path to a blob SHA, scans each segment's `.idx` newest-first
for the entry, and ranged-GETs the blob's pack bytes via
`ObjectStore::get_bytes_range`, zlib-decompressing and applying
`OFS_DELTA` / `REF_DELTA` chains up to a fixed depth (`MAX_DELTA_DEPTH
= 50`, matching git's own cap). Total: 4–5 API calls for a warm
lookup against a single-segment chain. (#65, sub-issue of #52)
- `PackIndexCache` — byte-bounded LRU keyed by `(prefix, content-sha)`
that amortises pack-index parses across `read_blob` calls. Default
capacity is 64 MiB; long-running consumers (CI agents, build
systems) keep one cache for the lifetime of the process so the
per-call cost drops to one `chain.json` GET, one `path-index.json`
GET, and the ranged pack read. Single-shot callers can pass
`&PackIndexCache::default()` and let the cache be freed when it drops.
- Engine guardrail on `Remote`: `Remote::open` now stores the resolved
`StorageEngine`, exposed via `Remote::engine()`. `read_blob` rejects
bundle remotes up front with `PackchainError::WrongEngine` rather
than blindly fetching a non-existent `chain.json`.
- New `PackchainError` variants for Phase 4 failure modes:
`WrongEngine`, `PathIndexAbsent`, `PathNotFound`, `MalformedPath`,
`PathNotABlob`, `BlobNotInChain`, `MalformedPackEntry`, `Decompress`,
`DeltaTooDeep`, `MalformedDelta`, and `InvalidRefName`. Each
identifies the specific corruption / misuse class so a Phase 5
`doctor` can flag them individually.
- `packchain` storage engine — Phase 3 (fetch) of issue #52: a
packchain bucket written by Phase 2 is now clonable and fetchable.
`git fetch` against `?engine=packchain` reads `chain.json`, walks
segments newest → oldest until a locally-known ancestor is found,
downloads the needed packs (and the `<full_at>.bundle` baseline
when the receiver has no anchor) in parallel up to
`MAX_FETCH_CONCURRENCY = 8`, and installs each pack
oldest-first into the local `objects/pack` directory. Cross-batch
dedup via the existing session-wide `FetchedRefs` cache works
identically to the bundle engine. `chain.json` references that
resolve to a missing pack on the bucket surface a typed
`PackchainError::PackMissing` with the absent key, satisfying
issue #64's "fail loud, not silent zero-byte fetch" criterion.
(#64, sub-issue of #52)
- Shallow fetch on the packchain engine: under `option depth N`,
the engine downloads segments **sequentially** newest-first,
installs each, and runs `shallow_boundaries` after every install,
stopping as soon as the boundary set is non-empty. This is a
deliberate divergence from the bundle engine's parallel-fetch
shape; the boundary calculation depends on inspecting the
installed objects between segments, so a future "speed up
packchain shallow fetch" change must NOT re-parallelise.
- `PackchainError::ChainAbsent`, `PackchainError::PackMissing`, and
`PackchainError::BaselineMissing` typed variants for fetch-side
failure modes; surfaced through the new
`FetchError::Packchain(_)` wrapper. The `PackchainError` type is
re-exported at the crate root so consumers can match on packchain
failures without naming the `pub(crate)` engine module.
- `packchain` storage engine — Phase 2 (incremental push) of issue #52:
pushing to `?engine=packchain` now writes a content-SHA-keyed pack
under `packs/`, a sibling `.idx`, a newest-first `chain.json`
manifest, a nested `path-index.json` mapping repo paths to blob
SHAs at the tip, and (on first / force push) a baseline bundle at
`<tip>.bundle` so Phase 3 fetch can short-circuit a fresh clone.
First push is `TreeContents` from the local tip; incremental
pushes use `TreeAdditionsComparedToAncestor`, which yields a
self-contained ancestor-aware pack (the ancestor commit and tree
travel with the new commit; only ancestor-only blobs are omitted,
to be picked up from prior chain packs at fetch). `chain.json` is
the linearization point — pack/idx/baseline upload pre-lock to
keep the per-ref lock window bounded by JSON-PUT latency, and
under the lock the push writes path-index → FORMAT → HEAD →
chain.json. The loser of a concurrent push leaves orphan packs behind;
Phase 5 GC reaps them. (#63, sub-issue of #52)
- Force push on the packchain engine collapses the chain to a fresh
single-segment manifest with `full_at = new tip` and replaces the
baseline bundle, deleting the prior baseline at the old `full_at`
best-effort (failure is logged at `warn` and never fails the push,
since `chain.json` has already been committed).
- Idempotent same-SHA push on the packchain engine: if the local tip
matches the on-bucket `chain.tip`, the push is a wire-level no-op
(`ok <ref>` with no uploads), matching the bundle engine's
same-bundle short-circuit.
- Shallow-clone push rejection on the packchain engine: a local
repository with a `.git/shallow` boundary that the rev-walk crosses
surfaces `cannot push from a shallow clone` as a per-ref
`error <ref>` line rather than producing a permanently incomplete
remote.
- `packchain` storage engine — Phase 1 (foundation) of issue #52: new
`Packchain` variant of the `?engine=` URL selector and `FORMAT` key,
`get_bytes_range(key, Range<u64>)` on `ObjectStore` (S3 + Azure +
mock, with HTTP 416 mapped to `ObjectStoreError::RangeNotSatisfiable`),
on-bucket schema types (`chain.json` and nested-tree
`path-index.json`) with a validating `Sha40` newtype, and a
`git::extract_path_index` tree walker that builds a path-index from
a tip commit. Phase 1's blanket-abort dispatch is replaced in this
release by the per-engine routing introduced for Phase 2: a
packchain `fetch` still aborts with `EngineNotImplemented` (Phase
3 will fill it in) but `capabilities`, `list`, and `push` succeed.
(#52)
- Per-chunk upload progress for `git push`: bundle and zip-archive
uploads now emit one `tracing::info!` line per completed multipart
part / staged block (S3 and Azure), routed to stderr to stay within
helper-protocol stdout discipline. (#55)
- Gated `RUN_LARGE_BODY_TESTS=1` integration tests for >5 GiB upload
round-trips on both S3 and Azure backends, mid-body abort tests
that confirm the multipart abort path leaves no destination key
visible, and a deterministic unit test pinning `read_file_part`'s
io-error propagation. (#56)
- Hand-rolled multipart upload for S3 and explicit
`stage_block` + `commit_block_list` for Azure above a shared
`MULTIPART_PUT_THRESHOLD` (default 64 MiB). On S3 this lifts the
5 GiB single-`PutObject` ceiling and the 5 GiB single-`CopyObject`
ceiling — large LFS objects, large bundle pushes, and the
`manage doctor --fix` quarantine path now succeed for multi-GiB
objects. On Azure the dispatch criterion is the same so multi-GiB
transfers no longer rely on the SDK's opaque internal chunking.
Both backends emit one progress event per completed part / block.
Below the threshold the existing single-call paths are preserved
(no `CreateMultipartUpload` round trip for small bundles, lock
files, or HEAD writes). (#53)
- `ObjectStoreError::PayloadTooLarge { limit_bytes }` variant for
upload-body-too-big failures. The S3 classifier maps
`EntityTooLarge` (HTTP 400) and HTTP 413 onto it (limit 5 GiB single
PUT); the Azure classifier maps HTTP 413 and `RequestBodyTooLarge`
onto it (limit 5000 MiB single Put Blob). The push wire-line now
reads `"upload exceeds backend size limit (5 GiB)"` instead of
dumping an opaque SDK chain when a bundle exceeds the single-PUT
ceiling. (#54)
- Live-cloud shellspec tier under `spec/live/{s3,az}/` exercising the
helper binaries against real AWS S3 and real Azure Blob. New make
targets `shellspec-live-s3`, `shellspec-live-azure`, `shellspec-live`
(umbrella), and `shellspec-live-sweep` are not invoked by `make ci`,
`make pre-commit`, `make test`, or `make shellspec-integration`. Each
suite is gated by its own per-suite flag (`LIVE_S3=1` / `LIVE_AZ=1`,
set by the make target) plus the global acknowledgement variable
`LIVE_TESTS_I_UNDERSTAND_THIS_COSTS_MONEY=1` (loud-fail at
`BeforeAll`). Every run scopes writes under `live-test/<run-id>/`;
`AfterAll` plus an `EXIT`/`INT`/`TERM` trap delete the run prefix;
the cleanup helpers refuse to run unless the target prefix is
non-empty and starts with `live-test/`. `BeforeAll` runs a sentinel
write/read/delete pre-flight to catch missing IAM / RBAC permissions
before any scenario starts. The Azure suite resolves credentials via
the existing `?credential=<NAME>` /
`AZSTORE_<NAME>_KEY|CONNECTION_STRING|SAS` chain, and the
`shellspec-live-sweep` target now scans both backends (configurable
via `--backend s3|az|all`). Operator setup, env vars, costs, and
recovery are documented in `spec/live/README.md`. (#59)
- Storage-engine selector: `?engine=<name>` URL query parameter and
`<prefix>/FORMAT` bucket-level lock key. The only supported engine is
`bundle` (the existing git bundle v2 format, also the default when
`?engine=` is omitted). On the first push the engine is written to
`FORMAT`; subsequent connects read and validate it. A `?engine=` value
that conflicts with the stored `FORMAT` aborts with a clear error:
`"URL specifies engine X but this bucket uses Y; remove the ?engine=
parameter from the remote URL"`. Existing buckets without a `FORMAT`
key continue to work — the key is written on the next push. (#51)
- Shallow-fetch support in the helper protocol: `option depth <N>` is
now recognised and handled end-to-end. Depth is threaded through REPL
state (reset after each batch so it applies per-operation only) and
into `fetch_batch`, which runs a BFS from each fetched ref's tip to
collect the correct boundary commits and writes them atomically to
`.git/shallow` (read–merge–write so existing entries are preserved).
BFS is used rather than topological-walk `.take(N)` because topo order
does not match depth order at merge commits — all parents of the
included set that lie outside it are boundaries. Phase 1 only: bundles
are still downloaded in full; depth-limited bundle storage is a
separate future feature. (#50)
- Shellspec integration suites under `spec/integration/{s3,az}/`
exercising `git clone` / `git push` / `git fetch` /
`git push --force` / `git push --delete` against live rustfs and
Azurite Docker containers. Each backend covers core git ops,
force-push protection (`PROTECTED#`), the
`git-remote-object-store` management CLI (`protect`, `unprotect`,
`delete-branch`, `doctor --delete-stale-locks`), the LFS round-trip
via `git-lfs-object-store`, and concurrent / stale-lock contention.
Three new Makefile targets (`shellspec-integration-s3`,
`shellspec-integration-azure`, `shellspec-integration`) gate the
new suites behind Docker + cloud-CLI prerequisites;
`image-pin-check` guards against image-tag drift between the
shellspec helpers and the Rust integration tests.
- `protocol::backend::build` now runs an eager probe (single
`ListObjectsV2` for S3, `ListBlobs` first page for Azure with
`maxresults=1`) at backend construction. The probe folds well-known
failures into three categorical `BackendError` variants
(`BucketNotFound`, `NotAuthorized`, `InvalidCredentials`) so helper
binaries can emit single-line `fatal:` diagnostics that match
upstream `git_remote_s3/remote.py:574-593`. The probe runs once per
helper invocation and is off the per-command hot path. (#45)
- The LFS custom-transfer agent now emits `progress` events at each
network-chunk boundary, mirroring upstream
`git_remote_s3/lfs.py`'s `ProgressPercentage.__call__` callback.
Previously the agent emitted a single end-of-transfer event with
`bytesSoFar == size`, which left long uploads / downloads
appearing frozen and stripped `git-lfs` of any signal to detect
stalled transfers. Backends report bytes through a `ProgressSink`;
the agent forwards them through an `mpsc` channel into live
`progress` events on stdout. (#44)
- `Remote` struct as the primary library entry point for external
consumers. `Remote::connect(url)` parses a URL and opens a verified
backend connection in one call; `Remote::key(suffix)` computes correct
prefixed storage keys; `Remote::get_head()`, `Remote::put_head()`, and
`Remote::list()` cover the most common on-bucket operations; and
`Remote::store()` exposes the underlying `ObjectStore` (as `&dyn
ObjectStore`) for advanced use.
- Top-level re-exports for `ObjectStore`, `ObjectMeta`,
`ObjectStoreError`, `RemoteUrl`, `Remote`, `RemoteError`,
`BackendError`, and `BackendKind`; consumers no longer need
three-level module-path imports.
- `ProtocolError::is_broken_pipe()` method; the private
`is_broken_pipe(err: &io::Error)` helper is removed.
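
The breadth-first boundary rule described in the shallow-fetch entry above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the in-memory `Graph` type, the `boundary_commits` function, and the convention that `depth` 1 keeps only the tips are all assumptions made for the example.

```rust
use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};

/// Hypothetical in-memory commit graph: commit id -> parent ids.
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Walks breadth-first from `tips`, keeping commits whose distance from a
/// tip is below `depth` (assumption: `depth == 1` keeps only the tips).
/// Returns every parent of a kept commit that was not itself kept.
fn boundary_commits<'a>(
    graph: &Graph<'a>,
    tips: &[&'a str],
    depth: usize,
) -> BTreeSet<&'a str> {
    let mut included: HashSet<&'a str> = HashSet::new();
    let mut queue: VecDeque<(&'a str, usize)> = VecDeque::new();

    if depth > 0 {
        for &tip in tips {
            if included.insert(tip) {
                queue.push_back((tip, 1));
            }
        }
    }

    while let Some((commit, level)) = queue.pop_front() {
        if level == depth {
            continue; // deepest kept level: do not descend further
        }
        if let Some(parents) = graph.get(commit) {
            for &parent in parents {
                if included.insert(parent) {
                    queue.push_back((parent, level + 1));
                }
            }
        }
    }

    // Boundaries: parents of the included set that lie outside it.
    let mut boundaries = BTreeSet::new();
    for &commit in &included {
        if let Some(parents) = graph.get(commit) {
            for &parent in parents {
                if !included.contains(parent) {
                    boundaries.insert(parent);
                }
            }
        }
    }
    boundaries
}
```

For a merge `m` with parents `a` and `b` that share the parent `c`, depth 1 yields the boundary set `{a, b}` and depth 2 yields `{c}`. A `.take(2)` over a topological walk would instead keep `m` and only one of its two parents.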
### Changed
- `packchain::list::list_refs` now fetches `chain.json` bodies in
bounded parallel (`MAX_FETCH_CONCURRENCY = 8`, matching Phase 3
fetch). The previous N sequential round trips are now a single bounded
batch, which matters for buckets with many branches and is negligible
for typical single-digit-branch repos.
- `packchain::list::list_refs` filters extracted ref paths through
`gix-validate`'s `RefName::new` check before emitting them to
git. A maliciously planted key like
`<prefix>/refs/heads/../etc/passwd/chain.json` would otherwise
yield ref path `refs/heads/../etc/passwd` in the list response;
the filter rejects such names with `tracing::warn!` and skips
the entry. Defense-in-depth against bucket-write attackers.
- `delete-branch` documented as not deleting pack files for the
packchain engine. Pack keys can be shared across branches under
content-hash dedup (the umbrella issue's "exclusively owned by
that branch" claim was incorrect); `delete-branch` removes only
the branch's `chain.json`, `path-index.json`, baseline bundle, and
`PROTECTED#` marker. Operators run `gc` afterwards to reclaim
orphan packs. The behaviour itself is unchanged — `delete-branch`
always operated under `<prefix>/refs/heads/<branch>/` only — but
the invariant is now explicit.
- Cross-cutting packchain polish: `is_chain_json_key`,
`optional_prefix`, and `parse_pack_key_sha` consolidated into
`src/packchain/keys.rs` so `gc`, `list`, and `read` no longer
duplicate the same string-shape inspectors. `pub mod read;`
matches `pub mod gc;` so both submodules are reachable through
the public rustdoc tree at
`git_remote_object_store::packchain::{gc, read}`. New crate-level
doc-test in `src/lib.rs` walks `Remote::connect` →
`PackIndexCache::default` → `read_blob` using the crate-root
re-exports. New `# Example` sections on `gc::mark` and `gc::sweep`
show the canonical `Remote::connect` → `mark|sweep(remote.store(),
remote.prefix(), Opts::default())` shape for library consumers
driving GC programmatically.
- Tightened shellspec assertions
(`spec/integration/s3/`, `spec/live/s3/`, `spec/integration/az/`):
the `not ancestor` push wording is anchored to the documented
`NOT_ANCESTOR_TOKEN` constant in `src/protocol/push.rs`;
`git ls-remote` "ref absent" assertions distinguish empty-output
success from masked failure via the new
`assert_ls_remote_ref_absent` helper; the concurrent-push race
scenario now requires both divergent winners to be observed across
iterations rather than accepting `A || B`; the LFS spec is split
into two focused `It`s so each example has exactly one
load-bearing assertion that depends on the code under test. (#60)
- `S3Store::get_to_file` no longer ends in `unreachable!()`. The
retry-on-412 (head→GET race) loop is rewritten as an explicit
`match … { Err(PreconditionFailed) => retry once, other => other }`
over a new private `head_then_download` helper, mirroring the
Azure backend's shape. Every control-flow path now returns a
value, so the panic primitive is gone. `clippy::unreachable` is
denied at the workspace level to prevent regressions. (#49)
- Extracted local-branch primitives into a new `git::branch` submodule.
`git::rev_parse` is removed; callers use `git::branch::resolve`
instead. Added `BranchName` newtype that encapsulates the
`refs/heads/<name>` invariant and `git::branch::current` reporting
the branch HEAD points at (returning `None` for detached, unborn,
and non-`refs/heads/` HEADs). (#47)
- Restructured as a Cargo workspace: the library crate
(`git-remote-object-store`) stays at the repository root; the six
binary targets move to a new `cli/` sub-crate
(`git-remote-object-store-cli`). Install from source with
`cargo install --path cli`; `cargo build --workspace` is unchanged
for development builds.
- `protocol::run_main` is no longer part of the library API; it lives
in the CLI crate. `protocol::capabilities` and `protocol::option` are
now `pub(crate)`.
- `bundle_at` and `unbundle_at` now use a native `gix-pack 0.69`
implementation (`src/bundle.rs`) instead of shelling out to
`git bundle create` / `git bundle unbundle`. The `git` binary is no
longer required at runtime for bundle operations. The implementation
walks the commit graph with `rev_walk`, counts objects with
`count::objects` (using `ObjectExpansion::TreeContents` to include
trees and blobs), serialises with the `entry::iter_from_counts` →
`bytes::FromEntriesIter` pipeline, and writes the header + pack
atomically via `NamedTempFile::persist`. Unbundle parses the v2 header,
checks prerequisites, and calls `Bundle::write_to_directory`.
- `git::config_add` / `git::config_unset` now write through
`gix-config` and `gix-lock` instead of spawning `git config --add` /
`--unset`. The in-process path acquires `.git/config.lock`, parses with
`File::from_bytes_no_includes`, mutates via `SectionMut::push` /
`remove`, and atomically renames over `<git-dir>/config`. `--unset` on
a missing key now returns the typed `GitError::ConfigKeyNotSet` (the
callers that previously matched on `Subprocess` are updated).
`git::config_add_many` batches multiple key/value writes into a single
read / parse / lock / write cycle; `lfs::install::install` uses it to
set `lfs.customtransfer.<agent>.path` and `lfs.standalonetransferagent`
in one pass. The LFS agent's `install` / `enable_debug` /
`disable_debug` subcommands lose their `async` qualifier as a side
effect. (#46)
- `protocol::run_main` now returns `std::process::ExitCode` instead of
`anyhow::Result<()>` so the helper binaries
(`git-remote-{s3,az}-{http,https}`) can render categorical
`BackendError`s as upstream-style single-line `fatal:` messages
without `anyhow`'s `Display` chain layering on top. The
management binary (`git-remote-object-store`) downcasts through the
anyhow chain to the same effect. (#45)
- `BackendError` lost its `S3` / `Azure` construction-failure variants
in favour of `BucketNotFound { kind, name }`,
`NotAuthorized { kind, action, name }`, and
`InvalidCredentials { source }`. Greenfield project — no compat
shim. (#45)
- `ObjectStore::get_to_file` now takes a `GetOpts` argument; `PutOpts`
gains an optional `progress` field. Both carry an
`Option<ProgressSink>` that backends drive at chunk boundaries
(per-range for the S3 multipart download path, per body chunk for
the S3 single-PUT and Azure download paths). Bundle / lock / HEAD
call sites pass `GetOpts::default()` and `progress: None`; the LFS
agent populates the sink. This is a public-API break for callers of
`ObjectStore::get_to_file`. (#44)
- Renamed `crate::object_store::Error` to `ObjectStoreError`. Every
importer previously aliased it via `use ... as ObjectStoreError`;
the rename pushes the action prefix into the type so pattern
matches read `ObjectStoreError::NotFound(_)` natively. Breaking
for external library consumers (none in-tree besides the helper /
management binaries). (#37)
- Renamed `PushOutcome::as_protocol_line` to `to_protocol_line`
(allocates `String` via `format!`, so `to_*` matches Rust API
Guidelines C-CONV). Replaced the free helper
`into_dialoguer_error` with `impl From<dialoguer::Error> for
ManageError`, dropping the `map_err(...)` boilerplate at both
call sites in favour of `?`. (#38)
- Renamed `ManageBranch::delete_branch`/`protect_branch`/
`unprotect_branch` to `delete`/`protect`/`unprotect` — the
receiver type already names the subject; the method-side
`_branch` was redundant noise. The CLI subcommand names
(`delete-branch`, `protect`, `unprotect`) are unchanged. (#39)
- Renamed `AzureBlobStore` to `AzureStore` (symmetric with
`S3Store`); renamed `AzureAddressing::Subdomain` to
`AzureAddressing::VirtualHosted` (symmetric with
`S3Addressing::VirtualHosted` and matches AWS-canonical
terminology); renamed the private `protocol::list::BundleEntry`
to `ListedBundle` so it no longer collides with the public
`manage::snapshot::BundleEntry`. (#40)
- Renamed `git::validate_ref_name` to `is_valid_ref_name` so the
`bool`-returning predicate carries the `is_*` prefix per the
project naming rules. (#41)
- Hoisted the empty-prefix key builder out of `manage` into a new
`crate::keys` module so the protocol, LFS, and management layers
all share one source of truth for `<prefix>/<suffix>` joining.
Five sites (`push.rs`, `fetch.rs`, `list.rs`, `lfs/agent.rs`, plus
three management call sites) previously open-coded the same
empty-prefix `match`. Added `network_boxed` next to `other_boxed`
in `object_store::error` so the seven open-coded
`|e| ObjectStoreError::Network(Box::new(e))` closures collapse to
function pointers.
- Tightened protocol-test coverage: dropped the stale
`bucket = "0.a"` proptest seed (no longer reachable from
`arb_bucket()`), replaced placeholder `aaaa.bundle` /
`bbbb.bundle` fixtures with realistic 40-hex SHAs, added a
regression test for the previously-untested
`parse_remote_sha_from_key` failure arm in `protocol::push`,
added end-to-end S3 helper-binary coverage modeled on the
existing Azure pattern (push / clone / fetch / LFS), and pinned
`option verbosity` behaviour for `n >= 2`. (#35)
- Strengthened three tests surfaced by the audit-tests pass:
`pre_lock_multi_bundle_rejection_surfaces_unchanged` now pins the
byte-exact wire bytes (the loose `contains("multiple bundles")`
would not have caught the missing `?` that #34 fixed); added
`fix_head_out_of_range_select_returns_internal_error` to cover
the HEAD-candidate `ManageError::Internal` branch that was
structurally identical to the bundle-index branch but lacked
coverage; and the Azure `put_path_with_opts_uploads_body` test
now verifies `content_disposition` and `x-ms-meta-*` propagate on
the wire via a signed HEAD, mirroring its S3 sibling.
- Documented backend size limits (AWS / Azure SDK API ceilings),
lack of resume after upload failure, and the open `git push`
upload-progress gap (#55) in a new "Known limitations" section in
`README.md`, with cross-references from the s3 and azure
module-level docs. (#57)
- Clarified the `ObjectStore::copy` trait contract: the body is
preserved on every backend, but user-metadata propagation is
best-effort. `S3Store::copy` (server-side `CopyObject`) does
propagate it; `AzureStore::copy` (download-then-upload, since
`azure_storage_blob` 0.12 does not ergonomically expose `Copy
Blob` with shared-key auth) currently drops it. Callers must not
depend on metadata round-tripping through `copy`.
- Removed the stale "Azure backend wired in Phase 11 — until then
the REPL exits early with a 'not yet implemented' error" note
from both Azure helper shim binaries; the wrappers now describe
the current shape symmetrically with the S3 shims. (#31)
- ls-remote / `cmd_list` wire output documentation now matches the
actual behaviour: one line per bundle (not per ref), sorted by
`LastModified` descending, with the `@<head> HEAD` line prepended
only when not `list for-push` and the head ref appears in the
listed bundles. (#36)
- `README.md` "Status" section now describes the gitoxide /
subprocess split honestly: gitoxide is used for rev-parse,
is-ancestor, ref-name validation, remote-URL inspection, archive,
last-commit-message, ref discovery, and object resolution; bundle
`create` and `unbundle` still shell out via the single `run_git`
helper because `gix` 0.82 has no public bundle API. (#36)
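
The retry-once shape that replaces the `unreachable!()` tail in `S3Store::get_to_file` can be illustrated with a small synchronous sketch. `StoreError` and `retry_once_on_precondition_failed` are hypothetical names chosen for this example; the crate's actual error type and its private `head_then_download` helper are not reproduced here.

```rust
/// Hypothetical error type standing in for the store's real one.
#[derive(Debug, PartialEq)]
enum StoreError {
    /// The object changed between the HEAD and the GET (HTTP 412).
    PreconditionFailed,
    /// Any other failure; never retried by this helper.
    Other(String),
}

/// Runs `attempt`; if it fails with `PreconditionFailed`, runs it exactly
/// once more and returns that second result. Every path returns a value,
/// so no trailing `unreachable!()` is needed.
fn retry_once_on_precondition_failed<T>(
    mut attempt: impl FnMut() -> Result<T, StoreError>,
) -> Result<T, StoreError> {
    match attempt() {
        Err(StoreError::PreconditionFailed) => attempt(),
        other => other,
    }
}
```

Because the second call's result is returned as-is, a persistent 412 surfaces to the caller after two attempts rather than looping.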
### Removed
- Internal `run_git` helper — was the sole subprocess-spawning point in
production; removed once `bundle_at` / `unbundle_at` moved to the native
`gix-pack` path.
- `GitError::GitBinaryMissing` — was only reachable through `run_git`;
removed along with it.
- `GitError::Subprocess` — likewise only reachable through `run_git`.
### Fixed
- `list` command on packchain remotes now returns `chain.tip`
rather than the baseline `<full_at>` SHA. The bundle-engine
`list` handler parsed `<sha>.bundle` filenames; for packchain
the bundle is the (fixed) baseline, not the moving tip, so
after any incremental push `git ls-remote` / `git fetch` /
`git pull` saw stale tips. Fix: engine-aware dispatch in
`protocol::list::handle_list` — bundle keeps its bundle-key
parser, packchain reads each ref's `chain.json` and reports
`chain.tip`. Per-entry `chain.json` parse failures skip with
a `tracing::warn!` so a single corrupt branch does not
blackhole the whole listing. (#72)
- Sanitize the commit-message summary that flows from
`git::last_commit_message` into the
`codepipeline-artifact-revision-summary` user-metadata header on
the zip-archive upload. ASCII control bytes (CR, LF, NUL, …) are
collapsed to spaces so a crafted commit summary cannot CRLF-inject
forged user-metadata headers on the upload. Both backend SDKs
reject CRLF at the transport layer today, but defending at the
call site surfaces a clean, predictable header value instead of a
cryptic 400.
- Dotted S3 bucket names (e.g. `bucketname.com`) in virtual-hosted URLs
are now parsed correctly. `detect_s3_addressing` scans for the
rightmost `.s3.` or `.s3-` AWS service infix anywhere in the host
(instead of only checking the second label), and the virtual-hosted
bucket extractor returns the full prefix preceding that infix (instead
of just the leftmost label). Hosts of the shape
`bucketname.com.s3.<region>.amazonaws.com` and the legacy
`bucketname.com.s3-<region>.amazonaws.com` form now resolve to the
correct bucket; the previous behaviour silently routed to the wrong
bucket or produced a misleading `InvalidBucket` error. (#48)
- Both `S3Store` and `AzureStore` now apply HTTP-layer
read/connect timeouts so a *hot* pooled connection that has gone
silent (e.g. mid-LFS push when the server VIP rotates) fails fast
instead of waiting for the OS-level TCP retransmit timeout
(~15 minutes on Linux). Pool-idle alone bounds only *idle* pooled
connections; a connection used within the last 30 s never goes
idle. S3 sets `read_timeout(30s)` on the SDK's `TimeoutConfig`
(smithy semantics: time-to-first-byte, not body-transfer); `connect_timeout`
stays at the SDK default of 3.1 s. Azure sets `connect_timeout(10s)`
and `read_timeout(30s)` on the custom `reqwest::Client` (per-read
semantics: resets after each successful read). The third
remediation checkbox in #26 ("force a fresh connection on
connection-level retry") is reframed: the existing one-shot retry
in `get_to_file` is a 412 mutation-race retry where the connection
is healthy by definition, so forcing a fresh socket there does not
help; the timeout-then-SDK-retry path covers the actual
stuck-connection case. (#26)
- `S3Store::from_remote_url` now installs a custom
`aws-smithy-http-client` with `pool_idle_timeout(30s)` so DNS
rotation no longer wedges a long-running LFS session until the
OS-level TCP timeout fires (~15 minutes on Linux). The same TLS
provider as the SDK's `default-https-client` (`rustls-aws-lc`) is
selected explicitly so cargo unifies on a single rustls stack. TCP
keepalive is **not** wired here: `aws-smithy-http-client` 1.1.12's
public `Builder` API exposes `pool_idle_timeout` but does not
expose `tcp_keepalive`; the dominant pool-reuse-of-dead-VIP
failure is fixed by the idle timeout alone. (#26, #27)
- `AzureStore::from_remote_url` now configures the SDK's HTTP transport
with `pool_idle_timeout(30s)` and `tcp_keepalive(30s)`. Pooled
connections to a rotated VIP can no longer wedge a long-running LFS
session until the OS-level TCP timeout fires (~15 minutes on Linux).
The custom transport leaves `ClientOptions::per_try_policies`
untouched, so shared-key / SAS signing continues to fire on every
request. (#26, #28)
- `push.rs` parse-error message now names the full
`git-remote-object-store doctor` binary instead of the bare word
`doctor`, matching the wording of the other doctor-pointing error
paths. (#22)
- Management CLI (`doctor`, `delete-branch`, `protect`, `unprotect`)
now accepts root-of-bucket remotes (empty repository prefix)
end-to-end, building keys like `refs/heads/main/...` and `HEAD`
without a leading slash. (#29, #32)
- `AzureStore::copy` now streams through a tempfile via
`get_to_file` + `put_path` instead of buffering the whole body in
RAM. Memory is bounded by the SDK's per-block partition size
regardless of blob size, so `Doctor::evict_losing_bundle`'s
duplicate-bundle quarantine no longer pulls multi-GiB bundles
through the helper process. (#30)
- Replaced production `expect()` panics in `manage::doctor`,
`protocol::fetch`, and `object_store::s3` with structured error
propagation. Snapshot-lookup invariants now surface as
`ManageError::Internal`; mutex poisoning is recovered via
`PoisonError::into_inner`; the `JoinSet`/`Arc::try_unwrap` flush
path falls back to a locked-flush instead of aborting. (#33)
- Under-lock duplicate-bundle push error now ends with the trailing
`?` suffix used by every other `error <ref> "..."` message in the
helper, so the wire format is consistent across the pre-lock and
under-lock branches. Deliberate divergence from upstream Python,
which omits the `?` on this path. (#34)
- Both `S3Store` and `AzureStore` now error with
`ObjectStoreError::Other` when a `head_object` response omits
`Content-Length`, instead of treating the missing header as
`size = 0` and silently writing an empty file at the destination.
Mirrors the existing `last_modified` guard. (#43)
- `AzureStore::put_path` streams files from disk via the SDK's
`FileStream` + `BlockBlobClient::upload` (auto-partitioned
`stage_block` + `commit_block_list`), restoring the cross-backend
streaming guarantee from #21 that the Azure side had been silently
inheriting from the trait's read-then-`put_bytes` default. Memory
is bounded by `parallel × partition_size` (≈16 MiB by default)
regardless of file size. (#42)
- `protocol::list::read_remote_head` now treats `Some("")` as a
no-prefix repository, matching the rest of the helper. The previous
inline `match` produced a `/HEAD` key for root-of-bucket remotes
whose prefix parsed as the empty string, which never resolved
on the wire.
- `release_lock` now propagates non-`NotFound` delete failures instead of
silently swallowing them. When the push itself succeeds but the lock
cannot be released, the outcome is replaced with
`error <ref> "failed to release lock. ..."` matching upstream
`cmd_push`'s `finally` block. A genuine push error is never masked by
a release failure. (#18)
- `S3Store::get_to_file` now guards against concurrent object mutation:
every GET carries `If-Match: <etag>` from the preceding `HeadObject`.
If the object is overwritten mid-download, S3 returns 412 and the
operation retries once before propagating `Error::PreconditionFailed`.
(#20)
- Push batches no longer abort on the first per-push transport, git, or
local-I/O failure. `push_batch` now catches `PushError::Store`, `Git`,
`Io`, and `Sha` per-push and converts them to `error <ref> "..."` outcome
lines so the batch continues, mirroring upstream `cmd_push`'s
try/except shape (`../git-remote-s3/git_remote_s3/remote.py:286-296`).
Without this, a single 5xx blip mid-batch would silently drop the
outcome lines for already-completed pushes and leave git's local
ref-tracking inconsistent with the remote. `PushError::Parse`,
`InvalidLocalSpec`, and `RemoteRef` still abort the batch — those mean
subsequent commands cannot be trusted.
- `url::is_valid_bucket` now rejects the AWS-reserved bucket prefixes
(`xn--`, `sthree-`, `amzn-s3-demo-`) and suffixes (`-s3alias`,
`--ol-s3`, `.mrap`, `--x-s3`, `--table-s3`), enforces the
begin-and-end-with-alphanumeric rule, rejects consecutive periods, and
rejects names formatted as IPv4 dotted-quads. `url::is_valid_container`
now enforces the matching Azure rules: alphanumeric bookends and no
consecutive hyphens. (#17)
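
The dotted-bucket fix above amounts to "find the rightmost `.s3.` or `.s3-` infix and take everything before it". A minimal sketch of that rule, assuming a hypothetical `virtual_hosted_bucket` function rather than the crate's real `detect_s3_addressing` and bucket extractor:

```rust
/// Hypothetical helper: returns the bucket portion of a virtual-hosted
/// S3 host, or `None` when no `.s3.` / `.s3-` infix follows a bucket.
fn virtual_hosted_bucket(host: &str) -> Option<&str> {
    // Byte offset of the rightmost infix of either form.
    let infix_at = [".s3.", ".s3-"]
        .iter()
        .filter_map(|infix| host.rfind(*infix))
        .max()?;

    // Everything before the infix is the bucket, dots included.
    let bucket = &host[..infix_at];
    if bucket.is_empty() {
        None
    } else {
        Some(bucket)
    }
}
```

Taking the full prefix rather than the leftmost label is what lets `bucketname.com.s3.<region>.amazonaws.com` resolve to `bucketname.com` instead of `bucketname`.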
### Security
- `packchain` `bundle-uri` (issue #71) now rejects derived
ref-paths containing `=` before emission. Defense-in-depth
hardening flagged by /security-review: `gix_validate::reference::name`
bans `:`, `\n`, `\r`, ` `, control chars, and other
framing-relevant bytes, but it permits `=`, which git's `bundle-uri`
parser uses as the id/value split. The pre-existing `:` ban
forecloses scheme injection (no host-relocation SSRF), but a
ref-path with `=` could still produce a malformed wire entry on
shared-prefix deployments where another tenant has bucket-write
access. The new `is_safe_for_bundle_uri_emission` check warns
and skips such entries. Mutation-verified
(`skips_chain_json_with_equals_in_ref_name`).
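
To see why a ref-path containing `=` is a problem for a `key=value` wire line, consider the sketch below. It is illustrative only: `is_safe_for_emission`, `emittable`, and `split_entry` are hypothetical names, and using the ref-path directly as the id in a `bundle.<id>.uri=<value>` line is an assumption made for the example, not a statement about how the crate builds its entries.

```rust
/// Hypothetical stand-in for the emission check: a ref-path is only
/// emitted when it cannot be mistaken for the key/value separator.
fn is_safe_for_emission(ref_path: &str) -> bool {
    !ref_path.contains('=')
}

/// Keeps only the ref-paths that pass the check, skipping the rest.
fn emittable<'a>(ref_paths: &[&'a str]) -> Vec<&'a str> {
    ref_paths
        .iter()
        .copied()
        .filter(|path| is_safe_for_emission(path))
        .collect()
}

/// Splits a `key=value` line at the first `=`, the way a simple
/// line-oriented parser would.
fn split_entry(line: &str) -> Option<(&str, &str)> {
    line.split_once('=')
}
```

With a ref-path of `refs/heads/a=b`, a line such as `bundle.refs/heads/a=b.uri=s3://x` splits at the first `=`, so the parser sees the key `bundle.refs/heads/a` and the value `b.uri=s3://x`. Skipping such entries before emission avoids producing that malformed pair.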
## [0.1.0] - 2026-04-26
Initial release. The full feature surface is in place: URL parser,
gitoxide-backed git operations, the `ObjectStore` trait with S3 and
Azure Blob backends, the helper protocol REPL, parallel `fetch`,
locked `push`, the management CLI (`doctor` / `delete-branch` /
`protect` / `unprotect`), the LFS custom-transfer agent, the
helper-binary shims for both schemes, and the documentation /
packaging / release pipeline.
### Added
- README backend matrix and side-by-side S3/Azure examples covering
clone, push, and management commands. (#14)
- `cargo install` instructions plus the `+`-form symlink workaround
for git's helper lookup (xtask automation tracked as a follow-up
issue). (#14)
- GitHub Actions CI jobs for the `integration-s3` and
`integration-azure` features (Docker-backed RustFS / Azurite
fixtures), plus a `markdownlint-cli2` job and an `--all-features`
clippy pass so feature-gated code paths are linted. (#14)
- Tag-triggered release workflow (`.github/workflows/release.yml`)
that builds release binaries on Linux x86_64 and macOS arm64,
splits debug info into separate `.debug` / `.dSYM` artefacts via
`objcopy --only-keep-debug` / `dsymutil`, strips the primary
binary, and publishes both tarballs to a GitHub Release per the
comment in `Cargo.toml`. (#14)
- `README.md` covering install, URL grammar, the
`protocol.s3+https.allow always` / `protocol.az+https.allow always`
config required for submodule URLs, AWS credential resolution, the
Azure `AZSTORE_<NAME>_KEY` / `_CONNECTION_STRING` / `_SAS` aliases,
and the LFS custom-transfer agent install flow. (#12)
- End-to-end binary tests (Phase 12) in
`tests/azure_store_integration.rs`: drive `git push` / `git clone` /
`git fetch` against the real `git-remote-az+http` helper binary
through Azurite, plus an LFS round-trip exercising
`git-lfs-object-store install`. The cargo bin name
(`git-remote-az-http`) is symlinked to the `+`-form git looks up in a
per-process tempdir prepended to `PATH`. Gated on
`--features integration-azure` alongside the trait-level coverage.
(#12)
- Azure Blob Storage backend (`AzureStore`, Phase 11): full
`ObjectStore` trait implementation against the official
`azure_storage_blob` 0.12 crate. `list` paginates through
`BlobContainerClient::list_blobs`; `get_to_file` streams via the
SDK's parallelised `BlobClient::download` (no hand-rolled multipart
on Azure, asymmetric with S3 by design); `put_bytes` /
`put_if_absent` use `BlockBlobClientUploadOptions::with_if_not_exists`
to surface 409/412 contention as `Ok(false)`. Wired into
`protocol::backend::build`, so existing `git-remote-az+https` /
`git-remote-az+http` shims now drive a real backend. (#11)
- Custom shared-key signing policy (`auth::SharedKeySigningPolicy`):
the SDK does not yet support shared-key authentication
(`Azure/azure-sdk-for-rust#2975`), so we install our own per-try
`azure_core::http::policies::Policy` that signs each outgoing
request with the Azure Storage shared-key v2 scheme. This is the
only way to authenticate against Azurite without an HTTPS+OAuth
setup, and unblocks production accounts that still use account
keys. SAS-token signing (`SasSigningPolicy`) and
`?credential=<NAME>` env-var resolution
(`AZSTORE_<NAME>_KEY` / `_CONNECTION_STRING` / `_SAS`) ship in the
same patch. (#11)
- Azurite-backed integration suite
(`tests/azure_store_integration.rs`, gated on
`--features integration-azure`): mirrors the RustFS S3 fixture
(one shared Docker container, fresh per-test blob-container allocation, the
16-racer `put_if_absent` contention canary, and round-trips for
`head` / `list` / `copy` / `delete` / `get_to_file` zero-byte and
multi-megabyte). (#11)
- LFS custom-transfer agent (`git-lfs-object-store`, Phase 10): a single
binary that serves both backends. Subcommands `install`,
`enable-debug`, and `disable-debug` mutate the local repo's
`git config`; passing no argument (or `debug`, set automatically by
`enable-debug`) starts the LFS REPL. The REPL handles the `init`,
`upload`, `download`, and `terminate` events of the line-oriented
JSON protocol: uploads HEAD `<prefix>/lfs/<oid>` and skip on hit,
otherwise stream the body and emit a final `progress` plus
`complete`; downloads stream to `<git-dir>/lfs/tmp/<oid>` and emit
`complete` with the path. Debug logs go to
`<git-dir>/lfs/tmp/git-lfs-object-store.log` when enabled, never to
stdout. (#10)
- Management CLI (`git-remote-object-store`) with `doctor`,
`delete-branch`, `protect`, and `unprotect` subcommands. Each accepts a
remote URL (`s3+https://…`, `az+https://…`) or the name of a git remote
configured in the current repository, and dispatches to the right
backend through the `ObjectStore` trait. The doctor analyzes the
on-bucket layout, offers to keep or quarantine duplicate bundles per
ref (`<ref>_<uuid8>` quarantine refs by default; `--delete-bundle`
switches to outright deletion), prompts for a replacement when `HEAD`
is invalid, and scans `*.lock` keys against a TTL (`--lock-ttl`,
defaults to 60 s) with optional `--delete-stale-locks`. Interactive
prompts go through a `Prompter` trait so unit tests drive the same
code path with a scripted prompter against `MockStore`. (#9)
- `ObjectStore::put_path` streams local files to the backend without
buffering in process memory. The push handler now uses it for bundle
and zip artifact uploads, removing OOM risk for large repos and the
5 GiB single-PUT ceiling. (#21)
- Shared protocol-test helpers extracted into `tests/common/mod.rs`,
eliminating ~100 lines of duplicated `git()`, `git_capture()`,
`s3_url()`, `drive_in()`, and `git_available()` across
`protocol_smoke.rs`, `protocol_fetch.rs`, and `protocol_push.rs`.
(#19)
- Phase 8 `push` handler with per-ref locking (`src/protocol/push.rs`): the
REPL now batches `push <refspec>` lines until a blank line and processes
them sequentially under per-ref locks at `<prefix>/<ref>/LOCK#.lock`,
acquired via the trait's `put_if_absent` (S3 `If-None-Match: *` /
Azure `If-None-Match: *`). On contention the handler `head`s the lock
and, if its age since `LastModified` exceeds the TTL (default 60 s, override via
`GIT_REMOTE_S3_LOCK_TTL_SECONDS` per upstream parity), deletes and
retries once; otherwise it surfaces a "lock held" error line. After
acquiring the lock the handler re-lists bundles and rejects the push if
another client wrote a different bundle ("stale remote") or left the
ref in a multi-bundle state. Force pushes against a ref carrying a
`PROTECTED#` marker are demoted to non-force and re-checked against
`merge-base --is-ancestor`. The `?zip=1` URL flag triggers an
additional `repo.zip` upload alongside the bundle, with
`Content-Disposition: attachment; filename=repo-<short-sha>.zip` and
`codepipeline-artifact-revision-summary` user metadata. Per-push
outcomes (`ok <ref>` / `error <ref> <reason>`) are written one line per
command, followed by the protocol's blank-line terminator. Closes #8.
- `git::bundle_at(cwd, …)`: path-only variant of `git::bundle` so the
push handler does not have to hold `gix::Repository` (which is `!Sync`)
across `.await`, mirroring the path-only `unbundle_at` Phase 7
introduced.
- Phase 7 parallel `fetch` handler (`src/protocol/fetch.rs`): the REPL now
collects `fetch <sha> <ref>` lines until a blank line and dispatches them
through a `tokio::task::JoinSet` bounded by a `tokio::sync::Semaphore`
with `MAX_FETCH_CONCURRENCY = 8` permits (parity with upstream's
`boto3.s3.transfer.TransferConfig(max_concurrency=8)`). Each task
downloads `<prefix>/<ref>/<sha>.bundle` to a private tempdir, runs
`git bundle unbundle` against the local repository's working directory,
and records the SHA in a session-wide `Arc<Mutex<HashSet<Sha>>>` so a
later batch in the same REPL session skips already-fetched refs. The
batch driver drains every task before returning so a single failure
cannot leave zombies running into a closing helper. `protocol::run` now
takes a `repo_dir: PathBuf` parameter; `run_main` derives it from the
process cwd (set by git when it invokes the helper).
- Phase 6 remote-helper protocol skeleton (`src/protocol/`): asynchronous
REPL (`protocol::run`) generic over its reader/writer so tests can drive
it via `tokio::io::duplex`, plus a shared `protocol::run_main` entry that
every `git-remote-{s3,az}-{http,https}` binary now invokes. Implements
the four Phase-6 commands: `capabilities` (announces `*push`, `*fetch`,
`option`), `list` and `list for-push` (lists `<sha> <ref>` lines, sorted
by `LastModified` descending, filtered to
`^refs/.+/.+/[a-f0-9]{40}\.bundle$`, with `@<ref> HEAD` emitted only when
not for-push and the head ref appears in the listing), and `option
verbosity <n>` (responds `ok` and reloads the `tracing` filter to `info`
for `n >= 2`, `unsupported` otherwise). Stripping happens against
`<prefix>/` so a sibling-prefix repo cannot match. HEAD body is trimmed
per upstream `.strip()` semantics; `Error::NotFound` on HEAD is
swallowed silently. `fetch`/`push` lines are recognised but return a
structured "not yet implemented" error pending Phases 7/8 — fail-fast
rather than the upstream silent-queue-then-flush so `git fetch`/`git push`
surfaces a clear reason. Stdin EOF is a clean exit; stdout `BrokenPipe`
is caught at the top level and the process exits 0 (mirroring
upstream's `os.dup2(devnull, stdout)` trick). On Unix, SIGPIPE is masked
via `tokio::signal::unix::signal(SignalKind::pipe())` so writes return
EPIPE rather than killing the process.
- Phase 6 backend factory (`protocol::backend::build`) dispatches a parsed
`RemoteUrl` to `S3Store` (Phase 5) or returns
`BackendError::AzureNotImplemented` for `RemoteUrl::Azure` until
Phase 11 lands the Azure backend.
- Phase 6 stderr-only tracing initialiser (`protocol::tracing_init`)
honours `GIT_REMOTE_OBJECT_STORE_VERBOSE` and the upstream-compat alias
`GIT_REMOTE_S3_VERBOSE`; a numeric `>= 2` bumps the start level to
`info`. The filter sits behind `reload::Layer` so the protocol can flip
verbosity at runtime.
- `clippy.toml` now bans `println!`/`print!`/`dbg!` via `disallowed-macros`
per `.claude/rules/protocol-stdout.md`. The management CLI and LFS
agent opt out at the file level when they need to write to stdout.
- Tokio's `io-std` feature is now enabled so the helper binaries can read
stdin and write stdout asynchronously.
- Smoke test `tests/protocol_smoke.rs` (gated on `feature = "test-util"`)
drives `protocol::run` end-to-end against `MockStore` via
`tokio::io::duplex`, asserting exact stdout bytes for capabilities,
list / list for-push, option verbosity, the `fetch`/`push` stub error
paths, EOF, blank lines, HEAD trimming, sibling-prefix collisions, and
bundle-key filter rejections.
- Phase 5 S3 backend (`src/object_store/s3.rs`): full `ObjectStore`
implementation against `aws-sdk-s3` 1.x. The SDK owns SigV4, retries,
and connection pooling; this module owns URL → SDK config translation
(endpoint normalisation that strips both the bucket label and any
query string before handing the URL to the SDK; region resolution
that honours `?region=`, parses AWS hostnames, and falls back to
`us-east-1` for non-AWS endpoints so SigV4 has a region to sign
with), error classification (404→`NotFound`, 403→`AccessDenied`,
412→`PreconditionFailed`, 409→`Conflict`, network/timeout→`Network`),
and a hand-rolled multipart download orchestrator (HEAD for size,
then concurrent ranged GETs through a Tokio semaphore, max 8 in
flight, 16 MiB chunks, 25 MiB threshold) matching the upstream
`boto3.s3.transfer.TransferConfig` defaults. `put_if_absent` calls
`put_object().if_none_match("*")` and collapses both 412 and 409 to
`Ok(false)` so racing `If-None-Match: *` PUTs surface as "lock not
acquired" rather than as hard errors. `get_to_file` writes to a
sibling `NamedTempFile` and persists on success so a partial failure
cannot leave a corrupt destination. `delete` HEADs first to honour
the trait's `Err(NotFound)` contract on missing keys (S3 DELETE is
idempotent and reports success even when the key is absent). Copy
keys with reserved characters (`#` from
`LOCK#.lock`) are percent-encoded before being placed in the
`x-amz-copy-source` header. Integration tests run against RustFS
(Apache-2.0) via `testcontainers` behind the new `integration-s3`
Cargo feature (Docker required). The fixture pins the RustFS image
tag explicitly so alpha-version drift cannot break CI silently.
Tests cover round-trip put/get, pagination beyond one page,
concurrent `put_if_absent` contention, the 50 MiB+ multipart
download path, percent-encoded copy, atomic-fail behaviour of
`get_to_file`, and `AccessDenied` mapping.
- Phase 4 object-store seam (`src/object_store/`): backend-neutral
`ObjectStore` async trait covering list / head / get / put /
put-if-absent / copy / delete, shared `Error` enum mapping S3
and Azure failure codes onto `NotFound` / `AccessDenied` /
`PreconditionFailed` / `Conflict` / `Network` / `Other`, and the
`ObjectMeta` / `PutOpts` value types. The trait is dispatched via
`Arc<dyn ObjectStore>` (`async_trait` macro keeps `dyn + Send + Sync`
ergonomic). An in-memory `MockStore` lives behind a new `test-util`
Cargo feature (also active under `cfg(test)`) so unit tests in this
crate AND integration tests for higher phases can drive push, fetch,
locking, and doctor logic without MinIO/Azurite. The mock supports
FIFO fault injection (`PreconditionFailed` on `put_if_absent`,
`NotFound` on `head`, `Network` on `get_bytes`, `AccessDenied` on
`list`) so Phase 8's stale-lock retry path is deterministic, and
`insert_with` back-dates `last_modified` for the staleness check.
- Phase 3 git wrapper (`src/git.rs`): the helpers from upstream
`git_remote_s3/git.py` ported onto `gix` (gitoxide) with two newtypes
(`Sha`, `RefName`), a `GitError` aggregate, and a single private
`run_git` helper that funnels every `git` subprocess through one
stdio-disciplined entry point. `archive` uses `gix-archive`'s native
zip writer; `bundle`/`unbundle` retain a subprocess fallback because
`gix` 0.82 has no public bundle API. Spike result captured in
`docs/development/spike-gix-bundle-parity.md`.
- URL parser (`src/url.rs`): `parse(&str) -> Result<RemoteUrl, ParseError>`
for the `s3+https`, `s3+http`, `az+https`, `az+http` grammar.
Includes addressing-style auto-detection with
`?addressing=path|virtual` override, query-flag extraction (`zip`,
`profile`, `credential`, `region`), and cleartext-HTTP gating —
non-loopback `*+http://` is rejected unless
`GIT_REMOTE_OBJECT_STORE_ALLOW_HTTP=1` is set.
- Integration tests in `tests/url_parsing.rs` covering every concrete
URL example in the grammar plus negative cases for invalid bucket /
account / container charsets, missing segments, unknown flags,
illegal flag values, and cleartext-HTTP rejection. `proptest`
round-trip (parse → display → parse) for the legal grammar.
- Cargo manifest with the dependency set used throughout (tokio,
thiserror/anyhow, tracing, time, serde, clap v4, url, gix and
selected sub-crates, bytes, tempfile).
- Module skeleton (`url`, `git`, `protocol/*`, `object_store/*`,
`lfs`, `manage/*`).
- Placeholder `[[bin]]` shims for the remote-helper schemes plus the
management and LFS binaries.
- GitHub Actions CI workflow running `cargo fmt --check`,
`cargo clippy --all-targets -- -D warnings`, and `cargo test`.
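
The ranged-download arithmetic in the Phase 5 S3 backend entry above
(16 MiB chunks, 25 MiB threshold) can be sketched roughly as follows.
This is an illustration only: `plan_ranges` is a hypothetical name, and
treating the threshold as inclusive is an assumption, not something the
entry states.

```rust
/// Chunk size and threshold quoted in the Phase 5 entry.
pub const CHUNK_SIZE: u64 = 16 * 1024 * 1024;
pub const MULTIPART_THRESHOLD: u64 = 25 * 1024 * 1024;

/// Hypothetical helper: split an object of `size` bytes into inclusive
/// `(start, end)` pairs, one per `Range: bytes=start-end` request.
/// Assumption: objects strictly below the threshold use a single range.
pub fn plan_ranges(size: u64) -> Vec<(u64, u64)> {
    if size == 0 {
        // Zero-byte objects need no ranged request at all.
        return Vec::new();
    }
    if size < MULTIPART_THRESHOLD {
        return vec![(0, size - 1)];
    }
    let mut ranges = Vec::new();
    let mut start = 0u64;
    while start < size {
        // The last chunk is clamped to the object size.
        let end = (start + CHUNK_SIZE).min(size) - 1;
        ranges.push((start, end));
        start = end + 1;
    }
    ranges
}
```

Under these assumptions a 40 MiB object yields three ranges, which a
caller could then fetch concurrently behind a semaphore.
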
### Changed
- `protocol::ProtocolError::Push` now wraps a structured `push::PushError`
enum (`Parse` / `InvalidLocalSpec` / `RemoteRef` / `Sha` / `Store` /
`Git` / `Io`) instead of the Phase 6 `PushNotImplemented` placeholder.
The REPL acquired a `Mode::Push` accumulator alongside the existing
`Mode::Fetch` one; switching modes mid-batch resets the opposite
accumulator (mirrors upstream `process_cmd`).
- `git::bundle` and `git::archive` now take `spec: &str` (a permissive
rev-spec) instead of `&RefName`. Storage-key types remain strict; the
rev-spec passed to git itself is just a string git already validates.
- `protocol::ProtocolError::Fetch` now wraps a structured `fetch::FetchError`
enum (`Parse` / `Sha` / `Ref` / `Store` / `Io` / `Git` / `Join`) instead
of the Phase 6 `FetchNotImplemented` placeholder.
- `git::unbundle` is now a thin wrapper over a new
`git::unbundle_at(cwd, …)` path-only variant. The parallel fetch path
uses the path variant because `gix::Repository` is `!Sync` and cannot be
shared across spawned tasks.
- Fixed §3.1 Azure example to use `myaccount` rather than `my-account`;
the previous form contradicted the §3.5 account charset rule
`[a-z0-9]{3,24}` (no hyphens).
- Spike result: `cargo` rejects `+` in `[[bin]] name` (it derives a
crate name from the bin name and `+` is not a legal crate-name
character). The cargo bins therefore use hyphenated names
(`git-remote-s3-https`, `git-remote-s3-http`, `git-remote-az-https`,
`git-remote-az-http`) and a later `xtask` step will rename / hardlink
them to the `+` form expected by `git` at install time.
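
The hyphenated-to-`+` rename described in the spike entry above could
look something like this. `git_helper_name` is a hypothetical helper
written for illustration; it is not the `xtask` step itself.

```rust
/// Hypothetical helper: map a hyphenated cargo bin name to the `+`-form
/// name git looks up, e.g. `git-remote-s3-https` -> `git-remote-s3+https`.
/// Returns `None` for anything that is not one of the four helper bins.
pub fn git_helper_name(cargo_bin: &str) -> Option<String> {
    let rest = cargo_bin.strip_prefix("git-remote-")?;
    let (backend, transport) = rest.split_once('-')?;
    match (backend, transport) {
        ("s3" | "az", "http" | "https") => {
            Some(format!("git-remote-{backend}+{transport}"))
        }
        // e.g. `git-remote-object-store` is the management CLI, not a helper.
        _ => None,
    }
}
```

An install step would then symlink or hardlink each built binary to the
returned name somewhere on `PATH`.
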
### Fixed
- `release_lock` now propagates non-`NotFound` delete failures instead of
silently swallowing them. When the push itself succeeds but the lock
cannot be released, the outcome is replaced with
`error <ref> "failed to release lock. ..."` matching upstream
`cmd_push`'s `finally` block. A genuine push error is never masked by
a release failure. (#18)
- `S3Store::get_to_file` now guards against concurrent object mutation:
every GET carries `If-Match: <etag>` from the preceding `HeadObject`.
If the object is overwritten mid-download, S3 returns 412 and the
operation retries once before propagating `Error::PreconditionFailed`.
(#20)
- Push batches no longer abort on the first per-push transport, git, or
local-I/O failure. `push_batch` now catches `PushError::Store`, `Git`,
`Io`, and `Sha` per-push and converts them to `error <ref> "..."` outcome
lines so the batch continues, mirroring upstream `cmd_push`'s
try/except shape (`../git-remote-s3/git_remote_s3/remote.py:286-296`).
Without this, a single 5xx blip mid-batch would silently drop the
outcome lines for already-completed pushes and leave git's local
ref-tracking inconsistent with the remote. `PushError::Parse`,
`InvalidLocalSpec`, and `RemoteRef` still abort the batch — those mean
subsequent commands cannot be trusted.
- `url::is_valid_bucket` now rejects the AWS-reserved bucket prefixes
(`xn--`, `sthree-`, `amzn-s3-demo-`) and suffixes (`-s3alias`,
`--ol-s3`, `.mrap`, `--x-s3`, `--table-s3`), enforces the
begin-and-end-with-alphanumeric rule, rejects consecutive periods, and
rejects names formatted as IPv4 dotted-quads. `url::is_valid_container`
now enforces the matching Azure rules: alphanumeric bookends and no
consecutive hyphens. Closes #17.
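
A minimal sketch of the bucket rules listed in the `url::is_valid_bucket`
entry above. `is_valid_bucket_sketch` and `looks_like_ipv4` are
hypothetical names, and the 3-63 character `[a-z0-9.-]` baseline is
assumed from the AWS naming rules rather than taken from the crate; the
real implementation may differ.

```rust
const RESERVED_PREFIXES: [&str; 3] = ["xn--", "sthree-", "amzn-s3-demo-"];
const RESERVED_SUFFIXES: [&str; 5] =
    ["-s3alias", "--ol-s3", ".mrap", "--x-s3", "--table-s3"];

/// True when `name` is shaped like an IPv4 dotted quad, e.g. `192.168.5.4`.
fn looks_like_ipv4(name: &str) -> bool {
    let parts: Vec<&str> = name.split('.').collect();
    parts.len() == 4
        && parts.iter().all(|p| {
            !p.is_empty() && p.len() <= 3 && p.bytes().all(|b| b.is_ascii_digit())
        })
}

/// Hypothetical stand-in for `url::is_valid_bucket`.
pub fn is_valid_bucket_sketch(name: &str) -> bool {
    let bytes = name.as_bytes();
    let alnum = |b: u8| b.is_ascii_lowercase() || b.is_ascii_digit();
    // Assumed baseline: 3-63 characters drawn from [a-z0-9.-].
    if !(3..=63).contains(&bytes.len()) {
        return false;
    }
    if !bytes.iter().all(|&b| alnum(b) || b == b'.' || b == b'-') {
        return false;
    }
    // Must begin and end with a letter or digit.
    if !alnum(bytes[0]) || !alnum(bytes[bytes.len() - 1]) {
        return false;
    }
    // No consecutive periods, and no IPv4-shaped names.
    if name.contains("..") || looks_like_ipv4(name) {
        return false;
    }
    // Reject the reserved prefixes and suffixes named in the entry.
    !RESERVED_PREFIXES.iter().any(|p| name.starts_with(p))
        && !RESERVED_SUFFIXES.iter().any(|s| name.ends_with(s))
}
```
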
### Security
- Disable `aws-sdk-s3`'s default `rustls` feature to drop the legacy
`rustls 0.21` / `rustls-webpki 0.101.x` dependency chain pulled in by
`aws-smithy-runtime/tls-rustls`. The crate now uses the modern
`default-https-client` path (`rustls 0.23` / `rustls-webpki 0.103.x`),
resolving GHSA-4p46-pwfr-66x6 (high — DoS via panic on malformed CRL
BIT STRING) and the two webpki name-constraint advisories
(GHSA-fjxv-7rqg-78g4, GHSA-fhc7-32rr-h57g).