Incremental pack-chain storage engine (issue #52).
Push (#63) writes incremental packs keyed by content SHA, a
newest-first [schema::ChainManifest], a nested
[schema::PathIndex] of repo paths to blob SHAs, and a baseline
bundle on the first / force push so a fresh clone short-circuits
through bundle-uri. Fetch (#64), direct file access (#65,
read_blob library API), compaction (#67), and GC (#66) are all
implemented in sibling modules. Push artefacts on the bucket:
<prefix>/FORMAT "packchain"
<prefix>/HEAD "refs/heads/main"
<prefix>/refs/heads/<branch>/LOCK#.lock held during write, released after
<prefix>/refs/heads/<branch>/chain.json newest-first manifest (THE commit point)
<prefix>/refs/heads/<branch>/path-index.json nested tree → blob SHA map
<prefix>/refs/heads/<branch>/<tip>.bundle baseline (first / force push only)
<prefix>/packs/<content-sha>.pack incremental pack
<prefix>/packs/<content-sha>.idx pack index
Once the push lands, fetch resolves shallow / full clones via
sequential pack install (fetch.rs), read_blob reads single
blobs via the path-index without rehydrating the chain
(read.rs), and the manage compact / manage gc subcommands
collapse the chain and reap orphans (compact.rs, gc.rs).
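The key layout listed above can be sketched as a small key builder. This is only an illustration: the `Keys` type and its method names are hypothetical and are not taken from the packchain engine; only the key shapes come from the listing.

```rust
/// Hypothetical key builder for the bucket layout; not the engine's actual API.
pub struct Keys {
    prefix: String,
}

impl Keys {
    pub fn new(prefix: &str) -> Self {
        // Drop trailing slashes so joined keys never contain "//".
        Keys { prefix: prefix.trim_end_matches('/').to_string() }
    }

    /// Newest-first manifest for a branch (the commit point).
    pub fn chain(&self, branch: &str) -> String {
        format!("{}/refs/heads/{}/chain.json", self.prefix, branch)
    }

    /// Nested tree to blob SHA map for a branch.
    pub fn path_index(&self, branch: &str) -> String {
        format!("{}/refs/heads/{}/path-index.json", self.prefix, branch)
    }

    /// Baseline bundle, written on the first / force push only.
    pub fn baseline(&self, branch: &str, tip: &str) -> String {
        format!("{}/refs/heads/{}/{}.bundle", self.prefix, branch, tip)
    }

    /// Incremental pack, keyed by content SHA.
    pub fn pack(&self, content_sha: &str) -> String {
        format!("{}/packs/{}.pack", self.prefix, content_sha)
    }

    /// Pack index that accompanies the pack with the same content SHA.
    pub fn idx(&self, content_sha: &str) -> String {
        format!("{}/packs/{}.idx", self.prefix, content_sha)
    }
}
```

Packs and indexes live outside the per-branch subtree, which is consistent with them being keyed by content SHA rather than by ref.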
§Linearization point
chain.json is the commit point: pack/idx/baseline upload
pre-lock, then under the per-ref lock the push writes
FORMAT → HEAD → chain.json → path-index.json. A push that
crashes before the chain.json PUT leaves orphan keys
(pack/idx/baseline at content-SHA or tip-SHA names) which
manage gc reaps. Anything written after chain.json
(path-index.json overwrite, force-push baseline cleanup) is
post-commit and may be retried by re-running the push or compact.
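The write order above can be sketched against an in-memory bucket to show what each crash point leaves behind. Everything here is an assumption made for illustration: `MemBucket`, `Step`, `push`, the `p/` prefix and the `main` branch are hypothetical, locking is omitted, and the real engine writes to object storage.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for the bucket (hypothetical).
#[derive(Default)]
pub struct MemBucket {
    pub objects: BTreeMap<String, String>,
}

impl MemBucket {
    fn put(&mut self, key: &str, body: &str) {
        self.objects.insert(key.to_string(), body.to_string());
    }

    pub fn has(&self, key: &str) -> bool {
        self.objects.contains_key(key)
    }
}

/// Push steps in the order the text describes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Step {
    UploadPack, // pre-lock
    Format,     // under the per-ref lock from here on
    Head,
    Chain,      // commit point
    PathIndex,  // post-commit
}

/// Runs the steps in order and stops before `crash_before` to simulate a crash.
/// Returns true when chain.json was written, i.e. the push is committed.
pub fn push(bucket: &mut MemBucket, pack_sha: &str, tip: &str, crash_before: Option<Step>) -> bool {
    let order = [Step::UploadPack, Step::Format, Step::Head, Step::Chain, Step::PathIndex];
    let mut committed = false;
    for step in order {
        if crash_before == Some(step) {
            break;
        }
        match step {
            Step::UploadPack => bucket.put(&format!("p/packs/{}.pack", pack_sha), "pack bytes"),
            Step::Format => bucket.put("p/FORMAT", "packchain"),
            Step::Head => bucket.put("p/HEAD", "refs/heads/main"),
            Step::Chain => {
                bucket.put("p/refs/heads/main/chain.json", tip);
                committed = true;
            }
            Step::PathIndex => bucket.put("p/refs/heads/main/path-index.json", tip),
        }
    }
    committed
}
```

Calling `push` with `Some(Step::Chain)` leaves a pack with no chain.json, which is the orphan case that manage gc reaps. Calling it with `Some(Step::PathIndex)` leaves a committed chain with no fresh path-index, which re-running the push repairs.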
§chain.json → path-index.json ordering and the reader contract
Writing path-index.json LAST means a crash between the
chain.json PUT and the path-index.json PUT leaves the bucket
with a fresh chain alongside a stale path-index whose tip field
still names the prior chain.tip. The reader detects this with a
single tip-equality check (path_index.tip == chain.tip) and
surfaces it as
PackchainError::TransientChainPathIndexMismatch — a typed,
retry-shaped error — rather than silently returning the wrong
blob bytes or failing with the confusing
PackchainError::BlobNotInChain that the old (path-index-first)
ordering produced (issue #114).
The reverse ordering (path-index before chain.json) is rejected
because it lets a stale chain coexist with a fresh path-index
whose blob SHAs are NOT yet in any chain pack, surfacing as
BlobNotInChain — indistinguishable from genuine corruption.
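The reader-side tip-equality check can be sketched as below. This is a minimal illustration, not the engine's code: the struct shapes, the `ReadError` enum, the error's fields and the `check_tips` name are assumptions; only the comparison path_index.tip == chain.tip and the error's name come from the text.

```rust
/// Hypothetical stand-in for the typed error described above; the fields are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    TransientChainPathIndexMismatch { chain_tip: String, path_index_tip: String },
}

/// Simplified view of chain.json: only the tip matters for this check.
pub struct ChainManifest {
    pub tip: String,
}

/// Simplified view of path-index.json: only the tip matters for this check.
pub struct PathIndex {
    pub tip: String,
}

/// Rejects a path-index that was written for a different chain tip, so the
/// caller gets a typed mismatch error instead of resolving blobs through a
/// stale index.
pub fn check_tips(chain: &ChainManifest, path_index: &PathIndex) -> Result<(), ReadError> {
    if path_index.tip == chain.tip {
        Ok(())
    } else {
        Err(ReadError::TransientChainPathIndexMismatch {
            chain_tip: chain.tip.clone(),
            path_index_tip: path_index.tip.clone(),
        })
    }
}
```

Running the check before any blob lookup is what keeps a crash window from surfacing as BlobNotInChain: with chain.json written first, a mismatch can only mean the path-index is the older of the two.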
§Lost-race orphan packs
Packs upload BEFORE the per-ref lock is acquired so the lock-hold
window stays bounded by chain.json + path-index PUT latency. When
two pushers race they both upload their packs pre-lock; the loser
re-reads chain.json under the lock, sees a stale chain,
and returns without committing, leaving its pack as an
unreferenced orphan that manage gc sweeps. The orphan-bandwidth
cost is the deliberate trade-off for keeping the lock window
short — an in-lock-upload alternative would block sibling pushers
for the full duration of a multi-GiB upload.
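The race described above can be sketched with two phases: an upload outside the lock and a short commit inside it. All names here (`RaceBucket`, `PendingPush`, `upload_pre_lock`, `commit_under_lock`, `orphan_packs`) are hypothetical, the lock itself is assumed to be held by the caller during the commit phase, and the staleness test (comparing the tip seen before upload with the tip seen under the lock) is an assumption about how a stale chain is detected.

```rust
use std::collections::BTreeSet;

/// In-memory stand-in for the bucket state relevant to the race (hypothetical).
#[derive(Default)]
pub struct RaceBucket {
    pub packs: BTreeSet<String>,   // uploaded pack SHAs, referenced or not
    pub chain_packs: Vec<String>,  // newest-first pack SHAs named by chain.json
    pub chain_tip: Option<String>, // tip recorded in chain.json
}

#[derive(Debug, PartialEq, Eq)]
pub enum PushOutcome {
    Committed,
    StaleChain,
}

pub struct PendingPush {
    base_tip: Option<String>,
    new_tip: String,
    pack_sha: String,
}

/// Phase 1, no lock held: remember the tip we build on and upload the pack.
pub fn upload_pre_lock(bucket: &mut RaceBucket, new_tip: &str, pack_sha: &str) -> PendingPush {
    let base_tip = bucket.chain_tip.clone();
    bucket.packs.insert(pack_sha.to_string());
    PendingPush { base_tip, new_tip: new_tip.to_string(), pack_sha: pack_sha.to_string() }
}

/// Phase 2, per-ref lock assumed held: re-read the chain and commit only if it
/// has not moved since phase 1.
pub fn commit_under_lock(bucket: &mut RaceBucket, pending: PendingPush) -> PushOutcome {
    if bucket.chain_tip != pending.base_tip {
        // Lost the race: return without touching the chain; the pack stays orphaned.
        return PushOutcome::StaleChain;
    }
    bucket.chain_packs.insert(0, pending.pack_sha); // newest first
    bucket.chain_tip = Some(pending.new_tip);
    PushOutcome::Committed
}

/// Packs present in the bucket that the chain does not reference.
pub fn orphan_packs(bucket: &RaceBucket) -> Vec<String> {
    bucket.packs.iter().filter(|p| !bucket.chain_packs.contains(*p)).cloned().collect()
}
```

Only `commit_under_lock` needs the lock, so the hold time does not depend on pack size; the price is that the losing pusher's upload is wasted and shows up in `orphan_packs` until it is swept.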
Re-exports§
pub use read::PackIndexCache;
pub use read::read_blob;
Modules§
- gc
- Two-phase mark-and-sweep garbage collection for orphan packs (issue #66, Phase 5 of #52).
- read
- Direct file access against a packchain remote (issue #65).
Enums§
- PackchainError
- Errors surfaced by the packchain engine. pub because the crate::protocol::push::PushError::Packchain variant, which is public, wraps it; making this pub(crate) would leak a private type through a public API. The packchain engine itself stays pub(crate) (see pub(crate) mod push etc.); only gc and read are pub for rustdoc / direct-access reachability.