Module packchain

Incremental pack-chain storage engine (issue #52).

Push (#63) writes incremental packs keyed by content SHA, a newest-first [schema::ChainManifest], a nested [schema::PathIndex] of repo paths to blob SHAs, and a baseline bundle on the first / force push so a fresh clone short-circuits through bundle-uri. Fetch (#64), direct file access (#65, read_blob library API), compaction (#67), and GC (#66) are all implemented in sibling modules. Push artefacts on the bucket:

<prefix>/FORMAT                                "packchain"
<prefix>/HEAD                                  "refs/heads/main"
<prefix>/refs/heads/<branch>/LOCK#.lock        held during write, released after
<prefix>/refs/heads/<branch>/chain.json        newest-first manifest (THE commit point)
<prefix>/refs/heads/<branch>/path-index.json   nested tree → blob SHA map
<prefix>/refs/heads/<branch>/<tip>.bundle      baseline (first / force push only)
<prefix>/packs/<content-sha>.pack              incremental pack
<prefix>/packs/<content-sha>.idx               pack index
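The layout above can be sketched as a small key builder. This is an illustration only, not the crate's code: `Layout` and its method names are hypothetical, and the lock key is left out because its exact naming is not spelled out here.

```rust
/// Hypothetical helper that builds object keys for one packchain remote.
/// Only the key shapes come from the layout listing; the type is an assumption.
struct Layout<'a> {
    prefix: &'a str,
}

impl<'a> Layout<'a> {
    /// `<prefix>/FORMAT`
    fn format_key(&self) -> String {
        format!("{}/FORMAT", self.prefix)
    }

    /// `<prefix>/HEAD`
    fn head_key(&self) -> String {
        format!("{}/HEAD", self.prefix)
    }

    /// `<prefix>/refs/heads/<branch>/chain.json` (the commit point)
    fn chain_key(&self, branch: &str) -> String {
        format!("{}/refs/heads/{}/chain.json", self.prefix, branch)
    }

    /// `<prefix>/refs/heads/<branch>/path-index.json`
    fn path_index_key(&self, branch: &str) -> String {
        format!("{}/refs/heads/{}/path-index.json", self.prefix, branch)
    }

    /// `<prefix>/refs/heads/<branch>/<tip>.bundle` (first / force push only)
    fn baseline_key(&self, branch: &str, tip: &str) -> String {
        format!("{}/refs/heads/{}/{}.bundle", self.prefix, branch, tip)
    }

    /// `<prefix>/packs/<content-sha>.pack`
    fn pack_key(&self, content_sha: &str) -> String {
        format!("{}/packs/{}.pack", self.prefix, content_sha)
    }

    /// `<prefix>/packs/<content-sha>.idx`
    fn idx_key(&self, content_sha: &str) -> String {
        format!("{}/packs/{}.idx", self.prefix, content_sha)
    }
}
```

Because pack and idx keys depend only on the content SHA, two pushers that produce the same pack write to the same key, while per-branch state stays under `refs/heads/<branch>/`.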

Once the push lands, fetch resolves shallow / full clones via sequential pack install (fetch.rs), read_blob reads single blobs via the path-index without rehydrating the chain (read.rs), manage compact collapses the chain (compact.rs), and manage gc reaps orphans (gc.rs).
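To illustrate how a nested path-index can resolve a repo path to a blob SHA without touching the chain, here is a minimal sketch. The `Node` type and `lookup` function are hypothetical; the real [schema::PathIndex] layout and the read_blob signature are not shown in this page and may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory shape of a nested path index:
/// a tree maps a path component to either a subtree or a blob SHA.
enum Node {
    Blob(String),
    Tree(BTreeMap<String, Node>),
}

/// Walks `path` component by component and returns the blob SHA, if any.
/// Returns `None` for a missing path or a path that names a directory.
fn lookup<'a>(root: &'a Node, path: &str) -> Option<&'a str> {
    let mut node = root;
    for part in path.split('/').filter(|p| !p.is_empty()) {
        match node {
            // Descend into the subtree; a missing child ends the lookup.
            Node::Tree(children) => node = children.get(part)?,
            // A blob cannot have children, so the path does not exist.
            Node::Blob(_) => return None,
        }
    }
    match node {
        Node::Blob(sha) => Some(sha.as_str()),
        Node::Tree(_) => None,
    }
}
```

With an index holding `src/main.rs`, `lookup(&root, "src/main.rs")` yields the blob SHA, which a reader could then locate in a pack through the matching .idx file.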

§Linearization point

chain.json is the commit point. Pack, idx, and baseline uploads happen pre-lock; then, under the per-ref lock, the push writes FORMAT → HEAD → chain.json → path-index.json. A push that crashes before the chain.json PUT leaves orphan keys (pack/idx/baseline at content-SHA or tip-SHA names), which manage gc reaps. Anything written after chain.json (the path-index.json overwrite, force-push baseline cleanup) is post-commit and can be retried by re-running the push or compact.
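The under-lock write order can be modelled with an in-memory bucket and an injected crash. This is a toy sketch under stated assumptions: `Bucket`, `UNDER_LOCK`, and `push_under_lock` are hypothetical names, keys are shortened to their final component, and no real object-store client is involved.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the bucket; `log` records PUT order.
#[derive(Default)]
struct Bucket {
    objects: BTreeMap<String, Vec<u8>>,
    log: Vec<String>,
}

impl Bucket {
    fn put(&mut self, key: &str, body: &[u8]) {
        self.objects.insert(key.to_string(), body.to_vec());
        self.log.push(key.to_string());
    }
}

/// The under-lock sequence from the text, with shortened keys.
const UNDER_LOCK: [&str; 4] = ["FORMAT", "HEAD", "chain.json", "path-index.json"];

/// Performs at most `crash_after` PUTs of the sequence, then stops,
/// simulating a crash. Returns true once the chain.json PUT has landed,
/// i.e. once the push is committed.
fn push_under_lock(bucket: &mut Bucket, crash_after: usize) -> bool {
    let mut committed = false;
    for &key in UNDER_LOCK.iter().take(crash_after) {
        bucket.put(key, b"placeholder body");
        if key == "chain.json" {
            committed = true;
        }
    }
    committed
}
```

A crash after two PUTs leaves the push uncommitted; a crash after three leaves it committed with path-index.json still to be rewritten by a retry.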

§chain.json → path-index.json ordering and the reader contract

Writing path-index.json LAST means a crash between the chain.json PUT and the path-index.json PUT leaves the bucket with a fresh chain alongside a stale path-index whose tip field still names the prior chain.tip. The reader detects this with a single tip-equality check (path_index.tip == chain.tip) and surfaces it as PackchainError::TransientChainPathIndexMismatch, a typed, retry-shaped error. The alternatives would be silently returning the wrong blob bytes, or failing with the confusing PackchainError::BlobNotInChain that the old (path-index-first) ordering produced (issue #114).
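The reader-side check is small enough to sketch. The types below are hypothetical stand-ins: `ReadError` mirrors only the name of PackchainError::TransientChainPathIndexMismatch (its real fields are not shown here), and the retry wrapper is one assumed way a caller might react to a retry-shaped error.

```rust
/// Stand-in for the crate's error type; the fields are assumptions.
#[derive(Debug, PartialEq)]
enum ReadError {
    TransientChainPathIndexMismatch { chain_tip: String, index_tip: String },
}

/// Minimal stand-ins for the manifest and the path index.
struct Chain {
    tip: String,
}
struct PathIndex {
    tip: String,
}

/// The single tip-equality check: path_index.tip == chain.tip.
fn check_tips(chain: &Chain, index: &PathIndex) -> Result<(), ReadError> {
    if chain.tip == index.tip {
        Ok(())
    } else {
        Err(ReadError::TransientChainPathIndexMismatch {
            chain_tip: chain.tip.clone(),
            index_tip: index.tip.clone(),
        })
    }
}

/// Hypothetical caller: reloads both objects until the tips agree,
/// giving up with the typed error once `max_attempts` loads have failed.
/// At least one load is always performed.
fn read_with_retry<F>(mut load: F, max_attempts: u32) -> Result<(Chain, PathIndex), ReadError>
where
    F: FnMut() -> (Chain, PathIndex),
{
    let mut attempt = 1;
    loop {
        let (chain, index) = load();
        match check_tips(&chain, &index) {
            Ok(()) => return Ok((chain, index)),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => attempt += 1,
        }
    }
}
```

A real caller would likely add a delay between attempts; that is omitted here to keep the sketch self-contained.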

The reverse ordering (path-index before chain.json) is rejected because it lets a stale chain coexist with a fresh path-index whose blob SHAs are NOT yet in any chain pack, surfacing as BlobNotInChain — indistinguishable from genuine corruption.

§Lost-race orphan packs

Packs upload BEFORE the per-ref lock is acquired so the lock-hold window stays bounded by chain.json + path-index PUT latency. When two pushers race, they both upload their packs pre-lock; the loser sees a stale chain after re-reading chain.json under the lock and returns without committing, leaving its pack as an unreferenced orphan that manage gc sweeps. The orphan-bandwidth cost is the deliberate trade-off for keeping the lock window short: an in-lock-upload alternative would block sibling pushers for the full duration of a multi-GiB upload.
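The race can be sketched with a mutex standing in for the per-ref lock. Everything here is an assumption made for illustration: `Remote`, `PushOutcome`, and `push` are hypothetical, and comparing the re-read tip against the tip the pusher built on is one plausible way to detect a stale chain, not necessarily the crate's.

```rust
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum PushOutcome {
    Committed,
    LostRace,
}

/// Hypothetical remote: the mutex around `chain_tip` plays the per-ref lock,
/// and `packs` lists every uploaded pack, referenced or not.
struct Remote {
    chain_tip: Mutex<String>,
    packs: Mutex<Vec<String>>,
}

/// `based_on` is the chain tip the pusher saw when it built its pack.
fn push(remote: &Remote, based_on: &str, pack_sha: &str, new_tip: &str) -> PushOutcome {
    // 1. Upload the pack BEFORE taking the ref lock (the slow part).
    remote.packs.lock().unwrap().push(pack_sha.to_string());

    // 2. Take the ref lock and re-read the chain tip.
    let mut tip = remote.chain_tip.lock().unwrap();
    if tip.as_str() != based_on {
        // Someone else committed first: bail out and leave the pack
        // behind as an orphan for gc.
        return PushOutcome::LostRace;
    }

    // 3. Commit: only this short step runs under the lock.
    *tip = new_tip.to_string();
    PushOutcome::Committed
}
```

After two pushers that both built on the same tip have run, both packs exist in the bucket but only the winner's is reachable from the chain.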

Re-exports§

pub use read::PackIndexCache;
pub use read::read_blob;

Modules§

gc
Two-phase mark-and-sweep garbage collection for orphan packs (issue #66, Phase 5 of #52).
read
Direct file access against a packchain remote (issue #65).

Enums§

PackchainError
Errors surfaced by the packchain engine. pub because the crate::protocol::push::PushError::Packchain variant — which is public — wraps it; making this pub(crate) would leak a private type through a public API. The packchain engine itself stays pub(crate) (see pub(crate) mod push etc.); only gc and read are pub for rustdoc / direct-access reachability.