

Function parse_commit_extension_headers 

pub fn parse_commit_extension_headers(
    commit_content: &[u8],
) -> Vec<(Vec<u8>, Vec<u8>)>

Parse the extension headers from a raw git commit object’s content bytes (the bytes git cat-file commit <sha> prints — i.e. gix’s Commit::data), in their exact on-the-wire order, ready to store in State::extra_headers.

A commit’s header block runs from the start of the content up to the first blank line (the header/body separator). Its leading headers always appear in this fixed order: tree, zero or more parent, author, committer. Heddle models those natively. Every header after committer is an extension header (encoding, gpgsig, mergetag, or any unknown/future name) and is returned here as a (name, value) byte pair at its real position.
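As a rough illustration of where the extension headers begin, the sketch below walks the raw content line by line, stops at the first truly empty line, and collects every raw line that follows the committer header. The function name extension_header_lines is hypothetical and this is not the crate’s implementation. It returns raw lines, so folded continuation lines (starting with a space) are still separate entries here. It also assumes the fixed prefix described above, so the first line starting with "committer " marks the end of the natively modelled headers.

```rust
/// Hypothetical helper (not the crate's code): returns the raw header lines
/// that follow `committer`, in wire order, stopping at the header/body
/// separator. Continuation lines are returned as-is (still folded).
fn extension_header_lines(content: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut seen_committer = false;
    for line in content.split(|&b| b == b'\n') {
        // A truly empty line is the header/body separator. A "blank" line
        // inside a folded value is a single space, so it is not empty.
        if line.is_empty() {
            break;
        }
        if seen_committer {
            out.push(line);
        } else if line.starts_with(b"committer ") {
            // tree / parent* / author / committer are modelled natively.
            seen_committer = true;
        }
    }
    out
}
```

For a commit whose header block ends with `encoding ISO-8859-1` followed by a two-line mergetag, this yields the encoding line, the mergetag line, and the mergetag’s continuation line, in that order.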

This is the single source of truth for extension-header order and bytes. Both git import paths (the CLI bridge and the ingest walker) build extra_headers from it. The alternative, stitching the vec back together from a decoder’s typed accessors (gix surfaces encoding, and historically gpgsig, as fields outside its extra_headers), silently reorders whichever headers the decoder happens to model as typed fields, which breaks #566 byte-exact reconstruction. So we never consult those typed accessors for position; the raw header block is authoritative. (#564 de-lossy step 1: close-the-class.)

Folded continuation lines (a value line beginning with a single space 0x20, used by gpgsig/mergetag) are unfolded: each continuation contributes a \n plus the line with exactly one leading space stripped, so the stored value holds the value’s real internal newlines with no trailing newline. The serializer (#566) re-folds by mapping every \n back to \n followed by a single space (spike §2). A “blank” line inside an armored value is a single space followed by \n on the wire, so it unfolds to an empty segment. It is never confused with the header/body separator, which is a truly empty line.
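The unfold/re-fold rule can be sketched as follows. Both function names, unfold_extension_headers and refold, are hypothetical; this is an illustration of the rule described above, not the crate’s implementation or the #566 serializer. It assumes the header name is everything up to the first space on a non-continuation line.

```rust
/// Hypothetical sketch: collect (name, value) pairs for every header after
/// `committer`, unfolding continuation lines as described above.
fn unfold_extension_headers(content: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut headers: Vec<(Vec<u8>, Vec<u8>)> = Vec::new();
    let mut seen_committer = false;
    for line in content.split(|&b| b == b'\n') {
        if line.is_empty() {
            break; // truly empty line: header/body separator
        }
        if !seen_committer {
            if line.starts_with(b"committer ") {
                seen_committer = true;
            }
            continue;
        }
        if let Some(rest) = line.strip_prefix(b" ") {
            // Continuation: '\n' plus the line with one leading space stripped.
            // A wire line of just " " therefore adds an empty segment.
            if let Some((_, value)) = headers.last_mut() {
                value.push(b'\n');
                value.extend_from_slice(rest);
            }
        } else {
            // New header: name up to the first space, value after it.
            let (name, value) = match line.iter().position(|&b| b == b' ') {
                Some(i) => (&line[..i], &line[i + 1..]),
                None => (line, &b""[..]),
            };
            headers.push((name.to_vec(), value.to_vec()));
        }
    }
    headers
}

/// Hypothetical inverse used for reconstruction: every '\n' in the stored
/// value becomes '\n' followed by a single space.
fn refold(value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(value.len());
    for &b in value {
        out.push(b);
        if b == b'\n' {
            out.push(b' ');
        }
    }
    out
}
```

Writing each pair back as name, a space, refold(value), and a trailing \n reproduces the original header lines byte for byte, including a one-space “blank” line inside an armored signature.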