pub fn parse_commit_extension_headers(
commit_content: &[u8],
) -> Vec<(Vec<u8>, Vec<u8>)>
Parse the extension headers from a raw git commit object’s content bytes
(the bytes git cat-file commit <sha> prints — i.e. gix’s Commit::data),
in their exact on-the-wire order, ready to store in State::extra_headers.
A commit’s header block runs from the start of the content up to the first
blank line (the header/body separator). Its leading headers are always, in
fixed order, tree, zero-or-more parent, author, committer; Heddle
models those natively. Every header after committer is an extension
header (encoding, gpgsig, mergetag, or any unknown/future name) and is
returned here as a (name, value) byte pair at its real position.
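
The layout described above can be sketched as follows. This is a minimal illustration, not Heddle's actual implementation: the helper name extension_header_region is hypothetical, and it assumes a well-formed commit whose committer line is present. It only locates the raw bytes between the committer line and the header/body separator; it does not split or unfold them.

```rust
/// Hypothetical helper (not part of the documented API): return the raw
/// bytes of the extension-header region, i.e. everything after the
/// `committer` line and before the first truly empty line.
fn extension_header_region(commit_content: &[u8]) -> &[u8] {
    // The header block ends at the first empty line, which shows up as two
    // consecutive newlines. Keep the newline that terminates the last header.
    let block_end = commit_content
        .windows(2)
        .position(|w| w == b"\n\n")
        .map(|i| i + 1)
        .unwrap_or(commit_content.len());
    let block = &commit_content[..block_end];

    // Walk the header lines in order; everything after `committer` is an
    // extension header (or a folded continuation of one).
    let mut offset = 0;
    for line in block.split_inclusive(|&b| b == b'\n') {
        offset += line.len();
        if line.starts_with(b"committer ") {
            return &block[offset..];
        }
    }

    // Assumption of this sketch: no `committer` line means no extension headers.
    &block[block.len()..]
}
```

Because a folded blank line carries a leading space, the search for two consecutive newlines cannot stop inside an armored gpgsig or mergetag value.
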
This is the single source of truth for extension-header order and bytes.
Both git import paths (the CLI bridge and the ingest walker) build
extra_headers from it. The alternative — stitching the vec back together
from a decoder’s typed accessors (gix surfaces encoding, and historically
gpgsig, as fields outside its extra_headers) — silently reorders the
headers git happens to model as typed fields, which breaks #566 byte-exact
reconstruction. So we never consult those typed accessors for position; the
raw header block is authoritative. (#564 de-lossy step 1 — close-the-class.)
Folded continuation lines (a value line beginning with a single space
0x20, used by gpgsig/mergetag) are unfolded: each continuation
contributes a \n plus the line with exactly one leading space stripped, so
the stored value holds the value’s real internal newlines with no trailing
newline. The serializer (#566) re-folds by mapping every \n back to "\n "
(a newline followed by one space; spike §2). A “blank” line inside an armored
value is " \n" on the wire (a single space, then the newline), so it unfolds
to an empty segment. It is never confused with the header/body separator,
which is a truly empty line.
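
The unfold and re-fold rules can be illustrated with a short sketch. The names unfold_headers and refold_header are hypothetical and only mirror the behaviour described above; the sketch assumes its input is the extension-header region alone (no leading tree/parent/author/committer lines) and that every header has a name, a space, and a value.

```rust
/// Hypothetical sketch: split a raw extension-header region into
/// (name, value) pairs in wire order, unfolding continuation lines.
fn unfold_headers(region: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut out: Vec<(Vec<u8>, Vec<u8>)> = Vec::new();
    for raw_line in region.split_inclusive(|&b| b == b'\n') {
        // Drop the terminating newline, if any.
        let line = match raw_line.split_last() {
            Some((&b'\n', rest)) => rest,
            _ => raw_line,
        };
        if line.is_empty() {
            // A truly empty line is the header/body separator.
            break;
        }
        if line.first() == Some(&b' ') {
            // Continuation: append "\n" plus the line minus exactly one space.
            // A continuation with no preceding header is ignored in this sketch.
            if let Some((_, value)) = out.last_mut() {
                value.push(b'\n');
                value.extend_from_slice(&line[1..]);
            }
        } else {
            // New header: the name runs up to the first space.
            let (name, value) = match line.iter().position(|&b| b == b' ') {
                Some(i) => (&line[..i], &line[i + 1..]),
                None => (line, &line[line.len()..]),
            };
            out.push((name.to_vec(), value.to_vec()));
        }
    }
    out
}

/// Hypothetical inverse: serialize one header, mapping every "\n" inside
/// the value back to "\n " and terminating the header with a newline.
fn refold_header(name: &[u8], value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(name.len() + value.len() + 2);
    out.extend_from_slice(name);
    out.push(b' ');
    for &b in value {
        out.push(b);
        if b == b'\n' {
            out.push(b' ');
        }
    }
    out.push(b'\n');
    out
}
```

Under these assumptions, unfolding a region and re-folding each pair in order reproduces the original bytes, including the single-space line that stands for a blank line inside an armored signature.
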