pub struct State {
pub change_id: ChangeId,
pub tree: ContentHash,
pub parents: Vec<ChangeId>,
pub attribution: Attribution,
pub intent: Option<String>,
pub confidence: Option<f32>,
pub created_at: DateTime<Utc>,
pub verification: Option<Verification>,
pub signature: Option<StateSignature>,
pub status: Status,
pub provenance: Option<ContentHash>,
pub logical_change_id: Option<ChangeId>,
pub context: Option<ContentHash>,
pub authored_at: Option<DateTime<Utc>>,
pub risk_signals: Option<ContentHash>,
pub review_signatures: Option<ContentHash>,
pub discussions: Option<ContentHash>,
pub structured_conflicts: Option<ContentHash>,
pub committer: Option<Principal>,
pub authored_tz_offset: i32,
pub committer_tz_offset: i32,
pub raw_message: Option<Vec<u8>>,
pub git_lossy: bool,
pub extra_headers: Vec<(Vec<u8>, Vec<u8>)>,
/* private fields */
}
A state is an immutable snapshot with rich metadata.
On-disk encoding is rmp-serde’s positional struct format (a fixed-length tuple). This is sensitive to field order: inserting a field in the middle of the tuple breaks every pre-existing on-disk state. The invariant we keep going forward is:
New optional fields are added at the tail of the struct, below status, with #[serde(default)]. Mid-struct inserts are forbidden. rmp-serde’s positional deserializer tolerates missing trailing fields when they have a Default impl, so tail-only growth keeps older on-disk states readable automatically.
Required (non-optional) fields — change_id, tree, parents,
attribution, created_at, status — must never move. Optional fields
may be reordered only among themselves, and only at the tail.
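The tail-only rule can be seen in a deliberately simplified, std-only model of positional decoding. This is not rmp-serde's actual code: a record is modeled as a plain sequence of values, required fields fail when absent, and a trailing field falls back to its `Default` when the sequence ends early (the behavior `#[serde(default)]` relies on). The `V2` struct, its fields, and both decoder functions are hypothetical.

```rust
/// Hypothetical v2 layout. v1 stored `[change_id, status]`; v2 appends
/// `intent` at the tail, so every v1 record is still a valid prefix.
#[derive(Debug, PartialEq)]
pub struct V2 {
    pub change_id: u32,
    pub status: u32,
    pub intent: u32,
}

/// Decode a record in the tail-append layout `[change_id, status, intent]`.
pub fn decode_tail(fields: &[u32]) -> Option<V2> {
    let mut it = fields.iter().copied();
    let change_id = it.next()?; // required: decoding fails if absent
    let status = it.next()?; // required: decoding fails if absent
    let intent = it.next().unwrap_or_default(); // tail field: missing means Default
    Some(V2 { change_id, status, intent })
}

/// Decode the same struct if `intent` had been inserted mid-struct,
/// i.e. on-disk order `[change_id, intent, status]`.
pub fn decode_mid(fields: &[u32]) -> Option<V2> {
    let mut it = fields.iter().copied();
    let change_id = it.next()?;
    // For a v1 record, this slot actually holds the old `status` value.
    let intent = it.next().unwrap_or_default();
    // A v1 record has no third value, so this required read fails.
    let status = it.next()?;
    Some(V2 { change_id, status, intent })
}
```

Under these assumptions, a v1 record `[7, 2]` decodes with the tail layout as `change_id = 7, status = 2, intent = 0`, whereas the mid-struct layout reads the old `status` slot as `intent` and then runs out of values for `status`.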
Fields
change_id: ChangeId
tree: ContentHash
parents: Vec<ChangeId>
attribution: Attribution
intent: Option<String>
confidence: Option<f32>
created_at: DateTime<Utc>
verification: Option<Verification>
signature: Option<StateSignature>
status: Status
provenance: Option<ContentHash>
logical_change_id: Option<ChangeId>
context: Option<ContentHash>
Optional context tree root for code annotations.
authored_at: Option<DateTime<Utc>>
Authoring timestamp for this state, when distinct from created_at.
created_at is the committer time — when the state object
came into being in its current form. authored_at is the
author time — when someone actually wrote the change — which
survives git rebase, cherry-pick, squash-merge, and git commit --amend. The ingest-backed bridge git import path fills
this from the git author time; native heddle commits leave it
None and blame falls back to created_at.
Part of the state hash (#564 de-lossy step 1). Author time
is part of a git commit’s identity: two commits that differ
only by author timestamp are distinct git objects, so folding
it into the hash keeps them from dedup-colliding to one State in
the content-addressed store. None hashes as a single absence
byte, so native commits are unaffected beyond the format bump.
risk_signals: Option<ContentHash>
Content hash of the state’s RiskSignalBlob,
when present. Computed and persisted whenever risk signals fire on a
state. None for states from before W1 and for states where no
signals fired.
Hash framing: a single 0 byte when None, [1] + 32-byte hash when
Some. Legacy states without this field deserialize as None and
hash byte-identical to before W1.
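A minimal sketch of that framing, assuming a 32-byte hash. `ContentHash` below is a hypothetical stand-in for the crate's type, and `frame_optional_hash` is an illustrative helper, not the crate's `update_hash`.

```rust
/// Hypothetical stand-in for the crate's 32-byte content hash.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ContentHash(pub [u8; 32]);

/// Append the framing for an optional hash to `out`:
/// a single `0` byte for `None`, or `1` followed by the 32 hash bytes for `Some`.
pub fn frame_optional_hash(out: &mut Vec<u8>, value: Option<&ContentHash>) {
    match value {
        None => out.push(0),
        Some(hash) => {
            out.push(1);
            out.extend_from_slice(&hash.0);
        }
    }
}
```

The leading tag byte keeps the two cases unambiguous: `None` always contributes exactly one byte and `Some` always contributes 33.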
review_signatures: Option<ContentHash>
Content hash of the state’s ReviewSignaturesBlob,
when reviewers have signed off (read / agent-preview / agent-co-review).
discussions: Option<ContentHash>
Content hash of the state’s DiscussionsBlob,
when discussions are anchored to this state.
structured_conflicts: Option<ContentHash>
Content hash of the state’s StructuredConflict,
when this state captures an unresolved merge conflict as data.
committer: Option<Principal>
The git committer identity, when distinct from the author
(Attribution::principal). Git records both an author (who wrote
the change) and a committer (who created this commit object); for
rebased / cherry-picked / amended commits the two differ. None
for native heddle commits and for legacy imports from before #565.
authored_tz_offset: i32
Timezone offset (seconds east of UTC) of the author timestamp
(State::authored_at / created_at fallback). Git stores the
author’s local offset (e.g. +0000, -0700); Heddle used to
discard it. 0 for native commits and legacy imports.
committer_tz_offset: i32
Timezone offset (seconds east of UTC) of the committer timestamp
(created_at). 0 for native commits and legacy imports.
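Git writes these offsets as a sign followed by four digits (`+0000`, `-0700`, `+0530`). The sketch below converts between that text form and seconds east of UTC; the function names are hypothetical and the crate's actual parser is not shown here. It assumes well-formed `±HHMM` input and whole-minute offsets, and it formats a zero offset as `+0000`.

```rust
/// Convert a git-style `±HHMM` timezone field into seconds east of UTC.
/// Returns `None` unless the input is exactly a sign plus four ASCII digits.
pub fn git_tz_to_seconds(tz: &str) -> Option<i32> {
    let bytes = tz.as_bytes();
    if bytes.len() != 5 || !bytes[1..].iter().all(u8::is_ascii_digit) {
        return None;
    }
    let sign = match bytes[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    let digit = |i: usize| i32::from(bytes[i] - b'0');
    let hours = digit(1) * 10 + digit(2);
    let minutes = digit(3) * 10 + digit(4);
    Some(sign * (hours * 3600 + minutes * 60))
}

/// Format seconds east of UTC back into git's `±HHMM` form.
pub fn seconds_to_git_tz(offset: i32) -> String {
    let sign = if offset < 0 { '-' } else { '+' };
    let abs = offset.unsigned_abs();
    format!("{}{:02}{:02}", sign, abs / 3600, (abs % 3600) / 60)
}
```

For example, `-0700` maps to `-25200` and back, matching the sign convention used by both tz-offset fields.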
raw_message: Option<Vec<u8>>
The verbatim git commit message body (everything after the header
block), preserved exactly so reconstruction is byte-stable. Distinct
from intent, which is the trimmed first line surfaced in the UI.
None for native commits and legacy imports.
Stored as raw bytes, NOT a String: a commit with a non-UTF8
encoding (latin-1, shift-jis, …) carries message bytes that are not
valid UTF-8 (e.g. 0xe9 for latin-1 é); a String could not
round-trip them byte-identically. (non-UTF8 author/committer identity
names are not yet byte-preserved — Principal is still String; see
#564.)
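The round-trip problem can be demonstrated with the standard library alone. The constant and helper names below are illustrative, not part of the crate.

```rust
/// Latin-1 bytes for "café": 0xe9 is 'é' in Latin-1 but is not valid UTF-8 on its own.
pub const LATIN1_CAFE: &[u8] = b"caf\xe9";

/// True if `bytes` come back unchanged after a lossy trip through `String`.
pub fn survives_string_round_trip(bytes: &[u8]) -> bool {
    // `from_utf8_lossy` replaces each invalid sequence with U+FFFD, so the
    // re-encoded bytes differ whenever the input was not valid UTF-8.
    String::from_utf8_lossy(bytes).as_bytes() == bytes
}
```

`String::from_utf8` rejects `LATIN1_CAFE` outright, and the lossy path turns `0xe9` into the three bytes of `U+FFFD`. Keeping the field as `Vec<u8>` avoids any decoding step, so the stored bytes are exactly the captured ones.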
git_lossy: bool
The SINGLE canonical “this state’s content is NOT byte-faithful to the
original git object” marker (#567). Set to true by lossy import
population paths whenever an unrepresentable tree entry was dropped or
converted during import, so the rebuilt tree (hence commit) no longer
hashes to the original SHA. The git-export fidelity guard reads this one
flag to decide whether reconstruct-from-state is safe, instead of
enumerating import surfaces. false for native heddle commits and for
lossless imports.
Provenance metadata, NOT part of the content hash: a lossy import always drops/converts tree entries, so its tree — and therefore the rest of the hashed identity — already differs from a lossless import of the same source; folding the flag in would add nothing but break every existing content hash.
extra_headers: Vec<(Vec<u8>, Vec<u8>)>
Every git commit header beyond the ones Heddle models natively
(tree/parents/author/committer), in their original order. ORDER IS
LOAD-BEARING for #566 byte-exactness — this is a Vec, never a map.
Empty for native commits and legacy imports.
gpgsig is just one of these headers and is kept INLINE at its
captured ordinal (not split into a separate field): when a commit’s
extension headers are in non-canonical order — e.g. x-custom, then
gpgsig, then mergetag — splitting gpgsig out would lose its
position and break byte-identical reconstruction. The serialization
source of truth for the signature is its position here (spike §3).
Both the header name and value are raw bytes (Vec<u8>), NOT
Strings: extra-header VALUES (a mergetag payload is a full tag
object; custom headers; gpgsig armor) can be non-UTF8, so a
String would force a lossy to_string() that destroys those bytes.
Names are ASCII by git’s spec but are bytes too so the whole tuple is
byte-exact and no conversion sneaks in.
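A sketch of writing these headers back in their stored order, using git's header syntax: the name, a space, the value, and a newline, with each additional line of a multi-line value prefixed by a single space. It assumes, hypothetically, that values are stored without those continuation spaces and without a trailing newline; the crate's actual storage convention is not documented here. Because the input is an ordered slice of raw bytes, `gpgsig` is emitted at the position where it was captured and non-UTF8 values pass through untouched.

```rust
/// Append `(name, value)` headers to `out` in their stored order.
/// Lines of a multi-line value after the first become git continuation
/// lines, i.e. they are prefixed with a single space.
pub fn write_extra_headers(out: &mut Vec<u8>, headers: &[(Vec<u8>, Vec<u8>)]) {
    for (name, value) in headers {
        out.extend_from_slice(name);
        out.push(b' ');
        for (i, line) in value.split(|&b| b == b'\n').enumerate() {
            if i > 0 {
                // End the previous line and start a continuation line.
                out.extend_from_slice(b"\n ");
            }
            out.extend_from_slice(line);
        }
        out.push(b'\n');
    }
}
```

Storing the headers in a map would lose the `x-custom`, `gpgsig`, `mergetag` ordering from the example above; iterating the `Vec` reproduces it exactly.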
Implementations
impl State
pub fn new( tree: ContentHash, parents: Vec<ChangeId>, attribution: Attribution, ) -> Self
pub fn new_snapshot( tree: ContentHash, parents: Vec<ChangeId>, attribution: Attribution, ) -> Self
pub fn new_merge( tree: ContentHash, parents: Vec<ChangeId>, attribution: Attribution, ) -> Self
pub fn new_refresh_of( tree: ContentHash, parents: Vec<ChangeId>, attribution: Attribution, logical_change_id: ChangeId, ) -> Self
pub fn new_fork_of( tree: ContentHash, parents: Vec<ChangeId>, attribution: Attribution, ) -> Self
pub fn new_collapse_of( tree: ContentHash, parents: Vec<ChangeId>, attribution: Attribution, ) -> Self
pub fn with_intent(self, intent: impl Into<String>) -> Self
pub fn with_confidence(self, confidence: f32) -> Self
pub fn with_verification(self, verification: Verification) -> Self
pub fn with_signature(self, signature: StateSignature) -> Self
pub fn with_provenance(self, provenance: ContentHash) -> Self
pub fn with_context(self, context: ContentHash) -> Self
Set the context tree root.
pub fn with_risk_signals(self, risk_signals: ContentHash) -> Self
Attach a RiskSignalBlob hash.
Render-time tick budgeting (selecting which signals to surface) is a
view over this stored data, not part of storage itself.
Not part of the state hash. Risk signals are derived data computed
about a state from the diff against its parent; including them in
identity would make the same logical state hash differently depending
on which signals fired. That breaks every “is this the same state?”
check in the system. Contrast authored_at, which #564 folded into the state hash.
pub fn with_review_signatures(self, review_signatures: ContentHash) -> Self
Attach a ReviewSignaturesBlob
hash. The state’s authoring StateSignature is unaffected; review
signatures live alongside it and accumulate over time.
Not part of the state hash. Review signatures accumulate
post-capture; including them in identity would mean every signature
re-keys the state. Contrast authored_at, which #564 folded into the state hash.
pub fn with_discussions(self, discussions: ContentHash) -> Self
Attach a DiscussionsBlob hash.
Not part of the state hash. Discussions evolve independently of
the state they’re anchored to — appending a turn must not change the
state’s identity. Contrast authored_at, which #564 folded into the state hash.
pub fn with_structured_conflicts( self, structured_conflicts: ContentHash, ) -> Self
Attach a StructuredConflict hash.
Not part of the state hash. Conflict objects describe the merge’s
disagreement; the state’s tree and parents already encode what’s being
merged. Contrast authored_at, which #564 folded into the state hash.
Record the authoring timestamp separately from created_at.
Used by the git-ingest importer to preserve the distinction
between “when the change was originally written” (authored)
and “when this commit object came into being” (committer time,
stored in created_at so re-imports stay deterministic).
Native heddle commits leave this None; blame display then
falls back to created_at.
Part of the state hash (#564 de-lossy step 1) — see the
authored_at field docs and update_hash.
pub fn with_committer(self, committer: Principal) -> Self
Record the git committer identity (distinct from the author).
Part of the state hash — see the committer field docs and
update_hash. #564 de-lossy step 1.
pub fn with_tz_offsets(self, authored: i32, committer: i32) -> Self
Record the author/committer timezone offsets (seconds east of UTC). Part of the state hash. #564 de-lossy step 1.
pub fn with_raw_message(self, raw_message: impl AsRef<[u8]>) -> Self
Record the verbatim git commit message body, as raw bytes (so a
non-UTF8 message round-trips byte-identically; see the raw_message
field docs). Part of the state hash. #564 de-lossy step 1.
pub fn with_git_lossy(self, git_lossy: bool) -> Self
Mark this state’s content as NOT byte-faithful to the original git
object — set by the --lossy import/ingest paths when a tree entry was
dropped or converted. The git-export fidelity guard reads this single
signal to skip reconstruct-from-state (#567). Not part of the content
hash (see the git_lossy field docs).
pub fn with_extra_headers(self, extra_headers: Vec<(Vec<u8>, Vec<u8>)>) -> Self
Record the ordered remaining git commit headers as raw bytes. ORDER IS LOAD-BEARING (#566). Part of the state hash. #564 de-lossy step 1.
pub fn with_status(self, status: Status) -> Self
pub fn with_change_id(self, change_id: ChangeId) -> Self
pub fn with_logical_change_id(self, logical_change_id: ChangeId) -> Self
pub fn logical_change_id(&self) -> ChangeId
pub fn with_timestamp(self, timestamp: DateTime<Utc>) -> Self
pub fn compute_hash(&self) -> ContentHash
pub fn compute_hash_pre_fidelity(&self) -> ContentHash
The pre-#565 content hash: the hash a state had BEFORE the git-fidelity
fields were folded into identity (the format bump in #565). It omits the
trailing fidelity block from both the hashed bytes AND the content-length
prefix, exactly as the old code did — so for a state signed before the
bump, this reproduces the hash its StateSignature was actually made
over.
The #570 fidelity backfill verifies an existing signature against this
(in addition to the current compute_hash) before re-signing: a legacy
signature was made over THIS hash, not the post-bump one, so checking
only the new hash would wrongly reject a valid legacy signature as
unreproducible. #565 only appended the fidelity block to hash_len /
update_hash, so stopping before it is a faithful pre-bump hash.
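The two-hash check can be modeled with a toy framing of a length prefix followed by the hashed bytes. `DefaultHasher` is only a stand-in here: it is neither cryptographic nor guaranteed stable across Rust releases, and the function names and the `verify` callback are hypothetical rather than the crate's `hash_len` / `update_hash`. What the sketch shows is that the pre-bump hash drops the fidelity block from both the length prefix and the hashed bytes, and that the backfill accepts a signature that matches either hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy framed hash: total length prefix, then every part in order.
fn framed_hash(parts: &[&[u8]]) -> u64 {
    let len: usize = parts.iter().map(|p| p.len()).sum();
    let mut h = DefaultHasher::new();
    h.write(&(len as u64).to_le_bytes());
    for &p in parts {
        h.write(p);
    }
    h.finish()
}

/// Post-bump hash: core fields plus the trailing fidelity block.
pub fn hash_current(core: &[u8], fidelity: &[u8]) -> u64 {
    framed_hash(&[core, fidelity])
}

/// Pre-bump hash: the fidelity block is left out of BOTH the length prefix
/// and the hashed bytes, reproducing what the legacy code computed.
pub fn hash_pre_fidelity(core: &[u8]) -> u64 {
    framed_hash(&[core])
}

/// Backfill check: an existing signature is reproducible if it verifies
/// against the current hash or, for legacy states, the pre-bump hash.
pub fn signature_reproducible(core: &[u8], fidelity: &[u8], verify: impl Fn(u64) -> bool) -> bool {
    verify(hash_current(core, fidelity)) || verify(hash_pre_fidelity(core))
}
```

Checking only `hash_current` would reject the legacy case, because a signature made before the bump covers the shorter length prefix as well as the shorter byte stream.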