DSFB-Debug: causality / graph attribution — root-cause stamping over the service-call graph (no_std).
§Role in the operator workflow
Pure-function service-graph walk. Given the per-(window, signal)
evaluation grid (SignalEvaluation from run_evaluation) and a
service dependency graph encoded as parent→child signal-index
pairs, for each closed episode this module returns the most-
upstream contributing signal as the candidate root cause. The
result is written into DebugEpisode.root_cause_signal_index.
Per panellist P11 (Senior SRE): “questions 2 and 3 of the eight load-bearing on-call questions are ‘which service is the originator?’ and ‘what changed?’. Without graph attribution DSFB-Debug answers neither.” This module is the answer.
§Deterministic algorithm (Theorem 9 preserved)
- For each episode, scan the per-(window, signal) grid over the window range [start_window, end_window].
- Find the lexicographically-earliest (window, signal) pair whose confirmed_grammar_state >= Boundary AND whose absolute sign_tuple.slew exceeds the engine’s slew_delta threshold. This is the “first slew window”.
- Among the signals contributing in the first slew window, find those whose graph-incoming edges (parent → this signal) come from outside the contributing-signal set of the episode; these are the upstream-most signals.
- Return the lowest such signal index. Tie-broken by lowest index for determinism.
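The four steps above can be sketched as a single pure function. This is a minimal illustration under stated assumptions, not the crate's implementation: the names attribute_sketch, Cell, and the Admissible and Violation variants are hypothetical stand-ins, the grid is assumed to be a row-major slice indexed as window * num_signals + signal, and a signal with no incoming edges at all is assumed to count as upstream-most. "Contributing in the first slew window" is read here as "passes the step-2 test in that window".

```rust
/// Hypothetical stand-in for the grammar state; only `Boundary` is named in
/// the text above. Variant order gives Admissible < Boundary < Violation.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum GrammarState {
    Admissible,
    Boundary,
    Violation,
}

/// Hypothetical, reduced view of one (window, signal) grid cell.
#[derive(Clone, Copy)]
struct Cell {
    state: GrammarState,
    slew: f64,
}

/// Sketch of the attribution walk for one episode.
/// `grid` is assumed row-major: cell (w, s) lives at `w * num_signals + s`.
/// `edges` holds (parent, child) signal-index pairs.
fn attribute_sketch(
    grid: &[Cell],
    num_signals: usize,
    edges: &[(usize, usize)],
    contributing: &[usize],
    start_window: usize,
    end_window: usize,
    slew_delta: f64,
) -> Option<usize> {
    // Documented failure modes: empty graph, or fewer than 2 contributing signals.
    if edges.is_empty() || contributing.len() < 2 {
        return None;
    }

    // Step-2 predicate: state at or above Boundary and |slew| > slew_delta.
    // The absolute value is spelled out as two comparisons.
    let qualifies = |w: usize, s: usize| -> bool {
        if s >= num_signals {
            return false;
        }
        match grid.get(w * num_signals + s) {
            Some(c) => {
                c.state >= GrammarState::Boundary
                    && (c.slew > slew_delta || c.slew < -slew_delta)
            }
            None => false,
        }
    };

    // Steps 1 and 2: the earliest window in range with any qualifying signal.
    let first_slew_window = (start_window..=end_window)
        .find(|&w| contributing.iter().any(|&s| qualifies(w, s)))?;

    // Steps 3 and 4: keep signals with no parent inside the contributing set,
    // then return the lowest index (None if no candidate survives).
    contributing
        .iter()
        .copied()
        .filter(|&s| qualifies(first_slew_window, s))
        .filter(|&s| !edges.iter().any(|&(p, c)| c == s && contributing.contains(&p)))
        .min()
}
```

With edges (0, 1) and (1, 2) and contributing signals 1 and 2 both slewing in the same window, signal 2 is dropped because its parent (signal 1) is inside the episode, while signal 1 is kept because its parent (signal 0) is outside it, so the sketch returns signal 1.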
§Failure modes (returns None, never silently fabricates)
- Empty graph (no edges supplied) → no attribution
- Episode has fewer than 2 contributing signals → no attribution (single-signal episodes have no upstream/downstream distinction within the episode itself)
None is the honest “I cannot attribute” answer; the engine
never invents a root cause.
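On the consuming side, the None case is best surfaced rather than papered over. The sketch below assumes root_cause_signal_index is an Option<usize>; EpisodeView, describe_root_cause, and the signal-name table are hypothetical names used only for illustration.

```rust
/// Hypothetical, reduced view of an episode; the field type is an assumption.
struct EpisodeView {
    root_cause_signal_index: Option<usize>,
}

/// Render the attribution result for an operator without inventing a cause.
fn describe_root_cause(episode: &EpisodeView, signal_names: &[&str]) -> String {
    match episode.root_cause_signal_index {
        // Attribution succeeded: show the name if the index is known.
        Some(idx) => match signal_names.get(idx) {
            Some(name) => format!("candidate root cause: {} (signal {})", name, idx),
            None => format!("candidate root cause: signal {}", idx),
        },
        // Attribution declined: say so explicitly.
        None => String::from("root cause not attributed"),
    }
}
```

Keeping the two arms separate means an unattributed episode is never displayed with a default or placeholder signal.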
Functions§
- attribute_root_causes - Walk the service-call graph and stamp each closed episode with its most-upstream contributing signal index, if determinable.