Module causality

DSFB-Debug: causality / graph attribution — root-cause stamping over the service-call graph (no_std).

§Role in the operator workflow

Pure-function service-graph walk. Given the per-(window, signal) evaluation grid (SignalEvaluation from run_evaluation) and a service dependency graph encoded as parent→child signal-index pairs, this module returns, for each closed episode, the most-upstream contributing signal as the candidate root cause. The result is written into DebugEpisode.root_cause_signal_index.
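To make the parent→child encoding concrete, here is a minimal sketch of how such an edge list could be queried for the parents of a signal. The `Edge` alias and the `parents_of` helper are hypothetical names introduced for illustration only; they are not part of the crate's API, and the sketch uses `Vec` for brevity even though the module itself is no_std.

```rust
/// Hypothetical edge type: (parent signal index, child signal index).
type Edge = (usize, usize);

/// Collect the parents of `signal`, sorted and deduplicated so the
/// result does not depend on the order in which edges were supplied.
fn parents_of(edges: &[Edge], signal: usize) -> Vec<usize> {
    let mut parents: Vec<usize> = edges
        .iter()
        .filter(|&&(_, child)| child == signal)
        .map(|&(parent, _)| parent)
        .collect();
    parents.sort_unstable();
    parents.dedup();
    parents
}
```

With edges `[(0, 1), (0, 2), (1, 2)]`, signal 2 has parents 0 and 1, while signal 0 has none and is therefore a graph root.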

Per panellist P11 (Senior SRE): “questions 2 and 3 of the eight load-bearing on-call questions are ‘which service is the originator?’ and ‘what changed?’. Without graph attribution DSFB-Debug answers neither.” This module is the answer.

§Deterministic algorithm (Theorem 9 preserved)

  1. For each episode, scan the per-(window, signal) grid over the window range [start_window, end_window].
  2. Find the lexicographically earliest (window, signal) pair whose confirmed_grammar_state >= Boundary and whose absolute sign_tuple.slew exceeds the engine’s slew_delta threshold. The window of that pair is the “first slew window”.
  3. Among the signals contributing in the first slew window, find those with no incoming edge (parent → this signal) from a parent inside the episode’s contributing-signal set; these are the upstream-most signals.
  4. Return the lowest such signal index; breaking ties by lowest index keeps the result deterministic.
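The four steps above can be sketched as a single pure function. This is an illustration under stated assumptions, not the crate's implementation: `GrammarState`, `Cell`, `contributes`, and `root_cause` are hypothetical stand-ins for the real types; the grid is modelled as one row of cells per window; the "contributing-signal set" is assumed to be every signal that crosses both thresholds in any window of the episode; and `Vec` is used for brevity although the module is no_std.

```rust
/// Hypothetical grammar states, ordered so that `>= Boundary` is meaningful.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum GrammarState { Admissible, Boundary, Violation }

/// Hypothetical, simplified stand-in for one (window, signal) grid cell.
#[derive(Clone, Copy, Debug)]
struct Cell { state: GrammarState, slew: f64 }

/// Step 2 predicate: state at or above Boundary and |slew| above the threshold.
fn contributes(cell: &Cell, slew_delta: f64) -> bool {
    cell.state >= GrammarState::Boundary && cell.slew.abs() > slew_delta
}

/// `grid[w][s]` is the cell for window `w`, signal `s`;
/// `edges` holds (parent, child) signal-index pairs.
fn root_cause(
    grid: &[Vec<Cell>],
    edges: &[(usize, usize)],
    start_window: usize,
    end_window: usize,
    slew_delta: f64,
) -> Option<usize> {
    // Failure mode: empty graph, no attribution.
    if edges.is_empty() {
        return None;
    }
    // Step 1: restrict the scan to the episode's window range.
    let end = end_window.min(grid.len().checked_sub(1)?);
    if start_window > end {
        return None;
    }
    let windows = &grid[start_window..=end];

    // Assumed definition of the episode's contributing-signal set.
    let num_signals = windows.iter().map(|row| row.len()).max().unwrap_or(0);
    let mut in_episode = vec![false; num_signals];
    for row in windows {
        for (s, cell) in row.iter().enumerate() {
            if contributes(cell, slew_delta) {
                in_episode[s] = true;
            }
        }
    }
    // Failure mode: fewer than two contributing signals.
    if in_episode.iter().filter(|&&b| b).count() < 2 {
        return None;
    }

    // Step 2: the earliest window containing any contributing cell.
    let first = windows
        .iter()
        .find(|row| row.iter().any(|c| contributes(c, slew_delta)))?;

    // Steps 3 and 4: signals are visited in ascending index order, so the
    // first one without an in-episode parent is the lowest such index.
    // Self-loops are ignored here, which is an assumption of this sketch.
    first
        .iter()
        .enumerate()
        .filter(|&(_, c)| contributes(c, slew_delta))
        .map(|(s, _)| s)
        .find(|&s| {
            !edges.iter().any(|&(p, c)| {
                c == s && p != s && in_episode.get(p).copied().unwrap_or(false)
            })
        })
}
```

For a chain 0 → 1 → 2 in which only signals 1 and 2 contribute, signal 1 is returned: its sole parent (signal 0) lies outside the episode, whereas signal 2's parent is signal 1. The sketch also returns None when every first-window signal has an in-episode parent (for example, in a cycle); the source does not say how the real function treats that case.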

§Failure modes (returns None, never silently fabricates)

  • Empty graph (no edges supplied) → no attribution
  • Episode has fewer than 2 contributing signals → no attribution (single-signal episodes have no upstream/downstream distinction within the episode itself)

None is the honest “I cannot attribute” answer; the engine never invents a root cause.
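On the consuming side, an optional root-cause field forces callers to handle the unattributed case explicitly. The sketch below assumes a hypothetical `Episode` struct with only the `root_cause_signal_index` field named above, typed as `Option<usize>`; the real DebugEpisode has other fields, and its exact field type is not shown in this page.

```rust
/// Hypothetical, minimal stand-in for DebugEpisode.
#[derive(Debug, Default)]
struct Episode {
    /// None means the engine could not attribute a root cause.
    root_cause_signal_index: Option<usize>,
}

/// Render the attribution for an operator, keeping "unknown" distinct
/// from any valid signal index.
fn describe(episode: &Episode) -> String {
    match episode.root_cause_signal_index {
        Some(index) => format!("candidate root cause: signal {index}"),
        None => String::from("no attribution"),
    }
}
```

Because the absence of an answer is encoded in the type rather than as a sentinel index, a caller cannot mistake an unattributed episode for one rooted at signal 0.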

Functions§

attribute_root_causes
Walk the service-call graph and stamp each closed episode with its most-upstream contributing signal index, if determinable.