dsfb-debug 0.1.0

DSFB-Debug — Structural Semiotics Engine for Software Debugging. A deterministic, read-only, observer-only augmentation layer for execution-trace residual interpretation. Does NOT replace existing observability tools — augments them with typed structural interpretation.
Documentation
# DSFB-Debug — Fusion Architecture

## Overview

DSFB-Debug's fusion harness operationalises the augmentation thesis
of paper §10: the heuristics bank is a **meta-layer** that fuses
the outputs of multiple textbook anomaly detectors with the
DSFB-structural pipeline, producing typed-and-validated debugging
episodes that no single detector can produce alone.

**Architecture** (3 layers):

1. **Detector ensemble****205 baseline detectors across 27
   mathematical axes** (Tiers A–U + EXTRA + V/X/Y/Z/AA) +
   DSFB structural pipeline; each deterministic; each operating
   on the same residual matrix. (Pre-Session-8 baseline was 16
   classical detectors; Sessions 8–18 expanded the ensemble to
   the current 205 — see "Detector library" below for the full
   breakdown.)
2. **Consensus arithmetic** — per-(window, signal) consensus count
   from cell-level detectors + per-window boost from multivariate
   detectors; threshold at operator-tunable `min_consensus`.
3. **DSFB structural meta-layer** — typed motif episodes from the
   grammar pipeline, filtered by consensus confirmation, scored by
   the bank's `match_episode_with_consensus` to produce the final
   operator-facing typed-AND-validated output.

## Detector library (current state — post Phase 8)

**205 baseline detectors + DSFB structural pipeline**, organised across
**27 mathematical axes** (Tiers A–U + EXTRA + V/X/Y/Z/AA). Each detector
is a deterministic `pub fn` in `src/incumbent_baselines.rs` with
literature citation in the doc-comment. The fusion harness wires them
via `FusionConfig` (individual flags + family-level flags). Cell-level /
window-level consensus arithmetic is computed from the original 16
cell-level + 4 window-level detectors (Tiers A–F). The Phase 5 wave
detectors (Tiers V/X/Y/Z/AA) extend the cell-level firing pattern via
the per-cell tier-fired bitmask used by the **Routed Evidence Principle**
(paper §6.5): a detector's evidence enters a motif's score only when
the detector's tier bit is set in that motif's `affinity_tiers` mask.

## 9-Axis Bank-Aware Fusion (post Phase 8)

The fusion harness is governed by a nine-axis configuration (paper §11.x).
Each axis is a single `FusionConfig` flag; each axis was introduced in
a specific phase. The four-rung **Anti-Hallucination Ladder** uses
axes 6/8/9 to give the operator a configurable recall-vs-precision
trade-off.

| # | Axis | Phase | Role |
|:-:|------|:----:|------|
| 1 | Provenance gate | 0 | Filter motifs by `Provenance` rank |
| 2 | Margin gate | 2 | Demote to ambiguous when top-vs-runner-up margin < θ |
| 3 | Tier-affinity scoring | 2.5 | Per-motif `affinity_tiers` bitmask routes evidence |
| 4 | Zero-tier-firing filter | 5.5 | Reject motifs with empty affinity-mask intersection |
| 5 | Adaptive margin gate | 5.5 | Halve θ when `tier_consensus_factor > 0.5` |
| 6 | Confuser-boundary adjudication | 5.6 | `margin_vs_confuser`; `ConfuserAmbiguous` below θ_c |
| 7 | Structural disambiguator boost | 6 | Bonus on per-motif `disambiguator_tier` |
| 8 | Tier-level primary witness | 7 | ≥1 detector in `primary_witness_tiers` must fire |
| 9 | Per-detector named witness | 8 | ≥1 of `primary_witness_detectors` must fire (vacuous-pass when zero captured) |

**Witness-gate semantics.** Phase 7 (axis 8) requires ≥1 detector
*anywhere* in the motif's named primary tiers to have fired. Phase 8
(axis 9) is stricter: of the motif's *captured* named witnesses (the
detectors actually evaluated by the running `FusionConfig`), ≥1 must
have fired. The vacuous-pass branch handles motifs whose named
witnesses are not in the active config — those motifs are not demoted,
preserving Phase 7 behaviour at minimal configurations.

**Phase 8 trade-off.** A naive strict-AND of all named witnesses on F-11
demoted all 3 episodes to confuser-ambiguous. The implemented form
("of captured witnesses, at least one must fire") preserves Phase 7
surface counts at default config while still catching weakly-evidenced
typings at baseline config — empirically observed during Phase 8 wiring
(the JvmGcPause witnesses {welch_psd, autocorrelation_peak, poisson_burst}
were silent on F-11; the strict gate caught the silent typing while
the OR-over-captured form preserved the recall).

Tiers G–U (Phase 5 wave, post-Session 8):

| Tier | Family | Count | Detectors |
|------|--------|------:|-----------|
| G | concept-drift streaming | 9 | shiryaev_roberts, ddm, eddm, hddm_a, hddm_w, stepd, ecdd, kswin, fhddm |
| H | distribution shift | 10 | wasserstein_1d, jensen_shannon, kl_divergence, psi, anderson_darling, cramer_von_mises, energy_distance, mmd, bhattacharyya, hellinger |
| I | robust nonparametric | 10 | median_absolute_slope, theil_sen_residual, sen_slope_changepoint, moods_median_rolling, brown_forsythe, levene_variance, sign_test_drift, runs_test, wald_wolfowitz_two_sample, sequential_rank |
| J | forecast residual | 10 | ses_residual, holt_linear, holt_winters, ar2_residual, arima_simplified, kalman_innovation, savitzky_golay_residual, stl_residual, prophet_simplified, naive_seasonal |
| K | frequency / oscillation | 10 | fft_band_energy, welch_psd, wavelet_haar, autocorrelation_peak, lomb_scargle, zero_crossing_rate, dominant_frequency_drift, spectral_entropy, cepstral_simplified, phase_locking |
| L | multivariate relationship | 9 | hotelling_t2, mcusum, pca_reconstruction, robust_pca, correlation_matrix_distance, partial_correlation, graph_laplacian, canonical_correlation, mutual_information |
| M | debugging-native | 18 | flap, sawtooth_ramp, deadband_stuck, quantization, plateau, counter_wrap, monotone_leak, hysteresis, limit_cycle, ping_pong, backpressure, causal_lag, fan_out, fan_in, phase_slip, jitter_bloom, tail_thickening, burst_after_silence |
| N | offline CPD | 8 | pelt, binary_segmentation, bottom_up_segmentation, window_based_cpd, dynamic_programming_cpd, kernel_cpd, piecewise_linear_cpd, bayesian_offline_cpd |
| O | rare changepoint | 10 | mosum, narrowest_over_threshold, wbs2, seeded_bs, smuce, fdr_seg, fpop, tguh, inspect_cpd, double_cusum_bs |
| P | streaming sequential | 9 | e_detector, conformal_martingale, exchangeability_martingale, power_martingale, mixture_martingale, mixture_sprt, scan_statistic, higher_criticism, berk_jones |
| Q | concept drift rarer | 10 | mddm_a, mddm_e, mddm_g, lfr, fpdd, optwin, seqdrift2, d3_drift, quanttree, nn_dvi |
| R | robust depth | 8 | halfspace_depth, projection_depth, stahel_donoho, mcd, spatial_sign, s_estimator_residual, depth_rank_control, outlyingness_median_polish |
| S | count event-process | 3 | bayesian_blocks, index_of_dispersion, allan_variance |
| T | info-theoretic | 6 | mdl_change, ncd, lempel_ziv, transfer_entropy, fisher_information, renyi_entropy |
| U | dynamical systems | 8 | permutation_entropy, sample_entropy, rqa_recurrence, lyapunov, correlation_dimension, bds_test, zero_one_chaos, delay_embedding_nn |

Tiers A–F + extras (Session 6–8):

### Tier A — original parametric trio (Session 6)

| # | Detector | Citation | Axis |
|---|----------|----------|------|
| 1 | scalar_threshold_3σ | Page 1954 (concept) | parametric, fixed-baseline |
| 2 | cusum_h4 | Page 1954 | cumulative-sum change-point |
| 3 | ewma_λ0.2_L3 | Roberts 1959 | exponentially-weighted smoothing |

### Tier B — robust statistics (Session 7)

| # | Detector | Citation | Axis |
|---|----------|----------|------|
| 4 | robust_z_mad | Hampel 1974 | MAD-based, robust to baseline outliers |
| 5 | page_hinkley | Hinkley 1971 | min-cum-sum change-point |
| 6 | tukey_iqr_fence | Tukey 1977 | non-parametric IQR-based |

### Tier C — model-based / non-parametric (Session 7)

| # | Detector | Citation | Axis |
|---|----------|----------|------|
| 7 | spectral_residual_td | Ren 2019 (FFT-free variant) | time-domain spectral residual |
| 8 | matrix_profile (STAMP brute) | Yeh 2016 | sub-sequence distance |
| 9 | bocpd | Adams & MacKay 2007 | Bayesian online changepoint |
| 10 | isolation_forest (seeded) | Liu 2008 | tree-based multivariate |
| 11 | lof | Breunig 2000 | density-based multivariate |

### Tier D — additional non-dep detectors (Session 8)

| # | Detector | Citation | Axis |
|---|----------|----------|------|
| 12 | mann_kendall | Mann 1945; Kendall 1948 | non-parametric trend test |
| 13 | rolling_z_score | textbook SPC | adaptive baseline |
| 14 | ar1_forecast_residual | Box & Jenkins 1970 | autoregressive forecast |
| 15 | mahalanobis | Mahalanobis 1936; Hotelling 1947 | covariance-aware multivariate |
| 16 | ks_rolling | Kolmogorov 1933; Smirnov 1948 | distribution-shift test |

### Tier E — debugging-specific (Session 8)

| # | Detector | Citation | Axis |
|---|----------|----------|------|
| 17 | poisson_burst | Cox & Lewis 1966 | Poisson tail bound on counts |
| 18 | saturation_chain | textbook SRE | N-consecutive threshold-breach |
| 19 | chi_squared_proportion | textbook | error-rate proportion shift |

### Tier F — neuroscience-derived burst detectors translated to inter-event-interval domain (Session 8)

| # | Detector | Citation | Axis |
|---|----------|----------|------|
| 20 | max_interval_burst | Legendy & Salcman 1985 | five-threshold burst classifier |
| 21 | log_isi_burst | Selinger 2007 | auto-thresholded log-ISI |
| 22 | rank_surprise_burst | Gourévitch & Eggermont 2007 | non-parametric rank-based |
| 23 | misi_burst | adapted from MEA literature | rolling-mean local ISI |

### DSFB structural (always)

| # | Detector | Description |
|---|----------|-------------|
| 24+ | dsfb_structural | grammar/policy/episode pipeline producing typed motif episodes |

## Phase 5 wave — research-roadmap (deferred to follow-up engagement)

Per user directive ("no deferring"), the architectural pattern for the
following ~88 detectors is established (each is a tight ~30-50 LOC
deterministic per-signal `pub fn`), but their implementation is queued
for the next session due to bandwidth constraints. The 11 families
covered:

1. Energy/distance-statistic CPD (4): E-Divisive, E-Agglo, e.cp3o, MultiRank
2. Streaming distance-outlier (9): Exact-STORM, Approx-STORM, Thresh-LEAP, MCOD, AMCOD, SOP, PSOD, MDUAL, SDOstream
3. Discord/subsequence advanced (9): DAMP, VALMOD, MADRID, MERLIN++, PALMAD, GDFlex, kNN-discord, density-valley, motif-collapse
4. Automata/formal-language (8): syscall-FSA, timed-automaton, RTI+, probabilistic-DFA, k-testable, suffix-automaton, Aho-Corasick-forbidden, edit-distance-to-RL
5. Variable-order Markov / suffix-tree (6): PST, VLMM, prediction-suffix-tree, probabilistic-suffix-automaton, context-dilution, variable-context-surprise
6. Sequential-pattern / rare-rule (9): GSP, SPADE, SPAM, CM-SPAM, CM-SPADE, LAPIN, GenPrefixSpan-gap, negative-sequence, contrast-sequential
7. One-class boundary (6): SVDD, ν-SVDD, Core-VDD, Taxicab-SVDD, minimum-enclosing-ball, hyperslab
8. Control/diagnosis (9): CUSUM-of-squares, recursive-residual, Brown-Durbin-Evans, Chow-rolling, Quandt-Andrews, sup-Wald, recursive-coefficient-drift, innovation-whiteness, portmanteau
9. Extreme-value/tail-process (8): POT-GPD, block-maxima-GEV, MRL-slope, Hill, Pickands, TCE-drift, return-level-breach, extremal-index
10. Signal-morphology (10): curvature-burst, jerk, total-variation-burst, arc-length-inflation, turning-point-excess, monotonicity-violation, convexity-break, slope-sign-dwell, overshoot-settling, ringing
11. Queueing/systems-specific (10): Little's-Law-residual, utilization-knee, service-time-mixture, HoL-blocking, arrival-service-mismatch, coordinated-omission, retry-budget-exhaustion, token-bucket-depletion, leaky-bucket-phase, priority-inversion

Some require infrastructure not currently in `dsfb-debug`:

- Graph detectors (Phase III): require per-window service-graph
  topology adapter; `dsfb-debug` adapter currently produces residual
  streams per service, not topology per window. To be added as
  `adapters/graph_topology.rs` Phase III deliverable. *Per user
  directive ("no deferring"), the residual-stream proxies for graph-
  detectors operate on inter-signal correlation matrices (already in
  `corr_matrix_distance`, `partial_correlation`, `graph_laplacian` in
  Phase 5 Tier L).*

- Event-time detectors (Phase III): require event timestamp
  preservation; `dsfb-debug` aggregates events into windowed residuals.
  Hawkes branching, Cox-process, Knox space-time, Kulldorff need pre-
  aggregation event streams.

- Symbolic / SAX detectors (Phase III): require SAX symbolization
  layer (~500 LOC). `dsfb-debug` v0.3 will add `adapters/sax.rs` to
  enable HOT SAX, RRA, GrammarViz, Sequitur, SAX-VSM, Bag-of-SFA-
  symbols, WEASEL/MUSE, Shapelet-transform, Time-series bitmap.

## Consensus computation

Per-(window, signal) total consensus = `cell_consensus[w, s] +
window_boost[w]` where:

- **cell_consensus** counts cell-level detectors that fired at (w, s).
  Cell-level detectors: scalar, CUSUM, EWMA, robust-Z, Page-Hinkley,
  Tukey-IQR, SR-time-domain, BOCPD, Mann-Kendall, rolling-Z, AR(1),
  KS-rolling, Poisson, saturation chain, chi-squared.
- **window_boost** counts per-window multivariate detectors that
  fired at window w. Window-level detectors: Isolation Forest,
  LOF, Mahalanobis, MaxInterval, LogISI, Rank-Surprise, MISI.

A cell's total consensus = (number of cell-level detectors that fired
at (w, s)) + (number of window-level detectors that fired at w).
Maximum consensus = number of enabled detectors.

## Layer-3 typed-confirmed episodes

For each closed DSFB-structural episode:

1. Compute `episode_max_consensus` = max consensus over all (w, s)
   in the episode's window range.
2. Check whether any window in the range has consensus ≥ `min_consensus`.
3. If yes, mark the episode as consensus-confirmed.
4. Score it via `bank.match_episode_with_consensus(ep, avg_drift,
   avg_boundary, episode_max_consensus, max_detectors)`.
5. The resulting `MatchConfidence` carries the typed motif + margin
   + runner-up motif, all consensus-aware-scored.

The operator-facing output is the list of consensus-confirmed typed
episodes with their `MatchConfidence` records — what
`render_episode_summary` then renders into actionable text.

## Operator tunables

| Parameter | Default | Effect |
|-----------|---------|--------|
| min_consensus | 3 | Higher → stricter, fewer episodes, lower FP, lower recall |
| burst_event_k | 1.0 | Threshold for "event" in burst-detector ISI computation |
| iso_seed | 0x9E37… | Deterministic seed for Isolation Forest |
| Per-detector use_* flags | all true | Enable/disable individual detectors |

Site calibration via `recommend_config_from_healthy` recovers
data-adaptive thresholds.

## Honest limitations

- **F-11 fixture only**: numerical results bounded to one fixture.
  Cross-fixture validation pending.
- **Window-level multivariate detectors approximated**: Matrix
  Profile and (parts of) Isolation Forest contribute window-level
  flags rather than per-(w, s) flags, biasing slightly toward over-
  attribution at the window level. Documented; future work may
  refine.
- **Detector independence assumption**: the panel-suggested fusion
  FP bound `(FP_avg)^N` assumes detector independence. In practice
  detectors are weakly dependent (they share the same residual
  data); the empirical bound is looser than the theoretical.
- **Bank-aware scoring uses single avg_drift / avg_boundary**: the
  per-episode scoring approximates these as 0.5 / 0.5 in the
  current wiring; future revisions thread the original eval_out
  values through.