dsfb-debug
DSFB-Debug — Structural Semiotics Engine for Software Debugging
A deterministic, read-only, observer-only augmentation layer that turns the residuals every observability stack already discards into typed, human-readable debugging episodes.
Invariant Forge LLC — Riaan de Beer — ORCID: 0009-0006-1155-027X
Citation
If you use DSFB-Debug, cite the Zenodo software record:
de Beer, R. (2026). DSFB-Debug Structural Detector-Field Residual Semiotics Engine for Software Debugging: A Deterministic Augmentation Layer for Typed Residual Interpretation of Execution Traces, Log Streams, and Observability Telemetry in Production Software Systems (v1.0). Zenodo. https://doi.org/10.5281/zenodo.20088863
Zenodo record: https://zenodo.org/records/20088863
Paper Version 1.0 and internal Phase labels are research-publication identifiers; crate semver remains 0.1.0 until crates.io release stabilization.
What It Does
- Reads residuals from existing observability tools (OpenTelemetry, Jaeger, Datadog, ELK, Prometheus, Sentry) and from code-defect catalogs (Defects4J, BugsInPy, PROMISE).
- Computes typed structural interpretation: drift, slew, grammar state, motif, episode.
- Performs Trace Event Collapse: raw alerts → few typed episodes (RSCR ranges from 3.67× on the F-11 production-trace fixture to 62× on the Defects4J bug-catalog fixture; verbatim test stdout, bounded to those fixtures).
- Fuses 205 deterministic detectors organised across 27 mathematical axes (Tiers A–U + EXTRA + V/X/Y/Z/AA) into a consensus-validated typed output via the heuristics bank.
- Operates the 9-axis bank-aware fusion with a 4-rung Anti-Hallucination Ladder: Phase 0 (raw), Phase 5.6 (confuser-boundary), Phase 7 (tier-witness), and Phase 8 (per-detector witness). Each rung is a single config flag, and each rung is tested deterministically.
- Produces deterministic, reproducible, auditable outputs (Theorem 9 — empirically verified on every real-bytes fixture).
What It Does NOT Do
- Does NOT replace any existing observability tool.
- Does NOT claim faster detection or higher accuracy than existing methods. The framing is augmentation, not competition.
- Does NOT modify upstream data (compile-time enforced via a `&[T]`-only public surface).
- The no_std operational core is zero-runtime-dependency, `#![forbid(unsafe_code)]`, and avoids unwrap/panic paths in public core evaluation APIs. std/demo/test/audit paths are separated from the no_std core.
Crate Properties
| Property | Enforcement |
|---|---|
| `#![no_std]` | No standard library dependency |
| `#![forbid(unsafe_code)]` | Zero unsafe blocks |
| `#![deny(clippy::unwrap_used)]` | Core APIs avoid unwrap paths; std/demo/test/audit paths are separated |
| Zero runtime deps | Hand-rolled SHA-256, DFT, matrix algebra |
| Observer-only | All APIs accept `&[T]` only |
| Deterministic | Theorem 9: identical inputs → identical outputs |
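The observer-only and no-unwrap rows can be illustrated with a minimal sketch. The function name `peak_magnitude` is hypothetical and is not part of the crate's API; it only shows the shape of a function that borrows its input as `&[T]` and returns `Option` instead of panicking on empty input.

```rust
/// Hypothetical observer-style helper (not a DSFB-Debug API).
/// The residuals are borrowed immutably, so the caller's buffer cannot be
/// modified, and empty input yields `None` instead of a panic.
pub fn peak_magnitude(residuals: &[f64]) -> Option<f64> {
    let mut peak: Option<f64> = None;
    for &r in residuals {
        // Manual absolute value keeps the sketch free of float helpers.
        let m = if r < 0.0 { -r } else { r };
        peak = match peak {
            Some(p) if p >= m => Some(p),
            _ => Some(m),
        };
    }
    peak
}
```

Calling `peak_magnitude(&data)` leaves `data` untouched, which is the property the `&[T]`-only surface enforces at compile time.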
Mathematical Summary
The engine is a deterministic function from a residual matrix to a
list of typed episodes. The core construction at each (window w, signal s) cell:
| Concept | Definition | Source |
|---|---|---|
| Residual signature | `σ(k) = (‖r(k)‖, ṙ(k), r̈(k))` | per-(w,s) magnitude, slew, curvature |
| Admissibility envelope | `‖r(k)‖ ≤ ρ` | violation triggers grammar state Boundary or Violation |
| Drift persistence | `Σ_{w' in [w-D, w]} I(‖r(k')‖ > μ)` over rolling D windows | drift dwell length |
| Slew density | mean `\|ṙ(k)\|` | |
| Grammar state | `{Admissible, Boundary, Violation}` from σ + thresholds | per-(w,s) symbol |
| Motif | typed pattern from `HeuristicsBank::lookup(reason_code, drift, slew)` | per-(w,s) typing |
| Episode | contiguous run of non-Admissible windows aggregated | per-signal collapse |
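The constructions in the table can be sketched for a single scalar signal as follows. This is an illustration under stated assumptions, not the crate's implementation: slew and curvature are taken as backward finite differences with a unit time step, the split between Boundary and Violation uses a hypothetical second threshold `rho_hard`, and all function names are invented for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GrammarState {
    Admissible,
    Boundary,
    Violation,
}

/// Residual signature (magnitude, slew, curvature) at index k.
/// Assumption: backward finite differences with unit time step.
pub fn signature(r: &[f64], k: usize) -> Option<(f64, f64, f64)> {
    if k < 2 || k >= r.len() {
        return None;
    }
    let magnitude = r[k].abs();
    let slew = r[k] - r[k - 1];
    let curvature = r[k] - 2.0 * r[k - 1] + r[k - 2];
    Some((magnitude, slew, curvature))
}

/// Grammar state from the envelope rho. Assumption: `rho_hard` is a
/// hypothetical second threshold separating Boundary from Violation.
pub fn classify(magnitude: f64, rho: f64, rho_hard: f64) -> GrammarState {
    if magnitude <= rho {
        GrammarState::Admissible
    } else if magnitude <= rho_hard {
        GrammarState::Boundary
    } else {
        GrammarState::Violation
    }
}

/// Drift persistence: number of windows in [w-D, w] whose magnitude exceeds mu.
pub fn drift_persistence(mags: &[f64], w: usize, d: usize, mu: f64) -> usize {
    if w >= mags.len() {
        return 0;
    }
    let lo = w.saturating_sub(d);
    mags[lo..=w].iter().filter(|&&m| m > mu).count()
}

/// Episode collapse: contiguous runs of non-Admissible windows,
/// returned as inclusive (start, end) index pairs.
pub fn collapse(states: &[GrammarState]) -> Vec<(usize, usize)> {
    let mut episodes = Vec::new();
    let mut start: Option<usize> = None;
    for (i, s) in states.iter().enumerate() {
        match (*s != GrammarState::Admissible, start) {
            (true, None) => start = Some(i),
            (false, Some(b)) => {
                episodes.push((b, i - 1));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(b) = start {
        episodes.push((b, states.len() - 1));
    }
    episodes
}
```

With residuals `[0.0, 0.25, 0.5, 1.5, 3.0, 0.25, 0.0, 1.5]`, `rho = 1.0`, and `rho_hard = 2.0`, windows 3 and 4 form one episode and window 7 forms a second, so eight per-window states collapse to two episodes.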
Fusion consensus at each (w, s) cell:
consensus(w, s) = Σ_{cell-level detectors d} I(d fires at (w, s))
+ Σ_{window-level detectors d} I(d fires at w)
fire(w, s) = consensus(w, s) ≥ min_consensus
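A minimal sketch of the two formulas above, assuming each detector's decision at a cell or window has already been reduced to a boolean. The function names and the slice-of-bool representation are assumptions for illustration, not the crate's types.

```rust
/// consensus(w, s): cell-level detectors firing at (w, s) plus
/// window-level detectors firing at w.
pub fn consensus(cell_fires: &[bool], window_fires: &[bool]) -> usize {
    let cell = cell_fires.iter().filter(|&&f| f).count();
    let window = window_fires.iter().filter(|&&f| f).count();
    cell + window
}

/// fire(w, s): true when the consensus count reaches min_consensus.
pub fn fire(cell_fires: &[bool], window_fires: &[bool], min_consensus: usize) -> bool {
    consensus(cell_fires, window_fires) >= min_consensus
}
```

For example, two cell-level firings plus one window-level firing give a consensus of 3, which fires at `min_consensus = 3` but not at `min_consensus = 5`.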
A DSFB structural episode is consensus-confirmed when at least one
of its windows fires. The bank's match_episode_with_consensus then
scores the typed motif with consensus boost:
score(motif, ep) = w_drift · drift_score(ep)
+ w_slew · slew_score(ep)
+ w_bound · boundary_score(ep)
+ w_corr · correlation_score(ep)
+ w_dur · duration_score(ep)
+ α · (consensus_max(ep) / max_detectors)
The bank's per-motif weights w_* and consensus-factor α are documented
inline at src/heuristics_bank.rs.
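The scoring formula can be written out as a short sketch. The struct and field names below are hypothetical and chosen to mirror the symbols in the formula; the actual types live in src/heuristics_bank.rs and may differ.

```rust
/// Hypothetical per-motif weights (w_* and alpha in the formula).
pub struct Weights {
    pub drift: f64,
    pub slew: f64,
    pub bound: f64,
    pub corr: f64,
    pub dur: f64,
    pub alpha: f64,
}

/// Hypothetical per-episode feature scores.
pub struct EpisodeFeatures {
    pub drift_score: f64,
    pub slew_score: f64,
    pub boundary_score: f64,
    pub correlation_score: f64,
    pub duration_score: f64,
    pub consensus_max: usize,
}

/// score(motif, ep): weighted feature sum plus the consensus boost
/// alpha * (consensus_max / max_detectors).
pub fn score(w: &Weights, ep: &EpisodeFeatures, max_detectors: usize) -> f64 {
    // Guard the division; treating zero detectors as zero boost is an assumption.
    let boost = if max_detectors == 0 {
        0.0
    } else {
        ep.consensus_max as f64 / max_detectors as f64
    };
    w.drift * ep.drift_score
        + w.slew * ep.slew_score
        + w.bound * ep.boundary_score
        + w.corr * ep.correlation_score
        + w.dur * ep.duration_score
        + w.alpha * boost
}
```

With `w_drift = 0.5`, the other four weights at 0.125, `α = 0.5`, feature scores (1.0, 0.5, 0.0, 1.0, 0.0), and 4 of 8 detectors firing, the weighted sum is 0.6875 and the consensus boost adds 0.25, for a score of 0.9375.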
Theorem 9 (Deterministic Replay). For any byte-stable residual matrix
input, two consecutive engine.run_evaluation(...) calls produce
byte-identical episode output. Mechanically proven by composition of
deterministic stages; empirically verified on F-11 (35 604 Jaeger spans).
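The replay property can be checked mechanically by running the same stage twice on the same borrowed input and comparing output bytes. The sketch below uses a toy stand-in stage; `toy_episode_bytes` and `replay_is_byte_identical` are hypothetical names and do not reflect the signature of `engine.run_evaluation(...)`.

```rust
/// Hypothetical stand-in for an evaluation stage: serialises the indices of
/// samples whose magnitude exceeds `threshold` as little-endian u32 bytes.
pub fn toy_episode_bytes(residuals: &[f64], threshold: f64) -> Vec<u8> {
    let mut out = Vec::new();
    for (i, &r) in residuals.iter().enumerate() {
        let mag = if r < 0.0 { -r } else { r };
        if mag > threshold {
            out.extend_from_slice(&(i as u32).to_le_bytes());
        }
    }
    out
}

/// Runs `eval` twice on the same input and compares the outputs byte for byte.
pub fn replay_is_byte_identical<F>(input: &[f64], eval: F) -> bool
where
    F: Fn(&[f64]) -> Vec<u8>,
{
    eval(input) == eval(input)
}
```

A stage built only from pure functions of its input passes this check; a stage that reads hidden state (a counter, a clock, an unseeded random source) generally does not.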
Code Tour
use DsfbDebugEngine;
// Engine creation under paper-lock parameters.
let engine = paper_lock.unwrap;
// Per-signal evaluation (single signal).
let eval = engine.evaluate_signal;
// Full residual-matrix evaluation.
let mut evals = vec!;
let mut episodes = vec!;
let = engine.run_evaluation.unwrap;
// Causality (graph attribution): augments episodes with root_cause_signal_index.
let = engine.run_evaluation_with_graph.unwrap;
Multi-detector fusion
use ;
let cfg = FusionConfig ;
let metrics = run_fusion_evaluation.unwrap;
Operator-side helpers
use ;
use recommend_config_from_healthy;
// Render an episode for an operator dashboard.
let summary = render_episode_summary;
// Episode catalog: similarity matching against past closed episodes.
let mut catalog = new;
catalog.record;
let nearest = catalog.find_similar;
// Site calibration on a healthy slice.
let report = recommend_config_from_healthy;
Heuristics Bank Coverage
The bank ships 32 hand-curated motifs across six tiers, each with
provenance (FrameworkDesign / DatasetObserved / FieldValidated),
the upstream dataset name + DOI it was observed in, a dashboard hint
for production debug engineers, and an IEEE 24765 /
Avizienis-Laprie-Randell taxonomy anchor (no ad-hoc names).
| Tier | Source | Motifs | Provenance |
|---|---|---|---|
| 1 | Original framework | 10 | FrameworkDesign |
| 2 | TADBench fault cases | 5 | DatasetObserved |
| 3 | AIOps Challenge categories | 6 | DatasetObserved |
| 4 | MultiDim-Localization patterns | 3 | DatasetObserved |
| 5 | DeepTraLog log+trace fusion | 3 | DatasetObserved |
| 6 | Cross-cutting structural | 5 | FrameworkDesign |
| Total | | 32 | |
MAX_MOTIFS = 64 leaves 32 free slots for v0.3 / v0.4 expansion. The
bank exposes two lookup paths: per-signal lookup and per-episode
match_episode_with_consensus (multi-feature weighted scoring with
topology / duration / consensus gates). LO2 endoductive anomalies are
deliberately NOT in the bank — they validate the
SemanticDisposition::Unknown branch.
See docs/heuristics_bank.md for the
per-motif reference.
Detector Orthogonality
The fusion harness wires 205 deterministic detectors organised across 27 mathematical axes (Tiers A–U + EXTRA + V/X/Y/Z/AA). Each axis contributes a distinct mathematical foundation; orthogonal combinations form the consensus arithmetic and route through per-motif tier-affinity masks per the Routed Evidence Principle (paper §6.5).
| Tier | Family | Count | Examples | Mathematical Axis |
|---|---|---|---|---|
| A | Tier-A parametric trio | 3 | scalar, CUSUM, EWMA | fixed-baseline / cumulative / smoothed |
| B | Robust statistics | 3 | robust-Z, Page-Hinkley, Tukey-IQR | MAD-based / min-CUSUM / IQR fence |
| C | Model / non-parametric | 5 | spectral residual, matrix profile, BOCPD, IsoForest, LOF | spectral / sub-sequence / Bayesian / tree / density |
| D | Additional non-dep | 5 | Mann-Kendall, rolling-Z, AR(1), Mahalanobis, KS-rolling | trend / adaptive / autoregressive / multivariate / distribution |
| E | Debugging-specific stats | 3 | Poisson burst, saturation chain, chi-sq proportion | Poisson tail / N-consecutive / proportion shift |
| F | Neuroscience burst | 4 | MaxInterval, LogISI, Rank-Surprise, MISI | inter-event-interval translated detectors |
| Extra | Session 8.5 | 5 | GLR, ADWIN, MEWMA, retry-storm, correlation-break | LR-test / Hoeffding / multivariate EWMA / domain |
| G | Concept drift streaming | 9 | Shiryaev-Roberts, DDM, EDDM, HDDM-A/W, STEPD, ECDD, KSWIN, FHDDM | error-rate sequential change-point |
| H | Distribution shift | 10 | Wasserstein, JS, KL, PSI, AD, CvM, Energy, MMD, Bhattacharyya, Hellinger | distributional distance |
| I | Robust / nonparametric | 10 | Median-abs-slope, Theil-Sen, Sen-CP, Mood, Brown-Forsythe, Levene, Sign, Runs, WW, Sequential-rank | rank-based / robust |
| J | Forecast residual | 10 | SES, Holt, Holt-Winters, AR(2), ARIMA, Kalman, SavGol, STL, Prophet, naive seasonal | forecast-residual envelope |
| K | Frequency / oscillation | 10 | FFT-band, Welch, Wavelet, autocorr, Lomb-Scargle, ZCR, dominant-freq, spectral-entropy, cepstral, phase-locking | spectral / temporal |
| L | Multivariate relationship | 9 | Hotelling T², MCUSUM, PCA-recon, Robust-PCA, corr-mat-dist, partial-corr, Laplacian, CCA, MI | covariance / subspace |
| M | Debugging-native | 18 | flap, sawtooth, deadband, quantization, plateau, counter-wrap, monotone-leak, hysteresis, limit-cycle, ping-pong, backpressure, causal-lag, fan-out, fan-in, phase-slip, jitter-bloom, tail-thickening, burst-after-silence | named bug-shapes |
| N | Offline CPD | 8 | PELT, BinSeg, BottomUp, Window, DP, Kernel, Piecewise-linear, Bayesian-offline | dynamic programming / segmentation |
| O | Rare changepoint | 10 | MOSUM, NOT, WBS2, Seeded-BS, SMUCE, FDRSeg, FPOP, TGUH, Inspect, Double-CUSUM-BS | multiscale / interval-randomized |
| P | Streaming / sequential | 9 | E-detector, conformal-martingale, exch-martingale, power-martingale, mixture-martingale, mixture-SPRT, scan-stat, higher-criticism, Berk-Jones | e-process / multi-test |
| Q | Concept drift rarer | 10 | MDDM-A/E/G, LFR, FPDD, OPTWIN, SeqDrift2, D3, QuantTree, NN-DVI | weighted Hoeffding / Fisher / discriminative |
| R | Robust depth | 8 | halfspace, projection, Stahel-Donoho, MCD, spatial-sign, S-est, depth-rank, median-polish | depth-based / direction-based |
| S | Count / event-process | 3 | Bayesian-blocks, Index-of-dispersion, Allan-variance | count-process / variance-time |
| T | Info-theoretic | 6 | MDL, NCD, Lempel-Ziv, transfer-entropy, Fisher-info, Renyi-entropy | description-length / surprise |
| U | Dynamical systems | 8 | permutation-entropy, sample-entropy, RQA, Lyapunov, correlation-dim, BDS, 0-1-chaos, delay-embedding-NN | nonlinear dynamics |
| V | Industrial fault-diagnosis (FDD) | ~8 | parity-space residual, observer-based, dependability-engineering | parity / observer |
| X | Climate homogeneity | ~8 | Buishand-range, SNHT, Pettitt-step, cumulative-deviation | cumulative-deviation tests |
| Y | Robust dispersion / rank | ~8 | median-of-means, U-statistic, rank-step | dispersion / rank-step |
| Z | Circular / directional | ~8 | R-bar, Rayleigh, circular-mean shift, phase-jump | phase / directional |
| AA | Higher-order nonlinear | ~11 | ARCH, kurtosis, second-moment volatility | non-Gaussian / heavy-tail |
| Total | ~205 | + DSFB structural | 27 orthogonal axes |
Each detector is implemented as a deterministic pub fn in
src/incumbent_baselines.rs, with its citation in the doc-comment.
Where the literature requires FFT/DFT, MCMC, or non-deterministic
randomization, the implementation uses a documented deterministic-seed
reduction or time-domain proxy, and the simplification is named
honestly in each detector's doc-comment. See
docs/fusion_design.md for the per-tier
notes and consensus contribution rules.
Real-World Dataset Coverage — 12 Real-Bytes Fixtures Across 9 Datasets
All Phase F + Phase G fixtures are vendored as real upstream byte
slices (no synthetic data, ever; paper-lock hard-errors on missing
real data with MissingRealData). Every fixture is DOI-pinned,
SHA-256-gated, and replayable under Theorem 9.
| # | Dataset | Source | Distinguishing regime | Test file |
|---|---|---|---|---|
| 1 | TADBench / TrainTicket F-11 | Zenodo 10.5281/zenodo.6979726 | order-service deployment-regression (35 604 spans) | tests/eval_tadbench.rs |
| 2 | TADBench / TrainTicket F-11b | same | auth-mongo fault — cross-fixture validation (108 spans) | same |
| 3 | TADBench / TrainTicket F-04 | same | admin-service springstarter version-config (25 316 spans) | same |
| 4 | TADBench / TrainTicket F-19 | same | mongodb-driver-3.0.4 version-config (19 281 spans) | same |
| 5 | Illinois SocialNetwork | DataBank 10.13012/B2IDB-6738796_V1 | unsampled DeathStarBench trace (160 000 traces) | tests/eval_illinois.rs |
| 6 | AIOps Challenge 2018 KPI | NetManAIOps/Bagel sample_data.csv | KPI seasonal anomaly (Su et al., IPCCC 2018) | tests/eval_aiops_challenge.rs |
| 7 | LO2 (PROMISE 2025) | Zenodo 10.5281/zenodo.14257989 | OAuth2 Go-runtime metrics; endoductive validator | tests/eval_lo2.rs |
| 8 | MultiDim-Localization | NetManAIOps/MultiDimension-Localization | 5-dim categorical aggregate (high-dim service graph) | tests/eval_multidim.rs |
| 9 | DeepTraLog F-01 | FudanSELab/DeepTraLog | combined log+trace ERROR span data (Zhang et al., ICSE 2022) | tests/eval_deeptralog.rs |
| 10 | Defects4J (Phase G) | rjust/defects4j | Java bug catalog — 6 projects × 30 bugs (Just et al., ISSTA 2014) | tests/eval_defects4j.rs |
| 11 | BugsInPy (Phase G) | soarsmu/BugsInPy | Python bug catalog — 6 projects × 30 bugs (Widyasari et al., FSE 2020) | tests/eval_bugsinpy.rs |
| 12 | PROMISE defect-prediction (Phase G) | ssea-lab/PROMISE | Per-module Chidamber-Kemerer OO metrics + bug counts (Menzies et al., 2003+) | tests/eval_promise.rs |
Verbatim metrics from the latest cargo test --features "std paper-lock" -- --nocapture run:
| Fixture | Episodes | Raw alerts | RSCR | Clean FP | Replay |
|---|---|---|---|---|---|
| TADBench F-11 (deployment-regression) | 3 | 11 | 3.67× | 0.0070 | ok |
| TADBench F-11b (auth-mongo) | 1 | 7 | 7.0× | 0.2500 | ok |
| TADBench F-04 (config regression) | 0 | 0 | 0 | 0 | ok |
| TADBench F-19 (mongodb-driver) | 0 | 0 | 0 | 0 | ok |
| Illinois SocialNetwork | 1 | 24 | 24.0× | 0.0313 | ok |
| AIOps Challenge KPI | 1 | 52 | 52.0× | 0.0323 | ok |
| LO2 OAuth2 (endoductive) | 0 | 0 | 0 | 0 | ok |
| MultiDim part1 | 1 | 19 | 19.0× | 0.0833 | ok |
| DeepTraLog F-01 | 0 | 0 | 0 | 0 | ok |
| Defects4J | 1 | 62 | 62.0× | 0.0333 | ok |
| BugsInPy | 0 | 0 | 0 | 0 | ok |
| PROMISE | 1 | 4 | 4.0× | 0.0333 | ok |
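The RSCR column in the table is consistent with raw alerts divided by episodes (for example 11 / 3 ≈ 3.67 on F-11). The sketch below encodes that reading; it is inferred from the table rather than taken from the crate, and the function name `rscr` is hypothetical. The table prints 0 for fixtures with no episodes, while the sketch returns `None` there to avoid a division by zero.

```rust
/// Collapse ratio as read off the metrics table: raw alerts per typed episode.
/// Returns None when there are no episodes (the table reports 0 in that case).
pub fn rscr(raw_alerts: u32, episodes: u32) -> Option<f64> {
    if episodes == 0 {
        None
    } else {
        Some(raw_alerts as f64 / episodes as f64)
    }
}
```

Under this reading, `rscr(62, 1)` gives 62.0 for the Defects4J fixture and `rscr(0, 0)` gives `None` for fixtures such as F-04.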
See data/MANIFEST.toml for per-dataset DOI,
license, and SHA-256 gates; docs/dataset_provenance.md
for the full provenance ledger;
data/upstream/project_*.py for each fixture's
deterministic projection script.
9-Axis Bank-Aware Fusion + Anti-Hallucination Ladder
The fusion harness is governed by a nine-axis configuration that
progressively tightens the typed-disposition decision. Each axis is a
single FusionConfig flag; each axis was introduced in a specific phase
to close a specific failure mode (paper §11.x).
| Axis # | Name | Phase | Role |
|---|---|---|---|
| 1 | Provenance gate | 0 | Filter motifs by Provenance rank |
| 2 | Margin gate | 2 | Demote to ambiguous when top-vs-runner-up margin below θ |
| 3 | Tier-affinity scoring | 2.5 | Per-motif affinity_tiers bitmask routes detector evidence |
| 4 | Zero-tier-firing filter | 5.5 | Reject motifs whose entire affinity mask saw no detector firings |
| 5 | Adaptive margin gate | 5.5 | Halve θ when tier_consensus_factor > 0.5 |
| 6 | Confuser-boundary adjudication | 5.6 | Compute margin_vs_confuser; emit ConfuserAmbiguous below θ_c |
| 7 | Structural disambiguator boost | 6 | Bonus on per-motif disambiguator_tier |
| 8 | Tier-level primary witness | 7 | At least one detector in primary_witness_tiers must fire |
| 9 | Per-detector named witness | 8 | At least one of primary_witness_detectors must fire (vacuous-pass when zero captured) |
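The Anti-Hallucination Ladder builds four progressively stricter rungs from axes 6, 8, and 9: Phase 0 (raw), Phase 5.6 (axis 6, confuser-boundary), Phase 7 (axis 8, tier-level witness), and Phase 8 (axis 9, per-detector witness). Operators choose the rung appropriate for their fault profile; recall trades against typing precision.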
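Axes 2 and 5 in the table above (the margin gate and its adaptive variant) can be sketched as follows. The type and function names are hypothetical, and the handling of a single candidate motif (typed without a margin check) is an assumption; only the rule "halve θ when tier_consensus_factor > 0.5" and the demotion to ambiguous below θ come from the table.

```rust
#[derive(Debug, PartialEq)]
pub enum Disposition {
    /// Index of the winning motif score.
    Typed(usize),
    Ambiguous,
}

/// Axis 5: halve theta when the tier consensus factor exceeds 0.5.
pub fn effective_theta(theta: f64, tier_consensus_factor: f64) -> f64 {
    if tier_consensus_factor > 0.5 {
        theta / 2.0
    } else {
        theta
    }
}

/// Axis 2: demote to Ambiguous when the top-vs-runner-up margin is below theta.
/// Returns None when there are no candidate scores.
pub fn margin_gate(scores: &[f64], theta: f64, tier_consensus_factor: f64) -> Option<Disposition> {
    let mut best: Option<(usize, f64)> = None;
    let mut runner_up: Option<f64> = None;
    for (i, &s) in scores.iter().enumerate() {
        match best {
            Some((_, b)) if s <= b => {
                if runner_up.map_or(true, |r| s > r) {
                    runner_up = Some(s);
                }
            }
            Some((_, b)) => {
                runner_up = Some(b);
                best = Some((i, s));
            }
            None => best = Some((i, s)),
        }
    }
    let (idx, top) = best?;
    let margin = match runner_up {
        Some(r) => top - r,
        // Assumption: a lone candidate has no runner-up to be confused with.
        None => return Some(Disposition::Typed(idx)),
    };
    if margin < effective_theta(theta, tier_consensus_factor) {
        Some(Disposition::Ambiguous)
    } else {
        Some(Disposition::Typed(idx))
    }
}
```

With scores `[0.5, 0.875, 0.75]` and `θ = 0.25`, the margin of 0.125 is ambiguous at a tier consensus factor of 0.25, but passes once the factor rises to 0.75 and θ is halved to 0.125.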
Fusion Sweep Results — F-11
Captured from the fusion_sweep_on_f11_fixture test (35 604 real Jaeger
spans, 16 services, 431 windows). Per-window FP rate computed against
the healthy slice. Run cargo test --features "std paper-lock" --test fusion_compare -- --nocapture for the verbatim matrix.
| Min consensus | Layer-2 episodes | Layer-2 FP rate | Layer-3 typed-confirmed | Determinism |
|---|---|---|---|---|
| N≥1 | 12 | 0.1230 | 3 | ok |
| N≥3 | 7 | 0.0302 | 3 | ok |
| N≥5 | 1 | 0.0093 | 1 | ok |
| N≥7 | 1 | 0.0023 | 1 | ok |
| N≥9 | 0 | 0.0000 | 0 | ok |
Single-detector minima for reference (tests/incumbent_compare.rs):
scalar-3σ 0.0139, CUSUM 0.0116, EWMA 0.0812, DSFB-structural 0.0070.
At N≥7, fusion FP rate (0.0023) is ~3× lower than DSFB-structural alone,
~5× lower than CUSUM, ~6× lower than scalar-3σ, ~35× lower than EWMA.
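The ratios quoted above follow directly from the listed FP rates. A small sketch of the arithmetic, with a hypothetical helper name:

```rust
/// Ratio of a single-detector FP rate to the fusion FP rate.
/// Returns None when the fusion rate is zero (as at N≥9 in the sweep).
pub fn fp_ratio(single_detector_fp: f64, fusion_fp: f64) -> Option<f64> {
    if fusion_fp > 0.0 {
        Some(single_detector_fp / fusion_fp)
    } else {
        None
    }
}
```

Dividing each single-detector rate by the N≥7 fusion rate of 0.0023 gives roughly 3.0 for DSFB-structural (0.0070), 5.0 for CUSUM (0.0116), 6.0 for scalar-3σ (0.0139), and 35.3 for EWMA (0.0812).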
Honest claim: F-11 + F-11b numbers are bounded to those fixtures. Generalisation requires Phase II partner-data engagement.
Standards Alignment
NIST SP 800-53 (AU-2/3/6/12), NIST SP 800-92, NIST SP 800-171, DO-178C §6.3 (pathway-eligible foresight, not certification), IEEE 1012-2016, ISO/IEC 25010, OpenTelemetry, W3C Trace Context, SOC 2 Type II.
| Control | Mapped Component | Audit Path |
|---|---|---|
| AU-2 (Audit Events) | structured BenchmarkMetrics JSON output | tests/fusion_compare.rs stdout |
| AU-3 (Record Content) | per-motif HeuristicEntry (provenance, DOI, taxonomy) | src/heuristics_bank.rs |
| AU-6 (Review/Analysis) | episode catalog + similarity matching | src/episode_catalog.rs |
| AU-12 (Audit Generation) | verify_deterministic_replay mechanism | src/lib.rs |
| CMMC AU.L2-3.3.1/2 | content-of-records discipline | docs/standards_alignment.md |
| DO-178C §6.3 | pathway-eligible architectural alignment | observer-only contract type-enforced |
See docs/standards_alignment.md for
control-level mapping.
Quick Verification
cargo build --no-default-features
cargo test --no-default-features
cargo test --features "std paper-lock" -- --nocapture
cargo clippy --features "std paper-lock" --all-targets -- -D warnings
License
Apache 2.0 (reference implementation). Background IP: Invariant Forge LLC. Commercial deployment requires separate written license. Contact: licensing@invariantforge.net