dsfb-debug

DSFB-Debug — Structural Semiotics Engine for Software Debugging

A deterministic, read-only, observer-only augmentation layer that turns the residuals every observability stack already discards into typed, human-readable debugging episodes.

Invariant Forge LLC — Riaan de Beer — ORCID: 0009-0006-1155-027X

Citation

If you use DSFB-Debug, cite the Zenodo software record:

de Beer, R. (2026). DSFB-Debug Structural Detector-Field Residual Semiotics Engine for Software Debugging: A Deterministic Augmentation Layer for Typed Residual Interpretation of Execution Traces, Log Streams, and Observability Telemetry in Production Software Systems (v1.0). Zenodo. https://doi.org/10.5281/zenodo.20088863

Zenodo record: https://zenodo.org/records/20088863

The paper version (1.0) and the internal Phase labels are research-publication identifiers; the crate's semver stays at 0.1.0 until the crates.io release stabilizes.

What It Does

Reads residuals from existing observability tools (OpenTelemetry, Jaeger, Datadog, ELK, Prometheus, Sentry) and from code-defect catalogs (Defects4J, BugsInPy, PROMISE).
Computes typed structural interpretation: drift, slew, grammar state, motif, episode.
Performs Trace Event Collapse: raw alerts → a few typed episodes (RSCR, the ratio of raw alerts to typed episodes, ranges from 3.67× on the F-11 production-trace fixture to 62× on the Defects4J bug-catalog fixture; figures are verbatim test stdout and bounded to those fixtures).
Fuses 205 deterministic detectors organised across 27 mathematical axes (Tiers A–U + EXTRA + V/X/Y/Z/AA) into a consensus-validated typed output via the heuristics bank.
Operates the 9-axis bank-aware fusion with a 4-rung Anti-Hallucination Ladder: Phase 0 (raw) / Phase 5.6 (confuser-boundary) / Phase 7 (tier-witness) / Phase 8 (per-detector witness). Each rung is a single config flag, and each rung is tested deterministically.
Produces deterministic, reproducible, auditable outputs (Theorem 9 — empirically verified on every real-bytes fixture).

What It Does NOT Do

Does NOT replace any existing observability tool.
Does NOT claim faster detection or higher accuracy than existing methods. The framing is augmentation, not competition.
Does NOT modify upstream data (compile-time enforced via &[T]-only public surface).
The no_std operational core is zero-runtime-dependency, #![forbid(unsafe_code)], and avoids unwrap/panic paths in public core evaluation APIs. std/demo/test/audit paths are separated from the no_std core.

Crate Properties

Property	Enforcement
`#![no_std]`	No standard library dependency
`#![forbid(unsafe_code)]`	Zero unsafe blocks
`#![deny(clippy::unwrap_used)]`	Core APIs avoid unwrap paths; std/demo/test/audit paths are separated
Zero runtime deps	Hand-rolled SHA-256, DFT, matrix algebra
Observer-only	All APIs accept `&[T]` only
Deterministic	Theorem 9: identical inputs → identical outputs

Mathematical Summary

The engine is a deterministic function from a residual matrix to a list of typed episodes. The core construction at each (window w, signal s) cell:

Concept	Definition	Source
Residual signature	`σ(k) = (‖r(k)‖, ṙ(k), r̈(k))`	per-(w,s) magnitude, slew, curvature
Admissibility envelope	`‖r(k)‖ ≤ ρ`	violation triggers grammar state Boundary or Violation
Drift persistence	`Σ_{w' in [w-D, w]} I(‖r(k')‖ > μ)` over rolling D windows	drift dwell length
Slew density	mean `|ṙ(k)|`	
Grammar state	`{Admissible, Boundary, Violation}` from σ + thresholds	per-(w,s) symbol
Motif	typed pattern from `HeuristicsBank::lookup(reason_code, drift, slew)`	per-(w,s) typing
Episode	contiguous run of non-Admissible windows aggregated	per-signal collapse
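The table's per-cell pipeline (signature, drift persistence, grammar state, episode collapse) can be sketched for a single scalar signal as below. This is a minimal illustration, not the crate's code. It assumes scalar residuals (so ‖r‖ = |r|) and backward finite differences for ṙ and r̈. The Boundary-versus-Violation rule, which reuses ρ as μ and adds a `dwell` count, is a HYPOTHETICAL choice because the table does not give the exact thresholds.

```rust
/// Grammar symbol for one (window, signal) cell.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GrammarState {
    Admissible,
    Boundary,
    Violation,
}

/// sigma(k) = (|r(k)|, r'(k), r''(k)) via backward finite differences;
/// derivatives that would need samples before index 0 are reported as 0.
pub fn signature(r: &[f64], k: usize) -> (f64, f64, f64) {
    let magnitude = r[k].abs();
    let slew = if k >= 1 { r[k] - r[k - 1] } else { 0.0 };
    let curvature = if k >= 2 { r[k] - 2.0 * r[k - 1] + r[k - 2] } else { 0.0 };
    (magnitude, slew, curvature)
}

/// Number of windows w' in [w - d, w] (clamped at 0) with |r(w')| > mu.
pub fn drift_persistence(r: &[f64], w: usize, d: usize, mu: f64) -> usize {
    r[w.saturating_sub(d)..=w].iter().filter(|x| x.abs() > mu).count()
}

/// HYPOTHETICAL rule: inside the envelope is Admissible; an excursion that has
/// persisted for at least `dwell` windows is Violation, otherwise Boundary.
pub fn grammar_state(r: &[f64], w: usize, rho: f64, d: usize, dwell: usize) -> GrammarState {
    let (magnitude, _, _) = signature(r, w);
    if magnitude <= rho {
        GrammarState::Admissible
    } else if drift_persistence(r, w, d, rho) >= dwell {
        GrammarState::Violation
    } else {
        GrammarState::Boundary
    }
}

/// Collapse contiguous runs of non-Admissible windows into (start, end) episodes.
pub fn episodes(states: &[GrammarState]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut open: Option<usize> = None;
    for (w, &state) in states.iter().enumerate() {
        let active = state != GrammarState::Admissible;
        match (active, open) {
            (true, None) => open = Some(w),
            (false, Some(start)) => {
                out.push((start, w - 1));
                open = None;
            }
            _ => {}
        }
    }
    if let Some(start) = open {
        out.push((start, states.len() - 1));
    }
    out
}
```

With residuals `[0.1, 0.2, 1.5, 1.6, 0.1, 2.0]`, ρ = 1.0, D = 2, and dwell = 2, the states are Admissible, Admissible, Boundary, Violation, Admissible, Violation. These collapse to the episodes (2, 3) and (5, 5).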

Fusion consensus at each (w, s) cell:

consensus(w, s) = Σ_{cell-level detectors d}     I(d fires at (w, s))
                + Σ_{window-level detectors d}   I(d fires at w)
fire(w, s)      = consensus(w, s) ≥ min_consensus
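A minimal sketch of the consensus count and the fire predicate follows. It assumes a HYPOTHETICAL layout in which each cell-level detector reports one boolean per (w, s) cell, stored row-major as `w * num_signals + s`, and each window-level detector reports one boolean per window. The crate's internal representation may differ.

```rust
/// consensus(w, s) = #cell-level detectors firing at (w, s)
///                 + #window-level detectors firing at w.
pub fn consensus(
    cell_fires: &[&[bool]],   // one slice per cell-level detector, index w * num_signals + s
    window_fires: &[&[bool]], // one slice per window-level detector, index w
    num_signals: usize,
    w: usize,
    s: usize,
) -> usize {
    let cell = cell_fires.iter().filter(|d| d[w * num_signals + s]).count();
    let window = window_fires.iter().filter(|d| d[w]).count();
    cell + window
}

/// fire(w, s) = consensus(w, s) >= min_consensus.
pub fn fires(consensus: usize, min_consensus: usize) -> bool {
    consensus >= min_consensus
}
```

A window-level firing counts toward every signal in that window. This is why cell (0, 1) below still reaches a consensus of 2 even though only one cell-level detector fires there.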

A DSFB structural episode is consensus-confirmed when at least one of its windows fires. The bank's match_episode_with_consensus then scores the typed motif with a consensus boost:

score(motif, ep) = w_drift   · drift_score(ep)
                 + w_slew    · slew_score(ep)
                 + w_bound   · boundary_score(ep)
                 + w_corr    · correlation_score(ep)
                 + w_dur     · duration_score(ep)
                 + α · (consensus_max(ep) / max_detectors)

The bank's per-motif weights w_* and consensus-factor α are documented inline at src/heuristics_bank.rs.
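The scoring formula and the consensus-confirmation rule can be written out as the sketch below. The struct and field names are HYPOTHETICAL. The feature scores are assumed to be pre-normalised by the caller, and the weights used in the check are placeholders, not the values in src/heuristics_bank.rs.

```rust
/// Placeholder per-motif weights; the real values live in src/heuristics_bank.rs.
pub struct MotifWeights {
    pub drift: f64,
    pub slew: f64,
    pub bound: f64,
    pub corr: f64,
    pub dur: f64,
    pub alpha: f64,
}

/// Per-episode feature scores, assumed already normalised by the caller.
pub struct EpisodeFeatures {
    pub drift: f64,
    pub slew: f64,
    pub boundary: f64,
    pub correlation: f64,
    pub duration: f64,
    pub consensus_max: usize,
}

/// An episode is consensus-confirmed when any of its windows reaches min_consensus.
pub fn consensus_confirmed(window_consensus: &[usize], min_consensus: usize) -> bool {
    window_consensus.iter().any(|&c| c >= min_consensus)
}

/// score = weighted feature sum + alpha * consensus_max / max_detectors.
pub fn score(w: &MotifWeights, ep: &EpisodeFeatures, max_detectors: usize) -> f64 {
    let weighted = w.drift * ep.drift
        + w.slew * ep.slew
        + w.bound * ep.boundary
        + w.corr * ep.correlation
        + w.dur * ep.duration;
    // Guard the division so an empty detector set adds no boost.
    let boost = if max_detectors == 0 {
        0.0
    } else {
        w.alpha * (ep.consensus_max as f64 / max_detectors as f64)
    };
    weighted + boost
}
```

With weights (0.3, 0.2, 0.2, 0.1, 0.2) and α = 0.5, consider an episode with features (1.0, 0.5, 0, 0, 1.0) whose peak consensus is 41 of 205 detectors. It scores 0.6 from the weighted terms plus 0.5 · 0.2 = 0.1 from the boost, for a total of 0.7.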

Theorem 9 (Deterministic Replay). For any byte-stable residual matrix input, two consecutive engine.run_evaluation(...) calls produce byte-identical episode output. Mechanically proven by composition of deterministic stages; empirically verified on F-11 (35 604 Jaeger spans).
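The empirical half of Theorem 9 amounts to running the pipeline twice and comparing the serialised output byte for byte. The sketch below illustrates that check with a HYPOTHETICAL `verify_replay` helper and a toy threshold pipeline. It is not the crate's `verify_deterministic_replay`, and a passing check on one input is evidence for the theorem, not a proof of it.

```rust
/// Serialise (start, end) episodes to little-endian bytes so two runs can be
/// compared byte for byte.
pub fn encode_episodes(episodes: &[(usize, usize)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(episodes.len() * 16);
    for &(start, end) in episodes {
        out.extend_from_slice(&(start as u64).to_le_bytes());
        out.extend_from_slice(&(end as u64).to_le_bytes());
    }
    out
}

/// Run `pipeline` twice on the same input and report whether the two byte
/// outputs are identical. This checks one input; it is not a proof.
pub fn verify_replay<F>(input: &[f64], pipeline: F) -> bool
where
    F: Fn(&[f64]) -> Vec<u8>,
{
    let first = pipeline(input);
    let second = pipeline(input);
    first == second
}
```

Fixing the byte encoding (here u64 little-endian) matters. Comparing in-memory floats or `usize` values alone would not show that the persisted output is stable across platforms.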

Code Tour

use dsfb_debug::DsfbDebugEngine;

// Engine creation under paper-lock parameters.
let engine = DsfbDebugEngine::<32, 64>::paper_lock().unwrap();

// Per-signal evaluation (single signal).
let eval = engine.evaluate_signal(&residual_slice, num_windows);

// Full residual-matrix evaluation.
let mut evals    = vec![Default::default(); num_signals * num_windows];
let mut episodes = vec![Default::default(); 256];
let (count, metrics) = engine.run_evaluation(
    &data, num_signals, num_windows, &fault_labels,
    healthy_window_end, &mut evals, &mut episodes, "fixture_name",
).unwrap();

// Causality (graph attribution): augments episodes with root_cause_signal_index.
let (count, _) = engine.run_evaluation_with_graph(
    &data, num_signals, num_windows, &fault_labels,
    healthy_window_end, &service_graph, &mut evals, &mut episodes, "fixture",
).unwrap();

Multi-detector fusion

use dsfb_debug::fusion::{run_fusion_evaluation, FusionConfig};

let cfg = FusionConfig {
    min_consensus: 3,
    ..FusionConfig::ALL_DEFAULT  // enables all 205 detectors + 9-axis bank-aware fusion
};
let metrics = run_fusion_evaluation(
    &engine, &data, num_signals, num_windows, healthy_window_end,
    &fault_labels, &cfg, "fixture",
).unwrap();

Operator-side helpers

use dsfb_debug::{render_episode_summary, EpisodeCatalog};
use dsfb_debug::calibration::recommend_config_from_healthy;

// Render an episode for an operator dashboard.
let summary = render_episode_summary(&episode, &signal_names);

// Episode catalog: similarity matching against past closed episodes.
let mut catalog = EpisodeCatalog::<256>::new();
catalog.record(episode);
let nearest = catalog.find_similar(&query);

// Site calibration on a healthy slice.
let report = recommend_config_from_healthy(
    &healthy_data, num_signals, healthy_windows, /* percentile = */ 0.90);
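The code above does not show how recommend_config_from_healthy turns its percentile argument into a configuration. One plausible reading is to set the admissibility radius ρ to the chosen quantile of |r| over the healthy slice, and the sketch below assumes exactly that. The function name `percentile_rho` and the nearest-rank quantile method are HYPOTHETICAL choices, not the crate's calibration code.

```rust
/// HYPOTHETICAL calibration: admissibility radius rho = p-quantile of |r|
/// over a healthy slice (nearest-rank method). Non-finite samples are skipped.
pub fn percentile_rho(healthy: &[f64], p: f64) -> Option<f64> {
    if !(0.0..=1.0).contains(&p) {
        return None;
    }
    let mut mags: Vec<f64> = healthy.iter().map(|x| x.abs()).filter(|x| x.is_finite()).collect();
    if mags.is_empty() {
        return None;
    }
    mags.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: ceil(p * n), clamped to [1, n] so p = 0 picks the minimum.
    let n = mags.len();
    let rank = ((p * n as f64).ceil() as usize).clamp(1, n);
    Some(mags[rank - 1])
}
```

Under this reading, a 0.90 percentile lets roughly 10% of healthy samples sit outside the envelope. Raising the percentile trades sensitivity for fewer healthy-slice excursions.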

Heuristics Bank Coverage

The bank ships 32 hand-curated motifs across six tiers. Each motif carries its provenance (FrameworkDesign / DatasetObserved / FieldValidated), the name and DOI of the upstream dataset it was observed in, a dashboard hint for production debug engineers, and an IEEE 24765 / Avizienis-Laprie-Randell taxonomy anchor, so no motif has an ad-hoc name.

Tier	Source	Motifs	Provenance
1	Original framework	10	`FrameworkDesign`
2	TADBench fault cases	5	`DatasetObserved`
3	AIOps Challenge categories	6	`DatasetObserved`
4	MultiDim-Localization patterns	3	`DatasetObserved`
5	DeepTraLog log+trace fusion	3	`DatasetObserved`
6	Cross-cutting structural	5	`FrameworkDesign`
Total		32

MAX_MOTIFS = 64 leaves 32 free slots for v0.3 / v0.4 expansion. The bank exposes two lookup paths: per-signal lookup and per-episode match_episode_with_consensus (multi-feature weighted scoring with topology / duration / consensus gates). LO2 endoductive anomalies are deliberately NOT in the bank — they validate the SemanticDisposition::Unknown branch.
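A fixed-capacity bank with an explicit Unknown fallback fits the no_std, allocation-free core. The sketch below shows the idea. All type and field names are HYPOTHETICAL, and lookup is reduced to a reason-code match instead of the crate's (reason_code, drift, slew) lookup.

```rust
pub const MAX_MOTIFS: usize = 64;

/// Typed disposition: a named motif, or Unknown when nothing in the bank matches.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Disposition {
    Named(&'static str),
    Unknown,
}

#[derive(Clone, Copy, Debug)]
pub struct Entry {
    pub reason_code: u16,
    pub name: &'static str,
}

/// Fixed-size storage: no heap allocation, capacity known at compile time.
pub struct Bank {
    entries: [Option<Entry>; MAX_MOTIFS],
    len: usize,
}

impl Bank {
    pub const fn new() -> Self {
        Bank { entries: [None; MAX_MOTIFS], len: 0 }
    }

    /// Returns the entry back as Err when the bank is full.
    pub fn insert(&mut self, e: Entry) -> Result<(), Entry> {
        if self.len == MAX_MOTIFS {
            return Err(e);
        }
        self.entries[self.len] = Some(e);
        self.len += 1;
        Ok(())
    }

    pub fn free_slots(&self) -> usize {
        MAX_MOTIFS - self.len
    }

    /// Unmatched reason codes fall through to Unknown instead of a guessed motif.
    pub fn lookup(&self, reason_code: u16) -> Disposition {
        self.entries[..self.len]
            .iter()
            .flatten()
            .find(|e| e.reason_code == reason_code)
            .map_or(Disposition::Unknown, |e| Disposition::Named(e.name))
    }
}
```

Returning Unknown instead of the nearest motif is what makes an out-of-bank fixture such as LO2 usable as a validator for the Unknown branch.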

See docs/heuristics_bank.md for the per-motif reference.

Detector Orthogonality

The fusion harness wires 205 deterministic detectors organised across 27 mathematical axes (Tiers A–U + EXTRA + V/X/Y/Z/AA). Each axis contributes a distinct mathematical foundation; orthogonal combinations form the consensus arithmetic and route through per-motif tier-affinity masks per the Routed Evidence Principle (paper §6.5).

Tier	Family	Count	Examples	Mathematical Axis
A	Tier-A parametric trio	3	scalar, CUSUM, EWMA	fixed-baseline / cumulative / smoothed
B	Robust statistics	3	robust-Z, Page-Hinkley, Tukey-IQR	MAD-based / min-CUSUM / IQR fence
C	Model / non-parametric	5	spectral residual, matrix profile, BOCPD, IsoForest, LOF	spectral / sub-sequence / Bayesian / tree / density
D	Additional non-dep	5	Mann-Kendall, rolling-Z, AR(1), Mahalanobis, KS-rolling	trend / adaptive / autoregressive / multivariate / distribution
E	Debugging-specific stats	3	Poisson burst, saturation chain, chi-sq proportion	Poisson tail / N-consecutive / proportion shift
F	Neuroscience burst	4	MaxInterval, LogISI, Rank-Surprise, MISI	inter-event-interval translated detectors
Extra	Session 8.5	5	GLR, ADWIN, MEWMA, retry-storm, correlation-break	LR-test / Hoeffding / multivariate EWMA / domain
G	Concept drift streaming	9	Shiryaev-Roberts, DDM, EDDM, HDDM-A/W, STEPD, ECDD, KSWIN, FHDDM	error-rate sequential change-point
H	Distribution shift	10	Wasserstein, JS, KL, PSI, AD, CvM, Energy, MMD, Bhattacharyya, Hellinger	distributional distance
I	Robust / nonparametric	10	Median-abs-slope, Theil-Sen, Sen-CP, Mood, Brown-Forsythe, Levene, Sign, Runs, WW, Sequential-rank	rank-based / robust
J	Forecast residual	10	SES, Holt, Holt-Winters, AR(2), ARIMA, Kalman, SavGol, STL, Prophet, naive seasonal	forecast-residual envelope
K	Frequency / oscillation	10	FFT-band, Welch, Wavelet, autocorr, Lomb-Scargle, ZCR, dominant-freq, spectral-entropy, cepstral, phase-locking	spectral / temporal
L	Multivariate relationship	9	Hotelling T², MCUSUM, PCA-recon, Robust-PCA, corr-mat-dist, partial-corr, Laplacian, CCA, MI	covariance / subspace
M	Debugging-native	18	flap, sawtooth, deadband, quantization, plateau, counter-wrap, monotone-leak, hysteresis, limit-cycle, ping-pong, backpressure, causal-lag, fan-out, fan-in, phase-slip, jitter-bloom, tail-thickening, burst-after-silence	named bug-shapes
N	Offline CPD	8	PELT, BinSeg, BottomUp, Window, DP, Kernel, Piecewise-linear, Bayesian-offline	dynamic programming / segmentation
O	Rare changepoint	10	MOSUM, NOT, WBS2, Seeded-BS, SMUCE, FDRSeg, FPOP, TGUH, Inspect, Double-CUSUM-BS	multiscale / interval-randomized
P	Streaming / sequential	9	E-detector, conformal-martingale, exch-martingale, power-martingale, mixture-martingale, mixture-SPRT, scan-stat, higher-criticism, Berk-Jones	e-process / multi-test
Q	Concept drift rarer	10	MDDM-A/E/G, LFR, FPDD, OPTWIN, SeqDrift2, D3, QuantTree, NN-DVI	weighted Hoeffding / Fisher / discriminative
R	Robust depth	8	halfspace, projection, Stahel-Donoho, MCD, spatial-sign, S-est, depth-rank, median-polish	depth-based / direction-based
S	Count / event-process	3	Bayesian-blocks, Index-of-dispersion, Allan-variance	count-process / variance-time
T	Info-theoretic	6	MDL, NCD, Lempel-Ziv, transfer-entropy, Fisher-info, Renyi-entropy	description-length / surprise
U	Dynamical systems	8	permutation-entropy, sample-entropy, RQA, Lyapunov, correlation-dim, BDS, 0-1-chaos, delay-embedding-NN	nonlinear dynamics
V	Industrial fault-diagnosis (FDD)	~8	parity-space residual, observer-based, dependability-engineering	parity / observer
X	Climate homogeneity	~8	Buishand-range, SNHT, Pettitt-step, cumulative-deviation	cumulative-deviation tests
Y	Robust dispersion / rank	~8	median-of-means, U-statistic, rank-step	dispersion / rank-step
Z	Circular / directional	~8	R-bar, Rayleigh, circular-mean shift, phase-jump	phase / directional
AA	Higher-order nonlinear	~11	ARCH, kurtosis, second-moment volatility	non-Gaussian / heavy-tail
Total		~205	+ DSFB structural	27 orthogonal axes

Each detector is implemented as a deterministic pub fn in src/incumbent_baselines.rs, with its literature citation in the doc-comment. Where the literature requires FFT/DFT, MCMC, or non-deterministic randomization, the implementation uses a documented deterministic-seed reduction or time-domain proxy, and each detector's doc-comment names that simplification honestly. See docs/fusion_design.md for the per-tier notes and consensus contribution rules.
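As an illustration of the "deterministic pub fn with citation in the doc-comment" convention, here is a textbook one-sided tabular CUSUM in that shape. It reads `&[f64]`, writes into a caller-supplied flag buffer (no allocation), and uses no randomness. The signature and parameter names are assumptions, and the crate's Tier-A CUSUM may use a different variant (for example, one that resets after an alarm).

```rust
/// One-sided (upper) tabular CUSUM:
///   S_t = max(0, S_{t-1} + x_t - (mu0 + k)),  alarm when S_t > h.
///
/// Reference: E. S. Page, "Continuous Inspection Schemes", Biometrika 41 (1954).
/// Deterministic: no randomness and no allocation. Writes one flag per sample
/// (up to the shorter of `x` and `fires`) and returns the number of alarms.
/// This variant does not reset S after an alarm.
pub fn cusum_upper(x: &[f64], mu0: f64, k: f64, h: f64, fires: &mut [bool]) -> usize {
    let mut s = 0.0_f64;
    let mut count = 0;
    for (xi, flag) in x.iter().zip(fires.iter_mut()) {
        s = (s + xi - mu0 - k).max(0.0);
        *flag = s > h;
        if *flag {
            count += 1;
        }
    }
    count
}
```

The slack `k` absorbs small in-control fluctuations, and `h` sets the alarm threshold. A sustained shift of 3.0 against k = 0.5 accumulates 2.5 per step, so it crosses h = 4 on the second shifted sample.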

Real-World Dataset Coverage — 12 Real-Bytes Fixtures Across 9 Datasets

All Phase F + Phase G fixtures are vendored as real upstream byte slices (no synthetic data, ever; paper-lock hard-errors with MissingRealData when real data is missing). Every fixture is DOI-pinned, SHA-256-gated, and replayable under Theorem 9.

#	Dataset	Source	Distinguishing regime	Test file
1	TADBench / TrainTicket F-11	Zenodo `10.5281/zenodo.6979726`	order-service deployment-regression (35 604 spans)	`tests/eval_tadbench.rs`
2	TADBench / TrainTicket F-11b	same	auth-mongo fault — cross-fixture validation (108 spans)	same
3	TADBench / TrainTicket F-04	same	admin-service springstarter version-config (25 316 spans)	same
4	TADBench / TrainTicket F-19	same	mongodb-driver-3.0.4 version-config (19 281 spans)	same
5	Illinois SocialNetwork	DataBank `10.13012/B2IDB-6738796_V1`	unsampled DeathStarBench trace (160 000 traces)	`tests/eval_illinois.rs`
6	AIOps Challenge 2018 KPI	NetManAIOps/Bagel `sample_data.csv`	KPI seasonal anomaly (Su et al., IPCCC 2018)	`tests/eval_aiops_challenge.rs`
7	LO2 (PROMISE 2025)	Zenodo `10.5281/zenodo.14257989`	OAuth2 Go-runtime metrics; endoductive validator	`tests/eval_lo2.rs`
8	MultiDim-Localization	NetManAIOps/MultiDimension-Localization	5-dim categorical aggregate (high-dim service graph)	`tests/eval_multidim.rs`
9	DeepTraLog F-01	FudanSELab/DeepTraLog	combined log+trace ERROR span data (Zhang et al., ICSE 2022)	`tests/eval_deeptralog.rs`
10	Defects4J (Phase G)	rjust/defects4j	Java bug catalog — 6 projects × 30 bugs (Just et al., ISSTA 2014)	`tests/eval_defects4j.rs`
11	BugsInPy (Phase G)	soarsmu/BugsInPy	Python bug catalog — 6 projects × 30 bugs (Widyasari et al., FSE 2020)	`tests/eval_bugsinpy.rs`
12	PROMISE defect-prediction (Phase G)	ssea-lab/PROMISE	Per-module Chidamber-Kemerer OO metrics + bug counts (Menzies et al., 2003+)	`tests/eval_promise.rs`

Verbatim metrics from the latest cargo test --features "std paper-lock" -- --nocapture run:

Fixture	Episodes	Raw alerts	RSCR	Clean FP	Replay
TADBench F-11 (deployment-regression)	3	11	3.67×	0.0070	ok
TADBench F-11b (auth-mongo)	1	7	7.0×	0.2500	ok
TADBench F-04 (config regression)	0	0	0	0	ok
TADBench F-19 (mongodb-driver)	0	0	0	0	ok
Illinois SocialNetwork	1	24	24.0×	0.0313	ok
AIOps Challenge KPI	1	52	52.0×	0.0323	ok
LO2 OAuth2 (endoductive)	0	0	0	0	ok
MultiDim part1	1	19	19.0×	0.0833	ok
DeepTraLog F-01	0	0	0	0	ok
Defects4J	1	62	62.0×	0.0333	ok
BugsInPy	0	0	0	0	ok
PROMISE	1	4	4.0×	0.0333	ok
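Every populated row above is consistent with RSCR = raw alerts / episodes (for example 11 / 3 ≈ 3.67 for F-11). The table prints 0 when no episode is emitted. The sketch below states that reading explicitly. It returns None for zero episodes, where the ratio is undefined. The function name is HYPOTHETICAL.

```rust
/// RSCR read as raw alerts per typed episode; None when no episode was emitted.
pub fn rscr(raw_alerts: u32, episodes: u32) -> Option<f64> {
    if episodes == 0 {
        None
    } else {
        Some(raw_alerts as f64 / episodes as f64)
    }
}
```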

See data/MANIFEST.toml for per-dataset DOI, license, and SHA-256 gates; docs/dataset_provenance.md for the full provenance ledger; data/upstream/project_*.py for each fixture's deterministic projection script.

9-Axis Bank-Aware Fusion + Anti-Hallucination Ladder

The fusion harness is governed by a nine-axis configuration that progressively tightens the typed-disposition decision. Each axis is a single FusionConfig flag; each axis was introduced in a specific phase to close a specific failure mode (paper §11.x).

Axis #	Name	Phase	Role
1	Provenance gate	0	Filter motifs by `Provenance` rank
2	Margin gate	2	Demote to ambiguous when top-vs-runner-up margin below θ
3	Tier-affinity scoring	2.5	Per-motif `affinity_tiers` bitmask routes detector evidence
4	Zero-tier-firing filter	5.5	Reject motifs whose entire affinity mask saw no detector firings
5	Adaptive margin gate	5.5	Halve θ when `tier_consensus_factor > 0.5`
6	Confuser-boundary adjudication	5.6	Compute `margin_vs_confuser`; emit `ConfuserAmbiguous` below θ_c
7	Structural disambiguator boost	6	Bonus on per-motif `disambiguator_tier`
8	Tier-level primary witness	7	At least one detector in `primary_witness_tiers` must fire
9	Per-detector named witness	8	At least one of `primary_witness_detectors` must fire (vacuous-pass when zero captured)
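Axes 2 and 5 interact directly: the adaptive gate relaxes the margin threshold that the margin gate enforces. A minimal sketch over a slice of per-motif scores is shown below. The names `margin_gate` and `Typing` are HYPOTHETICAL, and θ is taken as a plain parameter.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Typing {
    Confirmed(usize), // index of the winning motif
    Ambiguous,
}

pub fn margin_gate(scores: &[f64], theta: f64, tier_consensus_factor: f64) -> Typing {
    // Axis 5 (adaptive margin): halve theta when tier consensus is strong.
    let theta = if tier_consensus_factor > 0.5 { theta / 2.0 } else { theta };
    let mut best_index = None;
    let mut best = f64::NEG_INFINITY;
    let mut runner_up = f64::NEG_INFINITY;
    for (i, &s) in scores.iter().enumerate() {
        if s > best {
            runner_up = best;
            best = s;
            best_index = Some(i);
        } else if s > runner_up {
            runner_up = s; // ties with the leader land here, giving a zero margin
        }
    }
    // Axis 2 (margin gate): a lead over the runner-up smaller than theta is demoted.
    match best_index {
        Some(i) if best - runner_up >= theta => Typing::Confirmed(i),
        _ => Typing::Ambiguous,
    }
}
```

A top-versus-runner-up margin of 0.08 is demoted under θ = 0.1. It is confirmed once a tier consensus factor above 0.5 halves θ to 0.05.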

The Anti-Hallucination Ladder turns axes 6, 8, and 9 into four progressive strictness rungs: Phase 0 (raw, none of these axes), Phase 5.6 (axis 6), Phase 7 (axis 8), and Phase 8 (axis 9). Operators choose the rung appropriate for their fault profile; stricter rungs trade recall for typing precision.
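One way to model the ladder as configuration is an ordered enum whose rungs switch on the gates of every rung below them. It is paired with the axis-9 witness check, including its vacuous pass. The sketch below ASSUMES the rungs are cumulative and reads "zero captured" as "the motif lists no named witness detectors". The enum, struct, and field names are HYPOTHETICAL, not the crate's FusionConfig fields.

```rust
/// Ladder rungs in increasing strictness (derive order = declaration order).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Rung {
    Raw,              // Phase 0
    ConfuserBoundary, // Phase 5.6, axis 6
    TierWitness,      // Phase 7, axis 8
    DetectorWitness,  // Phase 8, axis 9
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct LadderFlags {
    pub confuser_boundary: bool,
    pub tier_witness: bool,
    pub detector_witness: bool,
}

impl Rung {
    /// ASSUMPTION: each rung keeps every gate of the rungs below it.
    pub fn flags(self) -> LadderFlags {
        LadderFlags {
            confuser_boundary: self >= Rung::ConfuserBoundary,
            tier_witness: self >= Rung::TierWitness,
            detector_witness: self >= Rung::DetectorWitness,
        }
    }
}

/// Axis 9: at least one named witness detector must have fired; a motif with
/// no named witnesses passes vacuously.
pub fn detector_witness_passes(witnesses: &[usize], fired: &[usize]) -> bool {
    witnesses.is_empty() || witnesses.iter().any(|w| fired.contains(w))
}
```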

Fusion Sweep Results — F-11

Captured from the fusion_sweep_on_f11_fixture test (35 604 real Jaeger spans, 16 services, 431 windows). The per-window FP rate is computed against the healthy slice. Run cargo test --features "std paper-lock" --test fusion_compare -- --nocapture for the verbatim matrix.

Min consensus	Layer-2 episodes	Layer-2 FP rate	Layer-3 typed-confirmed	Determinism
N≥1	12	0.1230	3	ok
N≥3	7	0.0302	3	ok
N≥5	1	0.0093	1	ok
N≥7	1	0.0023	1	ok
N≥9	0	0.0000	0	ok
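The shape of this sweep, with FP rate falling monotonically as min_consensus rises, follows from thresholding the same consensus counts at increasing N. The sketch below makes two ASSUMPTIONS: the healthy slice is windows `0..healthy_window_end`, and the FP rate is fired healthy windows divided by healthy windows. The crate may instead count per (window, signal) cell. The function names are HYPOTHETICAL.

```rust
/// Fraction of healthy windows (0..healthy_window_end) in which the output fires.
pub fn healthy_fp_rate(window_fires: &[bool], healthy_window_end: usize) -> f64 {
    let end = healthy_window_end.min(window_fires.len());
    if end == 0 {
        return 0.0;
    }
    let fp = window_fires[..end].iter().filter(|&&f| f).count();
    fp as f64 / end as f64
}

/// Sweep min_consensus thresholds over per-window consensus counts.
pub fn fp_sweep(consensus: &[usize], healthy_window_end: usize, thresholds: &[usize]) -> Vec<(usize, f64)> {
    thresholds
        .iter()
        .map(|&n| {
            let fires: Vec<bool> = consensus.iter().map(|&c| c >= n).collect();
            (n, healthy_fp_rate(&fires, healthy_window_end))
        })
        .collect()
}
```

Because a window that fires at N also fires at every smaller threshold, the fired set can only shrink as N grows. The FP rate is therefore non-increasing across the sweep, as in the table.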

Single-detector minima for reference (tests/incumbent_compare.rs): scalar-3σ 0.0139, CUSUM 0.0116, EWMA 0.0812, DSFB-structural 0.0070. At N≥7, fusion FP rate (0.0023) is ~3× lower than DSFB-structural alone, ~5× lower than CUSUM, ~6× lower than scalar-3σ, ~35× lower than EWMA.
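The "× lower" figures are plain ratios of each single-detector FP rate to the fused N≥7 rate. The check below reproduces them from the quoted numbers. It returns None when the fused rate is zero (as at N≥9), where a ratio is undefined. The helper name is HYPOTHETICAL.

```rust
/// How many times lower the fused FP rate is than a single-detector rate.
pub fn fp_reduction(single: f64, fused: f64) -> Option<f64> {
    if fused > 0.0 {
        Some(single / fused)
    } else {
        None
    }
}
```

Using the figures above, 0.0070 / 0.0023 ≈ 3.04, 0.0116 / 0.0023 ≈ 5.04, 0.0139 / 0.0023 ≈ 6.04, and 0.0812 / 0.0023 ≈ 35.3. These round to the quoted ~3×, ~5×, ~6×, and ~35×.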

Honest claim: F-11 + F-11b numbers are bounded to those fixtures. Generalisation requires Phase II partner-data engagement.

Standards Alignment

NIST SP 800-53 (AU-2/3/6/12), NIST SP 800-92, NIST SP 800-171, DO-178C §6.3 (pathway-eligible foresight, not certification), IEEE 1012-2016, ISO/IEC 25010, OpenTelemetry, W3C Trace Context, SOC 2 Type II.

Control	Mapped Component	Audit Path
AU-2 (Audit Events)	structured `BenchmarkMetrics` JSON output	`tests/fusion_compare.rs` stdout
AU-3 (Record Content)	per-motif `HeuristicEntry` (provenance, DOI, taxonomy)	`src/heuristics_bank.rs`
AU-6 (Review/Analysis)	episode catalog + similarity matching	`src/episode_catalog.rs`
AU-12 (Audit Generation)	`verify_deterministic_replay` mechanism	`src/lib.rs`
CMMC AU.L2-3.3.1/2	content-of-records discipline	`docs/standards_alignment.md`
DO-178C §6.3	pathway-eligible architectural alignment	observer-only contract type-enforced

See docs/standards_alignment.md for control-level mapping.

Quick Verification

cargo build --no-default-features
cargo test  --no-default-features
cargo test  --features "std paper-lock" -- --nocapture
cargo clippy --features "std paper-lock" --all-targets -- -D warnings

License

Apache 2.0 (reference implementation). Background IP: Invariant Forge LLC. Commercial deployment requires a separate written license. Contact: licensing@invariantforge.net

dsfb-debug 0.1.0