deep_causality_discovery
Introduction
deep_causality_discovery is a Rust crate that provides a Causal Discovery Language (CDL) for the DeepCausality
project. It offers a modular, type-safe pipeline to move from raw observational data to actionable causal insights. By
abstracting the statistical and algorithmic steps, it lets you define and run causal discovery workflows that ultimately
inform the construction of causal models.
Algorithms
CDL hosts two discovery algorithms as peer pipelines:
- SURD (Synergistic, Unique, Redundant Decomposition): an information-theoretic decomposition of how a set of source variables drives a target, computed from a single dataset.
- BRCD (Bayesian Root-Cause Discovery): ranks the variables whose conditional mechanism changed between a normal and an anomalous regime, given a causal graph over the variables. The graph can be supplied as a CPDAG, or learned from the normal data via BOSS when none is given.
Workflow
The CDL is a builder over Rust's typestate pattern: the pipeline's state is encoded in the type system, so the compiler guarantees the stages run in a valid order. The two algorithms are compile-time-isolated sub-pipelines that converge on a shared analyze/finalize tail. Calling a BRCD stage on a SURD pipeline (or the reverse) does not compile.
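The compile-time isolation described above can be illustrated with a minimal, self-contained typestate sketch. Every name here (`Pipeline`, `Surd`, `Brcd`, `Loaded`, `Discovered`, `start_surd`, `start_brcd`) is a hypothetical stand-in chosen for illustration, not the crate's actual types; only the stage names in the log strings are borrowed from this README.

```rust
use std::marker::PhantomData;

// Hypothetical algorithm markers (assumption: the real crate's types differ).
struct Surd;
struct Brcd;

// Hypothetical stage markers.
struct Loaded;
struct Discovered;

// A pipeline tagged with its algorithm and its current stage.
struct Pipeline<Algo, Stage> {
    log: Vec<&'static str>,
    _marker: PhantomData<(Algo, Stage)>,
}

impl<Algo, Stage> Pipeline<Algo, Stage> {
    // Record a step and move to the next stage, keeping the algorithm tag.
    fn advance<Next>(mut self, step: &'static str) -> Pipeline<Algo, Next> {
        self.log.push(step);
        Pipeline { log: self.log, _marker: PhantomData }
    }
}

fn start_surd() -> Pipeline<Surd, Loaded> {
    Pipeline { log: vec!["surd_load_input"], _marker: PhantomData }
}

fn start_brcd() -> Pipeline<Brcd, Loaded> {
    Pipeline { log: vec!["brcd_load_input"], _marker: PhantomData }
}

// Discovery stages exist only for the matching algorithm tag.
impl Pipeline<Surd, Loaded> {
    fn surd_discover(self) -> Pipeline<Surd, Discovered> {
        self.advance("surd_discover")
    }
}

impl Pipeline<Brcd, Loaded> {
    fn brcd_discover(self) -> Pipeline<Brcd, Discovered> {
        self.advance("brcd_discover")
    }
}

// Shared tail: available to either algorithm once discovery has run.
impl<Algo> Pipeline<Algo, Discovered> {
    fn finalize(self) -> Vec<&'static str> {
        let mut log = self.log;
        log.push("finalize");
        log
    }
}

// `start_surd().brcd_discover()` is rejected by the compiler, because
// `brcd_discover` is only defined for `Pipeline<Brcd, Loaded>`.
```

In this sketch, `start_surd().surd_discover().finalize()` type-checks, while calling `finalize()` before a discovery stage, or a BRCD stage on the SURD pipeline, fails with a "no method found" compile error.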
1. Build the run config (the single source of truth)
CdlConfigBuilder is a staged typestate builder. Required fields are enforced at compile time (build() only exists
once they are all set), and build() additionally verifies that the referenced files exist:
- CdlConfigBuilder::build_surd_config::<T>() → SurdLoaderConfig<T>: the dataset path, target index, MRMR feature count, max interaction order, and analysis thresholds (optional: exclude indices, CSV options).
- CdlConfigBuilder::build_brcd_config() → BrcdLoaderConfig<T>: the normal-dataset path, anomalous-dataset path, and the reused algorithm BrcdConfig<T> (optional: CPDAG path, CSV options). No CPDAG path means the structure is learned via BOSS.
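To make the "build() only exists once all required fields are set" idea concrete, here is a minimal staged-builder sketch with two required fields and a file-existence check. The names `ConfigBuilder`, `LoaderConfig`, `ConfigError`, and the setters `path` and `target_index` are assumptions made for illustration; the real `CdlConfigBuilder` setters and error types may look different.

```rust
use std::path::{Path, PathBuf};

// Stage markers for a hypothetical config with two required fields.
struct NeedsPath;
struct NeedsTarget { path: PathBuf }
struct Ready { path: PathBuf, target_index: usize }

// Hypothetical staged builder; the state type tracks which fields are set.
struct ConfigBuilder<State> { state: State }

#[derive(Debug, PartialEq)]
struct LoaderConfig { path: PathBuf, target_index: usize }

#[derive(Debug, PartialEq)]
enum ConfigError { FileNotFound(PathBuf) }

impl ConfigBuilder<NeedsPath> {
    fn new() -> Self {
        ConfigBuilder { state: NeedsPath }
    }

    // Setting the path unlocks the next stage.
    fn path(self, path: impl AsRef<Path>) -> ConfigBuilder<NeedsTarget> {
        ConfigBuilder { state: NeedsTarget { path: path.as_ref().to_path_buf() } }
    }
}

impl ConfigBuilder<NeedsTarget> {
    fn target_index(self, target_index: usize) -> ConfigBuilder<Ready> {
        ConfigBuilder { state: Ready { path: self.state.path, target_index } }
    }
}

// `build` is defined only on the final stage, so a builder with a missing
// required field has no `build` method to call.
impl ConfigBuilder<Ready> {
    fn build(self) -> Result<LoaderConfig, ConfigError> {
        let Ready { path, target_index } = self.state;
        // Runtime check: the referenced file has to exist.
        if !path.is_file() {
            return Err(ConfigError::FileNotFound(path));
        }
        Ok(LoaderConfig { path, target_index })
    }
}
```

Missing fields are caught by the compiler, while a missing file is only detectable at run time, which is why `build` in this sketch still returns a `Result`.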
2. Run a sub-pipeline
CdlBuilder::build_surd(&cfg) / CdlBuilder::build_brcd(&cfg) seed the pipeline with the config. Every stage reads its
parameters from the config, so the chain itself is parameterless:
- SURD: surd_load_input → clean_data → feature_select → surd_discover → surd_analyze → finalize
- BRCD: brcd_load_input → brcd_discover → brcd_analyze → finalize
Each stage is a method on the pipeline effect, so the chain reads top to bottom with no per-line wrapper. The CdlEffect
monad short-circuits on the first error and threads warnings through; print_results() renders the final CdlReport
(or the error). The discovery result is carried as a CdlDiscoveryOutcome (Surd or Brcd) and the report's Display
renders the matching section.
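The short-circuit-and-thread-warnings behaviour can be sketched with a small stand-in effect type. `Effect`, `pure`, `and_then`, and the toy stages `load`, `clean`, and `discover` are hypothetical names for illustration only; they are not the crate's `CdlEffect` API, and the row-count logic is invented purely to trigger a warning and an error.

```rust
// Hypothetical stand-in: a value or an error, plus accumulated warnings.
#[derive(Debug, PartialEq)]
struct Effect<T> {
    outcome: Result<T, String>,
    warnings: Vec<String>,
}

impl<T> Effect<T> {
    fn pure(value: T) -> Self {
        Effect { outcome: Ok(value), warnings: Vec::new() }
    }

    // Run the next stage only if no error has occurred so far.
    // Warnings collected earlier are kept in both cases.
    fn and_then<U>(self, stage: impl FnOnce(T) -> Effect<U>) -> Effect<U> {
        match self.outcome {
            Err(e) => Effect { outcome: Err(e), warnings: self.warnings },
            Ok(value) => {
                let mut next = stage(value);
                let mut warnings = self.warnings;
                warnings.append(&mut next.warnings);
                Effect { outcome: next.outcome, warnings }
            }
        }
    }
}

// Toy stages operating on a row count.
fn load(rows: usize) -> Effect<usize> {
    Effect::pure(rows)
}

fn clean(rows: usize) -> Effect<usize> {
    // Pretend one row had missing values and was dropped.
    Effect {
        outcome: Ok(rows.saturating_sub(1)),
        warnings: vec!["dropped 1 row with missing values".to_string()],
    }
}

fn discover(rows: usize) -> Effect<String> {
    if rows < 3 {
        Effect { outcome: Err(format!("too few rows: {rows}")), warnings: Vec::new() }
    } else {
        Effect::pure(format!("analysed {rows} rows"))
    }
}

fn run(rows: usize) -> Effect<String> {
    load(rows)
        .and_then(clean)
        .and_then(discover)
        // Skipped entirely when `discover` fails.
        .and_then(|summary| Effect::pure(summary.to_uppercase()))
}
```

With ten input rows, `run` finishes with one warning attached; with three, `discover` fails, the final stage never runs, and the warning from `clean` is still available alongside the error.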
Installation
Add deep_causality_discovery to your Cargo.toml file, for example by running cargo add deep_causality_discovery from your project root.
Usage
SURD: information-theoretic decomposition
use deep_causality_discovery::*;
BRCD: root-cause ranking from two regimes
use deep_causality_discovery::*;
The CPDAG file is the typed-endpoint CSV format load_cpdag_csv / save_cpdag_csv read and write: a # … vertices=N
header followed by src,dst,mark_src,mark_dst rows, where each mark is Tail, Arrow, or Circle (Tail,Arrow is a
directed arc, Tail,Tail an undirected edge).
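To illustrate the file layout just described, the following is a minimal reader for that shape of CSV. It is not the crate's `load_cpdag_csv`. It assumes that `src` and `dst` are zero-based vertex indices, that there is no column-name row, and that `vertices=N` appears as a whitespace-separated token in a `#` line; `Mark`, `Edge`, `parse_cpdag`, and `is_directed` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mark { Tail, Arrow, Circle }

#[derive(Debug, PartialEq)]
struct Edge { src: usize, dst: usize, mark_src: Mark, mark_dst: Mark }

fn parse_mark(s: &str) -> Result<Mark, String> {
    match s.trim() {
        "Tail" => Ok(Mark::Tail),
        "Arrow" => Ok(Mark::Arrow),
        "Circle" => Ok(Mark::Circle),
        other => Err(format!("unknown mark: {other}")),
    }
}

// Parse the text into (vertex count, edges).
fn parse_cpdag(text: &str) -> Result<(usize, Vec<Edge>), String> {
    let mut vertices = None;
    let mut edges = Vec::new();
    for line in text.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if let Some(header) = line.strip_prefix('#') {
            // Look for a `vertices=N` token anywhere in the header line.
            for token in header.split_whitespace() {
                if let Some(n) = token.strip_prefix("vertices=") {
                    vertices = Some(n.parse::<usize>().map_err(|e| e.to_string())?);
                }
            }
            continue;
        }
        let cols: Vec<&str> = line.split(',').collect();
        if cols.len() != 4 {
            return Err(format!("expected 4 columns, got {}: {line}", cols.len()));
        }
        edges.push(Edge {
            src: cols[0].trim().parse::<usize>().map_err(|_| format!("bad src: {line}"))?,
            dst: cols[1].trim().parse::<usize>().map_err(|_| format!("bad dst: {line}"))?,
            mark_src: parse_mark(cols[2])?,
            mark_dst: parse_mark(cols[3])?,
        });
    }
    let n = vertices.ok_or("missing `vertices=N` header")?;
    // Every endpoint has to refer to an existing vertex.
    if let Some(bad) = edges.iter().find(|e| e.src >= n || e.dst >= n) {
        return Err(format!("edge {}->{} exceeds vertices={n}", bad.src, bad.dst));
    }
    Ok((n, edges))
}

// Tail,Arrow is a directed arc; Tail,Tail is an undirected edge.
fn is_directed(edge: &Edge) -> bool {
    edge.mark_src == Mark::Tail && edge.mark_dst == Mark::Arrow
}
```

Under these assumptions, a three-vertex file with rows `0,1,Tail,Arrow` and `1,2,Tail,Tail` yields one directed arc and one undirected edge.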
Error Handling
The crate defines a specific error type for each stage of the pipeline (for example DataLoadingError,
FeatureSelectError, CausalDiscoveryError, CpdagError, BrcdLoadError), all funneled into CdlError. This allows
precise identification and handling of issues, and the CdlEffect monad short-circuits on the first error.
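The funneling of per-stage errors into one top-level error can be sketched with `From` conversions and the `?` operator. The types below are local stand-ins that only borrow the names `DataLoadingError`, `FeatureSelectError`, and `CdlError` from the text above; their fields, variants, messages, and the toy `load`, `select`, and `run` functions are assumptions, not the crate's definitions.

```rust
use std::fmt;

// Local stand-ins for two per-stage errors (field layout is assumed).
#[derive(Debug, PartialEq)]
struct DataLoadingError(String);

#[derive(Debug, PartialEq)]
struct FeatureSelectError(String);

// Local stand-in for the top-level error that wraps each stage error.
#[derive(Debug, PartialEq)]
enum CdlError {
    DataLoading(DataLoadingError),
    FeatureSelect(FeatureSelectError),
}

impl From<DataLoadingError> for CdlError {
    fn from(e: DataLoadingError) -> Self {
        CdlError::DataLoading(e)
    }
}

impl From<FeatureSelectError> for CdlError {
    fn from(e: FeatureSelectError) -> Self {
        CdlError::FeatureSelect(e)
    }
}

impl fmt::Display for CdlError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CdlError::DataLoading(e) => write!(f, "data loading failed: {}", e.0),
            CdlError::FeatureSelect(e) => write!(f, "feature selection failed: {}", e.0),
        }
    }
}

// Toy stages, each with its own error type.
fn load(path: &str) -> Result<Vec<f64>, DataLoadingError> {
    if path.is_empty() {
        Err(DataLoadingError("empty path".to_string()))
    } else {
        Ok(vec![1.0, 2.0, 3.0])
    }
}

fn select(data: &[f64], k: usize) -> Result<Vec<f64>, FeatureSelectError> {
    if k > data.len() {
        Err(FeatureSelectError(format!("requested {k} features, have {}", data.len())))
    } else {
        Ok(data[..k].to_vec())
    }
}

// `?` converts each stage error into `CdlError` and returns on the first failure.
fn run(path: &str, k: usize) -> Result<Vec<f64>, CdlError> {
    let data = load(path)?;
    let picked = select(&data, k)?;
    Ok(picked)
}
```

A caller can then match on the `CdlError` variant to tell which stage failed, while the happy path stays free of explicit error plumbing.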
From Discovery to Model: Connecting CDL to DeepCausality
The deep_causality_discovery crate acts as a bridge, transforming observational data into the foundational elements for
building executable causal models with the DeepCausality library.
- SURD → CausaloidGraph structure and logic. Strong unique influences suggest direct causal links (Causaloid(Source) -> Causaloid(Target)). Synergistic influences indicate that multiple sources are jointly required to cause an effect, guiding many-to-one connections and the choice of AggregateLogic within a CausaloidCollection (strong synergy → AggregateLogic::All; unique/redundant → AggregateLogic::Any). State-dependent maps from the SURD analysis provide conditional logic for a Causaloid's causal_fn.
- BRCD → fault localization. Given a normal and an anomalous window over a known service/dependency graph, BRCD ranks which node's mechanism changed, pointing the operator at the root cause of an incident rather than the collateral.
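The SURD-to-AggregateLogic rule of thumb above can be written down as a small decision function. This is only a sketch: `SurdShares`, `suggest_logic`, and the `synergy_threshold` parameter are hypothetical, the local `AggregateLogic` enum mirrors just the two variants named in the text, and what counts as "strong" synergy is a modelling choice left to the user.

```rust
// Local stand-in with only the two variants mentioned above.
#[derive(Debug, PartialEq)]
enum AggregateLogic {
    All,
    Any,
}

// Hypothetical summary of a SURD result for one target: the share of
// information attributed to each kind of contribution.
struct SurdShares {
    synergistic: f64,
    unique: f64,
    redundant: f64,
}

// Suggest `All` when synergy is both above the caller's threshold and the
// dominant contribution; otherwise fall back to `Any`.
fn suggest_logic(shares: &SurdShares, synergy_threshold: f64) -> AggregateLogic {
    let strongest_other = shares.unique.max(shares.redundant);
    if shares.synergistic >= synergy_threshold && shares.synergistic > strongest_other {
        AggregateLogic::All
    } else {
        AggregateLogic::Any
    }
}
```

For example, shares of 0.6 synergistic, 0.1 unique, and 0.1 redundant with a threshold of 0.3 would suggest `All`, whereas a target dominated by unique information would suggest `Any`.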
👨💻👩💻 Contribution
Contributions are welcome, especially those related to documentation, example code, and fixes. If you are unsure where to start, just open an issue and ask.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in deep_causality by you shall be licensed under the MIT licence, without any additional terms or conditions.
📜 Licence
This project is licensed under the MIT license.
👮️ Security
For details about security, please read the security policy.
💻 Author
- Marvin Hansen.
- Github GPG key ID: 369D5A0B210D39BC
- GPG Fingerprint: 4B18 F7B2 04B9 7A72 967E 663E 369D 5A0B 210D 39BC