# 🔍 deep_causality_discovery 🔍

## Introduction
deep_causality_discovery is a Rust crate that provides a Causal Discovery Language (CDL) for the DeepCausality
project. It offers a powerful, modular, and type-safe pipeline to move from raw observational data to actionable causal
insights. By abstracting complex statistical and algorithmic steps, it enables users to define and execute causal
discovery workflows with ease, ultimately informing the construction of causal models.
## Workflow
The core of the CDL is a builder that leverages Rust's typestate pattern: the pipeline's state is encoded in the type system, which guarantees at compile time that the stages execute in a valid sequence.
The workflow consists of the following sequential stages:
- Configuration (`CdlConfig`):
    - The entire pipeline is configured using the `CdlConfig` struct.
    - This struct uses a builder pattern (`with_*` methods) to set up configurations for each stage, such as data loading, feature selection, the discovery algorithm, and analysis thresholds.
- Initialization (`CDL<NoData>`):
    - The pipeline starts in the `NoData` state, created via `CDL::new()` or `CDL::with_config(config)`.
- Data Loading (`load_data`):
    - Transition: `NoData` -> `WithData`
    - Action: Loads data from a source (e.g., CSV, Parquet) into a `CausalTensor`.
    - Implementations: `CsvDataLoader`, `ParquetDataLoader`.
- Data Cleaning & Feature Selection (`feature_select`):
    - Transition: `WithData` -> `WithFeatures`
    - Action: A mandatory step that prepares the data and selects the most relevant features for analysis.
    - First, it internally applies `OptionNoneDataCleaner`, which converts the tensor into a `CausalTensor` of `Option` values, mapping missing or `NaN` entries to `None`. This is crucial for robust statistical analysis in the subsequent steps.
    - Then, it applies a feature selection algorithm to reduce dimensionality.
    - Implementation: `MrmrFeatureSelector` (Minimum Redundancy Maximum Relevance).
- Causal Discovery (`causal_discovery`):
    - Transition: `WithFeatures` -> `WithCausalResults`
    - Action: Executes the core causal discovery algorithm on the selected features.
    - Implementation: `SurdCausalDiscovery`, which uses the `surd_states_cdl` algorithm to decompose causal influences into Synergistic, Unique, and Redundant (SURD) components. The output is a `SurdResult`.
- Analysis (`analyze`):
    - Transition: `WithCausalResults` -> `WithAnalysis`
    - Action: Interprets the raw numerical output of the discovery algorithm into a human-readable analysis, using thresholds from `AnalyzeConfig` to classify the strength of causal influences.
    - Implementation: `SurdResultAnalyzer`, which generates a report with recommendations (e.g., "Strong unique influence... Recommended: Direct edge in CausaloidGraph").
- Finalization (`finalize`):
    - Transition: `WithAnalysis` -> `Finalized`
    - Action: Formats the analysis report into the final output string.
    - Implementation: `ConsoleFormatter`, which prepares the text for printing.
- Execution (`build` and `run`):
    - The `build()` method is called on a `Finalized` pipeline to create an executable `CDLRunner`.
    - The `run()` method on the `CDLRunner` executes the process and returns the final `ProcessFormattedResult`.
## Installation

Add `deep_causality_discovery` to your `Cargo.toml` file:
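For example (the version number below is illustrative; check crates.io for the latest release):

```toml
[dependencies]
deep_causality_discovery = "0.1"
```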
## Usage
Here's a basic example demonstrating how to use the CDL pipeline to discover causal relationships from a CSV file. It starts with the relevant imports:

```rust
use deep_causality_discovery::*;
use std::fs::File;
use std::io::Write;
```
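A sketch of how the full pipeline chain might look, with the stage methods named after the workflow above (illustrative only; the crate's actual signatures, loader arguments, and configuration options may differ):

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure each stage up front (loaders, feature selection,
    // discovery algorithm, and analysis thresholds).
    let config = CdlConfig::new();

    // Walk the typestate pipeline through its stages.
    let result = CDL::with_config(config)
        .load_data()?        // NoData -> WithData (e.g. via CsvDataLoader)
        .feature_select()?   // WithData -> WithFeatures (MRMR)
        .causal_discovery()? // WithFeatures -> WithCausalResults (SURD)
        .analyze()?          // WithCausalResults -> WithAnalysis
        .finalize()?         // WithAnalysis -> Finalized
        .build()?            // Finalized -> CDLRunner
        .run()?;             // executes and returns ProcessFormattedResult

    println!("{}", result);
    Ok(())
}
```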
## Error Handling
The crate employs a comprehensive error handling strategy, defining specific error types for each stage of the CDL
pipeline (e.g., DataError, FeatureSelectError, CausalDiscoveryError). This allows for precise identification and
handling of issues, ensuring robust and reliable causal discovery workflows.
## From Discovery to Model: Connecting CDL to DeepCausality
The deep_causality_discovery crate acts as a crucial bridge, transforming observational data into the foundational
elements for building executable causal models with the DeepCausality library. The insights gained from the SURD-states
algorithm directly inform the design of your CausaloidGraph and the internal logic of individual Causaloids:
- Structuring the `CausaloidGraph`: Strong unique influences suggest direct causal links (`Causaloid(Source) -> Causaloid(Target)`). Significant synergistic influences indicate that multiple sources are jointly required to cause an effect, guiding the creation of many-to-one connections.
- Defining `CausaloidLogic`: State-dependent maps from the SURD analysis provide precise conditional logic for a `Causaloid`'s `causal_fn`, allowing you to programmatically capture how causal influences vary with system states.
- Modeling Multi-Causal Interactions: The detection of synergistic, unique, and redundant influences directly informs the choice of `AggregateLogic` within `CausaloidCollections`. For instance, strong synergy might map to `AggregateLogic::All` (conjunction), while unique or redundant influences could suggest `AggregateLogic::Any` (disjunction).
## 👨💻👩💻 Contribution

Contributions are welcome, especially documentation, example code, and fixes. If unsure where to start, just open an issue and ask.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in deep_causality by you shall be licensed under the MIT licence, without any additional terms or conditions.
## 📜 Licence

This project is licensed under the MIT licence.

## 👮️ Security

For details about security, please read the security policy.

## 💻 Author

- Marvin Hansen
- GitHub GPG key ID: 369D5A0B210D39BC
- GPG Fingerprint: 4B18 F7B2 04B9 7A72 967E 663E 369D 5A0B 210D 39BC