Module statistical

Statistical quality evaluation module.

Provides statistical tests and analyses for validating that generated synthetic data follows expected distributions.

Modules

  • amount_distribution: Log-normal amount distribution analysis
  • benford: Benford’s Law compliance testing
  • line_item: Line item distribution analysis
  • temporal: Temporal pattern analysis
  • correlation: Cross-field correlation analysis
  • anderson_darling: Anderson-Darling goodness-of-fit test
  • chi_squared: Chi-squared goodness-of-fit test
  • drift_detection: Drift detection evaluation and ground truth validation
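
As a rough illustration of what a log-normal amount check involves, the sketch below estimates the log-normal parameters of a sample by maximum likelihood (mean and standard deviation of the log-amounts). The page does not say how `amount_distribution` estimates its parameters, so `fit_log_normal` is a hypothetical helper, not part of this crate.

```rust
/// Maximum-likelihood estimates `(mu, sigma)` of a log-normal distribution.
/// Hypothetical helper for illustration only; not this crate's API.
/// Returns `None` for an empty sample or any non-positive / non-finite amount.
fn fit_log_normal(amounts: &[f64]) -> Option<(f64, f64)> {
    if amounts.is_empty() || amounts.iter().any(|&a| !(a.is_finite() && a > 0.0)) {
        return None;
    }
    let n = amounts.len() as f64;
    // A log-normal sample is normal after taking natural logs.
    let logs: Vec<f64> = amounts.iter().map(|a| a.ln()).collect();
    let mu = logs.iter().sum::<f64>() / n;
    // MLE uses the population variance (divide by n, not n - 1).
    let var = logs.iter().map(|l| (l - mu).powi(2)).sum::<f64>() / n;
    Some((mu, var.sqrt()))
}
```

The fitted `(mu, sigma)` can then be compared against the parameters the generator was configured with; how close they must be is a choice for the caller.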

Structs

AmountDistributionAnalysis
Results of amount distribution analysis.
AmountDistributionAnalyzer
Analyzer for amount distributions.
AndersonDarlingAnalysis
Anderson-Darling test results.
AndersonDarlingAnalyzer
Analyzer for Anderson-Darling goodness-of-fit tests.
AnomalyData
Anomaly data for realism validation.
AnomalyRealismEvaluation
Results of anomaly realism evaluation.
AnomalyRealismEvaluator
Evaluator for anomaly injection realism.
AnomalyRealismThresholds
Thresholds for anomaly realism.
BenfordAnalysis
Results of Benford’s Law analysis.
BenfordAnalyzer
Analyzer for Benford’s Law compliance.
BinFrequency
Bin frequency information.
ChiSquaredAnalysis
Chi-squared test results.
ChiSquaredAnalyzer
Analyzer for chi-squared goodness-of-fit tests.
CorrelationAnalysis
Full correlation matrix analysis results.
CorrelationAnalyzer
Analyzer for correlation analysis.
CorrelationCheckResult
Result of correlation check for a pair of fields.
CriticalValues
Critical values for Anderson-Darling test at standard significance levels.
DriftDetectionAnalysis
Results from drift detection analysis.
DriftDetectionAnalyzer
Analyzer for drift detection evaluation.
DriftDetectionEntry
A single data point for drift detection analysis.
DriftDetectionMetrics
Drift detection performance metrics.
ExpectedCorrelation
Expected correlation between two fields.
FlowEdge
A directed account-flow edge (money flows src → dst) with a magnitude weight.
LabeledDriftEvent
A labeled drift event from ground truth data.
LabeledEventAnalysis
Analysis of labeled drift events.
LineItemAnalysis
Results of line item distribution analysis.
LineItemAnalyzer
Analyzer for line item distributions.
LineItemEntry
Input for line item analysis.
RelationalFidelityAnalyzer
Computes the RelationalFidelityReport.
RelationalFidelityReport
Relational-structure summary of an account-flow graph.
RelationalFidelityThresholds
Optional reference bands. A metric that falls below its band is flagged “too clean”; if every band is None, nothing is flagged.
SecondDigitAnalysis
Results of second-digit Benford’s Law analysis.
StatisticalEvaluation
Combined statistical evaluation results.
TemporalAnalysis
Results of temporal pattern analysis.
TemporalAnalyzer
Analyzer for temporal patterns.
TemporalEntry
Input for temporal analysis.
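
The FlowEdge entry above, together with the rule documented for flow_edges_from_entries (one credit → debit edge per entry, taken from the first credit and first debit line and weighted by the entry's total debit), can be sketched as follows. The `Line`, `Entry`, and `Edge` types and the `dominant_edges` function are hypothetical stand-ins; the crate's real journal-entry types and field names are not shown on this page.

```rust
/// Hypothetical journal line: one of `debit` / `credit` is expected to be non-zero.
struct Line {
    account: String,
    debit: f64,
    credit: f64,
}

/// Hypothetical journal entry.
struct Entry {
    lines: Vec<Line>,
}

/// Hypothetical stand-in for a directed flow edge with a magnitude weight.
#[derive(Debug, PartialEq)]
struct Edge {
    src: String,
    dst: String,
    weight: f64,
}

/// One credit -> debit edge per entry; entries missing either side are skipped.
fn dominant_edges(entries: &[Entry]) -> Vec<Edge> {
    entries
        .iter()
        .filter_map(|e| {
            // First credit line is the source, first debit line the destination.
            let credit = e.lines.iter().find(|l| l.credit > 0.0)?;
            let debit = e.lines.iter().find(|l| l.debit > 0.0)?;
            // Weight is the total debit of the whole entry.
            let weight: f64 = e.lines.iter().map(|l| l.debit).sum();
            Some(Edge {
                src: credit.account.clone(),
                dst: debit.account.clone(),
                weight,
            })
        })
        .collect()
}
```

Keeping only the first credit and first debit line collapses a multi-line entry into a single edge, which keeps the graph small at the cost of ignoring secondary accounts.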

Enums

BenfordConformity
Conformity level based on Mean Absolute Deviation (MAD).
BinningStrategy
Binning strategy for continuous data.
DetectionDifficulty
Detection difficulty levels.
DriftEventCategory
Categories of drift events.
ExpectedDistribution
Expected distribution type for comparison.
FittedParameters
Fitted distribution parameters.
TargetDistribution
Target distribution types for Anderson-Darling test.
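
To make the MAD-based conformity level concrete, here is a minimal sketch that computes the first-digit Mean Absolute Deviation against Benford's expected frequencies, log10(1 + 1/d), and maps it to a level. The cut-offs 0.006 / 0.012 / 0.015 are the commonly cited Nigrini first-digit thresholds and are an assumption here; the variants and thresholds of the crate's BenfordConformity are not shown on this page, and every name below is hypothetical.

```rust
/// Expected first-digit probability under Benford's Law.
fn benford_expected(d: u32) -> f64 {
    (1.0 + 1.0 / d as f64).log10()
}

/// Leading significant digit of a positive, finite amount.
fn first_digit(x: f64) -> Option<u32> {
    if !x.is_finite() || x <= 0.0 {
        return None;
    }
    // Scientific notation always starts with the leading significant digit.
    format!("{:e}", x)
        .chars()
        .next()?
        .to_digit(10)
        .filter(|d| (1..=9).contains(d))
}

/// Mean absolute deviation between observed and expected first-digit shares.
fn first_digit_mad(amounts: &[f64]) -> Option<f64> {
    let mut counts = [0usize; 9];
    let mut n = 0usize;
    for d in amounts.iter().filter_map(|&a| first_digit(a)) {
        counts[(d - 1) as usize] += 1;
        n += 1;
    }
    if n == 0 {
        return None;
    }
    let total: f64 = (1..=9u32)
        .map(|d| (counts[(d - 1) as usize] as f64 / n as f64 - benford_expected(d)).abs())
        .sum();
    Some(total / 9.0)
}

/// Hypothetical conformity levels (not the crate's enum).
#[derive(Debug, PartialEq)]
enum Conformity {
    Close,
    Acceptable,
    Marginal,
    NonConforming,
}

/// Assumed Nigrini first-digit cut-offs: 0.006, 0.012, 0.015.
fn classify(mad: f64) -> Conformity {
    if mad <= 0.006 {
        Conformity::Close
    } else if mad <= 0.012 {
        Conformity::Acceptable
    } else if mad <= 0.015 {
        Conformity::Marginal
    } else {
        Conformity::NonConforming
    }
}
```

Non-positive and non-finite amounts are silently dropped in this sketch; a real analyzer may prefer to report how many values were excluded.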

Functions

flow_edges_from_entries
Build dominant account-flow edges from journal entries: one directed credit → debit edge per entry (first credit and first debit line), weighted by the entry’s total debit. Entries without both a debit and a credit line are skipped.
pearson_correlation
Calculate Pearson correlation coefficient.
spearman_correlation
Calculate Spearman rank correlation coefficient.
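
For reference, the two coefficients can be sketched independently of this crate as shown below. The signatures of pearson_correlation and spearman_correlation are not given on this page, so `pearson`, `ranks`, and `spearman` are hypothetical names, and the `Option` return for degenerate input is an assumption of the sketch. Spearman is computed as Pearson on average ranks, which handles ties.

```rust
/// Pearson correlation; `None` for mismatched lengths, fewer than two
/// points, or zero variance in either series.
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mx, b - my);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    let denom = (sxx * syy).sqrt();
    if denom == 0.0 {
        None
    } else {
        Some(sxy / denom)
    }
}

/// 1-based ranks; tied values share the mean of their positions.
fn ranks(v: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..v.len()).collect();
    idx.sort_by(|&i, &j| v[i].total_cmp(&v[j]));
    let mut r = vec![0.0; v.len()];
    let mut start = 0;
    while start < idx.len() {
        // Extend `end` over the run of values equal to v[idx[start]].
        let mut end = start;
        while end + 1 < idx.len() && v[idx[end + 1]] == v[idx[start]] {
            end += 1;
        }
        let avg = (start + end) as f64 / 2.0 + 1.0;
        for &i in &idx[start..=end] {
            r[i] = avg;
        }
        start = end + 1;
    }
    r
}

/// Spearman rank correlation: Pearson applied to the average ranks.
fn spearman(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() {
        return None;
    }
    pearson(&ranks(x), &ranks(y))
}
```

For a monotone but non-linear relationship such as y = x², `spearman` returns 1 while `pearson` stays below 1, which is the usual reason to report both.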