//! Data quality assessment for ML pipelines
//!
//! Detects data quality issues including missing values, outliers,
//! duplicates, and schema problems.
//!
//! # 100-Point Quality Scoring System (GH-6)
//!
//! Based on the Toyota Way principles of Jidoka (built-in quality) and
//! the Doctest Corpus QA Checklist for Publication.
//!
//! ## Severity Weights
//! - **Critical (2.0x)**: Blocks publication - data integrity failures
//! - **High (1.5x)**: Major issues requiring immediate attention
//! - **Medium (1.0x)**: Standard issues to address before publication
//! - **Low (0.5x)**: Minor issues, informational
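//!
//! The sketch below illustrates one way the weights above could feed into a
//! 0-100 score. The `Severity` enum, `weight`, and `weighted_score` are
//! hypothetical names for illustration, and the formula (weighted share of
//! passed checks) is an assumption, not necessarily this crate's scoring.
//!
//! ```rust
//! /// Hypothetical severity levels carrying the multipliers listed above.
//! #[derive(Clone, Copy)]
//! enum Severity {
//!     Critical,
//!     High,
//!     Medium,
//!     Low,
//! }
//!
//! impl Severity {
//!     /// Multiplier for this severity level.
//!     fn weight(self) -> f64 {
//!         match self {
//!             Severity::Critical => 2.0,
//!             Severity::High => 1.5,
//!             Severity::Medium => 1.0,
//!             Severity::Low => 0.5,
//!         }
//!     }
//! }
//!
//! /// Assumed formula: weighted share of passed checks, scaled to 0-100.
//! /// An empty checklist is treated as a perfect score.
//! fn weighted_score(checks: &[(Severity, bool)]) -> f64 {
//!     let total: f64 = checks.iter().map(|(s, _)| s.weight()).sum();
//!     if total == 0.0 {
//!         return 100.0;
//!     }
//!     let passed: f64 = checks
//!         .iter()
//!         .filter(|(_, ok)| *ok)
//!         .map(|(s, _)| s.weight())
//!         .sum();
//!     100.0 * passed / total
//! }
//!
//! fn main() {
//!     // One failed Medium check out of four checks of mixed severity.
//!     let checks = [
//!         (Severity::Critical, true),
//!         (Severity::High, true),
//!         (Severity::Medium, false),
//!         (Severity::Low, true),
//!     ];
//!     // Passed weight 4.0 out of total weight 5.0.
//!     assert!((weighted_score(&checks) - 80.0).abs() < 1e-9);
//! }
//! ```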
//!
//! ## Letter Grades
//! - **A (95-100)**: Publish immediately
//! - **B (85-94)**: Publish with documented caveats
//! - **C (70-84)**: Remediation required before publication
//! - **D (50-69)**: Major rework needed
//! - **F (<50)**: Do not publish
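//!
//! As a minimal sketch, the bands above can be expressed as a threshold
//! lookup. `letter_grade` is a hypothetical helper, and treating each band's
//! lower bound as inclusive for fractional scores (for example 94.5 maps to
//! B) is an assumption.
//!
//! ```rust
//! /// Maps a 0-100 score to a letter grade using the bands listed above.
//! fn letter_grade(score: f64) -> char {
//!     match score {
//!         s if s >= 95.0 => 'A', // Publish immediately
//!         s if s >= 85.0 => 'B', // Publish with documented caveats
//!         s if s >= 70.0 => 'C', // Remediation required
//!         s if s >= 50.0 => 'D', // Major rework needed
//!         _ => 'F',              // Do not publish
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(letter_grade(97.0), 'A');
//!     assert_eq!(letter_grade(94.5), 'B');
//!     assert_eq!(letter_grade(49.9), 'F');
//! }
//! ```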
//!
//! # Example
//!
//! ```ignore
//! use alimentar::quality::{QualityChecker, QualityScore};
//!
//! let checker = QualityChecker::new()
//!     .max_null_ratio(0.1)
//!     .max_duplicate_ratio(0.05);
//!
//! let report = checker.check(&dataset)?;
//! let score = QualityScore::from_report(&report);
//! println!("Grade: {} ({})", score.grade, score.score);
//! ```
//!
//! # References
//! - [1] Batini & Scannapieco (2016). Data and Information Quality.
//! - [6] Hynes et al. (2017). The Data Linter. NIPS Workshop on ML Systems.
// Statistical computation and internal methods
// Re-export scoring types
// Re-export check types
pub use ;
// Re-export decontamination types
pub use ;
// Re-export profile types
pub use QualityProfile;
pub use ;