pub struct QualityProfile {
    pub name: String,
    pub description: String,
    pub expected_constant_columns: HashSet<String>,
    pub nullable_columns: HashSet<String>,
    pub max_null_ratio: f64,
    pub max_duplicate_ratio: f64,
    pub min_cardinality: usize,
    pub max_outlier_ratio: f64,
    pub max_duplicate_row_ratio: f64,
    pub penalize_unexpected_constants: bool,
    pub require_signature: bool,
}
Quality profile for customizing scoring rules per data type.
Different data types (doctest corpora, ML training sets, time series, etc.) have different expectations. For example:
- Doctest corpus: the source and version columns are expected to be constant
- ML training: features should have high variance; labels can be categorical
- Time series: timestamps should be unique and sequential
Example
let profile = QualityProfile::doctest_corpus();
let score = profile.score_report(&report);
Fields
name: String
Profile name for display.
description: String
Description of what this profile is for.
expected_constant_columns: HashSet<String>
Columns that are expected to be constant (not penalized).
nullable_columns: HashSet<String>
Columns where a high null ratio is acceptable.
max_null_ratio: f64
Maximum acceptable null ratio (default: 0.1).
max_duplicate_ratio: f64
Maximum acceptable duplicate ratio (default: 0.5).
min_cardinality: usize
Minimum cardinality before a column is flagged as low-cardinality (default: 2).
max_outlier_ratio: f64
Maximum outlier ratio to report (default: 0.05).
max_duplicate_row_ratio: f64
Maximum duplicate row ratio (default: 0.01).
penalize_unexpected_constants: bool
Whether to penalize constant columns that are not in the expected list.
require_signature: bool
Whether this profile requires a signature column (for doctest corpora).
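The documented defaults can be gathered in one place. The following is a minimal sketch, assuming a hand-written Default impl on a standalone mirror of the documented fields (not the crate's own type); the default name and the values for penalize_unexpected_constants and require_signature are assumptions, since the field docs state no default for them.

```rust
use std::collections::HashSet;

/// Standalone mirror of the documented fields (illustration only, not the crate's type).
#[derive(Debug, Clone)]
pub struct QualityProfile {
    pub name: String,
    pub description: String,
    pub expected_constant_columns: HashSet<String>,
    pub nullable_columns: HashSet<String>,
    pub max_null_ratio: f64,
    pub max_duplicate_ratio: f64,
    pub min_cardinality: usize,
    pub max_outlier_ratio: f64,
    pub max_duplicate_row_ratio: f64,
    pub penalize_unexpected_constants: bool,
    pub require_signature: bool,
}

impl Default for QualityProfile {
    /// Numeric thresholds follow the defaults stated in the field documentation.
    fn default() -> Self {
        QualityProfile {
            name: "default".to_string(), // assumed display name
            description: String::new(),
            expected_constant_columns: HashSet::new(),
            nullable_columns: HashSet::new(),
            max_null_ratio: 0.1,
            max_duplicate_ratio: 0.5,
            min_cardinality: 2,
            max_outlier_ratio: 0.05,
            max_duplicate_row_ratio: 0.01,
            penalize_unexpected_constants: true, // assumption: no documented default
            require_signature: false,            // assumption: only doctest profiles need it
        }
    }
}
```

Specialized constructors could then start from QualityProfile::default() and override only what differs, using struct update syntax (..Default::default()).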
Implementations
impl QualityProfile
pub fn available_profiles() -> Vec<&'static str>
List available profile names
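One way to keep the list returned by available_profiles() in sync with the constructors is a single static registry of names and constructor functions. This is only a sketch: the profile names ("doctest", "ml", "time_series"), the trimmed Profile type, and the profile_by_name lookup are hypothetical and not part of the documented API.

```rust
/// Trimmed, hypothetical profile type used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Profile {
    pub name: String,
}

fn doctest_corpus() -> Profile {
    Profile { name: "doctest".to_string() }
}

fn ml_training() -> Profile {
    Profile { name: "ml".to_string() }
}

fn time_series() -> Profile {
    Profile { name: "time_series".to_string() }
}

/// Names and constructors live side by side, so listing and lookup cannot drift apart.
/// The names themselves are hypothetical.
static PROFILES: [(&str, fn() -> Profile); 3] = [
    ("doctest", doctest_corpus),
    ("ml", ml_training),
    ("time_series", time_series),
];

/// Same signature as the documented available_profiles().
pub fn available_profiles() -> Vec<&'static str> {
    PROFILES.iter().map(|(name, _)| *name).collect()
}

/// Hypothetical lookup helper (not in the documented API): builds a profile by name.
pub fn profile_by_name(name: &str) -> Option<Profile> {
    PROFILES
        .iter()
        .find(|(n, _)| *n == name)
        .map(|(_, make)| make())
}
```

Adding a profile then means adding one registry entry, which updates both the listing and the lookup.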
pub fn doctest_corpus() -> Self
Doctest corpus profile - for Python doctest extraction datasets.
Expects:
- source and version columns to be constant (single crate/version)
- signature column may have nulls (module-level doctests)
- input, expected, and function columns to be non-null
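The doctest expectations map directly onto the profile's fields. A minimal sketch, assuming a trimmed standalone mirror of the struct (only the fields used here) and assuming a strict max_null_ratio of 0.0 so that any null in input, expected, or function is flagged; the real profile may use a different threshold.

```rust
use std::collections::HashSet;

/// Trimmed, standalone mirror of QualityProfile (only the fields this sketch needs).
#[derive(Debug, Clone)]
pub struct QualityProfile {
    pub name: String,
    pub expected_constant_columns: HashSet<String>,
    pub nullable_columns: HashSet<String>,
    pub max_null_ratio: f64,
    pub penalize_unexpected_constants: bool,
    pub require_signature: bool,
}

impl QualityProfile {
    pub fn doctest_corpus() -> Self {
        QualityProfile {
            name: "doctest".to_string(),
            // A corpus comes from a single crate/version, so these never vary.
            expected_constant_columns: ["source", "version"].iter().map(|s| s.to_string()).collect(),
            // Module-level doctests have no enclosing function signature.
            nullable_columns: ["signature"].iter().map(|s| s.to_string()).collect(),
            // Assumed strict limit: input, expected and function must be non-null.
            max_null_ratio: 0.0,
            // Any other constant column still counts against the score.
            penalize_unexpected_constants: true,
            require_signature: true,
        }
    }
}
```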
pub fn ml_training() -> Self
ML training profile - for machine learning datasets.
Expects:
- Features to have reasonable variance
- Labels can be categorical (low cardinality OK)
- No null values in features or labels
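The ML expectations could be checked per column roughly as follows. This is a sketch under stated assumptions: the helper names, the Option-based null encoding, the use of population variance, and the min_variance threshold are all hypothetical (the profile documents no variance field); the label rule reuses the min_cardinality idea from the field list.

```rust
use std::collections::HashSet;

/// Population variance of a numeric column; None for an empty column.
fn variance(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    Some(values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n)
}

/// Hypothetical feature check: no nulls, and variance strictly above `min_variance`.
fn feature_ok(column: &[Option<f64>], min_variance: f64) -> bool {
    // The ML profile expects zero nulls in features.
    if column.iter().any(|v| v.is_none()) {
        return false;
    }
    let values: Vec<f64> = column.iter().flatten().copied().collect();
    variance(&values).map_or(false, |var| var > min_variance)
}

/// Hypothetical label check: categorical labels are fine as long as there are
/// no nulls and at least `min_cardinality` distinct classes.
fn label_ok(labels: &[Option<&str>], min_cardinality: usize) -> bool {
    if labels.iter().any(|l| l.is_none()) {
        return false;
    }
    let distinct: HashSet<&str> = labels.iter().flatten().copied().collect();
    distinct.len() >= min_cardinality
}
```

Separating features from labels reflects the asymmetry above: low cardinality is a defect in a feature but expected in a label.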
pub fn time_series() -> Self
Time series profile - for temporal data.
Expects:
- Timestamp column should be unique
- Data should have temporal patterns
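Requiring timestamps to be strictly increasing covers both expectations at once: unique and sequential. A minimal sketch, assuming integer timestamps (for example Unix seconds); TimestampIssue and check_timestamps are hypothetical names, not part of the documented API.

```rust
/// Hypothetical issue type for a timestamp column.
#[derive(Debug, PartialEq)]
pub enum TimestampIssue {
    Duplicate { index: usize, value: i64 },
    OutOfOrder { index: usize, prev: i64, value: i64 },
}

/// Scans adjacent pairs once (O(n)). In sorted data duplicates are always adjacent,
/// so this finds them; in unsorted data a non-adjacent repeat is reported as
/// OutOfOrder rather than Duplicate.
pub fn check_timestamps(ts: &[i64]) -> Vec<TimestampIssue> {
    let mut issues = Vec::new();
    for (i, pair) in ts.windows(2).enumerate() {
        let (prev, cur) = (pair[0], pair[1]);
        if cur == prev {
            issues.push(TimestampIssue::Duplicate { index: i + 1, value: cur });
        } else if cur < prev {
            issues.push(TimestampIssue::OutOfOrder { index: i + 1, prev, value: cur });
        }
    }
    issues
}
```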
pub fn with_description(self, desc: impl Into<String>) -> Self
Set description
pub fn with_expected_constant(self, column: impl Into<String>) -> Self
Add an expected constant column
pub fn with_nullable(self, column: impl Into<String>) -> Self
Add a nullable column
pub fn with_max_null_ratio(self, ratio: f64) -> Self
Set max null ratio
pub fn with_max_duplicate_ratio(self, ratio: f64) -> Self
Set max duplicate ratio
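The builder methods above take self by value and return Self, so calls chain. A minimal sketch of their likely shape, assuming each setter simply assigns or inserts (no validation or clamping of ratios is documented). The struct is a trimmed standalone mirror that derives Default only for brevity, so its zero thresholds are not the documented defaults. Writing mut self in the body does not change the public signature shown above.

```rust
use std::collections::HashSet;

/// Trimmed, standalone mirror of QualityProfile (only the fields the builders touch).
#[derive(Debug, Clone, Default)]
pub struct QualityProfile {
    pub name: String,
    pub description: String,
    pub expected_constant_columns: HashSet<String>,
    pub nullable_columns: HashSet<String>,
    pub max_null_ratio: f64,
    pub max_duplicate_ratio: f64,
}

impl QualityProfile {
    pub fn with_description(mut self, desc: impl Into<String>) -> Self {
        self.description = desc.into();
        self
    }

    /// Adds one column to the expected-constant set; repeated calls accumulate.
    pub fn with_expected_constant(mut self, column: impl Into<String>) -> Self {
        self.expected_constant_columns.insert(column.into());
        self
    }

    /// Adds one column to the nullable set; repeated calls accumulate.
    pub fn with_nullable(mut self, column: impl Into<String>) -> Self {
        self.nullable_columns.insert(column.into());
        self
    }

    pub fn with_max_null_ratio(mut self, ratio: f64) -> Self {
        self.max_null_ratio = ratio;
        self
    }

    pub fn with_max_duplicate_ratio(mut self, ratio: f64) -> Self {
        self.max_duplicate_ratio = ratio;
        self
    }
}
```

Because impl Into<String> accepts both &str and String, callers can pass literals or owned values without converting first.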
pub fn is_expected_constant(&self, column: &str) -> bool
Check if a column is expected to be constant
pub fn is_nullable(&self, column: &str) -> bool
Check if a column is allowed to have nulls
pub fn null_threshold_for(&self, column: &str) -> f64
Get effective null threshold for a column
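The three query methods can be plain set lookups, with null_threshold_for choosing between a per-column exemption and the profile-wide limit. A minimal sketch, assuming that a nullable column tolerates any null ratio (threshold 1.0); penalize_constant is a hypothetical helper, not in the documented API, showing how penalize_unexpected_constants might combine with is_expected_constant.

```rust
use std::collections::HashSet;

/// Trimmed, standalone mirror of QualityProfile (only the fields these queries read).
#[derive(Debug, Clone)]
pub struct QualityProfile {
    pub expected_constant_columns: HashSet<String>,
    pub nullable_columns: HashSet<String>,
    pub max_null_ratio: f64,
    pub penalize_unexpected_constants: bool,
}

impl QualityProfile {
    pub fn is_expected_constant(&self, column: &str) -> bool {
        // HashSet<String> can be queried with &str because String: Borrow<str>.
        self.expected_constant_columns.contains(column)
    }

    pub fn is_nullable(&self, column: &str) -> bool {
        self.nullable_columns.contains(column)
    }

    /// Assumption: nullable columns accept any null ratio (1.0);
    /// every other column uses the profile-wide max_null_ratio.
    pub fn null_threshold_for(&self, column: &str) -> f64 {
        if self.is_nullable(column) {
            1.0
        } else {
            self.max_null_ratio
        }
    }

    /// Hypothetical helper: should a column found to be constant be penalized?
    pub fn penalize_constant(&self, column: &str) -> bool {
        self.penalize_unexpected_constants && !self.is_expected_constant(column)
    }
}
```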
Trait Implementations
impl Clone for QualityProfile
fn clone(&self) -> QualityProfile
fn clone_from(&mut self, source: &Self) (since 1.0.0)
Performs copy-assignment from source.
impl Debug for QualityProfile
Auto Trait Implementations
impl Freeze for QualityProfile
impl RefUnwindSafe for QualityProfile
impl Send for QualityProfile
impl Sync for QualityProfile
impl Unpin for QualityProfile
impl UnsafeUnpin for QualityProfile
impl UnwindSafe for QualityProfile
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.