invr 0.2.1

Declarative data validation engine using invariants executed on Polars DataFrames.

invr

Declarative data validation engine for Rust.

Define invariants (validation rules) and evaluate them against a dataset using a typed execution engine.

Features

  • 33 built-in invariant types (nullability, uniqueness, numeric, string, date, relational, statistical, …)
  • Lazy Polars execution backend
  • Load specs from YAML
  • Engine-agnostic core — bring your own backend
  • Fully typed: no stringly-typed rule names
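
To make the "engine-agnostic" and "fully typed" points concrete, here is a minimal standard-library sketch of how a rule description can be kept separate from the backend that evaluates it. Every name in it (`Rule`, `Backend`, `Column`, `VecBackend`, `check`) is hypothetical and chosen for illustration; none of it is taken from invr's actual API.

```rust
/// A hypothetical rule description, independent of any backend.
/// Rules are enum variants rather than strings, so typos fail at compile time.
#[derive(Debug, Clone, PartialEq)]
pub enum Rule {
    /// The named column must contain no missing values.
    NotNull { column: String },
    /// The dataset must have at least `min` rows.
    RowCountMin { min: usize },
}

/// A hypothetical backend trait: each engine decides how to evaluate
/// a rule against its own dataset type.
pub trait Backend {
    type Dataset;
    /// Returns `Ok(())` if the rule holds, or a human-readable reason if not.
    fn check(&self, data: &Self::Dataset, rule: &Rule) -> Result<(), String>;
}

/// A toy dataset: one named column of optional integers.
pub struct Column {
    pub name: String,
    pub values: Vec<Option<i64>>,
}

/// A toy in-memory backend, standing in for a real engine such as Polars.
pub struct VecBackend;

impl Backend for VecBackend {
    type Dataset = Column;

    fn check(&self, data: &Column, rule: &Rule) -> Result<(), String> {
        match rule {
            Rule::NotNull { column } => {
                if *column != data.name {
                    return Err(format!("column `{column}` not found"));
                }
                let nulls = data.values.iter().filter(|v| v.is_none()).count();
                if nulls == 0 {
                    Ok(())
                } else {
                    Err(format!("column `{column}` has {nulls} null value(s)"))
                }
            }
            Rule::RowCountMin { min } => {
                let rows = data.values.len();
                if rows >= *min {
                    Ok(())
                } else {
                    Err(format!("expected at least {min} rows, found {rows}"))
                }
            }
        }
    }
}
```

In this sketch, supporting another engine means adding another `Backend` implementation; the `Rule` values themselves do not change.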

Installation

[dependencies]
invr = { version = "0.2", features = ["polars"] }

To also load specs from YAML:

invr = { version = "0.2", features = ["polars", "yaml"] }

Quick start

Programmatic spec

use invars::prelude::*;
use polars::prelude::*;

let df = df![
    "age" => [25, 30, 45],
    "email" => ["a@b.com", "c@d.com", "e@f.com"],
]?;

let spec = Spec::from_invariants(vec![
    Invariant::new(
        InvariantId::new("age_not_null")?,
        PolarsKind::NotNull,
        Scope::Column { name: "age".into() },
    ),
    Invariant::new(
        InvariantId::new("row_count_min")?,
        PolarsKind::RowCountMin,
        Scope::Dataset,
    )
    .with_param_value("min", "1"),
]);

let runner = RunSpec::new(EnginePolarsDataFrame);
let report = runner.run(&df, &spec)?;

if report.failed() {
    for v in report.errors() {
        eprintln!("violation: {}", v.reason());
    }
}

YAML spec

# spec.yaml
invariants:
  - id: age_not_null
    kind: not_null
    scope:
      type: column
      name: age

  - id: email_unique
    kind: unique
    scope:
      type: column
      name: email
    severity: error

  - id: row_count_check
    kind: row_count_min
    scope:
      type: dataset
    params:
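      min: "10"

Load the spec and run it (this requires the yaml feature; df is a DataFrame such as the one built in the previous example):

use invars::prelude::*;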

let yaml = std::fs::read_to_string("spec.yaml")?;
let spec = spec_from_str(&yaml)?;

let runner = RunSpec::new(EnginePolarsDataFrame);
let report = runner.run(&df, &spec)?;

Invariant types

Category      Kinds
------------  ---------------------------------------------------------------------------
Nullability   not_null, null_ratio_max
Uniqueness    unique, composite_unique, duplicate_ratio_max
Row count     row_count_min, row_count_max, row_count_between
Structure     column_exists, column_missing, dtype_is, schema_equals
Numeric       value_min, value_max, value_between, mean_between, stddev_max, sum_between
Date / Time   date_between, no_future_dates, monotonic_increasing, no_gaps_in_sequence
String        regex_match, string_length_min, string_length_max, string_length_between
Domain        allowed_values, forbidden_values
Statistical   outlier_ratio_max, percentile_between
Relational    foreign_key, column_equals, conditional_not_null
Custom        custom_expr
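
The kind names above are mostly self-describing. As a rough guide to what two of them plausibly check, the sketch below implements `null_ratio_max` and `value_between` over a plain slice using only the standard library. The semantics are assumptions inferred from the names (for instance, that nulls are skipped by `value_between` and that bounds are inclusive); the crate's actual behavior may differ.

```rust
/// Fraction of missing values in a column (0.0 for an empty column).
pub fn null_ratio(values: &[Option<f64>]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let nulls = values.iter().filter(|v| v.is_none()).count();
    nulls as f64 / values.len() as f64
}

/// Assumed meaning of `null_ratio_max`: the null fraction must not exceed `max`.
pub fn null_ratio_max(values: &[Option<f64>], max: f64) -> bool {
    null_ratio(values) <= max
}

/// Assumed meaning of `value_between`: every non-null value lies in [min, max].
pub fn value_between(values: &[Option<f64>], min: f64, max: f64) -> bool {
    values
        .iter()
        .flatten() // skip nulls; only present values are range-checked
        .all(|v| (min..=max).contains(v))
}
```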

Report API

report.failed()          // true if any Error-severity violation exists
report.violations()      // all violations
report.errors()          // iterator over Error violations
report.warnings()        // iterator over Warn violations
report.error_count()     // number of Error violations
report.metrics()         // execution_time_ms, total_invariants, violations
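
The relationship between these accessors can be illustrated with a small stand-alone model. The types below (`Severity`, `Violation`, `Report`) are hypothetical stand-ins written for this sketch, not the crate's own definitions; they only mirror the behavior described in the comments above, in particular that `failed()` looks at Error-severity violations only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warn,
    Error,
}

#[derive(Debug, Clone)]
pub struct Violation {
    pub id: String,
    pub severity: Severity,
    pub reason: String,
}

/// A hypothetical report holding every violation found during a run.
pub struct Report {
    violations: Vec<Violation>,
}

impl Report {
    pub fn new(violations: Vec<Violation>) -> Self {
        Report { violations }
    }

    /// All violations, regardless of severity.
    pub fn violations(&self) -> &[Violation] {
        &self.violations
    }

    /// Iterator over Error-severity violations only.
    pub fn errors(&self) -> impl Iterator<Item = &Violation> + '_ {
        self.violations.iter().filter(|v| v.severity == Severity::Error)
    }

    /// Iterator over Warn-severity violations only.
    pub fn warnings(&self) -> impl Iterator<Item = &Violation> + '_ {
        self.violations.iter().filter(|v| v.severity == Severity::Warn)
    }

    pub fn error_count(&self) -> usize {
        self.errors().count()
    }

    /// True only if at least one Error-severity violation exists;
    /// warnings alone do not fail the run.
    pub fn failed(&self) -> bool {
        self.error_count() > 0
    }
}
```

Under this model a report containing only warnings still passes, which is what lets a pipeline surface soft problems without blocking on them.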

Severity

Each invariant has Error severity by default. Override it in code with:

invariant.with_severity(Severity::Warn)

Or in YAML:

severity: warn   # info | warn | error
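
One way to picture the default-plus-override behavior is a severity type that parses the three YAML spellings and falls back to Error when the key is absent. This is a hedged sketch using only the standard library; the enum, its ordering, and the helper `severity_or_default` are hypothetical and not the crate's implementation.

```rust
use std::str::FromStr;

/// Hypothetical severity levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
pub enum Severity {
    Info,
    Warn,
    #[default]
    Error, // the default when no severity is given
}

impl FromStr for Severity {
    type Err = String;

    /// Accepts the spellings shown in the YAML snippet: info, warn, error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "info" => Ok(Severity::Info),
            "warn" => Ok(Severity::Warn),
            "error" => Ok(Severity::Error),
            other => Err(format!("unknown severity `{other}`")),
        }
    }
}

/// Resolves an optional `severity:` value, defaulting to Error when absent.
pub fn severity_or_default(raw: Option<&str>) -> Result<Severity, String> {
    match raw {
        Some(s) => s.parse(),
        None => Ok(Severity::default()),
    }
}
```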

Feature flags

Feature   Description
-------   ---------------------------------------
polars    Enables the Polars execution engine
yaml      Enables loading specs from YAML strings

License

Apache-2.0