data-protocol-validator 0.1.0

Rust validator for Data Protocol schemas - validates versioned bioinformatics analysis output against JSON Schema-based protocol definitions

data-protocol-validator

Overview

Bioinformatics analysis programs evolve frequently, producing structural changes in their output data across versions. The Data Protocol format provides a versioned, machine-readable definition that describes the expected shape of output data from specific analysis pipelines.

This Rust implementation validates data against Data Protocol schemas, which are based on a strict subset of JSON Schema (draft 2020-12) augmented with domain-specific extensions for bioinformatics use cases.

Features

  • ✅ Full conformance with the Data Protocol 1.0 specification
  • ✅ Validates data against versioned protocol schemas
  • ✅ Supports the standard JSON Schema validation keywords listed below (type, properties, required, etc.)
  • ✅ Format validation (date, date-time, email, uri, uuid)
  • ✅ Composition keywords (allOf, anyOf, oneOf)
  • ✅ Reference resolution ($ref, $defs)
  • ✅ Custom extensions (x-display-name, x-unit, x-deprecated, etc.)
  • ✅ Detailed error messages with suggestions for fixes
  • ✅ Partial validation mode for validating specific paths
  • ✅ Validation statistics (fields checked, valid, invalid)

Installation

Add this to your Cargo.toml:

[dependencies]
data-protocol-validator = "0.1"

Quick Start

use data_protocol_validator::validate;
use serde_json::json;

fn main() {
    // Define a protocol
    let protocol = json!({
        "$protocol": "data-protocol/1.0",
        "name": "gut-microbiome-report",
        "version": "1.0.0",
        "schema": {
            "type": "object",
            "properties": {
                "sample_id": { "type": "string" },
                "abundance": { "type": "number", "minimum": 0, "maximum": 100 }
            },
            "required": ["sample_id", "abundance"]
        }
    });

    // Data to validate
    let data = json!({
        "sample_id": "S001",
        "abundance": 42.5
    });

    // Validate
    let result = validate(&data, &protocol, None);

    if result.valid {
        println!("✓ Data is valid!");
        println!("Stats: {} fields checked, {} valid", 
            result.stats.fields_checked, 
            result.stats.fields_valid);
    } else {
        println!("✗ Validation failed:");
        for error in result.errors {
            println!("  - {}: {}", error.code, error.message);
        }
    }
}

Usage Examples

Basic Validation

use data_protocol_validator::validate;
use serde_json::json;

let protocol = json!({
    "$protocol": "data-protocol/1.0",
    "name": "example",
    "version": "1.0.0",
    "schema": {
        "type": "object",
        "properties": {
            "name": { "type": "string", "minLength": 1 },
            "age": { "type": "integer", "minimum": 0 }
        },
        "required": ["name"]
    }
});

let data = json!({ "name": "Alice", "age": 30 });
let result = validate(&data, &protocol, None);
assert!(result.valid);
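The name and age constraints follow standard JSON Schema 2020-12 semantics: minLength counts Unicode code points rather than UTF-8 bytes, and "integer" accepts any number whose fractional part is zero (so 30.0 is an integer). The std-only sketch below illustrates those two rules, assuming the crate follows the standard semantics; the helper functions are hypothetical and are not part of this crate's API.

```rust
/// Hypothetical helper: JSON Schema `minLength` counts Unicode code points,
/// so use `chars().count()` rather than the UTF-8 byte length.
fn satisfies_min_length(s: &str, min_length: usize) -> bool {
    s.chars().count() >= min_length
}

/// Hypothetical helper: in draft 2020-12, `"type": "integer"` matches any
/// finite number with a zero fractional part.
fn is_json_integer(n: f64) -> bool {
    n.is_finite() && n.fract() == 0.0
}

/// Hypothetical helper combining `"type": "integer"` with `"minimum": 0`,
/// as used for the `age` property above.
fn valid_age(age: f64) -> bool {
    is_json_integer(age) && age >= 0.0
}
```

For example, "\u{e9}" is two bytes long but one character, so it satisfies "minLength": 1 and fails "minLength": 2.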

Partial Validation

Validate only specific paths in your data:

use data_protocol_validator::{validate, ValidationOptions};

// `data` and `protocol` are defined as in the examples above.
let options = Some(ValidationOptions {
    mode: "partial".to_string(),
    paths: vec!["/species".to_string(), "/genus".to_string()],
});

let result = validate(&data, &protocol, options);
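The partial-mode paths above ("/species", "/genus") look like JSON Pointers (RFC 6901). Assuming the crate interprets them that way (the README does not say so explicitly), each path is split on "/" and every token is unescaped by turning "~1" into "/" and then "~0" into "~". A std-only sketch of that parsing step; the function name is hypothetical:

```rust
/// Hypothetical helper: split an RFC 6901 JSON Pointer into reference tokens.
/// Returns None if a non-empty pointer does not start with '/'.
fn pointer_tokens(pointer: &str) -> Option<Vec<String>> {
    if pointer.is_empty() {
        // The empty pointer refers to the whole document.
        return Some(Vec::new());
    }
    let rest = pointer.strip_prefix('/')?;
    Some(
        rest.split('/')
            // Order matters: unescape "~1" before "~0", so "~01" becomes "~1".
            .map(|t| t.replace("~1", "/").replace("~0", "~"))
            .collect(),
    )
}
```

Under this reading, "/taxa/0/genus" would select the genus field of the first element of a taxa array.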

Schema-Only Validation

Validate against a bare schema without a protocol envelope:

use data_protocol_validator::validate_schema;
use serde_json::json;

let schema = json!({
    "type": "string",
    "format": "email"
});

let data = json!("user@example.com");
let result = validate_schema(&data, &schema, None);
assert!(result.valid);

Error Handling with Suggestions

let result = validate(&data, &protocol, None);

for error in result.errors {
    println!("Error: {} at {}", error.message, error.path);
    
    if let Some(suggestion) = error.suggestion {
        println!("Suggestion: {}", suggestion.message);
        if let Some(fix) = suggestion.fix {
            println!("Suggested fix: {}", fix);
        }
    }
}

Validation Options

pub struct ValidationOptions {
    /// Validation mode: "full" or "partial"
    pub mode: String,
    
    /// Paths to validate in partial mode (e.g., ["/species", "/genus"])
    pub paths: Vec<String>,
}

Validation Result

pub struct ValidationResult {
    /// Whether the data is valid (no errors)
    pub valid: bool,
    
    /// Validation mode used
    pub mode: String,
    
    /// List of validation errors and warnings
    pub errors: Vec<ValidationError>,
    
    /// Validation statistics
    pub stats: ValidationStats,
}

pub struct ValidationStats {
    /// Total number of fields checked
    pub fields_checked: u64,
    
    /// Number of valid fields
    pub fields_valid: u64,
    
    /// Number of invalid fields
    pub fields_invalid: u64,
}
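If every checked field is counted as either valid or invalid (an assumption; the struct definition does not state it), the counters satisfy fields_checked == fields_valid + fields_invalid, and a pass rate can be derived from them. The sketch below uses a local stand-in struct with the same field names rather than the crate's own type.

```rust
/// Local stand-in mirroring the documented `ValidationStats` fields.
struct Stats {
    fields_checked: u64,
    fields_valid: u64,
    fields_invalid: u64,
}

impl Stats {
    /// Assumed invariant: each checked field is either valid or invalid.
    fn is_consistent(&self) -> bool {
        self.fields_valid + self.fields_invalid == self.fields_checked
    }

    /// Fraction of checked fields that passed; None when nothing was checked.
    fn pass_rate(&self) -> Option<f64> {
        if self.fields_checked == 0 {
            None
        } else {
            Some(self.fields_valid as f64 / self.fields_checked as f64)
        }
    }
}
```

Returning None for zero checked fields avoids reporting a misleading 0% or NaN for an empty partial validation.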

Error Codes

The validator produces standardized error codes:

  • E001: Type mismatch
  • E002: Missing required property
  • E003: Additional property not allowed
  • E004: String constraint violation
  • E005: Number constraint violation
  • E006: Array constraint violation
  • E007: Object constraint violation
  • E008: Format validation failure
  • E009: Enum/const violation
  • E010: Composition failure (allOf/anyOf/oneOf)
  • E011: Reference resolution failure
  • W001: Deprecated field warning
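The code prefixes separate errors (E) from warnings (W). Since ValidationResult documents valid as "no errors" while errors holds both errors and warnings, W-codes alone presumably leave a result valid, though the README does not state this outright. A std-only lookup table built from the list above, with a hypothetical function name:

```rust
/// Severity implied by the code prefix: `E` codes are errors, `W` codes warnings.
#[derive(Debug, PartialEq)]
enum Severity {
    Error,
    Warning,
}

/// Hypothetical lookup mapping each documented code to its severity and meaning.
fn describe(code: &str) -> Option<(Severity, &'static str)> {
    let desc = match code {
        "E001" => "Type mismatch",
        "E002" => "Missing required property",
        "E003" => "Additional property not allowed",
        "E004" => "String constraint violation",
        "E005" => "Number constraint violation",
        "E006" => "Array constraint violation",
        "E007" => "Object constraint violation",
        "E008" => "Format validation failure",
        "E009" => "Enum/const violation",
        "E010" => "Composition failure (allOf/anyOf/oneOf)",
        "E011" => "Reference resolution failure",
        "W001" => "Deprecated field warning",
        _ => return None,
    };
    let severity = if code.starts_with('W') {
        Severity::Warning
    } else {
        Severity::Error
    };
    Some((severity, desc))
}
```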

Supported JSON Schema Keywords

Core

  • type, properties, required, additionalProperties
  • items, $ref, $defs
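In JSON Schema 2020-12, a local reference such as "#/$defs/taxon" is a JSON Pointer fragment into the same document. Assuming protocols use local $defs references of that shape (the README does not say whether remote references are supported), resolving one reduces to extracting the definition name and looking it up; an unresolved name is the kind of failure reported as E011. A hypothetical std-only sketch, with schemas kept as opaque text:

```rust
use std::collections::HashMap;

/// Hypothetical resolver for local references of the form "#/$defs/<name>".
/// `defs` maps each definition name to its schema (kept as opaque text here).
fn resolve_local_ref<'a>(
    reference: &str,
    defs: &'a HashMap<String, String>,
) -> Result<&'a str, String> {
    let raw = reference
        .strip_prefix("#/$defs/")
        .ok_or_else(|| format!("unsupported reference: {reference}"))?;
    // This sketch only handles a single path segment after $defs.
    if raw.is_empty() || raw.contains('/') {
        return Err(format!("unsupported reference: {reference}"));
    }
    // Undo JSON Pointer escaping (RFC 6901): "~1" -> "/", then "~0" -> "~".
    let name = raw.replace("~1", "/").replace("~0", "~");
    defs.get(&name)
        .map(String::as_str)
        .ok_or_else(|| format!("unresolved reference: {reference}"))
}
```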

String Constraints

  • minLength, maxLength, pattern, format

Numeric Constraints

  • minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf
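In draft 2020-12, exclusiveMinimum and exclusiveMaximum take numbers (not the draft-04 booleans), and multipleOf requires the value divided by the divisor to be an integer. Floating-point division makes that last check delicate (0.3 / 0.1 evaluates to 2.9999999999999996), so the sketch below accepts quotients within a small relative tolerance. That tolerance is an assumption of this sketch; the README does not state how the crate handles it. NumberRules and check_number are hypothetical names.

```rust
/// Numeric keywords from a schema; `None` means the keyword is absent.
/// This struct is a local stand-in, not a type exported by the crate.
#[derive(Default)]
struct NumberRules {
    minimum: Option<f64>,
    maximum: Option<f64>,
    exclusive_minimum: Option<f64>,
    exclusive_maximum: Option<f64>,
    multiple_of: Option<f64>,
}

/// Returns true if `x` satisfies every numeric keyword that is present.
fn check_number(x: f64, r: &NumberRules) -> bool {
    if r.minimum.map_or(false, |m| x < m) {
        return false;
    }
    if r.maximum.map_or(false, |m| x > m) {
        return false;
    }
    if r.exclusive_minimum.map_or(false, |m| x <= m) {
        return false;
    }
    if r.exclusive_maximum.map_or(false, |m| x >= m) {
        return false;
    }
    if let Some(d) = r.multiple_of {
        // The spec requires multipleOf > 0; the quotient must be a whole number,
        // here within a relative tolerance to absorb floating-point rounding.
        let q = x / d;
        if (q - q.round()).abs() > 1e-9 * q.abs().max(1.0) {
            return false;
        }
    }
    true
}
```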

Array Constraints

  • minItems, maxItems, uniqueItems
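uniqueItems forbids two array elements that are equal as JSON values. For arrays of strings, such as a list of sample IDs, the check reduces to a set-insertion test. This std-only sketch covers string elements only; full JSON equality (where, for example, 1 and 1.0 are equal) is out of scope here, and both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Returns the first element that repeats an earlier one, or None if all are unique.
fn first_duplicate<'a>(items: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    items.iter().copied().find(|item| !seen.insert(*item))
}

/// Applies minItems, maxItems and uniqueItems to an array of strings.
fn check_string_array(
    items: &[&str],
    min_items: Option<usize>,
    max_items: Option<usize>,
    unique_items: bool,
) -> bool {
    if min_items.map_or(false, |m| items.len() < m) {
        return false;
    }
    if max_items.map_or(false, |m| items.len() > m) {
        return false;
    }
    !(unique_items && first_duplicate(items).is_some())
}
```

Returning the duplicated value rather than a bare bool makes it easy to name the offending element in an E006 message.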

Object Constraints

  • minProperties, maxProperties

Enum and Const

  • enum, const

Composition

  • allOf, anyOf, oneOf
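Once the instance has been checked against each subschema, the three composition keywords differ only in how many subschemas must pass: allOf needs all of them, anyOf at least one, and oneOf exactly one. A failure at this step corresponds to E010. The sketch below shows only that counting rule, over precomputed per-subschema results; the enum and function are hypothetical.

```rust
/// Which composition keyword is being evaluated.
enum Composition {
    AllOf,
    AnyOf,
    OneOf,
}

/// Given whether the instance passed each subschema, decide the keyword's outcome.
fn composition_passes(kind: Composition, results: &[bool]) -> bool {
    let passed = results.iter().filter(|&&ok| ok).count();
    match kind {
        Composition::AllOf => passed == results.len(),
        Composition::AnyOf => passed >= 1,
        // oneOf fails both when nothing matches and when several subschemas match.
        Composition::OneOf => passed == 1,
    }
}
```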

Custom Extensions

  • x-display-name, x-unit, x-sort-key, x-sort-order
  • x-deprecated, x-tags
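The README lists x-deprecated among the extensions and W001 as the deprecated-field warning, but does not spell out how they connect. Assuming a field present in the data whose schema carries x-deprecated: true yields a W001 warning rather than an error, and that annotations such as x-display-name and x-unit never affect validity, the logic could look like the hypothetical sketch below. FieldSchema and both functions are illustrative stand-ins, not crate API.

```rust
/// Hypothetical stand-in for a property schema's extension annotations.
struct FieldSchema {
    name: &'static str,
    display_name: Option<&'static str>, // from `x-display-name`
    unit: Option<&'static str>,         // from `x-unit`
    deprecated: bool,                   // from `x-deprecated`
}

/// Emits a (code, message) warning for each deprecated field present in the data.
fn deprecation_warnings(schemas: &[FieldSchema], present: &[&str]) -> Vec<(String, String)> {
    schemas
        .iter()
        .filter(|s| s.deprecated && present.contains(&s.name))
        .map(|s| {
            // Prefer the human-readable name when one is declared.
            let label = s.display_name.unwrap_or(s.name);
            ("W001".to_string(), format!("field '{label}' is deprecated"))
        })
        .collect()
}

/// Renders a value with its `x-unit` annotation, if any (display only).
fn display_with_unit(value: f64, schema: &FieldSchema) -> String {
    match schema.unit {
        Some(u) => format!("{value} {u}"),
        None => value.to_string(),
    }
}
```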

Format Validation

The validator supports the following format values:

  • date: ISO 8601 date (YYYY-MM-DD)
  • date-time: RFC 3339 date-time
  • email: RFC 5322 email address
  • uri: RFC 3986 URI
  • uuid: UUID (case-insensitive)
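Two of these formats can be checked with the standard library alone. The sketch below validates a YYYY-MM-DD date including month lengths and Gregorian leap years, and checks the 8-4-4-4-12 hexadecimal shape of a UUID in either case. It is an illustration under those stated rules, not the crate's implementation; in particular it does not inspect UUID version or variant bits, which the crate may or may not do.

```rust
/// Checks an ISO 8601 calendar date "YYYY-MM-DD", including days per month.
fn is_valid_date(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return false;
    }
    // Every position other than the two hyphens must be an ASCII digit.
    if !b.iter().enumerate().all(|(i, c)| i == 4 || i == 7 || c.is_ascii_digit()) {
        return false;
    }
    // Slicing is safe now: the string is known to be pure ASCII digits and hyphens.
    let year: u32 = s[0..4].parse().unwrap();
    let month: u32 = s[5..7].parse().unwrap();
    let day: u32 = s[8..10].parse().unwrap();
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    let days_in_month = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if leap => 29,
        2 => 28,
        _ => return false,
    };
    (1..=days_in_month).contains(&day)
}

/// Checks the 8-4-4-4-12 hex layout of a UUID; hex digits may be upper or lower case.
fn is_valid_uuid(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 36
        && b.iter().enumerate().all(|(i, c)| match i {
            8 | 13 | 18 | 23 => *c == b'-',
            _ => c.is_ascii_hexdigit(),
        })
}
```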

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
