data-protocol-validator 0.1.0

# data-protocol-validator

[![Crates.io](https://img.shields.io/crates/v/data-protocol-validator.svg)](https://crates.io/crates/data-protocol-validator)
[![Documentation](https://docs.rs/data-protocol-validator/badge.svg)](https://docs.rs/data-protocol-validator)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Rust validator for Data Protocol schemas - validates versioned bioinformatics analysis output against JSON Schema-based protocol definitions.

## Overview

Bioinformatics analysis programs evolve frequently, producing structural changes in their output data across versions. The Data Protocol format provides a versioned, machine-readable definition that describes the expected shape of output data from specific analysis pipelines.

This Rust implementation validates data against Data Protocol schemas, which are based on a strict subset of JSON Schema (draft 2020-12) augmented with domain-specific extensions for bioinformatics use cases.

## Features

- ✅ Full conformance with Data Protocol 1.0 specification
- ✅ Validates data against versioned protocol schemas
- ✅ Supports the JSON Schema validation keywords listed under "Supported JSON Schema Keywords" (type, properties, required, etc.)
- ✅ Format validation (date, date-time, email, uri, uuid)
- ✅ Composition keywords (allOf, anyOf, oneOf)
- ✅ Reference resolution ($ref, $defs)
- ✅ Custom extensions (x-display-name, x-unit, x-deprecated, etc.)
- ✅ Detailed error messages with suggestions for fixes
- ✅ Partial validation mode for validating specific paths
- ✅ Validation statistics (fields checked, valid, invalid)

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
data-protocol-validator = "0.1"
```

## Quick Start

```rust
use data_protocol_validator::validate;
use serde_json::json;

fn main() {
    // Define a protocol
    let protocol = json!({
        "$protocol": "data-protocol/1.0",
        "name": "gut-microbiome-report",
        "version": "1.0.0",
        "schema": {
            "type": "object",
            "properties": {
                "sample_id": { "type": "string" },
                "abundance": { "type": "number", "minimum": 0, "maximum": 100 }
            },
            "required": ["sample_id", "abundance"]
        }
    });

    // Data to validate
    let data = json!({
        "sample_id": "S001",
        "abundance": 42.5
    });

    // Validate
    let result = validate(&data, &protocol, None);

    if result.valid {
        println!("✓ Data is valid!");
        println!("Stats: {} fields checked, {} valid", 
            result.stats.fields_checked, 
            result.stats.fields_valid);
    } else {
        println!("✗ Validation failed:");
        for error in result.errors {
            println!("  - {}: {}", error.code, error.message);
        }
    }
}
```

## Usage Examples

### Basic Validation

```rust
use data_protocol_validator::validate;
use serde_json::json;

let protocol = json!({
    "$protocol": "data-protocol/1.0",
    "name": "example",
    "version": "1.0.0",
    "schema": {
        "type": "object",
        "properties": {
            "name": { "type": "string", "minLength": 1 },
            "age": { "type": "integer", "minimum": 0 }
        },
        "required": ["name"]
    }
});

let data = json!({ "name": "Alice", "age": 30 });
let result = validate(&data, &protocol, None);
assert!(result.valid);
```

### Partial Validation

Validate only specific paths in your data:

```rust
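use data_protocol_validator::{validate, ValidationOptions};

// `data` and `protocol` are `serde_json::Value`s built with `json!`, as in
// Basic Validation; the protocol here is assumed to define `species` and `genus`.
let options = Some(ValidationOptions {
    mode: "partial".to_string(),
    paths: vec!["/species".to_string(), "/genus".to_string()],
});

// Only the listed paths are checked.
let result = validate(&data, &protocol, options);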
```
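
The paths above look like JSON Pointers (RFC 6901), although this README does not say so explicitly. The following is a minimal sketch, under that assumption, of how such a path could be split into segments and how a nested location could be matched against the selected paths. `pointer_segments` and `is_selected` are hypothetical helpers written for illustration; they are not part of the crate's API, and the crate may match paths differently.

```rust
/// Split a JSON Pointer-style path such as "/taxa/0/genus" into its segments.
/// Returns `None` when a non-empty path does not start with '/'.
fn pointer_segments(path: &str) -> Option<Vec<String>> {
    if path.is_empty() {
        // The empty pointer refers to the whole document.
        return Some(Vec::new());
    }
    let rest = path.strip_prefix('/')?;
    Some(
        rest.split('/')
            // RFC 6901 escapes: "~1" means '/', "~0" means '~'.
            // "~1" must be replaced first so that "~01" decodes to "~1".
            .map(|segment| segment.replace("~1", "/").replace("~0", "~"))
            .collect(),
    )
}

/// True when `candidate` equals, or is nested under, one of the selected paths.
/// (Assumption: selecting "/species" also selects everything below it.)
fn is_selected(candidate: &str, selected: &[&str]) -> bool {
    let Some(candidate_segments) = pointer_segments(candidate) else {
        return false;
    };
    selected.iter().any(|path| match pointer_segments(path) {
        Some(selected_segments) => candidate_segments.starts_with(&selected_segments),
        None => false,
    })
}
```

Comparing whole segments rather than raw string prefixes keeps `/speciesX` from being treated as part of `/species`.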

### Schema-Only Validation

Validate against a bare schema without a protocol envelope:

```rust
use data_protocol_validator::validate_schema;
use serde_json::json;

let schema = json!({
    "type": "string",
    "format": "email"
});

let data = json!("user@example.com");
let result = validate_schema(&data, &schema, None);
assert!(result.valid);
```

### Error Handling with Suggestions

```rust
// `data` and `protocol` are defined as in Basic Validation.
let result = validate(&data, &protocol, None);

for error in result.errors {
    println!("Error: {} at {}", error.message, error.path);
    
    if let Some(suggestion) = error.suggestion {
        println!("Suggestion: {}", suggestion.message);
        if let Some(fix) = suggestion.fix {
            println!("Suggested fix: {}", fix);
        }
    }
}
```

## Validation Options

```rust
pub struct ValidationOptions {
    /// Validation mode: "full" or "partial"
    pub mode: String,
    
    /// Paths to validate in partial mode (e.g., ["/species", "/genus"])
    pub paths: Vec<String>,
}
```
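
Because `mode` is a plain `String`, one option is to build it from a small enum so that only the two documented values can be produced. The sketch below assumes nothing beyond the two fields shown above; `Mode` and `options` are hypothetical names, and the struct is mirrored locally so the snippet compiles on its own.

```rust
/// Local mirror of the `ValidationOptions` fields shown above.
struct ValidationOptions {
    mode: String,
    paths: Vec<String>,
}

/// Hypothetical enum covering the two documented mode strings.
#[derive(Clone, Copy)]
enum Mode {
    Full,
    Partial,
}

impl Mode {
    /// The string the options struct expects for this mode.
    fn as_str(self) -> &'static str {
        match self {
            Mode::Full => "full",
            Mode::Partial => "partial",
        }
    }
}

/// Hypothetical helper: build options from a typed mode and borrowed paths.
fn options(mode: Mode, paths: &[&str]) -> ValidationOptions {
    ValidationOptions {
        mode: mode.as_str().to_string(),
        paths: paths.iter().map(|p| p.to_string()).collect(),
    }
}
```

With the real crate type in scope, `Some(options(Mode::Partial, &["/species", "/genus"]))` would stand in for the hand-written struct literal.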

## Validation Result

```rust
pub struct ValidationResult {
    /// Whether the data is valid (no errors)
    pub valid: bool,
    
    /// Validation mode used
    pub mode: String,
    
    /// List of validation errors and warnings
    pub errors: Vec<ValidationError>,
    
    /// Validation statistics
    pub stats: ValidationStats,
}

pub struct ValidationStats {
    /// Total number of fields checked
    pub fields_checked: u64,
    
    /// Number of valid fields
    pub fields_valid: u64,
    
    /// Number of invalid fields
    pub fields_invalid: u64,
}
```
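
The three counters can be turned into a short summary line. The following sketch mirrors `ValidationStats` locally so it compiles without the crate; `pass_rate` and `summary` are hypothetical helpers, not functions the crate provides.

```rust
/// Local mirror of the `ValidationStats` fields shown above.
struct ValidationStats {
    fields_checked: u64,
    fields_valid: u64,
    fields_invalid: u64,
}

/// Fraction of checked fields that passed, or `None` when nothing was checked.
fn pass_rate(stats: &ValidationStats) -> Option<f64> {
    if stats.fields_checked == 0 {
        return None;
    }
    Some(stats.fields_valid as f64 / stats.fields_checked as f64)
}

/// One-line, human-readable summary of the counters.
fn summary(stats: &ValidationStats) -> String {
    match pass_rate(stats) {
        Some(rate) => format!(
            "{}/{} fields valid ({:.1}%), {} invalid",
            stats.fields_valid,
            stats.fields_checked,
            rate * 100.0,
            stats.fields_invalid
        ),
        None => "no fields checked".to_string(),
    }
}
```

Returning `Option<f64>` avoids a division by zero when no fields were checked, which could plausibly happen in partial mode if none of the selected paths exist in the data.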

## Error Codes

The validator produces standardized error codes:

- `E001`: Type mismatch
- `E002`: Missing required property
- `E003`: Additional property not allowed
- `E004`: String constraint violation
- `E005`: Number constraint violation
- `E006`: Array constraint violation
- `E007`: Object constraint violation
- `E008`: Format validation failure
- `E009`: Enum/const violation
- `E010`: Composition failure (allOf/anyOf/oneOf)
- `E011`: Reference resolution failure
- `W001`: Deprecated field warning
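
Since `ValidationResult.errors` holds both errors and warnings, callers may want to tell them apart. The sketch below assumes that the leading letter of a code (`E` or `W`) indicates its severity, which the list above suggests but does not state. `Severity`, `severity`, and `describe` are hypothetical names used only for illustration.

```rust
/// Hypothetical severity derived from the first letter of a code.
#[derive(Debug, PartialEq)]
enum Severity {
    Error,
    Warning,
}

/// Classify a code by its prefix; unknown prefixes yield `None`.
fn severity(code: &str) -> Option<Severity> {
    match code.chars().next()? {
        'E' => Some(Severity::Error),
        'W' => Some(Severity::Warning),
        _ => None,
    }
}

/// Look up the description listed in this README for a code.
fn describe(code: &str) -> Option<&'static str> {
    Some(match code {
        "E001" => "Type mismatch",
        "E002" => "Missing required property",
        "E003" => "Additional property not allowed",
        "E004" => "String constraint violation",
        "E005" => "Number constraint violation",
        "E006" => "Array constraint violation",
        "E007" => "Object constraint violation",
        "E008" => "Format validation failure",
        "E009" => "Enum/const violation",
        "E010" => "Composition failure (allOf/anyOf/oneOf)",
        "E011" => "Reference resolution failure",
        "W001" => "Deprecated field warning",
        _ => return None,
    })
}
```

Filtering with `severity(&error.code) == Some(Severity::Error)` would then separate hard failures from deprecation notices.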

## Supported JSON Schema Keywords

### Core
- `type`, `properties`, `required`, `additionalProperties`
- `items`, `$ref`, `$defs`

### String Constraints
- `minLength`, `maxLength`, `pattern`, `format`

### Numeric Constraints
- `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`, `multipleOf`

### Array Constraints
- `minItems`, `maxItems`, `uniqueItems`

### Object Constraints
- `minProperties`, `maxProperties`

### Enum and Const
- `enum`, `const`

### Composition
- `allOf`, `anyOf`, `oneOf`
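
The three composition keywords differ only in how many subschemas must accept the value. As a minimal sketch of the standard JSON Schema semantics (not the crate's internal code), the function below combines per-branch results; `Composition` and `combine` are illustrative names.

```rust
/// The three composition keywords.
#[derive(Clone, Copy)]
enum Composition {
    AllOf,
    AnyOf,
    OneOf,
}

/// Combine the pass/fail result of each subschema into a single verdict.
fn combine(kind: Composition, branch_results: &[bool]) -> bool {
    let passed = branch_results.iter().filter(|&&ok| ok).count();
    match kind {
        // Every subschema must accept the value.
        Composition::AllOf => passed == branch_results.len(),
        // At least one subschema must accept the value.
        Composition::AnyOf => passed >= 1,
        // Exactly one subschema must accept the value.
        Composition::OneOf => passed == 1,
    }
}
```

A value that satisfies two `oneOf` branches therefore fails, even though the same value would pass under `anyOf`.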

### Custom Extensions
- `x-display-name`, `x-unit`, `x-sort-key`, `x-sort-order`
- `x-deprecated`, `x-tags`

## Format Validation

The validator checks the following `format` values:

- `date`: ISO 8601 date (YYYY-MM-DD)
- `date-time`: RFC 3339 date-time
- `email`: RFC 5322 email address
- `uri`: RFC 3986 URI
- `uuid`: UUID (case-insensitive)
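
To illustrate what a `date` check involves, here is a minimal, standard-library-only sketch that accepts `YYYY-MM-DD` strings and rejects impossible calendar dates. It is not the crate's implementation, which may be stricter or more lenient; `is_valid_date` and its helpers are hypothetical.

```rust
/// Parse a run of ASCII digits into a number; `None` if any byte is not a digit.
fn parse_digits(bytes: &[u8]) -> Option<u32> {
    let mut value = 0u32;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        value = value * 10 + u32::from(b - b'0');
    }
    Some(value)
}

/// Gregorian leap-year rule.
fn is_leap_year(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// True when `s` has the exact shape YYYY-MM-DD and names a real calendar day.
fn is_valid_date(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return false;
    }
    let (Some(year), Some(month), Some(day)) = (
        parse_digits(&b[0..4]),
        parse_digits(&b[5..7]),
        parse_digits(&b[8..10]),
    ) else {
        return false;
    };
    let days_in_month = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => return false,
    };
    (1..=days_in_month).contains(&day)
}
```

Working on bytes rather than string slices avoids panics on non-ASCII input whose byte length happens to be 10.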

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Links

- [Repository](https://github.com/oonxt/data-schema)
- [Documentation](https://docs.rs/data-protocol-validator)
- [Crates.io](https://crates.io/crates/data-protocol-validator)

## Related Projects

- [TypeScript Validator](https://github.com/oonxt/data-schema/tree/main/packages/validator-ts)
- [Protocol Diff Tool](https://github.com/oonxt/data-schema/tree/main/packages/protocol-diff)
- [Protocol Inference](https://github.com/oonxt/data-schema/tree/main/packages/protocol-infer)