magi-core 0.3.1

LLM-agnostic multi-perspective analysis system inspired by MAGI
# Section 03: Validation (`validate.rs`)

## Overview

This section implements the `Validator` struct and the `ValidationLimits` configuration, which together validate `AgentOutput` fields before they reach the consensus engine. The validator checks confidence ranges, text field lengths, finding counts, and finding title and detail lengths. It also rejects any finding whose title is empty once zero-width (Unicode category Cf) characters are stripped. Because `validate` borrows the output immutably, titles are checked, not rewritten. Every validation failure returns `MagiError::Validation` with a descriptive message that names the offending field. The validator keeps no per-call state: it holds only what is built once at construction (the compiled regex and the limits).

## Dependencies

- **External crates**: `regex = "1"` (must be in `Cargo.toml` under `[dependencies]`)
- **Internal sections**:
  - Section 01 (`error.rs`) -- `MagiError::Validation` for error returns
  - Section 02 (`schema.rs`) -- `AgentOutput`, `Finding` types being validated
- **Standard library**: None beyond what is already used

## Files to Create or Modify

| File | Action |
|------|--------|
| `magi-core/Cargo.toml` | Ensure `regex = "1"` is in `[dependencies]` |
| `magi-core/src/validate.rs` | Create -- contains `Validator` and `ValidationLimits` |
| `magi-core/src/lib.rs` | Add `pub mod validate;` |

## File Header

Every new source file in this project must start with:

```rust
// Author: Julian Bolivar
// Version: 1.0.0
// Date: 2026-04-05
```

## Tests (Write First -- Red Phase)

All tests go in `src/validate.rs` inside a `#[cfg(test)] mod tests` block. Write these tests before any implementation. They must fail (or not compile) until the Green phase.

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use crate::schema::*;

    // Helper: build a valid AgentOutput for baseline tests
    // fn valid_agent_output() -> AgentOutput { ... }

    /// Validator::new creates instance with default limits and compiled regex.
    #[test]
    fn test_validator_new_creates_with_default_limits() {
        // Construct Validator::new()
        // Assert limits match ValidationLimits::default()
    }

    /// Validator::with_limits uses custom limits.
    #[test]
    fn test_validator_with_limits_uses_custom_limits() {
        // Construct custom ValidationLimits with max_findings=5
        // Create Validator::with_limits(custom)
        // Assert limits.max_findings == 5
    }

    // -- BDD Scenario 10: confidence out of range --

    /// Validate rejects confidence > 1.0 with MagiError::Validation.
    #[test]
    fn test_validate_rejects_confidence_above_one() {
        // Build AgentOutput with confidence=1.5
        // Assert validator.validate(&output) is Err(MagiError::Validation(_))
        // Assert error message contains "confidence"
    }

    /// Validate rejects confidence < 0.0 with MagiError::Validation.
    #[test]
    fn test_validate_rejects_confidence_below_zero() {
        // Build AgentOutput with confidence=-0.1
        // Assert validator.validate(&output) is Err(MagiError::Validation(_))
    }

    /// Validate accepts confidence at boundaries (0.0 and 1.0).
    #[test]
    fn test_validate_accepts_confidence_at_boundaries() {
        // Build AgentOutput with confidence=0.0, validate, assert Ok
        // Build AgentOutput with confidence=1.0, validate, assert Ok
    }

    // -- BDD Scenario 11: empty title after strip zero-width --

    /// Validate rejects finding with title composed entirely of zero-width chars.
    #[test]
    fn test_validate_rejects_finding_with_only_zero_width_title() {
        // Build Finding with title = "\u{200B}\u{FEFF}\u{200C}"
        // Build AgentOutput with that finding
        // Assert validator.validate(&output) is Err(MagiError::Validation(_))
        // Assert error message contains "title"
    }

    /// Validate accepts finding with normal title.
    #[test]
    fn test_validate_accepts_finding_with_normal_title() {
        // Build Finding with title = "Security vulnerability"
        // Build AgentOutput with that finding
        // Assert validator.validate(&output) is Ok(())
    }

    // -- BDD Scenario 12: text field exceeds max_text_len --

    /// Validate rejects reasoning exceeding max_text_len.
    #[test]
    fn test_validate_rejects_reasoning_exceeding_max_text_len() {
        // Build AgentOutput with reasoning = "x".repeat(50_001) (default max is 50_000)
        // Assert validator.validate(&output) is Err(MagiError::Validation(_))
        // Assert error message contains "reasoning"
    }

    /// Validate rejects summary exceeding max_text_len.
    #[test]
    fn test_validate_rejects_summary_exceeding_max_text_len() {
        // Build AgentOutput with summary = "x".repeat(50_001)
        // Assert Err with message containing "summary"
    }

    /// Validate rejects recommendation exceeding max_text_len.
    #[test]
    fn test_validate_rejects_recommendation_exceeding_max_text_len() {
        // Build AgentOutput with recommendation = "x".repeat(50_001)
        // Assert Err with message containing "recommendation"
    }

    // -- Additional validation tests --

    /// Validate rejects findings count exceeding max_findings.
    #[test]
    fn test_validate_rejects_findings_count_exceeding_max_findings() {
        // Build AgentOutput with 101 findings (default max is 100)
        // Assert Err with message containing "findings"
    }

    /// Validate rejects finding title exceeding max_title_len.
    #[test]
    fn test_validate_rejects_finding_title_exceeding_max_title_len() {
        // Build Finding with title = "x".repeat(501) (default max is 500)
        // Assert Err
    }

    /// Validate rejects finding detail exceeding max_detail_len.
    #[test]
    fn test_validate_rejects_finding_detail_exceeding_max_detail_len() {
        // Build Finding with detail = "x".repeat(10_001) (default max is 10_000)
        // Assert Err
    }

    /// Validate accepts valid AgentOutput with all fields within limits.
    #[test]
    fn test_validate_accepts_valid_agent_output() {
        // Build valid AgentOutput with reasonable values for all fields
        // Assert validator.validate(&output) is Ok(())
    }

    /// strip_zero_width removes Unicode category Cf characters.
    #[test]
    fn test_strip_zero_width_removes_cf_category_characters() {
        // Create Validator, call strip_zero_width on string with mixed content
        // Assert zero-width chars removed, normal chars preserved
    }
}
```

## Implementation Details (Green Phase)

### `ValidationLimits` Struct

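A `#[non_exhaustive]` configuration struct with default values for all validation thresholds. Outside `magi-core`, `#[non_exhaustive]` forbids building the struct with a struct literal, including `..Default::default()`. Callers must therefore start from `ValidationLimits::default()` and assign the fields they want to change, which only works if the fields are `pub`.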

- **Fields**:
  - `max_findings: usize` -- maximum number of findings per agent output (default: `100`)
  - `max_title_len: usize` -- maximum character length for finding titles (default: `500`)
  - `max_detail_len: usize` -- maximum character length for finding details (default: `10_000`)
  - `max_text_len: usize` -- maximum character length for text fields (summary, reasoning, recommendation) (default: `50_000`)
  - `confidence_min: f64` -- minimum valid confidence value, inclusive (default: `0.0`)
  - `confidence_max: f64` -- maximum valid confidence value, inclusive (default: `1.0`)
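- **Derives**: `Debug`, `Clone`, `PartialEq` (`PartialEq` lets `test_validator_new_creates_with_default_limits` compare against `ValidationLimits::default()` with `assert_eq!`; `Eq` cannot be derived because of the `f64` fields)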
- **Implements**: `Default` with the values specified above
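A minimal sketch of `ValidationLimits` with the defaults above, assuming `pub` fields and a `PartialEq` derive (neither is fixed by the original list). The helper `limits_with_max_findings` is hypothetical. It shows the only construction pattern available to other crates, since `#[non_exhaustive]` rules out struct literals there.

```rust
/// Thresholds applied by the validator (sketch; `pub` fields assumed).
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)] // PartialEq assumed, for assert_eq! in tests
pub struct ValidationLimits {
    pub max_findings: usize,
    pub max_title_len: usize,
    pub max_detail_len: usize,
    pub max_text_len: usize,
    pub confidence_min: f64,
    pub confidence_max: f64,
}

impl Default for ValidationLimits {
    fn default() -> Self {
        // A struct literal is allowed here because this is the defining crate.
        Self {
            max_findings: 100,
            max_title_len: 500,
            max_detail_len: 10_000,
            max_text_len: 50_000,
            confidence_min: 0.0,
            confidence_max: 1.0,
        }
    }
}

/// Hypothetical helper: start from the defaults and overwrite one field,
/// the pattern an external crate has to use with a non-exhaustive struct.
pub fn limits_with_max_findings(max_findings: usize) -> ValidationLimits {
    let mut limits = ValidationLimits::default();
    limits.max_findings = max_findings;
    limits
}
```

The same pattern covers the custom-limits test: build the defaults, set `max_findings = 5`, then pass the value to `Validator::with_limits`.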

### `Validator` Struct

Holds validation state that is expensive to construct (precompiled regex) and reused across calls.

- **Fields**:
  - `limits: ValidationLimits` -- the active validation limits
  - `zero_width_pattern: Regex` -- precompiled regex for Unicode category Cf characters
- **Constructors**:
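  - `new() -> Self` -- creates with `ValidationLimits::default()` and compiles the zero-width regex once
  - `with_limits(limits: ValidationLimits) -> Self` -- creates with custom limits and compiles the regex
  - Note: `Regex::new` returns `Result<Regex, regex::Error>`, while the constraints below forbid `unwrap()` and `expect()` outside tests. Either the constructors return `Result<Self, MagiError>`, or this known-valid constant pattern needs an explicit, documented exception.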
- **Public method**:
  - `validate(&self, output: &AgentOutput) -> Result<(), MagiError>` -- calls sub-validators in order: confidence, summary, reasoning, recommendation, findings. Returns on first failure.
- **Private methods**:
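  - `validate_confidence(&self, confidence: f64) -> Result<(), MagiError>` -- checks `confidence_min <= confidence <= confidence_max`; writing it as `(min..=max).contains(&confidence)` also rejects `NaN`, which a `confidence < min || confidence > max` test would accept
  - `validate_text_field(&self, field_name: &str, value: &str) -> Result<(), MagiError>` -- checks `value.chars().count() <= limits.max_text_len` (`str::len()` counts UTF-8 bytes, not the characters that the limits and error messages refer to)
  - `validate_findings(&self, findings: &[Finding]) -> Result<(), MagiError>` -- checks `findings.len() <= limits.max_findings`, then validates each finding
  - `validate_finding(&self, finding: &Finding) -> Result<(), MagiError>` -- checks, in order, title length, that the stripped title is not empty, and detail length (lengths counted in characters)
  - `strip_zero_width(&self, text: &str) -> String` -- returns `self.zero_width_pattern.replace_all(text, "").into_owned()` (`replace_all` yields a `Cow<str>`, so it must be converted to `String`)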
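Two details of the private checks are easy to get wrong in Rust. `str::len()` counts UTF-8 bytes rather than characters, and a naive `c < min || c > max` test lets `NaN` through because every comparison with `NaN` is false. The sketch below isolates both as hypothetical free functions (`within_char_limit`, `confidence_in_range`, `naive_out_of_range`); they are illustrations, not part of the specified API.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: true if `value` has at most `max` Unicode scalar
/// values, which is what "characters" most plausibly means in the spec.
pub fn within_char_limit(value: &str, max: usize) -> bool {
    value.chars().count() <= max
}

/// Hypothetical helper: inclusive range check. `RangeInclusive::contains`
/// evaluates `min <= c && c <= max`, so it returns false for NaN.
pub fn confidence_in_range(confidence: f64, min: f64, max: f64) -> bool {
    let range: RangeInclusive<f64> = min..=max;
    range.contains(&confidence)
}

/// Naive out-of-range test shown for contrast: both comparisons are
/// false for NaN, so NaN is NOT flagged as out of range.
pub fn naive_out_of_range(confidence: f64, min: f64, max: f64) -> bool {
    confidence < min || confidence > max
}
```

For example, `"\u{e9}".repeat(3)` is 3 characters but 6 bytes, so a byte-based check would reject it against a limit of 3 or 5.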

### Regex Pattern

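The zero-width character regex should match Unicode category Cf (format) characters. The simplest pattern is the Unicode class `\p{Cf}`, which the `regex` crate supports and which matches every Cf code point, including those listed below. If the pattern is instead written as explicit ranges, it must cover at minimum: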

- U+00AD (soft hyphen)
- U+0600-U+0605, U+061C, U+06DD, U+070F, U+08E2 (Arabic format chars)
- U+180E (Mongolian vowel separator)
- U+200B-U+200F (zero-width space, ZWNJ, ZWJ, directional marks)
- U+202A-U+202E (directional formatting)
- U+2060-U+2064, U+2066-U+206F (invisible operators)
- U+FEFF (byte order mark / zero-width no-break space)
- U+FFF9-U+FFFB (interlinear annotation anchors)
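The listed ranges can be checked without the `regex` crate using a std-only `char` predicate. The sketch below mirrors exactly the list above and is a stand-in for illustration, not the regex the spec mandates. With `regex`, `\p{Cf}` would also match Cf code points this list omits, such as U+E0001 and U+E0020-U+E007F. Both `is_listed_format_char` and this free-function `strip_zero_width` are hypothetical.

```rust
/// Hypothetical predicate: true for the Cf code points listed in the
/// "Regex Pattern" section (a subset of the full Unicode Cf category).
pub fn is_listed_format_char(c: char) -> bool {
    matches!(
        c,
        '\u{00AD}'
            | '\u{0600}'..='\u{0605}'
            | '\u{061C}'
            | '\u{06DD}'
            | '\u{070F}'
            | '\u{08E2}'
            | '\u{180E}'
            | '\u{200B}'..='\u{200F}'
            | '\u{202A}'..='\u{202E}'
            | '\u{2060}'..='\u{2064}'
            | '\u{2066}'..='\u{206F}'
            | '\u{FEFF}'
            | '\u{FFF9}'..='\u{FFFB}'
    )
}

/// Std-only stand-in for `zero_width_pattern.replace_all(text, "")`:
/// keeps every char that is not in the listed ranges.
pub fn strip_zero_width(text: &str) -> String {
    text.chars().filter(|&c| !is_listed_format_char(c)).collect()
}
```

A predicate like this is also a convenient oracle in `test_strip_zero_width_removes_cf_category_characters` for checking that the regex removes at least these code points.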

### Validation Order and Error Messages

The `validate` method calls sub-validators in this order:
1. `validate_confidence(output.confidence)` -- error: `"confidence {value} is out of range [{min}, {max}]"`
2. `validate_text_field("summary", &output.summary)` -- error: `"summary exceeds maximum length of {max} characters"`
3. `validate_text_field("reasoning", &output.reasoning)` -- same pattern
4. `validate_text_field("recommendation", &output.recommendation)` -- same pattern
5. `validate_findings(&output.findings)` -- errors:
   - count: `"findings count {count} exceeds maximum of {max}"`
   - title length: `"finding title exceeds maximum length of {max} characters"`
   - empty stripped title: `"finding title is empty after removing zero-width characters"`
   - detail length: `"finding detail exceeds maximum length of {max} characters"`

The method returns on the first validation failure (fail-fast). Each error message includes the field name to aid diagnostics.
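The fail-fast order and messages above can be sketched with std only. This assumes simplified, hypothetical stand-ins for `MagiError`, `AgentOutput`, and `Finding` that model only the fields read here; the real Section 01/02 types may carry more. The regex field is replaced by an abbreviated `char` check so the block runs without the `regex` crate, and lengths are counted with `chars().count()`.

```rust
// Hypothetical, simplified stand-ins for the Section 01/02 types (only the
// fields this validator reads). The precompiled regex field is omitted.
#[derive(Debug, PartialEq)]
pub enum MagiError {
    Validation(String),
}
pub struct Finding {
    pub title: String,
    pub detail: String,
}
pub struct AgentOutput {
    pub confidence: f64,
    pub summary: String,
    pub reasoning: String,
    pub recommendation: String,
    pub findings: Vec<Finding>,
}
pub struct ValidationLimits {
    pub max_findings: usize,
    pub max_title_len: usize,
    pub max_detail_len: usize,
    pub max_text_len: usize,
    pub confidence_min: f64,
    pub confidence_max: f64,
}
pub struct Validator {
    limits: ValidationLimits,
}

impl Validator {
    pub fn new() -> Self {
        let limits = ValidationLimits {
            max_findings: 100,
            max_title_len: 500,
            max_detail_len: 10_000,
            max_text_len: 50_000,
            confidence_min: 0.0,
            confidence_max: 1.0,
        };
        Self { limits }
    }

    /// Fail-fast: each `?` returns the first error in the documented order.
    pub fn validate(&self, output: &AgentOutput) -> Result<(), MagiError> {
        self.validate_confidence(output.confidence)?;
        self.validate_text_field("summary", &output.summary)?;
        self.validate_text_field("reasoning", &output.reasoning)?;
        self.validate_text_field("recommendation", &output.recommendation)?;
        self.validate_findings(&output.findings)
    }

    fn validate_confidence(&self, value: f64) -> Result<(), MagiError> {
        let (min, max) = (self.limits.confidence_min, self.limits.confidence_max);
        // `contains` is false for NaN, so NaN is reported as out of range.
        if (min..=max).contains(&value) {
            return Ok(());
        }
        Err(MagiError::Validation(format!("confidence {value} is out of range [{min}, {max}]")))
    }

    fn validate_text_field(&self, field_name: &str, value: &str) -> Result<(), MagiError> {
        let max = self.limits.max_text_len;
        if value.chars().count() > max {
            return Err(MagiError::Validation(format!("{field_name} exceeds maximum length of {max} characters")));
        }
        Ok(())
    }

    fn validate_findings(&self, findings: &[Finding]) -> Result<(), MagiError> {
        let (count, max) = (findings.len(), self.limits.max_findings);
        if count > max {
            return Err(MagiError::Validation(format!("findings count {count} exceeds maximum of {max}")));
        }
        findings.iter().try_for_each(|finding| self.validate_finding(finding))
    }

    fn validate_finding(&self, finding: &Finding) -> Result<(), MagiError> {
        let (max_title, max_detail) = (self.limits.max_title_len, self.limits.max_detail_len);
        if finding.title.chars().count() > max_title {
            return Err(MagiError::Validation(format!("finding title exceeds maximum length of {max_title} characters")));
        }
        // Abbreviated subset of the listed Cf ranges; `all` is also true for
        // an empty title, which is rejected the same way.
        let zero_width = |c: char| matches!(c, '\u{00AD}' | '\u{200B}'..='\u{200F}' | '\u{2060}'..='\u{2064}' | '\u{FEFF}');
        if finding.title.chars().all(zero_width) {
            return Err(MagiError::Validation("finding title is empty after removing zero-width characters".to_string()));
        }
        if finding.detail.chars().count() > max_detail {
            return Err(MagiError::Validation(format!("finding detail exceeds maximum length of {max_detail} characters")));
        }
        Ok(())
    }
}
```

Under this ordering, an output with both an out-of-range confidence and an over-long `reasoning` reports only the confidence error.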

### `lib.rs` Module Declaration

Add `pub mod validate;` to `src/lib.rs`.

## Constraints Checklist

- No `panic!`, `unwrap()`, `expect()` outside `#[cfg(test)]`
- No `unsafe`
- All public types and methods have `///` Rustdoc
- Error messages include field name and actual/max values for diagnostics
- `Regex` is compiled once in the constructor, not per validation call
- `#[non_exhaustive]` on `ValidationLimits`
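- `cargo fmt --check` and `cargo clippy --tests -- -D warnings` clean (under `-D warnings`, clippy's `new_without_default` lint fails the build unless `Validator` also implements `Default`, as long as `new()` takes no arguments and returns `Self`)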
- File header present on all new files

## Refactor Phase Notes

After Green phase passes all tests:

- Verify error messages are consistent in style (lowercase after field name, include actual and limit values)
- Consider whether `strip_zero_width` should be public (currently private -- only needed by `validate_finding`)
- Add Rustdoc `///` on the struct, constructor, and validate method
- Confirm `cargo doc --no-deps` generates clean documentation
- Verify that the regex pattern is comprehensive for Unicode Cf category