forge-guardrails 0.1.2

Foundation types for an LLM-agent workflow framework
Documentation
# Guardrails Development & Safety Guidelines

When modifying files under [src/guardrails](file:///Users/whit3rabbit/Documents/GitHub/forge-rs/src/guardrails), you must respect our safety gates and model training structures:

## Gotchas & Rules

1. **Conservative Verifier Promotion**:
   - The classifier has three runtime modes: `shadow`, `advisory`, and `enforce`.
   - Never skip steps when promoting new models or label thresholds. Validate new models in `shadow` first, move to `advisory` only when replay telemetry matches performance targets, and transition to `enforce` only after explicit confirmation.
2. **Authoritative Deterministic Validation**:
   - Probabilistic predictions (e.g., `deterministic_invalid` label predictions or threshold checks) must never bypass or replace deterministic code validation.
   - If a step is blocked by deterministic rules in [step_enforcer.rs]file:///Users/whit3rabbit/Documents/GitHub/forge-rs/src/guardrails/step_enforcer.rs or fails JSON schema validation in [response_validator.rs]file:///Users/whit3rabbit/Documents/GitHub/forge-rs/src/guardrails/response_validator.rs, those failures are authoritative.
3. **Six-Label Schema Layout**:
   - Tool-call classifier artifacts must strictly map to these six labels in this exact order:
     1. `valid`
     2. `wrong_tool_semantic`
     3. `wrong_arguments_semantic`
     4. `tool_not_needed`
     5. `needs_clarification`
     6. `deterministic_invalid`
   - Modifying this layout breaks compatibility with upstream serialized ONNX models.
4. **Recovery Errors as wrong_arguments_semantic**:
   - When a model retries/recovers a tool invocation and calls the *correct* tool but fails semantic checks, that violation must be classified as `wrong_arguments_semantic` (not `wrong_tool_semantic`).

## Testing Target

When modifying guardrails files, run:

```bash
cargo test --package forge-guardrails --lib guardrails
```