versa_semval 0.12.0

Cross-platform module for semantic validation of Versa data
# CLAUDE.md - Semval Repository Context

This file provides comprehensive context about the semval repository for AI assistants and future maintainers.

## Repository Overview

**Semval** is a semantic validation library for Versa receipts, written in Rust with Node.js bindings via NAPI-RS. It validates that receipt data conforms to business logic rules beyond what JSON schema validation can achieve.

### Architecture

- **Language**: Rust (core) + TypeScript (tests/bindings)
- **Distribution**: npm package with native modules for multiple platforms
- **Purpose**: Semantic validation of Versa receipt data structures
- **Integration**: Used by the Versa custodial service and the official Docker images, and in Next.js serverless functions that run validation from JavaScript for the Interactive Studio

## Core Components

### src/lib.rs
- Main entry point exposing `run_semantic_validation()` function
- Handles Node.js feature gating with `#[cfg(feature = "nodejs")]`
- Returns `SemanticValidationOutput` with violations and optional errors

### src/validation.rs
- Orchestrates rule execution
- Applies all registered rules to input data
- Collects and returns validation violations

### src/rules.rs
- Central registry of all validation rules
- Each rule has: name, description, and validation function
- Currently implements 7 rules:
  1. `header_paid_should_be_sum_of_payments`
  2. `all_headers_should_have_tz`
  3. `flight_fare_at_ticket_or_segment_exclusively`
  4. `header_total_should_be_sum_of_all_line_items_taxes_and_adjustments`
  5. `itemization_should_should_not_be_empty_if_totals_greater_than_zero`
  6. `schema_version_should_be_current`
  7. `subtotal_should_equal_sum_of_line_items`
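
As a rough illustration of the registry shape described above (name, description, validation function), the sketch below uses only the standard library. The `Receipt` struct, the exact field types of `Rule`, and the `run_rules` helper are assumptions made for this example; the real rules operate on `serde_json::Value`, and the check shown is inferred from the rule's name rather than copied from the source.

```rust
// Hypothetical stand-in for parsed receipt data (the real rules take JSON).
struct Receipt {
  paid: i64,
  payments: Vec<i64>,
}

// Assumed shape of the violation type; only `details` is mentioned in this file.
#[derive(Debug, PartialEq)]
struct ViolationDetails {
  details: Option<String>,
}

// A registry entry: name, description, and validation function.
struct Rule {
  name: &'static str,
  description: &'static str,
  check: fn(&Receipt) -> Option<ViolationDetails>,
}

// Behaviour inferred from the rule name: header.paid should equal the payment sum.
fn header_paid_should_be_sum_of_payments(r: &Receipt) -> Option<ViolationDetails> {
  let sum: i64 = r.payments.iter().sum();
  if r.paid == sum {
    None
  } else {
    Some(ViolationDetails {
      details: Some(format!("paid is {} but payments sum to {}", r.paid, sum)),
    })
  }
}

// Illustrative registry with a single entry.
fn get_rules() -> Vec<Rule> {
  vec![Rule {
    name: "header_paid_should_be_sum_of_payments",
    description: "header.paid should equal the sum of all payments",
    check: header_paid_should_be_sum_of_payments,
  }]
}

// Apply every registered rule and collect (rule name, violation) pairs.
fn run_rules(receipt: &Receipt) -> Vec<(&'static str, ViolationDetails)> {
  get_rules()
    .into_iter()
    .filter_map(|rule| (rule.check)(receipt).map(|v| (rule.name, v)))
    .collect()
}
```

Storing plain function pointers keeps each rule independent, so registering a new rule only means adding one entry to the vector.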

### src/model.rs
- Data structures for validation output
- `ViolationDetails` and `SemanticValidationOutput` types
- Shared between Rust and TypeScript via serialization

## Validation Rules Detail

### Rule: subtotal_should_equal_sum_of_line_items
**File**: `src/subtotal.rs`
**Purpose**: Validates that header.subtotal equals sum of all line item amounts
**Complexity**: Handles all 7 itemization types with different field structures

**Itemization Types Supported**:
- `general`: items[].amount
- `ecommerce`: shipments[].items[].amount + invoice_level_line_items[].amount  
- `lodging`: items[].amount
- `car_rental`: items[].amount
- `subscription`: subscription_items[].amount
- `flight`: tickets[].fare OR sum of segments[].fare
- `transit_route`: transit_route_items[].fare
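
The per-type field paths above can be sketched as a single summation function. This is a minimal std-only illustration, not the code in `src/subtotal.rs`: the `Itemization` enum, the `Ticket` struct, and both function names are hypothetical, and it assumes flight segments are nested under their ticket.

```rust
// Hypothetical typed view of the seven itemization shapes listed above.
// Each i64 stands for one item's `amount` (or `fare`) value.
enum Itemization {
  General { items: Vec<i64> },
  Ecommerce { shipments: Vec<Vec<i64>>, invoice_level_line_items: Vec<i64> },
  Lodging { items: Vec<i64> },
  CarRental { items: Vec<i64> },
  Subscription { subscription_items: Vec<i64> },
  Flight { tickets: Vec<Ticket> },
  TransitRoute { transit_route_items: Vec<i64> },
}

// A ticket carries a fare either at the ticket level or on its segments.
struct Ticket {
  fare: Option<i64>,
  segment_fares: Vec<i64>,
}

fn line_item_sum(itemization: &Itemization) -> i64 {
  match itemization {
    Itemization::General { items }
    | Itemization::Lodging { items }
    | Itemization::CarRental { items } => items.iter().sum(),
    Itemization::Ecommerce { shipments, invoice_level_line_items } => {
      // Items live under shipments[].items[], plus invoice-level line items.
      let shipped: i64 = shipments.iter().flatten().sum();
      shipped + invoice_level_line_items.iter().sum::<i64>()
    }
    Itemization::Subscription { subscription_items } => subscription_items.iter().sum(),
    Itemization::Flight { tickets } => tickets
      .iter()
      // Use the ticket fare when present, otherwise the sum of segment fares.
      .map(|t| t.fare.unwrap_or_else(|| t.segment_fares.iter().sum()))
      .sum(),
    Itemization::TransitRoute { transit_route_items } => transit_route_items.iter().sum(),
  }
}

fn subtotal_matches(subtotal: i64, itemization: &Itemization) -> bool {
  subtotal == line_item_sum(itemization)
}
```

For example, an ecommerce receipt with shipments of `[500, 250]` and `[100]` plus one invoice-level line item of `50` should carry a subtotal of `900`.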

### Rule: header_total_should_be_sum_of_all_line_items_taxes_and_adjustments  
**File**: `src/total.rs`
**Purpose**: Validates header.total = line_items + taxes + adjustments + invoice_level_adjustments
**Complexity**: Most complex rule, handles all itemization types + tax/adjustment summation

**Key Schema Compliance Notes**:
- Uses `amount` field for most items (NOT `total`)
- Uses `fare` field for flight tickets/segments and transit_route_items
- Ecommerce items nested under `shipments[].items[]`
- Transit route uses `transit_route_items` field (NOT `items`)
- Subscription uses `subscription_items` field
- Tax/adjustment amounts always use `amount` field
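
A worked version of the total formula may help when debugging mismatches. The sketch below assumes the four components have already been extracted into flat lists and that adjustments are added with whatever sign they carry in the data; the `Totals` struct and `check_total` function are hypothetical names, not part of `src/total.rs`.

```rust
// Hypothetical flattened inputs; the real rule reads these from receipt JSON.
struct Totals {
  total: i64,
  line_items: Vec<i64>,
  taxes: Vec<i64>,
  adjustments: Vec<i64>,
  invoice_level_adjustments: Vec<i64>,
}

// Returns None when header.total matches, or a message with both values.
fn check_total(t: &Totals) -> Option<String> {
  let expected: i64 = t.line_items.iter().sum::<i64>()
    + t.taxes.iter().sum::<i64>()
    + t.adjustments.iter().sum::<i64>()
    + t.invoice_level_adjustments.iter().sum::<i64>();
  if t.total == expected {
    None
  } else {
    Some(format!(
      "header.total is {} but components sum to {}",
      t.total, expected
    ))
  }
}
```

With line items of `1000` and `500`, a tax of `120`, an adjustment of `-100`, and an invoice-level adjustment of `-20`, the expected total is `1500`.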

### Rule: schema_version_should_be_current
**File**: `src/schema_version.rs`  
**Purpose**: Validates schema_version field matches current expected version
**Implementation**: Simple string comparison
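
A string comparison of this kind might look like the sketch below. The function name, the message wording, and the handling of a missing field are assumptions; the expected version is passed in as a parameter because the current version string is not stated in this file.

```rust
// Hypothetical helper: `found` is the receipt's schema_version, if present.
fn check_schema_version(found: Option<&str>, expected: &str) -> Option<String> {
  match found {
    // Exact string match: no violation.
    Some(v) if v == expected => None,
    Some(v) => Some(format!("schema_version is {} but expected {}", v, expected)),
    // Assumption: a missing field is reported rather than silently accepted.
    None => Some("schema_version is missing".to_string()),
  }
}
```

Because the comparison is exact, a receipt on an older or newer version than the expected one is flagged in the same way.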

## Schema Compliance

**Critical**: All validation logic MUST conform to `/Users/thomas/versa/schema/data/receipt.schema.json`

### Common Schema Gotchas:
1. **Item amounts**: Use the `amount` field, NOT `total`; fares are the exception and use `fare`
2. **Ecommerce structure**: Items nested under `shipments[].items[]`
3. **Transit route**: Uses `transit_route_items[]` with `fare` field
4. **Flight tickets**: No `subtotal` field exists - use `fare` only
5. **Subscription**: Uses `subscription_items[]` field

## Testing Strategy

### Rust Unit Tests (`#[cfg(test)]`)
- Each rule module has comprehensive unit tests
- Tests cover valid/invalid cases for all itemization types
- Use `serde_json::json!` macro for test data

### TypeScript Integration Tests (`__test__/`)
- Test Node.js bindings via NAPI-RS
- Files: `index.spec.ts`, `flight_rules.spec.ts`, `subtotal.spec.ts`, etc.
- Use AVA test framework with TypeScript
- Run `pnpm build` before `pnpm test`

### Test Data
- `__test__/test_data/` contains fixture files
- `flight_good.json` - canonical example of proper schema structure
- Always ensure test data conforms to actual Versa schema

## Build & Distribution

### Rust Build
```bash
cargo test          # Run Rust tests
cargo build         # Build Rust library
```

### Node.js Build  
```bash
npm test           # Run TypeScript tests + Rust compilation
npm run build      # Build release native modules
```

### Platform Support
- Extensive platform matrix via NAPI-RS
- Pre-built binaries for: macOS (ARM64/x64), Linux (multiple architectures), Windows, Android, FreeBSD
- Published to npm as `@versaprotocol/semval`

## Development Workflow

### Code Style
- Rust: Uses `rustfmt.toml` configuration
- Auto-formatting via `fmt_on_commit.sh` git hook
- Clippy linting enabled with `#![deny(clippy::all)]`

### Git Hooks
- Pre-commit formatting via `rusty-hook.toml`
- TODO: Conventional commit enforcement (mentioned in README)

## Common Patterns

### Adding New Validation Rules

1. **Create rule module**: `src/my_rule.rs`
2. **Implement function**: `pub fn my_rule(data: &serde_json::Value) -> Option<ViolationDetails>`
3. **Add to lib.rs**: `mod my_rule;` 
4. **Register in rules.rs**: Add to `get_rules()` function
5. **Add tests**: Both Rust unit tests and TypeScript integration tests

### Schema Field Access Pattern
```rust
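use serde_json::Value;

// Safe field access with defaults: a missing or non-integer field reads as 0
fn get_i64(value: &Value, key: &str) -> i64 {
  value.get(key).and_then(|v| v.as_i64()).unwrap_or(0)
}

// Check itemization type, then walk its items
// (`sum_general_items` is an illustrative name, not a function in this repo)
fn sum_general_items(itemization: &Value) -> i64 {
  let mut sum = 0;
  if let Some(general) = itemization.get("general") {
    if let Some(items) = general.get("items").and_then(|i| i.as_array()) {
      for item in items {
        sum += get_i64(item, "amount");
      }
    }
  }
  sum
}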
```

### Error Handling Philosophy
- Return `Option<ViolationDetails>` from rule functions
- `None` = validation passed
- `Some(ViolationDetails { details: Some(msg) })` = validation failed with details
- Use `unwrap_or(0)` for missing numeric fields (treat as zero)
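
The conventions above can be sketched with std-only types. The `ViolationDetails` shape is assumed from the bullet above, and `subtotal_rule` is a hypothetical function that takes already-extracted optional values instead of JSON.

```rust
// Assumed shape; the real type lives in src/model.rs and may have more fields.
#[derive(Debug, PartialEq)]
struct ViolationDetails {
  details: Option<String>,
}

// Hypothetical rule: `None` inputs model fields that are absent from the receipt.
fn subtotal_rule(subtotal: Option<i64>, item_amounts: &[Option<i64>]) -> Option<ViolationDetails> {
  // Missing numeric fields are treated as zero.
  let expected = subtotal.unwrap_or(0);
  let calculated: i64 = item_amounts.iter().map(|a| a.unwrap_or(0)).sum();
  if calculated == expected {
    None // validation passed
  } else {
    Some(ViolationDetails {
      details: Some(format!(
        "calculated {} but header.subtotal is {}",
        calculated, expected
      )),
    })
  }
}
```

Note that a receipt with no subtotal and no item amounts passes this check, since both sides default to zero.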

## Known Issues & Technical Debt

1. **DRY Violation**: Itemization processing logic duplicated across rules
2. **Limited Error Context**: `unwrap_or(0)` may hide data issues  
3. **Missing Schema Validation**: No runtime schema validation before semantic rules
4. **Hardcoded Constants**: Field names scattered throughout code
5. **Test Coverage**: No automated coverage reporting

## Future Improvements (from README)

- Schema version awareness and stricter use of types
- Rule-test mapping validation in CI
- Conventional commit enforcement  
- Standardized violation detail formatting
- Optional quality scoring for data completeness
- WASM support for browser usage

## Integration Context

This library is used by:
- Versa custodial service (Rust backend)
- Official Versa Docker images  
- Node.js applications processing Versa receipts

The validation runs after JSON schema validation but before business logic processing.

## Debugging Tips

1. **Test with real schema**: Always validate test data against `/Users/thomas/versa/schema/data/receipt.schema.json`
2. **Check field names**: Common mistakes involve `total` vs `amount` vs `fare`
3. **Itemization paths**: Verify correct nesting (e.g., ecommerce shipments)
4. **Run both test suites**: Rust and TypeScript tests catch different issues
5. **Use violation details**: Error messages include calculated vs expected values


## Dependencies

- `serde_json`: JSON parsing and manipulation
- `versa`: Versa protocol types and utilities  
- `napi`: Node.js native module bindings
- Development: `ava`, `typescript`, `prettier`

---

*Last updated: June 2025*
*Maintainer context: This repository has been actively developed with proper semantic validation rules that are schema-compliant and thoroughly tested.*