# nautilus-schema
Schema language parser, validator, IR builder, and formatter for the Nautilus ORM.
## Overview
This crate implements the full processing pipeline for `.nautilus` schema files:
1. **Lex** — Converts source text into typed tokens with byte-offset span tracking.
2. **Parse** — Recursive descent parser builds a strongly-typed [`ast::Schema`] AST.
3. **Validate** — Multi-pass semantic validator resolves types, relations, and constraints, emitting a fully resolved [`ir::SchemaIr`].
4. **Format** — Renders an AST back to canonical source text (idempotent).
Editor tooling (LSP, CLI) can use the one-shot [`analyze`] function, which runs the lex, parse, and validate stages and returns structured diagnostics.
## Features
- **Parser** — datasources, generators, models, enums, all field types, all attributes, error recovery
- **Validator** — duplicate name detection, unknown type resolution, relation integrity, default value type checking, physical name collision detection
- **IR** — fully resolved intermediate representation with logical and physical names
- **Analysis API** — `analyze()`, `completion()`, `hover()`, `goto_definition()` for editor integration
- **Formatter** — canonical output, column-aligned, idempotent
- **Visitor** — trait-based AST traversal with default walk implementations
- **Span Tracking** — every AST node carries byte-offset spans for precise diagnostics
- **Error Recovery** — parser continues after errors to surface multiple issues at once
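As an illustration of error recovery and validation working together, a schema like the following would surface multiple diagnostics in a single `analyze` call (a sketch; the exact messages are up to the validator):

```prisma
model User {
  id   Int  @id
  role Role       // unknown type: no enum or model named `Role`
}

model User {      // duplicate name: `User` is already declared
  id Int @id
}
```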
## Quick Start
Add to your `Cargo.toml`:
```toml
[dependencies]
nautilus-schema = { path = "../nautilus-schema" }
```
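The Rust snippets below operate on a `source: &str` holding schema text, for example:

```prisma
model User {
  id    Int    @id @default(autoincrement())
  email String @unique
}
```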
### One-shot analysis
The recommended entry point. Runs lex → parse → validate in a single call and
collects every diagnostic:
```rust
use nautilus_schema::analyze;

// `source` is the `.nautilus` schema text as a &str.
let result = analyze(source);
for diag in &result.diagnostics {
    eprintln!("{:?} — {}", diag.severity, diag.message);
}
if let Some(ir) = &result.ir {
    println!("{} model(s) validated", ir.models.len());
}
```
### Validate and obtain the IR directly
```rust
use nautilus_schema::{Lexer, Parser, validate_schema, TokenKind};

let mut lexer = Lexer::new(source);
let mut tokens = Vec::new();
loop {
    let token = lexer.next_token()?;
    let eof = matches!(token.kind, TokenKind::Eof);
    tokens.push(token);
    if eof { break; }
}
let ast = Parser::new(&tokens).parse_schema()?;
let ir = validate_schema(ast)?;
println!("{} model(s)", ir.models.len());
```
### Editor tooling API
The `analysis` module exposes higher-level functions for LSP servers and CLI tools:
```rust
use nautilus_schema::{analyze, completion, hover, goto_definition};

// Completions at a byte offset
let items = completion(source, offset);

// Hover documentation at a byte offset
if let Some(info) = hover(source, offset) {
    println!("{}", info.content);
}

// Jump to the declaration that the symbol at a byte offset refers to
if let Some(span) = goto_definition(source, offset) {
    println!("definition at {}..{}", span.start, span.end);
}
```
### Formatting
`format_schema` renders an AST back to canonical, column-aligned source text:
```rust
use nautilus_schema::{analyze, format_schema};

let result = analyze(source);
if let Some(ast) = &result.ast {
    let formatted = format_schema(ast, source);
    std::fs::write("schema.nautilus", formatted)?;
}
```
## Visitor Pattern
Implement custom visitors for AST traversal and analysis:
```rust
use nautilus_schema::{
    ast::*,
    visitor::{Visitor, walk_model},
    Result,
};

struct ModelCounter {
    count: usize,
}

impl Visitor for ModelCounter {
    fn visit_model(&mut self, model: &ModelDecl) -> Result<()> {
        self.count += 1;
        walk_model(self, model) // Continue traversing children
    }
}

fn count_models(schema: &Schema) -> Result<usize> {
    let mut visitor = ModelCounter { count: 0 };
    visitor.visit_schema(schema)?;
    Ok(visitor.count)
}
```
### Common Visitor Patterns
**Collecting Information:**
```rust
struct FieldCollector {
    fields: Vec<String>,
}

impl Visitor for FieldCollector {
    fn visit_field(&mut self, field: &FieldDecl) -> Result<()> {
        self.fields.push(field.name.value.clone());
        Ok(())
    }
}
```
**Finding Specific Patterns:**
```rust
struct RelationFinder {
    relations: Vec<String>,
}

impl Visitor for RelationFinder {
    fn visit_field(&mut self, field: &FieldDecl) -> Result<()> {
        if field.has_relation_attribute() {
            self.relations.push(field.name.value.clone());
        }
        Ok(())
    }
}
```
**Validation:**
```rust
struct NameValidator {
    errors: Vec<String>,
}

impl Visitor for NameValidator {
    fn visit_model(&mut self, model: &ModelDecl) -> Result<()> {
        // `chars().next()` is None for an empty name, so avoid unwrap().
        let starts_upper = model
            .name
            .value
            .chars()
            .next()
            .map_or(false, |c| c.is_uppercase());
        if !starts_upper {
            self.errors
                .push(format!("Model {} should start with uppercase", model.name.value));
        }
        walk_model(self, model)
    }
}
```
## Schema Language
### Supported Declarations
**Datasource:**
```prisma
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}
```
```
**Generator:**
```prisma
generator client {
  provider  = "nautilus-client-rs"
  output    = "../generated" // Optional
  interface = "async"        // default: sync
}
```
```
**Enum:**
```prisma
enum Role {
  USER
  ADMIN
  MODERATOR
}
```
```
**Model:**
```prisma
model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique
  username  String   @map("user_name")
  role      Role     @default(USER)
  createdAt DateTime @default(now())
  posts     Post[]

  @@map("users")
  @@index([email])
}
```
```
### Field Types
**Scalar Types:**
- `String`, `Boolean`, `Int`, `BigInt`, `Float`
- `DateTime`, `Bytes`, `Json`, `Uuid`
- `Decimal(precision, scale)` - e.g., `Decimal(10, 2)`
**User Types:**
- Model references: `Post`, `User`
- Enum references: `Role`, `Status`
**Modifiers:**
- `?` - optional/nullable field
- `[]` - array/one-to-many relation
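Both modifiers in use, as a sketch (field names here are illustrative):

```prisma
model User {
  id       Int     @id
  nickname String? // optional: may be NULL
  posts    Post[]  // array: one-to-many relation to Post
}
```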
### Field Attributes
- `@id` - Primary key
- `@unique` - Unique constraint
- `@default(expr)` - Default value
  - Functions: `autoincrement()`, `uuid()`, `now()`
  - Literals: `0`, `"DEFAULT"`, `true`
  - Enum values: `USER`, `ACTIVE`
- `@map("physical_name")` - Physical column name
- `@relation(...)` - Foreign key relationship
```prisma
user User @relation(
  fields: [userId],
  references: [id],
  onDelete: Cascade,
  onUpdate: Restrict
)
```
### Model Attributes
- `@@map("table_name")` - Physical table name
- `@@id([field1, field2])` - Composite primary key
- `@@unique([field1, field2])` - Composite unique constraint
- `@@index([field1, field2])` - Database index; an index type may also be specified
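A sketch of the model attributes combined in one declaration (field and table names are illustrative):

```prisma
model Membership {
  userId Int
  orgId  Int
  email  String

  @@map("memberships")
  @@id([userId, orgId])
  @@unique([email, orgId])
  @@index([orgId])
}
```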
## Error Handling
The `analyze` function collects all diagnostics in one pass. Each `Diagnostic`
carries a byte-offset `span` that can be converted to line/column:
```rust
use nautilus_schema::{analyze, Severity};

let result = analyze(source);
for diag in &result.diagnostics {
    let (pos, _) = diag.span.to_positions(source);
    let label = match diag.severity {
        Severity::Error => "error",
        Severity::Warning => "warning",
    };
    eprintln!("{}:{}: {}", pos, label, diag.message);
}
```
For the lower-level `SchemaError` type, each variant with a span exposes
`format_with_file(filepath, source)` which emits the standard
`filepath:line:column: message` format recognised by VS Code.
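The byte-offset to line/column conversion behind this format can be sketched in plain Rust. This is an illustrative stand-in, not the crate's actual implementation, and `line_col` is a hypothetical helper:

```rust
// Convert a byte offset into 1-based (line, column) by scanning the
// source up to the offset and counting newlines. A stand-in for what
// `format_with_file` needs to do; not the crate's real code.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (i, ch) in source.char_indices() {
        if i >= offset {
            break;
        }
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}

fn main() {
    let source = "model User {\n  id Int\n}\n";
    // Byte offset 15 points at "id" on the second line.
    println!("{:?}", line_col(source, 15)); // → (2, 3)
}
```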
## Examples
Run the bundled examples:
```bash
cargo run --package nautilus-schema --example parse_schema
cargo run --package nautilus-schema --example tokenize_schema
cargo run --package nautilus-schema --example visitor_demo
```
## Grammar
See [GRAMMAR.md](GRAMMAR.md) for the complete EBNF grammar specification.
Key grammar structures:
```ebnf
Schema ::= Declaration* EOF
ModelDecl ::= 'model' Ident '{' (FieldDecl | ModelAttribute)* '}'
FieldDecl ::= Ident FieldType FieldModifier? FieldAttribute*
Expr ::= Literal | FunctionCall | Array | Ident
```
## Testing
Run all tests:
```bash
cargo test --package nautilus-schema
```
Run specific test suites:
```bash
cargo test --package nautilus-schema --test parser_tests
cargo test --package nautilus-schema --test visitor_tests
cargo test --package nautilus-schema --lib
```
Test coverage:
- **57 unit tests** embedded in the source modules
- **17** integration tests for the analysis API (`analysis_tests.rs`)
- **14** integration tests for the IR builder (`ir_tests.rs`)
- **23** integration tests for the lexer (`lexer_tests.rs`)
- **16** integration tests for the parser (`parser_tests.rs`)
- **24** integration tests for the validator (`validation_tests.rs`)
- **12** integration tests for the visitor (`visitor_tests.rs`)
## Documentation
Generate and view documentation:
```bash
cargo doc --package nautilus-schema --open
```
Documentation includes:
- Module-level overviews
- Comprehensive API docs
- Usage examples
- Grammar reference
## Architecture
```
nautilus-schema/
├── src/
│   ├── lib.rs              # Public API re-exports
│   ├── span.rs             # Byte-offset source location types
│   ├── token.rs            # Token types
│   ├── lexer.rs            # Tokenizer
│   ├── ast.rs              # Syntax AST node definitions
│   ├── parser.rs           # Recursive descent parser with error recovery
│   ├── error.rs            # SchemaError and Result alias
│   ├── diagnostic.rs       # Severity + Diagnostic (stable public contract)
│   ├── validator.rs        # Multi-pass semantic validator
│   ├── ir.rs               # Validated intermediate representation
│   ├── visitor.rs          # Visitor trait and walk helpers
│   ├── formatter.rs        # Canonical source formatter
│   └── analysis.rs         # analyze / completion / hover / goto_definition
├── tests/
│   ├── lexer_tests.rs      # 23 lexer integration tests
│   ├── parser_tests.rs     # 16 parser integration tests
│   ├── validation_tests.rs # 24 validator integration tests
│   ├── ir_tests.rs         # 14 IR builder integration tests
│   ├── visitor_tests.rs    # 12 visitor integration tests
│   └── analysis_tests.rs   # 17 analysis API integration tests
├── examples/
│   ├── tokenize_schema.rs
│   ├── parse_schema.rs
│   └── visitor_demo.rs
├── GRAMMAR.md              # EBNF grammar specification
└── README.md
```
## Usage within the project
- **`nautilus-lsp`** — uses `analyze`, `completion`, `hover`, `goto_definition`, and `format_schema` to implement the language server.
- **`nautilus-codegen`** — calls `validate_schema` to obtain the `SchemaIr` from which it generates Rust types and SQL.
- **`nautilus-cli`** — calls `analyze` to surface diagnostics and `format_schema` for the format command.
## License
Licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](../../LICENSE-APACHE))
- MIT License ([LICENSE-MIT](../../LICENSE-MIT))
at your option.
## Contributing
This is part of the Nautilus ORM project. See the main repository for contribution guidelines.