nautilus-orm-schema 0.1.3

Schema parsing and validation for Nautilus ORM

nautilus-schema

Schema language parser, validator, IR builder, and formatter for the Nautilus ORM.

Overview

This crate implements the full processing pipeline for .nautilus schema files:

  1. Lex — Converts source text into typed tokens with byte-offset span tracking.
  2. Parse — Recursive descent parser builds a strongly-typed [ast::Schema] AST.
  3. Validate — Multi-pass semantic validator resolves types, relations, and constraints, emitting a fully resolved [ir::SchemaIr].
  4. Format — Renders an AST back to canonical source text (idempotent).

Editor tooling (LSP, CLI) can use the one-shot [analyze] function, which runs lexing, parsing, and validation in sequence and returns structured diagnostics.
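The span tracking that threads through all stages can be illustrated with a toy lexer, independent of this crate: every token records the byte range it was read from, so later stages can point diagnostics at exact source locations. The token set and API below are hypothetical, not this crate's Lexer or TokenKind.

```rust
// Toy illustration of the lex stage: each token carries a byte-offset
// span (start..end) into the source text. Sketch only; the real Lexer
// has a richer token set and error handling.
#[derive(Debug, PartialEq)]
enum Kind { Ident, LBrace, RBrace, Eof }

#[derive(Debug, PartialEq)]
struct Token { kind: Kind, start: usize, end: usize }

fn lex(src: &str) -> Vec<Token> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i] as char;
        if c.is_whitespace() {
            i += 1;
        } else if c == '{' {
            tokens.push(Token { kind: Kind::LBrace, start: i, end: i + 1 });
            i += 1;
        } else if c == '}' {
            tokens.push(Token { kind: Kind::RBrace, start: i, end: i + 1 });
            i += 1;
        } else {
            // Identifier: a run of non-space, non-brace bytes.
            let start = i;
            while i < bytes.len()
                && !(bytes[i] as char).is_whitespace()
                && bytes[i] != b'{'
                && bytes[i] != b'}'
            {
                i += 1;
            }
            tokens.push(Token { kind: Kind::Ident, start, end: i });
        }
    }
    tokens.push(Token { kind: Kind::Eof, start: bytes.len(), end: bytes.len() });
    tokens
}

fn main() {
    for t in lex("model User { }") {
        println!("{:?} @ {}..{}", t.kind, t.start, t.end);
    }
}
```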

Features

  • Parser — datasources, generators, models, enums, all field types, all attributes, error recovery
  • Validator — duplicate name detection, unknown type resolution, relation integrity, default value type checking, physical name collision detection
  • IR — fully resolved intermediate representation with logical and physical names
  • Analysis API — analyze(), completion(), hover(), goto_definition() for editor integration
  • Formatter — canonical output, column-aligned, idempotent
  • Visitor — trait-based AST traversal with default walk implementations
  • Span Tracking — every AST node carries byte-offset spans for precise diagnostics
  • Error Recovery — parser continues after errors to surface multiple issues at once
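The error-recovery idea in the list above can be sketched without reference to this crate's parser: when a declaration is malformed, record a diagnostic, skip ahead to a synchronization point, and keep going so a single pass surfaces several errors. The line-based grammar below is hypothetical.

```rust
// Sketch of panic-mode error recovery: parse `name = value` lines; on a
// malformed line, record an error and resynchronize at the next newline
// instead of aborting. Not this crate's actual parser.
fn parse_lines(src: &str) -> (Vec<(String, String)>, Vec<String>) {
    let mut pairs = Vec::new();
    let mut errors = Vec::new();
    for (n, line) in src.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        match line.split_once('=') {
            Some((k, v)) => pairs.push((k.trim().to_string(), v.trim().to_string())),
            // Recovery point: report the error, then continue with the next line.
            None => errors.push(format!("line {}: expected `name = value`", n + 1)),
        }
    }
    (pairs, errors)
}

fn main() {
    let (pairs, errors) = parse_lines("a = 1\noops\nb = 2");
    println!("parsed {} pair(s), {} error(s)", pairs.len(), errors.len());
}
```

Note that the second valid line is still parsed even though the line before it is broken; that is the property the real parser provides at declaration granularity.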

Quick Start

Add to your Cargo.toml:

[dependencies]
nautilus-schema = { path = "../nautilus-schema" }

One-shot analysis

The recommended entry point. Runs lex → parse → validate in a single call and collects every diagnostic:

use nautilus_schema::analyze;

let result = analyze(source);
for diag in &result.diagnostics {
    eprintln!("{:?}: {}", diag.severity, diag.message);
}
if let Some(ir) = &result.ir {
    println!("{} model(s) validated", ir.models.len());
}

Validate and obtain the IR directly

use nautilus_schema::{Lexer, Parser, validate_schema, TokenKind};

let mut lexer = Lexer::new(source);
let mut tokens = Vec::new();
loop {
    let token = lexer.next_token()?;
    let eof = matches!(token.kind, TokenKind::Eof);
    tokens.push(token);
    if eof { break; }
}
let ast = Parser::new(&tokens).parse_schema()?;
let ir  = validate_schema(ast)?;
println!("{} model(s)", ir.models.len());

Editor tooling API

The analysis module exposes higher-level functions for LSP servers and CLI tools:

use nautilus_schema::{analyze, completion, hover, goto_definition};

// Completions at a byte offset
let items = completion(source, offset);

// Hover documentation at a byte offset
if let Some(info) = hover(source, offset) {
    println!("{}", info.content);
}

// Jump to the declaration that the symbol at a byte offset refers to
if let Some(span) = goto_definition(source, offset) {
    println!("definition at {}..{}", span.start, span.end);
}

Formatting

format_schema renders an AST back to canonical, column-aligned source text:

use nautilus_schema::{analyze, format_schema};

let result = analyze(source);
if let Some(ast) = &result.ast {
    let formatted = format_schema(ast, source);
    std::fs::write("schema.nautilus", formatted)?;
}
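Idempotence means that formatting already-canonical text is a no-op, which can be checked with a simple round trip. A toy whitespace normalizer (not this crate's formatter) illustrates the property:

```rust
// Toy "formatter" that canonicalizes whitespace: one space between
// tokens on a line, trailing/leading runs collapsed. Used only to show
// the idempotence check format(format(s)) == format(s).
fn format_toy(src: &str) -> String {
    src.lines()
        .map(|l| l.split_whitespace().collect::<Vec<_>>().join(" "))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let once = format_toy("model   User  {\n  id   Int\n}");
    let twice = format_toy(&once);
    assert_eq!(once, twice); // idempotent: a second pass changes nothing
    println!("{once}");
}
```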

Visitor Pattern

Implement custom visitors for AST traversal and analysis:

use nautilus_schema::{
    ast::*,
    visitor::{Visitor, walk_model},
    Result,
};

struct ModelCounter {
    count: usize,
}

impl Visitor for ModelCounter {
    fn visit_model(&mut self, model: &ModelDecl) -> Result<()> {
        self.count += 1;
        walk_model(self, model) // Continue traversing children
    }
}

fn count_models(schema: &Schema) -> Result<usize> {
    let mut visitor = ModelCounter { count: 0 };
    visitor.visit_schema(schema)?;
    Ok(visitor.count)
}

Common Visitor Patterns

Collecting Information:

struct FieldCollector {
    fields: Vec<String>,
}

impl Visitor for FieldCollector {
    fn visit_field(&mut self, field: &FieldDecl) -> Result<()> {
        self.fields.push(field.name.value.clone());
        Ok(())
    }
}

Finding Specific Patterns:

struct RelationFinder {
    relations: Vec<String>,
}

impl Visitor for RelationFinder {
    fn visit_field(&mut self, field: &FieldDecl) -> Result<()> {
        if field.has_relation_attribute() {
            self.relations.push(field.name.value.clone());
        }
        Ok(())
    }
}

Validation:

struct NameValidator {
    errors: Vec<String>,
}

impl Visitor for NameValidator {
    fn visit_model(&mut self, model: &ModelDecl) -> Result<()> {
        // Avoid panicking on an empty name while still flagging bad casing.
        let starts_upper = model.name.value.chars().next().map_or(false, |c| c.is_uppercase());
        if !starts_upper {
            self.errors.push(format!("Model {} should start with uppercase", model.name.value));
        }
        walk_model(self, model)
    }
}

Schema Language

Supported Declarations

Datasource:

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

Generator:

generator client {
  provider = "nautilus-client-rs"
  output   = "../generated" // Optional
  interface = "async" // default: sync
}

Enum:

enum Role {
  USER
  ADMIN
  MODERATOR
}

Model:

model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique
  username  String   @map("user_name")
  role      Role     @default(USER)
  createdAt DateTime @default(now())
  
  posts     Post[]
  
  @@map("users")
  @@index([email])
}

Field Types

Scalar Types:

  • String, Boolean, Int, BigInt, Float
  • DateTime, Bytes, Json, Uuid
  • Decimal(precision, scale) - e.g., Decimal(10, 2)

User Types:

  • Model references: Post, User
  • Enum references: Role, Status

Modifiers:

  • ? - optional/nullable field
  • [] - array/one-to-many relation

Field Attributes

  • @id - Primary key
  • @unique - Unique constraint
  • @default(expr) - Default value
    • Functions: autoincrement(), uuid(), now()
    • Literals: 0, "DEFAULT", true
    • Enum values: USER, ACTIVE
  • @map("physical_name") - Physical column name
  • @relation(...) - Foreign key relationship
    user User @relation(
      fields: [userId],
      references: [id],
      onDelete: Cascade,
      onUpdate: Restrict
    )
    

Model Attributes

  • @@map("table_name") - Physical table name
  • @@id([field1, field2]) - Composite primary key
  • @@unique([field1, field2]) - Composite unique constraint
  • @@index([field1, field2]) - Database index; an index type can also be specified

Error Handling

The analyze function collects all diagnostics in one pass. Each Diagnostic carries a byte-offset span that can be converted to line/column:

use nautilus_schema::{analyze, Severity};

let result = analyze(source);
for diag in &result.diagnostics {
    let (pos, _) = diag.span.to_positions(source);
    let label = match diag.severity {
        Severity::Error   => "error",
        Severity::Warning => "warning",
    };
    eprintln!("{}: {}: {}", pos, label, diag.message);
}

For the lower-level SchemaError type, each variant with a span exposes format_with_file(filepath, source) which emits the standard filepath:line:column: message format recognised by VS Code.
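Converting a byte offset to line and column only requires counting newlines up to that offset. A standalone sketch of that conversion and of the filepath:line:column: message shape (the crate's Span and format_with_file handle this internally):

```rust
// Map a byte offset to a 1-based (line, column) pair by scanning the
// source. Standalone sketch of what span-to-position conversion does.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (i, c) in source.char_indices() {
        if i >= offset {
            break;
        }
        if c == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}

fn main() {
    let src = "model User {\n  id Int\n}";
    let (line, col) = line_col(src, 15); // byte offset of "id"
    // The editor-friendly format recognised by VS Code:
    eprintln!("schema.nautilus:{}:{}: error: example message", line, col);
}
```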

Examples

Run the bundled examples:

cargo run --package nautilus-schema --example parse_schema
cargo run --package nautilus-schema --example tokenize_schema
cargo run --package nautilus-schema --example visitor_demo

Grammar

See GRAMMAR.md for the complete EBNF grammar specification.

Key grammar structures:

Schema ::= Declaration* EOF

Declaration ::= DatasourceDecl | GeneratorDecl | ModelDecl | EnumDecl

ModelDecl ::= 'model' Ident '{' (FieldDecl | ModelAttribute)* '}'

FieldDecl ::= Ident FieldType FieldModifier? FieldAttribute*

FieldType ::= ScalarType | 'Decimal' '(' Number ',' Number ')' | UserType

Expr ::= Literal | FunctionCall | Array | Ident
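The FieldDecl production can be sketched as a small recursive-descent routine over naively whitespace-split tokens. This is a hypothetical, span-free illustration; the real parser consumes typed tokens and tracks spans for diagnostics.

```rust
// Sketch of the grammar rule
//   FieldDecl ::= Ident FieldType FieldModifier? FieldAttribute*
// over whitespace-split tokens. Illustrative only.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    type_name: String,
    optional: bool,
    is_list: bool,
    attributes: Vec<String>,
}

fn parse_field(line: &str) -> Option<Field> {
    let mut parts = line.split_whitespace();
    let name = parts.next()?.to_string();
    let mut type_name = parts.next()?.to_string();
    // FieldModifier? — `?` marks an optional field, `[]` a list/relation.
    let optional = type_name.ends_with('?');
    let is_list = type_name.ends_with("[]");
    if optional {
        type_name.pop();
    }
    if is_list {
        type_name.truncate(type_name.len() - 2);
    }
    // FieldAttribute* — remaining tokens that start with `@`.
    let attributes = parts
        .take_while(|t| t.starts_with('@'))
        .map(|t| t.to_string())
        .collect();
    Some(Field { name, type_name, optional, is_list, attributes })
}

fn main() {
    println!("{:?}", parse_field("email String? @unique"));
}
```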

Testing

Run all tests:

cargo test --package nautilus-schema

Run specific test suites:

cargo test --package nautilus-schema --test parser_tests
cargo test --package nautilus-schema --test visitor_tests
cargo test --package nautilus-schema --lib

Test coverage:

  • 57 unit tests embedded in each module
  • 17 integration tests for the analysis API (analysis_tests.rs)
  • 14 integration tests for the IR builder (ir_tests.rs)
  • 23 integration tests for the lexer (lexer_tests.rs)
  • 16 integration tests for the parser (parser_tests.rs)
  • 24 integration tests for the validator (validation_tests.rs)
  • 12 integration tests for the visitor (visitor_tests.rs)

Documentation

Generate and view documentation:

cargo doc --package nautilus-schema --open

Documentation includes:

  • Module-level overviews
  • Comprehensive API docs
  • Usage examples
  • Grammar reference

Architecture

nautilus-schema/
├── src/
│   ├── lib.rs         # Public API re-exports
│   ├── span.rs        # Byte-offset source location types
│   ├── token.rs       # Token types
│   ├── lexer.rs       # Tokenizer
│   ├── ast.rs         # Syntax AST node definitions
│   ├── parser.rs      # Recursive descent parser with error recovery
│   ├── error.rs       # SchemaError and Result alias
│   ├── diagnostic.rs  # Severity + Diagnostic (stable public contract)
│   ├── validator.rs   # Multi-pass semantic validator
│   ├── ir.rs          # Validated intermediate representation
│   ├── visitor.rs     # Visitor trait and walk helpers
│   ├── formatter.rs   # Canonical source formatter
│   └── analysis.rs    # analyze / completion / hover / goto_definition
├── tests/
│   ├── lexer_tests.rs       # 23 lexer integration tests
│   ├── parser_tests.rs      # 16 parser integration tests
│   ├── validation_tests.rs  # 24 validator integration tests
│   ├── ir_tests.rs          # 14 IR builder integration tests
│   ├── visitor_tests.rs     # 12 visitor integration tests
│   └── analysis_tests.rs    # 17 analysis API integration tests
├── examples/
│   ├── tokenize_schema.rs
│   ├── parse_schema.rs
│   └── visitor_demo.rs
├── GRAMMAR.md        # EBNF grammar specification
└── README.md

Usage within the project

  • nautilus-lsp — uses analyze, completion, hover, goto_definition, and format_schema to implement the language server.
  • nautilus-codegen — calls validate_schema to obtain the SchemaIr from which it generates Rust types and SQL.
  • nautilus-cli — calls analyze to surface diagnostics and format_schema for the format command.

License

Licensed under either of:

at your option.

Contributing

This is part of the Nautilus ORM project. See the main repository for contribution guidelines.