# nautilus-schema
Schema language parser, validator, IR builder, and formatter for the Nautilus ORM.
## Overview
This crate implements the full processing pipeline for `.nautilus` schema files:
1. **Lex** — Converts source text into typed tokens with byte-offset span tracking.
2. **Parse** — Recursive descent parser builds a strongly-typed [`ast::Schema`] AST.
3. **Validate** — Multi-pass semantic validator resolves types, relations, and constraints, emitting a fully resolved [`ir::SchemaIr`].
4. **Format** — Renders an AST back to canonical source text (idempotent).
Editor tooling (LSP, CLI) can use the one-shot [`analyze`] function, which runs the lex, parse, and validate stages and returns structured diagnostics.
## Features
- **Parser** — datasources, generators, models, enums, all field types, all attributes, error recovery
- **Validator** — duplicate name detection, unknown type resolution, relation integrity, default value type checking, physical name collision detection
- **IR** — fully resolved intermediate representation with logical and physical names
- **Analysis API** — `analyze()`, `completion()`, `hover()`, `goto_definition()` for editor integration
- **Formatter** — canonical output, column-aligned, idempotent
- **Visitor** — trait-based AST traversal with default walk implementations
- **Span Tracking** — every AST node carries byte-offset spans for precise diagnostics
- **Error Recovery** — parser continues after errors to surface multiple issues at once
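As an illustration of error recovery and validation working together, a schema like the following would surface multiple diagnostics in a single `analyze` call (a sketch; the exact messages are up to the validator):

```prisma
model User {
  id   Int  @id
  role Role       // unknown type: no enum or model named `Role`
}

model User {      // duplicate name: `User` is already declared
  id Int @id
}
```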
## Quick Start
Add to your `Cargo.toml`:
```toml
[dependencies]
nautilus-schema = { path = "../nautilus-schema" }
```
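The Rust snippets below operate on a `source: &str` holding schema text, for example:

```prisma
model User {
  id    Int    @id @default(autoincrement())
  email String @unique
}
```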
### One-shot analysis
The recommended entry point. Runs lex → parse → validate in a single call and
collects every diagnostic:
```rust
use nautilus_schema::analyze;

// `source` is the `.nautilus` schema text as a &str.
let result = analyze(source);
for diag in &result.diagnostics {
    eprintln!("{:?} — {}", diag.severity, diag.message);
}
if let Some(ir) = &result.ir {
    println!("{} model(s) validated", ir.models.len());
}
```
### Validate and obtain the IR directly
```rust
use nautilus_schema::{Lexer, Parser, validate_schema, TokenKind};

let mut lexer = Lexer::new(source);
let mut tokens = Vec::new();
loop {
    let token = lexer.next_token()?;
    let eof = matches!(token.kind, TokenKind::Eof);
    tokens.push(token);
    if eof { break; }
}
let ast = Parser::new(&tokens).parse_schema()?;
let ir = validate_schema(ast)?;
println!("{} model(s)", ir.models.len());
```
### Editor tooling API
The `analysis` module exposes higher-level functions for LSP servers and CLI tools:
```rust
use nautilus_schema::{analyze, completion, hover, goto_definition};

// Completions at a byte offset
let items = completion(source, offset);

// Hover documentation at a byte offset
if let Some(info) = hover(source, offset) {
    println!("{}", info.content);
}

// Jump to the declaration that the symbol at a byte offset refers to
if let Some(span) = goto_definition(source, offset) {
    println!("definition at {}..{}", span.start, span.end);
}
```
### Formatting
`format_schema` renders an AST back to canonical, column-aligned source text:
```rust
use nautilus_schema::{analyze, format_schema};

let result = analyze(source);
if let Some(ast) = &result.ast {
    let formatted = format_schema(ast, source);
    std::fs::write("schema.nautilus", formatted)?;
}
```
## Visitor Pattern
Implement custom visitors for AST traversal and analysis:
```rust
use nautilus_schema::{
    ast::*,
    visitor::{Visitor, walk_model},
    Result,
};

struct ModelCounter {
    count: usize,
}

impl Visitor for ModelCounter {
    fn visit_model(&mut self, model: &ModelDecl) -> Result<()> {
        self.count += 1;
        walk_model(self, model) // Continue traversing children
    }
}

fn count_models(schema: &Schema) -> Result<usize> {
    let mut visitor = ModelCounter { count: 0 };
    visitor.visit_schema(schema)?;
    Ok(visitor.count)
}
```
### Common Visitor Patterns
**Collecting Information:**
```rust
struct FieldCollector {
    fields: Vec<String>,
}

impl Visitor for FieldCollector {
    fn visit_field(&mut self, field: &FieldDecl) -> Result<()> {
        self.fields.push(field.name.value.clone());
        Ok(())
    }
}
```
**Finding Specific Patterns:**
```rust
struct RelationFinder {
    relations: Vec<String>,
}

impl Visitor for RelationFinder {
    fn visit_field(&mut self, field: &FieldDecl) -> Result<()> {
        if field.has_relation_attribute() {
            self.relations.push(field.name.value.clone());
        }
        Ok(())
    }
}
```
**Validation:**
```rust
struct NameValidator {
    errors: Vec<String>,
}

impl Visitor for NameValidator {
    fn visit_model(&mut self, model: &ModelDecl) -> Result<()> {
        // `chars().next()` is None for an empty name, so avoid unwrap().
        let starts_upper = model
            .name
            .value
            .chars()
            .next()
            .map_or(false, |c| c.is_uppercase());
        if !starts_upper {
            self.errors
                .push(format!("Model {} should start with uppercase", model.name.value));
        }
        walk_model(self, model)
    }
}
```
## Schema Language
### Supported Declarations
**Datasource:**
```prisma
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}
```
```
**Generator:**
```prisma
generator client {
  provider  = "nautilus-client-rs"
  output    = "../generated" // Optional
  interface = "async"        // default: sync
}
```
```
**Enum:**
```prisma
enum Role {
  USER
  ADMIN
  MODERATOR
}
```
```
**Model:**
```prisma
model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique
  username  String   @map("user_name")
  role      Role     @default(USER)
  createdAt DateTime @default(now())
  posts     Post[]

  @@map("users")
  @@index([email])
}
```
```
### Field Types
**Scalar Types:**
- `String`, `Boolean`, `Int`, `BigInt`, `Float`
- `DateTime`, `Bytes`, `Json`, `Uuid`
- `Decimal(precision, scale)` - e.g., `Decimal(10, 2)`
**User Types:**
- Model references: `Post`, `User`
- Enum references: `Role`, `Status`
**Modifiers:**
- `?` - optional/nullable field
- `[]` - array/one-to-many relation
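Both modifiers in use, as a sketch (field names here are illustrative):

```prisma
model User {
  id       Int     @id
  nickname String? // optional: may be NULL
  posts    Post[]  // array: one-to-many relation to Post
}
```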
### Field Attributes
- `@id` - Primary key
- `@unique` - Unique constraint
- `@default(expr)` - Default value
  - Functions: `autoincrement()`, `uuid()`, `now()`
  - Literals: `0`, `"DEFAULT"`, `true`
  - Enum values: `USER`, `ACTIVE`
- `@map("physical_name")` - Physical column name
- `@relation(...)` - Foreign key relationship
```prisma
user User @relation(
  fields: [userId],
  references: [id],
  onDelete: Cascade,
  onUpdate: Restrict
)
```
### Model Attributes
- `@@map("table_name")` - Physical table name
- `@@id([field1, field2])` - Composite primary key
- `@@unique([field1, field2])` - Composite unique constraint
- `@@index([field1, field2])` - Database index; an index type may also be specified
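A sketch of the model attributes combined in one declaration (field and table names are illustrative):

```prisma
model Membership {
  userId Int
  orgId  Int
  email  String

  @@map("memberships")
  @@id([userId, orgId])
  @@unique([email, orgId])
  @@index([orgId])
}
```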
## Error Handling
The `analyze` function collects all diagnostics in one pass. Each `Diagnostic`
carries a byte-offset `span` that can be converted to line/column:
```rust
use nautilus_schema::{analyze, Severity};

let result = analyze(source);
for diag in &result.diagnostics {
    let (pos, _) = diag.span.to_positions(source);
    let label = match diag.severity {
        Severity::Error => "error",
        Severity::Warning => "warning",
    };
    eprintln!("{}:{}: {}", pos, label, diag.message);
}
```
For the lower-level `SchemaError` type, each variant with a span exposes
`format_with_file(filepath, source)` which emits the standard
`filepath:line:column: message` format recognised by VS Code.
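The byte-offset to line/column conversion behind this format can be sketched in plain Rust. This is an illustrative stand-in, not the crate's actual implementation, and `line_col` is a hypothetical helper:

```rust
// Convert a byte offset into 1-based (line, column) by scanning the
// source up to the offset and counting newlines. A stand-in for what
// `format_with_file` needs to do; not the crate's real code.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (i, ch) in source.char_indices() {
        if i >= offset {
            break;
        }
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}

fn main() {
    let source = "model User {\n  id Int\n}\n";
    // Byte offset 15 points at "id" on the second line.
    println!("{:?}", line_col(source, 15)); // → (2, 3)
}
```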
## Examples
Run the bundled examples:
```bash
cargo run --package nautilus-schema --example parse_schema
cargo run --package nautilus-schema --example tokenize_schema
cargo run --package nautilus-schema --example visitor_demo
```
## Grammar
See [GRAMMAR.md](GRAMMAR.md) for the complete EBNF grammar specification.
Key grammar structures:
```ebnf
Schema ::= Declaration* EOF
ModelDecl ::= 'model' Ident '{' (FieldDecl | ModelAttribute)* '}'
FieldDecl ::= Ident FieldType FieldModifier? FieldAttribute*
Expr ::= Literal | FunctionCall | Array | Ident
```
## Testing
Run all tests:
```bash
cargo test --package nautilus-schema
```
Run specific test suites:
```bash
cargo test --package nautilus-schema --test parser_tests
cargo test --package nautilus-schema --test visitor_tests
cargo test --package nautilus-schema --lib
```
Test coverage:
- **57 unit tests** embedded in the source modules
- **17** integration tests for the analysis API (`analysis_tests.rs`)
- **14** integration tests for the IR builder (`ir_tests.rs`)
- **23** integration tests for the lexer (`lexer_tests.rs`)
- **16** integration tests for the parser (`parser_tests.rs`)
- **24** integration tests for the validator (`validation_tests.rs`)
- **12** integration tests for the visitor (`visitor_tests.rs`)
## Documentation
Generate and view documentation:
```bash
cargo doc --package nautilus-schema --open
```
Documentation includes:
- Module-level overviews
- Comprehensive API docs
- Usage examples
- Grammar reference
## Architecture
```
nautilus-schema/
├── src/
│   ├── lib.rs              # Public API re-exports
│   ├── span.rs             # Byte-offset source location types
│   ├── token.rs            # Token types
│   ├── lexer.rs            # Tokenizer
│   ├── ast.rs              # Syntax AST node definitions
│   ├── parser.rs           # Recursive descent parser with error recovery
│   ├── error.rs            # SchemaError and Result alias
│   ├── diagnostic.rs       # Severity + Diagnostic (stable public contract)
│   ├── validator.rs        # Multi-pass semantic validator
│   ├── ir.rs               # Validated intermediate representation
│   ├── visitor.rs          # Visitor trait and walk helpers
│   ├── formatter.rs        # Canonical source formatter
│   └── analysis.rs         # analyze / completion / hover / goto_definition
├── tests/
│   ├── lexer_tests.rs      # 23 lexer integration tests
│   ├── parser_tests.rs     # 16 parser integration tests
│   ├── validation_tests.rs # 24 validator integration tests
│   ├── ir_tests.rs         # 14 IR builder integration tests
│   ├── visitor_tests.rs    # 12 visitor integration tests
│   └── analysis_tests.rs   # 17 analysis API integration tests
├── examples/
│   ├── tokenize_schema.rs
│   ├── parse_schema.rs
│   └── visitor_demo.rs
├── GRAMMAR.md              # EBNF grammar specification
└── README.md
```
## Usage within the project
- **`nautilus-lsp`** — uses `analyze`, `completion`, `hover`, `goto_definition`, and `format_schema` to implement the language server.
- **`nautilus-codegen`** — calls `validate_schema` to obtain the `SchemaIr` from which it generates Rust types and SQL.
- **`nautilus-cli`** — calls `analyze` to surface diagnostics and `format_schema` for the format command.
## License
Licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](../../LICENSE-APACHE))
- MIT License ([LICENSE-MIT](../../LICENSE-MIT))
at your option.
## Contributing
This is part of the Nautilus ORM project. See the main repository for contribution guidelines.