laburnum 1.17.0

# laburnum::chumsky Module

Tools for integrating [chumsky](https://docs.rs/chumsky) parser combinators with laburnum's span management and CST infrastructure.

## Architecture Overview

```
┌──────────────────────────────────────────────────────────────────┐
│                         Source Code                               │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                    Lexer (define_tokens!)                         │
│  • Produces tokens with leading/trailing trivia                   │
│  • Uses wrap! macro for trivia handling                           │
│  • Two-pass: EmptyErr (fast) → Rich (if errors)                   │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                  TokenStream (stream.rs)                          │
│  • Arc<[Spanned<Token>]> for efficient sharing                    │
│  • Bridges lexer output to CST parser input                       │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                  CST Parser (define_node_db!)                     │
│  • Generates Node enum, NodeDb, State, etc.                       │
│  • Supports backtracking via Inspector trait                      │
│  • Two modes: Parser (Vec) and Query (IndexMap)                   │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                         CST/AST                                   │
│  • Nodes with spans, leading/trailing trivia                      │
│  • Ready for semantic analysis via symbolique                     │
└──────────────────────────────────────────────────────────────────┘
```

## Module Structure

| File | Purpose |
|------|---------|
| `mod.rs` | Core types: `State`, `LexExtra`, `SpanCreator`, `LaburnumSpanExt` |
| `stream.rs` | `TokenStream` wrapper for lexer output |
| `node_db.rs` | `define_node_db!` macro for CST infrastructure |
| `lexer/mod.rs` | Re-exports for lexer components |
| `lexer/trivia.rs` | `Trivia` type and parsers, `wrap!` macro |
| `lexer/define_tokens.rs` | `define_tokens!` macro for token generation |

## Quick Start

A typical integration has four steps; the last one is optional:

### 1. Define Token Types

```rust
// In lexer/keyword.rs
laburnum::chumsky::define_tokens! {
    #[chumsky::text::unicode::keyword]  // Use keyword matching
    Token::Keyword(Keyword -> [
        "fn" => Fn,
        "let" => Let,
        "async" => Async #[warn "reserved for future use: {}"],
    ])
}

// In lexer/control.rs
laburnum::chumsky::define_tokens! {
    #[just]  // Use exact matching
    Token::Ctrl(Control -> [
        "==" => EqEq,  // Multi-char operators first
        "=" => Eq,
        "+" => Plus,
    ])
}
```

### 2. Create Lexer State

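```rust
use chumsky::span::SimpleSpan;

pub enum LexerState<'src> {
    Detect {
        inner: laburnum::chumsky::State<'src>,
        did_see_token_error: bool,
    },
    Collect {
        inner: laburnum::chumsky::State<'src>,
        // `TokenError` is your crate's own token-level error type.
        token_errors: Vec<(TokenError, laburnum::Span)>,
    },
}

impl<'src> LexerState<'src> {
    /// Returns the shared `State`, whichever mode is active.
    fn inner_mut(&mut self) -> &mut laburnum::chumsky::State<'src> {
        match self {
            Self::Detect { inner, .. } | Self::Collect { inner, .. } => inner,
        }
    }
}

impl laburnum::chumsky::SpanCreator for LexerState<'_> {
    fn span_from(&mut self, span: SimpleSpan) -> laburnum::Span {
        // `SimpleSpan` is `start..end`; it is converted here to a start and a length.
        self.inner_mut().span_cache.create_span(span.start, span.end - span.start)
    }
}
```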

### 3. Implement Two-Pass Lexing

```rust
pub fn lex<'src>(
    source_key: SourceKey,
    content: &'src str,
    span_cache: &'src mut SpanCache,
) -> (TokenList, Vec<Rich<'src, char, SimpleSpan>>) {
    // Fast pass: detect errors only
    let checkpoint = span_cache.checkpoint();
    {
        let mut state = LexerState::new_detect(source_key, span_cache);
        let result = lexer::<EmptyErr>()
            .parse_with_state(content, &mut state)
            .into_result();

        if let Ok(tokens) = result {
            return (tokens, Vec::new()); // No errors - fast path
        }
    }

    // Rollback and collect detailed errors
    span_cache.rollback(checkpoint);
    let mut state = LexerState::new_collect(source_key, span_cache);
    let (tokens, errors) = lexer::<Rich<'src, char>>()
        .parse_with_state(content, &mut state)
        .into_output_errors();

    (tokens.unwrap_or_default(), errors)
}
```

### 4. Define CST Node Database (Optional)

```rust
laburnum::chumsky::define_node_db! { Cst =>
    crate::errata::Error,
    crate::errata::Todo,
    crate::symbol::Ident,
    crate::expr::Expr,
    // ... more node types
}
```

This generates:
- `Node` enum with all node variants
- `CstNodeId` for strongly-typed references
- `CstNodeDb` with Parser/Query variants
- `CstState` with Detect/Collect modes
- `CstParserMapExtraExt` trait for node insertion

## Key Patterns

### Two-Pass Lexing/Parsing

The two-pass pattern optimizes for the common case of valid input:

1. **Detect Pass**: Use `EmptyErr` for fast parsing without error allocation
2. **Check Result**: If no errors, return immediately
3. **Collect Pass**: Only if errors detected, rollback and re-parse with `Rich` errors

This keeps the success path free of `Rich` error construction; only input that actually contains errors is parsed twice.

```
Input → Detect (EmptyErr) → Success? → Return tokens
              ↓ (errors)
        Rollback SpanCache
              ↓
        Collect (Rich) → Return tokens + errors
```
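The flow above can be sketched without chumsky or laburnum. This is a minimal, std-only illustration, not the crate's implementation: `LexError`, `Cheap`, `Detailed`, `ToySpanCache`, `toy_lex`, and `two_pass` are all hypothetical names, and the "lexer" only splits ASCII alphanumeric runs. It shows the shape of the pattern: one lexer generic over its error type, a zero-sized error for the detect pass, and a rollback of the span store before the collect pass.

```rust
/// Error strategy: how a failure at byte `offset` is recorded (hypothetical trait).
trait LexError: Sized {
    fn at(offset: usize, found: char) -> Self;
}

/// Zero-sized error for the detect pass; a `Vec` of these never allocates.
#[derive(Debug, PartialEq)]
struct Cheap;
impl LexError for Cheap {
    fn at(_offset: usize, _found: char) -> Self { Cheap }
}

/// Detailed error for the collect pass.
#[derive(Debug, PartialEq)]
struct Detailed { offset: usize, message: String }
impl LexError for Detailed {
    fn at(offset: usize, found: char) -> Self {
        Detailed { offset, message: format!("unexpected character {found:?}") }
    }
}

/// Toy span store: a checkpoint is the number of spans recorded so far.
#[derive(Default)]
struct ToySpanCache { spans: Vec<(usize, usize)> } // (start, len)
impl ToySpanCache {
    fn checkpoint(&self) -> usize { self.spans.len() }
    fn rollback(&mut self, checkpoint: usize) { self.spans.truncate(checkpoint) }
}

/// Splits `src` into ASCII alphanumeric runs, recording one span per token.
/// Any other non-whitespace character is reported through `E`.
fn toy_lex<E: LexError>(src: &str, cache: &mut ToySpanCache) -> (Vec<String>, Vec<E>) {
    let mut tokens = Vec::new();
    let mut errors = Vec::new();
    let mut start: Option<usize> = None;
    // A trailing sentinel space flushes the final token.
    for (i, c) in src.char_indices().chain(std::iter::once((src.len(), ' '))) {
        if c.is_ascii_alphanumeric() {
            if start.is_none() { start = Some(i); }
        } else {
            if let Some(s) = start.take() {
                cache.spans.push((s, i - s));
                tokens.push(src[s..i].to_string());
            }
            if !c.is_whitespace() {
                errors.push(E::at(i, c));
            }
        }
    }
    (tokens, errors)
}

/// Detect pass first; roll back and re-lex with detailed errors only on failure.
fn two_pass(src: &str, cache: &mut ToySpanCache) -> (Vec<String>, Vec<Detailed>) {
    let checkpoint = cache.checkpoint();
    let (tokens, cheap) = toy_lex::<Cheap>(src, cache);
    if cheap.is_empty() {
        return (tokens, Vec::new()); // fast path: nothing to report
    }
    cache.rollback(checkpoint); // discard spans created by the detect pass
    toy_lex::<Detailed>(src, cache)
}
```

Without the rollback, every span from the detect pass would be recorded a second time by the collect pass.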

### LexerState Enum Pattern

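The `Detect`/`Collect` enum keeps each pass from touching the other pass's fields:

- **Detect mode**: only sets the `did_see_token_error: bool` flag
- **Collect mode**: accumulates errors in a `Vec<(TokenError, Span)>`

Because each variant carries only its own fields, code must match on the variant before reading the flag or the error list, so mixing the two modes up is a compile-time error rather than a runtime bug.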
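One way to use the pattern is a single recording method that behaves differently per mode. The sketch below is an assumption about how such a helper could look, not laburnum's API: `record` and `has_errors` are hypothetical, `TokenError` and `Span` are local stand-ins, and the `inner` field is omitted for brevity. The error is passed as a closure so that Detect mode never constructs it.

```rust
/// Local stand-ins for the real `TokenError` and `laburnum::Span`.
#[derive(Debug, Clone, PartialEq)]
enum TokenError { ReservedKeyword(String) }

#[derive(Debug, Clone, Copy, PartialEq)]
struct Span { start: usize, len: usize }

enum LexerState {
    Detect { did_see_token_error: bool },
    Collect { token_errors: Vec<(TokenError, Span)> },
}

impl LexerState {
    /// Records an error in whichever way the current mode allows.
    /// `make_error` runs only in `Collect` mode.
    fn record(&mut self, span: Span, make_error: impl FnOnce() -> TokenError) {
        match self {
            LexerState::Detect { did_see_token_error } => *did_see_token_error = true,
            LexerState::Collect { token_errors } => token_errors.push((make_error(), span)),
        }
    }

    /// True if any token error has been seen in either mode.
    fn has_errors(&self) -> bool {
        match self {
            LexerState::Detect { did_see_token_error } => *did_see_token_error,
            LexerState::Collect { token_errors } => !token_errors.is_empty(),
        }
    }
}
```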

### Trivia Handling

Every token carries optional leading and trailing trivia:

```rust
Token::Keyword(
    Option<Trivia>,              // Leading whitespace
    laburnum::Spanned<Keyword>,  // The token itself
    Option<Trivia>,              // Trailing whitespace
)
```

The `wrap!` macro handles this automatically:

```rust
wrap!(
    { your_token_parser } -> |((leading, inner), trailing), e| {
        YourToken::Variant(leading, inner, trailing)
    }
)
```
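To make the `((leading, inner), trailing)` shape concrete, here is a std-only sketch of what a trivia-wrapping combinator does. It is not the `wrap!` macro: `Trivia`, `take_while`, `wrap`, and `plus` are hypothetical, and the split policy (leading takes any whitespace, trailing takes only spaces and tabs on the same line) is an assumption chosen for the example.

```rust
/// Stand-in for laburnum's `Trivia`: the raw skipped text.
#[derive(Debug, Clone, PartialEq)]
struct Trivia(String);

/// Splits off the longest prefix of `input` whose chars satisfy `pred`.
fn take_while<'a>(input: &'a str, pred: impl Fn(char) -> bool) -> (&'a str, &'a str) {
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    input.split_at(end)
}

/// Empty trivia is represented as `None`.
fn trivia(text: &str) -> Option<Trivia> {
    if text.is_empty() { None } else { Some(Trivia(text.to_string())) }
}

/// Runs `token` between a leading-trivia and a trailing-trivia parser.
/// Returns `((leading, inner), trailing)` plus the unconsumed input.
fn wrap<'a, T>(
    input: &'a str,
    token: impl Fn(&'a str) -> Option<(T, &'a str)>,
) -> Option<(((Option<Trivia>, T), Option<Trivia>), &'a str)> {
    let (leading, rest) = take_while(input, char::is_whitespace);
    let (inner, rest) = token(rest)?;
    let (trailing, rest) = take_while(rest, |c| c == ' ' || c == '\t');
    Some((((trivia(leading), inner), trivia(trailing)), rest))
}

/// Example token parser: a single `+`.
fn plus(input: &str) -> Option<(char, &str)> {
    input.strip_prefix('+').map(|rest| ('+', rest))
}
```

Calling `wrap("  + \n1", plus)` yields two spaces as leading trivia, `'+'` as the token, and one space as trailing trivia, leaving `"\n1"` for the next token.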

### Span Management

- **Lexer spans**: Use `laburnum::Span` via `SpanCreator` trait
- **Parser spans**: Use chumsky's `SimpleSpan` internally
- **Conversion**: call `LaburnumSpanExt::create_span()` inside `.map_with()` closures to turn a `SimpleSpan` into a `laburnum::Span`

The `SpanCache` enables:
- Efficient span creation during parsing
- Checkpoint/rollback for two-pass parsing
- Text recovery from spans
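The three capabilities can be illustrated with a toy cache. This sketch assumes nothing about the real `SpanCache` beyond the `create_span(start, len)` call shape shown in the Quick Start; `TextSpanCache`, `span_from_range`, and `text` are hypothetical names, and the real `Span` is rope-based rather than an index into a `Vec`.

```rust
/// Stand-in for `laburnum::Span`: an index into the cache.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span(usize);

struct TextSpanCache<'src> {
    source: &'src str,
    ranges: Vec<(usize, usize)>, // (start, len) in bytes
}

impl<'src> TextSpanCache<'src> {
    fn new(source: &'src str) -> Self {
        Self { source, ranges: Vec::new() }
    }

    /// Same argument shape as the `create_span(start, len)` call above.
    fn create_span(&mut self, start: usize, len: usize) -> Span {
        self.ranges.push((start, len));
        Span(self.ranges.len() - 1)
    }

    /// Converts a half-open `start..end` range, which is how `SimpleSpan` stores offsets.
    fn span_from_range(&mut self, range: std::ops::Range<usize>) -> Span {
        self.create_span(range.start, range.end - range.start)
    }

    /// Recovers the source text covered by `span`, if the span is still live.
    fn text(&self, span: Span) -> Option<&'src str> {
        let (start, len) = *self.ranges.get(span.0)?;
        self.source.get(start..start + len)
    }

    fn checkpoint(&self) -> usize { self.ranges.len() }

    /// Drops every span created after `checkpoint`.
    fn rollback(&mut self, checkpoint: usize) { self.ranges.truncate(checkpoint) }
}
```

After a rollback, spans created since the checkpoint no longer resolve, which is why the collect pass must recreate them.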

## Macro Relationships

```
define_tokens! ────uses────► wrap!
      │                         │
      │                         ▼
      │                    trivia parsers
      │                    (leading, trailing)
      │
      ▼
 Token enum
 Lexer function
 Match macros (just!, spanned!, etc.)
```

```
define_node_db! ─────────► generates all CST infrastructure
      │
      ├── Node enum (with leading/trailing trivia)
      ├── NodeId (span + key)
      ├── NodeDb (Parser/Query variants)
      ├── State (Detect/Collect variants)
      ├── Checkpoint (for backtracking)
      ├── ParserMapExtraExt (insert_* methods)
      └── Printer (bluegum visualization)
```

## Match Macro Variants

Generated by `define_tokens!`:

| Macro | Returns | Trivia Handling |
|-------|---------|-----------------|
| `just!(Variant)` | Enum variant | Ignores trivia |
| `spanned!(Variant)` | `VariantSpan` struct | Ignores trivia |
| `spanned_unboxed!(Variant)` | `VariantSpan` (unboxed) | Ignores trivia |
| `spanned_with_trivia!(Variant)` | `(leading, variant, trailing, span)` | Exposes trivia |
| `spanned_no_trivia!(Variant)` | `VariantSpan` | Fails if trivia present |
| `spanned_no_trailing_trivia!(Variant)` | `(leading, VariantSpan)` | Fails if trailing |
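
The trivia policies in the table can be pictured as plain functions over a token. The sketch below is only an analogy under simplifying assumptions: `Tok`, `Kind`, and the three functions are hypothetical, spans are bare `(start, len)` tuples instead of generated `VariantSpan` structs, and the real macros expand to chumsky parsers rather than functions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { Dot, Plus }

type ToySpan = (usize, usize); // (start, len)

#[derive(Debug, Clone, PartialEq)]
struct Tok {
    leading: Option<String>,
    kind: Kind,
    span: ToySpan,
    trailing: Option<String>,
}

/// Like `spanned!`: match the kind and ignore any trivia.
fn spanned(tok: &Tok, want: Kind) -> Option<ToySpan> {
    (tok.kind == want).then_some(tok.span)
}

/// Like `spanned_no_trivia!`: additionally reject attached trivia.
fn spanned_no_trivia(tok: &Tok, want: Kind) -> Option<ToySpan> {
    spanned(tok, want).filter(|_| tok.leading.is_none() && tok.trailing.is_none())
}

/// Like `spanned_no_trailing_trivia!`: expose leading trivia, reject trailing.
fn spanned_no_trailing_trivia<'t>(
    tok: &'t Tok,
    want: Kind,
) -> Option<(Option<&'t str>, ToySpan)> {
    let span = spanned(tok, want)?;
    tok.trailing.is_none().then(|| (tok.leading.as_deref(), span))
}
```

A strict variant is useful where whitespace is significant, for example to require that a member-access dot is written without surrounding spaces.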

## Required Dependencies

When using these macros, add the following to your crate's `Cargo.toml`:

```toml
[dependencies]
laburnum = { path = "..." }
chumsky = "0.10"
bluegum = { path = "..." }
indexmap = "2"
owo-colors = "4"
paste = "1"
```

## Common Pitfalls

### 1. Future-Compat Warning for Generated Macros

The `define_tokens!` macro generates match macros using `#[macro_export]` which triggers a
future-compat warning (`macro_expanded_macro_exports_accessed_by_absolute_paths`).

This is a known Rust limitation - see [rust-lang/rust#52234](https://github.com/rust-lang/rust/issues/52234).
The warning will remain until Rust provides a better pattern for macro re-exports from
macro-generated code. The macros work correctly; this is just a warning about potential
future Rust changes.

### 2. Multi-Character Operators First

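In `define_tokens!`, list each longer operator before any shorter operator that is a prefix of it; otherwise the shorter pattern can match first and the longer one is never reached: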

```rust
Token::Ctrl(Control -> [
    "==" => EqEq,  // Must come before "="
    "!=" => NotEq,
    "=" => Eq,
    // ...
])
```
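The effect of ordering can be reproduced with a first-match-wins table, which is the behaviour this pitfall assumes for the generated lexer. `first_match` is a hypothetical helper written for this example; it is not part of laburnum or chumsky.

```rust
/// Returns the name of the first pattern in `table` that is a prefix of `input`,
/// together with the remaining input. Earlier entries win.
fn first_match<'a>(
    table: &[(&str, &'static str)],
    input: &'a str,
) -> Option<(&'static str, &'a str)> {
    table
        .iter()
        .find_map(|&(pattern, name)| input.strip_prefix(pattern).map(|rest| (name, rest)))
}
```

With `("==", "EqEq")` listed before `("=", "Eq")`, the input `"== 1"` matches `EqEq`. With the order reversed it matches `Eq` and leaves `"= 1"` behind, so the `EqEq` entry is unreachable.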

### 3. Keyword vs Just

- Use `#[chumsky::text::unicode::keyword]` for language keywords to prevent matching `letx` as `let`
- Use `#[just]` for operators and delimiters
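
The difference between the two matching modes comes down to a word-boundary check. The following std-only sketch approximates it; `match_exact` and `match_keyword` are hypothetical helpers, and the boundary test (`is_alphanumeric` or `_`) is a simplification of the identifier rules chumsky's keyword parser applies.

```rust
/// Exact prefix match, in the spirit of `just`: no boundary check.
fn match_exact<'a>(word: &str, input: &'a str) -> Option<&'a str> {
    input.strip_prefix(word)
}

/// Keyword match: the prefix must not be followed by an identifier character.
fn match_keyword<'a>(word: &str, input: &'a str) -> Option<&'a str> {
    let rest = input.strip_prefix(word)?;
    match rest.chars().next() {
        Some(c) if c.is_alphanumeric() || c == '_' => None,
        _ => Some(rest),
    }
}
```

Here `match_exact("let", "letx = 1")` succeeds and splits the identifier `letx` in two, while `match_keyword("let", "letx = 1")` returns `None` and leaves `letx` for the identifier rule.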

### 4. SpanCache Lifetime

The `SpanCache` must outlive parsing. Pass it by mutable reference to the state:

```rust
let mut span_cache = SpanCache::default();
let mut state = LexerState::new_detect(source_key, &mut span_cache);
```

### 5. Inspector Trait

Backtracking with `define_node_db!` requires the parser state to implement `chumsky::inspector::Inspector`. The macro generates this implementation for the `*State` type it emits, so no hand-written impl is needed for that type.
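
The reason the state must take part in backtracking is that nodes inserted by an abandoned branch have to be discarded. The sketch below shows that idea with save and rewind hooks on a toy `Vec`-backed database. It does not implement the real `Inspector` trait: `ToyNodeDb`, `NodeId`, `Node`, `on_save`, `on_rewind`, and `or_else` are hypothetical names used only for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeId(usize);

#[derive(Debug, PartialEq)]
enum Node { Ident(String), Error }

#[derive(Default)]
struct ToyNodeDb { nodes: Vec<Node> }

impl ToyNodeDb {
    fn insert(&mut self, node: Node) -> NodeId {
        self.nodes.push(node);
        NodeId(self.nodes.len() - 1)
    }

    fn get(&self, id: NodeId) -> Option<&Node> { self.nodes.get(id.0) }

    /// Save hook: remember how many nodes exist right now.
    fn on_save(&self) -> usize { self.nodes.len() }

    /// Rewind hook: drop nodes inserted since the checkpoint.
    fn on_rewind(&mut self, checkpoint: usize) { self.nodes.truncate(checkpoint) }
}

/// Tries `first`; if it fails, rewinds the database before trying `second`.
fn or_else<T>(
    db: &mut ToyNodeDb,
    first: impl FnOnce(&mut ToyNodeDb) -> Option<T>,
    second: impl FnOnce(&mut ToyNodeDb) -> Option<T>,
) -> Option<T> {
    let checkpoint = db.on_save();
    if let Some(out) = first(db) {
        return Some(out);
    }
    db.on_rewind(checkpoint);
    second(db)
}
```

Without the rewind, a node inserted by the failed first alternative would stay in the database and shift every later `NodeId`.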

## Related ADRs

- **ADR0001**: Content-addressed storage - broader architectural context
- **ADR0002**: Rope-based span storage - explains the `Span` design
- **ADR0003**: Symbolique - how CST nodes feed into symbol analysis

## Feature Adoption Guide

| Feature | Required? | When to Use |
|---------|-----------|-------------|
| `define_tokens!` | Yes | All lexers - generates token enums and match macros |
| `State`/`SpanCreator` | Yes | All parsers - manages spans |
| Two-pass lexing | Recommended | Performance optimization for LSP use cases |
| `LexerState` enum | Recommended | Type-safe state for two-pass lexing |
| `wrap!` macro | Optional | Simplifies trivia handling in token parsers |
| `define_node_db!` | Optional | Full CST with backtracking, trivia preservation |