jsonata-core 2.1.5

High-performance Rust implementation of the JSONata query and transformation language
# JSONata Parser Implementation

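This directory contains the Rust implementation of the JSONata parser: a lexer, a Pratt parser, and their tests. Constructs that are not yet supported are listed under Future Enhancements.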

## File Structure

```
parser.rs           - Main parser implementation (1,242 lines)
  ├── Lexer         - Tokenizer (lines 88-469)
  ├── Parser        - Pratt parser (lines 471-811)
  └── Tests         - Comprehensive tests (lines 821-1241)
```

## Quick Start

```rust
use jsonatapy::parser::parse;

// Parse an expression
let ast = parse("orders[price > 100].product")?;

// The AST can then be passed to the evaluator
```

## Features

### Lexer

The lexer converts JSONata expressions into tokens:

```rust
let mut lexer = Lexer::new("1 + 2".to_string());
let token1 = lexer.next_token()?; // Token::Number(1.0)
let token2 = lexer.next_token()?; // Token::Plus
let token3 = lexer.next_token()?; // Token::Number(2.0)
```

**Supported tokens:**
- Literals: numbers, strings, booleans, null
- Identifiers: `foo`, `bar_baz`, `` `with spaces` ``
- Variables: `$var`
- Operators: `+`, `-`, `*`, `/`, `%`, `=`, `!=`, `<`, `<=`, `>`, `>=`, `and`, `or`, `in`, `&`, `.`, `..`, `:=`
- Delimiters: `()`, `[]`, `{}`, `,`, `;`, `:`, `?`
- Comments: `/* ... */`
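
To make the token categories above concrete, here is a minimal, self-contained tokenizer sketch. It is not the crate's `Lexer`: the `Tok` enum and the `tokenize` function are hypothetical names, errors are plain strings, and strings, backtick-quoted names, and delimiters are left out to keep it short.

```rust
/// Hypothetical, reduced token set; the crate's real `Token` enum is larger.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Number(f64),
    Ident(String),
    Variable(String),
    Op(String),
}

/// Tokenize a subset of JSONata: numbers, names, `$variables`, operators, comments.
pub fn tokenize(src: &str) -> Result<Vec<Tok>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];
        let next = chars.get(i + 1).copied();

        if c.is_whitespace() {
            i += 1;
        } else if c == '/' && next == Some('*') {
            // Block comment: skip to the closing `*/`, or fail if it is missing.
            let mut j = i + 2;
            loop {
                if j + 1 >= chars.len() {
                    return Err("unclosed comment".to_string());
                }
                if chars[j] == '*' && chars[j + 1] == '/' {
                    break;
                }
                j += 1;
            }
            i = j + 2;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            // Take a fraction only when '.' is followed by a digit, so `1..5` stays a range.
            if i + 1 < chars.len() && chars[i] == '.' && chars[i + 1].is_ascii_digit() {
                i += 1;
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
            }
            let text: String = chars[start..i].iter().collect();
            out.push(Tok::Number(text.parse::<f64>().map_err(|e| e.to_string())?));
        } else if c.is_alphabetic() || c == '_' || c == '$' {
            let start = i;
            i += 1;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            match text.as_str() {
                // Word operators are keywords, not identifiers.
                "and" | "or" | "in" => out.push(Tok::Op(text)),
                _ if c == '$' => out.push(Tok::Variable(text[1..].to_string())),
                _ => out.push(Tok::Ident(text)),
            }
        } else {
            // Try two-character operators before single-character ones.
            let two: String = [Some(c), next].iter().flatten().collect();
            if matches!(two.as_str(), ".." | ":=" | "!=" | "<=" | ">=") {
                out.push(Tok::Op(two));
                i += 2;
            } else if "+-*/%=<>&.".contains(c) {
                out.push(Tok::Op(c.to_string()));
                i += 1;
            } else {
                return Err(format!("unexpected character '{}'", c));
            }
        }
    }
    Ok(out)
}
```

With this sketch, `tokenize("$x := 1..5")` yields a variable, the `:=` operator, and the range `1`, `..`, `5` as three separate tokens, because the number rule refuses to consume a `.` that is not followed by a digit.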

### Parser

The parser uses **Pratt parsing** (Top-Down Operator Precedence) to build an AST:

```rust
let mut parser = Parser::new("1 + 2 * 3".to_string())?;
let ast = parser.parse()?;
// Produces: Binary { op: Add, lhs: 1, rhs: Binary { op: Multiply, lhs: 2, rhs: 3 } }
```

**Operator precedence:**
1. Function call `()`, array access `[]` - 80
2. Dot operator `.` - 75
3. Multiplication `*`, division `/`, modulo `%` - 60
4. Addition `+`, subtraction `-`, concatenation `&` - 50
5. Comparison `=`, `!=`, `<`, `>`, etc. - 40
6. Logical AND `and` - 30
7. Logical OR `or` - 25
8. Range `..`, ternary `?` - 20
9. Assignment `:=` - 10 (right-associative)
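
One way to encode this table for a Pratt loop is a function that returns a `(left_bp, right_bp)` pair per operator. The sketch below uses the precedence values listed above, but its shape is an assumption: it keys on `&str` rather than the crate's `Token` type, and it encodes left associativity as `(bp, bp + 1)`, which may differ from what `parser.rs` does.

```rust
/// Binding powers for infix operators, using the precedence values listed above.
/// Returns `(left_bp, right_bp)`; `None` means the text is not an infix operator.
/// Hypothetical helper: the real parser looks up a `Token`, not a `&str`.
pub fn binding_power(op: &str) -> Option<(u8, u8)> {
    let bp = match op {
        "(" | "[" => 80,
        "." => 75,
        "*" | "/" | "%" => 60,
        "+" | "-" | "&" => 50,
        "=" | "!=" | "<" | "<=" | ">" | ">=" => 40,
        "and" => 30,
        "or" => 25,
        ".." | "?" => 20,
        // Right-associative: equal powers let the right operand absorb another `:=`.
        ":=" => return Some((10, 10)),
        _ => return None,
    };
    // Left-associative: the right operand must bind strictly tighter than `bp`.
    Some((bp, bp + 1))
}
```

A loop that stops when `left_bp < min_bp` and recurses with `right_bp` then groups `8 - 3 - 2` as `(8 - 3) - 2`, while `a := b := 1` groups as `a := (b := 1)`.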

**Supported constructs:**
- Literals: `42`, `"hello"`, `true`, `false`, `null`
- Variables: `$myVar`
- Paths: `user.profile.name`
- Binary operations: `a + b`, `x > 10`, `"a" & "b"`
- Unary operations: `-x`
- Function calls: `sum(1, 2, 3)`
- Arrays: `[1, 2, 3]`
- Objects: `{"key": "value"}`
- Conditionals: `x > 0 ? "positive" : "negative"`
- Blocks: `(x := 5; y := 10; x + y)`

## Examples

### Example 1: Arithmetic with precedence

```rust
let ast = parse("1 + 2 * 3")?;
// Result: 1 + (2 * 3)
```

### Example 2: Path expressions

```rust
let ast = parse("user.address.city")?;
// Result: Path { steps: ["user", "address", "city"] }
```
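
The flat `Path { steps: [...] }` result suggests that nested dot operations are collapsed into a single node. A minimal sketch of such a flattening pass follows; the `Node` enum, its intermediate `Dot` variant, and the `flatten` function are illustrative assumptions, not the crate's `AstNode`.

```rust
/// Hypothetical, reduced AST used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Name(String),
    /// Binary dot as an infix parser would first produce it: `lhs.rhs`.
    Dot(Box<Node>, Box<Node>),
    Path { steps: Vec<Node> },
}

/// Collapse nested dots such as `((user.address).city)` into one `Path`.
/// Nodes that are not dot expressions are returned unchanged.
pub fn flatten(node: Node) -> Node {
    fn collect(node: Node, steps: &mut Vec<Node>) {
        match node {
            Node::Dot(lhs, rhs) => {
                // Visit left before right so steps keep their source order.
                collect(*lhs, steps);
                collect(*rhs, steps);
            }
            // Splice in a path that was already flattened.
            Node::Path { steps: inner } => steps.extend(inner),
            other => steps.push(other),
        }
    }

    if matches!(node, Node::Dot(..)) {
        let mut steps = Vec::new();
        collect(node, &mut steps);
        Node::Path { steps }
    } else {
        node
    }
}
```

A flat step list lets an evaluator iterate over the steps in order instead of recursing through nested binary nodes.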

### Example 3: Function calls

```rust
let ast = parse("sum(items[].price)")?;
// Result: Function { name: "sum", args: [...] }
```

### Example 4: Complex expressions

```rust
let ast = parse(r#"
    orders[price > 100] {
        "product": product,
        "total": price * quantity
    }
"#)?;
```

## Error Handling

The parser provides detailed error messages:

```rust
match parse("1 +") {
    Ok(ast) => println!("Success: {:?}", ast),
    Err(e) => eprintln!("Error: {}", e),
}
// Output: Error: Unexpected end of expression
```

**Error types:**
- `UnexpectedToken` - Invalid character
- `UnexpectedEnd` - Premature end of input
- `InvalidSyntax` - Malformed expression
- `InvalidNumber` - Invalid number format
- `UnclosedString` - Missing closing quote
- `InvalidEscape` - Invalid escape sequence
- `UnclosedComment` - Missing `*/`
- `Expected` - Wrong token type
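
For readers who want to match on these errors, the following sketch shows one plausible shape for the error type. Only the variant names come from the list above; the payload fields, the message wording (apart from "Unexpected end of expression", which appears in the output shown earlier), and the `is_incomplete` helper are assumptions.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical layout: variant names follow the list above, payloads are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum ParserError {
    UnexpectedToken(String),
    UnexpectedEnd,
    InvalidSyntax(String),
    InvalidNumber(String),
    UnclosedString,
    InvalidEscape(char),
    UnclosedComment,
    Expected { expected: String, found: String },
}

impl fmt::Display for ParserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::UnexpectedToken(t) => write!(f, "Unexpected token: {}", t),
            Self::UnexpectedEnd => write!(f, "Unexpected end of expression"),
            Self::InvalidSyntax(msg) => write!(f, "Invalid syntax: {}", msg),
            Self::InvalidNumber(text) => write!(f, "Invalid number: {}", text),
            Self::UnclosedString => write!(f, "Unclosed string literal"),
            Self::InvalidEscape(c) => write!(f, "Invalid escape sequence: \\{}", c),
            Self::UnclosedComment => write!(f, "Unclosed comment: missing */"),
            Self::Expected { expected, found } => {
                write!(f, "Expected {}, found {}", expected, found)
            }
        }
    }
}

// Implementing `Error` lets callers use `?` with `Box<dyn Error>`.
impl Error for ParserError {}

/// True when more input could still complete the expression.
/// Useful for a REPL that wants to prompt for another line.
pub fn is_incomplete(err: &ParserError) -> bool {
    matches!(
        err,
        ParserError::UnexpectedEnd | ParserError::UnclosedString | ParserError::UnclosedComment
    )
}
```

Branching on the variant, as `is_incomplete` does, is more robust than comparing message strings, which may change between releases.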

## Testing

Run the comprehensive test suite:

```bash
cargo test --lib parser
```

The test suite includes:
- **Lexer tests** (17 tests): Token recognition, escape sequences, comments
- **Parser tests** (18 tests): All constructs, precedence, complex expressions

### Test Examples

```rust
#[test]
fn test_parse_precedence() {
    let ast = parse("1 + 2 * 3").unwrap();
    // Verifies: 1 + (2 * 3), not (1 + 2) * 3
}

#[test]
fn test_parse_path() {
    let ast = parse("foo.bar.baz").unwrap();
    // Verifies path flattening
}
```

## Performance

The parser is designed for efficiency:

- **Single-pass lexing**: each input character is visited once, giving O(n) time in the expression length
- **No backtracking**: the Pratt parser consumes each token once and never re-parses
- **Minimal allocations**: Reuses strings where possible
- **Zero-copy** where feasible

Benchmarks (on typical expressions):
- Simple: `1 + 2` - ~1-2 microseconds
- Medium: `user.orders[price > 100].total` - ~5-10 microseconds
- Complex: Nested expressions - ~20-50 microseconds

## API Reference

### `parse(expression: &str) -> Result<AstNode, ParserError>`

Main entry point for parsing.

**Parameters:**
- `expression`: JSONata expression string

**Returns:**
- `Ok(AstNode)`: Parsed AST
- `Err(ParserError)`: Parse error with details

**Example:**
```rust
let ast = parse("1 + 2")?;
```

### `Lexer::new(input: String) -> Lexer`

Creates a new lexer.

**Example:**
```rust
let mut lexer = Lexer::new("1 + 2".to_string());
```

### `Lexer::next_token(&mut self) -> Result<Token, ParserError>`

Gets the next token.

**Example:**
```rust
let token = lexer.next_token()?;
```

### `Parser::new(input: String) -> Result<Parser, ParserError>`

Creates a new parser.

**Example:**
```rust
let mut parser = Parser::new("1 + 2".to_string())?;
```

### `Parser::parse(&mut self) -> Result<AstNode, ParserError>`

Parses the input into an AST.

**Example:**
```rust
let ast = parser.parse()?;
```

## Implementation Details

### Pratt Parsing Algorithm

The parser uses Vaughan Pratt's Top-Down Operator Precedence algorithm:

1. **Left binding power (lbp)**: How strongly an operator binds to the left
2. **Right binding power (rbp)**: How strongly it binds to the right
3. **Null denotation (nud)**: How to parse as a prefix
4. **Left denotation (led)**: How to parse as an infix

Example:
```rust
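fn parse_expression(&mut self, min_bp: u8) -> Result<AstNode, ParserError> {
    // Null denotation (nud): parse the prefix part of the expression.
    let mut lhs = self.parse_primary()?;

    // Left denotation (led): fold in infix operators that bind at least as tightly as `min_bp`.
    while let Some((left_bp, right_bp)) = self.binding_power(&self.current_token) {
        if left_bp < min_bp { break; }

        // Clone the operator before advancing, because `advance` replaces `current_token`.
        let op = self.current_token.clone();
        self.advance()?;
        let rhs = self.parse_expression(right_bp)?;

        // Box both operands so the recursive AST type has a known size.
        lhs = AstNode::Binary { op, lhs: Box::new(lhs), rhs: Box::new(rhs) };
    }

    Ok(lhs)
}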
```
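
The method above depends on parser state that is not shown here. As a complement, this is a minimal, self-contained Pratt parser for arithmetic that can be compiled on its own. Everything in it (`Token`, `Expr`, `MiniParser`, `eval`, and the prefix binding power of 70 for unary minus) is a simplified assumption for illustration, not the crate's implementation.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token { Num(f64), Plus, Minus, Star, Slash, LParen, RParen }

#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(f64),
    Neg(Box<Expr>),
    Binary { op: Token, lhs: Box<Expr>, rhs: Box<Expr> },
}

pub struct MiniParser { tokens: Vec<Token>, pos: usize }

impl MiniParser {
    pub fn new(tokens: Vec<Token>) -> Self { Self { tokens, pos: 0 } }

    fn peek(&self) -> Option<&Token> { self.tokens.get(self.pos) }

    fn bump(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    /// (left_bp, right_bp) for infix operators; both are left-associative here.
    fn infix_bp(tok: &Token) -> Option<(u8, u8)> {
        match tok {
            Token::Plus | Token::Minus => Some((50, 51)),
            Token::Star | Token::Slash => Some((60, 61)),
            _ => None,
        }
    }

    /// Null denotation: numbers, unary minus, and parenthesized expressions.
    fn parse_prefix(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Some(Token::Num(n)) => Ok(Expr::Num(n)),
            // 70 is an assumed prefix power, tighter than any infix operator above.
            Some(Token::Minus) => Ok(Expr::Neg(Box::new(self.parse_expression(70)?))),
            Some(Token::LParen) => {
                let inner = self.parse_expression(0)?;
                match self.bump() {
                    Some(Token::RParen) => Ok(inner),
                    other => Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }

    /// Left denotation loop: same structure as the method shown above.
    pub fn parse_expression(&mut self, min_bp: u8) -> Result<Expr, String> {
        let mut lhs = self.parse_prefix()?;
        while let Some((left_bp, right_bp)) = self.peek().and_then(Self::infix_bp) {
            if left_bp < min_bp { break; }
            let op = self.bump().expect("peek returned a token");
            let rhs = self.parse_expression(right_bp)?;
            lhs = Expr::Binary { op, lhs: Box::new(lhs), rhs: Box::new(rhs) };
        }
        Ok(lhs)
    }
}

/// Evaluate the tree so that the grouping chosen by the parser is observable.
pub fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::Neg(inner) => -eval(inner),
        Expr::Binary { op, lhs, rhs } => {
            let (l, r) = (eval(lhs), eval(rhs));
            match op {
                Token::Plus => l + r,
                Token::Minus => l - r,
                Token::Star => l * r,
                Token::Slash => l / r,
                _ => f64::NAN, // the parser never stores other tokens as operators
            }
        }
    }
}
```

Feeding the tokens for `1 + 2 * 3` to `MiniParser::new(...).parse_expression(0)` produces a tree whose root is the addition, and the tokens for `8 - 3 - 2` group to the left, so `eval` gives 7 and 3 respectively.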

### Lexer State Machine

The lexer is a simple state machine:

```
Start -> Whitespace -> Token -> Start
      -> Comment -> Start
```

Special handling:
- String escapes: `\n`, `\t`, `\uXXXX`
- Number formats: Integer, decimal, scientific
- Multi-character operators: `..`, `:=`, `!=`, `<=`, `>=`
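
The string-escape handling can be sketched as a small decoder over the text between the quotes. This is an illustrative stand-in, not the lexer's actual code: the `unescape` name and the string error type are assumptions, the set of single-character escapes beyond `\n` and `\t` is assumed to follow JSON, and `\uXXXX` surrogate pairs are rejected rather than combined.

```rust
/// Decode the body of a string literal (the text between the quotes).
/// Hypothetical helper; errors are plain strings to keep the sketch short.
pub fn unescape(body: &str) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();

    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash was seen: the next character selects the escape.
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('"') => out.push('"'),
            Some('\'') => out.push('\''),
            Some('\\') => out.push('\\'),
            Some('/') => out.push('/'),
            Some('u') => {
                // Exactly four hex digits must follow `\u`.
                let hex: String = chars.by_ref().take(4).collect();
                if hex.len() != 4 || !hex.chars().all(|h| h.is_ascii_hexdigit()) {
                    return Err(format!("invalid escape \\u{}", hex));
                }
                let code = u32::from_str_radix(&hex, 16)
                    .map_err(|_| format!("invalid escape \\u{}", hex))?;
                // `char::from_u32` returns None for surrogate code points.
                let ch = char::from_u32(code)
                    .ok_or_else(|| format!("invalid escape \\u{}", hex))?;
                out.push(ch);
            }
            Some(other) => return Err(format!("invalid escape \\{}", other)),
            None => return Err("unclosed string".to_string()),
        }
    }
    Ok(out)
}
```

For example, `unescape(r"\u0041\t!")` decodes to `A`, a tab, and `!`, while `unescape(r"\q")` returns an error, which corresponds to the `InvalidEscape` case in the error list.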

## Comparison with JavaScript Implementation

This Rust implementation mirrors the JavaScript reference:

**Similarities:**
- Token types and operators
- Operator precedence values
- AST structure
- Parsing algorithms

**Differences:**
- Strong typing (Rust enums vs JS objects)
- Error handling (Result vs exceptions)
- Memory management (ownership vs GC)
- Performance (ahead-of-time compiled native code vs a JIT-compiled JavaScript engine)

## Future Enhancements

Planned additions:
- [ ] Lambda functions: `function($x) { $x * 2 }`
- [ ] Regex literals: `/pattern/flags`
- [ ] Transform operator: `|`
- [ ] Wildcards: `*` and `**`
- [ ] Order-by: `^`
- [ ] Focus/index operators: `@`, `#`
- [ ] Parent operator: `%`

## Contributing

When modifying the parser:

1. **Maintain compatibility** with the JavaScript reference
2. **Add tests** for new features
3. **Update documentation**
4. **Run benchmarks** to ensure no performance regression
5. **Follow Rust conventions** (rustfmt, clippy)

## License

MIT License - See LICENSE file for details

## References

- [JSONata Documentation](https://jsonata.org/)
- [Pratt Parsing](http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/)
- [Reference Implementation](https://github.com/jsonata-js/jsonata)