sqlexpr-rust 1.0.2

A SQL expression parser and evaluator in Rust
# SQL Expression Parser & Evaluator

A Rust library for parsing and evaluating SQL-like boolean expressions with full support for comparisons, arithmetic, pattern matching, and logical operators.  Also see *CLAUDE.md* and *SqlExprParser-EBNF-Final.ebnf* for related documentation.

**NOTE:** This project has been superseded by [sqlexpr-congo-rust](https://github.com/richcar58/sqlexpr-congo-rust), which was built using CongoCC-generated Rust code and delivers significantly improved performance.

There are two Java SqlExpr parser/evaluator implementations that accept essentially the same language as this Rust parser. See [sqlexpr-javacc](https://github.com/richcar58/sqlexpr-javacc) for a parser built using the [JavaCC](https://javacc.github.io/javacc/) parser generator; see [sqlexpr-congocc](https://github.com/richcar58/sqlexpr-congocc) for a parser built using the [CongoCC](https://parsers.org/) parser generator.

## Related Project Catalog

This project is part of a suite of production-ready and experimental projects that provide SQL-based boolean expression parsers and evaluators, each built with different technologies. The table below summarizes these projects.

| <big>**Project Name**</big> | <big>**Description**</big> |
|:---|:---|
| >>> **Parsers and Evaluators** <<< | |
| [sqlexpr-javacc](https://github.com/richcar58/sqlexpr-javacc) | Java parser and evaluator. JavaCC generated parser that incorporates Apache ActiveMQ JMS evaluator. |
| [sqlexpr-congocc](https://github.com/richcar58/sqlexpr-congocc) | Java parser and evaluator. CongoCC generated parser that incorporates Apache ActiveMQ JMS evaluator. |
| [sqlexpr-rust](https://github.com/richcar58/sqlexpr-rust) | Rust parser and evaluator generated from EBNF specification by Claude (see NOTE below). |
| [sqlexpr-congo-rust](https://github.com/richcar58/sqlexpr-congo-rust) | Rust parser, evaluator and tests generated using the Enhanced CongoCC generator (*congo-parser-generator*) and Claude. |
| >>> **Testers** <<< | |
| [sqlexpr-gen](https://github.com/richcar58/sqlexpr-gen) | Rust test data generator for sqlexpr-* parsers and evaluators. |
| [sqlexpr-load-congocc](https://github.com/richcar58/sqlexpr-load-congocc) | Java load tester for sqlexpr-congocc. |
| [sqlexpr-load-rust](https://github.com/richcar58/sqlexpr-load-rust) | Rust load tester for sqlexpr-rust. |
| [sqlexpr-load-congo-rust](https://github.com/richcar58/sqlexpr-load-congo-rust) | Rust load tester for sqlexpr-congo-rust. |
| [sqlexpr-javacc-test](https://github.com/richcar58/sqlexpr-congocc-test) | Java integration tests for sqlexpr-javacc parser. |
| [sqlexpr-congocc-test](https://github.com/richcar58/sqlexpr-gen) | Java integration tests for sqlexpr-congocc parser. |
| >>> **CongoCC Rust Extension** <<< | |
| [congo-parser-generator](https://github.com/richcar58/congo-parser-generator) | Experimental version of CongoCC enhanced by Claude to output Rust parsers. |

**NOTE:** The first Rust parser/evaluator, **sqlexpr-rust**, was generated by Claude directly from an EBNF grammar. Load testing with *sqlexpr-load-rust* showed that this implementation was actually **slower** than the Java parser generated by CongoCC (**sqlexpr-congocc**). The second Rust parser/evaluator, **sqlexpr-congo-rust**, was generated using a version of CongoCC enhanced to output Rust parsers directly. Load testing with *sqlexpr-load-congo-rust* showed an improvement of approximately 2.5% over the CongoCC-generated Java parser (in non-scientific testing). Further performance improvements can probably be wrung from the improved Rust parser, but it's clear that a well-designed, well-written Java parser is quite competitive in certain testing scenarios.

## Features

### Parser
- **Grammar-enforced type safety**: All top-level expressions must be boolean-valued
- **Comprehensive operators**:
  - Logical: `AND`, `OR`, `NOT`
  - Comparison: `>`, `>=`, `<`, `<=`, `=`, `<>`, `!=`
  - Pattern matching: `LIKE`, `NOT LIKE` (with `%`, `_` wildcards and `ESCAPE`)
  - Range: `BETWEEN`, `NOT BETWEEN`
  - Membership: `IN`, `NOT IN`
  - Null testing: `IS NULL`, `IS NOT NULL`
  - Arithmetic: `+`, `-`, `*`, `/`, `%` (modulo)
  - Unary: `+`, `-`
- **Rich literals**:
  - Integers: decimal (`42`), hexadecimal (`0xFF`), octal (`0755`)
  - Floats: standard (`3.14`), scientific notation (`1.5e-10`)
  - Strings: single-quoted with escape sequences (`'hello\'world'`)
  - Booleans: `TRUE`, `FALSE`
  - Null: `NULL`
- **Comments**: Line comments (`--`) and block comments (`/* */`)
- **Case-insensitive keywords**: `AND`, `and`, `And` all work
- **Detailed error messages**: Parse errors include position and context

This parser separates boolean expressions from value expressions at the grammar level, so most type errors are caught during parsing rather than during evaluation.

### Evaluator
- **Variable substitution**: Bind runtime values to variables
- **Type system**: Integer, Float, String, Boolean, Null
- **Automatic type coercion**: Mixed int/float arithmetic automatically promotes to float
- **Division semantics**: Always returns float (e.g., `7/2 = 3.5`)
- **Null handling**: NULL is disallowed in arithmetic and comparisons; it is allowed only with `IS NULL` and `IS NOT NULL`
- **Short-circuit evaluation**: `AND` and `OR` operators evaluate efficiently
- **Pattern matching**: Full LIKE implementation with wildcards and escape sequences
- **Comprehensive error reporting**: Type errors, null violations, division by zero, etc.

## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
sqlexpr-rust = "1.0.2"
```

### Parsing Expressions

```rust
use sqlexpr_rust::parse;

fn main() {
    // Parse a simple comparison
    let ast = parse("age >= 18 AND status = 'active'").unwrap();
    println!("Parsed: {}", ast);

    // Parse complex expressions
    let expr = parse("(price * quantity) > 1000 AND customer_type IN ('gold', 'platinum')").unwrap();

    // Parse with LIKE pattern matching
    let pattern = parse("email LIKE '%@example.com' AND NOT deleted").unwrap();
}
```

### Evaluating Expressions

```rust
use std::collections::HashMap;
use sqlexpr_rust::{evaluate, RuntimeValue};

fn main() {
    // Create variable bindings
    let mut bindings = HashMap::new();
    bindings.insert("age".to_string(), RuntimeValue::Integer(25));
    bindings.insert("status".to_string(), RuntimeValue::String("active".to_string()));
    bindings.insert("premium".to_string(), RuntimeValue::Boolean(true));

    // Evaluate the expression
    let result = evaluate("age >= 18 AND status = 'active' AND premium", &bindings).unwrap();
    assert_eq!(result, true);

    // Arithmetic evaluation
    bindings.insert("price".to_string(), RuntimeValue::Float(99.99));
    bindings.insert("quantity".to_string(), RuntimeValue::Integer(10));

    let result = evaluate("(price * quantity) > 500", &bindings).unwrap();
    assert_eq!(result, true);

    // Pattern matching
    bindings.insert("email".to_string(), RuntimeValue::String("user@example.com".to_string()));

    let result = evaluate("email LIKE '%@example.com'", &bindings).unwrap();
    assert_eq!(result, true);
}
```

### Error Handling

```rust
use sqlexpr_rust::{evaluate, RuntimeValue, EvalError};
use std::collections::HashMap;

fn main() {
    let mut bindings = HashMap::new();
    bindings.insert("x".to_string(), RuntimeValue::String("hello".to_string()));

    // Type error: string in arithmetic
    let result = evaluate("(x + 10) > 0", &bindings);
    match result {
        Err(EvalError::TypeError { operation, expected, actual, context }) => {
            println!("Type error in {}: expected {}, got {} ({})",
                     operation, expected, actual, context);
        }
        _ => {}
    }

    // Type error: incompatible IN list
    bindings.insert("y".to_string(), RuntimeValue::Integer(42));
    let result = evaluate("y IN ('a', 'b', 'c')", &bindings);
    assert!(matches!(result, Err(EvalError::TypeError { .. })));
}
```

## Project Layout

```
sqlexpr-rust/
├── src/
│   ├── lib.rs           # Public API and re-exports
│   ├── lexer.rs         # Tokenization
│   ├── parser.rs        # Recursive descent parser
│   ├── ast.rs           # Abstract Syntax Tree definitions
│   └── evaluator.rs     # Expression evaluation engine
├── tests/
│   ├── parser_tests.rs  # Parser test suite (155 tests)
│   ├── parser_type_checking_tests.rs  # Parser type test suite (97 tests)
│   └── evaluator_tests.rs # Evaluator test suite (111 tests)
├── examples/
│   ├── showcase.rs      # Feature demonstration
│   └── ...              # Additional examples
├── docs/
│   ├── EVALUATION_DESIGN.md            # Design alternatives
│   ├── EVALUATOR_IMPLEMENTATION_PLAN.md # Implementation roadmap
│   └── command_prompts.md              # Development notes
├── Cargo.toml
├── CLAUDE.md
├── LICENSE
├── README.md
└── SqlExprParser-EBNF-Final.ebnf  # Formal grammar specification
```

## Core Components

### Lexer (`src/lexer.rs`)
Tokenizes input strings into a stream of tokens. Handles:
- Keywords (case-insensitive)
- Identifiers and variables
- Numeric literals (int, float, hex, octal, scientific)
- String literals with escapes
- Operators and punctuation
- Comments (line and block)
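
As an illustration of the numeric-literal handling listed above, here is a minimal sketch of turning integer literal text into an `i64`. The function name `parse_int_literal` is hypothetical and is not taken from `src/lexer.rs`. Accepting an uppercase `0X` prefix is an assumption, and the real lexer may treat edge cases differently.

```rust
/// Hypothetical helper (not the crate's API): convert the text of an integer
/// literal into an i64, following the literal forms listed in this README.
fn parse_int_literal(text: &str) -> Option<i64> {
    // Hexadecimal: `0x` prefix (accepting `0X` as well is an assumption).
    if let Some(hex) = text.strip_prefix("0x").or_else(|| text.strip_prefix("0X")) {
        return i64::from_str_radix(hex, 16).ok();
    }
    // Octal: a leading `0` followed by at least one more digit.
    if text.len() > 1 && text.starts_with('0') {
        return i64::from_str_radix(&text[1..], 8).ok();
    }
    // Everything else is treated as decimal.
    text.parse::<i64>().ok()
}
```

With this sketch, `0755` is read as octal, and a literal such as `089` is rejected because `8` and `9` are not octal digits.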

### Parser (`src/parser.rs`)
Recursive descent parser implementing the EBNF grammar. Features:
- Operator precedence handling
- Type safety at grammar level
- Lookahead for disambiguation
- Detailed error messages with position info

### AST (`src/ast.rs`)
Hierarchical AST structure:
- `BooleanExpr`: AND, OR, NOT, literals, variables, relational expressions
- `RelationalExpr`: Comparisons, LIKE, BETWEEN, IN, IS NULL
- `ValueExpr`: Arithmetic operations, literals, variables
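
To show how a layered AST can make ill-typed expressions unrepresentable, the following sketch uses simplified, hypothetical node types. The variant names and the helper functions `value_vars` and `boolean_vars` are assumptions for illustration; the actual definitions in `src/ast.rs` are likely richer and may be shaped differently.

```rust
// Hypothetical, simplified node types; not the definitions from src/ast.rs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ValueExpr {
    Integer(i64),
    Variable(String),
    Add(Box<ValueExpr>, Box<ValueExpr>),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum RelationalExpr {
    GreaterThan(ValueExpr, ValueExpr),
    IsNull(ValueExpr),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum BooleanExpr {
    And(Box<BooleanExpr>, Box<BooleanExpr>),
    Not(Box<BooleanExpr>),
    Literal(bool),
    Relational(RelationalExpr),
}

/// Collect variable names from a value expression, left to right.
fn value_vars(expr: &ValueExpr, out: &mut Vec<String>) {
    match expr {
        ValueExpr::Integer(_) => {}
        ValueExpr::Variable(name) => out.push(name.clone()),
        ValueExpr::Add(left, right) => {
            value_vars(left, out);
            value_vars(right, out);
        }
    }
}

/// Collect variable names from a boolean expression, left to right.
fn boolean_vars(expr: &BooleanExpr, out: &mut Vec<String>) {
    match expr {
        BooleanExpr::And(left, right) => {
            boolean_vars(left, out);
            boolean_vars(right, out);
        }
        BooleanExpr::Not(inner) => boolean_vars(inner, out),
        BooleanExpr::Literal(_) => {}
        BooleanExpr::Relational(RelationalExpr::GreaterThan(left, right)) => {
            value_vars(left, out);
            value_vars(right, out);
        }
        BooleanExpr::Relational(RelationalExpr::IsNull(value)) => value_vars(value, out),
    }
}
```

Because `And` only accepts `BooleanExpr` operands, an expression such as `5 AND x + 1` cannot be constructed with these types at all, which mirrors the grammar-level separation described above.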

### Evaluator (`src/evaluator.rs`)
Evaluation engine with:
- Variable binding resolution
- Type checking and coercion
- Short-circuit boolean logic
- Pattern matching for LIKE
- Comprehensive error handling
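
The short-circuit behavior mentioned above can be sketched with a tiny tree whose leaves are closures, which makes it observable whether an operand was evaluated. The `Node` type and the `demo` function are hypothetical and exist only for this illustration; they are not part of `src/evaluator.rs`.

```rust
use std::cell::Cell;

/// Hypothetical boolean node; leaves are closures so evaluation can be observed.
enum Node<'a> {
    Leaf(Box<dyn Fn() -> bool + 'a>),
    And(Box<Node<'a>>, Box<Node<'a>>),
    Or(Box<Node<'a>>, Box<Node<'a>>),
}

fn eval(node: &Node) -> bool {
    match node {
        Node::Leaf(f) => f(),
        // Rust's `&&` and `||` skip the right operand once the left one
        // decides the result, which gives short-circuiting for free.
        Node::And(left, right) => eval(left) && eval(right),
        Node::Or(left, right) => eval(left) || eval(right),
    }
}

/// Evaluates `<left> AND <counted>` or `<left> OR <counted>`, where the counted
/// leaf always yields true. Returns the result and how often that leaf ran.
fn demo(left: bool, use_and: bool) -> (bool, u32) {
    let calls = Cell::new(0u32);
    let lhs = Box::new(Node::Leaf(Box::new(move || left)));
    let rhs = Box::new(Node::Leaf(Box::new(|| {
        calls.set(calls.get() + 1);
        true
    })));
    let tree = if use_and { Node::And(lhs, rhs) } else { Node::Or(lhs, rhs) };
    (eval(&tree), calls.get())
}
```

In this sketch, `demo(false, true)` models `FALSE AND x` and never runs the right-hand leaf, while `demo(true, false)` models `TRUE OR x` and also skips it.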

## Grammar Overview

The grammar enforces type safety at parse time:

```ebnf
BooleanExpression = BooleanOrExpression ;
BooleanOrExpression = BooleanAndExpression { "OR" BooleanAndExpression } ;
BooleanAndExpression = BooleanTerm { "AND" BooleanTerm } ;
BooleanTerm = "NOT" BooleanTerm
            | "(" BooleanExpression ")"
            | BooleanLiteral
            | Variable
            | RelationalExpression ;

RelationalExpression = ValueExpression ComparisonOp ValueExpression
                     | ValueExpression "LIKE" Pattern
                     | ValueExpression "BETWEEN" ValueExpression "AND" ValueExpression
                     | ValueExpression "IN" "(" ValueList ")"
                     | ValueExpression "IS" ["NOT"] "NULL" ;

ValueExpression = AdditiveExpression ;
AdditiveExpression = MultiplicativeExpression { ("+" | "-") MultiplicativeExpression } ;
MultiplicativeExpression = UnaryExpression { ("*" | "/" | "%") UnaryExpression } ;
UnaryExpression = ["+" | "-"] PrimaryExpression ;
PrimaryExpression = Literal | Variable | "(" ValueExpression ")" ;
```

See `SqlExprParser-EBNF-Final.ebnf` for the complete formal grammar.
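
To illustrate how the `ValueExpression` rules above map onto recursive descent, here is a minimal sketch with one method per grammar rule. It is not the code in `src/parser.rs`. The names `ValueParser` and `eval_value` are hypothetical, the sketch works on characters rather than lexer tokens, it evaluates directly to `f64` instead of building an AST, and it omits variables and division-by-zero checks.

```rust
use std::iter::Peekable;
use std::str::Chars;

struct ValueParser<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> ValueParser<'a> {
    fn new(input: &'a str) -> Self {
        ValueParser { chars: input.chars().peekable() }
    }

    fn skip_ws(&mut self) {
        while matches!(self.chars.peek(), Some(c) if c.is_whitespace()) {
            self.chars.next();
        }
    }

    /// Skip whitespace, then consume `expected` if it is the next character.
    fn eat(&mut self, expected: char) -> bool {
        self.skip_ws();
        if self.chars.peek() == Some(&expected) {
            self.chars.next();
            true
        } else {
            false
        }
    }

    // AdditiveExpression = MultiplicativeExpression { ("+" | "-") MultiplicativeExpression }
    fn additive(&mut self) -> Result<f64, String> {
        let mut left = self.multiplicative()?;
        loop {
            if self.eat('+') {
                left += self.multiplicative()?;
            } else if self.eat('-') {
                left -= self.multiplicative()?;
            } else {
                return Ok(left);
            }
        }
    }

    // MultiplicativeExpression = UnaryExpression { ("*" | "/" | "%") UnaryExpression }
    fn multiplicative(&mut self) -> Result<f64, String> {
        let mut left = self.unary()?;
        loop {
            if self.eat('*') {
                left *= self.unary()?;
            } else if self.eat('/') {
                left /= self.unary()?;
            } else if self.eat('%') {
                left %= self.unary()?;
            } else {
                return Ok(left);
            }
        }
    }

    // UnaryExpression = ["+" | "-"] PrimaryExpression
    fn unary(&mut self) -> Result<f64, String> {
        if self.eat('-') {
            Ok(-self.primary()?)
        } else {
            self.eat('+'); // an optional unary plus changes nothing
            self.primary()
        }
    }

    // PrimaryExpression = Literal | "(" ValueExpression ")"   (variables omitted)
    fn primary(&mut self) -> Result<f64, String> {
        if self.eat('(') {
            let value = self.additive()?;
            return if self.eat(')') { Ok(value) } else { Err("expected ')'".to_string()) };
        }
        let mut digits = String::new();
        while let Some(&c) = self.chars.peek() {
            if c.is_ascii_digit() || c == '.' {
                digits.push(c);
                self.chars.next();
            } else {
                break;
            }
        }
        digits.parse::<f64>().map_err(|_| format!("expected a number, found '{}'", digits))
    }
}

/// Parse and evaluate a complete value expression, rejecting trailing input.
fn eval_value(input: &str) -> Result<f64, String> {
    let mut parser = ValueParser::new(input);
    let value = parser.additive()?;
    parser.skip_ws();
    match parser.chars.peek() {
        None => Ok(value),
        Some(c) => Err(format!("unexpected '{}'", c)),
    }
}
```

Operator precedence falls out of the call structure: `additive` calls `multiplicative`, which calls `unary`, so `*`, `/`, and `%` bind more tightly than `+` and `-` without any explicit precedence table.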

## Type System

### RuntimeValue Types
- `Integer(i64)`: 64-bit signed integers
- `Float(f64)`: 64-bit floating point
- `String(String)`: UTF-8 strings
- `Boolean(bool)`: true/false
- `Null`: SQL NULL value

### Type Coercion Rules
1. **Arithmetic**: Int + Int → Int, Float + Float → Float
2. **Mixed arithmetic**: Int + Float → Float (automatic promotion)
3. **Division**: Always returns Float (e.g., `7 / 2 = 3.5`)
4. **Comparisons**: Same types compared directly; Int/Float mixing allowed
5. **NULL handling**: NULL in arithmetic/comparisons raises error; use `IS NULL`
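
The arithmetic rules above (rules 1 to 3, plus the NULL rule) can be sketched as follows. The `Value` and `Op` types and the `arith` function are hypothetical stand-ins, not the crate's `RuntimeValue` or evaluator API. Wrapping on integer overflow and the plain `String` error type are simplifications chosen for brevity.

```rust
// Hypothetical stand-in for the numeric part of the runtime value type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Null,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Op {
    Add,
    Sub,
    Mul,
    Div,
}

fn as_f64(value: &Value) -> Option<f64> {
    match value {
        Value::Integer(i) => Some(*i as f64),
        Value::Float(f) => Some(*f),
        Value::Null => None,
    }
}

fn arith(op: Op, left: &Value, right: &Value) -> Result<Value, String> {
    // Rule 1: Int op Int stays Int, except for division (rule 3).
    if let (Value::Integer(a), Value::Integer(b)) = (left, right) {
        match op {
            Op::Add => return Ok(Value::Integer(a.wrapping_add(*b))),
            Op::Sub => return Ok(Value::Integer(a.wrapping_sub(*b))),
            Op::Mul => return Ok(Value::Integer(a.wrapping_mul(*b))),
            Op::Div => {} // fall through to float division below
        }
    }
    // Rule 5: NULL operands are rejected (as_f64 yields None for NULL).
    let (a, b) = match (as_f64(left), as_f64(right)) {
        (Some(a), Some(b)) => (a, b),
        _ => return Err("NULL is not allowed in arithmetic".to_string()),
    };
    // Rules 2 and 3: mixed operands and every division produce a Float.
    match op {
        Op::Add => Ok(Value::Float(a + b)),
        Op::Sub => Ok(Value::Float(a - b)),
        Op::Mul => Ok(Value::Float(a * b)),
        Op::Div if b == 0.0 => Err("division by zero".to_string()),
        Op::Div => Ok(Value::Float(a / b)),
    }
}
```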

## Examples

### Boolean Logic
```sql
TRUE AND FALSE                          -- false
age >= 18 AND status = 'active'        -- depends on bindings
(x > 10 OR y > 10) AND NOT deleted     -- compound condition
```

### Arithmetic
```sql
(price * quantity) > 1000              -- arithmetic in comparison
(revenue - cost) / revenue >= 0.2      -- percentage calculation
amount % 100 = 0                       -- check divisibility
```

### Pattern Matching
```sql
email LIKE '%@example.com'             -- domain match
name LIKE 'J%n'                        -- starts with J, ends with n
code LIKE 'A___B'                      -- A + 3 chars + B
text LIKE '50\%' ESCAPE '\'            -- literal % character
```
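
The wildcard semantics in these examples can be sketched without any external crate. The function below is a hypothetical illustration of what `%`, `_`, and `ESCAPE` mean. It is not the library's matcher, and its treatment of a dangling escape character at the end of a pattern is an assumption.

```rust
/// Sketch of SQL LIKE semantics: `%` matches any run of characters (including
/// none), `_` matches exactly one character, and `escape` (if given) makes the
/// following pattern character literal. Matching is case-sensitive.
fn like_match(text: &str, pattern: &str, escape: Option<char>) -> bool {
    #[derive(Clone, Copy, PartialEq)]
    enum Tok {
        AnyRun,
        AnyOne,
        Lit(char),
    }

    // Step 1: turn the pattern into tokens, resolving escapes.
    let mut toks = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if Some(c) == escape {
            // Assumption: an escape at the very end is treated as a literal.
            toks.push(Tok::Lit(chars.next().unwrap_or(c)));
        } else if c == '%' {
            toks.push(Tok::AnyRun);
        } else if c == '_' {
            toks.push(Tok::AnyOne);
        } else {
            toks.push(Tok::Lit(c));
        }
    }

    // Step 2: match greedily, backtracking to the most recent `%` on failure.
    let text: Vec<char> = text.chars().collect();
    let (mut t, mut p) = (0, 0);
    let mut backtrack: Option<(usize, usize)> = None; // (token after %, text index)
    while t < text.len() {
        if p < toks.len() && (toks[p] == Tok::AnyOne || toks[p] == Tok::Lit(text[t])) {
            t += 1;
            p += 1;
        } else if p < toks.len() && toks[p] == Tok::AnyRun {
            backtrack = Some((p + 1, t));
            p += 1;
        } else if let Some((bp, bt)) = backtrack {
            // Let the last `%` absorb one more character and retry from there.
            p = bp;
            t = bt + 1;
            backtrack = Some((bp, bt + 1));
        } else {
            return false;
        }
    }
    // The text is exhausted; only trailing `%` tokens may remain.
    toks[p..].iter().all(|tok| *tok == Tok::AnyRun)
}
```

For instance, `like_match("50%", "50\\%", Some('\\'))` treats the `%` as a literal character, so the same pattern does not match `"500"`.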

### Range and Membership
```sql
age BETWEEN 18 AND 65                  -- inclusive range
status IN ('active', 'pending')        -- membership test
score NOT BETWEEN 0 AND 59             -- exclusion
role NOT IN ('admin', 'moderator')     -- negative membership
```

### Null Handling
```sql
middle_name IS NULL                    -- null check
email IS NOT NULL                      -- non-null check
-- x + NULL  would raise NullInOperation error
-- x > NULL  would raise NullInOperation error
```
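
The comparison side of these rules can be sketched like this. The `Value` and `CompareError` types are hypothetical (the variant name `NullInOperation` is borrowed from the comments above, but the crate's actual error type may carry more detail), and reporting NaN comparisons as `TypeMismatch` is a simplification.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for the runtime value type; `Text` is used instead of
// `String` to avoid shadowing the standard library type in this sketch.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Text(String),
    Null,
}

#[derive(Debug, PartialEq)]
enum CompareError {
    NullInOperation,
    TypeMismatch,
}

/// Order two values, rejecting NULL operands and incompatible types.
fn compare(left: &Value, right: &Value) -> Result<Ordering, CompareError> {
    use Value::*;
    match (left, right) {
        // NULL never takes part in an ordering comparison.
        (Null, _) | (_, Null) => Err(CompareError::NullInOperation),
        (Integer(a), Integer(b)) => Ok(a.cmp(b)),
        (Text(a), Text(b)) => Ok(a.cmp(b)),
        // Int/Float mixing is allowed: promote the integer side to f64.
        (Integer(a), Float(b)) => (*a as f64).partial_cmp(b).ok_or(CompareError::TypeMismatch),
        (Float(a), Integer(b)) => a.partial_cmp(&(*b as f64)).ok_or(CompareError::TypeMismatch),
        (Float(a), Float(b)) => a.partial_cmp(b).ok_or(CompareError::TypeMismatch),
        _ => Err(CompareError::TypeMismatch),
    }
}

/// `IS NULL` is the one place where a NULL operand is accepted.
fn is_null(value: &Value) -> bool {
    matches!(value, Value::Null)
}
```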

## Running Examples

```bash
# Run the feature showcase
cargo run --example showcase

# Enable pretty-printing of AST
SQLEXPR_PRETTY=true cargo run --example showcase

# Run all tests
cargo test

# Run specific test suite
cargo test --test parser_tests
cargo test --test evaluator_tests

# Build documentation
cargo doc --open
```

## Testing

The project includes comprehensive test coverage:

- **Parser tests** (`tests/parser_tests.rs`): 155 tests covering all grammar features
- **Parser type-checking tests** (`tests/parser_type_checking_tests.rs`): 97 tests
- **Evaluator tests** (`tests/evaluator_tests.rs`): 111 tests covering all operations
- **Unit tests** (`src/lib.rs`, modules): 13 embedded tests
- **Doc tests**: 1 documentation example test

Total: **377 tests**

Run tests with:
```bash
cargo test                  # All tests
cargo test --verbose        # Verbose Cargo output
cargo test -- --nocapture   # Show output printed by tests
cargo test <pattern>        # Specific tests
```

### Viewing Abstract Syntax Trees (ASTs)

Set the *SQLEXPR_PRETTY* environment variable to `true` to have the parser pretty-print the AST of each parsed expression. For example, the following commands dump the ASTs generated by the *parser_tests* and *evaluator_tests* programs. Run them from the top-level project directory. For easy reference, the output files from these test programs are shipped with the source code.

```bash
SQLEXPR_PRETTY=true cargo test --test parser_tests -- --nocapture --test-threads=1 > examples/output/parser_tests.out
SQLEXPR_PRETTY=true cargo test --test evaluator_tests -- --nocapture --test-threads=1 > examples/output/evaluator_tests.out
```

## Error Messages

The library provides detailed error messages:

### Parse Errors
```
Parse error: Unexpected token ')' near position 15 in:
  (x > 5 AND y < )
```

### Evaluation Errors
```
Type error in addition: expected numeric types, got string and integer
(context: arithmetic operation)

NULL value in GreaterThan operation (context: cannot compare NULL).
NULL is only allowed in IS NULL/IS NOT NULL

Division by zero in expression: x / 0 > 5
```

## Performance Considerations

- **Parser**: Single-pass recursive descent, O(n) complexity
- **Lexer**: Single-pass tokenization, O(n) complexity
- **Evaluator**: Direct evaluation without intermediate representation
- **Short-circuit**: AND/OR operators short-circuit for efficiency
- **Pattern matching**: Regex-based LIKE uses Rust's `regex` crate
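
One common way to implement a regex-based `LIKE` is to translate the pattern into regex source text once and then hand it to the `regex` crate. The sketch below only produces that source text, so it needs nothing beyond the standard library. The function names are hypothetical, and whether the evaluator translates patterns in exactly this way is an assumption.

```rust
/// Translate a LIKE pattern into regex source text (hypothetical helper).
fn like_to_regex(pattern: &str, escape: Option<char>) -> String {
    // `(?s)` lets `.` match newlines; `^` and `$` force a whole-string match.
    let mut out = String::from("(?s)^");
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if Some(c) == escape {
            // The character after the escape is always literal.
            if let Some(next) = chars.next() {
                push_literal(&mut out, next);
            }
        } else if c == '%' {
            out.push_str(".*");
        } else if c == '_' {
            out.push('.');
        } else {
            push_literal(&mut out, c);
        }
    }
    out.push('$');
    out
}

/// Append `c`, backslash-escaping it if it is a regex metacharacter.
fn push_literal(out: &mut String, c: char) {
    if r"\.+*?()|[]{}^$#&-~".contains(c) {
        out.push('\\');
    }
    out.push(c);
}
```

When the `regex` crate is available, its `regex::escape` function can replace the hand-written `push_literal`, and compiling the translated pattern once per expression avoids repeating the translation for every evaluated row.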

## Limitations

1. **No subqueries**: Only standalone boolean expressions
2. **No aggregate functions**: No `SUM`, `COUNT`, etc.
3. **No date/time types**: Only basic types (int, float, string, bool, null)
4. **Case-sensitive strings**: String comparisons are case-sensitive
5. **No COLLATE**: String ordering uses Rust's string comparison

## License

See `LICENSE` file for details.

## Contributing

Contributions are welcome! Please ensure:
1. All tests pass: `cargo test`
2. Code follows Rust conventions: `cargo fmt`
3. No warnings: `cargo clippy`
4. Add tests for new features

## Documentation

- **Grammar**: See `SqlExprParser-EBNF-Final.ebnf`
- **API docs**: Run `cargo doc --open`
- **Design docs**: See `docs/` directory
- **Examples**: See `examples/` directory

## Acknowledgments

Anthropic's Claude Sonnet 4.5 was used to generate most of the code and documentation in this project.