ruvector-graph 2.0.6

Distributed Neo4j-compatible hypergraph database with SIMD optimization
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
# Cypher Query Language Parser for RuVector

A complete Cypher-compatible query language parser implementation for the RuVector graph database, built using the nom parser combinator library.

## Overview

This module provides a full-featured Cypher query parser that converts Cypher query text into an Abstract Syntax Tree (AST) suitable for execution. It includes:

- **Lexical Analysis** (`lexer.rs`): Tokenizes Cypher query strings
- **Syntax Parsing** (`parser.rs`): Recursive descent parser using nom
- **AST Definitions** (`ast.rs`): Complete type system for Cypher queries
- **Semantic Analysis** (`semantic.rs`): Type checking and validation
- **Query Optimization** (`optimizer.rs`): Query plan optimization

## Supported Cypher Features

### Pattern Matching
```cypher
MATCH (n:Person)
MATCH (a:Person)-[r:KNOWS]->(b:Person)
OPTIONAL MATCH (n)-[r]->()
```

### Hyperedges (N-ary Relationships)
```cypher
-- Transaction involving multiple parties
MATCH (person)-[r:TRANSACTION]->(acc1:Account, acc2:Account, merchant:Merchant)
WHERE r.amount > 1000
RETURN person, r, acc1, acc2, merchant
```

### Filtering
```cypher
WHERE n.age > 30 AND n.name = 'Alice'
WHERE n.age >= 18 OR n.verified = true
```

### Projections and Aggregations
```cypher
RETURN n.name, n.age
RETURN COUNT(n), AVG(n.age), MAX(n.salary), COLLECT(n.name)
RETURN DISTINCT n.department
```

### Mutations
```cypher
CREATE (n:Person {name: 'Bob', age: 30})
MERGE (n:Person {email: 'alice@example.com'})
  ON CREATE SET n.created = timestamp()
  ON MATCH SET n.accessed = timestamp()
DELETE n
DETACH DELETE n
SET n.age = 31, n.updated = timestamp()
```

### Query Chaining
```cypher
MATCH (n:Person)
WITH n, n.age AS age
WHERE age > 30
RETURN n.name, age
ORDER BY age DESC
LIMIT 10
```

### Path Patterns
```cypher
MATCH p = (a:Person)-[*1..5]->(b:Person)
RETURN p
```

### Advanced Expressions
```cypher
CASE
  WHEN n.age < 18 THEN 'minor'
  WHEN n.age < 65 THEN 'adult'
  ELSE 'senior'
END
```

## Architecture

### 1. Lexer (`lexer.rs`)

The lexer converts raw text into a stream of tokens:

```rust
use ruvector_graph::cypher::lexer::tokenize;

let tokens = tokenize("MATCH (n:Person) RETURN n")?;
// Returns: [MATCH, (, Identifier("n"), :, Identifier("Person"), ), RETURN, Identifier("n")]
```

**Features:**
- Full Cypher keyword support
- String literals (single and double quoted)
- Numeric literals (integers and floats with scientific notation)
- Operators and delimiters
- Position tracking for error reporting

### 2. Parser (`parser.rs`)

Recursive descent parser using nom combinators:

```rust
use ruvector_graph::cypher::parse_cypher;

let query = "MATCH (n:Person) WHERE n.age > 30 RETURN n.name";
let ast = parse_cypher(query)?;
```

**Features:**
- Error recovery and detailed error messages
- Support for all Cypher clauses
- Hyperedge pattern recognition
- Operator precedence handling
- Property map parsing

### 3. AST (`ast.rs`)

Complete Abstract Syntax Tree representation:

```rust
pub struct Query {
    pub statements: Vec<Statement>,
}

pub enum Statement {
    Match(MatchClause),
    Create(CreateClause),
    Merge(MergeClause),
    Delete(DeleteClause),
    Set(SetClause),
    Return(ReturnClause),
    With(WithClause),
}

// Hyperedge support for N-ary relationships
pub struct HyperedgePattern {
    pub variable: Option<String>,
    pub rel_type: String,
    pub properties: Option<PropertyMap>,
    pub from: Box<NodePattern>,
    pub to: Vec<NodePattern>,  // Multiple targets
    pub arity: usize,           // N-ary degree
}
```

**Key Types:**
- `Pattern`: Node, Relationship, Path, and Hyperedge patterns
- `Expression`: Full expression tree with operators and functions
- `AggregationFunction`: COUNT, SUM, AVG, MIN, MAX, COLLECT
- `BinaryOperator`: Arithmetic, comparison, logical, string operations

### 4. Semantic Analyzer (`semantic.rs`)

Type checking and validation:

```rust
use ruvector_graph::cypher::semantic::SemanticAnalyzer;

let mut analyzer = SemanticAnalyzer::new();
analyzer.analyze_query(&ast)?;
```

**Checks:**
- Variable scope and lifetime
- Type compatibility
- Aggregation context validation
- Hyperedge validity (minimum 2 target nodes)
- Pattern correctness

### 5. Query Optimizer (`optimizer.rs`)

Query plan optimization:

```rust
use ruvector_graph::cypher::optimizer::QueryOptimizer;

let optimizer = QueryOptimizer::new();
let plan = optimizer.optimize(query);

println!("Optimizations: {:?}", plan.optimizations_applied);
println!("Estimated cost: {}", plan.estimated_cost);
```

**Optimizations:**
- **Constant Folding**: Evaluate constant expressions at parse time
- **Predicate Pushdown**: Move filters closer to data access
- **Join Reordering**: Minimize intermediate result sizes
- **Selectivity Estimation**: Optimize pattern matching order

## Usage Examples

### Basic Query Parsing

```rust
use ruvector_graph::cypher::{parse_cypher, Query};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let query = r#"
        MATCH (person:Person)-[knows:KNOWS]->(friend:Person)
        WHERE person.age > 25 AND friend.city = 'NYC'
        RETURN person.name, friend.name, knows.since
        ORDER BY knows.since DESC
        LIMIT 10
    "#;

    let ast = parse_cypher(query)?;

    println!("Parsed {} statements", ast.statements.len());
    println!("Read-only query: {}", ast.is_read_only());

    Ok(())
}
```

### Hyperedge Queries

```rust
use ruvector_graph::cypher::parse_cypher;

// Parse a hyperedge pattern (N-ary relationship)
let query = r#"
    MATCH (buyer:Person)-[txn:PURCHASE]->(
        product:Product,
        seller:Person,
        warehouse:Location
    )
    WHERE txn.amount > 100
    RETURN buyer, product, seller, warehouse, txn.timestamp
"#;

let ast = parse_cypher(query)?;
assert!(ast.has_hyperedges());
```

### Semantic Analysis

```rust
use ruvector_graph::cypher::{parse_cypher, semantic::SemanticAnalyzer};

let query = "MATCH (n:Person) RETURN COUNT(n), AVG(n.age)";
let ast = parse_cypher(query)?;

let mut analyzer = SemanticAnalyzer::new();
match analyzer.analyze_query(&ast) {
    Ok(()) => println!("Query is semantically valid"),
    Err(e) => eprintln!("Semantic error: {}", e),
}
```

### Query Optimization

```rust
use ruvector_graph::cypher::{parse_cypher, optimizer::QueryOptimizer};

let query = r#"
    MATCH (a:Person), (b:Person)
    WHERE a.age > 30 AND b.name = 'Alice' AND 2 + 2 = 4
    RETURN a, b
"#;

let ast = parse_cypher(query)?;
let optimizer = QueryOptimizer::new();
let plan = optimizer.optimize(ast);

println!("Applied optimizations: {:?}", plan.optimizations_applied);
println!("Estimated execution cost: {:.2}", plan.estimated_cost);
```

## Hyperedge Support

Traditional graph databases represent relationships as binary edges (one source, one target). RuVector's Cypher parser supports **hyperedges** - relationships connecting multiple nodes simultaneously.

### Why Hyperedges?

- **Multi-party Transactions**: Model transfers involving multiple accounts
- **Complex Events**: Represent events with multiple participants
- **N-way Relationships**: Natural representation of real-world scenarios

### Hyperedge Syntax

```cypher
-- Create a 3-way transaction
CREATE (alice:Person)-[t:TRANSFER {amount: 100}]->(
    bob:Person,
    carol:Person
)

-- Match complex patterns
MATCH (author:Person)-[collab:AUTHORED]->(
    paper:Paper,
    coauthor1:Person,
    coauthor2:Person
)
RETURN author, paper, coauthor1, coauthor2

-- Hyperedge with properties
MATCH (teacher)-[class:TEACHES {semester: 'Fall2024'}]->(
    student1, student2, student3, course:Course
)
WHERE course.level = 'Graduate'
RETURN teacher, course, student1, student2, student3
```

### Hyperedge AST

```rust
pub struct HyperedgePattern {
    pub variable: Option<String>,    // Optional variable binding
    pub rel_type: String,             // Relationship type (required)
    pub properties: Option<PropertyMap>, // Optional properties
    pub from: Box<NodePattern>,       // Source node
    pub to: Vec<NodePattern>,         // Multiple target nodes (>= 2)
    pub arity: usize,                 // Total nodes (source + targets)
}
```

## Error Handling

The parser provides detailed error messages with position information:

```rust
use ruvector_graph::cypher::parse_cypher;

match parse_cypher("MATCH (n:Person WHERE n.age > 30") {
    Ok(ast) => { /* ... */ },
    Err(e) => {
        eprintln!("Parse error: {}", e);
        // Output: "Unexpected token: expected ), found WHERE at line 1, column 17"
    }
}
```

## Performance

- **Lexer**: ~500ns per token on average
- **Parser**: ~50-200μs for typical queries
- **Optimization**: ~10-50μs for plan generation

Benchmarks available in `benches/cypher_parser.rs`:

```bash
cargo bench --package ruvector-graph --bench cypher_parser
```

## Testing

Comprehensive test coverage across all modules:

```bash
# Run all Cypher tests
cargo test --package ruvector-graph --lib cypher

# Run parser integration tests
cargo test --package ruvector-graph --test cypher_parser_integration

# Run specific test
cargo test --package ruvector-graph test_hyperedge_pattern
```

## Implementation Details

### Nom Parser Combinators

The parser uses [nom](https://github.com/Geal/nom), a Rust parser combinator library:

```rust
fn parse_node_pattern(input: &str) -> IResult<&str, NodePattern> {
    preceded(
        char('('),
        terminated(
            parse_node_content,
            char(')')
        )
    )(input)
}
```

**Benefits:**
- Zero-copy parsing
- Composable parsers
- Excellent error handling
- Type-safe combinators

### Type System

The semantic analyzer implements a simple type system:

```rust
pub enum ValueType {
    Integer, Float, String, Boolean, Null,
    Node, Relationship, Path,
    List(Box<ValueType>),
    Map,
    Any,
}
```

Type compatibility checks ensure query correctness before execution.

### Cost-Based Optimization

The optimizer estimates query cost based on:

1. **Pattern Selectivity**: More specific patterns are cheaper
2. **Index Availability**: Indexed properties reduce scan cost
3. **Cardinality Estimates**: Smaller intermediate results are better
4. **Operation Cost**: Aggregations, sorts, and joins have inherent costs

## Future Enhancements

- [ ] Subqueries (CALL {...})
- [ ] User-defined functions
- [ ] Graph projections
- [ ] Pattern comprehensions
- [ ] JIT compilation for hot paths
- [ ] Parallel query execution
- [ ] Advanced cost-based optimization
- [ ] Query result caching

## References

- [Cypher Query Language Reference]https://neo4j.com/docs/cypher-manual/current/
- [openCypher]http://www.opencypher.org/ - Open specification
- [GQL Standard]https://www.gqlstandards.org/ - ISO graph query language

## License

MIT License - See LICENSE file for details