# Lance Graph Query Engine
A graph query engine for Lance datasets with Cypher syntax support. This crate enables querying Lance's columnar datasets using familiar graph query patterns, interpreting tabular data as property graphs.
## Features
- Cypher query parsing and AST construction
- Graph configuration for mapping Lance tables to nodes and relationships
- Semantic validation with typed `GraphError` diagnostics
- Pluggable execution strategies (DataFusion planner by default, simple executor, Lance Native placeholder)
- Async query execution that returns Arrow `RecordBatch` results
- JSON-serializable parameter binding for reusable query templates
- Logical plan debugging via `CypherQuery::explain`
## Quick Start
```rust
use std::collections::HashMap;
use std::sync::Arc;
use arrow_array::{ArrayRef, Int32Array, RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};
use lance_graph::{CypherQuery, ExecutionStrategy, GraphConfig};
let config = GraphConfig::builder()
.with_node_label("Person", "person_id")
.with_relationship("KNOWS", "src_person_id", "dst_person_id")
.build()?;
let schema = Arc::new(Schema::new(vec![
Field::new("person_id", DataType::Int32, false),
Field::new("name", DataType::Utf8, false),
Field::new("age", DataType::Int32, false),
]));
let batch = RecordBatch::try_new(
schema,
vec![
Arc::new(Int32Array::from(vec![1, 2])) as ArrayRef,
Arc::new(StringArray::from(vec!["Alice", "Bob"])) as ArrayRef,
Arc::new(Int32Array::from(vec![29, 35])) as ArrayRef,
],
)?;
let mut tables = HashMap::new();
tables.insert("Person".to_string(), batch);
let query = CypherQuery::new("MATCH (p:Person) WHERE p.age > $min RETURN p.name")?
.with_config(config)
.with_parameter("min", 30);
let runtime = tokio::runtime::Runtime::new()?;
// Use default DataFusion-based execution
let result = runtime.block_on(query.execute(tables.clone(), None))?;
// Opt in to the simple executor if you only need projection/filter support.
let simple = runtime.block_on(query.execute(tables, Some(ExecutionStrategy::Simple)))?;
```
The query expects a `HashMap<String, RecordBatch>` keyed by the labels and relationship types referenced in the Cypher text. Each record batch should expose the columns configured through `GraphConfig` (ID fields, property fields, etc.). Relationship mappings also expect a batch keyed by the relationship type (for example `KNOWS`) that contains the configured source/target ID columns and any optional property columns.
## Configuring Graph Mappings
Graph mappings are declared with `GraphConfig::builder()`:
```rust
use lance_graph::{GraphConfig, NodeMapping, RelationshipMapping};
let config = GraphConfig::builder()
.with_node_label("Person", "person_id")
.with_relationship("KNOWS", "src_person_id", "dst_person_id")
.build()?;
```
For finer control, build `NodeMapping` and `RelationshipMapping` instances explicitly:
```rust
let person = NodeMapping::new("Person", "person_id")
.with_properties(vec!["name".into(), "age".into()])
.with_filter("kind = 'person'");
let knows = RelationshipMapping::new("KNOWS", "src_person_id", "dst_person_id")
.with_properties(vec!["since".into()]);
let config = GraphConfig::builder()
.with_node_mapping(person)
.with_relationship_mapping(knows)
.build()?;
```
## Executing Cypher Queries
- `CypherQuery::new` parses Cypher text into the internal AST.
- `with_config` attaches the graph configuration used for validation and execution.
- `with_parameter` / `with_parameters` bind JSON-serializable values that can be referenced as `$param` in the Cypher text.
- `execute` is asynchronous and returns an Arrow `RecordBatch`. Pass `None` for the default DataFusion planner or `Some(ExecutionStrategy::Simple)` for the single-table executor. `ExecutionStrategy::LanceNative` is reserved for future native execution support and currently errors.
- `explain` is asynchronous and returns a formatted string containing the graph logical plan alongside the DataFusion logical and physical plans.
Queries with a single `MATCH` clause containing a path pattern are planned as joins using the provided mappings. Other queries can opt into the single-table projection/filter pipeline via `ExecutionStrategy::Simple` when DataFusion's planner is unnecessary.
A builder (`CypherQueryBuilder`) is also available for constructing queries programmatically without parsing text.
## Supported Cypher Surface
- Node patterns `(:Label)` with optional variables.
- Relationship patterns with fixed direction and type, including multi-hop paths.
- Property comparisons against literal values with `AND`/`OR`/`NOT`/`EXISTS`.
- RETURN lists of property accesses, optional `DISTINCT`, `ORDER BY`, `SKIP` (offset), and `LIMIT`.
- Positional and named parameters (e.g. `$min_age`).
Basic aggregations like `COUNT` are supported. Optional matches and subqueries are parsed but not executed yet.
## Crate Layout
- `ast` – Cypher AST definitions.
- `parser` – Nom-based Cypher parser.
- `semantic` – Lightweight semantic checks on the AST.
- `logical_plan` – Builders for graph logical plans.
- `datafusion_planner` – DataFusion-based execution planning.
- `simple_executor` – Simple single-table executor.
- `config` – Graph configuration types and builders.
- `query` – High level `CypherQuery` API and runtime.
- `error` – `GraphError` and result helpers.
- `source_catalog` – Helpers for looking up table metadata.
## Error Handling
Most APIs return `Result<T, GraphError>`. Errors include parsing failures, missing mappings, and execution issues surfaced from DataFusion.
## Testing
```bash
cargo test -p lance-graph
```
## Benchmarks
See the repository root `README.md` for benchmark setup, run commands, and report locations.
## Python Bindings
See the Python package docs for setup and development:
- Python package README: `python/README.md`
- Runnable examples (from repo root): `examples/README.md`
## License
Apache-2.0. See the top-level LICENSE file for details.