lance-graph 0.5.0

Graph query engine for Lance datasets with Cypher support
Documentation
# Lance Graph Query Engine

A graph query engine for Lance datasets with Cypher syntax support. This crate enables querying Lance's columnar datasets using familiar graph query patterns, interpreting tabular data as property graphs.

## Features

- Cypher query parsing and AST construction
- Graph configuration for mapping Lance tables to nodes and relationships
- Semantic validation with typed `GraphError` diagnostics
- Pluggable execution strategies (DataFusion planner by default, simple executor, Lance Native placeholder)
- Async query execution that returns Arrow `RecordBatch` results
- JSON-serializable parameter binding for reusable query templates
- Logical plan debugging via `CypherQuery::explain`

## Quick Start

```rust
use std::collections::HashMap;
use std::sync::Arc;

use arrow_array::{ArrayRef, Int32Array, RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};
use lance_graph::{CypherQuery, ExecutionStrategy, GraphConfig};

let config = GraphConfig::builder()
    .with_node_label("Person", "person_id")
    .with_relationship("KNOWS", "src_person_id", "dst_person_id")
    .build()?;

let schema = Arc::new(Schema::new(vec![
    Field::new("person_id", DataType::Int32, false),
    Field::new("name", DataType::Utf8, false),
    Field::new("age", DataType::Int32, false),
]));
let batch = RecordBatch::try_new(
    schema,
    vec![
        Arc::new(Int32Array::from(vec![1, 2])) as ArrayRef,
        Arc::new(StringArray::from(vec!["Alice", "Bob"])) as ArrayRef,
        Arc::new(Int32Array::from(vec![29, 35])) as ArrayRef,
    ],
)?;

let mut tables = HashMap::new();
tables.insert("Person".to_string(), batch);

let query = CypherQuery::new("MATCH (p:Person) WHERE p.age > $min RETURN p.name")?
    .with_config(config)
    .with_parameter("min", 30);

let runtime = tokio::runtime::Runtime::new()?;
// Use default DataFusion-based execution
let result = runtime.block_on(query.execute(tables.clone(), None))?;

// Opt in to the simple executor if you only need projection/filter support.
let simple = runtime.block_on(query.execute(tables, Some(ExecutionStrategy::Simple)))?;
```

The query expects a `HashMap<String, RecordBatch>` keyed by the labels and relationship types referenced in the Cypher text. Each record batch should expose the columns configured through `GraphConfig` (ID fields, property fields, etc.). Relationship mappings also expect a batch keyed by the relationship type (for example `KNOWS`) that contains the configured source/target ID columns and any optional property columns.

## Configuring Graph Mappings

Graph mappings are declared with `GraphConfig::builder()`:

```rust
use lance_graph::{GraphConfig, NodeMapping, RelationshipMapping};

let config = GraphConfig::builder()
    .with_node_label("Person", "person_id")
    .with_relationship("KNOWS", "src_person_id", "dst_person_id")
    .build()?;
```

For finer control, build `NodeMapping` and `RelationshipMapping` instances explicitly:

```rust
let person = NodeMapping::new("Person", "person_id")
    .with_properties(vec!["name".into(), "age".into()])
    .with_filter("kind = 'person'");

let knows = RelationshipMapping::new("KNOWS", "src_person_id", "dst_person_id")
    .with_properties(vec!["since".into()]);

let config = GraphConfig::builder()
    .with_node_mapping(person)
    .with_relationship_mapping(knows)
    .build()?;
```

## Executing Cypher Queries

- `CypherQuery::new` parses Cypher text into the internal AST.
- `with_config` attaches the graph configuration used for validation and execution.
- `with_parameter` / `with_parameters` bind JSON-serializable values that can be referenced as `$param` in the Cypher text.
- `execute` is asynchronous and returns an Arrow `RecordBatch`. Pass `None` for the default DataFusion planner or `Some(ExecutionStrategy::Simple)` for the single-table executor. `ExecutionStrategy::LanceNative` is reserved for future native execution support and currently errors.
- `explain` is asynchronous and returns a formatted string containing the graph logical plan alongside the DataFusion logical and physical plans.

Queries with a single `MATCH` clause containing a path pattern are planned as joins using the provided mappings. Other queries can opt into the single-table projection/filter pipeline via `ExecutionStrategy::Simple` when DataFusion's planner is unnecessary.

A builder (`CypherQueryBuilder`) is also available for constructing queries programmatically without parsing text.

## Supported Cypher Surface

- Node patterns `(:Label)` with optional variables.
- Relationship patterns with fixed direction and type, including multi-hop paths.
- Property comparisons against literal values with `AND`/`OR`/`NOT`/`EXISTS`.
- RETURN lists of property accesses, optional `DISTINCT`, `ORDER BY`, `SKIP` (offset), and `LIMIT`.
- Positional and named parameters (e.g. `$min_age`).

Basic aggregations like `COUNT` are supported. Optional matches and subqueries are parsed but not executed yet.

## Crate Layout

- `ast` – Cypher AST definitions.
- `parser` – Nom-based Cypher parser.
- `semantic` – Lightweight semantic checks on the AST.
- `logical_plan` – Builders for graph logical plans.
- `datafusion_planner` – DataFusion-based execution planning.
- `simple_executor` – Simple single-table executor.
- `config` – Graph configuration types and builders.
- `query` – High level `CypherQuery` API and runtime.
- `error``GraphError` and result helpers.
- `source_catalog` – Helpers for looking up table metadata.

## Error Handling

Most APIs return `Result<T, GraphError>`. Errors include parsing failures, missing mappings, and execution issues surfaced from DataFusion.

## Testing

```bash
cargo test -p lance-graph
```

## Benchmarks

See the repository root `README.md` for benchmark setup, run commands, and report locations.

## Python Bindings

See the Python package docs for setup and development:

- Python package README: `python/README.md`
- Runnable examples (from repo root): `examples/README.md`

## License

Apache-2.0. See the top-level LICENSE file for details.