lance-graph 0.2.0-beta.1

Graph query engine for Lance datasets with Cypher support

Lance Graph Query Engine

A graph query engine for Lance datasets with Cypher syntax support. This crate enables querying Lance's columnar datasets using familiar graph query patterns, interpreting tabular data as property graphs.

Features

  • Cypher query parsing and AST construction
  • Graph configuration for mapping Lance tables to nodes and relationships
  • Semantic validation with typed GraphError diagnostics
  • Pluggable execution strategies (DataFusion planner by default, simple executor, Lance Native placeholder)
  • Async query execution that returns Arrow RecordBatch results
  • JSON-serializable parameter binding for reusable query templates
  • Logical plan debugging via CypherQuery::explain

Quick Start

use std::collections::HashMap;
use std::sync::Arc;

use arrow_array::{ArrayRef, Int32Array, RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};
use lance_graph::{CypherQuery, ExecutionStrategy, GraphConfig};

let config = GraphConfig::builder()
    .with_node_label("Person", "person_id")
    .with_relationship("KNOWS", "src_person_id", "dst_person_id")
    .build()?;

let schema = Arc::new(Schema::new(vec![
    Field::new("person_id", DataType::Int32, false),
    Field::new("name", DataType::Utf8, false),
    Field::new("age", DataType::Int32, false),
]));
let batch = RecordBatch::try_new(
    schema,
    vec![
        Arc::new(Int32Array::from(vec![1, 2])) as ArrayRef,
        Arc::new(StringArray::from(vec!["Alice", "Bob"])) as ArrayRef,
        Arc::new(Int32Array::from(vec![29, 35])) as ArrayRef,
    ],
)?;

let mut tables = HashMap::new();
tables.insert("Person".to_string(), batch);

let query = CypherQuery::new("MATCH (p:Person) WHERE p.age > $min RETURN p.name")?
    .with_config(config)
    .with_parameter("min", 30);

let runtime = tokio::runtime::Runtime::new()?;
// Use default DataFusion-based execution
let result = runtime.block_on(query.execute(tables.clone(), None))?;

// Opt in to the simple executor if you only need projection/filter support.
let simple = runtime.block_on(query.execute(tables, Some(ExecutionStrategy::Simple)))?;

execute expects a HashMap<String, RecordBatch> keyed by the node labels and relationship types referenced in the Cypher text. Each record batch should expose the columns configured through GraphConfig (ID fields, property fields, and so on). A relationship type such as KNOWS needs its own batch, keyed by that type, containing the configured source and target ID columns plus any optional property columns.
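
Because the table map is keyed by label and relationship type, it can help to check the keys before calling execute. The following is a minimal sketch that uses only the standard library; missing_tables is a hypothetical helper, not part of lance-graph, and the type parameter T stands in for RecordBatch so the snippet compiles without Arrow.

```rust
use std::collections::HashMap;

/// Returns the names in `required` that have no entry in `tables`,
/// preserving the order of `required`.
/// Hypothetical helper: `T` stands in for `RecordBatch`.
fn missing_tables<T>(required: &[&str], tables: &HashMap<String, T>) -> Vec<String> {
    required
        .iter()
        // Keep only the names that are absent from the map.
        .filter(|name| !tables.contains_key(**name))
        .map(|name| name.to_string())
        .collect()
}
```

For a query that traverses KNOWS between Person nodes, calling missing_tables(&["Person", "KNOWS"], &tables) on a map that only holds "Person" reports "KNOWS" as missing.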

Configuring Graph Mappings

Graph mappings are declared with GraphConfig::builder():

use lance_graph::{GraphConfig, NodeMapping, RelationshipMapping};

let config = GraphConfig::builder()
    .with_node_label("Person", "person_id")
    .with_relationship("KNOWS", "src_person_id", "dst_person_id")
    .build()?;

For finer control, build NodeMapping and RelationshipMapping instances explicitly:

let person = NodeMapping::new("Person", "person_id")
    .with_properties(vec!["name".into(), "age".into()])
    .with_filter("kind = 'person'");

let knows = RelationshipMapping::new("KNOWS", "src_person_id", "dst_person_id")
    .with_properties(vec!["since".into()]);

let config = GraphConfig::builder()
    .with_node_mapping(person)
    .with_relationship_mapping(knows)
    .build()?;
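
A mapping names the ID and property columns that the backing table is expected to contain. As a rough illustration of that contract, the sketch below compares the declared column names with a table's column names. missing_columns is a hypothetical helper written against plain string slices; it is not part of lance-graph, and how the crate itself validates columns is not shown here.

```rust
/// Returns the columns in `declared` that do not appear in `table_columns`,
/// preserving the order of `declared`.
/// Hypothetical helper operating on plain column-name slices.
fn missing_columns(table_columns: &[&str], declared: &[&str]) -> Vec<String> {
    declared
        .iter()
        // A declared column is a problem only if the table lacks it.
        .filter(|col| !table_columns.contains(*col))
        .map(|col| col.to_string())
        .collect()
}
```

With the KNOWS mapping above, a table that carries only src_person_id and dst_person_id would report "since" as missing.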

Executing Cypher Queries

  • CypherQuery::new parses Cypher text into the internal AST.
  • with_config attaches the graph configuration used for validation and execution.
  • with_parameter / with_parameters bind JSON-serializable values that can be referenced as $param in the Cypher text.
  • execute is asynchronous and returns an Arrow RecordBatch. Pass None for the default DataFusion planner or Some(ExecutionStrategy::Simple) for the single-table executor. ExecutionStrategy::LanceNative is reserved for future native execution support and currently returns an error.
  • explain is asynchronous and returns a formatted string containing the graph logical plan alongside the DataFusion logical and physical plans.
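
Since parameters are referenced as $param in the Cypher text, a template can be scanned for placeholders before values are bound. The sketch below is a naive standard-library scanner, not the crate's parser: referenced_params and unbound_params are hypothetical names, and the scan does not skip a $ that appears inside a string literal.

```rust
/// Collects the distinct `$name` placeholders in `cypher`,
/// in order of first appearance.
/// Naive: a `$` inside a quoted string literal is also picked up.
fn referenced_params(cypher: &str) -> Vec<String> {
    let mut names: Vec<String> = Vec::new();
    let mut chars = cypher.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            continue;
        }
        // Read the identifier characters that follow the `$`.
        let mut name = String::new();
        while let Some(&next) = chars.peek() {
            if next.is_alphanumeric() || next == '_' {
                name.push(next);
                chars.next();
            } else {
                break;
            }
        }
        if !name.is_empty() && !names.contains(&name) {
            names.push(name);
        }
    }
    names
}

/// Returns the placeholders in `cypher` that are not listed in `bound`.
fn unbound_params(cypher: &str, bound: &[&str]) -> Vec<String> {
    referenced_params(cypher)
        .into_iter()
        .filter(|name| !bound.contains(&name.as_str()))
        .collect()
}
```

Running unbound_params on a template before execution gives an early, readable report of any placeholder that still needs a with_parameter call.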

Queries with a single MATCH clause containing a path pattern are planned as joins using the provided mappings. Other queries can opt into the single-table projection/filter pipeline via ExecutionStrategy::Simple when DataFusion's planner is unnecessary.
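
To make the "planned as joins" idea concrete, the sketch below resolves the pattern (a:Person)-[:KNOWS]->(b:Person) over in-memory rows by joining each edge to the node rows on its source and target IDs. It is a conceptual illustration using only the standard library; it does not reproduce the plan DataFusion builds, and knows_pairs and its row shapes are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Resolves `(a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name`
/// over in-memory rows.
/// `people` holds (person_id, name); `knows` holds (src_person_id, dst_person_id).
fn knows_pairs(people: &[(i32, &str)], knows: &[(i32, i32)]) -> Vec<(String, String)> {
    // Build side: index the node rows by ID.
    let by_id: HashMap<i32, &str> = people.iter().copied().collect();
    // Probe side: each edge performs two lookups, one per endpoint.
    // Edges with an endpoint that has no node row are dropped (inner join).
    knows
        .iter()
        .filter_map(|(src, dst)| {
            let a = by_id.get(src)?;
            let b = by_id.get(dst)?;
            Some((a.to_string(), b.to_string()))
        })
        .collect()
}
```

Each additional hop in a path pattern corresponds to another pair of lookups like these, which is why the relationship batch must carry both ID columns.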

A builder (CypherQueryBuilder) is also available for constructing queries programmatically without parsing text.

Supported Cypher Surface

  • Node patterns (:Label) with optional variables.
  • Relationship patterns with fixed direction and type, including multi-hop paths.
  • Property comparisons against literal values with AND/OR/NOT/EXISTS.
  • RETURN lists of property accesses, optional DISTINCT, ORDER BY, SKIP (offset), and LIMIT.
  • Positional and named parameters (e.g. $min_age).

Basic aggregations like COUNT are supported. Optional matches and subqueries are parsed but not executed yet.

Crate Layout

  • ast – Cypher AST definitions.
  • parser – Nom-based Cypher parser.
  • semantic – Lightweight semantic checks on the AST.
  • logical_plan – Builders for graph logical plans.
  • datafusion_planner – DataFusion-based execution planning.
  • simple_executor – Simple single-table executor.
  • config – Graph configuration types and builders.
  • query – High level CypherQuery API and runtime.
  • error – GraphError and result helpers.
  • source_catalog – Helpers for looking up table metadata.

Error Handling

Most APIs return Result<T, GraphError>. Errors include parsing failures, missing mappings, and execution issues surfaced from DataFusion.
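
Callers often want to separate failures they can fix by editing the query or configuration from failures raised during execution. The sketch below shows that pattern with a hypothetical stand-in enum, DemoError, whose three variants mirror the categories named above. The real GraphError variants and fields may differ, so treat the names here as assumptions.

```rust
use std::fmt;

/// Hypothetical stand-in for an error type with the three failure
/// categories named above; NOT the real `GraphError` definition.
#[derive(Debug, PartialEq)]
enum DemoError {
    Parse(String),
    MissingMapping(String),
    Execution(String),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::Parse(msg) => write!(f, "parse error: {}", msg),
            DemoError::MissingMapping(label) => write!(f, "missing mapping: {}", label),
            DemoError::Execution(msg) => write!(f, "execution error: {}", msg),
        }
    }
}

impl std::error::Error for DemoError {}

/// Returns true for failures that editing the query text or the graph
/// configuration could resolve, and false for execution-time failures.
fn is_user_fixable(err: &DemoError) -> bool {
    matches!(err, DemoError::Parse(_) | DemoError::MissingMapping(_))
}
```

Matching on the error category in one place keeps call sites simple: they can surface fixable errors to the user and log the rest.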

Testing

cargo test -p lance-graph

Benchmarks

See the repository root README.md for benchmark setup, run commands, and report locations.

Python Bindings

See the Python package docs for setup and development:

  • Python package README: python/README.md
  • Runnable examples (from repo root): examples/README.md

License

Apache-2.0. See the top-level LICENSE file for details.