graphmind 0.7.0

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Graphmind is a high-performance distributed graph database written in Rust with ~90% OpenCypher query support, Redis protocol (RESP) compatibility, multi-tenancy, vector search, NLQ, and graph algorithms. Currently at Phase 4 (High Availability Foundation), version v0.6.1.

## Build & Development Commands

```bash
# Build
cargo build                    # Debug build
cargo build --release          # Release build (optimized)

# Run tests (1842 unit tests)
cargo test                     # All tests
cargo test graph::node         # Specific module tests
cargo test -- --nocapture      # Tests with output

# Benchmarks (all in benches/)
cargo bench                                    # All benchmarks
cargo bench --bench graph_benchmarks           # Criterion micro-benchmarks (15 benches)
cargo bench --bench full_benchmark             # Full suite (ingestion, vector, traversal, algorithms)
cargo bench --bench vector_benchmark           # HNSW vector search
cargo bench --bench graphalytics_benchmark     # LDBC Graphalytics algorithms
cargo bench --bench mvcc_benchmark             # MVCC & arena allocation
cargo bench --bench late_materialization_bench  # Late materialization traversal
cargo bench --bench graph_optimization_benchmark # Metaheuristic optimization solvers
cargo bench --bench ldbc_benchmark             # LDBC SNB Interactive queries (needs data)
cargo bench --bench ldbc_bi_benchmark          # LDBC SNB BI queries (needs data)
cargo bench --bench finbench_benchmark         # LDBC FinBench queries (synthetic data)

# Run examples
cargo run --example banking_demo              # Banking fraud detection + NLQ
cargo run --example clinical_trials_demo      # Clinical trials + vector search
cargo run --example supply_chain_demo         # Supply chain + optimization
cargo run --example smart_manufacturing_demo  # Digital twin + scheduling
cargo run --example social_network_demo       # Social network analysis
cargo run --example knowledge_graph_demo      # Enterprise knowledge graph
cargo run --example enterprise_soc_demo       # Security operations center
cargo run --example agentic_enrichment_demo   # GAK (Generation-Augmented Knowledge)
cargo run --example persistence_demo          # Persistence & multi-tenancy
cargo run --example cluster_demo              # Raft clustering
cargo run --example ldbc_loader               # Load LDBC SNB SF1 dataset
cargo run --example finbench_loader           # Load/generate FinBench dataset
cargo run --release --example cricket_loader  # Load 21K Cricsheet matches
cargo run --release --example aact_loader     # Load AACT clinical trials dataset

# Start RESP server
cargo run                      # RESP on 127.0.0.1:6379, HTTP on :8080

# Code quality
cargo fmt -- --check           # Check formatting
cargo clippy -- -D warnings    # Lint checks

# Integration tests (requires running server)
cd tests/integration
python3 test_resp_basic.py
python3 test_resp_visual.py

# Frontend UI (React)
cd ui && npm install       # Install frontend dependencies
cd ui && npm run dev       # Dev server on :5173 (proxies /api to :8080)
cd ui && npm run build     # Production build → ui/dist/
```

## Architecture

### Module Structure

```
src/
├── graph/           # Property Graph Model
│   ├── store.rs     # GraphStore - in-memory storage with indices + cardinality stats
│   ├── node.rs      # Node with labels and properties
│   ├── edge.rs      # Directed edges with types
│   ├── property.rs  # PropertyValue (String, Integer, Float, Boolean, DateTime, Array, Map, Null)
│   └── types.rs     # NodeId, EdgeId, Label, EdgeType
│
├── query/           # OpenCypher Query Engine (~90% coverage)
│   ├── parser.rs    # Pest-based OpenCypher parser
│   ├── cypher.pest  # PEG grammar (atomic keyword rules for word boundaries)
│   ├── ast.rs       # Query AST
│   └── executor/
│       ├── planner.rs   # Query planner (AST → ExecutionPlan)
│       ├── operator.rs  # Physical operators (Volcano iterator model)
│       └── record.rs    # Record, RecordBatch, Value (with late materialization)
│
├── protocol/        # RESP Protocol
│   ├── resp.rs      # RESP3 encoder/decoder
│   ├── server.rs    # Tokio TCP server
│   └── command.rs   # GRAPH.* command handler
│
├── persistence/     # Persistence & Multi-Tenancy
│   ├── storage.rs   # RocksDB with column families
│   ├── wal.rs       # Write-Ahead Log
│   └── tenant.rs    # Multi-tenancy & resource quotas
│
├── raft/            # High Availability
│   ├── node.rs      # RaftNode using openraft
│   ├── state_machine.rs  # GraphStateMachine
│   ├── cluster.rs   # ClusterConfig, ClusterManager
│   ├── network.rs   # Inter-node communication
│   └── storage.rs   # Raft log storage
│
├── nlq/             # Natural Language Query Pipeline
│   ├── mod.rs       # NLQPipeline (text_to_cypher, extract_cypher, is_safe_query)
│   └── client.rs    # NLQClient (OpenAI, Gemini, Ollama, Claude Code providers)
│
├── vector/          # HNSW Vector Index
├── snapshot/        # Portable .sgsnap export/import
└── sharding/        # Tenant-level sharding
```

### Key Architectural Patterns

1. **Volcano Iterator Model (ADR-007)**: Lazy, pull-based operators:
   - `NodeScanOperator` → `FilterOperator` → `ExpandOperator` → `ProjectOperator` → `LimitOperator`

2. **Late Materialization (ADR-012)**: Scan produces `Value::NodeRef(id)` not full clones. Properties resolved on demand via `resolve_property()`.

3. **In-Memory Graph Storage**: O(1) lookups via HashMaps with adjacency lists for traversal.

4. **Multi-Tenancy**: RocksDB column families with tenant-prefixed keys, per-tenant quotas.

5. **Raft Consensus**: Uses `openraft` crate with custom `GraphStateMachine`.

6. **Cross-Type Coercion**: Integer/Float promotion, String/Boolean coercion, Null propagation (three-valued logic).

### Frontend (React UI)

The web-based visualizer is a React 19 + TypeScript application in `ui/`.

```
ui/src/
├── api/client.ts          # Typed fetch client for all API endpoints
├── stores/                # Zustand state management
│   ├── graphStore.ts      # Nodes, edges, selection, multi-select
│   ├── queryStore.ts      # Query execution, history, saved queries
│   ├── uiStore.ts         # Connection status, schema, panel state
│   └── graphSettingsStore.ts  # Colors, icons, captions (persisted)
├── components/
│   ├── editor/            # CodeMirror 6 Cypher editor, query templates, saved queries
│   ├── graph/             # D3 force graph, fullscreen explorer, toolbar, stats, minimap
│   ├── inspector/         # Property inspector, schema browser with color/icon pickers
│   ├── layout/            # AppShell, Navbar, resizable panels
│   ├── results/           # TanStack Table for query results
│   └── ui/                # Reusable components (button, badge, tabs, resizable, theme toggle, icon picker)
├── lib/
│   ├── cypher.ts          # Cypher keywords/functions/procedures for autocomplete
│   ├── colors.ts          # Node label color mapping with custom override support
│   ├── icons.ts           # 55+ SVG icon catalog for node visualization
│   ├── shortcuts.ts       # Keyboard shortcuts system
│   └── utils.ts           # Tailwind merge utility
└── types/api.ts           # TypeScript interfaces matching backend API
```

**Tech stack:** React 19, Vite 6, TypeScript, Tailwind CSS v4, shadcn/ui, D3.js (d3-force), CodeMirror 6, TanStack Table, Zustand

**Dev commands:**
```bash
cd ui
npm install                # Install dependencies
npm run dev                # Dev server on :5173 (proxies /api to :8080)
npm run build              # Production build → ui/dist/
```

## Cypher Support

**Supported clauses:** MATCH, OPTIONAL MATCH, CREATE, DELETE, SET, REMOVE, MERGE, WITH, UNWIND, UNION, RETURN DISTINCT, ORDER BY, SKIP, LIMIT, EXPLAIN, EXISTS subqueries.

**Supported functions (30+):** toUpper, toLower, trim, replace, substring, left, right, reverse, toString, toInteger, toFloat, abs, ceil, floor, round, sqrt, sign, count, sum, avg, min, max, collect, size, length, head, last, tail, keys, id, labels, type, exists, coalesce.

**Remaining gaps:** `XOR` operator, `split()` function, `nodes()`/`relationships()` path functions.

## API Patterns

### Query Engine
```rust
// Read-only queries
let executor = QueryExecutor::new(&store);
let result: RecordBatch = executor.execute(&query)?;

// Write queries (CREATE, DELETE, SET, MERGE)
let mut executor = MutQueryExecutor::new(&mut store, tenant_id);
executor.execute(&query)?;

// EXPLAIN (no execution)
// Returns plan as RecordBatch with operator descriptions
```

### NLQ Pipeline
```rust
let pipeline = NLQPipeline::new(nlq_config)?;
let cypher = pipeline.text_to_cypher("Who knows Alice?", &schema_summary).await?;
// Returns clean Cypher with markdown fences stripped and safety validation
```

### Graph Store
```rust
let mut graph = GraphStore::new();
let node_id = graph.create_node("Person");
graph.get_node_mut(node_id)?.set_property("name", "Alice");
graph.create_edge(source_id, target_id, "KNOWS")?;
```

### HTTP API Endpoints

The Axum HTTP server runs on port 8080 alongside the RESP server on port 6379.

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/query` | Execute a single Cypher query |
| POST | `/api/script` | Execute multi-statement Cypher script (line-by-line) |
| POST | `/api/nlq` | Translate natural language to Cypher via LLM |
| GET | `/api/status` | Server health, node/edge counts, cache stats |
| GET | `/api/schema` | Schema introspection (labels, types, properties, stats) |
| POST | `/api/sample` | Sample a subgraph for visualization |
| POST | `/api/import/csv` | Import nodes from CSV |
| POST | `/api/import/json` | Import nodes from JSON |
| POST | `/api/snapshot/export` | Export graph as .sgsnap |
| POST | `/api/snapshot/import` | Import .sgsnap snapshot |

#### Script Endpoint
```bash
# Execute a .cypher file with multiple statements
curl -X POST http://localhost:8080/api/script \
  -H 'Content-Type: application/json' \
  -d '{"query": "CREATE (a:Person {name: '\''Alice'\''})\nCREATE (b:Person {name: '\''Bob'\''})"}'
# Response: {"status":"ok","executed":2,"errors":[],"storage":{"nodes":2,"edges":0}}
```

#### NLQ Endpoint (Natural Language Queries)
Requires an LLM provider configured via environment variables:
- `OPENAI_API_KEY` — Uses OpenAI (model: `OPENAI_MODEL` or `gpt-4o-mini`)
- `GEMINI_API_KEY` — Uses Google Gemini (model: `GEMINI_MODEL` or `gemini-2.0-flash`)
- `CLAUDE_CODE_NLQ=1` — Uses Claude Code CLI

```bash
OPENAI_API_KEY=sk-... cargo run
# Then:
curl -X POST http://localhost:8080/api/nlq \
  -H 'Content-Type: application/json' \
  -d '{"query": "Who are Alice'\''s friends?"}'
# Response: {"cypher":"MATCH (a:Person {name:'Alice'})-[:FRIENDS_WITH]-(b) RETURN a,b","provider":"OpenAI","model":"gpt-4o-mini"}
```

## Testing

- **1842 unit tests** across all modules (89.7% coverage)
- **10 benchmark binaries** in `benches/` (Criterion micro-benchmarks + domain benchmarks)
- **Integration tests**: Python scripts in `tests/integration/`
- **8 domain-specific example demos** with NLQ integration
- **4 data loaders** (LDBC SNB, FinBench, Cricket, AACT) in `examples/`
- **28 CRUD integration tests** in `tests/cypher_crud_test.rs` (CREATE, MATCH, SET, DELETE, MERGE, WITH, UNWIND, OPTIONAL MATCH, EXPLAIN)

## Demo Scripts

```bash
# Social Network Demo — 52 nodes, 142 edges
# Load via UI script button or API:
curl -X POST http://localhost:8080/api/script \
  -H 'Content-Type: application/json' \
  -d @scripts/social_network_demo.cypher

# Or paste into the UI editor and use the Upload button
```

The demo creates: 16 Person, 6 City, 5 Company, 7 Property, 6 Car, 5 Pet, 8 Hobby, 4 University nodes with MARRIED_TO, LIVES_IN, FRIENDS_WITH, WORKS_AT, OWNS, ATTENDED, ENJOYS, INVESTED_IN, HEADQUARTERED_IN relationships. Includes 25 test queries (Q1-Q25).

## UI Features (Visualizer)

The web visualizer at `http://localhost:8080` provides:

- **Cypher Editor**: CodeMirror 6 with syntax highlighting, autocomplete (keywords, functions, schema labels/types, variables), Ctrl+Enter to execute
- **D3 Force Graph**: Canvas-based rendering with zoom/pan, node dragging, click selection, right-click context menu (expand neighbors, load all relationships)
- **Graph Layouts**: Force-directed (default), circular, hierarchical, grid
- **Fullscreen Explorer**: Immersive mode with floating glassmorphism legend, search, minimap, and property inspector
- **Query Templates**: Pre-built queries organized by category (Exploration, Pathfinding, Analytics, Social Network, Algorithms)
- **Saved Queries**: Name and save queries, persisted to localStorage
- **NLQ Mode**: Natural language to Cypher translation (requires LLM API key)
- **Search**: Type to highlight matching nodes on canvas, dims non-matching
- **Shortest Path**: Click two nodes to find and visualize shortest path (BFS)
- **Highlight Mode**: Select a node to highlight connected nodes, dim the rest
- **Multi-select**: Shift+click to select multiple nodes
- **Node Icons**: 55+ built-in SVG icons assignable per label, auto-detect image URLs
- **Color Customization**: Per-label node colors and per-type edge colors (persisted)
- **Caption Property**: Choose which property displays on each node label
- **Export**: PNG screenshot, CSV data, JSON graph
- **Graph Stats**: Floating panel with node/edge distribution and degree metrics
- **Dark/Light Theme**: Toggle with system preference support
- **Keyboard Shortcuts**: F5 (run), Escape (clear), Delete (remove from canvas), ? (help)
- **Script Loader**: Upload .cypher files to execute all statements
- **Query History**: With play/delete buttons and timeline scrubber

## Security

Authentication is **disabled by default**. For production deployments:

```bash
# Token auth
GRAPHMIND_AUTH_TOKEN=your-secret cargo run

# Username/password auth
GRAPHMIND_ADMIN_USER=admin GRAPHMIND_ADMIN_PASSWORD=secret cargo run
```

Both HTTP and RESP servers require authentication when enabled. The UI shows a login screen automatically.