graphify-rs
AI-powered knowledge graph builder — transform code, docs, papers, and images into queryable, interactive knowledge graphs.
中文文档 | CLI Reference | Changelog
What is graphify-rs?
graphify-rs is built around Andrej Karpathy's /raw folder workflow: drop anything into a folder — papers, tweets, screenshots, code, notes — and get a structured knowledge graph that shows you what you didn't know was connected.
It is a Rust rewrite of graphify (Python), with full feature parity and significant performance improvements.
Three things it does that an LLM alone cannot
- Persistent graph — relationships are stored in
graph.jsonand survive across sessions. Ask questions weeks later without re-reading everything. - Honest audit trail — every edge is tagged
EXTRACTED,INFERRED, orAMBIGUOUS. You always know what was found in source vs. what was inferred. - Cross-document surprise — community detection finds connections between concepts in different files that you would never think to ask about directly.
Use cases
- New to a codebase — understand architecture before touching anything
- Research corpus — papers + tweets + notes → one navigable graph with citation + concept edges
- Personal /raw folder — drop everything in, let it grow, query it anytime
- Agentic workflows — AI agents query the graph via MCP server for grounded, structured context
Compared to the Python Version
| Area | Python (original) | Rust (this repo) |
|---|---|---|
| Performance | ~204ms, ~48MB RAM | ~24ms, ~1MB RAM (8.5x faster, 48x less memory) |
| AST parsing | Regex only | 11 languages native tree-sitter + regex fallback |
| Semantic extraction | Sequential | Concurrent with configurable parallelism (-j) |
| Community detection | Louvain (graspologic) | Leiden (hand-written, with refinement phase) |
| MCP server | Not included | 11 tools over JSON-RPC 2.0 stdio |
| Export formats | 7 | 9 (+ Obsidian vault, split HTML per community) |
| CLI | Basic | 21 subcommands, --quiet/--verbose, shell completions |
| Watch mode | Full rebuild | Incremental (only changed files re-extracted) |
Output format is fully compatible — graph.json uses the same NetworkX node_link_data schema.
Installation
From crates.io
From source
Quick Start
For the full CLI reference, see docs/CLI.md.
How It Works
Pipeline Overview
Source Files graphify-rs build
┌──────────┐ ┌─────────────────────────────────────────────────────────┐
│ .py .rs │ │ │
│ .go .ts │───▶│ detect → extract → build → cluster → analyze → export │
│ .md .pdf │ │ │
│ .png │ └──────────┬──────────────────────────────────────────────┘
└──────────┘ │
▼
graphify-out/
├── graph.json (queryable graph data)
├── graph.html (interactive visualization)
├── GRAPH_REPORT.md (analysis report)
├── wiki/ (per-community wiki pages)
└── obsidian/ (Obsidian vault)
Two-Pass Extraction
Pass 1 — Deterministic AST extraction (free, fast, always runs):
Uses tree-sitter to parse source code into ASTs, then extracts functions, classes, imports, and call relationships. Supports 21 languages with 11 native tree-sitter grammars and regex fallback for the rest. Every edge from this pass is tagged EXTRACTED with confidence 1.0.
Pass 2 — Semantic extraction via Claude API (optional, --no-llm to skip):
Sends document/paper/image content to the Claude API to discover higher-level relationships that syntax alone cannot reveal — conceptual links, shared assumptions, design rationale. Edges from this pass are tagged INFERRED with confidence scores from 0.4 to 0.9.
Confidence System
Every edge in the graph carries a confidence tag:
| Tag | Meaning | Score |
|---|---|---|
EXTRACTED |
Found directly in source (import, call, citation) | 1.0 |
INFERRED |
Reasonable inference from context | 0.4–0.9 |
AMBIGUOUS |
Uncertain — flagged for human review | 0.1–0.3 |
This ensures you always know which relationships are facts vs. guesses.
Leiden Community Detection
After building the graph, graphify-rs runs the Leiden algorithm to partition nodes into communities:
- Louvain phase — greedy modularity optimization, moving nodes to neighboring communities for maximum modularity gain
- Refinement phase — BFS within each community to ensure internal connectivity; disconnected sub-communities are split
- Small community merging — communities with < 5 nodes are merged into their most-connected neighbor
Each community receives a cohesion score (ratio of actual intra-community edges to maximum possible), and the report surfaces "god nodes" (highest-degree hubs) and "surprising connections" (edges that bridge different communities).
Architecture
14 crates organized as a Cargo workspace:
| Crate | Purpose |
|---|---|
graphify-core |
Data models (GraphNode, GraphEdge, KnowledgeGraph), ID generation, confidence system |
graphify-detect |
File discovery, classification (code/doc/paper/image), .graphifyignore, sensitive file filtering |
graphify-extract |
AST extraction (tree-sitter, 21 languages), Claude API semantic extraction, deduplication |
graphify-build |
Graph assembly from extraction results, node/edge deduplication |
graphify-cluster |
Leiden community detection, cohesion scoring, community splitting/merging |
graphify-analyze |
God nodes, surprising connections, suggested questions, graph diff |
graphify-export |
9 formats: JSON, HTML, split HTML, SVG, GraphML, Cypher, Wiki, Report, Obsidian |
graphify-cache |
SHA256 content-hash caching for incremental rebuilds |
graphify-security |
URL validation (SSRF prevention), path traversal protection, label injection defense |
graphify-ingest |
URL fetching: arXiv abstracts, tweets (oEmbed), PDFs, generic webpages |
graphify-serve |
MCP server with 11 query tools over JSON-RPC 2.0 stdio |
graphify-watch |
File monitoring with debounce, incremental rebuild on code changes |
graphify-hooks |
Git hook install/uninstall/status (post-commit, post-checkout) |
graphify-benchmark |
Token efficiency measurement (graph tokens vs. raw corpus tokens) |
Output Formats
| File | Description |
|---|---|
graph.json |
NetworkX-compatible node_link_data JSON |
graph.html |
Interactive vis.js visualization (dark theme, auto-pruning for large graphs) |
html/ |
Per-community HTML pages with overview navigation |
GRAPH_REPORT.md |
Analysis report: communities, god nodes, surprises, suggested questions |
graph.svg |
Static circular-layout graph visualization |
graph.graphml |
For graph editors (yEd, Gephi) |
cypher.txt |
Neo4j Cypher import script |
wiki/ |
Wiki-style markdown pages per community |
obsidian/ |
Obsidian vault with wikilinks and frontmatter |
CLI Reference
See docs/CLI.md for the complete command reference with all flags, defaults, and examples.
Quick overview:
Agent Integration
graphify-rs integrates with AI coding agents (Claude Code, Codex, OpenCode, etc.) via skill installation and MCP server.
Once installed, agents automatically check the knowledge graph before answering architecture questions and rebuild it after code changes.
For full agent setup instructions, see the Agent Integration section of the CLI reference.
MCP Server Tools
| Tool | Description |
|---|---|
query_graph |
Search nodes by keywords, return subgraph context |
get_node |
Get detailed info about a specific node |
get_neighbors |
Get a node's neighbors and connecting edges |
get_community |
List all nodes in a community |
god_nodes |
Find the most-connected hub nodes |
graph_stats |
Overall graph statistics |
shortest_path |
Find shortest path between two nodes |
Supported Languages (21)
| Native (tree-sitter) | Regex Fallback |
|---|---|
| Python, JavaScript, TypeScript, Rust, Go, Java | Kotlin, Scala, PHP, Swift, Lua |
| C, C++, Ruby, C#, Dart | Zig, PowerShell, Elixir, Obj-C, Julia |
Contributing
See CONTRIBUTING.md for development setup, code style, and PR guidelines.
License
MIT — see LICENSE.
This project is a Rust rewrite of graphify by safishamsi.