scribe-graph
Graph-based code representation and centrality analysis for Scribe.
Overview
scribe-graph builds dependency graphs from repository code and computes centrality metrics to identify the most important files. It implements research-grade graph algorithms specifically adapted for code analysis, enabling Scribe to intelligently prioritize files based on their structural importance.
Key Features
Dependency Graph Construction
- Multi-language import extraction: Python, JavaScript/TypeScript, Rust, Go, Java
- Bidirectional edges: Tracks both imports (who depends on me) and exports (what I depend on)
- Transitive closure computation: Find all direct and indirect dependencies
- Strongly connected components (SCC): Detect circular dependency cycles using Kosaraju's algorithm
PageRank Centrality
- Research-grade implementation adapted specifically for code dependency graphs
- Reverse edge emphasis: Importance flows TO imported files (not FROM)
- Configurable damping factor: Default 0.85 (standard PageRank value)
- Convergence detection: L1 norm delta with configurable precision (1e-6 default)
- Parallel computation: Multi-core PageRank using Rayon
- Performance: 10ms for small graphs, ~100ms for 10k+ node graphs
Graph Algorithms
- Betweenness centrality: Identifies bridge files connecting different parts of the codebase
- Degree centrality: Counts import/export connections
- Community detection: Groups related files into clusters
- Distance computation: BFS-based shortest path for understanding relationships
- Graph metrics: Diameter, clustering coefficient, density
Surgical Covering Set Selection
- Entity-targeted selection: Find minimal file sets needed to understand specific functions/classes
- Transitive dependency/dependent computation: Configurable depth limits
- Inclusion reasons: Explains why each file was selected (target, direct dep, transitive, etc.)
- Impact analysis mode: Analyzes what breaks if you change a target
Architecture
FileMetadata → Import Extraction → Dependency Graph → Centrality Analysis → Selection
↓ ↓ ↓ ↓ ↓
AST Parse Language-specific petgraph DiGraph PageRank/SCC Coverage Set
Patterns (Node: File) Scoring/Ranking Computation
(Edge: Import)
Core Components
DependencyGraph
Main graph structure built on petgraph::DiGraph:
- Nodes: File paths with metadata
- Edges: Import relationships with weight
- Indexing: Fast lookup by file path using
HashMap - Serialization: Save/load graphs for caching
ImportExtractor
Language-specific import parsing:
- Python:
import,from ... importstatements - JavaScript/TypeScript:
import,require(), ES modules - Rust:
use,mod,extern cratedeclarations - Go:
importblocks with package resolution - Java:
importstatements with wildcard support
CentralityComputer
Implements graph centrality algorithms:
- PageRank: Iterative power method with convergence checks
- Betweenness: Brandes' algorithm for shortest path betweenness
- Closeness: Normalized distance to all other nodes
- Eigenvector: Spectral centrality via power iteration
CoveringSetSelector
Computes minimal file sets for entity understanding:
- Entity search: AST-based function/class/module lookup with fuzzy matching
- Closure computation: Transitive dependencies and dependents
- Importance filtering: Excludes noise below threshold
- Reason tracking: Records why each file was included
Usage
Building a Dependency Graph
use ;
let extractor = new;
let mut graph = new;
for file in files
println!;
Computing PageRank Centrality
use ;
let config = PageRankConfig ;
let scores = compute_pagerank?;
// Get top 10 most central files
let mut ranked: = scores.iter.collect;
ranked.sort_by;
for in ranked.iter.take
Surgical Covering Set Selection
use ;
let config = CoveringSetConfig ;
let result = compute_covering_set?;
println!;
for in result.files
Transitive Dependencies
use compute_transitive_dependencies;
let target = from;
let deps = compute_transitive_dependencies?;
println!;
for in deps
PageRank for Code
Traditional PageRank flows importance FROM pages that link TO you. For code:
Key Insight: Important files are those that many other files depend on, not files that depend on many things.
Implementation: We reverse edge direction when computing PageRank:
- If
A imports B, we add edgeB → Ain the PageRank graph - Importance flows FROM importers TO imported files
- Highly imported files (like
utils.rs,config.py) get high scores
Example:
src/main.rs imports utils/helpers.rs
src/api.rs imports utils/helpers.rs
src/db.rs imports utils/helpers.rs
→ helpers.rs gets high PageRank (many files depend on it)
→ main.rs gets lower PageRank (only helpers depends on it)
Performance
Targets
- Small graphs (<100 nodes): <10ms PageRank computation
- Medium graphs (100-1k): <50ms PageRank computation
- Large graphs (1k-10k): <200ms PageRank computation
- Very large (10k+): <1s PageRank computation
Optimizations
- Sparse computation: Only iterate over non-zero edges
- Parallel PageRank: Multi-threaded score updates using Rayon
- Early convergence: Stop iteration when delta falls below tolerance
- Graph caching: Serialize computed metrics for reuse
- Lazy SCC: Only compute strongly connected components when needed
Configuration
PageRankConfig
| Field | Type | Default | Description |
|---|---|---|---|
damping |
f64 |
0.85 |
Damping factor (probability of following links) |
max_iterations |
usize |
100 |
Maximum iterations before stopping |
tolerance |
f64 |
1e-6 |
Convergence threshold (L1 norm delta) |
parallel |
bool |
true |
Use parallel computation |
CoveringSetConfig
| Field | Type | Default | Description |
|---|---|---|---|
entity_name |
String |
- | Function/class/module to target |
entity_type |
EntityType |
- | Function, Class, Module, etc. |
max_depth |
Option<usize> |
None |
Limit transitive dependency depth |
include_dependents |
bool |
false |
Include files that depend on target |
importance_threshold |
f64 |
0.01 |
Exclude files below centrality score |
Integration
scribe-graph is used by:
- scribe-selection: Applies PageRank scores to file selection heuristics
- scribe-scaling: Uses graph metrics for context positioning optimization
- scribe-webservice: Provides API endpoints for graph visualization
- CLI covering-set command: Surgical file selection for specific entities
See Also
scribe-analysis: Provides AST parsing for import extractionscribe-selection: Uses centrality scores for file rankingscribe-core: Shared types and configuration../../scribe-rs/scribe-scaling/docs/context-positioning.md: Context optimization using centrality