scribe-graph 0.5.1

Graph-based code representation and analysis for Scribe
Documentation
# scribe-graph

Graph-based code representation and centrality analysis for Scribe.

## Overview

`scribe-graph` builds dependency graphs from repository code and computes centrality metrics to identify the most important files. It implements research-grade graph algorithms specifically adapted for code analysis, enabling Scribe to intelligently prioritize files based on their structural importance.

## Key Features

### Dependency Graph Construction
- **Multi-language import extraction**: Python, JavaScript/TypeScript, Rust, Go, Java
- **Bidirectional edges**: Tracks both imports (who depends on me) and exports (what I depend on)
- **Transitive closure computation**: Find all direct and indirect dependencies
- **Strongly connected components (SCC)**: Detect circular dependency cycles using Kosaraju's algorithm

### PageRank Centrality
- **Research-grade implementation** adapted specifically for code dependency graphs
- **Reverse edge emphasis**: Importance flows TO imported files (not FROM)
- **Configurable damping factor**: Default 0.85 (standard PageRank value)
- **Convergence detection**: L1 norm delta with configurable precision (1e-6 default)
- **Parallel computation**: Multi-core PageRank using Rayon
- **Performance**: 10ms for small graphs, ~100ms for 10k+ node graphs

### Graph Algorithms
- **Betweenness centrality**: Identifies bridge files connecting different parts of the codebase
- **Degree centrality**: Counts import/export connections
- **Community detection**: Groups related files into clusters
- **Distance computation**: BFS-based shortest path for understanding relationships
- **Graph metrics**: Diameter, clustering coefficient, density

### Surgical Covering Set Selection
- **Entity-targeted selection**: Find minimal file sets needed to understand specific functions/classes
- **Transitive dependency/dependent computation**: Configurable depth limits
- **Inclusion reasons**: Explains why each file was selected (target, direct dep, transitive, etc.)
- **Impact analysis mode**: Analyzes what breaks if you change a target

## Architecture

```
FileMetadata → Import Extraction → Dependency Graph → Centrality Analysis → Selection
      ↓              ↓                    ↓                   ↓               ↓
  AST Parse    Language-specific    petgraph DiGraph    PageRank/SCC    Coverage Set
                  Patterns          (Node: File)       Scoring/Ranking   Computation
                                    (Edge: Import)
```

### Core Components

#### `DependencyGraph`
Main graph structure built on `petgraph::DiGraph`:
- **Nodes**: File paths with metadata
- **Edges**: Import relationships with weight
- **Indexing**: Fast lookup by file path using `HashMap`
- **Serialization**: Save/load graphs for caching

#### `ImportExtractor`
Language-specific import parsing:
- **Python**: `import`, `from ... import` statements
- **JavaScript/TypeScript**: `import`, `require()`, ES modules
- **Rust**: `use`, `mod`, `extern crate` declarations
- **Go**: `import` blocks with package resolution
- **Java**: `import` statements with wildcard support

#### `CentralityComputer`
Implements graph centrality algorithms:
- **PageRank**: Iterative power method with convergence checks
- **Betweenness**: Brandes' algorithm for shortest path betweenness
- **Closeness**: Normalized distance to all other nodes
- **Eigenvector**: Spectral centrality via power iteration

#### `CoveringSetSelector`
Computes minimal file sets for entity understanding:
- **Entity search**: AST-based function/class/module lookup with fuzzy matching
- **Closure computation**: Transitive dependencies and dependents
- **Importance filtering**: Excludes noise below threshold
- **Reason tracking**: Records why each file was included

## Usage

### Building a Dependency Graph

```rust
use scribe_graph::{DependencyGraph, ImportExtractor};

let extractor = ImportExtractor::new();
let mut graph = DependencyGraph::new();

for file in files {
    let imports = extractor.extract_imports(&file)?;
    graph.add_node(file.path.clone(), file.clone());

    for import in imports {
        graph.add_edge(file.path.clone(), import.target, import.weight);
    }
}

println!("Graph has {} nodes and {} edges",
    graph.node_count(),
    graph.edge_count()
);
```

### Computing PageRank Centrality

```rust
use scribe_graph::centrality::{PageRankConfig, compute_pagerank};

let config = PageRankConfig {
    damping: 0.85,
    max_iterations: 100,
    tolerance: 1e-6,
    parallel: true,
};

let scores = compute_pagerank(&graph, config)?;

// Get top 10 most central files
let mut ranked: Vec<_> = scores.iter().collect();
ranked.sort_by(|a, b| b.1.partial_cmp(a.1).unwrap());

for (file, score) in ranked.iter().take(10) {
    println!("{}: {:.6}", file.display(), score);
}
```

### Surgical Covering Set Selection

```rust
use scribe_graph::covering::{CoveringSetConfig, compute_covering_set};

let config = CoveringSetConfig {
    entity_name: "authenticate_user".to_string(),
    entity_type: EntityType::Function,
    max_depth: 3,
    include_dependents: false, // for understanding
    importance_threshold: 0.01,
};

let result = compute_covering_set(&graph, config)?;

println!("Selected {} files to understand function:", result.files.len());
for (file, reason) in result.files {
    println!("  {}: {:?}", file.display(), reason);
}
```

### Transitive Dependencies

```rust
use scribe_graph::traversal::compute_transitive_dependencies;

let target = PathBuf::from("src/auth/login.rs");
let deps = compute_transitive_dependencies(&graph, &target, Some(3))?;

println!("Dependencies of {}: ", target.display());
for (depth, dep) in deps {
    println!("  [depth {}] {}", depth, dep.display());
}
```

## PageRank for Code

Traditional PageRank flows importance FROM pages that link TO you. For code:

**Key Insight**: Important files are those that *many other files depend on*, not files that depend on many things.

**Implementation**: We reverse edge direction when computing PageRank:
- If `A imports B`, we add edge `B → A` in the PageRank graph
- Importance flows FROM importers TO imported files
- Highly imported files (like `utils.rs`, `config.py`) get high scores

**Example**:
```
src/main.rs imports utils/helpers.rs
src/api.rs imports utils/helpers.rs
src/db.rs imports utils/helpers.rs

→ helpers.rs gets high PageRank (many files depend on it)
→ main.rs gets lower PageRank (only helpers depends on it)
```

## Performance

### Targets
- **Small graphs (<100 nodes)**: <10ms PageRank computation
- **Medium graphs (100-1k)**: <50ms PageRank computation
- **Large graphs (1k-10k)**: <200ms PageRank computation
- **Very large (10k+)**: <1s PageRank computation

### Optimizations
- **Sparse computation**: Only iterate over non-zero edges
- **Parallel PageRank**: Multi-threaded score updates using Rayon
- **Early convergence**: Stop iteration when delta falls below tolerance
- **Graph caching**: Serialize computed metrics for reuse
- **Lazy SCC**: Only compute strongly connected components when needed

## Configuration

### `PageRankConfig`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `damping` | `f64` | `0.85` | Damping factor (probability of following links) |
| `max_iterations` | `usize` | `100` | Maximum iterations before stopping |
| `tolerance` | `f64` | `1e-6` | Convergence threshold (L1 norm delta) |
| `parallel` | `bool` | `true` | Use parallel computation |

### `CoveringSetConfig`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `entity_name` | `String` | - | Function/class/module to target |
| `entity_type` | `EntityType` | - | Function, Class, Module, etc. |
| `max_depth` | `Option<usize>` | `None` | Limit transitive dependency depth |
| `include_dependents` | `bool` | `false` | Include files that depend on target |
| `importance_threshold` | `f64` | `0.01` | Exclude files below centrality score |

## Integration

`scribe-graph` is used by:

- **scribe-selection**: Applies PageRank scores to file selection heuristics
- **scribe-scaling**: Uses graph metrics for context positioning optimization
- **scribe-webservice**: Provides API endpoints for graph visualization
- **CLI covering-set command**: Surgical file selection for specific entities

## See Also

- `scribe-analysis`: Provides AST parsing for import extraction
- `scribe-selection`: Uses centrality scores for file ranking
- `scribe-core`: Shared types and configuration
- `../../scribe-rs/scribe-scaling/docs/context-positioning.md`: Context optimization using centrality