Scribe Graph - Advanced Code Dependency Analysis

Graph-based code analysis built around PageRank centrality. This crate provides tools for understanding code structure, dependency relationships, and relative file importance, with performance targets suited to large codebases.

Key Features

PageRank Centrality Analysis

Research-grade PageRank implementation optimized for code dependency graphs
Reverse edge emphasis (importance flows from importing files to the files they import)
Convergence detection with configurable precision
Multi-language import detection (Python, JavaScript, TypeScript, Rust, Go, Java)
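
The reverse edge emphasis and convergence check listed above can be illustrated with a minimal, self-contained sketch. This is not the crate's implementation: the `pagerank` function, its parameters, and the handling of files without imports are assumptions made for illustration only.

```rust
/// Hypothetical helper, not part of the scribe_graph API.
/// `imports[i]` lists the indices of the files that file `i` imports.
/// Returns the rank vector and the number of iterations performed.
fn pagerank(
    imports: &[Vec<usize>],
    damping: f64,
    epsilon: f64,
    max_iters: usize,
) -> (Vec<f64>, usize) {
    let n = imports.len();
    if n == 0 {
        return (Vec::new(), 0);
    }
    let mut rank = vec![1.0 / n as f64; n];
    for iter in 1..=max_iters {
        // Assumption: files that import nothing spread their rank uniformly.
        let dangling: f64 = imports
            .iter()
            .zip(&rank)
            .filter(|(out, _)| out.is_empty())
            .map(|(_, r)| *r)
            .sum();
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        // Importance flows from each importer to the files it imports.
        for (src, targets) in imports.iter().enumerate() {
            if targets.is_empty() {
                continue;
            }
            let share = damping * rank[src] / targets.len() as f64;
            for &dst in targets {
                next[dst] += share;
            }
        }
        // Convergence check: L1 distance between successive rank vectors.
        let delta: f64 = next.iter().zip(&rank).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < epsilon {
            return (rank, iter);
        }
    }
    (rank, max_iters)
}

fn main() {
    // 0 = main.rs (imports lib.rs, utils.rs), 1 = lib.rs (imports utils.rs), 2 = utils.rs
    let imports = vec![vec![1, 2], vec![2], vec![]];
    let (rank, iters) = pagerank(&imports, 0.85, 1e-6, 100);
    println!("ranks = {:?} after {} iterations", rank, iters);
}
```

In this toy graph the most-imported file ends up with the highest rank, which is the behavior the reverse edge emphasis is meant to produce.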

Graph Construction and Analysis

Efficient dependency graph representation with adjacency lists
Comprehensive statistics (degree distribution, connectivity, structural patterns)
Optimized for large codebases (10k+ files)
Concurrent processing support for multi-core systems
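
As a rough illustration of an adjacency-list dependency graph with basic degree statistics, consider the sketch below. The `DepGraph` type and its methods are hypothetical names chosen for this example and are not taken from the crate.

```rust
use std::collections::HashMap;

/// Hypothetical adjacency-list graph; not the crate's actual type.
#[derive(Default)]
struct DepGraph {
    index: HashMap<String, usize>, // file path -> node id
    names: Vec<String>,            // node id -> file path
    outgoing: Vec<Vec<usize>>,     // node id -> files it imports
    incoming: Vec<Vec<usize>>,     // node id -> files that import it
}

impl DepGraph {
    /// Returns the node id for `path`, creating the node if needed.
    fn node(&mut self, path: &str) -> usize {
        if let Some(&id) = self.index.get(path) {
            return id;
        }
        let id = self.names.len();
        self.index.insert(path.to_string(), id);
        self.names.push(path.to_string());
        self.outgoing.push(Vec::new());
        self.incoming.push(Vec::new());
        id
    }

    /// Records that `importer` imports `imported`; duplicate edges are ignored.
    fn add_import(&mut self, importer: &str, imported: &str) {
        let from = self.node(importer);
        let to = self.node(imported);
        if !self.outgoing[from].contains(&to) {
            self.outgoing[from].push(to);
            self.incoming[to].push(from);
        }
    }

    /// Number of files that import `path`, or `None` if the file is unknown.
    fn in_degree(&self, path: &str) -> Option<usize> {
        self.index.get(path).map(|&id| self.incoming[id].len())
    }

    /// Number of files that `path` imports, or `None` if the file is unknown.
    fn out_degree(&self, path: &str) -> Option<usize> {
        self.index.get(path).map(|&id| self.outgoing[id].len())
    }

    /// Total number of distinct import edges.
    fn edge_count(&self) -> usize {
        self.outgoing.iter().map(|targets| targets.len()).sum()
    }
}

fn main() {
    let mut graph = DepGraph::default();
    graph.add_import("main.rs", "lib.rs");
    graph.add_import("main.rs", "utils.rs");
    graph.add_import("lib.rs", "utils.rs");
    for name in &graph.names {
        println!(
            "{}: in={:?} out={:?}",
            name,
            graph.in_degree(name),
            graph.out_degree(name)
        );
    }
    println!("edges = {}", graph.edge_count());
}
```

Storing both outgoing and incoming lists doubles the edge storage but makes in-degree and out-degree lookups constant time, which is a common trade-off for this kind of statistic.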

Integration with FastPath Heuristics

V2 integration with the existing heuristic scoring system
Configurable centrality weighting in final importance scores
Multiple normalization methods (min-max, z-score, rank-based)
Entrypoint boosting for main/index files
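
The three normalization methods and the centrality weighting named above can be sketched as follows. The function names, the population standard deviation in the z-score, the tie handling in the rank-based method, and the linear blend formula are all assumptions for illustration; the crate may define them differently.

```rust
/// Min-max normalization to [0, 1]; constant input maps to 0.0.
fn min_max(scores: &[f64]) -> Vec<f64> {
    let min = scores.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    scores
        .iter()
        .map(|s| if range > 0.0 { (s - min) / range } else { 0.0 })
        .collect()
}

/// Z-score normalization using the population standard deviation (assumption).
fn z_score(scores: &[f64]) -> Vec<f64> {
    let n = scores.len() as f64;
    let mean = scores.iter().sum::<f64>() / n;
    let var = scores.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    scores
        .iter()
        .map(|s| if std > 0.0 { (s - mean) / std } else { 0.0 })
        .collect()
}

/// Rank-based normalization: lowest score maps to 0.0, highest to 1.0.
/// Tied scores receive distinct adjacent ranks in input order (assumption).
fn rank_based(scores: &[f64]) -> Vec<f64> {
    let n = scores.len();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| scores[a].total_cmp(&scores[b]));
    let mut out = vec![0.0; n];
    if n > 1 {
        for (rank, &idx) in order.iter().enumerate() {
            out[idx] = rank as f64 / (n - 1) as f64;
        }
    }
    out
}

/// Hypothetical linear blend of a heuristic score with a normalized centrality.
fn blend(heuristic: f64, centrality: f64, centrality_weight: f64) -> f64 {
    (1.0 - centrality_weight) * heuristic + centrality_weight * centrality
}

fn main() {
    let centrality = [0.10, 0.55, 0.35];
    let heuristic = [0.8, 0.4, 0.6];
    let normalized = min_max(&centrality);
    for (h, c) in heuristic.iter().zip(&normalized) {
        println!("final score = {:.3}", blend(*h, *c, 0.25));
    }
    println!("z-scores = {:?}", z_score(&centrality));
    println!("rank-based = {:?}", rank_based(&centrality));
}
```

Min-max preserves relative distances but is sensitive to outliers, while the rank-based method discards distances entirely; which one suits a given codebase depends on how skewed the centrality distribution is.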

Quick Start

use scribe_graph::{CentralityCalculator, PageRankConfig};
# use scribe_analysis::heuristics::ScanResult;
# use std::collections::HashMap;
#
# // Mock implementation for documentation
# #[derive(Debug)]
# struct MockScanResult {
#     path: String,
#     relative_path: String,
# }
#
# impl ScanResult for MockScanResult {
#     fn path(&self) -> &str { &self.path }
#     fn relative_path(&self) -> &str { &self.relative_path }
#     fn depth(&self) -> usize { 1 }
#     fn is_docs(&self) -> bool { false }
#     fn is_readme(&self) -> bool { false }
#     fn is_entrypoint(&self) -> bool { false }
#     fn is_examples(&self) -> bool { false }
#     fn is_tests(&self) -> bool { false }
#     fn priority_boost(&self) -> f64 { 0.0 }
#     fn get_documentation_score(&self) -> f64 { 0.0 }
#     fn get_file_size(&self) -> usize { 1000 }
#     fn get_imports(&self) -> Vec<String> { vec![] }
#     fn get_git_churn(&self) -> usize { 0 }
# }
#
# fn example() -> Result<(), Box<dyn std::error::Error>> {
// Create centrality calculator optimized for code analysis
let calculator = CentralityCalculator::for_large_codebases()?;

// Example scan results (replace with actual scan results)
let scan_results = vec![
    MockScanResult { path: "main.rs".to_string(), relative_path: "main.rs".to_string() },
    MockScanResult { path: "lib.rs".to_string(), relative_path: "lib.rs".to_string() },
];
let heuristic_scores = HashMap::new();

// Calculate PageRank centrality for scan results
let centrality_results = calculator.calculate_centrality(&scan_results)?;

// Get top files by centrality
let top_files = centrality_results.top_files_by_centrality(10);

// Integrate with existing heuristic scores
let integrated_scores = calculator.integrate_with_heuristics(
    &centrality_results,
    &heuristic_scores
)?;
# Ok(())
# }

Performance Characteristics

Memory usage: ~2 MB for 1,000-file codebases, ~20 MB for 10k+ files
Computation time: ~10ms for small projects, ~100ms for large codebases
Convergence: Typically 8-15 iterations on dependency graphs
Parallel efficiency: Near-linear speedup on multi-core systems