Scribe Graph - Advanced Code Dependency Analysis
High-performance, graph-based code analysis built around PageRank centrality. This crate provides tools for understanding code structure, dependency relationships, and relative file importance by running graph algorithms over a codebase's import graph.
Key Features
PageRank Centrality Analysis
- Research-grade PageRank implementation optimized for code dependency graphs
- Reverse edge emphasis (importance flows from importing files to the files they import)
- Convergence detection with configurable precision
- Multi-language import detection (Python, JavaScript, TypeScript, Rust, Go, Java)
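The reverse edge emphasis and convergence detection described above can be illustrated with a small, standalone sketch. This is not the crate's implementation: the function name `pagerank` and the parameters `damping`, `tolerance`, and `max_iters` are hypothetical, and the handling of files with no imports (spreading their score uniformly) is one common convention assumed here.

```rust
/// Minimal PageRank sketch over an import graph (illustrative only).
///
/// `imports[i]` lists the indices of the files that file `i` imports.
/// Score flows from each importer to the files it imports, so heavily
/// imported files accumulate importance.
/// Returns the scores and the number of iterations performed.
fn pagerank(
    imports: &[Vec<usize>],
    damping: f64,
    tolerance: f64,
    max_iters: usize,
) -> (Vec<f64>, usize) {
    let n = imports.len();
    if n == 0 {
        return (Vec::new(), 0);
    }
    // Start from a uniform distribution.
    let mut scores = vec![1.0 / n as f64; n];
    for iter in 1..=max_iters {
        // Every file receives the same baseline (teleport) share.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        let mut dangling = 0.0;
        for (src, targets) in imports.iter().enumerate() {
            if targets.is_empty() {
                // Assumption: files with no imports spread their score uniformly.
                dangling += scores[src];
            } else {
                let share = damping * scores[src] / targets.len() as f64;
                for &dst in targets {
                    next[dst] += share;
                }
            }
        }
        let dangling_share = damping * dangling / n as f64;
        for s in next.iter_mut() {
            *s += dangling_share;
        }
        // Convergence check: total absolute change (L1 distance) between iterations.
        let delta: f64 = next.iter().zip(&scores).map(|(a, b)| (a - b).abs()).sum();
        scores = next;
        if delta < tolerance {
            return (scores, iter);
        }
    }
    (scores, max_iters)
}
```

Called with `vec![vec![2], vec![2], vec![]]` (files 0 and 1 both import file 2), the sketch assigns file 2 the highest score, and the scores sum to 1 within floating-point error.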
Graph Construction and Analysis
- Efficient dependency graph representation with adjacency lists
- Comprehensive statistics (degree distribution, connectivity, structural patterns)
- Performance optimized for large codebases (10k+ files)
- Concurrent processing support for multi-core systems
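As a rough illustration of an adjacency-list representation with basic degree statistics, here is a minimal sketch. The `DependencyGraph` type and its methods below are hypothetical names chosen for this example and are not taken from the crate's API.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative adjacency-list dependency graph (hypothetical type).
#[derive(Default)]
struct DependencyGraph {
    /// file -> files it imports
    outgoing: HashMap<String, HashSet<String>>,
    /// file -> files that import it
    incoming: HashMap<String, HashSet<String>>,
}

impl DependencyGraph {
    /// Registers a file even if it has no edges yet.
    fn add_node(&mut self, file: &str) {
        self.outgoing.entry(file.to_string()).or_default();
        self.incoming.entry(file.to_string()).or_default();
    }

    /// Records that `importer` imports `imported`; duplicate edges are ignored.
    fn add_import(&mut self, importer: &str, imported: &str) {
        self.add_node(importer);
        self.add_node(imported);
        self.outgoing.get_mut(importer).unwrap().insert(imported.to_string());
        self.incoming.get_mut(imported).unwrap().insert(importer.to_string());
    }

    fn node_count(&self) -> usize {
        self.outgoing.len()
    }

    fn edge_count(&self) -> usize {
        self.outgoing.values().map(|targets| targets.len()).sum()
    }

    fn in_degree(&self, file: &str) -> usize {
        self.incoming.get(file).map_or(0, |sources| sources.len())
    }

    fn out_degree(&self, file: &str) -> usize {
        self.outgoing.get(file).map_or(0, |targets| targets.len())
    }

    /// Average number of outgoing edges per file.
    fn average_degree(&self) -> f64 {
        if self.node_count() == 0 {
            0.0
        } else {
            self.edge_count() as f64 / self.node_count() as f64
        }
    }

    /// Files that neither import nor are imported by anything.
    fn isolated_count(&self) -> usize {
        self.outgoing
            .keys()
            .filter(|f| self.in_degree(f.as_str()) == 0 && self.out_degree(f.as_str()) == 0)
            .count()
    }
}
```

Keeping both directions costs extra memory, but it makes in-degree lookups (how many files import a given file) as cheap as out-degree lookups.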
Integration with FastPath Heuristics
- V2 integration with the existing heuristic scoring system
- Configurable centrality weighting in final importance scores
- Multiple normalization methods (min-max, z-score, rank-based)
- Entrypoint boosting for main/index files
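To make the normalization methods and centrality weighting more concrete, the sketch below shows one plausible way to compute them. All function names (`min_max`, `z_score`, `rank_based`, `integrate`) and the linear blend with a multiplicative entrypoint boost are assumptions for illustration; the crate's actual formulas may differ.

```rust
/// Min-max normalization: rescales values into [0, 1].
fn min_max(values: &[f64]) -> Vec<f64> {
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    values
        .iter()
        .map(|v| if range > 0.0 { (v - min) / range } else { 0.0 })
        .collect()
}

/// Z-score normalization: zero mean, unit (population) standard deviation.
fn z_score(values: &[f64]) -> Vec<f64> {
    if values.is_empty() {
        return Vec::new();
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    values
        .iter()
        .map(|v| if std_dev > 0.0 { (v - mean) / std_dev } else { 0.0 })
        .collect()
}

/// Rank-based normalization: lowest value maps to 0.0, highest to 1.0.
fn rank_based(values: &[f64]) -> Vec<f64> {
    let n = values.len();
    if n == 0 {
        return Vec::new();
    }
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| values[a].total_cmp(&values[b]));
    let mut out = vec![0.0; n];
    for (rank, &idx) in order.iter().enumerate() {
        out[idx] = if n > 1 { rank as f64 / (n - 1) as f64 } else { 1.0 };
    }
    out
}

/// Assumed blend: linear mix of heuristic and centrality scores,
/// multiplied by `boost` for entrypoint files such as main/index.
fn integrate(heuristic: f64, centrality: f64, weight: f64, is_entrypoint: bool, boost: f64) -> f64 {
    let blended = (1.0 - weight) * heuristic + weight * centrality;
    if is_entrypoint {
        blended * boost
    } else {
        blended
    }
}
```

Min-max preserves relative distances but is sensitive to outliers, z-score centers the distribution, and rank-based normalization discards magnitudes entirely, which can help when a few hub files dominate the raw PageRank scores.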
Quick Start
use scribe_graph::{CentralityCalculator, PageRankConfig};
# use scribe_analysis::heuristics::ScanResult;
# use std::collections::HashMap;
#
# // Mock implementation for documentation
# #[derive(Debug)]
# struct MockScanResult {
# path: String,
# relative_path: String,
# }
#
# impl ScanResult for MockScanResult {
# fn path(&self) -> &str { &self.path }
# fn relative_path(&self) -> &str { &self.relative_path }
# fn depth(&self) -> usize { 1 }
# fn is_docs(&self) -> bool { false }
# fn is_readme(&self) -> bool { false }
# fn is_entrypoint(&self) -> bool { false }
# fn is_examples(&self) -> bool { false }
# fn is_tests(&self) -> bool { false }
# fn priority_boost(&self) -> f64 { 0.0 }
# fn get_documentation_score(&self) -> f64 { 0.0 }
# fn get_file_size(&self) -> usize { 1000 }
# fn get_imports(&self) -> Vec<String> { vec![] }
# fn get_git_churn(&self) -> usize { 0 }
# }
#
# fn example() -> Result<(), Box<dyn std::error::Error>> {
// Create centrality calculator optimized for code analysis
let calculator = CentralityCalculator::for_large_codebases()?;
// Example scan results (replace with actual scan results)
let scan_results = vec![
    MockScanResult { path: "main.rs".to_string(), relative_path: "main.rs".to_string() },
    MockScanResult { path: "lib.rs".to_string(), relative_path: "lib.rs".to_string() },
];
let heuristic_scores = HashMap::new();
// Calculate PageRank centrality for scan results
let centrality_results = calculator.calculate_centrality(&scan_results)?;
// Get top files by centrality
let top_files = centrality_results.top_files_by_centrality(10);
// Integrate with existing heuristic scores
let integrated_scores = calculator.integrate_with_heuristics(
    &centrality_results,
    &heuristic_scores
)?;
# Ok(())
# }
Performance Characteristics
- Memory usage: ~2MB for 1000-file codebases, ~20MB for 10k+ files
- Computation time: ~10ms for small projects, ~100ms for large codebases
- Convergence: Typically 8-15 iterations for most dependency graphs
- Parallel efficiency: Near-linear speedup on multi-core systems