scribe-graph 0.4.0

Graph-based code representation and analysis for Scribe

Scribe Graph - Advanced Code Dependency Analysis

High-performance graph-based code analysis with PageRank centrality computation. This crate provides tools for understanding code structure, dependency relationships, and file importance by running PageRank and related graph algorithms over a code dependency graph.

Key Features

PageRank Centrality Analysis

  • Research-grade PageRank implementation optimized for code dependency graphs
  • Reverse edge emphasis: importance flows from each importing file to the files it imports
  • Convergence detection with configurable precision
  • Multi-language import detection (Python, JavaScript, TypeScript, Rust, Go, Java)
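
To make the reverse-edge and convergence points above concrete, here is a minimal PageRank sketch over an import graph. It is an illustration under stated assumptions, not the crate's implementation: the function name `pagerank`, the damping factor, the L1 convergence test, and the uniform handling of files with no imports are all choices made for this example.

```rust
/// Hypothetical helper, not part of the scribe-graph API.
/// `imports[i]` lists the indices of the files that file `i` imports.
/// Returns the scores and the number of iterations actually used.
fn pagerank(
    imports: &[Vec<usize>],
    damping: f64,
    tol: f64,
    max_iter: usize,
) -> (Vec<f64>, usize) {
    let n = imports.len();
    if n == 0 {
        return (Vec::new(), 0);
    }
    let mut rank = vec![1.0 / n as f64; n];

    for iter in 1..=max_iter {
        // Every file starts each round with the teleport share.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        let mut dangling = 0.0;

        for (src, targets) in imports.iter().enumerate() {
            if targets.is_empty() {
                // A file with no imports has nowhere to send its score.
                dangling += rank[src];
            } else {
                // Score flows from the importer to each imported file.
                let share = damping * rank[src] / targets.len() as f64;
                for &dst in targets {
                    next[dst] += share;
                }
            }
        }

        // Assumption: spread dangling mass uniformly so scores sum to 1.
        let spread = damping * dangling / n as f64;
        for value in next.iter_mut() {
            *value += spread;
        }

        // Convergence detection: total absolute change between rounds.
        let delta: f64 = next.iter().zip(&rank).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tol {
            return (rank, iter);
        }
    }
    (rank, max_iter)
}
```

Calling `pagerank(&[vec![2], vec![2], vec![]], 0.85, 1e-6, 100)` models two files that both import a third; the shared dependency ends up with the highest score, which is the behavior the reverse-edge bullet describes.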

Graph Construction and Analysis

  • Efficient dependency graph representation with adjacency lists
  • Comprehensive statistics (degree distribution, connectivity, structural patterns)
  • Performance optimized for large codebases (10k+ files)
  • Concurrent processing support for multi-core systems
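
The adjacency-list representation and degree statistics mentioned above might look roughly like the sketch below. The `DepGraph` type and its methods are hypothetical names invented for this example and are not the crate's types; duplicate edges are not deduplicated here.

```rust
use std::collections::HashMap;

/// Hypothetical dependency graph, for illustration only.
#[derive(Default)]
struct DepGraph {
    /// file -> files it imports
    outgoing: HashMap<String, Vec<String>>,
    /// file -> files that import it
    incoming: HashMap<String, Vec<String>>,
}

impl DepGraph {
    /// Registers a file even if it has no edges yet.
    fn add_node(&mut self, file: &str) {
        self.outgoing.entry(file.to_string()).or_default();
        self.incoming.entry(file.to_string()).or_default();
    }

    /// Records that `from` imports `to`, in both adjacency lists.
    fn add_import(&mut self, from: &str, to: &str) {
        self.add_node(from);
        self.add_node(to);
        self.outgoing.entry(from.to_string()).or_default().push(to.to_string());
        self.incoming.entry(to.to_string()).or_default().push(from.to_string());
    }

    /// Number of files that import `file` (0 for unknown files).
    fn in_degree(&self, file: &str) -> usize {
        self.incoming.get(file).map_or(0, |v| v.len())
    }

    /// Number of files that `file` imports (0 for unknown files).
    fn out_degree(&self, file: &str) -> usize {
        self.outgoing.get(file).map_or(0, |v| v.len())
    }

    fn node_count(&self) -> usize {
        self.outgoing.len()
    }

    fn edge_count(&self) -> usize {
        self.outgoing.values().map(|v| v.len()).sum()
    }
}
```

Keeping both `outgoing` and `incoming` lists doubles the edge storage but makes in-degree lookups constant time per file, which matters when importance is driven by who imports a file.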

Integration with FastPath Heuristics

  • Seamless V2 integration with the existing heuristic scoring system
  • Configurable centrality weighting in final importance scores
  • Multiple normalization methods (min-max, z-score, rank-based)
  • Entrypoint boosting for main/index files
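
As a sketch of what the three normalization methods and the centrality weighting could mean in practice, the functions below normalize a slice of raw centrality scores and blend one score with a heuristic score. All names (`min_max`, `z_score`, `rank_based`, `combine`) and the blending formula are assumptions for illustration; the crate's actual formula and parameters may differ.

```rust
/// Min-max: rescale to [0, 1]; all-equal input maps to 0.0.
fn min_max(scores: &[f64]) -> Vec<f64> {
    let min = scores.iter().copied().fold(f64::INFINITY, f64::min);
    let max = scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    scores
        .iter()
        .map(|&s| if range > 0.0 { (s - min) / range } else { 0.0 })
        .collect()
}

/// Z-score: subtract the mean, divide by the population standard deviation.
fn z_score(scores: &[f64]) -> Vec<f64> {
    let n = scores.len();
    if n == 0 {
        return Vec::new();
    }
    let mean = scores.iter().sum::<f64>() / n as f64;
    let var = scores.iter().map(|&s| (s - mean).powi(2)).sum::<f64>() / n as f64;
    let std = var.sqrt();
    scores
        .iter()
        .map(|&s| if std > 0.0 { (s - mean) / std } else { 0.0 })
        .collect()
}

/// Rank-based: lowest score gets 0.0, highest gets 1.0.
/// Ties receive distinct adjacent ranks in this simplified version.
fn rank_based(scores: &[f64]) -> Vec<f64> {
    let n = scores.len();
    if n <= 1 {
        return vec![0.0; n];
    }
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| scores[a].total_cmp(&scores[b]));
    let mut out = vec![0.0; n];
    for (pos, &i) in order.iter().enumerate() {
        out[i] = pos as f64 / (n - 1) as f64;
    }
    out
}

/// Assumed blending formula: linear mix of heuristic and normalized
/// centrality, with a multiplicative boost for entrypoint files.
fn combine(heuristic: f64, centrality: f64, weight: f64, is_entrypoint: bool, boost: f64) -> f64 {
    let mixed = (1.0 - weight) * heuristic + weight * centrality;
    if is_entrypoint { mixed * (1.0 + boost) } else { mixed }
}
```

Min-max preserves relative gaps but is sensitive to a single outlier, z-score centers the distribution, and rank-based normalization discards magnitudes entirely, which can be useful when PageRank scores are heavily skewed toward a few hub files.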

Quick Start

use scribe_graph::{CentralityCalculator, PageRankConfig};
use scribe_analysis::heuristics::ScanResult;
use std::collections::HashMap;

// Mock implementation for documentation
#[derive(Debug)]
struct MockScanResult {
    path: String,
    relative_path: String,
}

impl ScanResult for MockScanResult {
    fn path(&self) -> &str { &self.path }
    fn relative_path(&self) -> &str { &self.relative_path }
    fn depth(&self) -> usize { 1 }
    fn is_docs(&self) -> bool { false }
    fn is_readme(&self) -> bool { false }
    fn is_entrypoint(&self) -> bool { false }
    fn is_examples(&self) -> bool { false }
    fn is_tests(&self) -> bool { false }
    fn priority_boost(&self) -> f64 { 0.0 }
    fn get_documentation_score(&self) -> f64 { 0.0 }
    fn get_file_size(&self) -> usize { 1000 }
    fn get_imports(&self) -> Vec<String> { vec![] }
    fn get_git_churn(&self) -> usize { 0 }
}

fn example() -> Result<(), Box<dyn std::error::Error>> {
    // Create centrality calculator optimized for code analysis
    let calculator = CentralityCalculator::for_large_codebases()?;

    // Example scan results (replace with actual scan results)
    let scan_results = vec![
        MockScanResult { path: "main.rs".to_string(), relative_path: "main.rs".to_string() },
        MockScanResult { path: "lib.rs".to_string(), relative_path: "lib.rs".to_string() },
    ];
    let heuristic_scores = HashMap::new();

    // Calculate PageRank centrality for scan results
    let centrality_results = calculator.calculate_centrality(&scan_results)?;

    // Get top files by centrality
    let top_files = centrality_results.top_files_by_centrality(10);

    // Integrate with existing heuristic scores
    let integrated_scores = calculator.integrate_with_heuristics(
        &centrality_results,
        &heuristic_scores
    )?;
    Ok(())
}

Performance Characteristics

  • Memory usage: ~2 MB for 1000-file codebases, ~20 MB for 10k+ files
  • Computation time: ~10 ms for small projects, ~100 ms for large codebases
  • Convergence: Typically 8-15 iterations for most dependency graphs
  • Parallel efficiency: Near-linear speedup on multi-core systems