scribe-analysis 0.1.0

Code analysis algorithms and AST processing for Scribe
# Scribe Analysis - Heuristic Scoring System

A multi-dimensional file scoring system for code repository analysis. It combines weighted heuristics to rank files by importance.

## 🎯 Key Features

### Multi-Dimensional Scoring Formula
```text
final_score = Σ(weight_i × normalized_score_i) + priority_boost + template_boost
```

**Score Components:**
- **Documentation Score**: README prioritization and document structure analysis
- **Import Centrality**: Dependency graph analysis with PageRank (V2)
- **Path Depth**: Preference for shallow, accessible files
- **Test Relationships**: Heuristic test-code linkage detection
- **Git Churn**: Change recency and frequency signals
- **Template Detection**: Advanced template engine recognition
- **Entrypoint Detection**: Main/index file identification
- **Examples Detection**: Usage example file recognition
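
The formula above can be sketched as a plain weighted sum. This is a minimal illustration, not the crate's implementation: the `Component` type and the `final_score` function are hypothetical names, and each normalized score is assumed to lie in `[0, 1]`.

```rust
/// One scoring dimension: its weight and its normalized score in [0, 1].
/// Hypothetical type, used only for this illustration.
struct Component {
    weight: f64,
    normalized_score: f64,
}

/// final_score = Σ(weight_i × normalized_score_i) + priority_boost + template_boost
fn final_score(components: &[Component], priority_boost: f64, template_boost: f64) -> f64 {
    // Weighted sum over all scoring dimensions.
    let weighted: f64 = components
        .iter()
        .map(|c| c.weight * c.normalized_score)
        .sum();
    // Boosts are added on top of the weighted sum, not scaled by any weight.
    weighted + priority_boost + template_boost
}
```

Because the boosts are additive, a file with a large boost can exceed 1.0 even when the weights themselves sum to 1.0.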

### Advanced Template Detection System
- **15+ Template Engines**: Django, Jinja, Handlebars, Vue, Svelte, etc.
- **Multiple Detection Methods**: Extension-based, content patterns, directory context
- **Intelligent Analysis**: Inspects HTML/XML files that might be templates
- **Performance Optimized**: Lazy loading and caching for large codebases
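
The three detection methods listed above could be layered roughly as follows. This is a sketch under assumptions: the `Detection` enum, the `detect_template` function, the extension list, and the `{{` / `{%` markers are illustrative choices, not the crate's actual rules.

```rust
use std::path::Path;

/// Which method identified the file as a template (hypothetical type).
#[derive(Debug, PartialEq)]
enum Detection {
    Extension,
    ContentPattern,
    DirectoryContext,
}

/// Cheapest check first: extension, then content, then directory context.
fn detect_template(path: &str, content: &str) -> Option<Detection> {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");

    // 1. Extension-based: these extensions are unambiguous (illustrative subset).
    if matches!(ext, "hbs" | "j2" | "jinja" | "vue" | "svelte") {
        return Some(Detection::Extension);
    }

    // 2 and 3 apply only to generic markup that might be a template.
    if matches!(ext, "html" | "htm" | "xml") {
        // 2. Content patterns: common template delimiters.
        if content.contains("{{") || content.contains("{%") {
            return Some(Detection::ContentPattern);
        }
        // 3. Directory context: the file lives under a `templates` directory.
        if path.split('/').any(|segment| segment == "templates") {
            return Some(Detection::DirectoryContext);
        }
    }
    None
}
```

Ordering the checks from cheapest to most expensive means most files never need their content scanned.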

### Import Graph Analysis
- **Multi-Language Support**: JavaScript/TypeScript, Python, Rust, Go, Java
- **Sophisticated Matching**: Module resolution, path normalization, alias handling
- **PageRank Centrality**: Identifies important files based on dependency relationships
- **Parallel Processing**: Efficient graph construction and analysis
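
To illustrate how PageRank centrality rewards heavily imported files, here is a minimal power-iteration sketch over an adjacency list. The function name, its parameters, and the handling of files with no imports are assumptions made for this example; the crate's own graph types are not shown here.

```rust
/// Power-iteration PageRank (illustrative sketch).
/// `imports[i]` lists the indices of the files that file `i` imports;
/// every index must be smaller than `imports.len()`.
fn pagerank(imports: &[Vec<usize>], damping: f64, max_iters: usize, tol: f64) -> Vec<f64> {
    let n = imports.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..max_iters {
        // Rank held by files without imports is redistributed evenly.
        let dangling: f64 = imports
            .iter()
            .zip(&rank)
            .filter(|(targets, _)| targets.is_empty())
            .map(|(_, r)| *r)
            .sum();
        let base = (1.0 - damping + damping * dangling) / n as f64;
        let mut next = vec![base; n];
        // Each file passes an equal share of its rank to every file it imports.
        for (targets, r) in imports.iter().zip(&rank) {
            if targets.is_empty() {
                continue;
            }
            let share = damping * r / targets.len() as f64;
            for &t in targets {
                next[t] += share;
            }
        }
        // Stop once the total change between iterations is below `tol`.
        let delta: f64 = next.iter().zip(&rank).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tol {
            break;
        }
    }
    rank
}
```

With a damping factor of 0.85 and a cap of 100 iterations, a file imported by many others ends up with a higher rank than the files importing it.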

## 🚀 Performance Characteristics

### Design Goals
- **Sub-millisecond scoring** for individual files
- **Linear scaling** with repository size
- **Memory efficient** through lazy evaluation and caching
- **Zero-cost abstractions** leveraging Rust's ownership system

### Benchmarked Performance
- Single file scoring: ~10-50μs
- Batch processing: 1000 files in ~50ms
- Import graph construction: Linear O(n+m) complexity
- PageRank calculation: Converges in <100 iterations

## 📊 Scoring Configuration

### V1 Weights (Default)
```rust
HeuristicWeights {
    doc_weight: 0.15,      // Documentation importance
    readme_weight: 0.20,   // README files get priority  
    import_weight: 0.20,   // Dependency centrality
    path_weight: 0.10,     // Shallow files preferred
    test_link_weight: 0.10, // Test-code relationships
    churn_weight: 0.15,    // Git activity recency
    centrality_weight: 0.0, // Disabled in V1
    entrypoint_weight: 0.05, // Entry points
    examples_weight: 0.05, // Usage examples
}
```

### V2 Weights (Advanced Features)
```rust  
HeuristicWeights {
    doc_weight: 0.12,
    readme_weight: 0.18,
    import_weight: 0.15,
    path_weight: 0.08,
    test_link_weight: 0.08,
    churn_weight: 0.12,
    centrality_weight: 0.12, // PageRank enabled
    entrypoint_weight: 0.08,
    examples_weight: 0.07,
}
```
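
Both presets above sum to 1.0, which keeps the weighted part of the score on a comparable scale between V1 and V2. When defining custom weights, a quick check like the following can catch accidental drift. The `Weights` struct here is a local mirror of the nine numeric fields for illustration only, not the crate's `HeuristicWeights` type.

```rust
/// Local mirror of the nine numeric weights (illustrative, not the crate type).
#[derive(Debug, Clone, Copy)]
struct Weights {
    doc: f64,
    readme: f64,
    import: f64,
    path: f64,
    test_link: f64,
    churn: f64,
    centrality: f64,
    entrypoint: f64,
    examples: f64,
}

impl Weights {
    /// Sum of all weights; 1.0 for both documented presets.
    fn total(&self) -> f64 {
        self.doc + self.readme + self.import + self.path + self.test_link
            + self.churn + self.centrality + self.entrypoint + self.examples
    }
}

/// Values copied from the V1 preset.
const V1: Weights = Weights {
    doc: 0.15, readme: 0.20, import: 0.20, path: 0.10, test_link: 0.10,
    churn: 0.15, centrality: 0.0, entrypoint: 0.05, examples: 0.05,
};

/// Values copied from the V2 preset.
const V2: Weights = Weights {
    doc: 0.12, readme: 0.18, import: 0.15, path: 0.08, test_link: 0.08,
    churn: 0.12, centrality: 0.12, entrypoint: 0.08, examples: 0.07,
};
```

Comparing `total()` against 1.0 with a small tolerance (for example `1e-9`) avoids false alarms from floating-point rounding.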

## 🔧 Usage Examples

### Basic Scoring
```rust
use scribe_analysis::heuristics::*;

// Create heuristic system
let mut system = HeuristicSystem::new()?;

// Score individual file
let score = system.score_file(&file, &all_files)?;
println!("Final score: {}", score.final_score);

// Get top-K files
let top_files = system.get_top_files(&files, 10)?;
```
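
Conceptually, top-K selection is a descending sort on the final score followed by truncation. The sketch below assumes scores are already computed and uses a hypothetical `top_k` helper over `(path, score)` pairs; it does not reflect how `get_top_files` is implemented internally.

```rust
/// Returns the `k` highest-scoring entries, best first (illustrative helper).
fn top_k<'a>(scored: &[(&'a str, f64)], k: usize) -> Vec<(&'a str, f64)> {
    let mut ranked = scored.to_vec();
    // `total_cmp` gives a total order on f64, so NaN cannot cause a panic.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(k);
    ranked
}
```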

### Advanced Configuration
```rust
// V2 features with centrality
let mut system = HeuristicSystem::with_v2_features()?;

// Custom weights
let weights = HeuristicWeights {
    doc_weight: 0.25,  // Boost documentation importance
    readme_weight: 0.30,
    // ... other weights
    features: ScoringFeatures::v2(),
};
let mut system = HeuristicSystem::with_weights(weights)?;
```

### Template Detection
```rust
// Check if file is a template
if is_template_file("component.vue")? {
    let boost = get_template_score_boost("component.vue")?;
    println!("Template boost: {}", boost);
}

// Advanced template analysis
let detector = TemplateDetector::new();
if let Some(result) = detector.detect_template("layout.hbs")? {
    println!("Engine: {:?}, Confidence: {}", result.engine, result.confidence);
}
```

### Import Graph Analysis
```rust
// Build dependency graph
let mut builder = ImportGraphBuilder::new();
let graph = builder.build_graph(&files)?;

// Calculate PageRank centrality
let scores = graph.get_pagerank_scores()?;

// Check import relationships
if import_matches_file("@/components/Button", "src/components/Button.tsx") {
    println!("Import matches file!");
}
```
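
The alias match in the last example can be pictured as three steps: expand the alias, drop the file extension, and compare. The sketch below is a simplified stand-in with a hypothetical name (`alias_import_matches`) and a hard-coded assumption that `@/` maps to `src/`; the crate's real resolution rules may differ, and relative imports are not handled here.

```rust
/// Simplified alias matching (illustrative; assumes "@/" maps to "src/").
fn alias_import_matches(import: &str, file_path: &str) -> bool {
    // 1. Alias handling: expand "@/" to the assumed source root.
    let resolved = match import.strip_prefix("@/") {
        Some(rest) => format!("src/{rest}"),
        None => import.to_string(),
    };

    // 2. Path normalization: drop the extension of the final path segment.
    let stem = match file_path.rsplit_once('.') {
        Some((stem, ext)) if !ext.contains('/') => stem,
        _ => file_path,
    };

    // 3. Module resolution: match the file directly or a directory's index file.
    stem == resolved || stem.strip_suffix("/index") == Some(resolved.as_str())
}
```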

## 🧪 Testing & Validation

### Comprehensive Test Suite
- **24 unit tests** covering all major components
- **Property-based testing** for edge cases
- **Integration tests** with realistic datasets
- **Performance regression tests**

### Benchmarking Framework
```bash
# Run full benchmark suite
cargo bench --package scribe-analysis

# Specific benchmark groups
cargo bench single_file_scoring
cargo bench batch_scoring  
cargo bench template_detection
cargo bench import_analysis
```

## 🏗️ Architecture

### Modular Design
- **`scoring.rs`**: Core scoring algorithms and weight management
- **`template_detection.rs`**: Multi-engine template recognition
- **`import_analysis.rs`**: Dependency graph construction and centrality
- **`mod.rs`**: Unified API and system orchestration

### Performance Optimizations
- **Lazy Evaluation**: Expensive operations deferred until needed
- **Caching Strategy**: Normalization statistics and PageRank scores cached
- **Memory Efficiency**: Zero-copy operations where possible
- **Parallel Processing**: Multi-threaded graph analysis
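
As an illustration of lazy evaluation combined with caching, the sketch below computes min/max normalization statistics only on first use and reuses them afterwards. The `LazyNormalizer` type is hypothetical and min-max scaling is an assumed normalization scheme; the crate may normalize differently.

```rust
use std::cell::OnceCell;

/// Min-max normalizer whose statistics are computed once, on first use.
/// Hypothetical type for illustration.
struct LazyNormalizer {
    raw: Vec<f64>,
    stats: OnceCell<(f64, f64)>,
}

impl LazyNormalizer {
    fn new(raw: Vec<f64>) -> Self {
        Self { raw, stats: OnceCell::new() }
    }

    /// Returns the cached (min, max), scanning `raw` only on the first call.
    fn stats(&self) -> (f64, f64) {
        *self.stats.get_or_init(|| {
            let min = self.raw.iter().copied().fold(f64::INFINITY, f64::min);
            let max = self.raw.iter().copied().fold(f64::NEG_INFINITY, f64::max);
            (min, max)
        })
    }

    /// Maps `value` into [0, 1] relative to the observed range.
    /// Returns 0.0 when the range is empty or degenerate.
    fn normalize(&self, value: f64) -> f64 {
        let (min, max) = self.stats();
        if max > min {
            (value - min) / (max - min)
        } else {
            0.0
        }
    }
}
```

`OnceCell` is single-threaded; a multi-threaded variant would use `std::sync::OnceLock` in the same way.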

### Extensibility
- **Trait-Based Design**: `ScanResult` trait for flexible input types
- **Feature Flags**: V1/V2 capabilities with graceful degradation  
- **Plugin Architecture**: Easy addition of new scoring components
- **Language Extensibility**: Simple addition of new import parsers

## 🔄 Integration with Scribe Core

### Trait Implementation
```rust
impl ScanResult for YourFileType {
    fn path(&self) -> &str { &self.path }
    fn is_docs(&self) -> bool { self.is_documentation }
    fn imports(&self) -> Option<&[String]> { self.imports.as_deref() }
    // ... other required methods
}
```

### Error Handling
- **Comprehensive Error Types**: Using `scribe_core::Result`
- **Graceful Degradation**: Partial failures don't stop processing
- **Context Preservation**: Rich error context for debugging
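
Graceful degradation can be pictured as collecting successes and failures separately instead of aborting on the first error. The sketch below uses a hypothetical generic helper, `score_all`, with a caller-supplied scoring closure; it uses plain `Result` rather than `scribe_core::Result` so that it stays self-contained.

```rust
/// Scores every file, keeping successes and failures apart (illustrative helper).
/// Each failure keeps the index of the file that caused it as context.
fn score_all<T, S, E>(
    files: &[T],
    score_one: impl Fn(&T) -> Result<S, E>,
) -> (Vec<S>, Vec<(usize, E)>) {
    let mut scores = Vec::new();
    let mut failures = Vec::new();
    for (index, file) in files.iter().enumerate() {
        match score_one(file) {
            Ok(score) => scores.push(score),
            // A failure is recorded and processing continues with the next file.
            Err(err) => failures.push((index, err)),
        }
    }
    (scores, failures)
}
```

The caller can then decide whether the collected failures are tolerable or should be reported.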

## 📈 Performance Validation

The implementation has been benchmarked to validate performance targets:

- **Latency**: Sub-millisecond individual file scoring ✓
- **Throughput**: >10,000 files/second batch processing ✓
- **Memory**: Linear memory usage with repository size ✓
- **Scalability**: Efficient handling of repositories with 10,000+ files ✓

## 🔮 Future Enhancements

### Planned Features
- **Machine Learning Integration**: Learned scoring weights
- **Language-Specific Extensions**: Deeper syntax analysis
- **Distributed Processing**: Multi-node graph analysis
- **Real-time Updates**: Incremental scoring on file changes

### Research Directions
- **Advanced Centrality Metrics**: Betweenness, eigenvector centrality
- **Temporal Analysis**: Code evolution patterns
- **Collaborative Filtering**: Developer behavior signals
- **Semantic Analysis**: Code similarity and clustering

## 📝 License

MIT OR Apache-2.0