Scribe - Advanced Code Analysis Library
Scribe is a comprehensive Rust library for code analysis, repository exploration, and intelligent file processing. It provides powerful tools for understanding codebases through heuristic scoring, graph analysis, and AI-powered insights.
Features
- Intelligent File Analysis: Multi-dimensional heuristic scoring system for identifying important files
- Dependency Graph Analysis: PageRank centrality computation for understanding code relationships
- High-Performance Scanning: Parallel file system traversal with git integration
- Advanced Pattern Matching: Flexible glob and gitignore pattern support with preset configurations
- Smart Code Selection: Context-aware code bundling and relevance scoring
- Extensible Architecture: Plugin system for custom analyzers and scorers
- Modular Design: Use only the features you need with optional components
Installation
Add this to your Cargo.toml:
[dependencies]
scribe = "0.1.0"
Feature Flags
Scribe uses feature flags to allow selective compilation:
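# Full installation (default)
scribe = "0.1.0"
# Minimal installation
scribe = { version = "0.1.0", default-features = false, features = ["core"] }
# Fast file operations only
scribe = { version = "0.1.0", default-features = false, features = ["fast"] }
# Analysis without graph features
scribe = { version = "0.1.0", default-features = false, features = ["core", "analysis", "scanner"] }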
Available Features
| Feature | Description | Dependencies |
|---|---|---|
| default | All features enabled | core, analysis, graph, scanner, patterns, selection |
| core | Essential types and utilities | None |
| analysis | Heuristic scoring and metrics | core |
| graph | PageRank centrality analysis | core, analysis |
| scanner | File system scanning | core |
| patterns | Pattern matching (glob, gitignore) | core |
| selection | Code selection and bundling | core, analysis, graph |
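To make the dependency column concrete, here is a small sketch of how a requested feature list expands into the full set of enabled features. The feature names and their dependencies are copied from the table above; `feature_table` and `resolve_features` are hypothetical helpers written for this illustration and are not part of Scribe's API (Cargo performs this resolution itself).

```rust
use std::collections::{BTreeSet, HashMap};

/// Direct dependencies of each feature, copied from the table above.
/// (Hypothetical helper for illustration only.)
fn feature_table() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("core", vec![]),
        ("analysis", vec!["core"]),
        ("graph", vec!["core", "analysis"]),
        ("scanner", vec!["core"]),
        ("patterns", vec!["core"]),
        ("selection", vec!["core", "analysis", "graph"]),
        (
            "default",
            vec!["core", "analysis", "graph", "scanner", "patterns", "selection"],
        ),
    ])
}

/// Expand the requested features into everything they pull in transitively.
/// Unknown names are kept and treated as having no dependencies.
fn resolve_features(requested: &[&str]) -> BTreeSet<String> {
    let table = feature_table();
    let mut enabled = BTreeSet::new();
    let mut stack: Vec<String> = requested.iter().map(|s| s.to_string()).collect();
    while let Some(name) = stack.pop() {
        if enabled.contains(&name) {
            continue; // already expanded
        }
        if let Some(deps) = table.get(name.as_str()) {
            stack.extend(deps.iter().map(|d| d.to_string()));
        }
        enabled.insert(name);
    }
    enabled
}

fn main() {
    // Asking only for "selection" also enables core, analysis, and graph.
    let enabled = resolve_features(&["selection"]);
    println!("{:?}", enabled); // {"analysis", "core", "graph", "selection"}
}
```

In practice this means that disabling default features and requesting only `selection` still compiles the graph and analysis components.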
Feature Groups
| Group | Features | Use Case |
|---|---|---|
| minimal | core | Basic types and utilities only |
| fast | core, scanner, patterns | Quick file operations |
| comprehensive | All features | Complete analysis capabilities |
Quick Start
Basic Repository Analysis
use *;
use Path;
async
Selective Feature Usage
// Using only core and scanner features
use ;
use ;
async
Pattern Matching
use presets;
async
Graph Analysis
use PageRankAnalysis;
async
CLI Covering Sets
Scribe's CLI can compute minimal covering sets:
- --covering-set <name>: target a function/class/module by name.
- --covering-set-diff: build a covering set for the current git diff (uses the dependency graph to include touched files plus related dependents/dependencies).
- --diff-against <ref>: diff against a specific ref (defaults to HEAD).
- Shared filters: --include-dependents, --max-depth, --max-files.
- Output helper: add --line-numbers to prefix every line in the bundled files with its line number, making it easy for review agents to comment by line number.
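The idea behind a covering set can be sketched as a bounded breadth-first walk over the dependency graph. The code below is an illustration under stated assumptions, not Scribe's implementation: the `DepGraph` type, the `CoverOptions` struct, and the `covering_set` function are hypothetical names, and the three option fields only mirror the meaning suggested by the --include-dependents, --max-depth, and --max-files flags.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical dependency graph: each file maps to the files it depends on.
type DepGraph = BTreeMap<String, Vec<String>>;

/// Hypothetical options mirroring the shared CLI filters.
struct CoverOptions {
    include_dependents: bool,
    max_depth: usize,
    max_files: usize,
}

/// Breadth-first walk from the seed files, returning files in visit order.
fn covering_set(graph: &DepGraph, seeds: &[&str], opts: &CoverOptions) -> Vec<String> {
    // Build reverse edges so "who depends on this file?" is a lookup.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (file, deps) in graph {
        for dep in deps {
            dependents.entry(dep.as_str()).or_default().push(file.as_str());
        }
    }

    let mut seen: BTreeSet<String> = BTreeSet::new();
    let mut order: Vec<String> = Vec::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    for seed in seeds {
        if seen.insert(seed.to_string()) {
            queue.push_back((seed.to_string(), 0));
        }
    }

    while let Some((file, depth)) = queue.pop_front() {
        if order.len() >= opts.max_files {
            break; // file budget exhausted
        }
        order.push(file.clone());
        if depth >= opts.max_depth {
            continue; // do not expand past the depth limit
        }
        let mut neighbors: Vec<&str> = Vec::new();
        if let Some(deps) = graph.get(&file) {
            neighbors.extend(deps.iter().map(|d| d.as_str()));
        }
        if opts.include_dependents {
            if let Some(users) = dependents.get(file.as_str()) {
                neighbors.extend(users.iter().copied());
            }
        }
        for next in neighbors {
            if seen.insert(next.to_string()) {
                queue.push_back((next.to_string(), depth + 1));
            }
        }
    }
    order
}

/// Tiny made-up graph used for the demonstration below.
fn sample_graph() -> DepGraph {
    let mut g = DepGraph::new();
    g.insert("src/api.rs".into(), vec!["src/db.rs".into(), "src/util.rs".into()]);
    g.insert("src/db.rs".into(), vec!["src/util.rs".into()]);
    g.insert("src/cli.rs".into(), vec!["src/api.rs".into()]);
    g.insert("src/util.rs".into(), vec![]);
    g
}

fn main() {
    let opts = CoverOptions { include_dependents: true, max_depth: 1, max_files: 10 };
    // Pretend the diff touched only src/api.rs.
    for file in covering_set(&sample_graph(), &["src/api.rs"], &opts) {
        println!("{file}");
    }
}
```

With dependents included, the touched file's direct dependencies and the file that imports it all land in the set; tightening the depth or file limits trims the bundle.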
Example:
Architecture
Scribe is built with a modular architecture where each crate provides specific functionality:
┌─────────────────────────────────────────────────────────────────────┐
│                               scribe                                │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ scribe-core     │ │ scribe-scanner  │ │ scribe-patterns         │ │
│ │ (types,         │ │ (file system    │ │ (glob, gitignore,       │ │
│ │ traits,         │ │ traversal,      │ │ pattern matching)       │ │
│ │ utilities)      │ │ git support)    │ │                         │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ scribe-analysis │ │ scribe-graph    │ │ scribe-selection        │ │
│ │ (heuristic      │ │ (PageRank       │ │ (intelligent bundling,  │ │
│ │ scoring,        │ │ centrality,     │ │ context extraction,     │ │
│ │ code metrics)   │ │ dependency      │ │ relevance scoring)      │ │
│ │                 │ │ analysis)       │ │                         │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Component Overview
- scribe-core: Foundation types, traits, configuration, and utilities
- scribe-scanner: High-performance file system traversal with git integration
- scribe-patterns: Flexible pattern matching with glob and gitignore support
- scribe-analysis: Heuristic scoring algorithms and code metrics
- scribe-graph: PageRank centrality and dependency graph analysis
- scribe-selection: Intelligent code selection and context extraction
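For readers unfamiliar with PageRank centrality, the following is a minimal power-iteration sketch that stops once the ranks stop changing. It is a textbook formulation, not the scribe-graph implementation: the `pagerank` function, its parameters, and the adjacency-list input format are assumptions made for this example, and every edge target is assumed to be a valid node index.

```rust
/// Power-iteration PageRank. `edges[i]` lists the nodes that node `i` links to
/// (for code, think "file i imports these files").
/// Hypothetical signature; assumes every target index is < edges.len().
fn pagerank(edges: &[Vec<usize>], damping: f64, tolerance: f64, max_iters: usize) -> Vec<f64> {
    let n = edges.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..max_iters {
        // Every node receives the "teleport" share first.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        let mut dangling = 0.0;
        for (node, targets) in edges.iter().enumerate() {
            if targets.is_empty() {
                // Nodes without outgoing edges spread their rank evenly.
                dangling += rank[node];
            } else {
                let share = damping * rank[node] / targets.len() as f64;
                for &t in targets {
                    next[t] += share;
                }
            }
        }
        let dangling_share = damping * dangling / n as f64;
        for value in next.iter_mut() {
            *value += dangling_share;
        }
        // Convergence check: total absolute change between iterations.
        let delta: f64 = rank.iter().zip(&next).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tolerance {
            break;
        }
    }
    rank
}

fn main() {
    // Files 0 and 1 both import file 2; file 2 imports nothing.
    let edges = vec![vec![2], vec![2], vec![]];
    let ranks = pagerank(&edges, 0.85, 1e-9, 100);
    for (file, score) in ranks.iter().enumerate() {
        println!("file {file}: {score:.4}");
    }
}
```

In this toy graph the shared dependency (file 2) ends up with the highest score, which is the intuition behind using centrality to surface important files.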
Examples
The repository includes several examples demonstrating different usage patterns:
Run Examples
# Full analysis example
# Minimal features example
Available Examples
- basic_usage.rs: Complete repository analysis with all features
- selective_features.rs: Minimal usage with core and scanner only
Performance
Scribe is designed for high performance:
- Memory Efficient: Streaming file processing with configurable memory limits
- Parallel Processing: Multi-threaded scanning and analysis using Rayon
- Git Integration: Fast file discovery using git ls-files when available
- Optimized Algorithms: Research-grade PageRank implementation with convergence detection
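The git integration point can be illustrated with a small standard-library sketch: ask `git ls-files` for tracked files and fall back to a plain directory walk when git is unavailable or the directory is not a repository. The function names (`parse_ls_files`, `git_tracked_files`, `walk_dir`, `discover_files`) are hypothetical and do not reflect scribe-scanner's API. The sketch also assumes file names without special characters, since `git ls-files` quotes such paths unless `-z` is used.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Turn the newline-separated output of `git ls-files` into paths.
fn parse_ls_files(stdout: &str) -> Vec<PathBuf> {
    stdout
        .lines()
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Ask git for tracked files. `None` means git is missing or `root` is not a repository.
fn git_tracked_files(root: &Path) -> Option<Vec<PathBuf>> {
    let output = Command::new("git")
        .arg("-C")
        .arg(root)
        .arg("ls-files")
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    Some(parse_ls_files(&String::from_utf8_lossy(&output.stdout)))
}

/// Fallback: recursive walk that skips `.git` and records paths relative to `root`.
fn walk_dir(root: &Path, dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let file_type = entry.file_type()?; // does not follow symlinks
        if file_type.is_dir() {
            if entry.file_name() != ".git" {
                walk_dir(root, &path, out)?;
            }
        } else if file_type.is_file() {
            let relative = path.strip_prefix(root).unwrap_or(path.as_path());
            out.push(relative.to_path_buf());
        }
    }
    Ok(())
}

/// Prefer git's index; otherwise walk the directory tree.
fn discover_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    if let Some(files) = git_tracked_files(root) {
        return Ok(files);
    }
    let mut files = Vec::new();
    walk_dir(root, root, &mut files)?;
    files.sort();
    Ok(files)
}

fn main() -> io::Result<()> {
    let files = discover_files(Path::new("."))?;
    println!("discovered {} files", files.len());
    Ok(())
}
```

Reading git's index avoids touching ignored directories such as build output, which is typically where a naive walk spends most of its time.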
Benchmarks
Run benchmarks to see performance characteristics:
Performance characteristics on typical repositories:
- Small repos (< 1k files): ~10-50ms analysis time
- Medium repos (1k-10k files): ~100ms-1s analysis time
- Large repos (> 10k files): ~1-10s analysis time
- Memory usage: ~2MB per 1000 files for basic analysis
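As a rough worked example of the memory figure above, the sketch below extrapolates the ~2MB per 1000 files estimate to other repository sizes. It assumes memory grows linearly with file count, which the figures above do not guarantee; `estimated_memory_mb` is a hypothetical helper, not a Scribe function.

```rust
/// Back-of-the-envelope memory estimate in megabytes, assuming linear scaling
/// from the "~2MB per 1000 files" figure for basic analysis.
fn estimated_memory_mb(file_count: u64) -> f64 {
    const MB_PER_1000_FILES: f64 = 2.0;
    file_count as f64 / 1000.0 * MB_PER_1000_FILES
}

fn main() {
    for files in [500_u64, 5_000, 50_000] {
        println!("{files} files -> ~{:.0} MB", estimated_memory_mb(files));
    }
}
```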
Development
Building
# Build all features
# Build with specific features
# Build for release
Testing
# Run all tests
# Test specific features
# Run tests with output
Documentation
# Generate documentation
# Generate documentation for all features
Related Projects
- [scribe-cli]: Command-line interface for Scribe
- [scribe-vscode]: Visual Studio Code extension
- [scribe-jupyter]: Jupyter notebook integration
License
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Contribution Guidelines
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Support
- Documentation: docs.rs/scribe
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Acknowledgments
- Built with Rust
- Uses tree-sitter for parsing
- Inspired by research in code analysis and repository mining
- Community feedback and contributions