# scribe-scaling
High-performance scaling optimizations for large repository analysis in Scribe.
## Overview
`scribe-scaling` is the performance and optimization layer that lets Scribe handle repositories of any size, from small projects to enterprise codebases with 100k+ files. It implements a streaming architecture, intelligent caching, parallel processing, and context positioning to maintain sub-second performance on small repositories and sub-30-second analysis on massive ones.
## Key Features
### Progressive Loading Architecture
- **Metadata-first streaming**: Load file metadata before content to enable early filtering
- **Lazy content loading**: Only read file content when selected for inclusion
- **Backpressure management**: Adaptive throttling prevents memory exhaustion
- **Incremental processing**: Process files as they're discovered, not after full scan
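A minimal sketch of the metadata-first contract, with illustrative types rather than the crate's actual API: metadata is cheap and always available, while content is a separate, fallible, on-demand load.
```rust
use std::path::PathBuf;

// Illustrative shape only; the crate's real types may differ.
struct FileMeta {
    path: PathBuf,
    size_bytes: u64,
    score: f64, // computed from path/size/extension alone, before any read
}

impl FileMeta {
    /// Content is read only after a file survives metadata-level filtering.
    async fn load_content(&self) -> std::io::Result<String> {
        tokio::fs::read_to_string(&self.path).await
    }
}
```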
### Intelligent Caching
- **Persistent cache**: Disk-based cache with Blake3 content signatures
- **Signature-based invalidation**: Only re-process changed files
- **Multi-level caching**: In-memory LRU + disk persistence
- **Compression**: Cache entries gzip-compressed on disk via flate2
- **Cache warming**: Pre-populate cache for common operations
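A sketch of how the signature and compression layers combine, written directly against the `blake3` and `flate2` crates; this is illustrative, not the crate's exact implementation:
```rust
use std::io::Write;

use flate2::{write::GzEncoder, Compression};

/// Key a cache entry by the Blake3 hash of the file's bytes: an unchanged
/// file always hashes to the same key, so it never needs re-processing.
fn cache_key(contents: &[u8]) -> String {
    blake3::hash(contents).to_hex().to_string()
}

/// Gzip-compress a serialized entry before it is written to disk.
fn compress_entry(entry: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::new(6));
    encoder.write_all(entry)?;
    encoder.finish()
}
```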
### Parallel Processing
- **Multi-core scanning**: Parallel directory traversal using Rayon
- **Concurrent analysis**: Parallel AST parsing and scoring
- **Async I/O**: Non-blocking file operations with tokio
- **Work stealing**: Dynamic load balancing across threads
- **Adaptive parallelism**: Scale thread count based on system resources
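In miniature, the Rayon side looks like the sketch below (the scoring logic is a stand-in): a bounded pool is built once, and `par_iter` hands files to idle threads via work stealing.
```rust
use rayon::prelude::*;

/// Build a pool capped at the requested size, defaulting to the core count.
fn build_pool(max_threads: Option<usize>) -> rayon::ThreadPool {
    rayon::ThreadPoolBuilder::new()
        .num_threads(max_threads.unwrap_or_else(num_cpus::get))
        .build()
        .expect("thread pool construction failed")
}

fn score_all(pool: &rayon::ThreadPool, files: &[std::path::PathBuf]) -> Vec<f64> {
    pool.install(|| {
        files
            .par_iter()
            .map(|p| p.to_string_lossy().len() as f64) // stand-in for real scoring
            .collect()
    })
}
```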
### Context Positioning Optimization
- **Transformer-aware ordering**: Exploits LLM attention patterns
- **3-tier positioning**: HEAD (20%) → MIDDLE (60%) → TAIL (20%)
- **Query-aware relevance**: Surfaces most relevant files at HEAD
- **Centrality-based placement**: High-centrality files at HEAD and TAIL
- **Relatedness grouping**: Clusters related files together
### Adaptive Thresholds
- **Repository-aware configuration**: Auto-tune based on repo size
- **Memory-aware limits**: Adjust based on available system memory
- **Performance targets**: Dynamic optimization for time/memory tradeoffs
- **Quality preservation**: Maintain information density while scaling
## Architecture
```
Repository → Streaming Scan → Parallel Analysis → Selection + Positioning → Caching → Output
    ↓              ↓                  ↓                      ↓                 ↓        ↓
 Metadata     Progressive      Multi-threaded        3-Tier Context        Blake3   Optimized
  First         Loading         AST/Scoring         HEAD/MIDDLE/TAIL      Signature   Bundle
                   ↓                  ↓                                        ↓
              Backpressure      Work Stealing                              LRU + Disk
                Control         Load Balance                                 Cache
```
### Core Components
#### `ScalingSelector`
Main entry point for scaled repository processing:
- Integrates all optimization layers
- Configurable performance/quality tradeoffs
- Automatic adaptation based on repository characteristics
- Progress reporting and cancellation support
#### `StreamingScanner`
Progressive file system traversal:
- Yields file metadata as discovered
- Supports filtering during traversal
- Adaptive buffering based on memory pressure
- Early termination support
#### `ParallelAnalyzer`
Multi-threaded file analysis:
- Rayon-based thread pool
- Dynamic work distribution
- Priority-based scheduling (high-importance files first)
- Error isolation (one failure doesn't stop processing)
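One way to picture the error-isolation behavior, with placeholder `Analysis` and `analyze_file` stand-ins: each file's result is independent, and a failure is logged and skipped rather than aborting the batch.
```rust
use rayon::prelude::*;

struct Analysis; // placeholder result type

fn analyze_file(path: &std::path::Path) -> std::io::Result<Analysis> {
    std::fs::read(path)?; // stand-in for parsing and scoring
    Ok(Analysis)
}

fn analyze_batch(files: &[std::path::PathBuf]) -> Vec<Analysis> {
    files
        .par_iter()
        .filter_map(|path| match analyze_file(path) {
            Ok(analysis) => Some(analysis),
            Err(err) => {
                eprintln!("skipping {}: {err}", path.display());
                None // one failure doesn't stop the rest of the batch
            }
        })
        .collect()
}
```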
#### `ContextPositioner`
Optimizes file order for LLM recall:
- **HEAD**: Query-relevant high-centrality files (transformers attend best here)
- **MIDDLE**: Supporting files with lower centrality (background context)
- **TAIL**: Core functionality high-centrality files (attention strengthens again)
- Centrality calculation using PageRank
- Query relevance scoring with term matching
- Relatedness grouping by imports and structure
#### `CacheManager`
Persistent caching system:
- Blake3 content hashing for change detection
- LRU eviction for memory cache
- Compressed disk storage
- Atomic cache updates
- Cache statistics and monitoring
#### `AdaptiveConfig`
Auto-tuning configuration:
- Detects repository size and complexity
- Queries available system memory
- Adjusts thread counts, buffer sizes, cache limits
- Scales thresholds based on detected patterns
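The tuning logic is roughly of this shape; the thresholds and field names below are illustrative assumptions, not the crate's actual defaults:
```rust
struct Tuning {
    threads: usize,
    scan_buffer_files: usize,
    cache_size_mb: usize,
}

fn tune(file_count: usize, available_mem_mb: usize) -> Tuning {
    Tuning {
        // More files justify more threads, bounded by the core count.
        threads: num_cpus::get().min((file_count / 10_000).max(1)),
        // Bigger repositories stream through a larger discovery buffer.
        scan_buffer_files: (file_count / 100).clamp(64, 4096),
        // Never budget more cache than a quarter of available memory.
        cache_size_mb: (available_mem_mb / 4).min(2000),
    }
}
```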
## Usage
### Basic Scaled Selection
```rust
use scribe_scaling::{PerformanceTarget, ScalingConfig, ScalingSelector};

let config = ScalingConfig {
    performance_target: PerformanceTarget::Balanced,
    enable_caching: true,
    enable_positioning: true,
    ..Default::default()
};

let selector = ScalingSelector::new(config);
let result = selector.select_and_process(repo_path).await?;

println!(
    "Analyzed {} files in {:?}",
    result.files_processed, result.elapsed_time
);
println!("Cache hit rate: {:.1}%", result.cache_stats.hit_rate * 100.0);
```
### Query-Aware Context Positioning
```rust
use scribe_scaling::{ContextPositioningConfig, ScalingConfig, ScalingSelector};

let mut config = ScalingConfig::default();
config.positioning = ContextPositioningConfig {
    enable_positioning: true,
    head_percentage: 0.20, // 20% high-priority files at HEAD
    tail_percentage: 0.20, // 20% core files at TAIL
    centrality_weight: 0.5,
    query_relevance_weight: 0.3,
    relatedness_weight: 0.2,
};

let selector = ScalingSelector::new(config);
let result = selector
    .select_and_process_with_query(
        repo_path,
        Some("authentication middleware"), // query hint
    )
    .await?;

if result.has_context_positioning() {
    let ordered = result.get_optimally_ordered_files();
    println!("HEAD: {} files", ordered.head.len());
    println!("MIDDLE: {} files", ordered.middle.len());
    println!("TAIL: {} files", ordered.tail.len());
}
```
### Streaming with Progress
```rust
use futures::StreamExt;
use indicatif::{ProgressBar, ProgressStyle};
use scribe_scaling::{ScanConfig, StreamingScanner};

let scanner = StreamingScanner::new(ScanConfig::default());

let progress = ProgressBar::new_spinner();
progress.set_style(
    ProgressStyle::default_spinner()
        .template("{spinner} [{elapsed}] {msg} ({pos} files)")
        .expect("valid progress template"),
);

let threshold = 0.5; // minimum metadata score worth a content read
let mut file_stream = scanner.scan_streaming(repo_path).await?;
while let Some(file) = file_stream.next().await {
    progress.set_message(format!("Scanning {}", file.path.display()));
    progress.inc(1);

    // Process file metadata immediately; load and analyze content only
    // for high-scoring files.
    if file.score > threshold {
        let content = file.load_content().await?;
        analyze(content).await?; // `analyze` is caller-provided
    }
}
progress.finish_with_message("Scan complete");
```
### Custom Performance Targets
```rust
use scribe_scaling::{PerformanceTarget, ScalingConfig};

// Fast mode: prioritize speed over completeness
let fast_config = ScalingConfig {
    performance_target: PerformanceTarget::Speed,
    parallel_threads: Some(num_cpus::get()),
    cache_size_mb: 500,
    max_file_size: 500_000, // skip large files
    enable_positioning: false,
    ..Default::default()
};

// Quality mode: prioritize completeness and quality
let quality_config = ScalingConfig {
    performance_target: PerformanceTarget::Quality,
    parallel_threads: Some(4), // fewer threads, more thorough analysis
    cache_size_mb: 2000,
    max_file_size: 5_000_000,
    enable_positioning: true,
    ..Default::default()
};

// Balanced mode (default)
let balanced_config = ScalingConfig::default();
```
### Cache Management
```rust
use std::path::PathBuf;

use scribe_scaling::cache::{CacheConfig, CacheManager};

let cache_config = CacheConfig {
    cache_dir: PathBuf::from(".scribe-cache"),
    max_size_mb: 1000,
    compression_level: 6,
    ttl_hours: 24,
};
let cache = CacheManager::new(cache_config)?;

// Check cache status
let stats = cache.stats();
println!("Cache entries: {}", stats.entry_count);
println!("Total size: {} MB", stats.size_mb);
println!("Hit rate: {:.1}%", stats.hit_rate * 100.0);

// Evict expired entries
cache.evict_expired()?;

// Clear the entire cache
cache.clear()?;
```
## Performance Targets
### Repository Size Profiles
| Profile | Files | Target time | Peak memory | Strategy |
|---------|-------|-------------|-------------|----------|
| **Small** | ≤1k | <1s | <50MB | In-memory, minimal caching |
| **Medium** | 1k-10k | <5s | <200MB | Parallel + caching |
| **Large** | 10k-100k | <15s | <1GB | Streaming + aggressive caching |
| **Enterprise** | 100k+ | <30s | <2GB | Full optimization suite |
### Achieved Performance
Based on internal benchmarks:
- **Linux kernel** (70k files): ~12s analysis time, ~800MB memory
- **Chromium** (300k files): ~28s analysis time, ~1.8GB memory
- **Small Rust project** (500 files): ~450ms analysis time, ~40MB memory
## Context Positioning
### Why It Matters
Transformer models don't attend equally to all tokens:
- **Strong attention at HEAD**: First 20% gets highest attention
- **Weak attention in MIDDLE**: Middle 60% gets less focus
- **Strong attention at TAIL**: Final 20% gets second-highest attention
**Strategy**: Place most important files where LLMs attend best.
### 3-Tier System
#### HEAD (20%)
- Query-relevant files with high centrality
- Entry points matching query terms
- Critical dependencies for query context
- **Goal**: Give LLM immediate relevant context
#### MIDDLE (60%)
- Supporting files with lower centrality
- Utility code and helpers
- Test files (if included)
- **Goal**: Provide background without overwhelming attention
#### TAIL (20%)
- Core high-centrality files (`lib.rs`, `__init__.py`)
- Foundational configuration and types
- Architectural anchors
- **Goal**: Ground LLM understanding with core concepts
### Positioning Algorithm
```
1. Compute PageRank centrality for all files
2. Score query relevance (if query provided)
3. Combined score = centrality_weight * centrality + query_weight * relevance
4. Sort by combined score
5. Top 20% → HEAD
6. Middle 60% → MIDDLE
7. Bottom 20% with highest centrality → TAIL
8. Group related files within each tier
```
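A compact sketch of steps 4-7 in Rust (types simplified; the real `ContextPositioner` also performs the relatedness grouping of step 8):
```rust
use std::path::PathBuf;

struct Scored {
    path: PathBuf,
    centrality: f64, // PageRank score
    relevance: f64,  // query match score; 0.0 when no query was given
}

fn position(
    mut files: Vec<Scored>,
    w_centrality: f64,
    w_query: f64,
) -> (Vec<Scored>, Vec<Scored>, Vec<Scored>) {
    let combined = |f: &Scored| w_centrality * f.centrality + w_query * f.relevance;
    // Highest combined score first.
    files.sort_by(|a, b| combined(b).total_cmp(&combined(a)));

    let n = files.len();
    let head_len = n / 5; // top 20% → HEAD
    let middle_len = 3 * n / 5; // next 60% → MIDDLE

    let mut tail = files.split_off(head_len + middle_len);
    // Within TAIL, push the highest-centrality files to the very end of
    // the context window, where attention strengthens again.
    tail.sort_by(|a, b| a.centrality.total_cmp(&b.centrality));

    let middle = files.split_off(head_len);
    (files, middle, tail) // (head, middle, tail)
}
```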
## Configuration
### `ScalingConfig`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `performance_target` | `PerformanceTarget` | `Balanced` | Speed, Balanced, or Quality |
| `parallel_threads` | `Option<usize>` | CPU count | Thread pool size |
| `cache_size_mb` | `usize` | `1000` | Maximum cache size |
| `enable_caching` | `bool` | `true` | Enable persistent cache |
| `enable_positioning` | `bool` | `true` | Enable context positioning |
| `max_file_size` | `usize` | `1_000_000` | Skip files larger than this |
### `ContextPositioningConfig`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enable_positioning` | `bool` | `true` | Enable/disable positioning |
| `head_percentage` | `f64` | `0.20` | Percentage for HEAD section |
| `tail_percentage` | `f64` | `0.20` | Percentage for TAIL section |
| `centrality_weight` | `f64` | `0.4` | Weight for centrality scoring |
| `query_relevance_weight` | `f64` | `0.3` | Weight for query matching |
| `relatedness_weight` | `f64` | `0.3` | Weight for file grouping |
## Optimizations
### Memory Management
- **Streaming processing**: Never load entire repository into memory
- **Lazy content loading**: Load file content only when needed
- **Compressed caching**: Reduce cache memory footprint
- **Incremental GC**: Release memory as processing progresses
### I/O Optimization
- **Async file operations**: Non-blocking reads with tokio
- **Read-ahead buffering**: Pre-fetch likely-needed files
- **Memory-mapped files**: For very large files
- **Batched writes**: Coalesce cache updates
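The memory-mapped path might look like this sketch using the `memmap2` crate (the size cutoff is an illustrative assumption): mapping lets the OS page bytes in on demand, and Blake3's SIMD kernels hash the mapped slice directly.
```rust
use std::fs::File;
use std::path::Path;

use memmap2::Mmap;

/// Hash a file, memory-mapping it when it is large. The 8 MiB cutoff
/// is an illustrative value, not the crate's actual threshold.
fn signature(path: &Path) -> std::io::Result<blake3::Hash> {
    let file = File::open(path)?;
    if file.metadata()?.len() > 8 * 1024 * 1024 {
        // Safety: the mapping is valid as long as the file is not
        // truncated or resized while we read it.
        let mmap = unsafe { Mmap::map(&file)? };
        Ok(blake3::hash(&mmap))
    } else {
        Ok(blake3::hash(&std::fs::read(path)?))
    }
}
```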
### CPU Optimization
- **Work stealing**: Dynamic thread load balancing
- **SIMD**: Use SIMD for hash computation (Blake3)
- **Compiled patterns**: Pre-compile regex and globs
- **Lazy evaluation**: Skip unnecessary computations
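"Compiled patterns" amounts to the standard build-once idiom shown below (the pattern itself is an illustrative example), using `std::sync::LazyLock` (Rust 1.80+) with the `regex` crate:
```rust
use std::sync::LazyLock;

use regex::Regex;

// Compiled once on first use, then shared by every thread.
static ENTRY_POINT: LazyLock<Regex> =
    LazyLock::new(|| Regex::new(r"\bfn\s+main\b").expect("valid regex"));

fn looks_like_entry_point(source: &str) -> bool {
    ENTRY_POINT.is_match(source)
}
```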
## Integration
`scribe-scaling` is used by:
- **CLI**: Top-level orchestration of repository analysis
- **scribe-webservice**: Powers web API for large repositories
- **scribe-selection**: Provides scalable selection infrastructure
- **scribe-graph**: Scales PageRank to large graphs
## See Also
- `docs/context-positioning.md`: Detailed context positioning documentation
- `scribe-selection`: File selection algorithms that scaling optimizes
- `scribe-graph`: PageRank computation used by positioning
- `../../WHY_SCRIBE.md`: Philosophy on performance and intelligence