scribe-scaling
High-performance scaling optimizations for large repository analysis in Scribe.
Overview
scribe-scaling is the performance and optimization layer that enables Scribe to handle repositories of any size, from small projects to enterprise codebases with 100k+ files. It combines a streaming architecture, intelligent caching, parallel processing, and context positioning to keep analysis under one second on small repositories and under 30 seconds on massive ones.
Key Features
Progressive Loading Architecture
- Metadata-first streaming: Load file metadata before content to enable early filtering
- Lazy content loading: Only read file content when selected for inclusion
- Backpressure management: Adaptive throttling prevents memory exhaustion
- Incremental processing: Process files as they're discovered, not after full scan
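A minimal sketch of the metadata-first flow described above, using only the standard library (not the crate's actual API): metadata is collected and filtered before any content is read, and content is loaded lazily only for files that pass the filter.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stage 1: collect lightweight metadata only; no file content is read here.
fn scan_metadata(root: &Path, out: &mut Vec<(PathBuf, u64)>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_dir() {
            scan_metadata(&entry.path(), out)?;
        } else {
            out.push((entry.path(), meta.len()));
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut files = Vec::new();
    scan_metadata(Path::new("."), &mut files)?;

    // Stage 2: early filtering on metadata alone (here: drop files over 1 MB).
    files.retain(|(_, size)| *size <= 1_000_000);

    // Stage 3: lazy content loading, only for files that survived the filter.
    for (path, _) in &files {
        let content = fs::read(path)?;
        println!("{}: {} bytes loaded", path.display(), content.len());
    }
    Ok(())
}
```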
Intelligent Caching
- Persistent cache: Disk-based cache with Blake3 content signatures
- Signature-based invalidation: Only re-process changed files
- Multi-level caching: In-memory LRU + disk persistence
- Compression: Compressed cache entries using flate2 (gzip)
- Cache warming: Pre-populate cache for common operations
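A minimal sketch of signature-based invalidation using the blake3 crate; the in-memory map stands in for the real LRU and disk layers, and the analysis step is a placeholder.

```rust
use std::collections::HashMap;

/// Entries are keyed by path and reused only while the file's Blake3
/// signature matches the one recorded when the entry was produced.
struct AnalysisCache {
    entries: HashMap<String, (blake3::Hash, String)>, // path -> (signature, cached result)
}

impl AnalysisCache {
    fn get_or_analyze(&mut self, path: &str, content: &[u8]) -> String {
        let signature = blake3::hash(content);
        if let Some((cached_sig, cached)) = self.entries.get(path) {
            if *cached_sig == signature {
                return cached.clone(); // unchanged file: skip re-analysis
            }
        }
        let fresh = format!("analysis of {} ({} bytes)", path, content.len()); // placeholder work
        self.entries.insert(path.to_owned(), (signature, fresh.clone()));
        fresh
    }
}
```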
Parallel Processing
- Multi-core scanning: Parallel directory traversal using Rayon
- Concurrent analysis: Parallel AST parsing and scoring
- Async I/O: Non-blocking file operations with tokio
- Work stealing: Dynamic load balancing across threads
- Adaptive parallelism: Scale thread count based on system resources
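A minimal sketch of the parallel scoring stage using Rayon, whose work-stealing scheduler handles load balancing automatically; the scoring heuristic is a placeholder, not the crate's actual one.

```rust
use rayon::prelude::*;

struct FileInfo {
    path: String,
    size: u64,
}

/// Placeholder for the real AST parsing / heuristic scoring.
fn score(file: &FileInfo) -> f64 {
    1.0 / (1.0 + file.size as f64 / 10_000.0)
}

fn score_all(files: &[FileInfo]) -> Vec<(String, f64)> {
    files
        .par_iter() // Rayon splits the slice across its work-stealing thread pool
        .map(|f| (f.path.clone(), score(f)))
        .collect()
}
```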
Context Positioning Optimization
- Transformer-aware ordering: Exploits LLM attention patterns
- 3-tier positioning: HEAD (20%) → MIDDLE (60%) → TAIL (20%)
- Query-aware relevance: Surfaces most relevant files at HEAD
- Centrality-based placement: High-centrality files at HEAD and TAIL
- Relatedness grouping: Clusters related files together
Adaptive Thresholds
- Repository-aware configuration: Auto-tune based on repo size
- Memory-aware limits: Adjust based on available system memory
- Performance targets: Dynamic optimization for time/memory tradeoffs
- Quality preservation: Maintain information density while scaling
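A sketch of how repository-aware auto-tuning might look; the thresholds mirror the size profiles under Performance Targets later in this document and are illustrative values, not the crate's actual tuning logic.

```rust
struct Tuning {
    threads: usize,
    cache_mb: usize,
    streaming: bool,
}

fn tune(file_count: usize, available_mem_mb: usize) -> Tuning {
    // Default the thread count to the number of logical cores.
    let threads = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    match file_count {
        // Small repos: stay in memory, keep the cache tiny.
        0..=1_000 => Tuning { threads: threads.min(4), cache_mb: 50, streaming: false },
        // Medium repos: full parallelism plus a modest, memory-aware cache.
        1_001..=10_000 => Tuning {
            threads,
            cache_mb: std::cmp::min(200, available_mem_mb / 4),
            streaming: false,
        },
        // Large and enterprise repos: streaming plus an aggressive cache.
        _ => Tuning {
            threads,
            cache_mb: std::cmp::min(1_000, available_mem_mb / 4),
            streaming: true,
        },
    }
}
```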
Architecture
```text
Repository
    │
    ▼
Streaming Scan ──────────── metadata-first progressive loading, backpressure control
    │
    ▼
Parallel Analysis ────────── multi-threaded AST parsing and scoring, work-stealing load balancing
    │
    ▼
Selection + Positioning ──── 3-tier context: HEAD / MIDDLE / TAIL
    │
    ▼
Caching ──────────────────── Blake3 signatures, LRU + disk cache
    │
    ▼
Output ───────────────────── optimized bundle
```
Core Components
ScalingSelector
Main entry point for scaled repository processing:
- Integrates all optimization layers
- Configurable performance/quality tradeoffs
- Automatic adaptation based on repository characteristics
- Progress reporting and cancellation support
StreamingScanner
Progressive file system traversal:
- Yields file metadata as discovered
- Supports filtering during traversal
- Adaptive buffering based on memory pressure
- Early termination support
ParallelAnalyzer
Multi-threaded file analysis:
- Rayon-based thread pool
- Dynamic work distribution
- Priority-based scheduling (high-importance files first)
- Error isolation (one failure doesn't stop processing)
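A small sketch of the error-isolation idea, assuming a Rayon-style pipeline: a file that fails to read or parse is dropped from the batch instead of aborting the whole pass.

```rust
use rayon::prelude::*;

/// Placeholder per-file analysis that can fail (I/O error, parse error, ...).
fn analyze(path: &str) -> std::io::Result<usize> {
    Ok(std::fs::read_to_string(path)?.lines().count())
}

fn analyze_all(paths: &[&str]) -> Vec<(String, usize)> {
    paths
        .par_iter()
        .filter_map(|p| analyze(p).ok().map(|n| (p.to_string(), n))) // failures are skipped, not propagated
        .collect()
}
```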
ContextPositioner
Optimizes file order for LLM recall:
- HEAD: Query-relevant high-centrality files (transformers attend best here)
- MIDDLE: Supporting files with lower centrality (background context)
- TAIL: Core functionality high-centrality files (attention strengthens again)
- Centrality calculation using PageRank
- Query relevance scoring with term matching
- Relatedness grouping by imports and structure
CacheManager
Persistent caching system:
- Blake3 content hashing for change detection
- LRU eviction for memory cache
- Compressed disk storage
- Atomic cache updates
- Cache statistics and monitoring
AdaptiveConfig
Auto-tuning configuration:
- Detects repository size and complexity
- Queries available system memory
- Adjusts thread counts, buffer sizes, cache limits
- Scales thresholds based on detected patterns
Usage
The examples below are illustrative sketches: type, method, and field names follow the components described above, and exact signatures may differ from the current API.
Basic Scaled Selection
```rust
use scribe_scaling::{ScalingConfig, ScalingSelector};

let config = ScalingConfig::default();
let selector = ScalingSelector::new(config);

// Path argument and result fields are assumptions for illustration.
let result = selector.select_and_process("/path/to/repo").await?;

println!("Selected {} files", result.selected_files.len());
println!("Analysis completed in {:?}", result.elapsed);
```
Query-Aware Context Positioning
```rust
use scribe_scaling::{ContextPositioningConfig, ScalingConfig, ScalingSelector};

let mut config = ScalingConfig::default();
// Field values are illustrative; defaults are listed in the Configuration section.
config.positioning = ContextPositioningConfig { head_percentage: 0.20, ..Default::default() };
let selector = ScalingSelector::new(config);

// Query string and result accessors are assumptions for illustration.
let result = selector.select_and_process_with_query("/path/to/repo", "authentication flow").await?;
if result.has_context_positioning() {
    // Inspect the HEAD / MIDDLE / TAIL tiers on the result here.
}
```
Streaming with Progress
```rust
use scribe_scaling::StreamingScanner;
use futures::StreamExt;
use indicatif::{ProgressBar, ProgressStyle};

// Constructor arguments and stream item fields are assumptions for illustration.
let scanner = StreamingScanner::new(Default::default());
let progress = ProgressBar::new_spinner();
progress.set_style(ProgressStyle::default_spinner());

let mut file_stream = scanner.scan_streaming("/path/to/repo").await?;
while let Some(file) = file_stream.next().await {
    progress.set_message(format!("scanned {}", file?.path.display()));
}
progress.finish_with_message("scan complete");
```
Custom Performance Targets
```rust
use scribe_scaling::{PerformanceTarget, ScalingConfig};

// Fast mode: prioritize speed over completeness
let fast_config = ScalingConfig { performance_target: PerformanceTarget::Speed, ..Default::default() };

// Quality mode: prioritize completeness and quality
let quality_config = ScalingConfig { performance_target: PerformanceTarget::Quality, ..Default::default() };

// Balanced mode (default)
let balanced_config = ScalingConfig::default();
```
Cache Management
```rust
use scribe_scaling::{CacheConfig, CacheManager};

let cache_config = CacheConfig::default();
let cache = CacheManager::new(cache_config)?;

// Check cache status (stat field names are assumptions for illustration)
let stats = cache.stats();
println!("entries:   {}", stats.entry_count);
println!("hit rate:  {:.1}%", stats.hit_rate * 100.0);
println!("disk size: {} MB", stats.disk_size_mb);

// Clear old entries
cache.evict_expired()?;

// Clear entire cache
cache.clear()?;
```
Performance Targets
Repository Size Profiles
| Size | Files | Time Target | Memory Target | Strategy |
|---|---|---|---|---|
| Small | ≤1k | <1s | <50MB | In-memory, minimal caching |
| Medium | 1k-10k | <5s | <200MB | Parallel + caching |
| Large | 10k-100k | <15s | <1GB | Streaming + aggressive caching |
| Enterprise | 100k+ | <30s | <2GB | Full optimization suite |
Achieved Performance
Based on internal benchmarks:
- Linux kernel (70k files): ~12s analysis time, ~800MB memory
- Chromium (300k files): ~28s analysis time, ~1.8GB memory
- Small Rust project (500 files): ~450ms analysis time, ~40MB memory
Context Positioning
Why It Matters
Transformer models don't attend equally to all tokens:
- Strong attention at HEAD: First 20% gets highest attention
- Weak attention in MIDDLE: Middle 60% gets less focus
- Strong attention at TAIL: Final 20% gets second-highest attention
Strategy: Place most important files where LLMs attend best.
3-Tier System
HEAD (20%)
- Query-relevant files with high centrality
- Entry points matching query terms
- Critical dependencies for query context
- Goal: Give LLM immediate relevant context
MIDDLE (60%)
- Supporting files with lower centrality
- Utility code and helpers
- Test files (if included)
- Goal: Provide background without overwhelming attention
TAIL (20%)
- Core high-centrality files (`lib.rs`, `__init__.py`)
- Foundational configuration and types
- Architectural anchors
- Goal: Ground LLM understanding with core concepts
Positioning Algorithm
1. Compute PageRank centrality for all files
2. Score query relevance (if query provided)
3. Combined score = centrality_weight * centrality + query_relevance_weight * relevance
4. Sort by combined score
5. Top 20% → HEAD
6. Middle 60% → MIDDLE
7. Bottom 20% with highest centrality → TAIL
8. Group related files within each tier
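A condensed sketch of steps 3 through 7 (relatedness grouping omitted); the weights mirror the ContextPositioningConfig defaults listed below, and this is an illustration rather than the crate's actual implementation.

```rust
/// Each candidate is (path, centrality, query_relevance), with both scores in [0, 1].
fn position(mut files: Vec<(String, f64, f64)>) -> (Vec<String>, Vec<String>, Vec<String>) {
    // Step 3: combined score with centrality_weight = 0.4, query_relevance_weight = 0.3.
    let combined = |f: &(String, f64, f64)| 0.4 * f.1 + 0.3 * f.2;

    // Step 4: sort by combined score, highest first.
    files.sort_by(|a, b| combined(b).partial_cmp(&combined(a)).unwrap());

    // Step 5: top 20% go to HEAD.
    let head_len = (files.len() as f64 * 0.20).ceil() as usize;
    let head: Vec<String> = files[..head_len].iter().map(|f| f.0.clone()).collect();

    // Steps 6-7: of the remainder, the highest-centrality files anchor the TAIL;
    // everything else forms the MIDDLE.
    let mut rest: Vec<_> = files[head_len..].to_vec();
    rest.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let tail_len = (files.len() as f64 * 0.20).ceil().min(rest.len() as f64) as usize;
    let tail: Vec<String> = rest[..tail_len].iter().map(|f| f.0.clone()).collect();
    let middle: Vec<String> = rest[tail_len..].iter().map(|f| f.0.clone()).collect();

    (head, middle, tail)
}
```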
Configuration
ScalingConfig
| Field | Type | Default | Description |
|---|---|---|---|
| performance_target | PerformanceTarget | Balanced | Speed, Balanced, or Quality |
| parallel_threads | Option<usize> | CPU count | Thread pool size |
| cache_size_mb | usize | 1000 | Maximum cache size |
| enable_caching | bool | true | Enable persistent cache |
| enable_positioning | bool | true | Enable context positioning |
| max_file_size | usize | 1_000_000 | Skip files larger than this |
ContextPositioningConfig
| Field | Type | Default | Description |
|---|---|---|---|
| enable_positioning | bool | true | Enable/disable positioning |
| head_percentage | f64 | 0.20 | Percentage for HEAD section |
| tail_percentage | f64 | 0.20 | Percentage for TAIL section |
| centrality_weight | f64 | 0.4 | Weight for centrality scoring |
| query_relevance_weight | f64 | 0.3 | Weight for query matching |
| relatedness_weight | f64 | 0.3 | Weight for file grouping |
Optimizations
Memory Management
- Streaming processing: Never load entire repository into memory
- Lazy content loading: Load file content only when needed
- Compressed caching: Reduce cache memory footprint
- Incremental GC: Release memory as processing progresses
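A minimal sketch of the compressed-caching idea using flate2's gzip encoder and decoder; real entries would be serialized analysis results rather than raw bytes.

```rust
use flate2::read::GzDecoder;
use flate2::write::GzEncoder;
use flate2::Compression;
use std::io::{Read, Write};

/// Compress a cache entry before it is written to disk.
fn compress_entry(entry: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(entry)?;
    encoder.finish()
}

/// Decompress a cache entry after it is read back.
fn decompress_entry(bytes: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    GzDecoder::new(bytes).read_to_end(&mut out)?;
    Ok(out)
}
```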
I/O Optimization
- Async file operations: Non-blocking reads with tokio
- Read-ahead buffering: Pre-fetch likely-needed files
- Memory-mapped files: For very large files
- Batched writes: Coalesce cache updates
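A minimal sketch of non-blocking, bounded-concurrency reads with tokio and futures (not the crate's actual reader); the concurrency limit doubles as a simple form of backpressure.

```rust
use futures::{stream, StreamExt};
use std::path::PathBuf;

/// Read many files concurrently without blocking, with at most 16 reads in flight.
async fn read_many(paths: Vec<PathBuf>) -> Vec<(PathBuf, Vec<u8>)> {
    stream::iter(paths)
        .map(|path| async move {
            let bytes = tokio::fs::read(&path).await.unwrap_or_default();
            (path, bytes)
        })
        .buffer_unordered(16) // bound on in-flight reads acts as backpressure
        .collect()
        .await
}
```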
CPU Optimization
- Work stealing: Dynamic thread load balancing
- SIMD: Use SIMD for hash computation (Blake3)
- Compiled patterns: Pre-compile regex and globs
- Lazy evaluation: Skip unnecessary computations
Integration
scribe-scaling is used by:
- CLI: Top-level orchestration of repository analysis
- scribe-webservice: Powers web API for large repositories
- scribe-selection: Provides scalable selection infrastructure
- scribe-graph: Scales PageRank to large graphs
See Also
- docs/context-positioning.md: Detailed context positioning documentation
- scribe-selection: File selection algorithms that scaling optimizes
- scribe-graph: PageRank computation used by positioning
- ../../WHY_SCRIBE.md: Philosophy on performance and intelligence