scribe-scaling 0.5.1

scribe-scaling

High-performance scaling optimizations for large repository analysis in Scribe.

Overview

scribe-scaling is the performance and optimization layer that enables Scribe to handle repositories of any size—from small projects to enterprise codebases with 100k+ files. It implements streaming architecture, intelligent caching, parallel processing, and context positioning to maintain sub-second performance on small repos and sub-30-second analysis on massive repositories.

Key Features

Progressive Loading Architecture

  • Metadata-first streaming: Load file metadata before content to enable early filtering
  • Lazy content loading: Only read file content when selected for inclusion
  • Backpressure management: Adaptive throttling prevents memory exhaustion
  • Incremental processing: Process files as they're discovered, not after full scan
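
The metadata-first idea can be sketched in a few lines: filter on cheap metadata before any content is read, so rejected files never cost an I/O pass. The names below (`FileMeta`, `select_by_metadata`) are illustrative, not the crate's actual API:

```rust
struct FileMeta {
    path: String,
    size: u64,
}

// Filter on metadata alone; only survivors would have their content
// loaded in a later pass.
fn select_by_metadata(entries: Vec<(&str, u64)>, max_size: u64) -> Vec<String> {
    entries
        .into_iter()
        .map(|(path, size)| FileMeta { path: path.to_string(), size })
        .filter(|m| m.size <= max_size) // early filtering, no content I/O
        .map(|m| m.path)
        .collect()
}
```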

Intelligent Caching

  • Persistent cache: Disk-based cache with Blake3 content signatures
  • Signature-based invalidation: Only re-process changed files
  • Multi-level caching: In-memory LRU + disk persistence
  • Compression: Compressed cache entries using flate2 (gzip)
  • Cache warming: Pre-populate cache for common operations
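
Signature-based invalidation can be sketched as follows. `SignatureCache` and `get_or_analyze` are illustrative names, and std's `DefaultHasher` stands in for Blake3 only to keep the sketch dependency-free:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for a Blake3 content signature (the real cache hashes file
// bytes with blake3; DefaultHasher keeps this sketch dependency-free).
fn signature(content: &str) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}

// path -> (signature, cached analysis result)
struct SignatureCache {
    entries: HashMap<String, (u64, String)>,
}

impl SignatureCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    // Returns (analysis, was_cache_hit). Re-analyzes only when the
    // content signature has changed since the last run.
    fn get_or_analyze(&mut self, path: &str, content: &str) -> (String, bool) {
        let sig = signature(content);
        if let Some((cached_sig, analysis)) = self.entries.get(path) {
            if *cached_sig == sig {
                return (analysis.clone(), true);
            }
        }
        let analysis = format!("analysis of {path} ({} bytes)", content.len());
        self.entries.insert(path.to_string(), (sig, analysis.clone()));
        (analysis, false)
    }
}
```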

Parallel Processing

  • Multi-core scanning: Parallel directory traversal using Rayon
  • Concurrent analysis: Parallel AST parsing and scoring
  • Async I/O: Non-blocking file operations with tokio
  • Work stealing: Dynamic load balancing across threads
  • Adaptive parallelism: Scale thread count based on system resources
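
Multi-core scanning can be sketched as chunked fan-out over a thread pool. The real crate uses Rayon's work-stealing pool; `std::thread::scope` and the placeholder `score` function are used here only to keep the sketch dependency-free:

```rust
use std::thread;

// Placeholder scorer; the real analyzer parses ASTs and scores files.
fn score(file: &str) -> usize {
    file.len()
}

// Split the file list into chunks and score each chunk on its own thread.
fn parallel_total_score(files: &[&str], workers: usize) -> usize {
    let workers = workers.max(1);
    let chunk = ((files.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        files
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|f| score(f)).sum::<usize>()))
            .collect::<Vec<_>>() // spawn all workers before joining any
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum()
    })
}
```

Rayon replaces the static chunking above with dynamic work stealing, so a thread that finishes early pulls work from busier threads.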

Context Positioning Optimization

  • Transformer-aware ordering: Exploits LLM attention patterns
  • 3-tier positioning: HEAD (20%) → MIDDLE (60%) → TAIL (20%)
  • Query-aware relevance: Surfaces most relevant files at HEAD
  • Centrality-based placement: High-centrality files at HEAD and TAIL
  • Relatedness grouping: Clusters related files together

Adaptive Thresholds

  • Repository-aware configuration: Auto-tune based on repo size
  • Memory-aware limits: Adjust based on available system memory
  • Performance targets: Dynamic optimization for time/memory tradeoffs
  • Quality preservation: Maintain information density while scaling
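
The auto-tuning logic can be sketched as bucket selection plus a memory clamp. The size buckets mirror the repository size profiles documented later in this README, but the concrete thread and cache numbers here are assumptions, not the crate's actual defaults:

```rust
#[derive(Debug, PartialEq)]
struct TunedConfig {
    threads: usize,
    cache_mb: usize,
    streaming: bool,
}

fn auto_tune(file_count: usize, available_memory_mb: usize) -> TunedConfig {
    // Repository-aware bucket selection (illustrative values).
    let mut cfg = match file_count {
        0..=1_000 => TunedConfig { threads: 2, cache_mb: 50, streaming: false },
        1_001..=10_000 => TunedConfig { threads: 4, cache_mb: 200, streaming: false },
        10_001..=100_000 => TunedConfig { threads: 8, cache_mb: 1_000, streaming: true },
        _ => TunedConfig { threads: 16, cache_mb: 2_000, streaming: true },
    };
    // Memory-aware clamp: never claim more than half of available memory.
    cfg.cache_mb = cfg.cache_mb.min(available_memory_mb / 2);
    cfg
}
```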

Architecture

Repository → Streaming Scan → Parallel Analysis → Selection + Positioning → Caching → Output
     ↓              ↓                ↓                      ↓                 ↓          ↓
  Metadata     Progressive      Multi-threaded        3-Tier Context      Blake3    Optimized
   First         Loading          AST/Scoring          HEAD/MIDDLE/TAIL   Signature   Bundle
                ↓                     ↓                                       ↓
            Backpressure         Work Stealing                          LRU + Disk
             Control              Load Balance                            Cache

Core Components

ScalingSelector

Main entry point for scaled repository processing:

  • Integrates all optimization layers
  • Configurable performance/quality tradeoffs
  • Automatic adaptation based on repository characteristics
  • Progress reporting and cancellation support

StreamingScanner

Progressive file system traversal:

  • Yields file metadata as discovered
  • Supports filtering during traversal
  • Adaptive buffering based on memory pressure
  • Early termination support

ParallelAnalyzer

Multi-threaded file analysis:

  • Rayon-based thread pool
  • Dynamic work distribution
  • Priority-based scheduling (high-importance files first)
  • Error isolation (one failure doesn't stop processing)
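
Error isolation amounts to treating each file's analysis as a `Result` and collecting failures alongside successes instead of propagating them. A minimal sketch, with `analyze` as a placeholder that fails on unsupported inputs:

```rust
// Placeholder analyzer that fails on unsupported inputs.
fn analyze(file: &str) -> Result<usize, String> {
    if file.ends_with(".bin") {
        Err(format!("unsupported binary file: {file}"))
    } else {
        Ok(file.len()) // stand-in "analysis result"
    }
}

// A single failure is recorded, not fatal: the batch keeps going.
fn analyze_all(files: &[&str]) -> (Vec<usize>, Vec<String>) {
    let mut successes = Vec::new();
    let mut failures = Vec::new();
    for file in files {
        match analyze(file) {
            Ok(v) => successes.push(v),
            Err(e) => failures.push(e),
        }
    }
    (successes, failures)
}
```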

ContextPositioner

Optimizes file order for LLM recall:

  • HEAD: Query-relevant high-centrality files (transformers attend best here)
  • MIDDLE: Supporting files with lower centrality (background context)
  • TAIL: Core functionality high-centrality files (attention strengthens again)
  • Centrality calculation using PageRank
  • Query relevance scoring with term matching
  • Relatedness grouping by imports and structure

CacheManager

Persistent caching system:

  • Blake3 content hashing for change detection
  • LRU eviction for memory cache
  • Compressed disk storage
  • Atomic cache updates
  • Cache statistics and monitoring
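
The in-memory LRU tier can be sketched with std containers. `Lru` is an illustrative name, and the real CacheManager additionally persists evicted entries to compressed disk storage:

```rust
use std::collections::{HashMap, VecDeque};

// Tiny LRU sketch: `order` tracks recency (front = most recently used),
// `map` holds the cached values.
struct Lru {
    cap: usize,
    order: VecDeque<String>,
    map: HashMap<String, String>,
}

impl Lru {
    fn new(cap: usize) -> Self {
        Self { cap, order: VecDeque::new(), map: HashMap::new() }
    }

    fn put(&mut self, key: &str, val: &str) {
        if self.map.contains_key(key) {
            self.order.retain(|k| k.as_str() != key);
        } else if self.map.len() >= self.cap {
            // Evict the least recently used entry.
            if let Some(old) = self.order.pop_back() {
                self.map.remove(&old);
            }
        }
        self.order.push_front(key.to_string());
        self.map.insert(key.to_string(), val.to_string());
    }

    fn get(&mut self, key: &str) -> Option<String> {
        if self.map.contains_key(key) {
            // Touch: move the key to the front of the recency list.
            self.order.retain(|k| k.as_str() != key);
            self.order.push_front(key.to_string());
        }
        self.map.get(key).cloned()
    }
}
```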

AdaptiveConfig

Auto-tuning configuration:

  • Detects repository size and complexity
  • Queries available system memory
  • Adjusts thread counts, buffer sizes, cache limits
  • Scales thresholds based on detected patterns

Usage

Basic Scaled Selection

use scribe_scaling::{ScalingSelector, ScalingConfig, PerformanceTarget};

let config = ScalingConfig {
    performance_target: PerformanceTarget::Balanced,
    enable_caching: true,
    enable_positioning: true,
    ..Default::default()
};

let selector = ScalingSelector::new(config);
let result = selector.select_and_process(repo_path).await?;

println!("Analyzed {} files in {:?}",
    result.files_processed,
    result.elapsed_time
);
println!("Cache hit rate: {:.1}%", result.cache_stats.hit_rate * 100.0);

Query-Aware Context Positioning

use scribe_scaling::{ScalingSelector, ScalingConfig, ContextPositioningConfig};

let mut config = ScalingConfig::default();
config.positioning = ContextPositioningConfig {
    enable_positioning: true,
    head_percentage: 0.20,      // 20% high-priority files at HEAD
    tail_percentage: 0.20,      // 20% core files at TAIL
    centrality_weight: 0.5,
    query_relevance_weight: 0.3,
    relatedness_weight: 0.2,
};

let selector = ScalingSelector::new(config);
let result = selector.select_and_process_with_query(
    repo_path,
    Some("authentication middleware")  // Query hint
).await?;

if result.has_context_positioning() {
    let ordered = result.get_optimally_ordered_files();
    println!("HEAD: {} files", ordered.head.len());
    println!("MIDDLE: {} files", ordered.middle.len());
    println!("TAIL: {} files", ordered.tail.len());
}

Streaming with Progress

use scribe_scaling::{StreamingScanner, ScanConfig};
use indicatif::{ProgressBar, ProgressStyle};

let scanner = StreamingScanner::new(ScanConfig::default());
let progress = ProgressBar::new_spinner();

progress.set_style(
    ProgressStyle::default_spinner()
        .template("{spinner} [{elapsed}] {msg} ({pos} files)")
);

let mut file_stream = scanner.scan_streaming(repo_path).await?;
let threshold = 0.5;  // illustrative cutoff for the metadata score

while let Some(file) = file_stream.next().await {
    progress.set_message(format!("Scanning {}", file.path.display()));
    progress.inc(1);

    // Process file metadata immediately
    if file.score > threshold {
        // Load and analyze content only for high-scoring files
        let content = file.load_content().await?;
        analyze(content).await?;
    }
}

progress.finish_with_message("Scan complete");

Custom Performance Targets

use scribe_scaling::{PerformanceTarget, ScalingConfig};

// Fast mode: Prioritize speed over completeness
let fast_config = ScalingConfig {
    performance_target: PerformanceTarget::Speed,
    parallel_threads: Some(num_cpus::get()),
    cache_size_mb: 500,
    max_file_size: 500_000,  // Skip large files
    enable_positioning: false,
    ..Default::default()
};

// Quality mode: Prioritize completeness and quality
let quality_config = ScalingConfig {
    performance_target: PerformanceTarget::Quality,
    parallel_threads: Some(4),  // Fewer threads, more thorough
    cache_size_mb: 2000,
    max_file_size: 5_000_000,
    enable_positioning: true,
    ..Default::default()
};

// Balanced mode (default)
let balanced_config = ScalingConfig::default();

Cache Management

use std::path::PathBuf;
use scribe_scaling::cache::{CacheManager, CacheConfig};

let cache_config = CacheConfig {
    cache_dir: PathBuf::from(".scribe-cache"),
    max_size_mb: 1000,
    compression_level: 6,
    ttl_hours: 24,
};

let cache = CacheManager::new(cache_config)?;

// Check cache status
let stats = cache.stats();
println!("Cache entries: {}", stats.entry_count);
println!("Total size: {} MB", stats.size_mb);
println!("Hit rate: {:.1}%", stats.hit_rate * 100.0);

// Clear old entries
cache.evict_expired()?;

// Clear entire cache
cache.clear()?;

Performance Targets

Repository Size Profiles

Size        Files     Time Target  Memory Target  Strategy
Small       ≤1k       <1s          <50MB          In-memory, minimal caching
Medium      1k-10k    <5s          <200MB         Parallel + caching
Large       10k-100k  <15s         <1GB           Streaming + aggressive caching
Enterprise  100k+     <30s         <2GB           Full optimization suite

Achieved Performance

Based on internal benchmarks:

  • Small Rust project (500 files): ~450ms analysis time, ~40MB memory
  • Linux kernel (70k files): ~12s analysis time, ~800MB memory
  • Chromium (300k files): ~28s analysis time, ~1.8GB memory

Context Positioning

Why It Matters

Transformer models don't attend equally to all tokens:

  • Strong attention at HEAD: First 20% gets highest attention
  • Weak attention in MIDDLE: Middle 60% gets less focus
  • Strong attention at TAIL: Final 20% gets second-highest attention

Strategy: Place most important files where LLMs attend best.

3-Tier System

HEAD (20%)

  • Query-relevant files with high centrality
  • Entry points matching query terms
  • Critical dependencies for query context
  • Goal: Give LLM immediate relevant context

MIDDLE (60%)

  • Supporting files with lower centrality
  • Utility code and helpers
  • Test files (if included)
  • Goal: Provide background without overwhelming attention

TAIL (20%)

  • Core high-centrality files (lib.rs, __init__.py)
  • Foundational configuration and types
  • Architectural anchors
  • Goal: Ground LLM understanding with core concepts

Positioning Algorithm

1. Compute PageRank centrality for all files
2. Score query relevance (if a query is provided)
3. combined = centrality_weight * centrality + query_relevance_weight * relevance
4. Sort files by combined score
5. Top 20% → HEAD
6. Middle 60% → MIDDLE
7. Bottom 20%, prioritizing the highest-centrality files → TAIL
8. Group related files within each tier
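
One dependency-free way to realize steps 4-7 is sketched below. `ScoredFile` and `split_tiers` are illustrative names, the files are assumed pre-scored, and the relatedness grouping of step 8 is omitted:

```rust
struct ScoredFile {
    path: &'static str,
    combined: f64,   // centrality/relevance blend from step 3
    centrality: f64,
}

// Split pre-scored files into HEAD / MIDDLE / TAIL.
fn split_tiers(
    mut files: Vec<ScoredFile>,
) -> (Vec<ScoredFile>, Vec<ScoredFile>, Vec<ScoredFile>) {
    // Step 4: sort by combined score, best first.
    files.sort_by(|a, b| b.combined.partial_cmp(&a.combined).unwrap());
    let n = files.len();
    let head_n = ((n as f64) * 0.20).ceil() as usize;
    let tail_n = ((n as f64) * 0.20).ceil() as usize;
    // Step 5: top 20% → HEAD.
    let head: Vec<ScoredFile> = files.drain(..head_n.min(files.len())).collect();
    // Step 7: of the remainder, the highest-centrality files anchor the TAIL.
    files.sort_by(|a, b| b.centrality.partial_cmp(&a.centrality).unwrap());
    let tail: Vec<ScoredFile> = files.drain(..tail_n.min(files.len())).collect();
    // Step 6: everything left over is the MIDDLE.
    (head, files, tail)
}
```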

Configuration

ScalingConfig

Field               Type               Default    Description
performance_target  PerformanceTarget  Balanced   Speed, Balanced, or Quality
parallel_threads    Option<usize>      CPU count  Thread pool size
cache_size_mb       usize              1000       Maximum cache size in MB
enable_caching      bool               true       Enable persistent cache
enable_positioning  bool               true       Enable context positioning
max_file_size       usize              1_000_000  Skip files larger than this (bytes)

ContextPositioningConfig

Field                   Type  Default  Description
enable_positioning      bool  true     Enable/disable positioning
head_percentage         f64   0.20     Fraction of files placed in HEAD
tail_percentage         f64   0.20     Fraction of files placed in TAIL
centrality_weight       f64   0.4      Weight for centrality scoring
query_relevance_weight  f64   0.3      Weight for query matching
relatedness_weight      f64   0.3      Weight for file grouping

Optimizations

Memory Management

  • Streaming processing: Never load entire repository into memory
  • Lazy content loading: Load file content only when needed
  • Compressed caching: Reduce cache memory footprint
  • Incremental GC: Release memory as processing progresses

I/O Optimization

  • Async file operations: Non-blocking reads with tokio
  • Read-ahead buffering: Pre-fetch likely-needed files
  • Memory-mapped files: For very large files
  • Batched writes: Coalesce cache updates

CPU Optimization

  • Work stealing: Dynamic thread load balancing
  • SIMD: Use SIMD for hash computation (Blake3)
  • Compiled patterns: Pre-compile regex and globs
  • Lazy evaluation: Skip unnecessary computations

Integration

scribe-scaling is used by:

  • CLI: Top-level orchestration of repository analysis
  • scribe-webservice: Powers web API for large repositories
  • scribe-selection: Provides scalable selection infrastructure
  • scribe-graph: Scales PageRank to large graphs

See Also

  • docs/context-positioning.md: Detailed context positioning documentation
  • scribe-selection: File selection algorithms that scaling optimizes
  • scribe-graph: PageRank computation used by positioning
  • ../../WHY_SCRIBE.md: Philosophy on performance and intelligence