§Scribe Scanner
High-performance file system scanning and indexing capabilities for the Scribe library. This crate provides efficient tools for discovering, filtering, and analyzing files in large codebases with git integration and parallel processing.
§Features
- Fast Repository Traversal: Efficient file discovery using walkdir and ignore
- Git Integration: Prefer git ls-files when available, with fallback to filesystem walk
- Language Detection: Automatic detection for 25+ programming languages
- Content Analysis: Extract imports, documentation structure, and metadata
- Parallel Processing: Memory-efficient parallel file processing using Rayon
- Binary Detection: Smart binary file detection using content analysis
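The "prefer git ls-files, fall back to a filesystem walk" strategy from the feature list can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the crate's implementation: the function names (`git_tracked_files`, `parse_nul_separated`, `walk_dir`, `discover_files`) are hypothetical, and the fallback uses a plain `std::fs::read_dir` loop rather than the walkdir and ignore crates the scanner is documented to use.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Ask git for the tracked files under `root`.
/// Returns None when git is not installed or `root` is not inside a repository.
fn git_tracked_files(root: &Path) -> Option<Vec<PathBuf>> {
    let output = Command::new("git")
        .arg("-C")
        .arg(root)
        .args(["ls-files", "-z"]) // -z: NUL-separated, unquoted paths
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    Some(parse_nul_separated(&output.stdout, root))
}

/// Split NUL-separated `git ls-files -z` output into paths joined onto `root`.
fn parse_nul_separated(stdout: &[u8], root: &Path) -> Vec<PathBuf> {
    stdout
        .split(|b| *b == 0)
        .filter(|chunk| !chunk.is_empty())
        .map(|chunk| root.join(String::from_utf8_lossy(chunk).into_owned()))
        .collect()
}

/// Fallback: iterative directory walk that skips `.git` and returns sorted file paths.
fn walk_dir(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                if entry.file_name() != ".git" {
                    pending.push(entry.path());
                }
            } else if file_type.is_file() {
                files.push(entry.path());
            }
        }
    }
    files.sort();
    Ok(files)
}

/// Prefer git's view of the tree; walk the filesystem when git is unavailable.
fn discover_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    match git_tracked_files(root) {
        Some(files) => Ok(files),
        None => walk_dir(root),
    }
}
```

Calling `discover_files(Path::new("."))` returns tracked files inside a repository and every regular file otherwise. Note that the two branches are not equivalent: git omits untracked and ignored files, while this simple fallback applies no ignore rules at all.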
§Usage
use scribe_scanner::{Scanner, ScanOptions};
use std::path::Path;

// The `.await?` below must run inside an async function that returns a Result.
let scanner = Scanner::new();
let options = ScanOptions::default()
    .with_git_integration(true)
    .with_parallel_processing(true);

let results = scanner.scan(Path::new("."), options).await?;
println!("Scanned {} files", results.len());
Re-exports§
pub use content::ContentAnalyzer;
pub use content::ContentStats;
pub use content::DocumentationInfo;
pub use content::ImportInfo;
pub use git_integration::GitCommitInfo;
pub use git_integration::GitFileInfo;
pub use git_integration::GitIntegrator;
pub use language_detection::DetectionStrategy;
pub use language_detection::LanguageDetector;
pub use language_detection::LanguageHints;
pub use metadata::FileMetadata;
pub use metadata::MetadataExtractor;
pub use metadata::SizeStats;
pub use scanner::ScanOptions;
pub use scanner::ScanProgress;
pub use scanner::ScanResult;
pub use scanner::Scanner;
pub use filtering::DirectoryFilter;
pub use filtering::FileFilter;
pub use filtering::FilterReason;
pub use filtering::FilterResult;
pub use parallel::ParallelConfig;
pub use parallel::ParallelController;
pub use parallel::ParallelMetrics;
pub use parallel::WorkItem;
pub use aho_corasick_reference_index::AhoCorasickReferenceIndex;
pub use aho_corasick_reference_index::IndexConfig;
pub use aho_corasick_reference_index::IndexMetrics;
pub use performance::ErrorType;
pub use performance::PerfTimer;
pub use performance::PerformanceMonitor;
pub use performance::PerformanceReport;
pub use performance::PerformanceSnapshot;
pub use performance::PERF_MONITOR;
Modules§
- aho_corasick_reference_index
- High-performance file reference indexing using Aho-Corasick multi-pattern search.
- content
- Content analysis for extracting imports, documentation structure, and code metrics.
- filtering
- High-performance file filtering with early content reads and strict pre-filtering.
- git_integration
- Git integration for enhanced file discovery and status tracking.
- language_detection
- Advanced programming language detection for 25+ languages.
- metadata
- File metadata extraction and analysis.
- parallel
- Bounded parallelism with backpressure control and adaptive batching.
- performance
- Performance instrumentation and monitoring for the scanning system.
- scanner
- Core scanning functionality for efficient file system traversal.
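Two of the module descriptions above, filtering by early content reads and bounded parallelism with backpressure, can be illustrated together. The sketch below is an assumption-laden example, not the crate's code: it uses standard-library threads and a bounded `sync_channel` in place of Rayon, and the names `looks_binary` and `classify_bounded` are hypothetical, as is the heuristic that a NUL byte in the first 8 KiB marks content as binary.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::sync::{Arc, Mutex};
use std::thread;

/// Heuristic (assumption): a NUL byte within the first 8 KiB means binary content.
fn looks_binary(content: &[u8]) -> bool {
    const SNIFF_LEN: usize = 8192;
    content.iter().take(SNIFF_LEN).any(|&b| b == 0)
}

/// Classify named buffers on `workers` threads, keeping at most `queue_cap`
/// items queued. Results are returned sorted by name.
fn classify_bounded(
    items: Vec<(String, Vec<u8>)>,
    workers: usize,
    queue_cap: usize,
) -> Vec<(String, bool)> {
    // Bounded work queue: `send` blocks once `queue_cap` items are waiting.
    let (work_tx, work_rx) = sync_channel::<(String, Vec<u8>)>(queue_cap);
    let work_rx = Arc::new(Mutex::new(work_rx));
    // Unbounded result queue, so workers never block while reporting.
    let (result_tx, result_rx) = channel::<(String, bool)>();

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is released at the end of this statement.
                let next = work_rx.lock().unwrap().recv();
                match next {
                    Ok((name, content)) => {
                        let _ = result_tx.send((name, looks_binary(&content)));
                    }
                    Err(_) => break, // sender dropped: no more work
                }
            })
        })
        .collect();
    drop(result_tx); // only the workers hold result senders now

    for item in items {
        // Blocks while the queue is full; this is the backpressure.
        work_tx.send(item).expect("workers exited early");
    }
    drop(work_tx); // signal end of input

    let mut results: Vec<_> = result_rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    results.sort();
    results
}
```

The bounded queue caps how much file content is held in memory at once: a fast producer stalls on `send` until a worker frees a slot, instead of buffering every file up front.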
Macros§
- perf_timer
- Macro for easy performance timing
Structs§
- FileScanner
- High-level scanner facade providing convenient access to all scanning functionality
- ScannerStats
- Statistics about the scanning process
Constants§
- VERSION
- Current version of the scanner crate