Crate scribe_scanner


Scribe Scanner

High-performance file system scanning and indexing for the Scribe library. This crate provides tools for discovering, filtering, and analyzing files in large codebases, with Git integration and parallel processing.

Features

  • Fast Repository Traversal: File discovery built on the walkdir and ignore crates
  • Git Integration: Uses git ls-files when available and falls back to a filesystem walk otherwise
  • Language Detection: Automatic detection for 25+ programming languages
  • Content Analysis: Extract imports, documentation structure, and metadata
  • Parallel Processing: Memory-efficient parallel file processing using Rayon
  • Binary Detection: Identifies binary files by inspecting their contents
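The Git-first discovery strategy in the list above can be sketched with the standard library alone. This is an illustration under assumptions, not the crate's implementation: it runs git ls-files -z and falls back to a recursive std::fs::read_dir walk when Git is missing, the directory is not a work tree, or nothing is tracked. Unlike the walkdir/ignore-based traversal the crate describes, this fallback does not honor .gitignore rules. The function names git_ls_files, walk_dir, and discover_files are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical helper: ask Git for tracked files under `root`.
/// Returns None if git cannot be spawned, `root` is not inside a work tree,
/// or the listing is empty.
fn git_ls_files(root: &Path) -> Option<Vec<PathBuf>> {
    let output = Command::new("git")
        .arg("-C")
        .arg(root)
        .args(["ls-files", "-z"]) // -z: NUL-separated paths, no quoting
        .output()
        .ok()?; // spawn failed, e.g. git is not installed
    if !output.status.success() {
        return None; // e.g. "not a git repository"
    }
    let files: Vec<PathBuf> = output
        .stdout
        .split(|&b| b == 0)
        .filter(|entry| !entry.is_empty())
        // ls-files prints paths relative to the directory given with -C.
        .map(|entry| root.join(String::from_utf8_lossy(entry).into_owned()))
        .collect();
    if files.is_empty() {
        None
    } else {
        Some(files)
    }
}

/// Hypothetical fallback: plain recursive walk that skips `.git` directories.
/// DirEntry::file_type does not follow symlinks, so symlinks are skipped.
fn walk_dir(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            if entry.file_name() != ".git" {
                walk_dir(&entry.path(), out)?;
            }
        } else if file_type.is_file() {
            out.push(entry.path());
        }
    }
    Ok(())
}

/// Prefer Git's view of the tree; otherwise walk the filesystem.
fn discover_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    if let Some(files) = git_ls_files(root) {
        return Ok(files);
    }
    let mut files = Vec::new();
    walk_dir(root, &mut files)?;
    files.sort();
    Ok(files)
}

fn main() -> io::Result<()> {
    let files = discover_files(Path::new("."))?;
    println!("Discovered {} files", files.len());
    Ok(())
}
```

Treating an empty Git listing as a miss is a choice made for this sketch: it keeps a directory that contains only untracked files from appearing empty.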

Usage

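```rust
use scribe_scanner::{Scanner, ScanOptions};
use std::path::Path;

// Assumes a Tokio runtime and an error type implementing std::error::Error;
// scan() is async, so it must be awaited inside an async function.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let scanner = Scanner::new();
    let options = ScanOptions::default()
        .with_git_integration(true)
        .with_parallel_processing(true);

    let results = scanner.scan(Path::new("."), options).await?;
    println!("Scanned {} files", results.len());
    Ok(())
}
```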

Re-exports

pub use content::ContentAnalyzer;
pub use content::ContentStats;
pub use content::DocumentationInfo;
pub use content::ImportInfo;
pub use git_integration::GitCommitInfo;
pub use git_integration::GitFileInfo;
pub use git_integration::GitIntegrator;
pub use language_detection::DetectionStrategy;
pub use language_detection::LanguageDetector;
pub use language_detection::LanguageHints;
pub use metadata::FileMetadata;
pub use metadata::MetadataExtractor;
pub use metadata::SizeStats;
pub use scanner::ScanOptions;
pub use scanner::ScanProgress;
pub use scanner::ScanResult;
pub use scanner::Scanner;
pub use filtering::DirectoryFilter;
pub use filtering::FileFilter;
pub use filtering::FilterReason;
pub use filtering::FilterResult;
pub use parallel::ParallelConfig;
pub use parallel::ParallelController;
pub use parallel::ParallelMetrics;
pub use parallel::WorkItem;
pub use aho_corasick_reference_index::AhoCorasickReferenceIndex;
pub use aho_corasick_reference_index::IndexConfig;
pub use aho_corasick_reference_index::IndexMetrics;
pub use performance::ErrorType;
pub use performance::PerfTimer;
pub use performance::PerformanceMonitor;
pub use performance::PerformanceReport;
pub use performance::PerformanceSnapshot;
pub use performance::PERF_MONITOR;

Modules

  • aho_corasick_reference_index: High-performance file reference indexing using Aho-Corasick multi-pattern search.
  • content: Content analysis for extracting imports, documentation structure, and code metrics.
  • filtering: High-performance file filtering with early content reads and strict pre-filtering.
  • git_integration: Git integration for enhanced file discovery and status tracking.
  • language_detection: Advanced programming language detection for 25+ languages.
  • metadata: File metadata extraction and analysis.
  • parallel: Bounded parallelism with backpressure control and adaptive batching.
  • performance: Performance instrumentation and monitoring for the scanning system.
  • scanner: Core scanning functionality for efficient file system traversal.
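The parallel module's bounded parallelism with backpressure can be illustrated with a bounded channel: the producer blocks once a fixed number of work items are queued, so memory stays bounded no matter how many files are discovered. This minimal sketch uses std threads and std::sync::mpsc::sync_channel rather than Rayon or the crate's ParallelController, whose API is not shown here; run_bounded and process are hypothetical names, and adaptive batching is omitted.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for per-file work (e.g. analyzing file contents).
fn process(item: u64) -> u64 {
    item * 2
}

/// Process `items` on `workers` threads through a queue that holds at most
/// `capacity` pending items. When the queue is full, `send` blocks the
/// producer: that blocking is the backpressure.
fn run_bounded(items: Vec<u64>, workers: usize, capacity: usize) -> Vec<u64> {
    assert!(workers > 0, "need at least one worker");
    let (work_tx, work_rx) = sync_channel::<u64>(capacity);
    // std's Receiver cannot be cloned, so workers share it behind a mutex.
    let work_rx = Arc::new(Mutex::new(work_rx));
    let (result_tx, result_rx) = channel::<u64>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&work_rx);
            let tx = result_tx.clone();
            thread::spawn(move || loop {
                // The guard is a temporary, so the lock is released as soon
                // as recv returns an item (or an error).
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(item) => tx.send(process(item)).unwrap(),
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();
    drop(result_tx); // only the workers' clones remain

    for item in items {
        work_tx.send(item).unwrap(); // blocks while `capacity` items are pending
    }
    drop(work_tx); // close the queue so idle workers exit

    for handle in handles {
        handle.join().unwrap();
    }
    // All result senders are gone, so this iterator terminates.
    let mut results: Vec<u64> = result_rx.iter().collect();
    results.sort_unstable();
    results
}

fn main() {
    let results = run_bounded((1..=10).collect(), 4, 2);
    println!("{:?}", results);
}
```

The result channel is deliberately unbounded here so that workers never block on output while the producer is blocked on input, which would otherwise risk a deadlock.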

Macros

  • perf_timer: Macro for easy performance timing.
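A timing macro of this kind usually wraps an expression with std::time::Instant and reports the elapsed time. The sketch below is hypothetical: time_block! is not the crate's perf_timer, whose arguments and output format are not documented on this page.

```rust
/// Hypothetical timing macro (not the crate's perf_timer): evaluates `$body`,
/// prints the elapsed time under `$label` to stderr, and yields the body's value.
macro_rules! time_block {
    ($label:expr, $body:expr) => {{
        // Fully qualified paths so the macro works without extra `use` lines.
        let start = ::std::time::Instant::now();
        let value = $body;
        let elapsed: ::std::time::Duration = start.elapsed();
        eprintln!("[timing] {}: {:?}", $label, elapsed);
        value
    }};
}

fn main() {
    let total = time_block!("sum", (1..=1_000u64).sum::<u64>());
    println!("total = {}", total);
}
```

Returning the body's value lets the macro wrap an existing expression in place without restructuring the surrounding code.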

Structs

  • FileScanner: High-level scanner facade providing convenient access to all scanning functionality.
  • ScannerStats: Statistics about the scanning process.

Constants

  • VERSION: Current version of the scanner crate.