RustScout
A high-performance, concurrent code search tool written in Rust. RustScout is designed for quickly searching and analyzing large codebases with a focus on performance and usability.
Features
- High Performance: Uses Rust's concurrency features for fast, parallel searches
- Incremental Search: Smart caching for faster repeated searches
  - Automatic change detection (Git or file signatures)
  - Cache compression support
  - Configurable cache size and location
  - Intelligent cache invalidation
- Smart Search: Support for multiple patterns, mixing simple text and regex
  - Word boundary matching for precise identifier search
    - Smart hyphen handling (by default, uses code/joining mode where `test-case` is one token; use `--hyphen-mode=boundary` for natural text where `hello-world` is two words)
    - Underscores always join words, even when bridging different scripts (e.g., `hello_世界` or `café_안녕`)
    - Full Unicode support for word boundaries
    - Configurable per-pattern behavior
  - Mix of simple and regex patterns
  - Case-sensitive and case-insensitive modes
  - At a Glance: Word Boundary Behavior
    - Hyphens: Joining mode by default (for code), or `--hyphen-mode=boundary` for text
    - Underscores: Always join words (no override)
    - Word Boundaries: Auto-adds `\b` unless already in pattern
    - Unicode: Full support for mixed scripts and special characters
- Search and Replace: Powerful find and replace functionality
  - Memory-efficient processing for files of any size
  - Preview changes before applying
  - Backup and undo support
  - Regular expressions with capture groups
- File Filtering: Flexible ignore patterns and file type filtering
- Rich Output: Detailed search results with statistics
- Context Lines: Show lines before and after matches for better understanding
  - `--context-before N` or `-B N`: Show N lines before each match
  - `--context-after N` or `-A N`: Show N lines after each match
  - `--context N` or `-C N`: Show N lines before and after each match
- Developer Friendly: Clear documentation with .NET comparison examples
Installation
From crates.io
# Install CLI tool
# Or add library to your project
From Source
Quick Start
Basic usage:
# Simple text search
# Search with word boundaries
# Search with regex and word boundaries
# Note: If your regex already has \b markers (e.g., "\btest\b"), RustScout preserves them.
# Otherwise, --word-boundary automatically adds them around your pattern.
# Filter by file type
# Show only statistics
# Ignore specific patterns
# Control thread count
Usage Examples
Basic Search
# Search for a pattern in current directory
# Search in a specific directory
# Case-sensitive search
# Show context lines around matches
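To make the context options concrete, here is a minimal sketch of how a window of lines around each match could be computed. The function names `context_window` and `lines_with_context` are hypothetical and are not part of RustScout's API; the sketch uses plain substring matching and assumes overlapping windows should be merged rather than printed twice.

```rust
use std::ops::RangeInclusive;

/// Inclusive range of 0-based line indices to show for a match on `match_line`.
/// `before` and `after` play the role of the -B and -A values.
fn context_window(
    match_line: usize,
    total_lines: usize,
    before: usize,
    after: usize,
) -> RangeInclusive<usize> {
    let start = match_line.saturating_sub(before);
    let end = (match_line + after).min(total_lines.saturating_sub(1));
    start..=end
}

/// Returns (1-based line number, line text) for every match plus its context.
fn lines_with_context<'a>(
    text: &'a str,
    needle: &str,
    before: usize,
    after: usize,
) -> Vec<(usize, &'a str)> {
    let lines: Vec<&str> = text.lines().collect();
    let mut out: Vec<(usize, &str)> = Vec::new();
    for (i, line) in lines.iter().enumerate() {
        if line.contains(needle) {
            for j in context_window(i, lines.len(), before, after) {
                let number = j + 1;
                // Skip lines already emitted by an earlier, overlapping window.
                if out.last().map_or(true, |&(last, _)| number > last) {
                    out.push((number, lines[j]));
                }
            }
        }
    }
    out
}
```

Passing the same value for `before` and `after` corresponds to the combined `-C N` form.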
Advanced Pattern Matching
Basic Regex Examples
# Find function definitions
# Find standalone TODO comments
# Multiple patterns with word boundaries
# Mix of patterns with different settings
Hyphen and Underscore Handling
# Smart hyphen handling (--hyphen-mode flag)
# Underscore handling (always joins in all modes)
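The difference between the two hyphen modes can be sketched with a small whole-word matcher. This is an illustration under assumptions, not RustScout's implementation: `HyphenMode`, `is_word_char`, and `find_whole_word` are hypothetical names, and the hyphen set is taken from the Unicode hyphen list in this README (U+002D and U+2010 through U+2013).

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum HyphenMode {
    /// Code mode: `test-case` is a single token.
    Joining,
    /// Natural-text mode: `hello-world` is two words.
    Boundary,
}

/// Decides whether `c` counts as part of a word in the given mode.
fn is_word_char(c: char, mode: HyphenMode) -> bool {
    let is_hyphen = matches!(c, '-' | '\u{2010}'..='\u{2013}');
    // Underscores always join; hyphens join only in Joining mode.
    c.is_alphanumeric() || c == '_' || (is_hyphen && mode == HyphenMode::Joining)
}

/// Returns the byte offsets where `needle` occurs as a whole word.
fn find_whole_word(haystack: &str, needle: &str, mode: HyphenMode) -> Vec<usize> {
    let mut hits = Vec::new();
    for (start, _) in haystack.match_indices(needle) {
        let end = start + needle.len();
        let before_ok = haystack[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_word_char(c, mode));
        let after_ok = haystack[end..]
            .chars()
            .next()
            .map_or(true, |c| !is_word_char(c, mode));
        if before_ok && after_ok {
            hits.push(start);
        }
    }
    hits
}
```

With this sketch, searching `test` in `test-case test` finds only the standalone word in joining mode and both occurrences in boundary mode, while `hello` never matches inside `hello_世界` because the underscore joins in every mode.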
Unicode Hyphen Support
# Matches with any hyphen type:
# - ASCII hyphen-minus (U+002D)
# - Unicode hyphen (U+2010)
# - Non-breaking hyphen (U+2011)
# - Figure dash (U+2012)
# - En dash (U+2013)
Regex with Word Boundaries
# Explicit \b in pattern vs. --word-boundary flag
# Pattern with no word boundaries
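The rule "preserve existing `\b` markers, otherwise add them" can be sketched as a small string transformation. The helper names below are hypothetical, and the check is a simplification: it only looks for an unescaped `\b` and does not parse the regex (for example, it does not treat `[\b]` specially).

```rust
/// Returns true if the pattern already contains an unescaped `\b`.
fn has_explicit_boundary(pattern: &str) -> bool {
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // A backslash consumes the next character, so `\\b` (an escaped
            // backslash followed by a literal `b`) is not a boundary marker.
            if chars.next() == Some('b') {
                return true;
            }
        }
    }
    false
}

/// Wraps the pattern in `\b...\b` unless it already carries its own markers.
fn with_word_boundaries(pattern: &str) -> String {
    if has_explicit_boundary(pattern) {
        pattern.to_string()
    } else {
        format!(r"\b{}\b", pattern)
    }
}
```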
Unicode Word Boundaries
# Unicode-aware word boundaries
# Mixed-script identifiers (underscore always joins different scripts)
Incremental Search
# Enable incremental search with default settings
# Specify cache location
# Choose change detection strategy
# Enable cache compression
# Set cache size limit
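As a rough illustration of signature-based change detection, the sketch below treats a file as changed when its size or modification time differs from the cached value. The `FileSignature` layout and the function names are assumptions made for this example; RustScout's actual cache format and signature contents may differ.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Cheap per-file signature: size plus last-modified time (assumed layout).
#[derive(Debug, Clone, PartialEq)]
struct FileSignature {
    len: u64,
    modified: SystemTime,
}

/// Reads the current signature of a file from its metadata.
fn signature(path: &Path) -> io::Result<FileSignature> {
    let meta = fs::metadata(path)?;
    Ok(FileSignature { len: meta.len(), modified: meta.modified()? })
}

/// A file is stale if it is missing from the cache or its signature changed.
fn is_stale(cached: Option<&FileSignature>, current: &FileSignature) -> bool {
    cached != Some(current)
}

/// Returns the files whose cached results can no longer be reused.
fn changed_files(
    cache: &HashMap<PathBuf, FileSignature>,
    files: &[PathBuf],
) -> io::Result<Vec<PathBuf>> {
    let mut changed = Vec::new();
    for path in files {
        let current = signature(path)?;
        if is_stale(cache.get(path), &current) {
            changed.push(path.clone());
        }
    }
    Ok(changed)
}
```

Only the files returned by `changed_files` would need to be searched again; results for everything else could be served from the cache.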
Search and Replace
# Simple text replacement
# Preview changes before applying
# Replace with regex and capture groups
# Complete backup and undo workflow
# Preserve file metadata
# Custom backup directory
# Examples of validation behavior (with descriptive errors)
# File size processing strategies (configurable thresholds)
# Undo system features
Validation and Safety Features
As of v1.1.0, RustScout includes enhanced validation and safety features to ensure reliable replacements. For the full story behind these improvements, see our blog post on the replace module journey. The features fall into four groups:
- Pattern Validation
  - Empty patterns are rejected to prevent accidental mass replacements
  - Regex patterns are validated before execution with clear error messages
  - Capture group references are checked against available groups
  - Overlapping replacements are detected and prevented with line numbers
- File Processing Safety
  - Adaptive processing based on file size:
    - Small files (< 32KB by default): Memory mapping for speed
    - Medium files: Buffered reading with reasonable memory usage
    - Large files (> 10MB by default): Streaming for memory efficiency
  - Configurable thresholds via CLI flags or config file
  - Clear error messages guide you to the right processing strategy
- Backup System
  - Automatic backup creation before modifications
  - Backups stored with timestamps for easy identification
  - Custom backup directory support
  - Metadata preservation option
- Undo System
  - JSON-formatted undo logs for transparency
  - Dry-run option to preview restorations
  - Bulk undo support for reverting multiple changes
  - Chronological ordering of operations
Replace Pipeline:
Input -> Validate -> Backup -> Replace -> Record Undo
|
v
[Restore if needed]
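The pipeline stages can be sketched with an in-memory model. Everything here is hypothetical and simplified for illustration: `Workspace`, `UndoRecord`, and `ReplaceError` are not RustScout types, files are plain strings in a map, and only literal patterns are handled.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum ReplaceError {
    EmptyPattern,
    FileNotFound,
}

#[derive(Debug, PartialEq)]
struct UndoRecord {
    file: String,
    backup_key: String,
}

#[derive(Default)]
struct Workspace {
    files: HashMap<String, String>,   // file name -> contents
    backups: HashMap<String, String>, // backup key -> original contents
    undo_log: Vec<UndoRecord>,
}

impl Workspace {
    /// Input -> Validate -> Backup -> Replace -> Record Undo.
    fn replace(&mut self, file: &str, pattern: &str, replacement: &str) -> Result<usize, ReplaceError> {
        // Validate: an empty pattern would match everywhere.
        if pattern.is_empty() {
            return Err(ReplaceError::EmptyPattern);
        }
        let original = match self.files.get(file) {
            Some(contents) => contents.clone(),
            None => return Err(ReplaceError::FileNotFound),
        };
        let count = original.matches(pattern).count();
        if count == 0 {
            return Ok(0); // Nothing to change, so no backup is needed.
        }
        // Backup: keep the untouched contents under a unique key.
        let backup_key = format!("{}.bak.{}", file, self.undo_log.len());
        self.backups.insert(backup_key.clone(), original.clone());
        // Replace.
        self.files.insert(file.to_string(), original.replace(pattern, replacement));
        // Record undo.
        self.undo_log.push(UndoRecord { file: file.to_string(), backup_key });
        Ok(count)
    }

    /// Restore: put back the contents saved by the most recent replacement.
    fn undo_last(&mut self) -> bool {
        let record = match self.undo_log.pop() {
            Some(record) => record,
            None => return false,
        };
        match self.backups.remove(&record.backup_key) {
            Some(original) => {
                self.files.insert(record.file, original);
                true
            }
            None => false,
        }
    }
}
```

The ordering is the point of the sketch: the backup is written before the file is modified, and the undo record is written only after the replacement succeeds, so every recorded operation has a backup to restore from.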
File Filtering
# Search only Rust files
# Search multiple file types
# Ignore specific patterns
Configuration
RustScout can be configured via a YAML file (`.rustscout.yaml`). Configuration files are loaded from multiple locations, listed here in order of precedence (highest first):
- Custom config file specified via the `--config` flag
- Local `.rustscout.yaml` in the current directory
- Global `$HOME/.config/rustscout/config.yaml`
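The lookup order above can be sketched as a list of candidate paths checked from highest to lowest precedence. This is a simplified model with hypothetical function names; it assumes the first existing file wins, whereas the real loader may merge values from several files.

```rust
use std::path::{Path, PathBuf};

/// Builds the candidate config paths, highest precedence first.
fn config_candidates(cli_config: Option<&Path>, cwd: &Path, home: Option<&Path>) -> Vec<PathBuf> {
    let mut candidates = Vec::new();
    if let Some(path) = cli_config {
        candidates.push(path.to_path_buf()); // --config flag
    }
    candidates.push(cwd.join(".rustscout.yaml")); // local file
    if let Some(home) = home {
        // global file under $HOME
        candidates.push(home.join(".config").join("rustscout").join("config.yaml"));
    }
    candidates
}

/// Picks the first candidate for which `exists` returns true.
/// The predicate is injected so the logic can be tested without touching disk.
fn pick_config<'a>(
    candidates: &'a [PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<&'a PathBuf> {
    candidates.iter().find(|path| exists(path.as_path()))
}
```

In real use, the predicate would typically be `|p| p.is_file()`.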
Example configuration with explanations:
# Search Patterns
# - Support for multiple patterns in a single search
# - Each pattern can be simple text or regex
# - Simple patterns use fast string matching
# - Regex patterns use full regex engine
patterns:
- "TODO" # Simple text pattern
- "FIXME" # Another simple pattern
- "BUG-\\d+" # Regex pattern with number
- text: "test" # Pattern with explicit settings
is_regex: false
boundary_mode: WholeWords # Match whole words only
- text: "address" # Pattern with no boundaries
boundary_mode: None # Match within words too
# Legacy single pattern support (using regex alternation)
pattern: "TODO|FIXME"
# Root directory to search in (default: ".")
root_path: "."
# File Extensions
# - Optional list to include (case-insensitive)
# - If not specified, searches all non-binary files
file_extensions:
- "rs"
- "toml"
# Ignore Patterns
# - Uses .gitignore syntax
# - Supports glob patterns
# - Built-in ignores: .git/, target/
ignore_patterns:
- "target/**" # Ignore target directory
- ".git/**" # Ignore git directory
- "**/*.min.js" # Ignore minified JS
- "invalid.rs" # Ignore any file named invalid.rs
# Ignore Pattern Syntax
RustScout uses a simplified `.gitignore`-like syntax:
- If the pattern **does not contain a slash**, it matches **only** the final file name.
- Example: `invalid.rs` will match **any** file named `invalid.rs` in any directory.
- If the pattern **contains a slash**, it is interpreted as a glob pattern applied to the **entire path**.
- Example: `tests/*.rs` matches `.rs` files in the `tests/` folder only.
- Example: `**/invalid.rs` matches `invalid.rs` anywhere in the directory tree.
### Examples
- `invalid.rs` => Ignores any file with the exact name `invalid.rs`.
- `**/test_*.rs` => Ignores `test_foo.rs`, `test_bar.rs` in **any** subdirectory.
- `docs/*.md` => Ignores `.md` files in `docs/`, but not deeper subdirs like `docs/nested/`.
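The two rules above (no slash: match the file name only; slash: match the whole path) can be illustrated with a small matcher. This is a sketch, not RustScout's implementation: `glob_match` and `is_ignored` are hypothetical names, only `*` and `**` are supported, and paths are assumed to use `/` separators.

```rust
/// Minimal glob matcher: `*` matches within one path segment,
/// `**` matches across segments, everything else is literal.
fn glob_match(pattern: &[u8], text: &[u8]) -> bool {
    match pattern {
        [] => text.is_empty(),
        [b'*', b'*', b'/', rest @ ..] => {
            // `**/` matches zero or more whole directories.
            glob_match(rest, text)
                || text
                    .iter()
                    .enumerate()
                    .any(|(i, &b)| b == b'/' && glob_match(rest, &text[i + 1..]))
        }
        [b'*', b'*', rest @ ..] => (0..=text.len()).any(|i| glob_match(rest, &text[i..])),
        [b'*', rest @ ..] => {
            // `*` may consume characters only up to the next '/'.
            let limit = text.iter().position(|&b| b == b'/').unwrap_or(text.len());
            (0..=limit).any(|i| glob_match(rest, &text[i..]))
        }
        [p, rest @ ..] => text.first() == Some(p) && glob_match(rest, &text[1..]),
    }
}

/// Applies the slash rule: patterns without '/' see only the file name.
fn is_ignored(pattern: &str, path: &str) -> bool {
    if pattern.contains('/') {
        glob_match(pattern.as_bytes(), path.as_bytes())
    } else {
        let file_name = path.rsplit('/').next().unwrap_or(path);
        glob_match(pattern.as_bytes(), file_name.as_bytes())
    }
}
```

Under these assumptions, `is_ignored("docs/*.md", "docs/nested/a.md")` is false because `*` stops at the directory separator, while `is_ignored("invalid.rs", "src/deep/invalid.rs")` is true because only the file name is compared.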
# Performance Settings
stats_only: false # Show only statistics
thread_count: 4 # Number of threads (default: CPU cores)
# Logging
log_level: "info" # trace, debug, info, warn, error
# Context Lines
context_before: 2 # Lines before matches
context_after: 2 # Lines after matches
# Incremental Search
incremental: false # Enable incremental search
cache_path: ".rustscout/cache.json"
cache_strategy: "auto" # "auto", "git", or "signature"
max_cache_size: "100MB" # Optional size limit
use_compression: false # Enable cache compression
# File Size Processing Strategies
processing:
small_file_threshold: 32KB # Default: 32KB
large_file_threshold: 10MB # Default: 10MB
# Undo System
undo:
- "old_api"
- "new_api"
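Size values in the configuration above, such as `32KB`, `10MB`, and `100MB`, have to be turned into byte counts at some point. The sketch below shows one way to do that; `parse_size` is a hypothetical helper, and treating `KB` as 1024 bytes (rather than 1000) is an assumption, since the README does not say which convention RustScout uses.

```rust
/// Parses strings like "32KB" or "10MB" into a number of bytes.
/// Units are assumed to be binary multiples (1 KB = 1024 bytes).
fn parse_size(text: &str) -> Option<u64> {
    let text = text.trim();
    // Split at the first character that is not an ASCII digit.
    let split = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    let (digits, unit) = text.split_at(split);
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match unit.trim().to_ascii_uppercase().as_str() {
        "" | "B" => 1,
        "KB" => 1024,
        "MB" => 1024 * 1024,
        "GB" => 1024 * 1024 * 1024,
        _ => return None, // Unknown unit
    };
    // Reject values that would overflow u64.
    value.checked_mul(multiplier)
}
```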
Command-Line Options
<PATTERN> Pattern
<FILES>...
Library Usage
RustScout can also be used as a library in your Rust projects:
[]
= "0.1.0"
Search Example
use ;
use NonZeroUsize;
use PathBuf;
Replace Example
use ;
use PathBuf;
Performance & Benchmarks
Performance comparison with other popular search tools (searching a large Rust codebase):
| Tool | Time (ms) | Memory (MB) |
|---|---|---|
| RustScout | 120 | 15 |
| ripgrep | 150 | 18 |
| grep | 450 | 12 |
Note: These are example benchmarks. Actual performance may vary based on the specific use case and system configuration.
Adaptive Processing Strategies
RustScout employs different processing strategies based on file size:
- Small Files (<32KB):
  - Direct string operations
  - ~330 µs for simple patterns
  - ~485 µs for regex patterns
  - Optimal for quick access to small files
- Medium Files (32KB - 10MB):
  - Buffered reading
  - ~1.4 ms for 10 files
  - ~696 µs for 50 files (parallel processing)
  - Good balance of memory usage and performance
- Large Files (>10MB):
  - Memory mapping for efficient access
  - ~2.1 ms for 20MB file with simple pattern
  - ~3.2 ms for 20MB file with regex pattern
  - Parallel pattern matching within files
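Choosing a strategy comes down to comparing the file size against two thresholds. The sketch below shows only that classification step, using the default thresholds named in this README; `SizeClass`, `Thresholds`, and `classify` are hypothetical names, and the I/O strategy attached to each class is left out.

```rust
const KB: u64 = 1024;
const MB: u64 = 1024 * KB;

#[derive(Debug, PartialEq, Clone, Copy)]
enum SizeClass {
    Small,
    Medium,
    Large,
}

/// Configurable size thresholds, in bytes.
struct Thresholds {
    small: u64,
    large: u64,
}

impl Default for Thresholds {
    fn default() -> Self {
        // Defaults taken from the README: 32KB and 10MB.
        Thresholds { small: 32 * KB, large: 10 * MB }
    }
}

/// Small is strictly below `small`, Large is strictly above `large`,
/// and everything in between (bounds included) is Medium.
fn classify(len: u64, thresholds: &Thresholds) -> SizeClass {
    if len < thresholds.small {
        SizeClass::Small
    } else if len > thresholds.large {
        SizeClass::Large
    } else {
        SizeClass::Medium
    }
}
```

A caller would typically obtain `len` from `std::fs::metadata(path)?.len()` and dispatch on the returned class.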
Pattern Optimization
- Pattern Caching:
  - Global thread-safe pattern cache
  - Simple patterns: ~331 µs (consistent performance)
  - Regex patterns: ~483 µs (~0.7% improvement)
  - Zero-cost abstraction when no contention
- Smart Pattern Detection:
  - Automatic detection of pattern complexity
  - Simple string matching for basic patterns
  - Full regex support for complex patterns
  - Threshold-based optimization
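The combination of pattern detection and a shared cache can be sketched as follows. This is an assumption-heavy illustration: the metacharacter check in `is_simple_pattern` is a guess at what "simple" means, `Matcher` only records which engine would be used instead of holding a compiled regex, and the cache layout is hypothetical. It relies on `std::sync::OnceLock`, available since Rust 1.70.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Which engine a pattern would be handed to.
#[derive(Debug, PartialEq)]
enum Matcher {
    Literal(String),
    Regex(String),
}

/// A pattern is treated as simple if it contains no regex metacharacters.
fn is_simple_pattern(pattern: &str) -> bool {
    !pattern.chars().any(|c| {
        matches!(
            c,
            '\\' | '.' | '*' | '+' | '?' | '(' | ')' | '[' | ']' | '{' | '}' | '^' | '$' | '|'
        )
    })
}

/// Returns a shared matcher for `pattern`, building it once per process.
fn cached_matcher(pattern: &str) -> Arc<Matcher> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<Matcher>>>> = OnceLock::new();
    let mut cache = CACHE
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .unwrap();
    cache
        .entry(pattern.to_string())
        .or_insert_with(|| {
            Arc::new(if is_simple_pattern(pattern) {
                Matcher::Literal(pattern.to_string())
            } else {
                Matcher::Regex(pattern.to_string())
            })
        })
        .clone()
}
```

Repeated calls with the same pattern return the same `Arc`, so worker threads can share one matcher instead of rebuilding it per file.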
Memory Usage Tracking
RustScout now includes comprehensive memory metrics:
- Total allocated memory and peak usage
- Memory mapped regions for large files
- Pattern cache size and hit/miss rates
- File processing statistics by size category
Performance Tips
- Use Simple Patterns when possible:
  # Faster - uses optimized literal search
  # Slower - requires regex engine
- Control Thread Count based on your system:
  # Use all available cores (default)
  # Limit to 4 threads for lower CPU usage
- Filter File Types to reduce search space:
  # Search only Rust and TOML files
- Monitor Memory Usage:
  # Show memory usage statistics
Troubleshooting
Common Issues and Solutions
- Pattern Not Found
  - Issue: Search returns no results
  - Solutions:
    - Check whether the search is case-sensitive
    - Verify file extensions are correctly specified
    - Check that ignore patterns aren't too broad
- Performance Issues
  - Issue: Search is slower than expected
  - Solutions:
    - Use simple patterns instead of complex regex
    - Adjust thread count with `--threads`
    - Filter specific file types with `--extensions`
    - Check if searching binary files (use `--stats-only` to verify)
- Permission Errors
  - Issue: "Permission denied" errors
  - Solutions:
    - Run with appropriate permissions
    - Check file and directory access rights
    - Use ignore patterns to skip problematic directories
- Invalid Regex Pattern
  - Issue: "Invalid regex pattern" error
  - Solutions:
    - Escape special characters: `\.`, `\*`, `\+`
    - Escape backslashes in Windows paths: `\\path\\to\\file`
    - Verify regex syntax at regex101.com
- Memory Usage
  - Issue: High memory consumption
  - Solutions:
    - Use `--stats-only` for large codebases
    - Filter specific file types
    - Adjust thread count to limit concurrency
Error Messages
| Error Message | Cause | Solution |
|---|---|---|
| Error: Invalid regex pattern | Malformed regex expression | Check regex syntax and escape special characters |
| Error: Permission denied | Insufficient file permissions | Run with appropriate permissions or ignore problematic paths |
| Error: File too large | File exceeds size limit | Use --stats-only or filter by file type |
| Error: Invalid thread count | Invalid `--threads` value | Use a positive number within system limits |
| Error: Invalid file extension | Malformed extension filter | Use comma-separated list without spaces |
Development Process
RustScout represents an interesting experiment in AI-assisted software development. The entire codebase was primarily developed through collaboration with AI language models, with human oversight focusing on:
- Project direction and requirements
- Design decisions and architecture
- Testing and validation
- User experience
This approach demonstrates how AI can be leveraged to:
- Bootstrap complex software projects
- Implement best practices and patterns
- Handle sophisticated technical implementations
- Maintain consistency across a growing codebase
The project serves as a case study in AI-driven development, showing that with proper orchestration, AI can produce production-quality code while adhering to language idioms and best practices. Notably, this was achieved without the human overseer having prior Rust experience, illustrating how AI can bridge the gap between concept and implementation.
Development Principles
- AI handles implementation details
- Humans focus on high-level direction
- Continuous testing and validation
- Emphasis on maintainable, documented code
- Regular review and refinement cycles
This transparent approach to development aims to:
- Demonstrate the capabilities of AI-assisted development
- Provide insights into new software development methodologies
- Encourage discussion about AI's role in software engineering
- Show how AI can complement human expertise
We welcome contributions and discussions about both the codebase and the development methodology.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. See CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.