scribe-patterns
Advanced pattern matching and filtering for Scribe repository analysis.
Overview
scribe-patterns provides pattern matching for file selection, filtering, and search operations. It handles glob patterns, regex matching, .gitignore semantics, and custom ignore rules, with a focus on performance and correct edge-case handling.
Key Features
Glob Pattern Matching
- Standard glob syntax: *, **, ?, [abc], {a,b,c}
- Directory-aware matching: Handles **/ for recursive directory traversal
- Negative patterns: !pattern to exclude specific files
- Case sensitivity control: Case-insensitive matching on Windows by default
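The wildcard rules above can be illustrated with a small hand-rolled matcher. This is only a sketch using the standard library, not the crate's implementation (which is described as building on globset); the names glob_match and match_from are hypothetical. It covers *, ?, and ** only, and leaves out [abc], {a,b,c}, and negation.

```rust
/// Returns true if `path` matches `pattern`.
/// `*` matches any run of non-`/` characters, `?` matches one non-`/`
/// character, and `**/` matches zero or more whole directories.
fn glob_match(pattern: &str, path: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = path.chars().collect();
    match_from(&p, &s)
}

fn match_from(p: &[char], s: &[char]) -> bool {
    match p.first() {
        // Pattern exhausted: succeed only if the path is exhausted too.
        None => s.is_empty(),
        Some('*') if p.get(1) == Some(&'*') => {
            if p.get(2) == Some(&'/') {
                // `**/`: only resume matching at the start of a path segment.
                (0..=s.len())
                    .any(|i| (i == 0 || s[i - 1] == '/') && match_from(&p[3..], &s[i..]))
            } else {
                // Bare `**`: any run of characters, separators included.
                (0..=s.len()).any(|i| match_from(&p[2..], &s[i..]))
            }
        }
        Some('*') => {
            // `*`: consume zero or more characters, but never a `/`.
            for i in 0..=s.len() {
                if match_from(&p[1..], &s[i..]) {
                    return true;
                }
                if i < s.len() && s[i] == '/' {
                    break;
                }
            }
            false
        }
        Some('?') => !s.is_empty() && s[0] != '/' && match_from(&p[1..], &s[1..]),
        Some(c) => s.first() == Some(c) && match_from(&p[1..], &s[1..]),
    }
}
```

The recursive form is easy to read but backtracks; a compiled matcher avoids that cost for large pattern sets.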
Gitignore Semantics
- .gitignore parsing: Full compatibility with Git's ignore rules
- Directory negation: Properly handles ! negation patterns
- Relative vs absolute paths: Distinguishes /pattern from pattern
- Trailing slashes: Directory-only patterns with /
- Comment support: Lines starting with # are ignored
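As a rough illustration of the rules listed above, a single ignore-file line could be classified as follows. The IgnoreRule type and parse_ignore_line function are hypothetical names for this sketch, and it simplifies Git's behavior: only a leading / is treated as anchoring, and escaped characters are not handled.

```rust
/// One parsed line from an ignore file (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct IgnoreRule {
    pattern: String,
    negated: bool,  // line started with `!`
    anchored: bool, // pattern started with `/`
    dir_only: bool, // pattern ended with `/`
}

/// Parses one line; returns None for blank lines and `#` comments.
fn parse_ignore_line(line: &str) -> Option<IgnoreRule> {
    let line = line.trim_end();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // A leading `!` marks a negation (re-include) rule.
    let (negated, rest) = match line.strip_prefix('!') {
        Some(r) => (true, r),
        None => (false, line),
    };
    // A trailing `/` restricts the rule to directories.
    let (dir_only, rest) = match rest.strip_suffix('/') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    // A leading `/` anchors the rule to the ignore file's directory.
    let (anchored, rest) = match rest.strip_prefix('/') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    if rest.is_empty() {
        return None;
    }
    Some(IgnoreRule { pattern: rest.to_string(), negated, anchored, dir_only })
}
```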
Custom Ignore Files
- .scribeignore: Scribe-specific ignore patterns
- Multiple ignore files: Hierarchical ignore file processing
- Override precedence: Later patterns override earlier ones
- Inheritance: Child directories inherit parent ignore rules
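The precedence and inheritance rules above amount to "the last matching rule wins, and child rules are appended after parent rules". A minimal sketch of that idea follows; OrderedRule, rule, inherit, and is_ignored are hypothetical names, and rules match by plain suffix here so the example stays focused on ordering rather than glob semantics.

```rust
#[derive(Clone)]
struct OrderedRule {
    /// Simplification: the rule matches any path ending with this text.
    suffix: String,
    /// True for a `!pattern` rule that re-includes a path.
    negated: bool,
}

/// Builds a rule from text, treating a leading `!` as negation.
fn rule(text: &str) -> OrderedRule {
    match text.strip_prefix('!') {
        Some(rest) => OrderedRule { suffix: rest.to_string(), negated: true },
        None => OrderedRule { suffix: text.to_string(), negated: false },
    }
}

/// Child directories inherit parent rules; child rules come later,
/// so they take precedence.
fn inherit(parent: &[OrderedRule], child: &[OrderedRule]) -> Vec<OrderedRule> {
    parent.iter().chain(child.iter()).cloned().collect()
}

/// Scans rules from last to first; the first hit decides the outcome.
fn is_ignored(rules: &[OrderedRule], path: &str) -> bool {
    rules
        .iter()
        .rev()
        .find(|r| path.ends_with(r.suffix.as_str()))
        .map(|r| !r.negated)
        .unwrap_or(false)
}
```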
Performance Optimizations
- Compiled pattern sets: Pre-compile globs into efficient matchers
- Aho-Corasick for literals: Fast multi-pattern matching for literal strings
- Regex caching: Compiled regex patterns are cached
- Early returns: Short-circuit evaluation for common cases
Architecture
Pattern Input → Parser    → Compiled Matcher → Match Engine
      ↓           ↓               ↓                 ↓
Glob/Regex     Validate     globset/regex     Apply to Paths
Strings        Syntax       Compilation       Fast Matching
Core Components
PatternSet
Collection of patterns with unified matching interface:
- Globs: File name patterns like *.rs, **/*.py
- Regex: Complex patterns using regular expressions
- Literals: Exact string matches (optimized with Aho-Corasick)
- Negations: Exclude patterns that override includes
IgnoreBuilder
Constructs ignore rule sets from multiple sources:
- .gitignore files: Standard Git ignore semantics
- .scribeignore files: Scribe-specific patterns
- Custom patterns: Programmatically added rules
- Precedence handling: Correct override behavior
PathMatcher
Efficient path matching against pattern sets:
- Compiled matchers: Pre-compiled globset for performance
- Path normalization: Handles Windows vs Unix path separators
- Absolute vs relative: Correct matching for both path types
- Directory detection: Special handling for directory patterns
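Path normalization is the step that lets one pattern set work for both Windows and Unix style paths. The sketch below shows one plausible approach using only the standard library; normalize_separators is a hypothetical name, and treating every backslash as a separator is an assumption (on Unix a backslash is a legal file name character).

```rust
/// Converts backslashes to `/`, collapses repeated separators,
/// and drops a leading `./` so relative paths compare consistently.
fn normalize_separators(path: &str) -> String {
    let unified = path.replace('\\', "/");
    let mut out = String::with_capacity(unified.len());
    let mut prev_slash = false;
    for ch in unified.chars() {
        // Skip a separator that directly follows another separator.
        if ch == '/' && prev_slash {
            continue;
        }
        prev_slash = ch == '/';
        out.push(ch);
    }
    if let Some(rest) = out.strip_prefix("./") {
        return rest.to_string();
    }
    out
}
```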
PatternParser
Parses and validates pattern syntax:
- Glob expansion: Converts globs to regex when needed
- Escape sequence handling: Properly handles \*, \?, etc.
- Error reporting: Clear error messages for invalid patterns
- Syntax validation: Detects malformed patterns early
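To make the "glob to regex" step concrete, here is a hedged sketch of such a translation. The function names glob_to_regex and push_literal are hypothetical, only *, **, ?, and backslash escapes are handled, and the output is a regex source string rather than a compiled pattern. Note that in this simplified form **/ requires at least one directory separator.

```rust
/// Translates a simple glob into an anchored regex source string.
fn glob_to_regex(glob: &str) -> String {
    let mut re = String::from("^");
    let mut chars = glob.chars().peekable();
    while let Some(ch) = chars.next() {
        match ch {
            // `**` may cross directory separators.
            '*' if chars.peek() == Some(&'*') => {
                chars.next();
                re.push_str(".*");
            }
            // `*` and `?` stay within one path segment.
            '*' => re.push_str("[^/]*"),
            '?' => re.push_str("[^/]"),
            // A backslash makes the next character literal.
            '\\' => {
                if let Some(next) = chars.next() {
                    push_literal(&mut re, next);
                }
            }
            other => push_literal(&mut re, other),
        }
    }
    re.push('$');
    re
}

/// Appends `ch`, escaping it if it is a regex metacharacter.
fn push_literal(re: &mut String, ch: char) {
    if r"\.+*?()|[]{}^$".contains(ch) {
        re.push('\\');
    }
    re.push(ch);
}
```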
Usage
Basic Glob Matching
use ;
let patterns = from_globs?;
let matcher = new;
assert!;
assert!;
assert!; // Negated
Gitignore-Style Filtering
use IgnoreBuilder;
let mut builder = new;
builder.add_gitignore?;
builder.add_custom?; // Exclude Rust build directory
builder.add_custom?; // But include this file
let ignore = builder.build?;
for entry in new
Multiple Pattern Sets
use ;
// Include patterns
let include = from_globs?;
// Exclude patterns
let exclude = from_globs?;
let matcher = new
.include
.exclude;
// File must match include AND not match exclude
if matcher.should_include
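The include-and-not-exclude rule can be written out explicitly. The sketch below is not the crate's matcher: IncludeExclude is a hypothetical type, and it uses suffix and substring checks in place of compiled globs so the boolean logic is the only thing on display.

```rust
/// Hypothetical stand-in for a matcher with include and exclude sets.
struct IncludeExclude {
    include: Vec<String>, // path suffixes to accept, e.g. ".rs"
    exclude: Vec<String>, // path fragments to reject, e.g. "target/"
}

impl IncludeExclude {
    /// A path is selected only if it matches some include rule
    /// and matches no exclude rule.
    fn should_include(&self, path: &str) -> bool {
        let excluded = self.exclude.iter().any(|part| path.contains(part.as_str()));
        if excluded {
            return false; // early return: exclusion always wins
        }
        self.include.iter().any(|ext| path.ends_with(ext.as_str()))
    }
}
```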
Regex Patterns
use PatternSet;
let patterns = from_regex?;
assert!;
assert!;
Case-Insensitive Matching
use ;
let patterns = from_globs?;
let options = MatchOptions ;
let matcher = new.with_options;
assert!; // Matches *.Md
assert!; // Matches *.TXT
Pattern Syntax
Glob Patterns
| Pattern | Matches | Example |
|---|---|---|
| * | Any string (not /) | *.rs → main.rs, lib.rs |
| ** | Any path segment | **/*.py → a/b/c.py |
| ? | Single character | ?.txt → a.txt, 1.txt |
| [abc] | Character set | [abc].rs → a.rs, b.rs |
| {a,b} | Alternatives | *.{rs,py} → main.rs, util.py |
| !pattern | Negation | !test*.py → exclude test files |
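The {a,b} row can be understood as expanding one pattern into several simpler ones before matching. A minimal sketch of that expansion follows; expand_braces is a hypothetical helper, nested braces are not supported, and whether the crate expands alternatives this way or handles them inside the compiled matcher is not stated here.

```rust
/// Expands the first `{a,b,...}` group and recurses on the results,
/// so `*.{rs,py}` becomes `*.rs` and `*.py`.
fn expand_braces(pattern: &str) -> Vec<String> {
    // No complete brace group: the pattern is returned unchanged.
    let Some(open) = pattern.find('{') else {
        return vec![pattern.to_string()];
    };
    let Some(close_rel) = pattern[open..].find('}') else {
        return vec![pattern.to_string()];
    };
    let close = open + close_rel;
    let (prefix, suffix) = (&pattern[..open], &pattern[close + 1..]);
    pattern[open + 1..close]
        .split(',')
        .flat_map(|alt| expand_braces(&format!("{prefix}{alt}{suffix}")))
        .collect()
}
```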
Gitignore Rules
| Pattern | Behavior |
|---|---|
| pattern | Matches in any directory |
| /pattern | Matches only at root |
| dir/ | Matches directory only |
| !pattern | Negates previous patterns |
| #comment | Ignored line |
Special Cases
- Empty patterns: Ignored (no effect)
- Whitespace: Leading/trailing whitespace is trimmed
- Backslash escapes: \* matches literal *
- Unicode: Full UTF-8 support for paths and patterns
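These special cases mostly concern how raw pattern text is preprocessed. The following sketch shows one way to apply them while tokenizing; the Token enum and tokenize function are hypothetical and cover only *, ?, and backslash escapes.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Literal(char), // matches exactly this character
    AnyRun,        // `*`
    AnyOne,        // `?`
}

/// Trims surrounding whitespace, returns None for empty patterns,
/// and turns `\x` into a literal `x`.
fn tokenize(raw: &str) -> Option<Vec<Token>> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return None; // empty patterns have no effect
    }
    let mut tokens = Vec::new();
    // Iterating over chars keeps multi-byte UTF-8 characters intact.
    let mut chars = trimmed.chars();
    while let Some(ch) = chars.next() {
        let token = match ch {
            // A trailing lone backslash is kept as a literal backslash.
            '\\' => Token::Literal(chars.next().unwrap_or('\\')),
            '*' => Token::AnyRun,
            '?' => Token::AnyOne,
            other => Token::Literal(other),
        };
        tokens.push(token);
    }
    Some(tokens)
}
```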
Performance
Benchmarks
Pattern compilation and matching are optimized for typical workloads:
- Glob compilation: <1ms for typical pattern sets (10-50 patterns)
- Path matching: <1μs per path for compiled matchers
- Literal matching: <100ns using Aho-Corasick for large literal sets
- Regex matching: ~1-10μs depending on pattern complexity
Optimizations
- Lazy compilation: Patterns compiled only when first used
- Caching: Compiled matchers cached in OnceCell
- Fast paths: Literal string matching before expensive regex
- Set operations: Boolean algebra simplification for pattern sets
- Aho-Corasick: Multi-pattern matching for literals in O(n) time
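Lazy compilation with a cached result can be sketched with the standard library's OnceCell. LazySuffixSet is a hypothetical type, and its "compiled" form is just a list of suffixes standing in for a real compiled matcher; the counter exists only to show that compilation runs once.

```rust
use std::cell::{Cell, OnceCell};

struct LazySuffixSet {
    raw: Vec<String>,
    compiled: OnceCell<Vec<String>>, // filled on first use
    compile_count: Cell<usize>,      // instrumentation for the example
}

impl LazySuffixSet {
    fn new(raw: &[&str]) -> Self {
        Self {
            raw: raw.iter().map(|s| s.to_string()).collect(),
            compiled: OnceCell::new(),
            compile_count: Cell::new(0),
        }
    }

    /// Compiles on the first call and returns the cached value afterwards.
    fn compiled(&self) -> &[String] {
        self.compiled.get_or_init(|| {
            self.compile_count.set(self.compile_count.get() + 1);
            // Stand-in for compilation: reduce `*.ext` to the suffix `.ext`.
            self.raw.iter().map(|s| s.trim_start_matches('*').to_string()).collect()
        })
    }

    fn is_match(&self, path: &str) -> bool {
        self.compiled().iter().any(|suffix| path.ends_with(suffix.as_str()))
    }
}
```

For a matcher shared across threads, std::sync::OnceLock offers the same pattern with thread safety.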
Configuration
MatchOptions
| Field | Type | Default | Description |
|---|---|---|---|
| case_sensitive | bool | Platform | Match case-sensitively |
| require_literal_separator | bool | true | * doesn't match / |
| require_literal_leading_dot | bool | true | * doesn't match .hidden |
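The defaults in this table could be expressed as a Default implementation. The struct below reuses the field names from the table but is a standalone sketch, not the crate's definition; reading "Platform" as "case-insensitive only on Windows" is an assumption, and star_can_consume is a hypothetical helper showing how the two require_* flags constrain *.

```rust
#[derive(Debug, Clone, PartialEq)]
struct MatchOptions {
    case_sensitive: bool,
    require_literal_separator: bool,
    require_literal_leading_dot: bool,
}

impl Default for MatchOptions {
    fn default() -> Self {
        Self {
            // Assumption: "Platform" means case-insensitive on Windows only.
            case_sensitive: !cfg!(windows),
            require_literal_separator: true,
            require_literal_leading_dot: true,
        }
    }
}

/// Decides whether `*` may consume `ch` under the given options.
/// `at_segment_start` is true when `ch` is the first character of a path segment.
fn star_can_consume(opts: &MatchOptions, ch: char, at_segment_start: bool) -> bool {
    if ch == '/' && opts.require_literal_separator {
        return false;
    }
    if ch == '.' && at_segment_start && opts.require_literal_leading_dot {
        return false;
    }
    true
}
```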
IgnoreOptions
| Field | Type | Default | Description |
|---|---|---|---|
| hidden | bool | true | Ignore hidden files (.file) |
| parents | bool | true | Check parent .gitignore files |
| git_global | bool | false | Use Git global ignore |
| git_exclude | bool | false | Use .git/info/exclude |
Error Handling
All pattern operations return Result<T, PatternError>.
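The variants of PatternError are not listed in this document, so the following is only an illustration of the Result-based style. The two variants and the validate_pattern function are hypothetical, chosen to show how a malformed pattern might be reported early.

```rust
use std::fmt;

/// Hypothetical error type; the real PatternError variants may differ.
#[derive(Debug, PartialEq)]
enum PatternError {
    UnclosedBracket { pattern: String },
    UnclosedBrace { pattern: String },
}

impl fmt::Display for PatternError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PatternError::UnclosedBracket { pattern } => {
                write!(f, "unclosed `[` in pattern `{pattern}`")
            }
            PatternError::UnclosedBrace { pattern } => {
                write!(f, "unclosed `{{` in pattern `{pattern}`")
            }
        }
    }
}

impl std::error::Error for PatternError {}

/// Checks that `[` and `{` groups are closed, skipping escaped characters.
fn validate_pattern(pattern: &str) -> Result<&str, PatternError> {
    let (mut in_bracket, mut in_brace) = (false, false);
    let mut chars = pattern.chars();
    while let Some(ch) = chars.next() {
        match ch {
            '\\' => {
                chars.next(); // skip the escaped character
            }
            '[' => in_bracket = true,
            ']' => in_bracket = false,
            '{' => in_brace = true,
            '}' => in_brace = false,
            _ => {}
        }
    }
    if in_bracket {
        Err(PatternError::UnclosedBracket { pattern: pattern.to_string() })
    } else if in_brace {
        Err(PatternError::UnclosedBrace { pattern: pattern.to_string() })
    } else {
        Ok(pattern)
    }
}
```

Callers can then propagate failures with the ? operator, as the usage examples in this document do.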
Integration
scribe-patterns is used throughout Scribe:
- scribe-scanner: Filters files during repository traversal
- scribe-analysis: Selects files for AST parsing
- scribe-selection: Applies include/exclude rules to selection
- CLI: Processes --include and --exclude flags
See Also
- scribe-scanner: Repository scanning and filtering
- scribe-selection: File selection using patterns
- scribe-core: Shared types and configuration
- globset documentation: Underlying glob implementation