PatternHunt - Advanced File Globbing Library for Rust
A high-performance, feature-rich glob pattern matching library for Rust, designed for efficient and flexible file path matching with both synchronous and asynchronous APIs.
Overview
This library provides robust glob pattern matching capabilities with support for advanced features such as brace expansion, extended glob patterns (extglobs), regex integration, and metadata-based filtering. It is optimized for performance with caching mechanisms for compiled patterns and filesystem metadata, making it suitable for both small-scale and large-scale filesystem operations.
The library is built with safety and security in mind, including protections against path traversal, symlink cycles, and excessive resource usage (e.g., preventing ReDoS attacks and stack overflows).
Features
- Brace Expansion: Supports nested brace patterns (e.g., file.{txt,md}) and numeric ranges (e.g., test{1..3}).
- Extended Glob Patterns: Handles advanced glob patterns like @(pattern), *(pattern), +(pattern), ?(pattern), and !(pattern) using the micromatch module.
- Regex Integration: Supports explicit regex patterns prefixed with re: and converts complex glob patterns to regex when needed.
- Synchronous and Asynchronous APIs: Provides both sync and async globbing functions for flexible integration into different application types.
- Metadata Filtering: Allows filtering of matched files based on size, file type, and timestamps using the Predicates struct.
- Caching: Implements LRU caching for compiled glob patterns, regexes, and filesystem metadata to improve performance.
- Security Features:
  - Path traversal protection.
  - Symlink cycle detection.
  - Configurable symlink following behavior.
  - Limits on brace expansion depth and count to prevent DoS attacks.
  - Regex complexity checks to prevent ReDoS attacks.
- Configurable Options: Fine-grained control over globbing behavior through GlobOptions, including case sensitivity, maximum depth, and concurrency limits.
- Performance Monitoring: Provides cache metrics (hits, misses, evictions) for performance tuning.
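To make the brace expansion feature and its expansion-count limit concrete, here is a minimal standard-library sketch. It is not the library's implementation: the function names and the MAX_EXPANSIONS value are assumptions chosen for illustration, and unlike a shell, this sketch expands a single-alternative group such as {a} to a.

```rust
/// Cap on the total number of expanded strings (illustrative value, an assumption).
const MAX_EXPANSIONS: usize = 1_000;

/// Expands brace groups in `pattern`, or returns `None` once the result
/// would exceed `MAX_EXPANSIONS`. Hypothetical helper, not the library API.
pub fn expand_braces(pattern: &str) -> Option<Vec<String>> {
    let mut out = Vec::new();
    expand_into(pattern, &mut out)?;
    Some(out)
}

/// Pushes one finished string, enforcing the expansion-count limit.
fn push_limited(out: &mut Vec<String>, s: &str) -> Option<()> {
    if out.len() >= MAX_EXPANSIONS {
        return None;
    }
    out.push(s.to_string());
    Some(())
}

fn expand_into(pattern: &str, out: &mut Vec<String>) -> Option<()> {
    // No brace left: the pattern is fully expanded.
    let Some(open) = pattern.find('{') else {
        return push_limited(out, pattern);
    };
    // Find the `}` that matches the first `{`, tracking nesting depth.
    let mut depth = 0usize;
    let mut close = None;
    for (i, c) in pattern[open..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    close = Some(open + i);
                    break;
                }
            }
            _ => {}
        }
    }
    // Unbalanced brace: keep the pattern as a literal.
    let Some(close) = close else {
        return push_limited(out, pattern);
    };
    let prefix = &pattern[..open];
    let body = &pattern[open + 1..close];
    let suffix = &pattern[close + 1..];
    // Substitute each alternative, then recurse to expand the remaining groups.
    for alt in alternatives(body)? {
        expand_into(&format!("{prefix}{alt}{suffix}"), out)?;
    }
    Some(())
}

/// Splits a group body into alternatives: a numeric range or a comma list.
fn alternatives(body: &str) -> Option<Vec<String>> {
    // Numeric range such as `1..3` (descending ranges are allowed).
    if let Some((lo, hi)) = body.split_once("..") {
        if let (Ok(lo), Ok(hi)) = (lo.parse::<i64>(), hi.parse::<i64>()) {
            // Reject oversized ranges before allocating anything.
            if lo.abs_diff(hi) >= MAX_EXPANSIONS as u64 {
                return None;
            }
            let nums: Vec<i64> = if lo <= hi {
                (lo..=hi).collect()
            } else {
                (hi..=lo).rev().collect()
            };
            return Some(nums.iter().map(|n| n.to_string()).collect());
        }
    }
    // Split on commas that are not inside a nested group.
    let mut parts = Vec::new();
    let (mut depth, mut start) = (0usize, 0usize);
    for (i, c) in body.char_indices() {
        match c {
            '{' => depth += 1,
            '}' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(body[start..i].to_string());
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(body[start..].to_string());
    Some(parts)
}
```

Calling expand_braces("file.{txt,md}") yields file.txt and file.md, while a pattern such as {1..2000} is rejected because it exceeds the assumed cap.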
Installation
Add the following to your Cargo.toml:
[dependencies]
<crate-name> = { git = "https://github.com/username/glob-lib.git" }
Ensure you have the required dependencies installed, including globset, regex, lru, once_cell, camino, walkdir, and tokio (for async features).
Usage
Basic Example (Synchronous)
use ;
use Utf8PathBuf;
Asynchronous Example
use ;
use StreamExt;
async
Advanced Example with Predicates
use ;
use ;
Modules
- brace.rs: Handles brace expansion with support for nested braces and numeric ranges, with protections against excessive recursion and expansion counts.
- cache.rs: Implements LRU caching for compiled glob patterns and regexes, with TTL-based expiration and performance metrics.
- micromatch.rs: Converts extended glob patterns to regex, supporting features like character classes and extglobs.
- mod.rs: Core pattern compilation and matching logic, integrating brace expansion, regex, and glob patterns.
- batch_io.rs: Provides efficient filesystem metadata access with caching and symlink handling.
- error.rs: Defines comprehensive error types for all glob operations.
- async_glob.rs: Implements asynchronous globbing with a streaming API and bounded concurrency.
- options.rs: Configures globbing behavior with a builder pattern for flexible customization.
- predicates.rs: Filters files based on metadata attributes like size, type, and timestamps.
- sync.rs: Implements synchronous globbing using WalkDir for efficient directory traversal.
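As a rough illustration of the extglob-to-regex conversion described for micromatch.rs, the sketch below translates a small subset of glob syntax into a regex source string using only the standard library. The function name and the exact translation rules are assumptions, not the module's actual code. The negation form !(pattern) is deliberately left unhandled here, since a direct translation would need look-ahead, which the regex crate does not support.

```rust
/// Translates a glob containing extglob groups into a regex source string.
/// Returns `None` for input this sketch does not handle: `!(...)` groups
/// and unbalanced parentheses. Hypothetical helper, not the library API.
pub fn extglob_to_regex(glob: &str) -> Option<String> {
    let chars: Vec<char> = glob.chars().collect();
    let mut out = String::from("^");
    // For every open group, the text to emit when its `)` is reached.
    let mut closers: Vec<&'static str> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let next_is_paren = chars.get(i + 1) == Some(&'(');
        match c {
            // Extglob openers: `@(`, `*(`, `+(`, `?(`.
            '@' | '*' | '+' | '?' if next_is_paren => {
                closers.push(match c {
                    '@' => ")",   // exactly one alternative
                    '*' => ")*",  // zero or more
                    '+' => ")+",  // one or more
                    _ => ")?",    // zero or one
                });
                out.push_str("(?:");
                i += 2; // skip the operator and the `(`
                continue;
            }
            // Negation is out of scope for this sketch.
            '!' if next_is_paren => return None,
            // Inside a group, `)` closes it and `|` separates alternatives.
            ')' if !closers.is_empty() => out.push_str(closers.pop()?),
            '|' if !closers.is_empty() => out.push('|'),
            // Plain wildcards never cross a path separator.
            '*' => out.push_str("[^/]*"),
            '?' => out.push_str("[^/]"),
            // Everything else is a literal; escape regex metacharacters.
            _ => {
                if "\\.+*?()|[]{}^$".contains(c) {
                    out.push('\\');
                }
                out.push(c);
            }
        }
        i += 1;
    }
    // A group that was opened but never closed is rejected.
    if !closers.is_empty() {
        return None;
    }
    out.push('$');
    Some(out)
}
```

The returned string could then be handed to a regex engine for compilation; keeping the translation separate from compilation makes it easy to apply a complexity check in between.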
Error Handling
The library uses a comprehensive GlobError enum to handle errors, including:
- I/O errors (Io)
- Regex compilation errors (Regex)
- Invalid pattern syntax (InvalidPattern)
- Path traversal attempts (PathTraversal)
- Symlink cycles (SymlinkCycle)
- Permission issues (PermissionDenied)
- Excessive brace expansion (BraceExpansionDepth, BraceExpansionCount)
- Regex complexity limits (RegexTooComplex)
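For readers who want a picture of how such an enum might be laid out, here is a sketch built from the variant names listed above. Only the variant names come from this README; the payload types, the messages, and the is_recoverable helper are assumptions made for illustration (for example, the real Regex variant may well wrap the regex crate's error type rather than a String).

```rust
use std::{fmt, io, path::PathBuf};

/// Sketch of an error enum using the variant names listed above.
/// All payload types are assumptions.
#[derive(Debug)]
pub enum GlobError {
    Io(io::Error),
    Regex(String),
    InvalidPattern(String),
    PathTraversal(String),
    SymlinkCycle(PathBuf),
    PermissionDenied(PathBuf),
    BraceExpansionDepth(usize),
    BraceExpansionCount(usize),
    RegexTooComplex(String),
}

impl fmt::Display for GlobError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GlobError::Io(e) => write!(f, "I/O error: {e}"),
            GlobError::Regex(msg) => write!(f, "regex compilation failed: {msg}"),
            GlobError::InvalidPattern(p) => write!(f, "invalid pattern: {p}"),
            GlobError::PathTraversal(p) => write!(f, "path traversal rejected: {p}"),
            GlobError::SymlinkCycle(p) => write!(f, "symlink cycle at {}", p.display()),
            GlobError::PermissionDenied(p) => write!(f, "permission denied: {}", p.display()),
            GlobError::BraceExpansionDepth(n) => write!(f, "brace nesting exceeds depth {n}"),
            GlobError::BraceExpansionCount(n) => write!(f, "brace expansion exceeds {n} results"),
            GlobError::RegexTooComplex(p) => write!(f, "regex too complex: {p}"),
        }
    }
}

impl std::error::Error for GlobError {
    // Expose the underlying I/O error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            GlobError::Io(e) => Some(e),
            _ => None,
        }
    }
}

// Lets `?` convert `io::Error` into `GlobError` automatically.
impl From<io::Error> for GlobError {
    fn from(e: io::Error) -> Self {
        GlobError::Io(e)
    }
}

/// Hypothetical policy helper: errors tied to a single filesystem entry can
/// usually be skipped during a walk, while pattern errors abort the operation.
pub fn is_recoverable(err: &GlobError) -> bool {
    matches!(
        err,
        GlobError::Io(_) | GlobError::PermissionDenied(_) | GlobError::SymlinkCycle(_)
    )
}
```

Splitting errors into per-entry and per-pattern groups like this is one way a caller could decide whether to continue a traversal or stop early.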
Performance Considerations
- Caching: Use cache_metrics() to monitor cache performance and adjust MAX_CACHE_SIZE or TTL as needed.
- Concurrency: For async operations, tune max_inflight in GlobOptions to balance performance and resource usage.
- Depth Limits: Set max_depth to avoid excessive traversal in deep directory structures.
- Symlink Handling: Disable follow_symlinks if symlinks are not needed to reduce I/O overhead.
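To show what the hit, miss, and eviction counters measure, the following is a small standard-library LRU cache sketch. It is not the library's cache (which, per the installation notes, builds on the lru crate); the type names, fields, and hit_rate helper are assumptions, and TTL expiration is left out to keep the example short.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Counters of the kind a cache-metrics call might report (assumed shape).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct CacheMetrics {
    pub hits: u64,
    pub misses: u64,
    pub evictions: u64,
}

/// Minimal LRU cache. Recency tracking is O(n) per access, which is fine
/// for a sketch but not for a production cache.
pub struct LruCache<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>, // front = least recently used
    metrics: CacheMetrics,
}

impl<K: Hash + Eq + Clone, V> LruCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            map: HashMap::new(),
            order: VecDeque::new(),
            metrics: CacheMetrics::default(),
        }
    }

    /// Looks up a key, recording a hit or a miss.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.metrics.hits += 1;
            self.touch(key);
            self.map.get(key)
        } else {
            self.metrics.misses += 1;
            None
        }
    }

    /// Inserts a value, evicting the least recently used entry when full.
    pub fn insert(&mut self, key: K, value: V) {
        if self.map.insert(key.clone(), value).is_some() {
            // Existing key: the value was replaced, so only refresh recency.
            self.touch(&key);
            return;
        }
        self.order.push_back(key);
        if self.map.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
                self.metrics.evictions += 1;
            }
        }
    }

    /// Moves `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }

    pub fn metrics(&self) -> CacheMetrics {
        self.metrics
    }

    /// Fraction of lookups served from the cache (0.0 when there were none).
    pub fn hit_rate(&self) -> f64 {
        let total = self.metrics.hits + self.metrics.misses;
        if total == 0 { 0.0 } else { self.metrics.hits as f64 / total as f64 }
    }
}
```

A low hit rate combined with a high eviction count suggests the cache is too small for the working set of patterns, which is the situation where raising the size limit is most likely to help.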
Security Considerations
- Path Traversal Protection: Patterns containing ../ are rejected to prevent unauthorized access.
- Symlink Cycle Detection: Prevents infinite loops when following symlinks.
- Resource Limits: Caps on brace expansion and regex complexity prevent resource exhaustion.
- Permission Checks: Ensures files are readable before processing.
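The first two protections can be sketched with the standard library alone. The names below are hypothetical and the checks are simplified: the traversal check looks only for a literal .. segment between / separators, and the cycle guard assumes the caller has already canonicalized each directory (for example with std::fs::canonicalize) before entering it.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Returns true when any `/`-separated segment of `pattern` is exactly `..`.
/// Names such as `..hidden` are not flagged. Hypothetical helper.
pub fn contains_parent_segment(pattern: &str) -> bool {
    pattern.split('/').any(|segment| segment == "..")
}

/// Rejects patterns that try to climb out of the search root.
pub fn validate_pattern(pattern: &str) -> Result<(), String> {
    if contains_parent_segment(pattern) {
        Err(format!("path traversal rejected: {pattern}"))
    } else {
        Ok(())
    }
}

/// Tracks canonical directory paths that a traversal has already entered.
#[derive(Default)]
pub struct VisitedDirs {
    seen: HashSet<PathBuf>,
}

impl VisitedDirs {
    /// Records the directory and returns `true` on first entry. A `false`
    /// result means the directory was seen before, so following the link
    /// again could loop forever and the caller should skip it.
    pub fn enter(&mut self, canonical_dir: PathBuf) -> bool {
        self.seen.insert(canonical_dir)
    }
}
```

Note that a visited set is stricter than pure cycle detection: it also skips a directory reached a second time through an unrelated symlink, which avoids duplicate results at the cost of some memory.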
Testing
The library includes comprehensive unit tests for each module, covering:
- Brace expansion (brace.rs)
- Cache performance and eviction (cache.rs)
- Extended glob pattern conversion (micromatch.rs)
- Pattern compilation and matching (mod.rs)
Run tests with:
cargo test
Contributing
Contributions are welcome! Please submit issues or pull requests to the GitHub repository. Ensure code follows Rust conventions and includes tests for new features.
License
This project is licensed under the MIT License. See the LICENSE file for details.