fsindex
Fast, powerful filesystem indexing with .gitignore support and an iterator-based API.
Features
- Fast parallel traversal - Uses rayon for parallel file processing
- .gitignore support - Respects .gitignore, .git/info/exclude, and global gitignore
- Content hashing - XXH3 hashing for change detection
- Language detection - Automatic programming language detection from file extensions
- File watching - Real-time filesystem monitoring with notify
- Flexible configuration - Builder pattern for easy customization
- Iterator-based API - Memory efficient streaming over large directories
- Content chunking - Split files for embedding models with token limits
- Persistent state - Save/load index state for incremental updates
- Content streaming - Stream large files in chunks without loading entirely into memory
- Code structure parsing - Extract functions, classes, imports, and other symbols from source files
Installation
Add to your Cargo.toml:
[dependencies]
fsindex = "0.1"
Quick Start
use FileIndexer;
let indexer = new;
for file in indexer.files
Configuration
Use the builder pattern for custom configuration:
use ;
let config = builder
.respect_gitignore
.include_hidden
.max_depth
.extensions
.follow_symlinks
.parallel
.build;
let indexer = with_config;
for file in indexer.files
Available Configuration Options
| Option | Default | Description |
|---|---|---|
| respect_gitignore | true | Honor .gitignore files |
| include_hidden | false | Include hidden files/directories |
| max_depth | None | Maximum traversal depth (unlimited by default) |
| extensions | [] | Filter by file extensions (empty = all) |
| exclude_patterns | [] | Glob patterns to exclude |
| include_patterns | [] | Glob patterns to include |
| follow_symlinks | false | Follow symbolic links |
| read_contents | true | Read file contents into memory |
| max_content_size | 10MB | Maximum file size for content reading |
| custom_ignore_files | [] | Additional ignore files |
| parallel | true | Use parallel traversal |
| threads | 0 | Thread count (0 = auto) |
| parse_structure | false | Parse code structure (functions, classes, etc.) |
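To make the defaults above concrete, here is a minimal sketch of how such a configuration could be modelled with a Default implementation and chainable setters. The type name ConfigSketch, the field types, and the reading of 10MB as 10 * 1024 * 1024 bytes are assumptions for illustration only; this is not the crate's actual type.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the options table; not the crate's real config type.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigSketch {
    pub respect_gitignore: bool,
    pub include_hidden: bool,
    pub max_depth: Option<usize>,
    pub extensions: Vec<String>,
    pub exclude_patterns: Vec<String>,
    pub include_patterns: Vec<String>,
    pub follow_symlinks: bool,
    pub read_contents: bool,
    pub max_content_size: u64,
    pub custom_ignore_files: Vec<PathBuf>,
    pub parallel: bool,
    pub threads: usize,
    pub parse_structure: bool,
}

impl Default for ConfigSketch {
    fn default() -> Self {
        Self {
            respect_gitignore: true,
            include_hidden: false,
            max_depth: None, // unlimited
            extensions: Vec::new(), // empty = all extensions
            exclude_patterns: Vec::new(),
            include_patterns: Vec::new(),
            follow_symlinks: false,
            read_contents: true,
            max_content_size: 10 * 1024 * 1024, // assumption: 10MB means 10 MiB
            custom_ignore_files: Vec::new(),
            parallel: true,
            threads: 0, // 0 = pick automatically
            parse_structure: false,
        }
    }
}

impl ConfigSketch {
    /// Each setter consumes and returns `self` so calls can be chained.
    pub fn max_depth(mut self, depth: usize) -> Self {
        self.max_depth = Some(depth);
        self
    }

    pub fn include_hidden(mut self, yes: bool) -> Self {
        self.include_hidden = yes;
        self
    }

    pub fn extensions<I, S>(mut self, exts: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.extensions = exts.into_iter().map(Into::into).collect();
        self
    }
}
```

With this sketch, ConfigSketch::default().max_depth(3).extensions(vec!["rs", "toml"]) overrides two options and leaves every other field at its table default.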
Parallel Processing
For large directories, use parallel file processing:
use FileIndexer;
let indexer = new;
let files = indexer.files_parallel; // Collects and processes in parallel
for file in files
File Watching
Monitor filesystem changes in real-time:
use FileIndexer;
let indexer = new;
let watcher = indexer.watch.expect;
for event in watcher.filtered_events
Language Detection
Files are automatically tagged with their detected programming language:
use FileIndexer;
let indexer = new;
for file in indexer.files
Supports 50+ languages including Rust, Python, JavaScript, TypeScript, Go, C/C++, Java, and many more.
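Extension-based detection of this kind can be sketched as a lookup from the lowercased file extension to a language name. The function name detect_language and the small mapping below are illustrative assumptions; the crate's own table covers far more languages and may differ in detail.

```rust
use std::path::Path;

/// Map a file extension to a language name.
/// Illustrative subset only; not the crate's actual 50+ language table.
pub fn detect_language(path: &Path) -> Option<&'static str> {
    // Files without an extension (or with a non-UTF-8 one) yield None.
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    let lang = match ext.as_str() {
        "rs" => "Rust",
        "py" => "Python",
        "js" | "mjs" => "JavaScript",
        "ts" => "TypeScript",
        "go" => "Go",
        "c" | "h" => "C",
        "cpp" | "cc" | "hpp" => "C++",
        "java" => "Java",
        _ => return None,
    };
    Some(lang)
}
```

Lowercasing the extension first means B.PY and b.py resolve to the same language, while a file such as README with no extension is left undetected.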
Content Hashing
Each file includes an XXH3 hash for efficient change detection:
use FileIndexer;
let indexer = new;
for file in indexer.files
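The idea behind hash-based change detection can be sketched with the standard library alone: store a hash per file, then compare it against the hash of the current contents. Note that this sketch swaps in std's DefaultHasher (SipHash) as a stand-in for XXH3, and the function names are hypothetical rather than part of the crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash raw file bytes.
/// `DefaultHasher` stands in for XXH3 here; it is not the crate's hasher.
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// A file counts as changed when the hash of its current contents
/// differs from the previously stored hash.
pub fn has_changed(previous_hash: u64, current: &[u8]) -> bool {
    content_hash(current) != previous_hash
}
```

Comparing a stored 64-bit hash avoids keeping the old contents around; only files whose hash differs need to be re-read or re-processed.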
Error Handling
Use files_result() for explicit error handling:
use FileIndexer;
let indexer = new;
for result in indexer.files_result
Content Chunking
Split file content into chunks suitable for embedding models with token limits:
use FileIndexer;
let indexer = new;
for file in indexer.files
For different tokenization ratios (e.g., code has more tokens per character):
// Use 2.5 chars/token for code
if let Some = file.chunks_with_ratio
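The effect of the chars-per-token ratio is easiest to see in a small sketch: the character budget per chunk is the token limit multiplied by the ratio, so a lower ratio yields smaller chunks. The function chunk_by_ratio below is a hypothetical illustration and splits purely by character count; the crate's chunking may respect line or word boundaries.

```rust
/// Split `text` into chunks of at most `max_tokens * chars_per_token` characters.
/// Hypothetical helper; splits on character count only.
pub fn chunk_by_ratio(text: &str, max_tokens: usize, chars_per_token: f64) -> Vec<String> {
    // Character budget per chunk, clamped to at least 1 so chunking always advances.
    let budget = ((max_tokens as f64) * chars_per_token).floor().max(1.0) as usize;
    // Work on chars rather than bytes so multi-byte characters are never split.
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(budget)
        .map(|piece| piece.iter().collect::<String>())
        .collect()
}
```

For example, a limit of 2 tokens at 2.5 chars/token gives a budget of 5 characters per chunk, whereas the same limit at 4.0 chars/token would allow 8.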
Persistent State
Save and load index state to avoid re-scanning unchanged files:
use FileIndexer;
use Path;
let indexer = new;
// Save current state
indexer.save_state.unwrap;
// Later, load and compare
let old_state = load_state.unwrap;
let diff = indexer.diff_with_state;
println!;
println!;
println!;
if diff.has_changes
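Conceptually, diffing against a saved state amounts to comparing two path-to-hash maps. The sketch below assumes the diff reports added, modified, and removed paths; those field names, the DiffSketch type, and diff_states are hypothetical and only has_changes appears in the snippet above.

```rust
use std::collections::HashMap;

/// Hypothetical diff result; field names are assumptions.
#[derive(Debug, Default, PartialEq)]
pub struct DiffSketch {
    pub added: Vec<String>,
    pub modified: Vec<String>,
    pub removed: Vec<String>,
}

impl DiffSketch {
    pub fn has_changes(&self) -> bool {
        !(self.added.is_empty() && self.modified.is_empty() && self.removed.is_empty())
    }
}

/// Compare two snapshots that map file paths to content hashes.
pub fn diff_states(old: &HashMap<String, u64>, current: &HashMap<String, u64>) -> DiffSketch {
    let mut diff = DiffSketch::default();
    for (path, hash) in current {
        match old.get(path) {
            None => diff.added.push(path.clone()),
            Some(old_hash) if old_hash != hash => diff.modified.push(path.clone()),
            Some(_) => {} // same hash: unchanged
        }
    }
    for path in old.keys() {
        if !current.contains_key(path) {
            diff.removed.push(path.clone());
        }
    }
    // HashMap iteration order is unspecified, so sort for stable output.
    diff.added.sort();
    diff.modified.sort();
    diff.removed.sort();
    diff
}
```

Only the paths listed in the diff need to be re-indexed; everything else can be carried over from the saved state.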
Content Streaming
For very large files, stream content in chunks instead of loading entirely into memory:
use ;
use Path;
// Stream in 64KB chunks
let mut stream = new.stream_chunks.unwrap;
for chunk in stream
Stream line by line:
use StreamExt;
use Path;
for line in new.stream_lines.unwrap
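For readers who want to see what chunked and line-by-line streaming look like underneath, here is a minimal sketch using only std::io. The helper names for_each_chunk and count_lines are hypothetical and are not the crate's API; they work on any Read source so they can be tried without touching the filesystem.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Pass `reader` to `on_chunk` in pieces of at most `chunk_size` bytes.
/// Returns the total number of bytes read. Hypothetical helper.
pub fn for_each_chunk<R: Read>(
    mut reader: R,
    chunk_size: usize,
    mut on_chunk: impl FnMut(&[u8]),
) -> io::Result<u64> {
    // One reusable buffer: memory use stays bounded by `chunk_size`.
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        on_chunk(&buf[..n]);
        total += n as u64;
    }
    Ok(total)
}

/// Count lines while holding only one line in memory at a time.
pub fn count_lines<R: Read>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in BufReader::new(reader).lines() {
        line?; // propagate I/O or UTF-8 errors
        count += 1;
    }
    Ok(count)
}
```

To stream a real file, pass std::fs::File::open(path)? as the reader with a chunk size of 64 * 1024. A reader may return fewer bytes than requested, so chunks are at most, not exactly, the requested size.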
Code Structure Parsing
Extract functions, classes, imports, and other symbols from source files:
use ;
let config = builder
.extensions
.parse_structure // Enable structure parsing
.build;
let indexer = with_config;
for file in indexer.files
Supported languages for structure parsing:
- Rust (functions, structs, enums, traits, modules, macros)
- Python (functions, classes, imports)
- JavaScript/TypeScript (functions, classes, interfaces, types, imports)
- Go (functions, methods, structs, interfaces, constants)
- Java (classes, interfaces, enums, methods, imports)
- C/C++ (functions, structs, classes, enums, macros, includes)
- C# (classes, interfaces, structs, enums, imports)
- Ruby (methods, classes, modules, requires)
- PHP (functions, classes, interfaces, traits, imports)
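As a rough illustration of what symbol extraction produces, the sketch below scans Rust source line by line for a few declaration keywords and records the kind, name, and line number. It is a deliberately naive stand-in: the type SymbolSketch and function scan_rust_symbols are hypothetical, it ignores cases such as async fn, nested visibility forms, and keywords inside strings, and it says nothing about how the crate actually parses code.

```rust
/// Hypothetical symbol record; not the crate's actual type.
#[derive(Debug, PartialEq)]
pub struct SymbolSketch {
    pub kind: &'static str,
    pub name: String,
    pub line: usize, // 1-based line number
}

/// Naive line scanner for a handful of Rust declarations.
pub fn scan_rust_symbols(source: &str) -> Vec<SymbolSketch> {
    let keywords = [
        ("fn ", "function"),
        ("struct ", "struct"),
        ("enum ", "enum"),
        ("trait ", "trait"),
        ("mod ", "module"),
    ];
    let mut symbols = Vec::new();
    for (idx, raw) in source.lines().enumerate() {
        let mut line = raw.trim_start();
        // Drop a leading visibility modifier if present.
        for prefix in &["pub(crate) ", "pub "] {
            if let Some(rest) = line.strip_prefix(*prefix) {
                line = rest.trim_start();
                break;
            }
        }
        for &(keyword, kind) in &keywords {
            if let Some(rest) = line.strip_prefix(keyword) {
                // The identifier runs until the first non-identifier character.
                let name: String = rest
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                if !name.is_empty() {
                    symbols.push(SymbolSketch { kind, name, line: idx + 1 });
                }
                break;
            }
        }
    }
    symbols
}
```

Lines that start with a comment marker are skipped naturally because they do not begin with a declaration keyword; a production parser would need a real grammar to handle the cases this scanner misses.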
License
MIT