llm-utl
A high-performance Rust tool for converting code repositories into LLM-friendly prompts. Transform your codebase into optimally-chunked, formatted prompts ready for use with Large Language Models like Claude, GPT-4, or other AI assistants.
Features
- Blazingly Fast - Parallel file scanning with multi-threaded processing
- Smart Chunking - Automatically splits large codebases into optimal token-sized chunks with overlap
- Code Filtering - Removes tests, comments, debug prints, and other noise from code
- Multiple Formats - Output to Markdown, XML, or JSON
- Gitignore Support - Respects .gitignore files automatically
- Multi-Language - Built-in filters for Rust, Python, JavaScript/TypeScript, Go, Java, C/C++
- Safe Operations - Atomic file writes with automatic backups
- Statistics - Detailed metrics on processing and token usage
Installation
As a CLI Tool
As a Library
Add to your Cargo.toml:
[dependencies]
llm-utl = "0.1.0"
Quick Start
Command Line Usage
Basic usage:
# Convert current directory to prompts
# Specify input and output directories
# Configure token limits and format
# Dry run to preview what would be generated
All options:
Library Usage
use ;
Advanced Configuration
Code Filtering
Control what gets removed from your code:
use ;
let config = builder
.root_dir
.filter_config
.build?;
Or use presets:
use llm_utl::FilterConfig;
// Minimal - remove everything except code
let minimal = FilterConfig::minimal();
// Preserve docs - keep documentation comments
let with_docs = FilterConfig::preserve_docs();
// Production - ready for production review
let production = FilterConfig::production();
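To give a rough idea of what comment filtering involves, the sketch below strips `//` line comments and can optionally keep `///` and `//!` doc comments, similar in spirit to a preserve-docs preset. `strip_line_comments` is a hypothetical helper written for illustration and is not part of llm-utl's API. It is deliberately naive: it does not understand string literals or `/* */` block comments.

```rust
/// Hypothetical helper (not llm-utl's API): drops `//` comments,
/// optionally keeping `///` and `//!` doc comments.
/// Naive: ignores string literals and `/* */` block comments.
fn strip_line_comments(source: &str, keep_docs: bool) -> String {
    let mut out: Vec<String> = Vec::new();
    for line in source.lines() {
        let trimmed = line.trim_start();
        let is_doc = trimmed.starts_with("///") || trimmed.starts_with("//!");
        if is_doc && keep_docs {
            out.push(line.to_string());
            continue;
        }
        // Cut the line at the first `//` and trim what is left.
        let code = match line.find("//") {
            Some(idx) => line[..idx].trim_end(),
            None => line,
        };
        // Keep code and original blank lines; drop lines that were only a comment.
        if !code.trim().is_empty() || line.trim().is_empty() {
            out.push(code.to_string());
        }
    }
    out.join("\n")
}
```

Calling `strip_line_comments(source, true)` keeps doc comments, while passing `false` removes every `//` comment. A real filter would need a tokenizer-aware pass to avoid cutting `//` inside strings such as URLs.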
File Filtering
Include or exclude specific files and directories:
use ;
let config = builder
.root_dir
.file_filter_config
.build?;
Important: When using .allow_only(), use glob patterns like **/*.rs instead of *.rs to match files in all subdirectories. The pattern *.rs only matches files in the root directory.
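The difference between the two patterns can be illustrated with a small standalone matcher. This is only a sketch of the semantics described in the note, assuming `*` never crosses a `/` and `**` matches zero or more whole directories. `glob_match` and its helpers are hypothetical names and are not how llm-utl implements matching.

```rust
/// Hypothetical matcher: `*` matches within one path segment,
/// `**` matches zero or more whole segments.
fn glob_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}

fn match_segments(pat: &[&str], segs: &[&str]) -> bool {
    match pat.split_first() {
        // Pattern exhausted: succeed only if the path is exhausted too.
        None => segs.is_empty(),
        // `**` may consume zero or more leading segments.
        Some((first, rest)) if *first == "**" => {
            (0..=segs.len()).any(|skip| match_segments(rest, &segs[skip..]))
        }
        // Any other pattern segment must match exactly one path segment.
        Some((first, rest)) => match segs.split_first() {
            Some((seg, seg_rest)) => {
                match_segment(first, seg) && match_segments(rest, seg_rest)
            }
            None => false,
        },
    }
}

/// Matches a single segment against a pattern that may contain `*`.
fn match_segment(pat: &str, seg: &str) -> bool {
    fn go(p: &[char], s: &[char]) -> bool {
        match p.split_first() {
            None => s.is_empty(),
            // `*` matches any run of characters, including none.
            Some((&'*', rest)) => (0..=s.len()).any(|i| go(rest, &s[i..])),
            Some((c, rest)) => s.first() == Some(c) && go(rest, &s[1..]),
        }
    }
    let p: Vec<char> = pat.chars().collect();
    let s: Vec<char> = seg.chars().collect();
    go(&p, &s)
}
```

Under these assumptions, `glob_match("*.rs", "src/main.rs")` is false because the pattern has a single segment and the path has two, while `glob_match("**/*.rs", "src/main.rs")` is true.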
Custom Tokenizers
Choose between simple and enhanced tokenization:
use ;
let config = builder
.root_dir
.tokenizer(TokenizerKind::Enhanced) // More accurate
// .tokenizer(TokenizerKind::Simple) // Faster, ~4 chars per token
.build?;
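The "~4 chars per token" note suggests a character-count heuristic for the simple tokenizer. A minimal sketch of such an estimate is shown here. `estimate_tokens` is a hypothetical function, and rounding up is an assumption, so the crate's own numbers may differ.

```rust
/// Hypothetical estimator: roughly one token per 4 characters, rounded up.
fn estimate_tokens(text: &str) -> usize {
    const CHARS_PER_TOKEN: usize = 4;
    // Count Unicode scalar values rather than bytes.
    let chars = text.chars().count();
    (chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
}
```

A heuristic like this needs no vocabulary or model-specific tables, which is why it is cheap, but it can drift noticeably from real tokenizer counts on dense code or non-English text.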
Output Formats
Markdown (Default)
```rust
fn main() {
}
```
XML
JSON
Use Cases
- Code Review with AI - Feed your codebase to Claude or GPT-4 for comprehensive reviews
- Learning - Generate study materials from large codebases
- Documentation - Create AI-friendly documentation sources
- Analysis - Prepare code for AI-powered analysis and insights
- Training Data - Generate datasets for fine-tuning models
How It Works
The tool follows a 4-stage pipeline:
- Scanner - Discovers files in parallel, respecting .gitignore
- Filter - Removes noise (tests, comments, debug statements) using language-specific filters
- Splitter - Intelligently chunks content based on token limits with overlap for context
- Writer - Renders chunks using Tera templates with atomic file operations
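The Splitter stage can be pictured with a small line-based sketch. It assumes a ~4 characters per token estimate and measures overlap in lines. Both are simplifications chosen for illustration, and `split_with_overlap` and `line_cost` are hypothetical names, not the crate's API.

```rust
/// Hypothetical cost function: ~4 characters per token, rounded up.
fn line_cost(line: &str) -> usize {
    (line.chars().count() + 3) / 4
}

/// Hypothetical splitter: groups lines into chunks of about `max_tokens`,
/// repeating the last `overlap` lines of each chunk at the start of the
/// next one so that context carries over.
fn split_with_overlap(lines: &[&str], max_tokens: usize, overlap: usize) -> Vec<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut used = 0; // estimated tokens in `current`
    let mut fresh = 0; // lines in `current` that are not carried-over overlap

    for &line in lines {
        let cost = line_cost(line);
        // Only close a chunk that holds at least one new line, so every
        // chunk makes progress even when a single line exceeds the budget.
        if fresh > 0 && used + cost > max_tokens {
            let start = current.len().saturating_sub(overlap);
            let carried: Vec<String> = current[start..].to_vec();
            chunks.push(std::mem::take(&mut current));
            used = carried.iter().map(|l| line_cost(l)).sum();
            current = carried;
            fresh = 0;
        }
        current.push(line.to_string());
        used += cost;
        fresh += 1;
    }
    if fresh > 0 {
        chunks.push(current);
    }
    chunks
}
```

With five 4-character lines, `max_tokens = 2` and `overlap = 1`, this yields four chunks in which each chunk begins with the last line of the previous one. Note that a chunk can exceed the budget when the carried-over lines plus one new line are already too large. A production splitter would need a policy for that case.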
Performance
- Parallel file scanning using all CPU cores
- Streaming mode for large files (>10MB)
- Zero-copy operations where possible
- Optimized for minimal allocations
Typical performance: ~1000 files/second on modern hardware.
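As an illustration of spreading per-file work across all CPU cores, the sketch below uses scoped threads from the Rust standard library. This is a stand-in technique for demonstration only and makes no claim about how llm-utl schedules its scanning. `process_in_parallel` is a hypothetical helper.

```rust
use std::thread;

/// Hypothetical helper: applies `work` to every path, splitting the input
/// into one batch per available CPU core and running batches on scoped threads.
fn process_in_parallel<F>(paths: &[String], work: F) -> Vec<usize>
where
    F: Fn(&str) -> usize + Sync,
{
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division so every path lands in a batch; at least 1 so `chunks` is valid.
    let batch = ((paths.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = paths
            .chunks(batch)
            .map(|part| {
                let work = &work;
                scope.spawn(move || {
                    part.iter().map(|p| work(p.as_str())).collect::<Vec<usize>>()
                })
            })
            .collect();
        // Join in spawn order so results line up with the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

For example, `process_in_parallel(&paths, |p| p.len())` returns one value per path in input order. In a real scanner the closure would read and filter the file instead.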
Supported Languages
Built-in filtering support for:
- Rust
- Python
- JavaScript/TypeScript (including JSX/TSX)
- Go
- Java/Kotlin
- C/C++
Other languages are processed as plain text.
Examples
See the examples/ directory for more usage examples:
Development
# Clone the repository
# Build
# Run tests
# Run with verbose logging
RUST_LOG=llm_utl=debug
# Format code
# Lint
Troubleshooting
"No processable files found" Error
If you see this error:
Error: No processable files found in '.'.
Common causes:
- Wrong directory: The tool is running in an empty directory or a directory without source files.
# Wrong - running in home directory
# Correct - specify your project directory
- All files are gitignored: Your .gitignore excludes all files in the directory.
# Check what files would be scanned
- No source files: The directory contains only non-source files (images, binaries, etc.).
# Make sure directory contains code files
Quick fix:
# Always specify the directory containing your source code
Permission Issues
If you encounter permission errors:
# Ensure you have read access to source directory
# and write access to output directory
Large Files
If processing is slow with very large files:
# Increase token limit for large codebases
# Or use simple tokenizer for better performance
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Built with these excellent crates:
- ignore - Fast gitignore-aware file walking
- tera - Powerful template engine
- clap - CLI argument parsing
- tracing - Structured logging