A powerful CLI tool that enhances code understanding and automation by finding the most relevant files in your codebase for AI-assisted programming.
Why This Tool?
Finding the right files to manipulate with AI is crucial for effective code generation and modification. Traditional approaches like grep or fuzzy finding often miss semantically relevant files that don't contain exact keyword matches.
This tool uses a hybrid approach combining two powerful search techniques:
- BM25 Ranking: A battle-tested information retrieval algorithm (used by search engines) that excels at keyword matching while accounting for term frequency and document length. It's particularly good at finding files containing specific technical terms or function names.
- RAG (Retrieval Augmented Generation) with Embedding Distance: Uses OpenAI's embeddings to capture the semantic meaning of both your query and codebase. By measuring vector dot product distances, it can find conceptually related files even when they use different terminology.
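To make the keyword side concrete, here is a minimal sketch of Okapi BM25 scoring in Python. It illustrates the general algorithm and is not this tool's implementation: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the function name `bm25_scores` are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Tokenization is a lowercase whitespace split (an assumption for
    this sketch; a tokenizer for source code would be more careful).
    """
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "+ 1" keeps the value non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            tf = term_freq[term]
            # Length normalization: longer documents score lower
            # for the same term frequency.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A file that contains the exact query term gets a positive score, a file without it scores zero, and among matching files the shorter one ranks higher for the same term count.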
The hybrid scoring system combines both approaches:
- BM25 helps catch direct matches and technical terms
- Embedding distance captures semantic relationships and higher-level concepts
- Results are normalized and merged to give you the most relevant files for your task
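One common way to normalize and merge two score lists is min-max scaling followed by a weighted sum. The sketch below assumes that scheme for illustration only; the normalization and weighting this tool actually uses are not described here, and `min_max`, `hybrid_rank`, and `alpha` are hypothetical names.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(files, bm25, embedding, alpha=0.5):
    """Merge two score lists (aligned with `files`) into one ranking.

    `alpha` weights BM25 against embedding similarity; 0.5 is an
    arbitrary default for this sketch.
    """
    merged = [
        alpha * b + (1 - alpha) * e
        for b, e in zip(min_max(bm25), min_max(embedding))
    ]
    # Highest combined score first.
    return sorted(zip(files, merged), key=lambda pair: pair[1], reverse=True)
```

Scaling both lists to the same range first matters because raw BM25 scores are unbounded while similarity scores sit in a narrow interval, so an unscaled sum would be dominated by one signal.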
This dual approach helps ensure you don't miss important context when using AI to modify your codebase.
Warnings!
This tool is in alpha and has not been thoroughly evaluated with real-world tests.
Features
- File scanning with customizable chunk sizes and overlap
- Semantic search using OpenAI embeddings and BM25 ranking
- Support for piped input and file suggestions
- Intelligent context expansion
- Supports Unix-philosophy piped commands
Installation
Usage
Scanning Files
Generate embeddings for your codebase using the scan command:
# Basic scan of all Rust files
# Basic scan of all Rust and Markdown files
# Scan with chunking enabled
# Include file metadata in embeddings
# Scan with all options
The scan command:
- Finds files matching your pattern (respecting .gitignore)
- Generates embeddings using OpenAI's API
- Saves results to .luckyshot.file.vectors.v1
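Chunking with overlap, which the scan options mention, can be pictured as a sliding window over a file's text. This is a minimal sketch assuming character-based chunks; the unit the tool actually uses (characters, lines, or tokens) is not stated here, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of `chunk_size` characters, where each
    window repeats the last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means text that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.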
Finding Relevant Files
To find files related to a topic or question:
# Basic file suggestion
# Using piped input
|
# Filter results by similarity score (matches >= specified value, range 0.0 to 1.0)
# Show detailed information including similarity scores
# Show file contents of matches
# Limit number of results
# Combine options
# Chain commands Unix-style
| \
| \
This will:
- Convert your query into an embedding
- Rank the stored file embeddings by their dot-product similarity to the query embedding
- Display relevant files with similarity scores
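The steps above can be sketched with precomputed vectors. The example normalizes every vector to unit length so that the dot product equals cosine similarity, and it uses toy 3-dimensional vectors in place of real API output. `rank_files`, `min_score`, and `limit` are hypothetical names that mirror the score filter and result limit described earlier; they are not the CLI's flags.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned as-is)."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        return list(vec)
    return [x / length for x in vec]


def rank_files(query_vec, file_vecs, min_score=0.0, limit=None):
    """Rank files by dot-product similarity to the query embedding.

    `file_vecs` maps a file path to its embedding vector.
    """
    q = normalize(query_vec)
    scored = []
    for path, vec in file_vecs.items():
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        if score >= min_score:
            scored.append((path, score))
    # Most similar first; a limit of None keeps every match.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

For example, with the query vector `[1, 0, 0]`, a file whose embedding points in nearly the same direction ranks just below an exact match, while an orthogonal one scores zero and is dropped by any positive `min_score`.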
Expanding Context
To expand a query with additional context:
Environment Setup
You'll need an OpenAI API key. Either set it as the OPENAI_API_KEY environment variable in your shell:
Or create a .env file:
OPENAI_API_KEY=your-api-key
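For readers scripting around the tool, the two options can be mirrored in a few lines of Python. This sketch assumes the environment variable takes precedence over the .env file (the order is not stated here) and handles only plain `KEY=value` lines; `read_api_key` is a hypothetical helper, not part of the tool.

```python
import os


def read_api_key(dotenv_text="", environ=None):
    """Return OPENAI_API_KEY from the environment, else from .env text."""
    environ = os.environ if environ is None else environ
    if environ.get("OPENAI_API_KEY"):
        return environ["OPENAI_API_KEY"]
    for line in dotenv_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "OPENAI_API_KEY":
            # Drop optional surrounding quotes.
            return value.strip().strip("'\"")
    return None
```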
Experimental
BM25 tokenization and ranking.
License
MIT