rag-cli
Local-first semantic search over your files. Index a directory and query it with natural language — all embeddings are computed on your machine using candle, with no external API calls.
Quick start
cargo install --path .
# Index a directory
rag index ./my-project
# Search it
rag search "how does authentication work"
Commands
rag index <path>
Recursively discovers text files under <path>, chunks them, computes
embeddings, and writes the index to disk.
| Flag | Default | Description |
|---|---|---|
-o, --output <dir> |
.rag |
Where to store the index |
-m, --model <id> |
sentence-transformers/all-MiniLM-L6-v2 |
HuggingFace model ID |
--chunk-size <n> |
512 |
Chunk size in characters |
--chunk-overlap <n> |
64 |
Overlap between consecutive chunks |
Re-running rag index on the same directory performs incremental indexing —
only changed or new files are re-embedded. File changes are detected using
blake3 content hashes. If you change
the model or chunk settings the entire index is rebuilt automatically.
rag search <query>
Embeds your query and returns the most similar chunks by cosine similarity.
| Flag | Default | Description |
|---|---|---|
-i, --index <dir> |
.rag |
Index directory to search |
-k, --top-k <n> |
5 |
Number of results |
-m, --model <id> |
(from index) | Override embedding model |
--full |
off | Show full chunk text instead of truncated preview |
--json |
off | Output compact JSON (for piping to LLMs or other tools) |
rag info
Prints index metadata: model, chunk count, source file count, index size, etc.
| Flag | Default | Description |
|---|---|---|
-i, --index <dir> |
.rag |
Index directory to inspect |
Hardware acceleration
On macOS, Metal GPU and Accelerate BLAS are enabled automatically. On other platforms the build falls back to CPU — no extra flags are needed.
Model management
rag-cli downloads model weights directly from HuggingFace over HTTPS on first
use, then caches them locally in the standard HuggingFace Hub layout
(~/.cache/huggingface/hub). We do this instead of using the hf-hub crate because hf-hub uses a bundled
certificate store and does not respect the system root CA certificates. That
makes it fail in environments with custom CA roots (corporate proxies, internal
TLS inspection, etc.). rag-cli uses native-tls, which delegates to the OS certificate store,
so it works in those environments without extra configuration.
You can control the cache location:
# Via flag
# Via environment variable
# Or use HF_HOME (standard HuggingFace convention)
Supported file types
rag-cli indexes 60+ text file extensions including common source code (.rs,
.py, .js, .ts, .go, .java, .c, .cpp, etc.), markup (.md,
.html, .tex, .rst), config (.toml, .yaml, .json, .ini), and
infrastructure files (.tf, .hcl, .dockerfile, .nix). Files like
Makefile, Dockerfile, LICENSE, and .gitignore are recognized by name.
Hidden directories, node_modules, target, __pycache__, vendor, dist,
and build are skipped automatically.
How it works
- Discover text files recursively, skipping binary and vendored content
- Chunk each file into overlapping segments (~512 chars), breaking at paragraph or line boundaries when possible
- Embed chunks in batches of 64 using a BERT model
(
all-MiniLM-L6-v2, 384-dimensional vectors) with mean pooling and L2 normalization - Store the index as a bincode-serialized file (
.rag/index.bin) with JSON metadata (.rag/meta.json) - Search by embedding the query with the same model and ranking chunks by cosine similarity (dot product on normalized vectors)
License
Apache-2.0