SIFS
SIFS means "SIFS Is Fast Search." It is a Rust code search tool and library for agents, editor integrations, and local developer workflows. It indexes a repository, splits files into useful chunks, embeds those chunks with a Model2Vec-compatible encoder, and runs hybrid search over semantic and BM25 rankings.
On the 63-repository Semble benchmark corpus, SIFS reached NDCG@10=0.8444
with 93.0ms average indexing time and 0.0017ms repeated-query p50 latency.
That is 0.0100 NDCG@10 behind Semble and 0.0173 behind CodeRankEmbed
Hybrid, with a much faster warm query path in this benchmark (0.0017 ms p50
versus 1.3 ms and 16.9 ms, respectively).
What SIFS does
SIFS turns a local directory or Git repository into a searchable code index. You can use it as a command-line tool, as a Rust crate, or as a Model Context Protocol server for agent clients.
- Search code with natural-language or symbol-heavy queries.
- Find chunks related to a specific file and line.
- Index local directories or shallow-cloned Git repositories.
- Use hybrid, semantic-only, or BM25-only ranking.
- Run BM25 search fully offline without loading or downloading a model.
- Use explicit sparse-only indexes or model-free hashing for local smoke tests.
- Run quality and latency benchmarks over annotated repositories.
Install SIFS
SIFS is intended to ship as the sifs command. Package-manager installs should
install that binary by default.
Homebrew installs are published through
tristanmanchester/homebrew-tap.
The draft formula in packaging/homebrew/sifs.rb
is the source-repo template for the tap formula.
Build SIFS
SIFS builds with Cargo. The default release build gives you the public sifs
binary under target/release/.
The sifs-benchmark and sifs-embed binaries are supported diagnostics for
benchmarking and embedding-model checks. Build them explicitly with the
diagnostics feature.
Run the test suite after changing indexing, chunking, ranking, or model-loading behavior.
Quick start
Use sifs search for direct command-line search. The default path is the
current directory and the default mode is hybrid.
Use sifs find-related when you already have a location and want similar code
elsewhere in the same index.
Start the MCP server by running sifs without a subcommand. Passing a path
pre-indexes that source and lets MCP clients call search and find_related
without including a repo argument on every tool call.
Documentation
Start with the page that matches how you plan to use SIFS.
- Command-line usage covers search, find-related, init, and MCP server startup.
- Rust library usage covers SifsIndex, search modes, filters, and indexing options.
- MCP server usage covers the stdio protocol surface and tool schemas.
- Agent-native scorecard defines the agent-facing contract and readiness evidence.
- Benchmarking covers quality, latency, embedding, and local smoke benchmarks.
- Architecture explains file selection, chunking, embedding, sparse search, dense search, and hybrid ranking.
Search model
SIFS uses minishlab/potion-code-16M by default through a local Model2Vec
loader. The loader reads the model tensors and tokenizer files directly, so the
query path stays inside the Rust process after the model is available locally.
BM25 mode does not use the model at all, so it is safe for network-free package
manager smoke tests and first-run checks.
Use sifs model pull or sifs model fetch to prefetch the default model, and
sifs doctor to check whether semantic search is ready for offline use.
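Model2Vec-style static models embed text by looking up one fixed vector per token and averaging them, which is why the query path needs no transformer forward pass. The sketch below illustrates that idea with a toy in-memory table. The names mean_pool and cosine are hypothetical and are not the SIFS API; the real loader reads model tensors and tokenizer files instead of a hand-built table.

```rust
/// Average the embedding rows selected by `token_ids`.
/// Returns `None` when the table is empty or no token id maps to a row.
fn mean_pool(table: &[Vec<f32>], token_ids: &[usize]) -> Option<Vec<f32>> {
    let dim = table.first()?.len();
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for &id in token_ids {
        // Unknown token ids are skipped rather than treated as errors.
        if let Some(row) = table.get(id) {
            for (acc, value) in sum.iter_mut().zip(row) {
                *acc += *value;
            }
            count += 1;
        }
    }
    if count == 0 {
        return None;
    }
    for acc in sum.iter_mut() {
        *acc /= count as f32;
    }
    Some(sum)
}

/// Cosine similarity between two vectors; 0.0 if either has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

In this sketch, a query and a chunk would each be pooled into one vector and compared with cosine, so semantic scoring reduces to table lookups and arithmetic.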
Hybrid search combines semantic and BM25 rankings. It over-fetches candidates, normalizes each ranking with reciprocal rank fusion, applies query-aware boosts, and reranks the top results. Symbol-like queries lean more heavily on BM25, while natural-language queries keep more semantic weight.
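The fusion step can be sketched as weighted reciprocal rank fusion over two ranked lists of chunk ids. This is a minimal illustration under stated assumptions, not the SIFS implementation: the function name rrf_fuse, the single semantic_weight parameter, and the constant k are hypothetical, and the query-aware boosts and final rerank are left out.

```rust
use std::collections::HashMap;

/// Fuse two ranked lists of chunk ids with weighted reciprocal rank fusion.
/// `semantic_weight` is in [0, 1]; the BM25 list gets the remaining weight.
/// `k` dampens the influence of top ranks; 60.0 is a common choice.
fn rrf_fuse(
    semantic: &[&str],
    bm25: &[&str],
    semantic_weight: f64,
    k: f64,
) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    let lists = [(semantic, semantic_weight), (bm25, 1.0 - semantic_weight)];
    for (ranking, weight) in lists {
        for (rank, id) in ranking.iter().enumerate() {
            // `enumerate` is 0-based; RRF uses 1-based ranks, hence the + 1.0.
            *scores.entry((*id).to_string()).or_insert(0.0) +=
                weight / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Under this sketch, a symbol-like query would pass a lower semantic_weight so BM25 ranks dominate, while a natural-language query would pass a higher one.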
Benchmarks
The current full-corpus benchmark uses the Semble benchmark suite: 63 pinned
open-source repositories, 19 languages, and 1,251 annotated search tasks. In
this local run, SIFS reached NDCG@10=0.8444 with p50=0.0017ms
repeated-query latency after indexing.
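For readers unfamiliar with the metric, NDCG@10 compares the discounted gain of the returned order against the best possible order for the same task. The sketch below assumes linear gain and a log2 rank discount; the text here does not state which NDCG variant the benchmark harness uses, and the function names are hypothetical.

```rust
/// Discounted cumulative gain over the first `k` relevance grades.
/// Position i (0-based) is discounted by log2(i + 2).
fn dcg_at_k(relevances: &[f64], k: usize) -> f64 {
    relevances
        .iter()
        .take(k)
        .enumerate()
        .map(|(i, rel)| rel / ((i + 2) as f64).log2())
        .sum()
}

/// NDCG@k: DCG of the returned order divided by DCG of the ideal order.
/// `ranked` holds the relevance of each returned result, in rank order.
/// `all_relevant` holds the relevance of every annotated target for the task.
fn ndcg_at_k(ranked: &[f64], all_relevant: &[f64], k: usize) -> f64 {
    let mut ideal = all_relevant.to_vec();
    ideal.sort_by(|a, b| b.total_cmp(a));
    let ideal_dcg = dcg_at_k(&ideal, k);
    if ideal_dcg == 0.0 {
        0.0
    } else {
        dcg_at_k(ranked, k) / ideal_dcg
    }
}
```

With binary relevance, a task whose only annotated target appears at rank 2 scores 1 / log2(3), roughly 0.63, while the same target at rank 1 scores 1.0.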
SIFS is third on raw NDCG@10 in this comparison, behind CodeRankEmbed Hybrid and
Semble. It is 0.0100 NDCG@10 behind Semble and 0.0173 behind CodeRankEmbed
Hybrid, while reporting much lower warm-query latency and lower average index
time than the embedding-heavy baselines.
| Method | NDCG@10 | Index time | Query p50 |
|---|---|---|---|
| CodeRankEmbed Hybrid | 0.8617 | 57.3 s | 16.9 ms |
| semble | 0.8544 | 439.4 ms | 1.3 ms |
| SIFS | 0.8444 | 93.0 ms | 0.0017 ms |
| CodeRankEmbed | 0.7648 | 57.3 s | 13.3 ms |
| ColGREP | 0.6925 | 3.9 s | 979.3 ms |
| grepai | 0.5606 | 35.0 s | 47.7 ms |
| probe | 0.3872 | 0.0000 ms | 207.1 ms |
| ripgrep | 0.1257 | 0.0000 ms | 8.8 ms |

SIFS is strongest on symbol-heavy queries while still performing well on semantic and architecture questions.
| Query category | NDCG@10 |
|---|---|
| symbol | 0.9566 |
| semantic | 0.8262 |
| architecture | 0.8070 |

The benchmark artifacts live in benchmarks/results, and the full methodology, per-language breakdown, additional figures, and React large-repository smoke result are in docs/benchmark-report.md.
File coverage
By default, SIFS indexes code files and skips common generated, dependency, and
cache directories. It uses the ignore crate, so nested .gitignore files,
Git excludes, global Git ignores, and hidden files behave like familiar
developer search tools. Text-like documents such as Markdown, YAML, TOML, and
JSON are not part of the default code-only selection; include them through the
library options.
The file walker currently recognizes Python, JavaScript, TypeScript, Go, Rust, Java, Kotlin, Ruby, PHP, C, C++, C#, Swift, Scala, Elixir, Dart, Lua, SQL, Bash, Zig, Haskell, Markdown, YAML, TOML, and JSON extensions.
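The split between code files and opt-in text documents can be pictured as an extension lookup. This is an illustrative sketch only: the names FileKind, classify, and should_index are hypothetical, the extension list is a small subset chosen for the example, and the real walker also applies the ignore rules described above.

```rust
use std::path::Path;

/// Whether a recognized file is source code or a text-like document.
#[derive(Debug, PartialEq)]
enum FileKind {
    Code(&'static str),
    Document(&'static str),
}

/// Map a path's extension to a language label (illustrative subset only).
fn classify(path: &Path) -> Option<FileKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    let kind = match ext.as_str() {
        "rs" => FileKind::Code("Rust"),
        "py" => FileKind::Code("Python"),
        "go" => FileKind::Code("Go"),
        "ts" | "tsx" => FileKind::Code("TypeScript"),
        "md" => FileKind::Document("Markdown"),
        "yaml" | "yml" => FileKind::Document("YAML"),
        "toml" => FileKind::Document("TOML"),
        "json" => FileKind::Document("JSON"),
        _ => return None,
    };
    Some(kind)
}

/// Code is always indexed; documents only when the caller opts in.
fn should_index(path: &Path, include_documents: bool) -> bool {
    match classify(path) {
        Some(FileKind::Code(_)) => true,
        Some(FileKind::Document(_)) => include_documents,
        None => false,
    }
}
```

In this sketch, should_index returns false for README.md unless include_documents is set, which mirrors the code-only default described above.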