sifs 0.3.3

SIFS Is Fast Search: instant local code search for agents
Documentation
# Rust library usage

The `sifs` crate exposes the indexing and search engine used by the CLI and MCP
server. Use the library when you want structured results, long-lived indexes,
custom filters, or direct integration inside a Rust application.

## Public API

The crate re-exports the main index and result types from `src/lib.rs`. These
types are the stable surface to use from downstream Rust code.

```rust
use sifs::{Chunk, EncoderSpec, IndexStats, SearchMode, SearchOptions, SearchResult, SifsIndex};
```

The core types are:

- `SifsIndex`: Owns chunks, BM25 data, lazy semantic state, and lookup maps.
- `Chunk`: Stores content, file path, line range, and optional language.
- `SearchMode`: Selects `Hybrid`, `Semantic`, or `Bm25` ranking.
- `SearchResult`: Returns a `Chunk`, score, and source mode.
- `IndexStats`: Reports indexed file count, chunk count, and language counts.

## Index a local path

Use `SifsIndex::from_path` for the default semantic-capable local indexing
behavior. It walks supported code files, chunks them, and builds a sparse BM25
index. It does not load the embedding model until semantic, hybrid, or
related-code search needs the dense index.

```rust
use sifs::{SearchMode, SearchOptions, SifsIndex};

fn main() -> anyhow::Result<()> {
    let index = SifsIndex::from_path("/path/to/project")?;
    let results = index.search_with(
        "where is authentication handled",
        &SearchOptions::new(5).with_mode(SearchMode::Hybrid),
    )?;

    for result in results {
        println!("{} {}", result.chunk.location(), result.score);
    }

    Ok(())
}
```

`from_path` returns an error when the path doesn't exist, isn't a directory, or
contains no supported non-empty files. Model-loading errors are returned later
from semantic or hybrid search.

Use `SifsIndex::from_path_sparse` when you want an explicitly sparse-only index
that can never initialize semantic state. BM25 search works normally; semantic,
hybrid, and related-code search return an error telling callers to build a
hybrid index or use `SearchMode::Bm25`.

```rust
use sifs::{SearchMode, SearchOptions, SifsIndex};

let index = SifsIndex::from_path_sparse("/path/to/project")?;
let results = index.search_with(
    "SessionToken",
    &SearchOptions::new(10).with_mode(SearchMode::Bm25),
)?;
```

Use `SifsIndex::from_path_hybrid` when you want the default lazy semantic
capability with explicit model policy.

```rust
use sifs::{ModelLoadPolicy, ModelOptions, SifsIndex};

let index = SifsIndex::from_path_hybrid(
    "/path/to/project",
    ModelOptions::new(None, ModelLoadPolicy::NoDownload),
)?;
```

## Customize indexing

Use `SifsIndex::from_path_with_options` when you need a custom model path,
extension set, ignored directory names, or document file inclusion. Use
`SifsIndex::from_path_with_model_options` when you also need explicit model
download policy. The extension set must use leading-dot values such as `.rs` or
`.ts`.

```rust
use sifs::SifsIndex;
use std::collections::HashSet;

fn main() -> anyhow::Result<()> {
    let extensions = HashSet::from([".rs".to_owned(), ".toml".to_owned()]);
    let ignore = HashSet::from(["fixtures".to_owned()]);

    let index = SifsIndex::from_path_with_options(
        "/path/to/project",
        None,
        Some(extensions),
        Some(ignore),
        true,
    )?;

    println!("{:?}", index.stats());
    Ok(())
}
```

```rust
use sifs::{ModelLoadPolicy, ModelOptions, SifsIndex};

let index = SifsIndex::from_path_with_model_options(
    "/path/to/project",
    ModelOptions::new(None, ModelLoadPolicy::NoDownload),
    None,
    None,
    false,
)?;
```

Use `SifsIndex::from_path_with_encoder_spec` for non-Model2Vec encoders such
as the built-in hashing encoder.

```rust
use sifs::{EncoderSpec, SifsIndex};

let index = SifsIndex::from_path_with_encoder_spec(
    "/path/to/project",
    EncoderSpec::hashing(),
    None,
    None,
    false,
)?;
```

The `include_text_files` flag controls whether default document-like extensions
such as Markdown, YAML, TOML, and JSON are included when you don't pass an
explicit extension set.

## Index a Git repository

Use `SifsIndex::from_git` to clone and index a remote repository. SIFS performs
a shallow clone into a temporary directory and can check out a branch or tag.

```rust
use sifs::SifsIndex;

fn main() -> anyhow::Result<()> {
    let index = SifsIndex::from_git("https://github.com/owner/project", Some("main"))?;
    println!("{:?}", index.stats());
    Ok(())
}
```

The Git command must be available on `PATH`. Clone failures return an error
that includes the Git stderr output.

## Build from existing chunks

Use `SifsIndex::from_chunks` when your application owns file discovery or
chunking. You must provide a loaded encoder and a non-empty list of chunks.

```rust
use sifs::{Chunk, SifsIndex};
use sifs::model2vec::load_model;

fn main() -> anyhow::Result<()> {
    let model = load_model(None)?;
    let chunks = vec![Chunk {
        content: "fn authenticate() {}".to_owned(),
        file_path: "src/auth.rs".to_owned(),
        start_line: 1,
        end_line: 1,
        language: Some("rust".to_owned()),
    }];

    let index = SifsIndex::from_chunks(model, chunks)?;
    println!("{:?}", index.stats());
    Ok(())
}
```

`from_chunks` preserves compatibility for callers that already have an encoder:
it builds BM25 data and preloads semantic state. Use `from_chunks_sparse` for a
sparse-only chunk index, `from_chunks_hybrid` for a lazy Model2Vec-backed
semantic-capable index, or `from_chunks_with_encoder_spec` for hashing.

## Search an index

Use `SifsIndex::search_with` for all ranking modes. It returns
`Result<Vec<SearchResult>>` because semantic and hybrid search may need to load
or download a model. BM25 mode does not touch the model path. `SearchOptions`
keeps ranking, result count, hybrid alpha, and filters self-describing. The
`alpha` field is only used by hybrid search. When `alpha` is `None`, SIFS
selects a weight from the query shape.

```rust
let results = index.search_with(
    "parse oauth callback",
    &SearchOptions::new(10).with_mode(SearchMode::Hybrid),
)?;
```

Use language or path filters to search a subset of the index. Filters are exact
matches against chunk language strings and repository-relative file paths.

```rust
let results = index.search_with(
    "session expiry",
    &SearchOptions::new(5)
        .with_mode(SearchMode::Hybrid)
        .with_alpha(0.5)
        .with_languages(["rust".to_owned()])
        .with_paths(["src/auth.rs".to_owned()]),
)?;
```

If both filters are present, SIFS searches chunks that match both filter sets.
If no filter matches any chunk, SIFS falls back to searching the full index.

Disable the in-process query-result cache when measuring uncached warm query
latency or when a caller wants every request to execute ranking work.

```rust
let results = index.search_with(
    "parse oauth callback",
    &SearchOptions::new(10)
        .with_mode(SearchMode::Hybrid)
        .with_cache(false),
)?;
```

## Find related chunks

Use `find_related` when you already have a `Chunk` and want nearby concepts or
similar implementations. The method performs semantic search with a same
language filter when the source chunk has language metadata.

```rust
let source = &index.chunks[0];
let related = index.find_related(source, 5)?;
```

The source chunk itself is removed from the returned results.

## Get index statistics

Use `stats` to inspect index size and language coverage. This is useful for
debugging file selection and benchmark output.

```rust
let stats = index.stats();
println!("{} files, {} chunks", stats.indexed_files, stats.total_chunks);
```

The `languages` map stores chunk counts by language, not file counts.