//! Ripvec retrieval pipeline ported into Rust.
//!
//! This subtree mirrors the Python reference implementation at
//! `~/src/semble/src/semble/`. Each Rust module corresponds to one
//! Python source file; the port preserves the ripvec pipeline shape
//! (chunker → tokenizer → BM25 path-enrichment → static encoder →
//! RRF hybrid → boosts → penalties → reranker) one-for-one.
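//!
//! As a rough illustration of the RRF step only, the sketch below fuses
//! two ranked lists of chunk ids with Reciprocal Rank Fusion. It is not
//! the code in [`hybrid`]: the function name `rrf_fuse`, the `u32` ids,
//! and the constant `k = 60.0` are assumptions chosen for the example.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical helper: each ranked list contributes `1 / (k + rank)`
//! /// for every id it contains, with ranks starting at 1.
//! fn rrf_fuse(rankings: &[Vec<u32>], k: f64) -> Vec<(u32, f64)> {
//!     let mut scores: HashMap<u32, f64> = HashMap::new();
//!     for ranking in rankings {
//!         for (i, &id) in ranking.iter().enumerate() {
//!             *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
//!         }
//!     }
//!     let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
//!     // Highest fused score first; ties fall back to id for a stable order.
//!     fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
//!     fused
//! }
//!
//! fn main() {
//!     // Assumed inputs: one ranking from the sparse path, one from the dense path.
//!     let sparse = vec![3, 1, 2];
//!     let dense = vec![3, 4, 1];
//!     let fused = rrf_fuse(&[sparse, dense], 60.0);
//!     let ids: Vec<u32> = fused.iter().map(|(id, _)| *id).collect();
//!     // Id 3 leads both lists, so it also leads the fused list.
//!     assert_eq!(ids[0], 3);
//! }
//! ```
//!
//! An id that appears in only one list still receives a score, so the
//! fused list is the union of the inputs rather than their intersection.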
//!
//! ## Module map
//!
//! | This module | Python source |
//! |---|---|
//! | [`tokens`] | `src/semble/tokens.py` (camelCase/snake_case splitter) |
//! | [`chunking`] | `src/semble/chunking/{core,chunking}.py` (AST-merge) |
//! | [`bm25`] | `src/semble/index/sparse.py` (path-enrichment + scoring) |
//! | [`dense`] | `src/semble/index/dense.py` (StaticEncoder via model2vec-rs) |
//! | [`ranking`] | `src/semble/ranking/{weighting,boosting}.py` (alpha + boosts) |
//! | [`penalties`] | `src/semble/ranking/penalties.py` (path priors + rerank_topk) |
//! | [`hybrid`] | `src/semble/search.py` (RRF + α-blend + boost + rerank) |
//! | [`index`] | `src/semble/index/index.py` (RipvecIndex orchestrator) |
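//!
//! To make the "camelCase/snake_case splitter" entry concrete, here is a
//! minimal sketch of that kind of identifier splitting. The function
//! `split_identifier` is hypothetical and may differ from what
//! [`tokens`] actually does, for example in how it treats acronyms or
//! digits.
//!
//! ```rust
//! /// Hypothetical helper: split on `_` and on lower-to-upper case
//! /// boundaries, lowercasing every piece.
//! fn split_identifier(ident: &str) -> Vec<String> {
//!     let mut parts = Vec::new();
//!     let mut current = String::new();
//!     let mut prev_lower = false;
//!     for ch in ident.chars() {
//!         if ch == '_' {
//!             // Underscores separate pieces and are never emitted.
//!             if !current.is_empty() {
//!                 parts.push(std::mem::take(&mut current));
//!             }
//!             prev_lower = false;
//!             continue;
//!         }
//!         // A lowercase letter or digit followed by an uppercase letter
//!         // starts a new piece.
//!         if ch.is_uppercase() && prev_lower {
//!             parts.push(std::mem::take(&mut current));
//!         }
//!         current.extend(ch.to_lowercase());
//!         prev_lower = ch.is_lowercase() || ch.is_ascii_digit();
//!     }
//!     if !current.is_empty() {
//!         parts.push(current);
//!     }
//!     parts
//! }
//!
//! fn main() {
//!     assert_eq!(split_identifier("parseHttpRequest"), ["parse", "http", "request"]);
//!     assert_eq!(split_identifier("rerank_top_k"), ["rerank", "top", "k"]);
//! }
//! ```
//!
//! This simple rule keeps runs of capitals together, so an identifier
//! such as `HTTPServer` comes out as a single piece.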
//!
//! ## Scope under `--model ripvec`
//!
//! When `--model ripvec` is active, the orchestrator in [`index`] drives
//! the full pipeline: it builds a [`RipvecIndex`](index::RipvecIndex)
//! using the chunker in [`chunking`] and the encoder in [`dense`], and
//! dispatches search via [`hybrid::search_hybrid`]. The crate's existing
//! BM25 in `crate::bm25` and hybrid search in `crate::hybrid` are *not*
//! used for retrieval on this path.
//!
//! Per the `port+ripvec` scope decision in `docs/PLAN.md`, the final
//! ranking step applies the crate's existing
//! [`boost_with_pagerank`](crate::hybrid::boost_with_pagerank) on top
//! of this pipeline's rerank. `--model ripvec` is therefore the ported
//! retrieval pipeline plus the crate's structural prior.