ripvec_core/encoder/ripvec/mod.rs
//! Ripvec retrieval pipeline ported into Rust.
//!
//! This subtree mirrors the Python reference implementation at
//! `~/src/semble/src/semble/`. Each Rust module corresponds to one
//! Python source file; the port preserves the ripvec pipeline shape
//! (chunker → tokenizer → BM25 path-enrichment → static encoder →
//! RRF hybrid → boosts → penalties → reranker) one-for-one.
//!
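//! The "RRF hybrid" stage of the pipeline refers to reciprocal rank
//! fusion. The sketch below illustrates the general technique only; it
//! is not the code in [`hybrid`]. The name `rrf_fuse`, the `u32` chunk
//! ids, and the constant `k` are assumptions made for the example.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical helper: fuse ranked lists of chunk ids with
//! /// reciprocal rank fusion, score(d) = sum over lists of 1 / (k + rank).
//! fn rrf_fuse(rankings: &[Vec<u32>], k: f64) -> Vec<(u32, f64)> {
//!     let mut scores: HashMap<u32, f64> = HashMap::new();
//!     for ranking in rankings {
//!         for (i, id) in ranking.iter().enumerate() {
//!             // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
//!             *scores.entry(*id).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
//!         }
//!     }
//!     let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
//!     // Highest fused score first; ties are broken by id for determinism.
//!     fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
//!     fused
//! }
//! ```
//!
//! For instance, fusing the rankings `[1, 2, 3]` and `[2, 3, 1]` with
//! `k = 60.0` places id 2 first, because 1/62 + 1/61 is the largest sum.
//!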
//! ## Module map
//!
//! | This module | Python source |
//! |---|---|
//! | [`tokens`] | `src/semble/tokens.py` (camelCase/snake_case splitter) |
//! | [`chunking`] | `src/semble/chunking/{core,chunking}.py` (AST-merge) |
//! | [`bm25`] | `src/semble/index/sparse.py` (path-enrichment + scoring) |
//! | [`dense`] | `src/semble/index/dense.py` (StaticEncoder via model2vec-rs) |
//! | [`ranking`] | `src/semble/ranking/{weighting,boosting}.py` (alpha + boosts) |
//! | [`penalties`] | `src/semble/ranking/penalties.py` (path priors + rerank_topk) |
//! | [`hybrid`] | `src/semble/search.py` (RRF + α-blend + boost + rerank) |
//! | [`index`] | `src/semble/index/index.py` (RipvecIndex orchestrator) |
//!
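//! As an illustration of what a camelCase/snake_case splitter such as
//! the one listed for [`tokens`] does, here is a minimal sketch. It is
//! an assumption-laden example, not the ported implementation: the name
//! `split_identifier` and its exact boundary rules are hypothetical.
//!
//! ```rust
//! /// Hypothetical helper: split an identifier into lowercase sub-tokens
//! /// on `_`, `-`, and lower-to-upper case boundaries. Runs of capitals
//! /// (e.g. `HTTPServer`) are left as one token in this sketch.
//! fn split_identifier(ident: &str) -> Vec<String> {
//!     let mut tokens = Vec::new();
//!     let mut current = String::new();
//!     let mut prev_lower = false;
//!     for ch in ident.chars() {
//!         if ch == '_' || ch == '-' {
//!             // Separators end the current token and are dropped.
//!             if !current.is_empty() {
//!                 tokens.push(std::mem::take(&mut current));
//!             }
//!             prev_lower = false;
//!             continue;
//!         }
//!         // A capital after a lowercase letter or digit starts a new token.
//!         if ch.is_uppercase() && prev_lower && !current.is_empty() {
//!             tokens.push(std::mem::take(&mut current));
//!         }
//!         prev_lower = ch.is_lowercase() || ch.is_ascii_digit();
//!         current.extend(ch.to_lowercase());
//!     }
//!     if !current.is_empty() {
//!         tokens.push(current);
//!     }
//!     tokens
//! }
//! ```
//!
//! With these rules, `parseHttpRequest` yields `parse`, `http`,
//! `request`, and `snake_case_name` yields `snake`, `case`, `name`.
//!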
//! ## Scope under `--model ripvec`
//!
//! When `--model ripvec` is active, the orchestrator in [`index`] drives
//! the full pipeline: it builds a [`RipvecIndex`](index::RipvecIndex)
//! using the chunker in [`chunking`] and the encoder in [`dense`], and
//! dispatches search via [`hybrid::search_hybrid`]. The crate-level
//! BM25 in `crate::bm25` and hybrid search in `crate::hybrid` are *not*
//! used on this path.
//!
//! Per the `port+ripvec` scope decision in `docs/PLAN.md`, the final
//! ranking step applies ripvec's
//! [`boost_with_pagerank`](crate::hybrid::boost_with_pagerank) on top
//! of the ported engine's rerank. `--model ripvec` is therefore the
//! ported engine's retrieval plus ripvec's structural prior.

pub mod bm25;
pub mod chunking;
pub mod dense;
pub mod hybrid;
pub mod index;
pub mod penalties;
pub mod ranking;
pub mod static_model;
pub mod tokens;