ripvec-core 3.1.2

Semantic code + document search engine. Cacheless static-embedding + cross-encoder rerank by default; optional ModernBERT/BGE transformer engines with GPU backends. Tree-sitter chunking, hybrid BM25 + PageRank, composable ranking layers.
//! Encoder abstraction for the ripvec static-table engine.
//!
//! [`VectorEncoder`] exposes the surviving ripvec engine behind one interface,
//! so downstream search code (CLI dispatch,
//! [`HybridIndex`](crate::hybrid::HybridIndex), cache layer) does not branch
//! on encoder internals.
//!
//! ## Implementation
//!
//! - [`StaticEncoder`](crate::encoder::ripvec::dense::StaticEncoder) —
//!   static embedding-table lookup via the in-process Model2Vec engine.
//!   Used for `--model ripvec`. CPU-only, with no batching or ring buffer:
//!   a table-lookup encoder is memory-bound, not compute-bound.
//!
//! ## Design rationale
//!
//! `VectorEncoder` abstracts at the repo→(chunks, embeddings) boundary,
//! where the concrete pipeline shape does not leak through. Callers receive
//! a `(Vec<CodeChunk>, Vec<Vec<f32>>)` pair regardless of how the encoder
//! implements walk, chunk, and embed internally.
//!
//! @Parnas (1972) — the module hides which engine is active; the trait is
//! the stable interface boundary. @Postel (1980) — callers use the same
//! `VectorEncoder` surface; no change at the call site after the transformer
//! path was removed.
//!
//! See `docs/PLAN.md` cluster B6 for the surgery context.

use std::path::Path;

use crate::chunk::CodeChunk;
use crate::embed::SearchConfig;
use crate::profile::Profiler;

pub mod ripvec;

/// Trait that abstracts a repository root → (chunks, embedding vectors).
///
/// The implementation owns its full pipeline (walk, chunk, encode).
///
/// # Object safety
///
/// `dyn VectorEncoder` is constructible. Methods take `&self` and use only
/// concrete return types — no associated types or generic methods.
///
/// # Thread safety
///
/// `Send + Sync` is required because the encoder is shared across the
/// indexing pipeline's rayon and channel-based workers.
pub trait VectorEncoder: Send + Sync {
    /// Walk `root`, chunk every supported file, and embed every chunk.
    ///
    /// Returns the chunks and their embeddings in parallel order: chunk `i`
    /// has embedding `embeddings[i]`. The ripvec engine uses an AST-merge
    /// chunker and projects chunks onto [`CodeChunk`].
    ///
    /// `cfg` carries pipeline tuning (walk filters, etc.).
    ///
    /// # Errors
    ///
    /// Returns an error if file walking, chunking, or inference fails.
    fn embed_root(
        &self,
        root: &Path,
        cfg: &SearchConfig,
        profiler: &Profiler,
    ) -> crate::Result<(Vec<CodeChunk>, Vec<Vec<f32>>)>;

    /// Hidden dimension of the emitted embeddings.
    ///
    /// Used by [`SearchIndex`](crate::index::SearchIndex) for the embedding
    /// matrix shape and by the cache layer to refuse dimension-mismatched
    /// loads.
    fn hidden_dim(&self) -> usize;

    /// Stable identifier used as the cache-manifest key.
    ///
    /// For the ripvec engine this is the Model2Vec repo string (e.g.
    /// `"minishlab/potion-code-16M"`). Also consulted for logging and
    /// diagnostics.
    fn identity(&self) -> &str;
}

#[cfg(test)]
mod tests {
    use super::*;

    /// Verify that `VectorEncoder` is object-safe by constructing a trait
    /// object type. Compilation is the test.
    #[test]
    fn trait_is_object_safe() {
        fn assert_object_safe(_: &dyn VectorEncoder) {}
        // Constructing the function item is the load-bearing check;
        // referencing it keeps the type-check live across dead-code analysis.
        let _ = assert_object_safe;
    }

    /// Verify that `Box<dyn VectorEncoder>` is `Send` + `Sync`.
    #[test]
    fn trait_object_is_send_and_sync() {
        fn assert_send_sync<T: Send + Sync>() {}
        assert_send_sync::<Box<dyn VectorEncoder>>();
    }

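    /// Verify that `&dyn VectorEncoder` is `Send`, so parallel pipelines can
    /// share a borrowed encoder across threads. This holds because the trait
    /// requires `Sync`.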
    #[test]
    fn shared_reference_is_send() {
        fn assert_send<T: Send>() {}
        assert_send::<&dyn VectorEncoder>();
    }
}
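
The sketch below shows how a caller might consume the trait through `&dyn VectorEncoder` and check the parallel-order and dimension contracts described above. It is a minimal standalone illustration, not code from the crate. `CodeChunk`, `SearchConfig`, `Profiler`, and `Result` are simplified stand-ins for the real crate types, and `ConstantEncoder` and `index_root` are hypothetical names invented for this example.

```rust
use std::path::Path;

// Stand-ins (assumptions) for crate types so the sketch compiles on its own.
// The real definitions live in `crate::chunk`, `crate::embed`, and
// `crate::profile`, and will differ.
#[derive(Debug, Clone, PartialEq)]
pub struct CodeChunk {
    pub text: String,
}
pub struct SearchConfig;
pub struct Profiler;
pub type Result<T> = std::result::Result<T, String>;

// Same shape as the trait documented above, restated for the standalone sketch.
pub trait VectorEncoder: Send + Sync {
    fn embed_root(
        &self,
        root: &Path,
        cfg: &SearchConfig,
        profiler: &Profiler,
    ) -> Result<(Vec<CodeChunk>, Vec<Vec<f32>>)>;
    fn hidden_dim(&self) -> usize;
    fn identity(&self) -> &str;
}

/// Hypothetical toy encoder: returns one chunk with a constant vector.
pub struct ConstantEncoder {
    dim: usize,
}

impl VectorEncoder for ConstantEncoder {
    fn embed_root(
        &self,
        root: &Path,
        _cfg: &SearchConfig,
        _profiler: &Profiler,
    ) -> Result<(Vec<CodeChunk>, Vec<Vec<f32>>)> {
        // One chunk named after the root, one embedding of width `dim`.
        let chunks = vec![CodeChunk { text: root.display().to_string() }];
        let embeddings = vec![vec![0.5; self.dim]];
        Ok((chunks, embeddings))
    }

    fn hidden_dim(&self) -> usize {
        self.dim
    }

    fn identity(&self) -> &str {
        "example/constant-encoder"
    }
}

/// Hypothetical caller: sees only the trait object, never the concrete engine.
/// Returns the number of chunks after validating the output contract.
pub fn index_root(encoder: &dyn VectorEncoder, root: &Path) -> Result<usize> {
    let (chunks, embeddings) = encoder.embed_root(root, &SearchConfig, &Profiler)?;

    // Chunk `i` must pair with `embeddings[i]`.
    if chunks.len() != embeddings.len() {
        return Err(format!(
            "{}: {} chunks but {} embeddings",
            encoder.identity(),
            chunks.len(),
            embeddings.len()
        ));
    }

    // Every embedding must match the advertised hidden dimension.
    let dim = encoder.hidden_dim();
    if embeddings.iter().any(|e| e.len() != dim) {
        return Err(format!(
            "{}: embedding width differs from hidden_dim {}",
            encoder.identity(),
            dim
        ));
    }

    Ok(chunks.len())
}
```

A caller would box the encoder once, for example `let enc: Box<dyn VectorEncoder> = Box::new(ConstantEncoder { dim: 4 });`, and pass `enc.as_ref()` to `index_root`. Swapping in a different engine then changes only the construction site.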