Struct StaticEncoder 

pub struct StaticEncoder { /* private fields */ }

CPU-only static encoder.

Owns a loaded StaticEmbedModel plus identity metadata. The embedder is constructed by main.rs::load_pipeline via StaticEncoder::from_pretrained, passing either a local path containing the Model2Vec files or a HuggingFace repo ID.

Implementations

impl StaticEncoder

pub fn encode_query(&self, query: &str) -> Vec<f32>

Encode a query string into a single embedding row.

Used by RipvecIndex::search for hybrid/semantic dispatch.
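Static (Model2Vec-style) models typically embed a piece of text by averaging the rows of a token-embedding table selected by the text's token ids. The sketch below illustrates that mean-pooling step for one query. It is a hypothetical illustration, not this crate's code: tokenization is left out, mean_pool and the toy table are invented, and whether StaticEmbedModel L2-normalizes its output is not stated here, so normalization is shown only as an option.

```rust
/// Hypothetical sketch: mean-pool the embedding-table rows selected by `token_ids`.
/// Each row of `table` is one token's static embedding.
fn mean_pool(table: &[Vec<f32>], token_ids: &[usize], normalize: bool) -> Vec<f32> {
    let dim = table.first().map_or(0, |row| row.len());
    let mut out = vec![0.0f32; dim];
    if token_ids.is_empty() {
        return out; // no tokens: zero vector
    }
    // Sum the selected rows element-wise.
    for &id in token_ids {
        for (acc, v) in out.iter_mut().zip(&table[id]) {
            *acc += *v;
        }
    }
    // Divide by the token count to get the mean.
    let n = token_ids.len() as f32;
    for x in out.iter_mut() {
        *x /= n;
    }
    // Optional L2 normalization (an assumption of this sketch).
    if normalize {
        let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in out.iter_mut() {
                *x /= norm;
            }
        }
    }
    out
}

fn main() {
    // Toy 3-token vocabulary with 2-dimensional embeddings.
    let table = vec![vec![1.0f32, 0.0], vec![0.0, 1.0], vec![3.0, 4.0]];
    let query = mean_pool(&table, &[0, 1], false);
    println!("{query:?}"); // [0.5, 0.5]
}
```

Because the table lookup and averaging involve no attention layers, this kind of encoding is cheap enough to run per query on the CPU.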

pub fn from_pretrained(model_repo: &str) -> Result<Self>

Load a model by HuggingFace repo ID or local path.

Two acceptance shapes:

  1. Local path — if model_repo names an existing directory, load directly from it. Used by the parity-test fixture path (/tmp/potion-base-32M) and by users who pre-stage the files.
  2. HuggingFace repo ID — otherwise, treat model_repo as org/repo, download config.json, tokenizer.json, and model.safetensors via hf-hub into ~/.cache/huggingface/hub/, and load from there. This matches the behaviour of load_classic_cpu and load_modernbert_cpu, so the user-facing interface is consistent: a bare --model ripvec with no --model-repo flag works.
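The two acceptance shapes amount to a directory check followed by a fallback. A minimal sketch of that dispatch using only the standard library: ModelSource and classify are hypothetical names, the hf-hub download step is omitted, and rejecting strings that are not of the form org/repo is an assumption of this sketch rather than documented behaviour.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical classification of a `model_repo` argument (not the crate's type).
#[derive(Debug, PartialEq)]
enum ModelSource {
    /// Shape 1: an existing local directory holding the model files.
    LocalDir(PathBuf),
    /// Shape 2: a HuggingFace `org/repo` id to be fetched via hf-hub.
    HubRepo { org: String, repo: String },
}

fn classify(model_repo: &str) -> Result<ModelSource, String> {
    // An existing directory always wins, whatever the string looks like.
    let path = Path::new(model_repo);
    if path.is_dir() {
        return Ok(ModelSource::LocalDir(path.to_path_buf()));
    }
    // Otherwise expect exactly one '/' separating non-empty org and repo parts.
    match model_repo.split_once('/') {
        Some((org, repo)) if !org.is_empty() && !repo.is_empty() && !repo.contains('/') => {
            Ok(ModelSource::HubRepo {
                org: org.to_string(),
                repo: repo.to_string(),
            })
        }
        _ => Err(format!("not a directory or an org/repo id: {model_repo}")),
    }
}

fn main() {
    let tmp = std::env::temp_dir();
    println!("{:?}", classify(tmp.to_str().unwrap()));
    // Hypothetical repo id, assumed not to exist as a local directory.
    println!("{:?}", classify("example-org/example-model"));
}
```

Checking the filesystem first means a pre-staged directory is never mistaken for a repo ID, even if its path happens to contain a slash.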

Errors

Propagates the underlying I/O, download, or parse error if the files cannot be obtained or the safetensors layout is unrecognized.

Trait Implementations

impl VectorEncoder for StaticEncoder

fn embed_root(&self, root: &Path, cfg: &SearchConfig, profiler: &Profiler) -> Result<(Vec<CodeChunk>, Vec<Vec<f32>>)>

Three-stage bounded-queue pipeline:

  1. Chunk producer — rayon par_iter over the file list. Each file is read, parsed by tree-sitter (or line-merged on fallback), and emitted as (CodeChunk, String) pairs into a bounded channel of capacity PIPELINE_BATCH_SIZE * 8.
  2. Batch accumulator — a single scoped thread drains the chunk channel, packs PIPELINE_BATCH_SIZE pairs per batch, and forwards into a bounded channel of capacity PIPELINE_RING_SIZE.
  3. Encode worker — a single scoped thread receives batches and calls StaticEmbedModel::encode_batch, whose internal par_iter runs the pool_ids kernel across the rayon thread pool.
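The three stages above can be sketched with standard-library bounded channels (std::sync::mpsc::sync_channel) and scoped threads. This is a simplified, hypothetical model rather than the crate's code: one thread stands in for the rayon producer, each line of a file stands in for a tree-sitter chunk, encode_batch is a stub, and the constant values are made up. A full channel blocks its sender, which is what bounds the in-flight memory.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical sizes; the real PIPELINE_BATCH_SIZE / PIPELINE_RING_SIZE values are not given here.
const PIPELINE_BATCH_SIZE: usize = 4;
const PIPELINE_RING_SIZE: usize = 2;

/// Stub standing in for StaticEmbedModel::encode_batch: one vector per input text.
fn encode_batch(texts: &[String]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| vec![t.len() as f32]).collect()
}

fn run_pipeline(files: Vec<String>) -> (Vec<String>, Vec<Vec<f32>>) {
    let (chunk_tx, chunk_rx) = sync_channel::<String>(PIPELINE_BATCH_SIZE * 8);
    let (batch_tx, batch_rx) = sync_channel::<Vec<String>>(PIPELINE_RING_SIZE);

    thread::scope(|s| {
        // Stage 1: chunk producer. Splits each "file" into line chunks.
        s.spawn(move || {
            for file in files {
                for line in file.lines() {
                    // send() blocks while the channel is full (backpressure).
                    if chunk_tx.send(line.to_string()).is_err() {
                        return;
                    }
                }
            }
            // chunk_tx is dropped here, which ends stage 2's loop.
        });

        // Stage 2: batch accumulator packs chunks into fixed-size batches.
        s.spawn(move || {
            let mut batch = Vec::with_capacity(PIPELINE_BATCH_SIZE);
            for chunk in chunk_rx {
                batch.push(chunk);
                if batch.len() == PIPELINE_BATCH_SIZE {
                    if batch_tx.send(std::mem::take(&mut batch)).is_err() {
                        return;
                    }
                }
            }
            // Flush the final partial batch.
            if !batch.is_empty() {
                let _ = batch_tx.send(batch);
            }
        });

        // Stage 3: encode worker collects chunks and their embeddings.
        let worker = s.spawn(move || {
            let mut chunks = Vec::new();
            let mut embeddings = Vec::new();
            for batch in batch_rx {
                embeddings.extend(encode_batch(&batch));
                chunks.extend(batch);
            }
            (chunks, embeddings)
        });
        worker.join().unwrap()
    })
}

fn main() {
    let files = vec!["fn a() {}\nfn b() {}".to_string(), "let x = 1;".to_string()];
    let (chunks, embeddings) = run_pipeline(files);
    println!("{} chunks, {} embeddings", chunks.len(), embeddings.len());
}
```

With a single producer thread, chunk order is preserved; with a parallel rayon producer, chunks from different files may arrive interleaved.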

Why this shape:

  • The previous “chunk all, then embed all” implementation held the entire Vec<String> of chunk contents in memory between phases; on the Linux corpus that peaked at ~400 MB. The bounded queues cap in-flight memory at PIPELINE_BATCH_SIZE * 8 + PIPELINE_RING_SIZE * PIPELINE_BATCH_SIZE chunks regardless of corpus size, which comes to under 15 MB.
  • The chunk phase (13 s on Linux) now overlaps the embed phase (70 s) instead of running before it. The pre-pipeline profile showed 394 s of user time over 82 s of wall time, i.e. 4.8x parallelism on 12 cores; the pipeline lets idle cores chunk files while embedding runs.
  • Mirrors embed::embed_all_streaming’s shape so the two pipelines (BERT + semble) share architectural conventions.
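The memory bound and the parallelism figure above are simple arithmetic. The sketch below evaluates the stated in-flight formula for hypothetical sizes (256 and 4 are placeholders, not the crate's constants) and reproduces the 394 s / 82 s ratio from the profile.

```rust
/// In-flight chunk bound from the formula in the docs:
/// chunk channel (batch * 8) plus batch ring (ring * batch).
fn max_in_flight_chunks(batch_size: usize, ring_size: usize) -> usize {
    batch_size * 8 + ring_size * batch_size
}

/// Average parallelism = total user (CPU) time / wall-clock time.
fn parallelism(user_secs: f64, wall_secs: f64) -> f64 {
    user_secs / wall_secs
}

fn main() {
    // Hypothetical sizes: 256 * 8 + 4 * 256 = 3072 chunks.
    let chunks = max_in_flight_chunks(256, 4);
    println!("at most {chunks} chunks in flight");
    // Figures quoted from the pre-pipeline profile.
    let p = parallelism(394.0, 82.0);
    println!("{p:.1}x on 12 cores"); // 4.8x on 12 cores
}
```

The bound depends only on the two constants, not on the number of files, which is why peak memory stays flat as the corpus grows.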

fn hidden_dim(&self) -> usize

Hidden dimension of the emitted embeddings.

fn identity(&self) -> &str

Stable identifier used as the cache-manifest key.
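Because identity() keys the cache manifest, a consumer can refuse to reuse an index built by a different encoder. A hypothetical sketch: EncoderIdentity is a trimmed stand-in for VectorEncoder with only the two accessors, Manifest and ToyEncoder are invented types, and also comparing hidden_dim is an extra safety check assumed here, not documented behaviour.

```rust
/// Hypothetical trimmed stand-in for VectorEncoder: just the two accessors.
trait EncoderIdentity {
    fn hidden_dim(&self) -> usize;
    fn identity(&self) -> &str;
}

/// Hypothetical cache-manifest record written when an index is built.
struct Manifest {
    identity: String,
    hidden_dim: usize,
}

/// Toy encoder used only for this example.
struct ToyEncoder {
    id: String,
    dim: usize,
}

impl EncoderIdentity for ToyEncoder {
    fn hidden_dim(&self) -> usize {
        self.dim
    }
    fn identity(&self) -> &str {
        &self.id
    }
}

/// Reuse a cached index only if it was built by the same encoder identity
/// (and, as an extra assumed check, with the same embedding width).
fn cache_is_reusable(manifest: &Manifest, encoder: &dyn EncoderIdentity) -> bool {
    manifest.identity == encoder.identity() && manifest.hidden_dim == encoder.hidden_dim()
}

fn main() {
    let enc = ToyEncoder { id: "static:example-model".into(), dim: 256 };
    let same = Manifest { identity: "static:example-model".into(), hidden_dim: 256 };
    let other = Manifest { identity: "bert:example-model".into(), hidden_dim: 768 };
    println!("{} {}", cache_is_reusable(&same, &enc), cache_is_reusable(&other, &enc)); // true false
}
```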

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> ArchivePointee for T

type ArchivedMetadata = ()

The archived version of the pointer metadata for this type.

fn pointer_metadata( _: &<T as ArchivePointee>::ArchivedMetadata, ) -> <T as Pointee>::Metadata

Converts some archived metadata to the pointer metadata for itself.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> Downcast for T
where T: Any,

fn into_any(self: Box<T>) -> Box<dyn Any>

Converts Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>, which can then be downcast into Box<ConcreteType> where ConcreteType implements Trait.

fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>

Converts Rc<Trait> (where Trait: Downcast) to Rc<Any>, which can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.

fn as_any(&self) -> &(dyn Any + 'static)

Converts &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.

fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)

Converts &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.

impl<T> DowncastSend for T
where T: Any + Send,

fn into_any_send(self: Box<T>) -> Box<dyn Any + Send>

Converts Box<Trait> (where Trait: DowncastSend) to Box<dyn Any + Send>, which can then be downcast into Box<ConcreteType> where ConcreteType implements Trait.

impl<T> DowncastSync for T
where T: Any + Send + Sync,

fn into_any_sync(self: Box<T>) -> Box<dyn Any + Sync + Send>

Converts Box<Trait> (where Trait: DowncastSync) to Box<dyn Any + Send + Sync>, which can then be downcast into Box<ConcreteType> where ConcreteType implements Trait.

fn into_any_arc(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Converts Arc<Trait> (where Trait: DowncastSync) to Arc<Any>, which can then be downcast into Arc<ConcreteType> where ConcreteType implements Trait.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> LayoutRaw for T

fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>

Returns the layout of the type.

impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
where T: SharedNiching<N1, N2>, N1: Niching<T>, N2: Niching<T>,

unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool

Returns whether the given value has been niched.

fn resolve_niched(out: Place<NichedOption<T, N1>>)

Writes data to out indicating that a T is niched.

impl<T> Pointable for T

const ALIGN: usize

The alignment of the pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes an object with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Pointee for T

type Metadata = ()

The metadata type for pointers and references to this type.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> Fruit for T
where T: Send + Downcast,