Struct JinaCodeEmbedder 

pub struct JinaCodeEmbedder { /* private fields */ }

Jina Code Embeddings 1.5B specialized embedder.

This embedder is optimized for the jina-code-embeddings-1.5b model with:

  • Task-specific instruction prefixes (NL2Code, Code2Code, etc.)
  • Last-token pooling (required for this decoder-based model)
  • Matryoshka dimension truncation (128, 256, 512, 1024, 1536)
  • Automatic handling of the 32,768-token context window
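
Last-token pooling takes the embedding from the hidden state of the final real (non-padding) token instead of averaging over all tokens. The sketch below shows the idea on plain vectors. The function name `last_token_pool`, the `[seq_len][hidden_dim]` layout, and the 0/1 attention mask are assumptions made for illustration; they are not this crate's internals.

```rust
/// Pick the hidden state of the last token whose attention mask is 1.
///
/// `hidden` holds one vector per token position (assumed layout:
/// [seq_len][hidden_dim]); `attention_mask` holds 1 for real tokens and
/// 0 for padding. Returns `None` if the mask contains no real token.
fn last_token_pool(hidden: &[Vec<f32>], attention_mask: &[u8]) -> Option<Vec<f32>> {
    // Scan from the end for the last position marked as a real token.
    let last = attention_mask.iter().rposition(|&m| m == 1)?;
    hidden.get(last).cloned()
}
```

With a mask of `[1, 1, 0]`, the function returns the vector at position 1, skipping the trailing padding position.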

§Asymmetric vs Symmetric Embedding Mode

For optimal retrieval quality, use different modes for indexing vs querying:

  • Passage mode (default): use when indexing code or documents; adds the passage prefix
  • Query mode: use for search queries; adds the query prefix
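
Conceptually, the two modes differ only in which instruction prefix is prepended to the text before it is tokenized. The following is a minimal sketch of that idea. The `Mode` enum, `apply_prefix`, and both prefix strings are hypothetical placeholders; the real prefixes are defined by the model and the selected task and are not reproduced here.

```rust
/// Hypothetical stand-in for the crate's embedding mode.
#[derive(Clone, Copy)]
enum Mode {
    Query,
    Passage,
}

// Placeholder prefixes for illustration only (not the model's real strings).
const QUERY_PREFIX: &str = "<query instruction>: ";
const PASSAGE_PREFIX: &str = "<passage instruction>: ";

/// Prepend the prefix that matches the mode.
fn apply_prefix(mode: Mode, text: &str) -> String {
    let prefix = match mode {
        Mode::Query => QUERY_PREFIX,
        Mode::Passage => PASSAGE_PREFIX,
    };
    format!("{prefix}{text}")
}
```

Because the same text yields different model inputs in each mode, vectors built for the index and vectors built for queries should each use their matching mode consistently.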

§Example

use aurora_semantic::{JinaCodeEmbedder, EmbeddingTask, MatryoshkaDimension, EmbeddingMode};

// For INDEXING: use Passage mode (default)
let indexer = JinaCodeEmbedder::from_directory("./models/jina-code-1.5b")?
    .with_task(EmbeddingTask::NL2Code)
    .with_mode(EmbeddingMode::Passage);  // Default, can omit

// For SEARCHING: use Query mode
let searcher = JinaCodeEmbedder::from_directory("./models/jina-code-1.5b")?
    .with_task(EmbeddingTask::NL2Code)
    .with_mode(EmbeddingMode::Query);

Implementations§

impl JinaCodeEmbedder

pub const DEFAULT_MAX_LENGTH: usize = 32_768

Default max sequence length for Jina Code 1.5B.

pub const DEFAULT_DIMENSION: usize = 1_536

Default dimension for Jina Code 1.5B.

pub fn from_directory<P: AsRef<Path>>(model_dir: P) -> Result<Self>

Load Jina Code Embeddings 1.5B from a model directory.

The directory should contain:

  • model.onnx - The ONNX model file
  • tokenizer.json - The HuggingFace tokenizer

pub fn from_onnx_embedder(inner: OnnxEmbedder) -> Self

Create from an existing OnnxEmbedder.

pub fn with_task(self, task: EmbeddingTask) -> Self

Set the embedding task (determines instruction prefix).

pub fn with_dimension(self, dimension: MatryoshkaDimension) -> Self

Set the output dimension (Matryoshka truncation).

Smaller dimensions reduce storage and speed up similarity search, typically at a small cost in retrieval quality.
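
Matryoshka truncation keeps only the leading components of the full embedding. The sketch below illustrates the general technique on a plain `f32` slice and shows the storage arithmetic. `truncate_and_normalize` and `bytes_per_vector` are hypothetical helpers, and re-normalizing to unit length after truncation is an assumption (a common practice when vectors are compared by cosine or dot product), not a documented behavior of this crate.

```rust
/// Keep the first `dim` components, then rescale to unit L2 norm.
/// Re-normalizing here is an assumption, not documented crate behavior.
fn truncate_and_normalize(embedding: &[f32], dim: usize) -> Vec<f32> {
    let mut out: Vec<f32> = embedding.iter().take(dim).copied().collect();
    let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut out {
            *x /= norm;
        }
    }
    out
}

/// Storage needed for one `f32` vector of the given dimension, in bytes.
fn bytes_per_vector(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}
```

For example, a 1536-dimensional `f32` vector occupies 6144 bytes, while a 256-dimensional one occupies 1024 bytes, a sixfold reduction per indexed item.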

pub fn with_max_length(self, max_length: usize) -> Self

Set the maximum sequence length.
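
A maximum sequence length usually means that inputs with more tokens are cut down before inference. The sketch below shows one plausible policy, keeping the first `max_length` token IDs. The helper `truncate_tokens` is hypothetical, and which end this crate truncates from is not stated in this documentation.

```rust
/// Keep at most `max_length` token IDs from the start of the sequence.
/// Truncating from the right is an assumption for illustration.
fn truncate_tokens(token_ids: &[u32], max_length: usize) -> &[u32] {
    let end = token_ids.len().min(max_length);
    &token_ids[..end]
}
```

Sequences that already fit are returned unchanged, so the limit only affects long inputs.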

pub fn with_mode(self, mode: EmbeddingMode) -> Self

Set the embedding mode (Query or Passage).

  • Passage (default): use when indexing code; adds the passage prefix
  • Query: use for search queries; adds the query prefix

pub fn mode(&self) -> EmbeddingMode

Get the current embedding mode.

pub fn task(&self) -> EmbeddingTask

Get the current task.

pub fn output_dimension(&self) -> MatryoshkaDimension

Get the output dimension.

pub fn execution_provider(&self) -> &ExecutionProviderInfo

Get information about the execution provider (CPU/GPU).

pub fn is_gpu_accelerated(&self) -> bool

Check if GPU acceleration is being used.

pub fn embed_query(&self, text: &str) -> Result<Vec<f32>>

Embed a search query with the query instruction prefix.

Use this for user queries when searching the index.

§Example
let query_embedding = embedder.embed_query("function to parse JSON")?;

pub fn embed_passage(&self, text: &str) -> Result<Vec<f32>>

Embed a code snippet/passage with the passage instruction prefix.

Use this when indexing code to build the search index.

§Example
let code_embedding = embedder.embed_passage("fn parse_json(s: &str) -> Value { ... }")?;
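
Once query and passage embeddings exist, search reduces to comparing vectors. The sketch below ranks passages by cosine similarity using only the standard library. `cosine_similarity` and `best_match` are hypothetical helpers written for illustration; this documentation does not say which similarity measure or index structure the crate itself uses.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
/// If the lengths differ, only the overlapping prefix contributes to the dot product.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Index of the passage embedding most similar to the query, if any.
fn best_match(query: &[f32], passages: &[Vec<f32>]) -> Option<usize> {
    passages
        .iter()
        .enumerate()
        .map(|(i, p)| (i, cosine_similarity(query, p)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

In practice the `query` argument would come from `embed_query` and the `passages` from `embed_passage` or `embed_passages`, so that each side carries its matching prefix.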

pub fn embed_queries(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>

Embed multiple queries in batch.

pub fn embed_passages(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>

Embed multiple code passages in batch.

Use this for efficient bulk indexing.
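
For large corpora, it can help to feed the batch method fixed-size chunks so that memory use stays bounded. The sketch below is one way to do that under stated assumptions: `embed_in_chunks` is a hypothetical helper, and the closure `embed_batch` stands in for a call such as `embedder.embed_passages(chunk)`. A suitable batch size depends on the hardware and is not specified by this documentation.

```rust
/// Embed `texts` in chunks of at most `batch_size`, preserving input order.
/// `embed_batch` is a stand-in for a batch embedding call; the first error
/// it returns aborts the loop and is propagated to the caller.
fn embed_in_chunks<F, E>(
    texts: &[&str],
    batch_size: usize,
    mut embed_batch: F,
) -> Result<Vec<Vec<f32>>, E>
where
    F: FnMut(&[&str]) -> Result<Vec<Vec<f32>>, E>,
{
    let mut all = Vec::with_capacity(texts.len());
    // `chunks` panics on a size of 0, so clamp to at least 1.
    for chunk in texts.chunks(batch_size.max(1)) {
        all.extend(embed_batch(chunk)?);
    }
    Ok(all)
}
```

Taking the embedding call as a closure keeps the helper independent of any particular embedder type, which also makes it easy to test with a fake.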

Trait Implementations§

impl Embedder for JinaCodeEmbedder

fn embed_for_query(&self, text: &str) -> Result<Vec<f32>>

Override for asymmetric retrieval: always applies the query prefix to search queries.
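
The word "override" suggests that the trait supplies a default `embed_for_query` that symmetric embedders can inherit. The sketch below illustrates that pattern with a simplified, hypothetical trait named `Embed`. It returns the prefixed text as a `String` instead of a vector so the chosen code path is visible; it is not the real `Embedder` trait, and the prefix strings are placeholders.

```rust
/// Simplified, hypothetical trait for illustration (not the crate's `Embedder`).
trait Embed {
    fn embed(&self, text: &str) -> String;

    /// Default: symmetric embedders treat queries like any other text.
    fn embed_for_query(&self, text: &str) -> String {
        self.embed(text)
    }
}

struct Symmetric;
struct Asymmetric;

impl Embed for Symmetric {
    fn embed(&self, text: &str) -> String {
        text.to_string()
    }
}

impl Embed for Asymmetric {
    fn embed(&self, text: &str) -> String {
        format!("passage: {text}")
    }

    /// Override: queries always take the query path, whatever `embed` does.
    fn embed_for_query(&self, text: &str) -> String {
        format!("query: {text}")
    }
}
```

Code that is generic over the trait can call `embed_for_query` for every search query and get the right behavior for both kinds of embedder.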

fn embed(&self, text: &str) -> Result<Vec<f32>>

Generate an embedding for a single text.

fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>

Generate embeddings for multiple texts in batch.

fn dimension(&self) -> usize

Get the embedding dimension.

fn name(&self) -> &'static str

Get the name of this embedder.

fn max_sequence_length(&self) -> usize

Get the maximum sequence length supported.
