Trait LlamaEngine
pub trait LlamaEngine: Send + Sync {
    // Required methods
    fn load_model(&self, spec: &ModelSpec) -> Result<ModelHandle>;
    fn tokenize(&self, text: &str) -> Result<Vec<TokenId>>;
    fn detokenize(&self, tokens: &[TokenId]) -> Result<String>;
    fn prefill(
        &self,
        session: &mut Session,
        tokens: &[TokenId],
    ) -> Result<PrefillResult>;
    fn decode(&self, session: &mut Session) -> Result<DecodeResult>;
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;
}

The core engine trait — everything else plugs into this.

Implementations provide inference, tokenization, and embedding functionality. oxidizedRAG and oxidizedgraph depend on engine behavior, not implementation details. Swap CPU/Metal/FFI backends without changing application code.
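The backend-swapping claim rests on application code taking the trait, not a concrete engine type. A minimal, self-contained sketch of that idea: the reduced `Engine` trait, `MockBackend`, and its whitespace tokenizer below are all illustrative stand-ins, not part of this crate (the real trait has more methods and a crate-specific `Result` type).

```rust
type TokenId = u32;

// Reduced stand-in for LlamaEngine: just the tokenizer surface.
trait Engine {
    fn tokenize(&self, text: &str) -> Vec<TokenId>;
    fn detokenize(&self, tokens: &[TokenId]) -> String;
}

// Toy backend: whitespace words looked up in a fixed vocabulary.
struct MockBackend {
    vocab: Vec<String>,
}

impl Engine for MockBackend {
    fn tokenize(&self, text: &str) -> Vec<TokenId> {
        text.split_whitespace()
            .filter_map(|w| self.vocab.iter().position(|v| v.as_str() == w))
            .map(|i| i as TokenId)
            .collect()
    }

    fn detokenize(&self, tokens: &[TokenId]) -> String {
        tokens
            .iter()
            .filter_map(|&t| self.vocab.get(t as usize).cloned())
            .collect::<Vec<String>>()
            .join(" ")
    }
}

// Application code depends only on the trait object; swapping a
// CPU/Metal/FFI backend changes nothing here.
fn token_count(engine: &dyn Engine, text: &str) -> usize {
    engine.tokenize(text).len()
}
```

Any backend that implements the trait can be passed to `token_count` unchanged; that is the dependency-inversion point the description makes.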

Required Methods

fn load_model(&self, spec: &ModelSpec) -> Result<ModelHandle>

Load a model from disk given a specification.

fn tokenize(&self, text: &str) -> Result<Vec<TokenId>>

Convert text into a sequence of token IDs.

fn detokenize(&self, tokens: &[TokenId]) -> Result<String>

Convert token IDs back into text.

fn prefill(
    &self,
    session: &mut Session,
    tokens: &[TokenId],
) -> Result<PrefillResult>

Run the prefill phase: process prompt tokens and populate the KV cache.

fn decode(&self, session: &mut Session) -> Result<DecodeResult>

Run the decode phase: produce the next token from the model.
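Prefill and decode together form the generation loop: prefill processes the prompt once, then decode is called repeatedly until a stop condition. A self-contained sketch of that loop shape; the `Session` and `DecodeResult` structs and the `decode` body here are simplified stand-ins (a real engine samples the next token from model logits rather than counting).

```rust
type TokenId = u32;

// Stand-ins for the crate's Session and DecodeResult types.
struct Session {
    generated: Vec<TokenId>,
    max_tokens: usize,
}

struct DecodeResult {
    token: TokenId,
    is_eos: bool,
}

// Stand-in decode step: emits incrementing token IDs and signals
// end-of-sequence once the budget is reached.
fn decode(session: &mut Session) -> DecodeResult {
    let token = session.generated.len() as TokenId;
    session.generated.push(token);
    let is_eos = session.generated.len() >= session.max_tokens;
    DecodeResult { token, is_eos }
}

// The loop shape the trait implies: call decode until is_eos.
fn generate(session: &mut Session) -> Vec<TokenId> {
    loop {
        let result = decode(session);
        if result.is_eos {
            break;
        }
    }
    session.generated.clone()
}
```

In real use, prefill would populate the session's KV cache from the prompt before this loop starts, so each decode step only processes one new token.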

fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>

Generate embeddings for a batch of texts (for oxidizedRAG integration).
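A sketch of how a retrieval layer such as oxidizedRAG might consume these embeddings: rank document vectors by cosine similarity against a query vector. The `cosine` helper is illustrative, not part of this crate; `embed` itself is what would produce the `Vec<f32>` vectors.

```rust
// Cosine similarity between two embedding vectors: dot product over
// the product of their Euclidean norms. Returns a value in [-1, 1].
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```

A retrieval step would call `embed` once for the query and once (or ahead of time) for the documents, then keep the documents whose vectors score highest against the query.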

Implementors