pub trait LlamaEngine: Send + Sync {
    // Required methods
    fn load_model(&self, spec: &ModelSpec) -> Result<ModelHandle>;
    fn tokenize(&self, text: &str) -> Result<Vec<TokenId>>;
    fn detokenize(&self, tokens: &[TokenId]) -> Result<String>;
    fn prefill(
        &self,
        session: &mut Session,
        tokens: &[TokenId],
    ) -> Result<PrefillResult>;
    fn decode(&self, session: &mut Session) -> Result<DecodeResult>;
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;
}
The core engine trait — everything else plugs into this.
Implementations provide inference, tokenization, and embedding. Downstream crates such as oxidizedRAG and oxidizedgraph depend on engine behavior, not on implementation details, so CPU, Metal, or FFI backends can be swapped without changing application code.
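A minimal sketch of the backend-swap idea. The type names here (`ModelSpec`, `ModelHandle`, `TokenId`) are stand-ins inferred from the signatures above, and `ByteCpuEngine` is a hypothetical toy backend that "tokenizes" at the byte level; the real crate's definitions will differ:

```rust
// Stand-in types sketched from the trait signatures; not the crate's real definitions.
type TokenId = u32;
type Result<T> = std::result::Result<T, String>;

#[allow(dead_code)]
struct ModelSpec { path: String }
struct ModelHandle { id: u64 }

// A simplified slice of the trait, enough to show the abstraction boundary.
trait LlamaEngine: Send + Sync {
    fn load_model(&self, spec: &ModelSpec) -> Result<ModelHandle>;
    fn tokenize(&self, text: &str) -> Result<Vec<TokenId>>;
    fn detokenize(&self, tokens: &[TokenId]) -> Result<String>;
}

// Toy CPU backend: byte-level "tokenization" stands in for a real tokenizer.
struct ByteCpuEngine;

impl LlamaEngine for ByteCpuEngine {
    fn load_model(&self, _spec: &ModelSpec) -> Result<ModelHandle> {
        Ok(ModelHandle { id: 0 })
    }
    fn tokenize(&self, text: &str) -> Result<Vec<TokenId>> {
        Ok(text.bytes().map(TokenId::from).collect())
    }
    fn detokenize(&self, tokens: &[TokenId]) -> Result<String> {
        let bytes: Vec<u8> = tokens.iter().map(|&t| t as u8).collect();
        String::from_utf8(bytes).map_err(|e| e.to_string())
    }
}

fn main() {
    // Application code holds a trait object; swapping in a Metal or FFI
    // backend would change only this one line.
    let engine: Box<dyn LlamaEngine> = Box::new(ByteCpuEngine);
    let handle = engine
        .load_model(&ModelSpec { path: "model.gguf".into() })
        .unwrap();
    let tokens = engine.tokenize("hello").unwrap();
    // Round-trip invariant: detokenize(tokenize(s)) == s.
    assert_eq!(engine.detokenize(&tokens).unwrap(), "hello");
    println!("model {} loaded, {} tokens", handle.id, tokens.len());
}
```

Because application code only names `dyn LlamaEngine`, the backend choice is confined to the construction site.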
Required Methods
fn load_model(&self, spec: &ModelSpec) -> Result<ModelHandle>
Load a model from disk given a specification.
fn tokenize(&self, text: &str) -> Result<Vec<TokenId>>
Convert text into a sequence of token IDs.
fn detokenize(&self, tokens: &[TokenId]) -> Result<String>
Convert token IDs back into text.
fn prefill(
    &self,
    session: &mut Session,
    tokens: &[TokenId],
) -> Result<PrefillResult>
Run the prefill phase: process prompt tokens and populate the KV cache.
fn decode(&self, session: &mut Session) -> Result<DecodeResult>
Run the decode phase: produce the next token from the model.
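The prefill/decode split above is the usual two-phase generation flow: one batched pass over the prompt, then one token per `decode` call. A sketch of that loop shape, where `Session`, `PrefillResult`, `DecodeResult`, and `ToyEngine` are hypothetical toys standing in for the crate's real types:

```rust
// Hypothetical stand-in types; only the loop shape matches the trait's contract.
type TokenId = u32;

struct Session { cache: Vec<TokenId> }             // stands in for the KV cache
#[allow(dead_code)]
struct PrefillResult { last_token: TokenId }
struct DecodeResult { token: TokenId, is_eos: bool }

struct ToyEngine;

impl ToyEngine {
    // Prefill: process the whole prompt in one batch, populating the cache.
    fn prefill(&self, session: &mut Session, tokens: &[TokenId]) -> PrefillResult {
        session.cache.extend_from_slice(tokens);
        PrefillResult { last_token: *tokens.last().unwrap_or(&0) }
    }
    // Decode: emit one token per call; this toy signals EOS once the
    // cache holds 8 entries.
    fn decode(&self, session: &mut Session) -> DecodeResult {
        let next = session.cache.len() as TokenId;
        session.cache.push(next);
        DecodeResult { token: next, is_eos: session.cache.len() >= 8 }
    }
}

fn main() {
    let engine = ToyEngine;
    let mut session = Session { cache: Vec::new() };
    // Phase 1: prefill the prompt tokens in one batch.
    let _ = engine.prefill(&mut session, &[1, 2, 3]);
    // Phase 2: decode token-by-token until the engine reports EOS.
    let mut out = Vec::new();
    loop {
        let step = engine.decode(&mut session);
        out.push(step.token);
        if step.is_eos { break; }
    }
    assert_eq!(out, vec![3, 4, 5, 6, 7]);
    println!("generated {:?}", out);
}
```

The session carries the KV-cache state between the two phases, which is why both methods take `&mut Session`.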