Crate llama_engine

§llama-engine

The “narrow waist” of the llama.rs stack. Defines the core LlamaEngine trait and associated types that all other crates depend on. Backend implementations (CPU, Metal, FFI) can be swapped without changing application code.
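
As a rough illustration of the “narrow waist” idea, the sketch below uses a simplified stand-in for the trait; the `generate` method, its signature, and the `StubBackend` type are hypothetical, not this crate’s actual API.

```rust
use std::sync::Arc;

/// Simplified, hypothetical stand-in for the crate's `LlamaEngine` trait.
trait LlamaEngine: Send + Sync {
    fn generate(&self, prompt: &str, max_tokens: usize) -> Result<String, String>;
}

/// Application code depends only on the trait ("narrow waist"), so a CPU,
/// Metal, or FFI backend can be swapped in without touching this function.
fn summarize(engine: &dyn LlamaEngine, text: &str) -> Result<String, String> {
    engine.generate(&format!("Summarize: {text}"), 256)
}

/// Hypothetical backend; a real one would run inference in `generate`.
struct StubBackend;

impl LlamaEngine for StubBackend {
    fn generate(&self, _prompt: &str, _max_tokens: usize) -> Result<String, String> {
        Ok(String::from("stub output"))
    }
}

fn main() {
    // Swapping backends is a one-line change at construction time.
    let engine: Arc<dyn LlamaEngine> = Arc::new(StubBackend);
    let _ = summarize(engine.as_ref(), "llama.rs architecture");
}
```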

§Design Notes

§Interior Mutability

LlamaEngine methods take &self (not &mut self) so that multiple sessions can share an engine and run inference concurrently without exclusive borrows or external synchronization at call sites. Backends that need mutable state must use interior mutability (e.g., Mutex, Arc<RwLock>) and are responsible for whatever internal synchronization keeps that shared state thread-safe.
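
A minimal sketch of this pattern, assuming a simplified stand-in trait (the `Engine` name, the `decode` signature, and the `CpuEngine` type here are hypothetical):

```rust
use std::sync::Mutex;

type TokenId = i32;

/// Hypothetical, simplified stand-in; the real `LlamaEngine` has richer methods.
trait Engine: Send + Sync {
    /// Takes `&self`, not `&mut self`, so sessions can share the engine.
    fn decode(&self, session_id: u64, token: TokenId) -> TokenId;
}

/// Hypothetical backend: mutable state lives behind a `Mutex`, so the
/// backend, not the caller, handles synchronization.
struct CpuEngine {
    kv_cache: Mutex<Vec<(u64, TokenId)>>,
}

impl Engine for CpuEngine {
    fn decode(&self, session_id: u64, token: TokenId) -> TokenId {
        let mut cache = self.kv_cache.lock().unwrap();
        cache.push((session_id, token));
        token + 1 // a real backend would run the model and return the next token
    }
}

fn main() {
    let engine = CpuEngine { kv_cache: Mutex::new(Vec::new()) };
    // Two sessions decode concurrently through shared `&engine` references.
    std::thread::scope(|s| {
        s.spawn(|| engine.decode(1, 10));
        s.spawn(|| engine.decode(2, 20));
    });
}
```

Because the trait only requires &self, callers never need a mutable borrow, and the choice of lock (or lock-free structure) stays an internal detail of each backend.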

§Token Type

TokenId is aliased as i32 for FFI compatibility, though token IDs are logically non-negative. The alias will be revisited if conversions to u32/usize become a recurring friction point.
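
For illustration, a small sketch of the alias and one way a caller might guard the non-negative invariant; the `to_unsigned` helper is hypothetical, not part of the crate.

```rust
/// Matches the crate's alias: i32 for FFI compatibility.
pub type TokenId = i32;

/// Hypothetical helper: convert a token ID to `u32`, rejecting negative values.
fn to_unsigned(id: TokenId) -> Option<u32> {
    u32::try_from(id).ok()
}

fn main() {
    assert_eq!(to_unsigned(42), Some(42));
    assert_eq!(to_unsigned(-1), None); // negative IDs are logically invalid
}
```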

Structs§

DecodeResult
Result of a single decode step.
ModelHandle
Opaque handle to a loaded model.
ModelSpec
Specification for loading a model.
PrefillResult
Result of the prefill phase (prompt processing).
Session
Represents an active inference session with its own KV cache state.

Enums§

LlamaError
Top-level error type for all engine operations.

Traits§

LlamaEngine
The core engine trait — everything else plugs into this.

Type Aliases§

Result
TokenId
Token ID type (i32 for FFI compat; logically non-negative).