llama-engine

The "narrow waist" of the llama.rs stack. Defines the core [LlamaEngine] trait and the associated types that all other crates depend on. Because applications code against the trait, CPU, Metal, or FFI backends can be swapped without changing application code.

Design Notes

Interior Mutability

LlamaEngine methods take &self rather than &mut self, so a single engine can be shared across multiple sessions and called concurrently without exclusive borrows or locking at call sites. The synchronization burden moves into the backend: an implementation that relies on interior mutability (e.g., Mutex, RwLock) must perform whatever internal locking is needed to keep access to shared state thread-safe.
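
A minimal sketch of this pattern, assuming nothing about the real trait: Engine, decode_step, steps_taken, CountingBackend, and run_sessions are hypothetical names invented for illustration and are not the LlamaEngine API. It shows a backend mutating state behind &self through a Mutex while several threads share one Arc handle.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the real trait; the method names are assumptions.
// `Send + Sync` lets a trait object be shared across threads.
trait Engine: Send + Sync {
    /// Process one token. Note `&self`, not `&mut self`.
    fn decode_step(&self, token: i32);
    /// Total number of tokens processed so far.
    fn steps_taken(&self) -> usize;
}

// Hypothetical backend: its mutable state lives behind a Mutex, so the
// backend itself (not the caller) is responsible for synchronization.
struct CountingBackend {
    tokens_seen: Mutex<usize>,
}

impl Engine for CountingBackend {
    fn decode_step(&self, _token: i32) {
        // The lock is taken internally; call sites never see it.
        let mut seen = self.tokens_seen.lock().unwrap();
        *seen += 1;
    }

    fn steps_taken(&self) -> usize {
        *self.tokens_seen.lock().unwrap()
    }
}

// Drive `sessions` concurrent threads against one shared engine and
// return the total number of steps the engine recorded.
fn run_sessions(engine: Arc<dyn Engine>, sessions: usize, steps: usize) -> usize {
    let handles: Vec<_> = (0..sessions)
        .map(|_| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || {
                for t in 0..steps {
                    engine.decode_step(t as i32);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    engine.steps_taken()
}
```

Callers only clone the Arc and call methods; no lock appears at the call site. A real backend would likely hold finer-grained locks than a single Mutex, which is a design choice this sketch does not try to model.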

Token Type

TokenId is aliased as i32 for FFI compatibility, even though token IDs are logically non-negative. The alias will be reconsidered if converting to and from u32 or usize becomes a recurring obstacle.
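
As an illustration of the conversions this choice implies, the sketch below mirrors the alias and adds two checked helpers. token_index and token_from_index are hypothetical names, not part of the crate; they only show how a caller might index a vocabulary table without an unchecked cast.

```rust
use std::convert::TryFrom;

/// Mirrors the alias described above.
type TokenId = i32;

/// Hypothetical helper: convert a token ID into a table index.
/// Returns None for negative IDs instead of wrapping silently.
fn token_index(id: TokenId) -> Option<usize> {
    usize::try_from(id).ok()
}

/// Hypothetical helper: convert a table index back into a token ID.
/// Returns None if the index does not fit in an i32.
fn token_from_index(index: usize) -> Option<TokenId> {
    TokenId::try_from(index).ok()
}
```

Using TryFrom rather than `as` keeps a negative or oversized value from turning into a valid-looking index.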

llama-engine 0.1.0
