llama-engine 0.1.0

Narrow-waist engine trait and core types for llama.rs
Repository: stevedores-org/llama.rs (crates.io owner: community-stevedores-org)

llama-engine

The "narrow waist" of the llama.rs stack. This crate defines the core LlamaEngine trait and the associated types that every other crate depends on. Because application code programs against the trait rather than a concrete backend, implementations can swap CPU, Metal, or FFI backends without changing that code.
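To illustrate the narrow-waist idea, here is a minimal sketch in which application code depends only on a trait and two backends are swapped behind it. The trait below is a hypothetical, simplified stand-in: its method names (`tokenize`, `next_token`), the `complete` helper, and both toy backends are assumptions for illustration, not the crate's actual API.

```rust
/// Mirrors the documented alias: token IDs are `i32`.
pub type TokenId = i32;

/// Hypothetical, simplified stand-in for `LlamaEngine`.
/// `tokenize` and `next_token` are illustrative names, not the real API.
pub trait LlamaEngine {
    fn tokenize(&self, text: &str) -> Vec<TokenId>;
    fn next_token(&self, context: &[TokenId]) -> TokenId;
}

/// Toy backend A: one token per byte; "predicts" the last token + 1.
pub struct ToyByteBackend;

impl LlamaEngine for ToyByteBackend {
    fn tokenize(&self, text: &str) -> Vec<TokenId> {
        text.bytes().map(TokenId::from).collect()
    }
    fn next_token(&self, context: &[TokenId]) -> TokenId {
        context.last().map_or(0, |t| t + 1)
    }
}

/// Toy backend B: one token per whitespace-separated word (its length);
/// "predicts" the sum of the context.
pub struct ToyWordBackend;

impl LlamaEngine for ToyWordBackend {
    fn tokenize(&self, text: &str) -> Vec<TokenId> {
        text.split_whitespace().map(|w| w.len() as TokenId).collect()
    }
    fn next_token(&self, context: &[TokenId]) -> TokenId {
        context.iter().sum()
    }
}

/// Application code: depends only on the trait object, so either
/// backend can be passed in without changing this function.
pub fn complete(engine: &dyn LlamaEngine, prompt: &str, n: usize) -> Vec<TokenId> {
    let mut tokens = engine.tokenize(prompt);
    for _ in 0..n {
        let next = engine.next_token(&tokens);
        tokens.push(next);
    }
    tokens
}
```

Calling `complete(&ToyByteBackend, "ab", 2)` and `complete(&ToyWordBackend, "hi there", 1)` runs the same application code against two different backends. Taking `&dyn LlamaEngine` picks the backend at runtime; a generic `fn complete<E: LlamaEngine>(engine: &E, ...)` would instead pick it at compile time.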

Design Notes

Interior Mutability

LlamaEngine methods take &self rather than &mut self. This lets a single engine be shared across multiple sessions and used for concurrent inference without exclusive borrows or external synchronization at call sites. The trade-off is that backends must synchronize any mutable shared state internally, typically through interior mutability (e.g., Mutex, Arc<RwLock>), to remain thread-safe.
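As a sketch of what internal synchronization can look like, the hypothetical backend below keeps per-session state behind a `Mutex`, so its methods take `&self` and several threads can call them through a shared `Arc` without locking at the call site. `SharedBackend`, `feed`, `session_len`, and `run_concurrently` are illustrative names, and the token history stands in for whatever state a real backend holds (for example a KV cache).

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

pub type TokenId = i32;

/// Hypothetical backend: mutable per-session state lives behind a Mutex,
/// so every method can take `&self`.
#[derive(Default)]
pub struct SharedBackend {
    // session id -> tokens seen so far (stand-in for real backend state)
    sessions: Mutex<HashMap<u64, Vec<TokenId>>>,
}

impl SharedBackend {
    /// Appends tokens to a session and returns that session's new length.
    /// Locking happens here, inside the backend, not at the call site.
    pub fn feed(&self, session: u64, tokens: &[TokenId]) -> usize {
        let mut map = self.sessions.lock().expect("mutex poisoned");
        let history = map.entry(session).or_default();
        history.extend_from_slice(tokens);
        history.len()
    }

    /// Number of tokens recorded for a session (0 if it does not exist).
    pub fn session_len(&self, session: u64) -> usize {
        self.sessions
            .lock()
            .expect("mutex poisoned")
            .get(&session)
            .map_or(0, Vec::len)
    }
}

/// Spawns one thread per session; all threads share one engine via Arc
/// and call `&self` methods concurrently.
pub fn run_concurrently(n_sessions: u64, per_session: usize) -> Arc<SharedBackend> {
    let engine = Arc::new(SharedBackend::default());
    let handles: Vec<_> = (0..n_sessions)
        .map(|id| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || {
                for t in 0..per_session {
                    engine.feed(id, &[t as TokenId]);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    engine
}
```

A single global `Mutex` is the simplest correct choice but serializes all sessions. A backend could shard state per session or use `RwLock` for read-heavy data; the trait's `&self` signature leaves that choice to the implementation.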

Token Type

TokenId is a type alias for i32 for FFI compatibility, even though token IDs are logically non-negative. This choice will be revisited if conversions to u32 or usize become a recurring friction point.
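Because `TokenId` is signed, code that uses a token ID as an index (for example into an embedding or logits table) has to convert it explicitly. A minimal sketch, assuming hypothetical helpers `token_index` and `token_id` that are not part of the crate, uses checked `TryFrom` conversions so that negative or out-of-range values are rejected rather than silently wrapped:

```rust
/// Mirrors the documented alias: token IDs are stored as `i32`.
pub type TokenId = i32;

/// Hypothetical helper: converts a TokenId into an index into a
/// vocabulary-sized table. Returns None for negative IDs (which
/// `usize::try_from` rejects) and for IDs past the end of the vocabulary.
pub fn token_index(id: TokenId, vocab_size: usize) -> Option<usize> {
    let idx = usize::try_from(id).ok()?;
    (idx < vocab_size).then_some(idx)
}

/// Hypothetical reverse conversion: returns None if the index does not
/// fit in an i32.
pub fn token_id(index: usize) -> Option<TokenId> {
    TokenId::try_from(index).ok()
}
```

Using `as` casts here would turn `-1` into a huge `usize` on the way in and wrap large indices on the way out. Checked conversions at the boundary keep the `i32` representation an FFI detail.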