slm_inference
Backend-agnostic trait layer for running Small Language Model (SLM) inference in Rust.
Idea
This crate defines a set of composable traits that abstract over the full inference pipeline — from loading a GGUF model file to producing text — without being tied to any specific backend (llama.cpp, bitnet, etc.).
SlmModelConfig → load_gguf() → SlmModel
                                   ↓
                               SlmContextBuilder → SlmContext
                                                       ↓
                                                   SlmInference::inference()
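The chain above can be sketched in code. The following is a minimal illustration and not the crate's actual API: the trait names come from this README, but every method signature, the associated types, the default sampling values, and the Dummy* backend are assumptions made for the example.

```rust
use std::path::{Path, PathBuf};

// NOTE: all signatures below are assumptions for illustration only.
trait SlmModelConfig {
    type Model: SlmModel;
    fn load_gguf(&self, path: &Path) -> Result<Self::Model, String>;
}

trait SlmModel {
    type Builder: SlmContextBuilder;
    fn context_builder(&self) -> Self::Builder;
}

trait SlmContextBuilder: Sized {
    type Context;
    fn temperature(self, t: f32) -> Self;
    fn top_k(self, k: u32) -> Self;
    fn top_p(self, p: f32) -> Self;
    fn build(self) -> Self::Context;
}

// Hypothetical dummy backend: it records settings instead of loading weights.
struct DummyConfig;
struct DummyModel { path: PathBuf }
struct DummyBuilder { ctx: DummyContext }

#[derive(Debug, PartialEq)]
struct DummyContext { model_path: PathBuf, temperature: f32, top_k: u32, top_p: f32 }

impl SlmModelConfig for DummyConfig {
    type Model = DummyModel;
    fn load_gguf(&self, path: &Path) -> Result<DummyModel, String> {
        // A real backend would parse the file; the dummy only checks the extension.
        if path.extension().map_or(false, |ext| ext == "gguf") {
            Ok(DummyModel { path: path.to_path_buf() })
        } else {
            Err(format!("not a GGUF file: {}", path.display()))
        }
    }
}

impl SlmModel for DummyModel {
    type Builder = DummyBuilder;
    fn context_builder(&self) -> DummyBuilder {
        // Arbitrary defaults chosen for the example.
        let ctx = DummyContext {
            model_path: self.path.clone(),
            temperature: 1.0,
            top_k: 40,
            top_p: 1.0,
        };
        DummyBuilder { ctx }
    }
}

impl SlmContextBuilder for DummyBuilder {
    type Context = DummyContext;
    fn temperature(mut self, t: f32) -> Self { self.ctx.temperature = t; self }
    fn top_k(mut self, k: u32) -> Self { self.ctx.top_k = k; self }
    fn top_p(mut self, p: f32) -> Self { self.ctx.top_p = p; self }
    fn build(self) -> DummyContext { self.ctx }
}

// Walks the pipeline: config -> model -> builder -> context.
fn open_session(path: &Path) -> Result<DummyContext, String> {
    let model = DummyConfig.load_gguf(path)?;
    Ok(model.context_builder().temperature(0.7).top_k(50).top_p(0.9).build())
}
```

In this sketch, open_session(Path::new("model.gguf")) returns a context carrying the configured sampling settings, while a path without the .gguf extension is rejected at the load_gguf step. Associated types are one way to let each backend plug in its own model and context types; the real crate may structure this differently.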
SlmModelConfig: knows how to load a GGUF file and produce a SlmModel.
SlmModel: owns the loaded weights and creates a SlmContextBuilder.
SlmContextBuilder: configures sampling (temperature, top-k, top-p) and builds a SlmContext.
SlmContext: the stateful session; it tokenizes input, runs batched decode, and samples tokens.
SlmBatch / SlmToken: low-level primitives for feeding tokens to the context.
SlmInference: a blanket inference(prompt, max_tokens) -> String impl provided automatically for any SlmContext.
HfModelInfo: thin helper that downloads (or returns a cached) GGUF file from Hugging Face Hub.
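To illustrate how a blanket SlmInference impl can fall out of SlmContext, here is a minimal sketch. It is not the crate's real definition: the method names tokenize, decode, sample, and detokenize, the u32 token type, and the toy NextByteContext backend are all assumptions chosen for the example.

```rust
// NOTE: hypothetical signatures; only the trait names come from the README.
type SlmToken = u32;

trait SlmContext {
    fn tokenize(&self, text: &str) -> Vec<SlmToken>;
    fn decode(&mut self, batch: &[SlmToken]);
    /// Returns None when the model signals end of sequence.
    fn sample(&mut self) -> Option<SlmToken>;
    fn detokenize(&self, token: SlmToken) -> String;
}

trait SlmInference {
    fn inference(&mut self, prompt: &str, max_tokens: usize) -> String;
}

// Blanket impl: every SlmContext gets inference() for free.
impl<C: SlmContext> SlmInference for C {
    fn inference(&mut self, prompt: &str, max_tokens: usize) -> String {
        // Feed the whole prompt as one batch.
        let prompt_tokens = self.tokenize(prompt);
        self.decode(&prompt_tokens);

        let mut out = String::new();
        for _ in 0..max_tokens {
            match self.sample() {
                Some(tok) => {
                    out.push_str(&self.detokenize(tok));
                    // Feed the sampled token back in (autoregressive step).
                    self.decode(&[tok]);
                }
                None => break,
            }
        }
        out
    }
}

// Toy backend: tokens are bytes, and the "model" always predicts the
// next byte value, stopping after 'z'.
struct NextByteContext {
    last: Option<SlmToken>,
}

impl SlmContext for NextByteContext {
    fn tokenize(&self, text: &str) -> Vec<SlmToken> {
        text.bytes().map(SlmToken::from).collect()
    }

    fn decode(&mut self, batch: &[SlmToken]) {
        // Only the most recent token matters for this toy model.
        if let Some(&tok) = batch.last() {
            self.last = Some(tok);
        }
    }

    fn sample(&mut self) -> Option<SlmToken> {
        let next = self.last? + 1;
        if next <= SlmToken::from(b'z') { Some(next) } else { None }
    }

    fn detokenize(&self, token: SlmToken) -> String {
        char::from_u32(token).map(String::from).unwrap_or_default()
    }
}
```

With this toy backend, NextByteContext { last: None }.inference("a", 3) produces "bcd", and a prompt of "x" with a budget of 10 tokens stops early at "yz". The point of the design is that a backend only implements the low-level steps, and the generation loop is written once against the trait.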
Concrete backends (e.g. slm_llama, slm_bitnet) implement these traits against their own FFI layers.