slm_inference

Backend-agnostic trait layer for running Small Language Model (SLM) inference in Rust.

Idea

This crate defines a set of composable traits that abstract over the full inference pipeline — from loading a GGUF model file to producing text — without being tied to any specific backend (llama.cpp, bitnet, etc.).

SlmModelConfig  →  load_gguf()  →  SlmModel
                                        ↓
                               SlmContextBuilder  →  SlmContext
                                                          ↓
                                                    SlmInference::inference()

SlmModelConfig: knows how to load a GGUF file and produce an SlmModel.
SlmModel: owns the loaded weights and creates an SlmContextBuilder.
SlmContextBuilder: configures sampling (temperature, top-k, top-p) and builds an SlmContext.
SlmContext: the stateful session. It tokenizes input, runs batched decode, and samples tokens.
SlmBatch / SlmToken: low-level primitives for feeding tokens to the context.
SlmInference: provides inference(prompt, max_tokens) -> String through a blanket impl, so any SlmContext gets it automatically.
HfModelInfo: thin helper that downloads a GGUF file from Hugging Face Hub, or returns the cached copy if one already exists.
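
To illustrate how a blanket impl can turn any context into something with an inference method, here is a minimal sketch. The method names on SlmContext (tokenize, decode, sample, detokenize), the SlmToken layout, and the EchoContext type are all assumptions made for this example; the crate's real signatures may differ. Only the names SlmContext, SlmToken, SlmInference and the inference(prompt, max_tokens) -> String shape come from the description above.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the crate's token primitive.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SlmToken(pub u32);

/// Assumed shape of the session trait; the real method set may differ.
pub trait SlmContext {
    /// Turn text into tokens.
    fn tokenize(&self, text: &str) -> Vec<SlmToken>;
    /// Feed a batch of tokens into the session state.
    fn decode(&mut self, batch: &[SlmToken]);
    /// Pick the next token, or None when generation is finished.
    fn sample(&mut self) -> Option<SlmToken>;
    /// Turn one token back into text.
    fn detokenize(&self, token: SlmToken) -> String;
}

pub trait SlmInference {
    fn inference(&mut self, prompt: &str, max_tokens: usize) -> String;
}

/// Blanket impl: every SlmContext gets inference() without extra code.
impl<C: SlmContext> SlmInference for C {
    fn inference(&mut self, prompt: &str, max_tokens: usize) -> String {
        // Feed the whole prompt first, then generate one token at a time.
        let prompt_tokens = self.tokenize(prompt);
        self.decode(&prompt_tokens);
        let mut out = String::new();
        for _ in 0..max_tokens {
            let token = match self.sample() {
                Some(token) => token,
                None => break,
            };
            out.push_str(&self.detokenize(token));
            // Feed the sampled token back so the next step can see it.
            self.decode(&[token]);
        }
        out
    }
}

/// Toy context (not a real backend): it replays the prompt byte by byte.
/// It handles ASCII input only.
#[derive(Default)]
pub struct EchoContext {
    pending: VecDeque<SlmToken>,
    primed: bool,
}

impl SlmContext for EchoContext {
    fn tokenize(&self, text: &str) -> Vec<SlmToken> {
        text.bytes().map(|b| SlmToken(u32::from(b))).collect()
    }
    fn decode(&mut self, batch: &[SlmToken]) {
        // Only the first batch (the prompt) is queued for replay.
        if !self.primed {
            self.pending.extend(batch.iter().copied());
            self.primed = true;
        }
    }
    fn sample(&mut self) -> Option<SlmToken> {
        self.pending.pop_front()
    }
    fn detokenize(&self, token: SlmToken) -> String {
        char::from(token.0 as u8).to_string()
    }
}
```

With this toy context, calling inference("hi there", 5) on a fresh EchoContext stops at the token limit, while a larger limit stops as soon as sample() returns None.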

Concrete backends (e.g. slm_llama, slm_bitnet) implement these traits against their own FFI layers.
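
As a rough picture of what implementing the chain might look like, the sketch below wires a toy backend through config, model, builder, and context. Every signature here is hypothetical: the associated types, the path argument to load_gguf, the context_builder and build methods, and the Toy* types are assumptions for illustration, not the crate's actual API. A real backend would call into its FFI layer where the toy code only stores values.

```rust
use std::path::{Path, PathBuf};

// Assumed trait shapes; the README names these traits but not their signatures.
pub trait SlmModelConfig {
    type Model: SlmModel;
    type Error;
    fn load_gguf(&self, path: &Path) -> Result<Self::Model, Self::Error>;
}

pub trait SlmModel {
    type Builder: SlmContextBuilder;
    fn context_builder(&self) -> Self::Builder;
}

pub trait SlmContextBuilder: Sized {
    type Context;
    fn temperature(self, value: f32) -> Self;
    fn top_k(self, value: u32) -> Self;
    fn top_p(self, value: f32) -> Self;
    fn build(self) -> Self::Context;
}

// Toy backend: no weights are loaded, only the settings are carried along.
pub struct ToyConfig;

pub struct ToyModel {
    pub path: PathBuf,
}

pub struct ToyBuilder {
    temperature: f32,
    top_k: u32,
    top_p: f32,
}

#[derive(Debug, PartialEq)]
pub struct ToyContext {
    pub temperature: f32,
    pub top_k: u32,
    pub top_p: f32,
}

impl SlmModelConfig for ToyConfig {
    type Model = ToyModel;
    type Error = String;

    fn load_gguf(&self, path: &Path) -> Result<ToyModel, String> {
        // Only the file extension is checked; a real backend would parse the file.
        if path.extension().and_then(|ext| ext.to_str()) == Some("gguf") {
            Ok(ToyModel { path: path.to_path_buf() })
        } else {
            Err(format!("not a GGUF file: {}", path.display()))
        }
    }
}

impl SlmModel for ToyModel {
    type Builder = ToyBuilder;

    fn context_builder(&self) -> ToyBuilder {
        // Arbitrary starting values chosen for this example.
        ToyBuilder { temperature: 1.0, top_k: 0, top_p: 1.0 }
    }
}

impl SlmContextBuilder for ToyBuilder {
    type Context = ToyContext;

    fn temperature(mut self, value: f32) -> Self {
        self.temperature = value;
        self
    }
    fn top_k(mut self, value: u32) -> Self {
        self.top_k = value;
        self
    }
    fn top_p(mut self, value: f32) -> Self {
        self.top_p = value;
        self
    }
    fn build(self) -> ToyContext {
        ToyContext {
            temperature: self.temperature,
            top_k: self.top_k,
            top_p: self.top_p,
        }
    }
}
```

Under these assumptions, code written against the traits alone (for example a function generic over C: SlmModelConfig) would not need to change when the toy backend is swapped for a real one.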

slm_inference 0.1.0
