/// Trait for any inference backend (mistral.rs, llama.cpp, etc.).
///
/// Provides a unified interface for loading and running local LLM
/// inference. Implementations can use different backends while
/// presenting the same API to consumers.
///
/// NOTE: every async method returns a return-position `impl Future`
/// (RPITIT style), so this trait is not dyn compatible — it cannot be
/// used as `dyn InferenceBackend`.
pub trait InferenceBackend: Send + Sync {
// Required methods

/// Load the model at `model_path` using `config`.
fn load(
&mut self,
model_path: PathBuf,
config: LoadConfig,
) -> impl Future<Output = Result<(), NativeError>> + Send;
/// Unload the model from memory, freeing the GPU/CPU memory it used.
fn unload(&mut self) -> impl Future<Output = Result<(), NativeError>> + Send;
/// Whether a model is currently loaded.
fn is_loaded(&self) -> bool;
/// Metadata about the loaded model; `None` if no model is loaded.
fn model_info(&self) -> Option<&ModelInfo>;
/// Generate a complete response for `prompt` (non-streaming
/// counterpart of [`infer_stream`]).
///
/// `options` carries generation options (temperature, max_tokens, etc.).
fn infer(
&self,
prompt: &str,
options: ChatOptions,
) -> impl Future<Output = Result<ChatResponse, NativeError>> + Send;
/// Generate a response as a stream of token strings, yielded as they
/// are generated.
///
/// `options` carries generation options (temperature, max_tokens, etc.).
fn infer_stream(
&self,
prompt: &str,
options: ChatOptions,
) -> impl Future<Output = Result<impl Stream<Item = Result<String, NativeError>> + Send, NativeError>> + Send;
}
Expand description
Trait for any inference backend (mistral.rs, llama.cpp, etc.).
This trait provides a unified interface for loading and running local LLM inference. Implementations can use different backends while presenting the same API to consumers.
Required Methods§
Source
fn load(
&mut self,
model_path: PathBuf,
config: LoadConfig,
) -> impl Future<Output = Result<(), NativeError>> + Send
fn load( &mut self, model_path: PathBuf, config: LoadConfig, ) -> impl Future<Output = Result<(), NativeError>> + Send
Load the model at the given path using the given configuration.
Source
fn unload(&mut self) -> impl Future<Output = Result<(), NativeError>> + Send
fn unload(&mut self) -> impl Future<Output = Result<(), NativeError>> + Send
Unload the model from memory.
Frees GPU/CPU memory used by the model.
Source
fn model_info(&self) -> Option<&ModelInfo>
fn model_info(&self) -> Option<&ModelInfo>
Get metadata about the loaded model.
Returns None if no model is loaded.
Source
fn infer(
&self,
prompt: &str,
options: ChatOptions,
) -> impl Future<Output = Result<ChatResponse, NativeError>> + Send
fn infer( &self, prompt: &str, options: ChatOptions, ) -> impl Future<Output = Result<ChatResponse, NativeError>> + Send
Generate a complete response for the given prompt (non-streaming).
Source
fn infer_stream(
&self,
prompt: &str,
options: ChatOptions,
) -> impl Future<Output = Result<impl Stream<Item = Result<String, NativeError>> + Send, NativeError>> + Send
fn infer_stream( &self, prompt: &str, options: ChatOptions, ) -> impl Future<Output = Result<impl Stream<Item = Result<String, NativeError>> + Send, NativeError>> + Send
Generate a response (streaming).
Returns a stream of token strings as they are generated.
§Arguments
prompt - The input prompt
options - Generation options (temperature, max_tokens, etc.)
Dyn Compatibility§
This trait is not dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.
Implementors§
impl InferenceBackend for NativeRuntime
Available on crate feature inference only.