Trait InferenceBackend
pub trait InferenceBackend: Send + Sync {
    // Required methods
    fn load(
        &mut self,
        model_path: PathBuf,
        config: LoadConfig,
    ) -> impl Future<Output = Result<(), NativeError>> + Send;
    fn unload(&mut self) -> impl Future<Output = Result<(), NativeError>> + Send;
    fn is_loaded(&self) -> bool;
    fn model_info(&self) -> Option<&ModelInfo>;
    fn infer(
        &self,
        prompt: &str,
        options: ChatOptions,
    ) -> impl Future<Output = Result<ChatResponse, NativeError>> + Send;
    fn infer_stream(
        &self,
        prompt: &str,
        options: ChatOptions,
    ) -> impl Future<Output = Result<impl Stream<Item = Result<String, NativeError>> + Send, NativeError>> + Send;
}

Trait for any inference backend (mistral.rs, llama.cpp, etc.).

This trait provides a unified interface for loading and running local LLM inference. Implementations can use different backends while presenting the same API to consumers.
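The shape of the trait can be seen with a small end-to-end sketch. The crate's real types (`NativeError`, `LoadConfig`, `ChatOptions`, `ChatResponse`, `ModelInfo`) are replaced here with simplified stand-ins, and `infer_stream` is omitted because `Stream` comes from the `futures` crate; the mock backend and the tiny polling executor are illustrative only, not part of this crate.

```rust
use std::future::Future;
use std::path::PathBuf;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical stand-ins for the crate's real types.
#[derive(Debug)]
struct NativeError(String);
#[derive(Default)]
struct LoadConfig;
#[derive(Default)]
struct ChatOptions;
struct ChatResponse { text: String }
struct ModelInfo { name: String }

// Simplified version of the trait (streaming method omitted).
trait InferenceBackend: Send + Sync {
    fn load(&mut self, model_path: PathBuf, config: LoadConfig)
        -> impl Future<Output = Result<(), NativeError>> + Send;
    fn unload(&mut self) -> impl Future<Output = Result<(), NativeError>> + Send;
    fn is_loaded(&self) -> bool;
    fn model_info(&self) -> Option<&ModelInfo>;
    fn infer(&self, prompt: &str, options: ChatOptions)
        -> impl Future<Output = Result<ChatResponse, NativeError>> + Send;
}

// A mock backend that "loads" instantly and echoes the prompt.
struct MockBackend { info: Option<ModelInfo> }

impl InferenceBackend for MockBackend {
    fn load(&mut self, model_path: PathBuf, _config: LoadConfig)
        -> impl Future<Output = Result<(), NativeError>> + Send {
        self.info = Some(ModelInfo { name: model_path.display().to_string() });
        std::future::ready(Ok(()))
    }
    fn unload(&mut self) -> impl Future<Output = Result<(), NativeError>> + Send {
        self.info = None;
        std::future::ready(Ok(()))
    }
    fn is_loaded(&self) -> bool { self.info.is_some() }
    fn model_info(&self) -> Option<&ModelInfo> { self.info.as_ref() }
    fn infer(&self, prompt: &str, _options: ChatOptions)
        -> impl Future<Output = Result<ChatResponse, NativeError>> + Send {
        std::future::ready(Ok(ChatResponse { text: format!("echo: {prompt}") }))
    }
}

// Minimal executor: poll a future to completion with a no-op waker.
// (The mock's futures are always immediately ready.)
fn block_on<F: Future>(fut: F) -> F::Output {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let mut backend = MockBackend { info: None };
    block_on(backend.load(PathBuf::from("model.gguf"), LoadConfig::default())).unwrap();
    assert!(backend.is_loaded());
    let resp = block_on(backend.infer("hello", ChatOptions::default())).unwrap();
    println!("{}", resp.text); // prints "echo: hello"
    block_on(backend.unload()).unwrap();
    assert!(!backend.is_loaded());
}
```

In real code the caller would drive these futures with an async runtime (e.g. tokio) rather than a hand-rolled `block_on`.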

Required Methods


fn load(&mut self, model_path: PathBuf, config: LoadConfig) -> impl Future<Output = Result<(), NativeError>> + Send

Load a model from disk.

Arguments
  • model_path - Path to the GGUF model file
  • config - Load configuration (context size, GPU layers, etc.)
Returns

Ok(()) if the model was loaded successfully.


fn unload(&mut self) -> impl Future<Output = Result<(), NativeError>> + Send

Unload the model from memory.

Frees GPU/CPU memory used by the model.


fn is_loaded(&self) -> bool

Check if a model is currently loaded.


fn model_info(&self) -> Option<&ModelInfo>

Get metadata about the loaded model.

Returns None if no model is loaded.


fn infer(&self, prompt: &str, options: ChatOptions) -> impl Future<Output = Result<ChatResponse, NativeError>> + Send

Generate a response (non-streaming).

Arguments
  • prompt - The input prompt
  • options - Generation options (temperature, max_tokens, etc.)
Returns

The complete chat response.


fn infer_stream(&self, prompt: &str, options: ChatOptions) -> impl Future<Output = Result<impl Stream<Item = Result<String, NativeError>> + Send, NativeError>> + Send

Generate a response (streaming).

Returns a stream of token strings as they are generated.

Arguments
  • prompt - The input prompt
  • options - Generation options (temperature, max_tokens, etc.)
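The `Stream` in the return type is the `futures` crate's trait; a caller typically drains it with `StreamExt::next`, appending each token as it arrives. The sketch below avoids that dependency by defining a minimal stand-in with the same `poll_next` shape, plus a toy token stream; `Tokens`, `drain`, and the `String` error stand-in are all hypothetical, for illustration only.

```rust
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal stand-in for `futures::Stream`, just enough to show how the
// stream returned by infer_stream is consumed token by token.
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

// A toy token stream that yields a fixed list of strings.
struct Tokens(std::vec::IntoIter<String>);

impl Stream for Tokens {
    type Item = Result<String, String>; // `String` stands in for NativeError
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        Poll::Ready(self.0.next().map(Ok))
    }
}

// Build a no-op waker so we can poll the stream without a runtime.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Accumulate tokens until the stream ends, bailing out on the first error.
fn drain<S: Stream<Item = Result<String, String>>>(stream: S) -> Result<String, String> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut stream = std::pin::pin!(stream);
    let mut out = String::new();
    loop {
        match stream.as_mut().poll_next(&mut cx) {
            Poll::Ready(Some(tok)) => out.push_str(&tok?),
            Poll::Ready(None) => return Ok(out),
            Poll::Pending => continue, // real code would wait on the waker
        }
    }
}

fn main() {
    let tokens = Tokens(vec!["Hel".to_string(), "lo".to_string()].into_iter());
    let text = drain(tokens).unwrap();
    println!("{text}"); // prints "Hello"
}
```

With the real API, the same loop collapses to `while let Some(tok) = stream.next().await { ... }` under an async runtime.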

Dyn Compatibility

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.
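The practical consequence is that you cannot write `Box<dyn InferenceBackend>` or `&dyn InferenceBackend`; consumers must be generic over the backend type instead. A trimmed illustration, using a hypothetical `Backend` trait with the same `-> impl Future` shape (the `block_on` helper is only there to run the example without a runtime):

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Methods returning `impl Future` make a trait non-dyn-compatible.
trait Backend {
    fn infer(&self, prompt: &str) -> impl Future<Output = String> + Send;
}

struct Echo;
impl Backend for Echo {
    fn infer(&self, prompt: &str) -> impl Future<Output = String> + Send {
        std::future::ready(format!("echo: {prompt}"))
    }
}

// Does NOT compile: `dyn Backend` is rejected.
// fn run_dyn(b: &dyn Backend) {}

// Works: accept any backend through a generic parameter instead.
async fn run<B: Backend>(b: &B) -> String {
    b.infer("hi").await
}

// Minimal executor for the always-ready futures above.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) { return v; }
    }
}

fn main() {
    assert_eq!(block_on(run(&Echo)), "echo: hi");
}
```

If dynamic dispatch were required, the trait would have to return boxed futures (e.g. `Pin<Box<dyn Future<...>>>`) instead of `impl Future`.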

Implementors


impl InferenceBackend for NativeRuntime

Available on non-crate feature inference only.