
Module inference


Native LLM inference module.

This module provides local model inference via the mistral.rs runtime when the `inference` Cargo feature is enabled.

§Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│  Inference Module                                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  InferenceBackend (trait)                                                   │
│  ├── load(path, config)      Load GGUF model into memory                    │
│  ├── unload()                Unload model from memory                       │
│  ├── is_loaded()             Check if model is loaded                       │
│  ├── model_info()            Get metadata about loaded model                │
│  ├── infer(prompt, opts)     Generate response (non-streaming)              │
│  └── infer_stream(...)       Generate response (streaming)                  │
│                                                                             │
│  NativeRuntime (struct)                                                     │
│  └── Implements InferenceBackend using mistral.rs                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
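The trait shape in the diagram above can be sketched in code. The following is a hypothetical, simplified illustration: the real `InferenceBackend` methods in `spn_native` are async and return richer types, whereas this mock uses synchronous signatures and placeholder structs so the contract is easy to see in isolation.

```rust
use std::path::PathBuf;

// Placeholder stand-ins for the real spn_core types.
#[derive(Default)]
struct LoadConfig;
#[derive(Default)]
struct ChatOptions;

struct ModelInfo {
    name: String,
}

// Simplified, synchronous sketch of the trait shown in the diagram.
trait InferenceBackend {
    fn load(&mut self, path: PathBuf, config: LoadConfig) -> Result<(), String>;
    fn unload(&mut self);
    fn is_loaded(&self) -> bool;
    fn model_info(&self) -> Option<ModelInfo>;
    fn infer(&self, prompt: &str, opts: ChatOptions) -> Result<String, String>;
}

// A mock backend demonstrating the load/infer/unload lifecycle.
struct MockRuntime {
    model: Option<PathBuf>,
}

impl InferenceBackend for MockRuntime {
    fn load(&mut self, path: PathBuf, _config: LoadConfig) -> Result<(), String> {
        self.model = Some(path);
        Ok(())
    }
    fn unload(&mut self) {
        self.model = None;
    }
    fn is_loaded(&self) -> bool {
        self.model.is_some()
    }
    fn model_info(&self) -> Option<ModelInfo> {
        self.model.as_ref().map(|p| ModelInfo {
            name: p.file_name().unwrap().to_string_lossy().into_owned(),
        })
    }
    fn infer(&self, prompt: &str, _opts: ChatOptions) -> Result<String, String> {
        if !self.is_loaded() {
            return Err("no model loaded".into());
        }
        Ok(format!("echo: {prompt}"))
    }
}

fn main() {
    let mut rt = MockRuntime { model: None };
    rt.load(PathBuf::from("model.gguf"), LoadConfig::default()).unwrap();
    assert!(rt.is_loaded());
    println!("{}", rt.infer("hi", ChatOptions::default()).unwrap());
    rt.unload();
    assert!(!rt.is_loaded());
}
```

Calling `infer` before `load` fails rather than panicking, which matches the lifecycle the diagram implies: `is_loaded` guards every generation call.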

§Example

use spn_native::inference::{NativeRuntime, InferenceBackend};
use spn_core::{LoadConfig, ChatOptions};
use std::path::PathBuf;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut runtime = NativeRuntime::new();

    // Load a GGUF model (note: PathBuf does not expand `~`; pass a full path)
    let model_path = PathBuf::from("~/.spn/models/qwen3-8b-q4_k_m.gguf");
    runtime.load(model_path, LoadConfig::default()).await?;

    // Run inference
    let response = runtime.infer(
        "What is 2+2?",
        ChatOptions::default().with_temperature(0.7)
    ).await?;

    println!("{}", response.content);
    Ok(())
}
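The example above uses the non-streaming `infer`; `infer_stream` instead yields the response incrementally. Its exact signature is not shown here, so the following hypothetical sketch models the stream as a plain iterator of token chunks to illustrate the usual consumption pattern: display each chunk as it arrives while accumulating the full response.

```rust
// Hypothetical mock: a token stream modeled as an iterator of chunks.
// The real infer_stream API in spn_native may differ.
fn mock_token_stream() -> impl Iterator<Item = String> {
    ["2", "+", "2", " = ", "4"].into_iter().map(String::from)
}

fn main() {
    let mut full = String::new();
    for chunk in mock_token_stream() {
        print!("{chunk}");     // show tokens as they arrive
        full.push_str(&chunk); // accumulate the complete response
    }
    println!();
    assert_eq!(full, "2+2 = 4");
}
```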

§Structs

ChatOptions: Options for chat completion.
ChatResponse: Response from a chat completion.
LoadConfig: Configuration for loading a model.
ModelInfo: Information about an installed model.
NativeRuntime: Native runtime for local LLM inference.
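The call `ChatOptions::default().with_temperature(0.7)` in the example above suggests a builder-style options API. The sketch below is a hypothetical reconstruction of that pattern; any field beyond `temperature` (such as `max_tokens`) is an assumption, not part of the documented type.

```rust
// Hypothetical builder-style options, mirroring the usage in the example.
#[derive(Default, Debug)]
struct ChatOptions {
    temperature: Option<f32>,
    max_tokens: Option<usize>, // assumed field, for illustration only
}

impl ChatOptions {
    fn with_temperature(mut self, t: f32) -> Self {
        self.temperature = Some(t);
        self // return self so calls can be chained
    }
    fn with_max_tokens(mut self, n: usize) -> Self {
        self.max_tokens = Some(n);
        self
    }
}

fn main() {
    let opts = ChatOptions::default()
        .with_temperature(0.7)
        .with_max_tokens(256);
    println!("{opts:?}");
}
```

Each `with_*` method takes and returns `self` by value, so unset fields stay at their `Default` values and calls chain without a final `build()` step.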

§Traits

DynInferenceBackend: Object-safe version of InferenceBackend for dynamic dispatch.
InferenceBackend: Trait for any inference backend (mistral.rs, llama.cpp, etc.).
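An object-safe companion trait like DynInferenceBackend exists so callers can hold any backend behind a trait object and pick the implementation at runtime. The sketch below is a hypothetical, synchronous illustration of that dynamic-dispatch pattern; the real trait's methods and backends may differ.

```rust
// Hypothetical object-safe trait: no generics or Self-returning methods,
// so it can be used as `dyn DynInferenceBackend`.
trait DynInferenceBackend {
    fn infer(&self, prompt: &str) -> String;
}

// Two stand-in backends, echoing the "mistral.rs, llama.cpp, etc." note.
struct MistralBackend;
struct LlamaBackend;

impl DynInferenceBackend for MistralBackend {
    fn infer(&self, prompt: &str) -> String {
        format!("mistral.rs: {prompt}")
    }
}
impl DynInferenceBackend for LlamaBackend {
    fn infer(&self, prompt: &str) -> String {
        format!("llama.cpp: {prompt}")
    }
}

fn main() {
    // Backends chosen at runtime; callers see only the trait object.
    let backends: Vec<Box<dyn DynInferenceBackend>> =
        vec![Box::new(MistralBackend), Box::new(LlamaBackend)];
    for b in &backends {
        println!("{}", b.infer("hello"));
    }
}
```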