# spn-native

Native model inference and storage for the SuperNovae ecosystem.
## Overview
spn-native provides local GGUF model inference using mistral.rs as the backend. This enables running large language models locally without requiring external services like Ollama.
## Features
- Local GGUF Inference: Run quantized models (Q4, Q5, Q8) directly on your hardware
- Hugging Face Integration: Download models from the Hugging Face Hub
- Streaming Support: Stream responses token-by-token
- Cross-Platform: Works on macOS, Linux, and Windows
- Feature-Gated: Compile only what you need
## Installation

```toml
[dependencies]
spn-native = "0.1"

# With inference support (requires Rust 1.85+)
spn-native = { version = "0.1", features = ["inference"] }
```
## Usage

### Basic Inference

```rust
use spn_native::{ModelConfig, NativeRuntime};

// Create the runtime
let mut runtime = NativeRuntime::new();

// Load a GGUF model
let config = ModelConfig::default();
runtime.load("path/to/model.gguf", config).await?;

// Run inference
let response = runtime.infer("Why is the sky blue?").await?;
println!("{response}");
```
### Streaming

```rust
use futures_util::StreamExt;
use spn_native::{ModelConfig, NativeRuntime};

let mut runtime = NativeRuntime::new();
runtime.load("path/to/model.gguf", ModelConfig::default()).await?;

// Receive tokens as they are generated
let mut stream = runtime.infer_stream("Tell me a story").await?;
while let Some(token) = stream.next().await {
    print!("{}", token?);
}
```
### Hugging Face Download

```rust
use spn_native::HuggingFaceStorage;

let storage = HuggingFaceStorage::new()?;
let path = storage.download("owner/repo-GGUF", "model-q4_k_m.gguf").await?;
```
## Feature Flags

| Feature | Description | Default |
|---|---|---|
| `inference` | Enable GGUF inference via mistral.rs | No |
| `huggingface` | Enable HF Hub downloads | No |
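The flags can also be combined. A sketch of the corresponding `Cargo.toml` entry (feature names as listed above):

```toml
[dependencies]
spn-native = { version = "0.1", features = ["inference", "huggingface"] }
```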
## Requirements

- Rust 1.85+ (when the `inference` feature is enabled)
- Metal (macOS) or CUDA (Linux/Windows) for GPU acceleration
## Model Compatibility
Supports all GGUF models compatible with mistral.rs:
- Qwen 3
- Llama 3.x
- Mistral / Mixtral
- Phi-3
- And more...
## License
AGPL-3.0-or-later