Native model inference and storage for the SuperNovae ecosystem.
This crate provides:

- HuggingFaceStorage: Download models from HuggingFace Hub
- detect_available_ram_gb: Platform-specific RAM detection
- default_model_dir: Default storage location (~/.spn/models)
- inference::NativeRuntime: Local LLM inference via mistral.rs (feature: inference)
§Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ spn-native │
│ ├── HuggingFaceStorage Download GGUF models from HuggingFace Hub │
│ ├── detect_available_ram_gb() Platform-specific RAM detection │
│ ├── default_model_dir() Default storage path (~/.spn/models) │
│ └── NativeRuntime (inference) mistral.rs inference integration │
└─────────────────────────────────────────────────────────────────────────────┘

§Features
- progress: Enable terminal progress bars for downloads
- inference: Enable local LLM inference via mistral.rs
- native: Alias for inference
- full: All features
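As a sketch, a dependent crate might enable these features in its Cargo.toml like this (the version number is illustrative, not taken from this page; feature names are from the list above):

```toml
[dependencies]
# "inference" enables local LLM inference via mistral.rs,
# "progress" enables terminal progress bars for downloads.
# "full" would enable everything.
spn-native = { version = "*", features = ["inference", "progress"] }
```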
§Example: Download
use spn_native::{HuggingFaceStorage, default_model_dir, detect_available_ram_gb};
use spn_core::{find_model, auto_select_quantization, DownloadRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Detect RAM and select quantization
    let ram_gb = detect_available_ram_gb();
    let model = find_model("qwen3:8b").unwrap();
    let quant = auto_select_quantization(model, ram_gb);

    // Create storage and download
    let storage = HuggingFaceStorage::new(default_model_dir());
    let request = DownloadRequest::curated(model).with_quantization(quant);
    let result = storage.download(&request, |progress| {
        println!("{}: {:.1}%", progress.status, progress.percent());
    }).await?;

    println!("Downloaded to: {:?}", result.path);
    Ok(())
}

§Example: Inference (requires inference feature)
use spn_native::inference::{NativeRuntime, InferenceBackend};
use spn_core::{LoadConfig, ChatOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut runtime = NativeRuntime::new();

    // Load a downloaded model
    runtime.load("~/.spn/models/qwen3-8b-q4_k_m.gguf".into(), LoadConfig::default()).await?;

    // Run inference
    let response = runtime.infer("What is 2+2?", ChatOptions::default()).await?;
    println!("{}", response.message.content);
    Ok(())
}

Re-exports§
pub use inference::DynInferenceBackend;
pub use inference::InferenceBackend;
pub use inference::NativeRuntime;
Modules§
- inference - Native LLM inference module.
Structs§
- ChatOptions - Options for chat completion.
- ChatResponse - Response from a chat completion.
- DownloadRequest - Request to download a model.
- DownloadResult - Result of a model download.
- HuggingFaceStorage - Storage backend for HuggingFace Hub models.
- KnownModel - A curated model in the registry.
- LoadConfig - Configuration for loading a model.
- ModelInfo - Information about an installed model.
- PullProgress - Progress information during model pull/download.
Enums§
- BackendError - Error types for backend operations.
- ModelArchitecture - Architecture supported by mistral.rs v0.7.0.
- ModelType - Model capability type.
- NativeError - Errors that can occur in spn-native operations.
- Quantization - Quantization levels for GGUF models.
- ResolvedModel - Result of model resolution.
Traits§
- ModelStorage - Model storage backend (sync version).
Functions§
- auto_select_quantization - Auto-select quantization based on available RAM.
- default_model_dir - Default model storage directory.
- detect_available_ram_gb - Detect available system RAM in gigabytes.
- extract_quantization - Extract quantization type from a filename.
- find_model - Find a curated model by ID.
- resolve_model - Resolve a model ID to a KnownModel or HuggingFace passthrough.
Type Aliases§
- ProgressCallback - Type alias for download progress callbacks.
- Result - Result type alias for spn-native operations.