# spn-native Implementation Checklist
**Current Version:** v0.1.0 (Storage & Download)
**Next Version:** v0.2.0 (Inference with mistral.rs)
**Prepared by:** Rust Architect Review
---
## Immediate Actions (This Sprint)
### Code Quality & Documentation
- [ ] **Add comprehensive module docs**
- [ ] **Review error handling gaps**
  - [ ] Add `NativeError::ModelNotLoaded`
  - [ ] Add `NativeError::InferenceFailed(String)`
  - [ ] Add `NativeError::UnsupportedArchitecture(String)`
  - [ ] Add a clear error message for when the "inference" feature is not enabled
- [ ] **Fix HTTP status code handling**
```rust
match response.status() {
    StatusCode::NOT_FOUND => {
        return Err(NativeError::ModelNotFound { repo, filename });
    }
    _ if !response.status().is_success() => {
        return Err(NativeError::Http(
            format!("HTTP {}: {}", response.status(), ...)
        ));
    }
    _ => {}
}
```
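The same branching can be exercised without a network client. The sketch below is a minimal, std-only illustration: it assumes a reduced `NativeError` with just the two variants used above, and `check_status` is a hypothetical helper name, not part of the current crate.

```rust
/// Hypothetical subset of `NativeError`, reduced to the variants used here.
#[derive(Debug, PartialEq)]
pub enum NativeError {
    ModelNotFound { repo: String, filename: String },
    Http(String),
}

/// Maps a raw HTTP status code to a download outcome.
/// 404 becomes `ModelNotFound`, any other non-2xx code becomes `Http`.
pub fn check_status(status: u16, repo: &str, filename: &str) -> Result<(), NativeError> {
    match status {
        404 => Err(NativeError::ModelNotFound {
            repo: repo.to_string(),
            filename: filename.to_string(),
        }),
        200..=299 => Ok(()),
        other => Err(NativeError::Http(format!("HTTP {other}"))),
    }
}
```

Keeping the mapping in a pure function like this makes the 404 vs. 500 distinction unit-testable without hitting HuggingFace.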
- [ ] **Extract magic numbers to constants**
```rust
const BYTES_PER_GB: u64 = 1_073_741_824;
const KB_PER_GB: u64 = 1_048_576;
const WINDOWS_RAM_DEFAULT: u32 = 16;
const FALLBACK_RAM_DEFAULT: u32 = 8;
```
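As a rough illustration of how these constants might replace inline literals, the sketch below uses hypothetical helper names. It assumes the detection code receives either a byte count or a KiB count, which may not match the actual detection paths in the crate.

```rust
const BYTES_PER_GB: u64 = 1_073_741_824;
const KB_PER_GB: u64 = 1_048_576;
const FALLBACK_RAM_DEFAULT: u32 = 8;

/// Converts a byte count to whole GiB (assumed input: a total-memory query in bytes).
fn ram_gb_from_bytes(bytes: u64) -> u32 {
    (bytes / BYTES_PER_GB) as u32
}

/// Converts a KiB count to whole GiB (assumed input: a value reported in kB).
fn ram_gb_from_kb(kb: u64) -> u32 {
    (kb / KB_PER_GB) as u32
}

/// Uses the detected value when it is plausible, otherwise the fallback constant.
fn ram_gb_or_default(detected: Option<u32>) -> u32 {
    detected.filter(|gb| *gb > 0).unwrap_or(FALLBACK_RAM_DEFAULT)
}
```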
- [ ] **Add path traversal safety validation** (optional, low priority)
```rust
fn validate_model_id(model_id: &str) -> Result<()> {
    if model_id.contains("..") {
        return Err(NativeError::InvalidConfig(".. not allowed in model ID".into()));
    }
    Ok(())
}
```
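A substring check also rejects harmless IDs such as `model..v2`. One possible alternative, sketched here under the assumption that a model ID is treated as a relative path, is to inspect path components with `std::path::Component`. The `String` error type is a stand-in for `NativeError::InvalidConfig`.

```rust
use std::path::{Component, Path};

/// Accepts only plain relative paths made of normal components.
/// Rejects `..`, absolute paths, and a leading `./`.
fn validate_model_id(model_id: &str) -> Result<(), String> {
    if model_id.is_empty() {
        return Err("model ID must not be empty".into());
    }
    for component in Path::new(model_id).components() {
        match component {
            Component::Normal(_) => {}
            Component::ParentDir => return Err("`..` not allowed in model ID".into()),
            // RootDir, Prefix (Windows), or a leading CurDir.
            _ => return Err("model ID must be a plain relative path".into()),
        }
    }
    Ok(())
}
```

With this version `org/model..v2` passes, while `../etc/passwd`, `a/../b`, and `/abs/path` are rejected.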
### Testing
- [ ] **Add more comprehensive error tests**
```rust
#[test]
fn test_checksum_mismatch_cleanup() {
}
#[test]
fn test_interrupted_download_cleanup() {
}
```
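One way to make both cleanup cases testable is an RAII guard that deletes the partial file unless the download is explicitly committed. The sketch below is an assumption about how cleanup could be structured, not a description of the current code. `PartialFileGuard` and `write_verified` are hypothetical names, and a length comparison stands in for the real checksum.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Removes the file at `path` on drop unless `commit()` was called.
struct PartialFileGuard {
    path: PathBuf,
    committed: bool,
}

impl PartialFileGuard {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), committed: false }
    }

    /// Marks the file as complete so that dropping the guard keeps it.
    fn commit(mut self) {
        self.committed = true;
    }
}

impl Drop for PartialFileGuard {
    fn drop(&mut self) {
        if !self.committed {
            // Best effort: the file may not exist if the write never started.
            let _ = fs::remove_file(&self.path);
        }
    }
}

/// Stand-in for a download: writes `data`, then verifies it.
/// The length check is a placeholder for a real checksum comparison.
fn write_verified(path: &Path, data: &[u8], expected_len: usize) -> Result<(), String> {
    let guard = PartialFileGuard::new(path);
    fs::write(path, data).map_err(|e| e.to_string())?;
    if data.len() != expected_len {
        // Early return drops `guard`, which deletes the partial file.
        return Err("verification failed".into());
    }
    guard.commit();
    Ok(())
}
```

A test for either case then only needs to assert that the target path does not exist after the error is returned.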
- [ ] **Add property-based tests** (optional)
```rust
use proptest::prelude::*;
proptest! {
    #[test]
    fn prop_model_path_safe(repo in "[a-z0-9/-]{1,50}") {
    }
}
```
- [ ] **Mark integration tests with `#[ignore]`**
```rust
#[tokio::test]
#[ignore]
async fn test_real_huggingface_download() {
}
```
---
## Phase 2 Preparation (Pre-Implementation)
### Architectural Decisions
- [ ] **Confirm mistral.rs version**
  - Target: v0.7.0 or later
  - Check: https://github.com/EricLBuehler/mistral.rs/releases
  - Action: Document minimum supported version
- [ ] **Decide on streaming architecture**
  - Option A: Return `Pin<Box<dyn Stream<Item = Token> + Send>>`
  - Option B: Use `futures::channel::mpsc` with a spawned task
  - Option C: Callback-based (like the current `download()`)
  - **Recommendation:** Option B (cleaner for async/await)
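The shape of Option B (a producer task that pushes tokens into a channel while the caller consumes the receiving end) can be sketched with the standard library alone. This is a synchronous analogue: it uses `std::sync::mpsc` and `std::thread` in place of `futures::channel::mpsc` and an async task, and the `Token` type and `stream_tokens` function are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical token type; the real one would come from the inference backend.
#[derive(Debug, Clone, PartialEq)]
pub struct Token(pub String);

/// Spawns a producer that sends tokens over a channel and returns the receiver.
/// Splitting the prompt into words stands in for actual generation.
pub fn stream_tokens(prompt: &str) -> Receiver<Token> {
    let (tx, rx) = mpsc::channel();
    let words: Vec<String> = prompt.split_whitespace().map(str::to_owned).collect();
    thread::spawn(move || {
        for word in words {
            // A send error means the consumer dropped the receiver: stop generating.
            if tx.send(Token(word)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which closes the channel and ends the consumer's loop.
    });
    rx
}
```

Cancellation falls out of the design: dropping the receiver makes the next `send` fail, so the producer stops without a separate signal.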
- [ ] **Plan GPU layer allocation strategy**
  - How to auto-detect available VRAM?
  - How to gracefully fall back to CPU?
  - Should we expose this in LoadConfig?
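One possible answer to the fallback question is a pure planning function that turns a VRAM estimate into a layer count, returning 0 (CPU only) when no GPU is detected. The sketch below is hypothetical: the function name, the per-layer size input, and the reserve parameter are all assumptions, and VRAM detection itself is left out.

```rust
/// Decides how many layers to offload to the GPU.
///
/// * `free_vram_bytes`: `None` when no GPU was detected.
/// * `bytes_per_layer`: estimated size of one layer's weights.
/// * `reserve_bytes`: headroom kept free for the KV cache and scratch buffers.
pub fn plan_gpu_layers(
    free_vram_bytes: Option<u64>,
    bytes_per_layer: u64,
    total_layers: u32,
    reserve_bytes: u64,
) -> u32 {
    // No GPU detected: run everything on the CPU.
    let Some(free) = free_vram_bytes else { return 0 };
    if bytes_per_layer == 0 {
        return 0;
    }
    let usable = free.saturating_sub(reserve_bytes);
    let fit = usable / bytes_per_layer;
    // Never offload more layers than the model has.
    fit.min(u64::from(total_layers)) as u32
}
```

If the result were exposed in `LoadConfig`, an explicit user value could simply override the planned one.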
### Dependency Management
- [ ] **Create feature flag matrix**
```toml
[features]
default = []
inference = ["dep:mistral-rs"]
metal = ["candle-core/metal"]
cuda = ["candle-core/cuda"]
all = ["inference", "metal", "cuda"]
```
- [ ] **Pin mistral.rs version**
  - Add to Cargo.toml with exact version
  - Document breaking changes if any
  - Create MISTRAL_RS_CHANGELOG.md tracking compatibility
- [ ] **Test feature combinations**
```bash
cargo test --no-default-features
cargo test --features inference
cargo test --all-features
```
### Documentation
- [x] **Create MISTRAL_RS_INTEGRATION_PLAN.md** ✅ (Done)
- [ ] **Create GPU_SETUP.md**
```markdown
# GPU Setup Guide
## macOS (Metal)
- Automatic via candle-core/metal feature
- Works on all Apple Silicon & Intel with Metal support
## Linux (CUDA)
- Requires CUDA 12.0+
- NVIDIA driver 525.0+
- cargo build --features cuda
## Windows
- TODO: Document
```
- [ ] **Create TROUBLESHOOTING.md**
```markdown
# Troubleshooting
## "Model not found" errors
- Check HuggingFace repo is public
- Verify exact model name
## Out of memory
- Try lower quantization (Q4 instead of F16)
- Check system RAM with: spn-native detect-ram
- Reduce context_size in LoadConfig
## GPU not detected
- macOS: Check System Report > GPU
- Linux: Run nvidia-smi
```
---
## Phase 2 Implementation (Detailed Task Breakdown)
### Module: inference/mod.rs
- [ ] **Define InferenceBackend trait**
```rust
pub trait InferenceBackend: Send + Sync {
    async fn load(&mut self, model_path: PathBuf, config: LoadConfig) -> Result<()>;
    async fn unload(&mut self) -> Result<()>;
    fn is_loaded(&self) -> bool;
    fn model_info(&self) -> Option<ModelInfo>;
    async fn infer(&self, prompt: &str, opts: ChatOptions) -> Result<ChatResponse>;
}
```
- [ ] **Re-export public types**
```rust
pub use spn_core::{ChatOptions, ChatResponse, LoadConfig, ModelArchitecture};
pub use crate::NativeError;
#[cfg(feature = "inference")]
pub use self::runtime::NativeRuntime;
```
- [ ] **Write module documentation with examples**
### Module: inference/runtime.rs
- [ ] **Create NativeRuntime struct**
```rust
pub struct NativeRuntime {
    model_path: Option<PathBuf>,
    config: LoadConfig,
    #[cfg(feature = "inference")]
    model: Option<GgufModel>,
    #[cfg(feature = "inference")]
    tokenizer: Option<Tokenizer>,
    device: Option<Device>,
    metadata: Option<ModelInfo>,
}
```
- [ ] **Implement load()**
  - [ ] Read GGUF file from disk
  - [ ] Parse GGUF header (detect architecture)
  - [ ] Select device (CPU/GPU)
  - [ ] Create appropriate builder based on architecture
  - [ ] Load tokenizer
  - [ ] Populate metadata
  - [ ] Handle errors gracefully
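For the header-parsing step, a GGUF file starts with the 4-byte magic `GGUF`, followed by a little-endian `u32` version and, from version 2 onward, two little-endian `u64` counts (tensors and metadata key/value pairs). The sketch below reads only that fixed prefix. The struct and function names are hypothetical, and in practice mistral.rs or a GGUF crate would likely do this parsing. The architecture itself lives in the metadata section (key `general.architecture`), which this sketch does not read.

```rust
use std::convert::TryInto;

/// Fixed-size prefix of a GGUF file (layout for GGUF version 2 and later).
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Parses the first 24 bytes of a GGUF file.
pub fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err("file too short for a GGUF header".into());
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("missing GGUF magic".into());
    }
    // The slices below have fixed lengths, so the conversions cannot fail.
    let version = u32::from_le_bytes(bytes[4..8].try_into().unwrap());
    if version < 2 {
        return Err(format!("unsupported GGUF version {version}"));
    }
    let tensor_count = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
    let metadata_kv_count = u64::from_le_bytes(bytes[16..24].try_into().unwrap());
    Ok(GgufHeader { version, tensor_count, metadata_kv_count })
}
```

Rejecting a bad magic this early gives a clearer error than a failure deep inside the model loader.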
- [ ] **Implement infer()**
  - [ ] Validate model is loaded
  - [ ] Tokenize input
  - [ ] Run forward pass
  - [ ] Decode output
  - [ ] Return ChatResponse
- [ ] **Implement infer_stream()** (Phase 2b)
  - [ ] Return async stream of tokens
  - [ ] Handle interrupts gracefully
- [ ] **Implement unload()**
  - [ ] Clear model from memory
  - [ ] Release GPU resources
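The load / infer / unload contract above boils down to a small state machine around an `Option`. The sketch below illustrates it with a stub model and synchronous methods. `Runtime`, `StubModel`, and the reduced `NativeError` are stand-ins, and the real `NativeRuntime` would be async and hold a mistral.rs model instead.

```rust
/// Hypothetical subset of `NativeError`.
#[derive(Debug, PartialEq)]
pub enum NativeError {
    ModelNotLoaded,
}

/// Stand-in for a loaded model; upper-casing replaces real generation.
struct StubModel;

impl StubModel {
    fn generate(&self, prompt: &str) -> String {
        prompt.to_uppercase()
    }
}

#[derive(Default)]
pub struct Runtime {
    model: Option<StubModel>,
}

impl Runtime {
    pub fn load(&mut self) {
        self.model = Some(StubModel);
    }

    pub fn is_loaded(&self) -> bool {
        self.model.is_some()
    }

    /// Fails with `ModelNotLoaded` instead of panicking when nothing is loaded.
    pub fn infer(&self, prompt: &str) -> Result<String, NativeError> {
        let model = self.model.as_ref().ok_or(NativeError::ModelNotLoaded)?;
        Ok(model.generate(prompt))
    }

    /// Dropping the model value is what releases its memory.
    pub fn unload(&mut self) {
        self.model = None;
    }
}
```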
### Module: inference/builder.rs (Optional, Phase 2+)
- [ ] **Create builder pattern for NativeRuntime**
```rust
pub struct NativeRuntimeBuilder {
    model_path: Option<PathBuf>,
    config: LoadConfig,
    device: Option<Device>,
}
impl NativeRuntimeBuilder {
    pub fn new() -> Self { ... }
    pub fn model_path(mut self, path: PathBuf) -> Self { ... }
    pub fn with_gpu_layers(mut self, layers: i32) -> Self { ... }
    pub fn force_cpu(mut self) -> Self { ... }
    pub fn build(self) -> Result<NativeRuntime> { ... }
}
```
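To show how the elided method bodies might fit together, here is a self-contained sketch with simplified stand-in types. The `Device` enum, the flattened `NativeRuntime` fields, and the `String` error are assumptions for illustration and do not reflect the planned `LoadConfig`-based design.

```rust
use std::path::PathBuf;

/// Simplified stand-in for the real device type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Device {
    Cpu,
    Gpu,
}

/// Simplified stand-in for the runtime produced by the builder.
#[derive(Debug, PartialEq)]
pub struct NativeRuntime {
    pub model_path: PathBuf,
    pub gpu_layers: i32,
    pub device: Device,
}

#[derive(Default)]
pub struct NativeRuntimeBuilder {
    model_path: Option<PathBuf>,
    gpu_layers: i32,
    force_cpu: bool,
}

impl NativeRuntimeBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn model_path(mut self, path: PathBuf) -> Self {
        self.model_path = Some(path);
        self
    }

    pub fn with_gpu_layers(mut self, layers: i32) -> Self {
        self.gpu_layers = layers;
        self
    }

    pub fn force_cpu(mut self) -> Self {
        self.force_cpu = true;
        self
    }

    /// Validates required fields and resolves the device.
    pub fn build(self) -> Result<NativeRuntime, String> {
        let model_path = self.model_path.ok_or("model_path is required")?;
        let use_gpu = !self.force_cpu && self.gpu_layers > 0;
        Ok(NativeRuntime {
            model_path,
            gpu_layers: if use_gpu { self.gpu_layers } else { 0 },
            device: if use_gpu { Device::Gpu } else { Device::Cpu },
        })
    }
}
```

Deferring validation to `build()` keeps the setter chain infallible, so a missing model path surfaces as a single error at the end.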
### Error Handling: error.rs
- [ ] **Add inference variants to `NativeError`**
```rust
#[derive(Error, Debug)]
pub enum NativeError {
    #[error("Model not loaded")]
    #[cfg(feature = "inference")]
    ModelNotLoaded,
    #[error("Inference failed: {0}")]
    #[cfg(feature = "inference")]
    InferenceFailed(String),
    #[error("Unsupported architecture: {0}")]
    #[cfg(feature = "inference")]
    UnsupportedArchitecture(String),
    #[error("Device error: {0}")]
    #[cfg(feature = "inference")]
    DeviceError(String),
    #[error("Tokenizer error: {0}")]
    #[cfg(feature = "inference")]
    TokenizerError(String),
}
```
- [ ] **Add feature-gated conversions**
```rust
#[cfg(feature = "inference")]
impl From<mistral_rs::MistralError> for NativeError {
    fn from(err: mistral_rs::MistralError) -> Self {
        NativeError::InferenceFailed(err.to_string())
    }
}
```
### Tests: tests/inference.rs
- [ ] **Create test module with `#[ignore]` tests**
```rust
#[cfg(feature = "inference")]
mod inference_tests {
    #[tokio::test]
    #[ignore]
    async fn test_load_gguf() { ... }
    #[tokio::test]
    #[ignore]
    async fn test_infer_text() { ... }
    #[tokio::test]
    #[ignore]
    async fn test_unload() { ... }
}
```
- [ ] **Document how to run ignored tests**
```bash
cargo test --test inference -- --ignored --test-threads 1
```
### Integration: lib.rs
- [ ] **Update public API**
```rust
mod inference;
#[cfg(feature = "inference")]
pub use inference::NativeRuntime;
#[cfg(feature = "inference")]
pub use spn_core::{ChatOptions, ChatResponse};
```
- [ ] **Update module-level documentation**
---
## Phase 2b: Nika Integration
### In nika/Cargo.toml
- [ ] **Add spn-native with inference feature**
```toml
[dependencies]
spn-native = { path = "../supernovae-cli/crates/spn-native", features = ["inference"] }
```
### In nika/src/
- [ ] **Create model_executor module**
```rust
pub mod model_executor {
    use spn_native::NativeRuntime;
    use spn_core::LoadConfig;
    pub struct ModelExecutor {
        runtime: NativeRuntime,
        loaded_model: Option<String>,
    }
    impl ModelExecutor {
        pub async fn load_and_infer(&mut self, prompt: &str) -> Result<String> { ... }
    }
}
```
- [ ] **Wire into Nika's execution engine**
  - [ ] Hook model loading commands
  - [ ] Hook infer commands
  - [ ] Handle errors from NativeRuntime
- [ ] **Update Nika tests to use local inference**
  - [ ] Remove Ollama dependency tests
  - [ ] Add feature guards: `#[cfg(feature = "native-inference")]`
### In nika/docs/
- [ ] **Create NATIVE_INFERENCE.md**
  - [ ] How to enable locally
  - [ ] GPU setup instructions
  - [ ] Troubleshooting guide
  - [ ] Performance expectations
---
## Phase 3: Streaming & Advanced
### Streaming Support
- [ ] **Implement infer_stream() properly**
  - [ ] Return async stream of tokens
  - [ ] Handle cancellation
  - [ ] Test with long outputs
- [ ] **Update Nika TUI to show streaming output**
  - [ ] Display tokens as they arrive
  - [ ] Show estimated tokens/sec
  - [ ] Allow user to stop generation
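The tokens/sec figure for the TUI can be derived from a running token count and the elapsed time. A minimal sketch follows. `tokens_per_sec` is a hypothetical helper, and how often the display refreshes is left to the TUI.

```rust
use std::time::Duration;

/// Average generation rate so far, or `None` before any time has elapsed.
pub fn tokens_per_sec(tokens: u64, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs <= 0.0 {
        // Avoid dividing by zero on the very first token.
        return None;
    }
    Some(tokens as f64 / secs)
}
```

For reference, the "< 2 seconds/token" latency target used elsewhere in this checklist corresponds to a rate above 0.5 tokens/sec.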
### Performance Optimization
- [ ] **Implement token caching** (KV cache management)
- [ ] **Add batch processing** for multiple requests
- [ ] **Profile GPU memory usage**
- [ ] **Optimize context window allocation**
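For KV cache and context-window planning, the cache size of a standard transformer can be estimated as 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per element. The sketch below encodes that estimate. The function name is hypothetical and the formula ignores backend-specific overhead such as padding or paged allocation.

```rust
/// Estimated KV cache size in bytes for a single sequence.
pub fn kv_cache_bytes(
    n_layers: u64,
    n_kv_heads: u64,
    head_dim: u64,
    context_len: u64,
    bytes_per_elem: u64,
) -> u64 {
    // Factor 2: one tensor for keys and one for values per layer.
    2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
}
```

As an illustrative data point, a shape of 32 layers, 8 KV heads, and head dimension 128 with an 8192-token context in f16 (2 bytes) comes to exactly 1 GiB, which shows why reducing `context_size` is an effective lever when memory is tight.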
---
## Quality Gates: Before Release
### Code Quality
- [ ] **Zero clippy warnings**
```bash
cargo clippy --all-targets --all-features -- -D warnings
```
- [ ] **Format check**
```bash
cargo fmt --check
```
- [ ] **Documentation tests pass**
```bash
cargo test --doc
```
- [ ] **All unit tests pass**
```bash
cargo test --lib
```
### Integration Testing
- [ ] **Manual test on macOS (Metal)**
```bash
cargo run --example load-infer --features inference,metal
```
- [ ] **Manual test on Linux (CUDA, if available)**
```bash
cargo run --example load-infer --all-features
```
- [ ] **Manual test on Windows (CPU fallback)**
```bash
cargo run --example load-infer --features inference
```
### Documentation
- [ ] **All public items have doc comments**
```bash
cargo doc --no-deps --open
```
- [ ] **Examples compile and run**
```bash
cargo test --example '*' --features inference
```
- [ ] **README.md is up-to-date**
### Performance Benchmarks
- [ ] **Model load time < 30 seconds** (for typical 8B model)
- [ ] **Inference latency < 2 seconds/token** (on 16GB RAM)
- [ ] **Memory footprint reasonable** (no memory leaks)
---
## Release Checklist
### Pre-Release (v0.2.0)
- [ ] **Bump version in Cargo.toml**
```toml
[package]
name = "spn-native"
version = "0.2.0"
```
- [ ] **Update CHANGELOG.md**
```markdown
## [0.2.0] - YYYY-MM-DD
### Added
- Inference support via mistral.rs
- NativeRuntime for local model execution
- GPU acceleration (Metal, CUDA)
- Streaming inference support
### Changed
- Re-export ChatOptions, ChatResponse from spn-core
### Fixed
- HTTP status code handling (distinguish 404 vs 500)
```
- [ ] **Tag release**
```bash
git tag -a v0.2.0 -m "Add mistral.rs inference support"
git push origin v0.2.0
```
- [ ] **Publish to crates.io**
```bash
cargo publish
```
### Post-Release
- [ ] **Update downstream repos**
  - [ ] nika: bump spn-native dependency
  - [ ] docs: add inference examples
---
## Known Limitations & TODOs
### Current (v0.1.0)
- [ ] Windows RAM detection not implemented (uses 16GB default)
- [ ] No retry logic for network failures
- [ ] No streaming downloads
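For the missing retry logic, one common approach is a bounded retry loop with exponential backoff around the download call. The sketch below is generic and std-only. `retry_with_backoff` is a hypothetical helper, it blocks the calling thread while waiting, and an async version would need the runtime's sleep instead of `thread::sleep`.

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` until it succeeds or `max_attempts` is reached.
/// `op` receives the 1-based attempt number; waits grow as 1x, 2x, 4x, ... `base_delay`.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                let factor = 2u32.saturating_pow(attempt - 1);
                thread::sleep(base_delay.saturating_mul(factor));
                attempt += 1;
            }
        }
    }
}
```

In practice only transient failures (timeouts, 5xx responses) should be retried. A 404 for a missing model should fail immediately.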
### Phase 2 (v0.2.0)
- [ ] VLMs not yet supported (text models only)
- [ ] No fine-tuning or LoRA support
- [ ] Single model loaded at a time
### Phase 3+
- [ ] Multi-model swapping (hot-swap)
- [ ] Distributed inference across devices
- [ ] Advanced quantization techniques
---
## Success Metrics
### Before Phase 2 Release
- [ ] 100% unit test pass rate
- [ ] Zero clippy warnings
- [ ] Documentation coverage > 95%
- [ ] Inference latency < 2 sec/token on reference hardware
### After Nika Integration
- [ ] Nika can run workflows offline
- [ ] Performance parity with Ollama ± 10%
- [ ] User feedback rating > 4.0/5.0
---
## Quick Reference: Command Summary
```bash
# Development
cargo test # Unit tests
cargo test -- --ignored # Integration tests
cargo clippy -- -D warnings # Lint
cargo fmt # Format
cargo doc --no-deps --open # View docs
# Build
cargo build --release # Native
cargo build --features inference # With inference
cargo build --all-features # All features
# Publish
cargo publish --dry-run # Verify
cargo publish # Publish to crates.io
# Testing inference (Phase 2+)
cargo test --test inference -- --ignored
cargo run --example load-infer --features inference
```
---
**Last Updated:** 2026-03-10
**Status:** Ready for Phase 2 implementation
**Reviewed by:** Rust Architect (Claude)