# Automatic Model Quantization Proposal
## Overview
This proposal integrates automatic INT8 quantization into the model download pipeline. Quantization is enabled per model through a new `quantize` section in the model catalog.
## Changes Required
### 1. Add Quantization Config to Model Schema
```rust
// In model_service.rs, ModelCatalogEntry:
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ModelCatalogEntry {
    pub name: String,
    pub description: String,
    pub task: String,
    pub repo: String,
    pub filename: String,
    // NEW: Quantization configuration. Entries without a `quantize` table
    // deserialize to `None`, so existing catalogs stay valid.
    #[serde(default)]
    pub quantize: Option<QuantizeConfig>,
    // ...
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct QuantizeConfig {
    /// Enable automatic quantization after download
    pub enabled: bool,
    /// Quantization method. Only "dynamic_int8" is supported for now;
    /// other values (e.g. "static_int8") are rejected at quantization time.
    pub method: String,
}
```
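Because `method` is a free-form string, a typo in the catalog would only surface when quantization runs. One option is to parse the string into an enum up front. The following is a minimal, std-only sketch; `QuantizeMethod` and `UnsupportedMethod` are hypothetical names that do not exist in the current code.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical typed form of `QuantizeConfig::method`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizeMethod {
    DynamicInt8,
}

/// Hypothetical error for method strings the pipeline does not support.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UnsupportedMethod(pub String);

impl fmt::Display for UnsupportedMethod {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Unsupported quantization method: {}", self.0)
    }
}

impl std::error::Error for UnsupportedMethod {}

impl FromStr for QuantizeMethod {
    type Err = UnsupportedMethod;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "dynamic_int8" => Ok(QuantizeMethod::DynamicInt8),
            // Anything else, including "static_int8", is rejected for now.
            other => Err(UnsupportedMethod(other.to_string())),
        }
    }
}
```

With this in place, `config.method.parse::<QuantizeMethod>()` could replace the string `match` in the quantization step, and the same check could run when the catalog is loaded.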
### 2. Modify ModelService::pull() Method
```rust
pub async fn pull(&self, name: &str, progress: Option<&ProgressBar>) -> Result<PathBuf> {
    let entry = self
        .get_catalog_entry(name)
        .with_context(|| format!("Model '{}' not found in catalog", name))?;

    // Create model directory
    let model_dir = self.models_dir.join(name);
    fs::create_dir_all(&model_dir).await?;

    // Download FP32 model
    let model_path = self
        .pull_from_repo(&entry.repo, &entry.filename, name, progress)
        .await?;

    // Download labels...
    self.pull_labels_from_repo(entry, name, progress).await?;

    // Generate config...
    self.generate_model_config(entry, &model_path, progress).await?;

    // NEW: Auto-quantize if configured. A quantization failure is reported but
    // does not fail the pull, so the FP32 model remains usable as a fallback.
    if let Some(quantize_config) = entry.quantize.as_ref().filter(|q| q.enabled) {
        if let Err(err) = self.quantize_model(&model_path, quantize_config, progress).await {
            eprintln!("Warning: INT8 quantization failed, keeping FP32 model: {err:#}");
        }
    }

    // The FP32 path is returned; the INT8 file is written next to it.
    Ok(model_path)
}
```
### 3. Add Quantization Method
```rust
impl ModelService {
    /// Quantize a model to INT8 (calls Python script)
    async fn quantize_model(
        &self,
        model_path: &Path,
        config: &QuantizeConfig,
        progress: Option<&ProgressBar>,
    ) -> Result<PathBuf> {
        let int8_path = model_path.with_file_name("model-int8.onnx");

        // Skip if already quantized
        if int8_path.exists() {
            if let Some(pb) = progress {
                pb.set_message("INT8 model already cached");
            }
            return Ok(int8_path);
        }

        if let Some(pb) = progress {
            pb.set_message("Quantizing model to INT8...");
        }

        match config.method.as_str() {
            "dynamic_int8" => {
                self.quantize_dynamic_int8(model_path, &int8_path).await?;
            }
            _ => {
                anyhow::bail!("Unsupported quantization method: {}", config.method);
            }
        }

        if let Some(pb) = progress {
            pb.set_message("✅ INT8 model ready");
        }
        Ok(int8_path)
    }

    /// Perform dynamic INT8 quantization using ONNX Runtime tools
    async fn quantize_dynamic_int8(&self, input: &Path, output: &Path) -> Result<()> {
        // Locate a Python interpreter. This does not verify that the
        // `onnx`/`onnxruntime` packages are installed; a missing package
        // surfaces as a non-zero exit code from the script.
        let python = self.find_python()?;

        // The script is embedded in the binary and written to a per-process
        // temp file so separate processes do not remove each other's copy.
        let script = include_str!("../../scripts/quantize_int8.py");
        let script_path =
            std::env::temp_dir().join(format!("quantize_int8_{}.py", std::process::id()));
        fs::write(&script_path, script).await?;

        // Run Python quantization script
        let status = tokio::process::Command::new(&python)
            .arg(&script_path)
            .arg(input)
            .arg(output)
            .status()
            .await;

        // Clean up the temp script whether or not the run succeeded
        let _ = fs::remove_file(&script_path).await;

        let status = status.context("Failed to launch Python quantization script")?;
        if !status.success() {
            anyhow::bail!("Quantization failed with exit code: {:?}", status.code());
        }
        Ok(())
    }

    /// Find Python 3 executable.
    /// Note: this only checks that the executable exists, not its version;
    /// a bare `python` may still be Python 2 on older systems.
    fn find_python(&self) -> Result<String> {
        for candidate in &["python3", "python"] {
            if which::which(candidate).is_ok() {
                return Ok(candidate.to_string());
            }
        }
        anyhow::bail!(
            "Python 3 not found. Install with: brew install python3 (macOS) or apt install python3 (Linux)"
        )
    }
}
```
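One gap in the cache check above: if the script dies partway through, a truncated `model-int8.onnx` could be left behind and later be treated as cached. A common mitigation is to have the script write to a temporary sibling file and rename it only on success. The sketch below is std-only and illustrative; `partial_path_for` and `commit_quantized` are hypothetical helpers, and the `.partial` suffix is an assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Temporary sibling path the quantizer would write to,
/// e.g. `model-int8.onnx.partial` (suffix is an assumption).
fn partial_path_for(final_path: &Path) -> PathBuf {
    let mut name = final_path
        .file_name()
        .map(|n| n.to_os_string())
        .unwrap_or_default();
    name.push(".partial");
    final_path.with_file_name(name)
}

/// Promote the partial file to its final name if quantization succeeded,
/// otherwise delete it. Returns true when the final file was put in place.
fn commit_quantized(final_path: &Path, succeeded: bool) -> io::Result<bool> {
    let partial = partial_path_for(final_path);
    if succeeded && partial.exists() {
        fs::rename(&partial, final_path)?;
        Ok(true)
    } else {
        match fs::remove_file(&partial) {
            Ok(()) => {}
            // Nothing to clean up is not an error.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
        Ok(false)
    }
}
```

In this variant, `quantize_dynamic_int8` would pass `partial_path_for(&int8_path)` to the script as the output argument and call `commit_quantized` afterwards, so the cache check only ever sees a complete file.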
### 4. Update Model Catalog
```toml
# model_catalog.toml
[[models]]
name = "yolov8n"
description = "YOLOv8 Nano - Fast object detection"
task = "object-detection"
repo = "deepghs/yolos"
filename = "yolov8n/model.onnx"
preprocessing_preset = "yolo"
# NEW: Enable automatic quantization
[models.quantize]
enabled = true
method = "dynamic_int8"
```
## Benefits
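- ✅ **No manual quantization** - Quantization runs automatically as part of `models pull`
- ✅ **Config-driven** - Enabled per model via the catalog
- ✅ **Cached** - The INT8 model is stored next to the FP32 model and reused on later pulls
- ✅ **Portable** - Works on any machine with Python 3 and the `onnx` and `onnxruntime` packages
- ✅ **Fallback** - The FP32 model stays in place and is used if quantization fails
- ✅ **Progressive enhancement** - Existing catalog entries without a `quantize` section work without changes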
## User Experience
### Before:
```bash
mecha10 models pull yolov8n
# Wait for download...
cd models/yolov8n/
pip install onnx onnxruntime
python quantize_to_int8.py
# Edit config: "use_int8": true
mecha10 dev
```
### After:
```bash
mecha10 models pull yolov8n
# Wait for download...
# Automatically quantizes to INT8!
# Edit config: "use_int8": true
mecha10 dev
```
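The "After" flow still contains one manual step, the `use_int8` config edit. If the runtime were to choose the model file itself, the selection might look like the sketch below. `select_model_file` is a hypothetical helper, and the file names `model.onnx` and `model-int8.onnx` are assumed from the examples in this proposal.

```rust
use std::path::{Path, PathBuf};

/// Pick the model file to load from `model_dir`.
/// The INT8 file is used only when it was requested and actually exists;
/// otherwise the FP32 file is returned.
fn select_model_file(model_dir: &Path, use_int8: bool) -> PathBuf {
    let int8 = model_dir.join("model-int8.onnx");
    if use_int8 && int8.is_file() {
        int8
    } else {
        model_dir.join("model.onnx")
    }
}
```

Falling back to FP32 when the INT8 file is missing keeps `use_int8: true` safe to set even on machines where quantization could not run.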
## Dependencies
- **Python 3** (already required for Godot scripts)
- **pip packages:** `onnx`, `onnxruntime` (either auto-install them or warn when they are missing)
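For the "warn" option, the packages can be probed before the script is launched by asking the interpreter to import them. This is a std-only sketch; `python_has_modules` is a hypothetical helper, not part of `ModelService`.

```rust
use std::process::{Command, Stdio};

/// Returns true if `python` starts and can import every module in `modules`.
/// Returns false if an import fails or the interpreter cannot be launched.
fn python_has_modules(python: &str, modules: &[&str]) -> bool {
    // With no modules to check, just verify the interpreter runs.
    let code = if modules.is_empty() {
        "pass".to_string()
    } else {
        format!("import {}", modules.join(", "))
    };
    Command::new(python)
        .arg("-c")
        .arg(code)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

A caller could run `python_has_modules(&python, &["onnx", "onnxruntime"])` after `find_python()` and, on `false`, print a `pip install onnx onnxruntime` hint instead of starting the quantization script.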
## Rollout Plan
1. **Phase 1:** Add optional quantization to ModelService
2. **Phase 2:** Enable for yolov8n in catalog
3. **Phase 3:** Document in GETTING_STARTED.md
4. **Phase 4:** Enable for all object detection models