# **Enhanced Model Integration Plan for Kandil Code**
**Version 2.0 - Security-First, Production-Ready Specification**
---
## **1. Architecture: The Secure Universal Model Registry (UMR)**
### **1.1 Core Design Principles**
- **Security First**: No API keys in CLI args, environment variables, or files
- **Fail-Fast Validation**: Block incompatible models before download
- **Explicit Over Magic**: Curated profiles > brittle auto-detection
- **Performance**: Cached discovery, connection pooling, lazy loading
### **1.2 Registry Structure**
```rust
// src/model/registry/mod.rs
use std::sync::Arc;
use tokio::sync::RwLock;
use lazy_static::lazy_static;
/// Central registry managing all model interactions
pub struct UniversalModelRegistry {
/// Known model profiles (curated, verified)
profiles: Arc<RwLock<HashMap<String, ModelProfile>>>,
/// User-added custom models
custom_models: Arc<RwLock<HashMap<String, ModelConfig>>>,
/// Hardware compatibility cache
compatibility_cache: Arc<RwLock<HashMap<String, CompatibilityReport>>>,
/// Connection pool for API clients
connection_pool: Arc<ConnectionPool>,
}
lazy_static! {
pub static ref REGISTRY: UniversalModelRegistry = UniversalModelRegistry::new();
}
impl UniversalModelRegistry {
fn new() -> Self {
Self {
profiles: Arc::new(RwLock::new(Self::load_builtin_profiles())),
custom_models: Arc::new(RwLock::new(Self::load_custom_models())),
compatibility_cache: Arc::new(RwLock::new(HashMap::new())),
connection_pool: Arc::new(ConnectionPool::new()),
}
}
/// Load 50+ verified model profiles at compile time
fn load_builtin_profiles() -> HashMap<String, ModelProfile> {
include!("builtin_profiles.rs") // Compile-time inclusion
}
}
```
### **1.3 Secure Model Configuration**
```rust
// src/model/config.rs
use serde::{Deserialize, Serialize};
use keyring::Entry;
/// Public configuration (safe to serialize)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ModelConfig {
pub name: String,
pub provider: Provider,
pub base_url: String,
pub protocol: Protocol,
pub context_window: usize,
pub supports_vision: bool,
pub supports_tools: bool,
pub recommended_for: Vec<TaskType>,
pub api_key_required: bool,
}
/// Sensitive credentials (never serialized)
pub struct ModelCredentials {
pub api_key: SecretString,
pub organization: Option<String>,
}
impl ModelCredentials {
/// Load from OS keyring
pub fn load(provider: &str) -> Result<Self> {
let entry = Entry::new("kandil", &format!("api_key_{}", provider))?;
let api_key = entry.get_password()?;
let org_entry = Entry::new("kandil", &format!("org_{}", provider)).ok();
let organization = org_entry.and_then(|e| e.get_password().ok());
Ok(Self {
api_key: SecretString::new(api_key),
organization: organization.map(SecretString::new),
})
}
/// Store securely
pub fn store(provider: &str, api_key: String, organization: Option<String>) -> Result<()> {
let entry = Entry::new("kandil", &format!("api_key_{}", provider))?;
entry.set_password(&api_key)?;
if let Some(org) = organization {
let org_entry = Entry::new("kandil", &format!("org_{}", provider))?;
org_entry.set_password(&org)?;
}
Ok(())
}
}
// src/types.rs
use secrecy::{Secret, SecretString};
/// Secure wrapper for sensitive data
pub type SecretString = Secret<String>;
```
---
## **2. Secure Model Addition Workflow**
### **2.1 CLI Command (Safe by Design)**
```bash
# â
SECURE: No API key in command
$ kandil model add qwen2.5-coder-32b
# Interactive prompt for API key
đ API key required for qwen2.5-coder-32b
Provider: Alibaba DashScope
Security: Key will be stored in OS keyring
âšī¸ Get key from: https://dashscope.console.aliyun.com/apiKey
? Enter API key (hidden): ************
? Confirm API key: ************
â
Key validated and stored securely
# Hardware check
đĨī¸ Hardware Check:
- RAM: 32GB available (Need 24GB) â
- GPU: RTX 4090 24GB â
- Disk: 45GB free (Need 18.2GB) â
â
Compatible
# Download verification
đĻ Downloading Qwen2.5-Coder-32B-Q4.gguf...
â
Checksum verified: ef3a3b2c...
â
Model signature valid (Alibaba official)
â
No malicious metadata detected
â
Sandboxed load test passed
â
Model ready: Use 'kandil /model use qwen32b'
```
### **2.2 Implementation**
```rust
// src/cli/commands/model_add.rs
use rpassword::prompt_password;
pub async fn execute(args: ModelAddArgs) -> Result<()> {
// 1. Resolve model name to profile
let profile = REGISTRY.resolve_profile(&args.name).await?;
// 2. Hardware compatibility check (with caching)
let report = REGISTRY.check_compatibility(&profile.name).await?;
if report.blocked {
return Err(anyhow::anyhow!(
"Model incompatible:\n{}",
report.reasons.join("\n")
));
}
// 3. Secure API key input if required
if profile.api_key_required {
let entry = keyring::Entry::new("kandil", &format!("api_key_{}", profile.provider))?;
if matches!(entry.get_password(), Err(keyring::Error::NoEntry)) {
eprintln!("đ API key required for {}", profile.name);
eprintln!(" âšī¸ Get key from: {}", profile.api_key_url);
let key = prompt_password("? Enter API key: ")?;
let confirm = prompt_password("? Confirm API key: ")?;
if key != confirm {
bail!("Keys don't match");
}
// 4. Validate key before storing
validate_api_key(&profile, &key).await?;
// 5. Store securely
ModelCredentials::store(&profile.provider, key, None)?;
eprintln!("â
Key validated and stored in keyring");
}
}
// 6. Download and verify local models
if profile.is_local {
download_and_verify(&profile, &report.warnings).await?;
}
// 7. Test actual API call
test_model_connection(&profile).await?;
// 8. Save to user config
ConfigManager::add_model(&profile).await?;
Ok(())
}
/// Validate API key with a minimal test request
async fn validate_api_key(profile: &ModelProfile, key: &str) -> Result<()> {
let client = reqwest::Client::new();
let res = client
.post(&profile.base_url)
.header("Authorization", format!("Bearer {}", key))
.json(&json!({
"model": profile.name,
"messages": [{"role": "user", "content": "hi"}],
"max_tokens": 1
}))
.send()
.await?;
if res.status() == 401 {
bail!("Invalid API key");
} else if res.status() == 429 {
bail!("Rate limit exceeded - key may be valid but overused");
} else if !res.status().is_success() {
bail!("Key validation failed: {}", res.text().await?);
}
Ok(())
}
```
---
## **3. Hardware Compatibility Engine**
### **3.1 Real-Time Detection**
```rust
// src/hardware/detector.rs
use sysinfo::{System, SystemExt, DiskExt};
use nvml_wrapper::{NVML, Device};
pub struct HardwareDetector {
sys: Arc<RwLock<System>>,
nvml: Option<NVML>,
}
impl HardwareDetector {
pub fn new() -> Result<Self> {
let nvml = NVML::init().ok(); // Optional GPU support
Ok(Self {
sys: Arc::new(RwLock::new(System::new_all())),
nvml,
})
}
pub fn get_report(&self) -> HardwareReport {
let sys = self.sys.read().unwrap();
let total_ram = sys.total_memory(); // in KB
let free_disk = self.get_free_disk();
let gpu_info = self.detect_gpu();
HardwareReport {
total_ram_gb: total_ram as f64 / 1024.0 / 1024.0,
free_disk_gb: free_disk,
cpu_cores: sys.cpus().len(),
gpu: gpu_info,
}
}
fn detect_gpu(&self) -> Option<GPUInfo> {
let nvml = self.nvml.as_ref()?;
let device = nvml.device_by_index(0).ok()?;
Some(GPUInfo {
name: device.name().ok()?,
memory_gb: device.memory_info().ok()?.total as f64 / 1024.0 / 1024.0 / 1024.0,
cuda_cores: device.num_cores().ok()?,
})
}
}
#[derive(Debug, Clone)]
pub struct HardwareReport {
pub total_ram_gb: f64,
pub free_disk_gb: f64,
pub cpu_cores: usize,
pub gpu: Option<GPUInfo>,
}
#[derive(Debug, Clone)]
pub struct GPUInfo {
pub name: String,
pub memory_gb: f64,
pub cuda_cores: u32,
}
```
### **3.2 Compatibility Scoring**
```rust
// src/model/compatibility.rs
pub struct CompatibilityEngine;
impl CompatibilityEngine {
pub fn check(&self, profile: &ModelProfile, hardware: &HardwareReport) -> CompatibilityReport {
let mut report = CompatibilityReport::default();
// RAM requirement: Model size * 1.5 for overhead
let required_ram = profile.size_gb * 1.5;
if hardware.total_ram_gb < required_ram {
report.blocked = true;
report.reasons.push(format!(
"Insufficient RAM: {:.1}GB required, {:.1}GB available",
required_ram, hardware.total_ram_gb
));
} else if hardware.total_ram_gb < required_ram * 1.2 {
report.warnings.push("RAM usage will be high".to_string());
}
// Disk space
if hardware.free_disk_gb < profile.size_gb {
report.blocked = true;
report.reasons.push(format!(
"Insufficient disk: {:.1}GB required, {:.1}GB free",
profile.size_gb, hardware.free_disk_gb
));
}
// GPU recommendation
if profile.recommends_gpu && hardware.gpu.is_none() {
report.warnings.push(
"Model performs best with GPU acceleration. CPU-only will be slow.".to_string()
);
}
// GPU memory for local models
if profile.is_local {
if let Some(ref gpu) = hardware.gpu {
if gpu.memory_gb < profile.size_gb {
report.warnings.push(format!(
"GPU VRAM ({:.1}GB) < Model size ({:.1}GB). Will use CPU inference.",
gpu.memory_gb, profile.size_gb
));
}
}
}
report
}
}
pub struct CompatibilityReport {
pub blocked: bool,
pub reasons: Vec<String>,
pub warnings: Vec<String>,
}
```
---
## **4. Smart Discovery & Caching**
### **4.1 Cached Discovery Engine**
```rust
// src/model/discovery/cache.rs
use tokio::sync::RwLock;
use std::time::{Instant, Duration};
pub struct CachedDiscoveryEngine {
inner: DiscoveryEngine,
cache: Arc<RwLock<HashMap<String, CacheEntry>>>,
ttl: Duration,
}
struct CacheEntry {
results: Vec<ModelCandidate>,
timestamp: Instant,
}
impl CachedDiscoveryEngine {
pub async fn discover(&self, query: &str) -> Result<Vec<ModelCandidate>> {
// Check cache
{
let cache = self.cache.read().await;
if let Some(entry) = cache.get(query) {
if entry.timestamp.elapsed() < self.ttl {
return Ok(entry.results.clone());
}
}
}
// Parallel prioritized discovery
let results = self.discover_prioritized(query).await?;
// Store in cache
{
let mut cache = self.cache.write().await;
cache.insert(query.to_string(), CacheEntry {
results: results.clone(),
timestamp: Instant::now(),
});
}
Ok(results)
}
async fn discover_prioritized(&self, query: &str) -> Result<Vec<ModelCandidate>> {
// Priority 1: Local Ollama (fastest)
if let Ok(local) = self.discover_ollama(query).await {
if !local.is_empty() {
return Ok(local);
}
}
// Priority 2: Known cloud models (no discovery needed)
if let Some(profile) = REGISTRY.get_builtin_profile(query) {
return Ok(vec![ModelCandidate::from_profile(profile)]);
}
// Priority 3: Hugging Face (async, cached)
if query.contains('/') || query.len() > 10 {
if let Ok(hf) = self.discover_huggingface(query).await {
if !hf.is_empty() {
return Ok(hf);
}
}
}
bail!("No models found for '{}'", query)
}
}
```
---
## **5. Curated Model Profiles (Built-in)**
### **5.1 Compile-Time Profiles**
```rust
// src/model/builtin_profiles.rs
pub const BUILTIN_PROFILES: &[(&str, ModelProfile)] = &[
// Anthropic
("claude-3.5-sonnet", ModelProfile {
provider: Provider::Anthropic,
base_url: "https://api.anthropic.com/v1/messages",
protocol: Protocol::Anthropic,
context_window: 200_000,
max_tokens: 4096,
supports_vision: true,
supports_tools: true,
api_key_required: true,
api_key_url: "https://console.anthropic.com/settings/keys",
size_gb: 0.0, // Cloud model
is_local: false,
recommends_gpu: false,
recommended_for: vec![TaskType::Coding, TaskType::Reasoning],
}),
// Alibaba Qwen
("qwen2.5-coder-32b", ModelProfile {
provider: Provider::Dashscope,
base_url: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
protocol: Protocol::OpenAI,
context_window: 32_768,
max_tokens: 8192,
supports_vision: false,
supports_tools: true,
api_key_required: true,
api_key_url: "https://dashscope.console.aliyun.com/apiKey",
size_gb: 18.2,
is_local: false,
recommends_gpu: true,
recommended_for: vec![TaskType::Coding, TaskType::Speed],
}),
// Ollama local
("llama3.1:70b", ModelProfile {
provider: Provider::Ollama,
base_url: "http://localhost:11434",
protocol: Protocol::OpenAI,
context_window: 128_000,
max_tokens: 4096,
supports_vision: false,
supports_tools: false,
api_key_required: false,
api_key_url: String::new(),
size_gb: 40.5,
is_local: true,
recommends_gpu: true,
recommended_for: vec![TaskType::Coding, TaskType::Privacy],
}),
// 50+ more models...
];
```
### **5.2 Fuzzy Name Resolution**
```rust
// src/model/resolver.rs
pub struct ModelResolver;
impl ModelResolver {
/// Resolve user input to exact model name
pub fn resolve(&self, input: &str) -> Result<String> {
let input = input.to_lowercase();
// 1. Exact match
if BUILTIN_PROFILES.iter().any(|(name, _)| name == &input) {
return Ok(input);
}
// 2. Alias match
match input.as_str() {
"claude" => return Ok("claude-3.5-sonnet".to_string()),
"gpt4" => return Ok("gpt-4-turbo".to_string()),
"qwen32b" => return Ok("qwen2.5-coder-32b".to_string()),
_ => {}
}
// 3. Fuzzy match
let matches = self.fuzzy_search(&input);
if matches.len() == 1 {
return Ok(matches[0].clone());
} else if matches.len() > 1 {
bail!("Ambiguous name. Did you mean: {:?}?", matches);
}
// 4. Unknown model - require explicit manual add
bail!(
"Unknown model '{}'. Run: kandil model add {} --manual\n\
Or see available models: kandil model list",
input, input
)
}
fn fuzzy_search(&self, input: &str) -> Vec<String> {
BUILTIN_PROFILES
.iter()
.filter(|(name, _)| name.contains(input) || input.contains(name))
.map(|(name, _)| name.to_string())
.collect()
}
}
```
---
## **6. Connection Pooling & Performance**
### **6.1 Per-Model Connection Pool**
```rust
// src/client/pool.rs
use reqwest::{Client, ClientBuilder};
use std::time::Duration;
pub struct ConnectionPool {
clients: Arc<RwLock<HashMap<String, Arc<Client>>>>,
}
impl ConnectionPool {
pub fn new() -> Self {
Self {
clients: Arc::new(RwLock::new(HashMap::new())),
}
}
pub async fn get(&self, model: &str) -> Arc<Client> {
let mut clients = self.clients.write().await;
if let Some(client) = clients.get(model) {
return client.clone();
}
// Create optimized client for this model
let profile = REGISTRY.get_profile(model).await.unwrap();
let client = Arc::new(
ClientBuilder::new()
.connect_timeout(Duration::from_secs(10))
.timeout(Duration::from_secs(120)) // For long completions
.pool_idle_timeout(Duration::from_secs(30))
.pool_max_idle_per_host(10)
.tcp_keepalive(Duration::from_secs(60))
.user_agent(format!("Kandil-Code/{}", env!("CARGO_PKG_VERSION")))
.default_headers({
let mut headers = HeaderMap::new();
if let Ok(creds) = ModelCredentials::load(&profile.provider) {
headers.insert(
"Authorization",
format!("Bearer {}", creds.api_key.expose_secret()).parse().unwrap()
);
}
headers
})
.build()
.unwrap()
);
clients.insert(model.to_string(), client.clone());
client
}
}
```
### **6.2 Request Batching**
```rust
// src/client/batch.rs
pub struct BatchClient {
pool: Arc<ConnectionPool>,
queue: Arc<RwLock<Vec<BatchRequest>>>,
}
struct BatchRequest {
model: String,
prompt: String,
response_tx: oneshot::Sender<Result<String>>,
}
impl BatchClient {
pub async fn batch_complete(&self, requests: Vec<(String, String)>) -> Result<Vec<String>> {
// Group by model
let mut by_model: HashMap<String, Vec<_>> = HashMap::new();
for (model, prompt) in requests {
by_model.entry(model).or_default().push(prompt);
}
// Execute in parallel
let mut results = vec![];
let futures = by_model.into_iter().map(|(model, prompts)| {
let pool = self.pool.clone();
async move {
let client = pool.get(&model).await;
let mut outputs = vec![];
for prompt in prompts {
let res = client
.post(format!("{}/chat/completions", model))
.json(&json!({
"model": model,
"messages": [{"role": "user", "content": prompt}],
"stream": false,
}))
.send()
.await?
.json::<serde_json::Value>()
.await?;
outputs.push(res["choices"][0]["message"]["content"]
.as_str()
.unwrap_or("")
.to_string());
}
Ok::<_, anyhow::Error>(outputs)
}
});
let batch_results = futures::future::join_all(futures).await;
for res in batch_results {
results.extend(res?);
}
Ok(results)
}
}
```
---
## **7. Safety & Validation Infrastructure**
### **7.1 Model File Security**
```rust
// src/model/security/validator.rs
use sha2::{Sha256, Digest};
use gguf::GGUFFile;
pub struct ModelSecurityValidator;
impl ModelSecurityValidator {
/// Comprehensive validation before loading any model file
pub async fn validate_model_file(path: &Path, expected_hash: &str) -> Result<()> {
// 1. Verify checksum
let mut file = File::open(path)?;
let mut hasher = Sha256::new();
std::io::copy(&mut file, &mut hasher)?;
let hash = format!("{:x}", hasher.finalize());
if hash != expected_hash {
bail!(
"Checksum mismatch!\n\
Expected: {}\n\
Actual: {}\n\
Possible causes: corrupted download or tampering",
expected_hash, hash
);
}
// 2. Parse and validate GGUF structure
let gguf = GGUFFile::read(path)?;
self.validate_gguf_metadata(&gguf)?;
// 3. Sandboxed load test (in separate process)
self.sandboxed_load_test(path).await?;
Ok(())
}
fn validate_gguf_metadata(&self, gguf: &GGUFFile) -> Result<()> {
// Check for suspicious metadata keys
for key in gguf.metadata.keys() {
if key.starts_with("system.") || key.contains("exec") || key.contains("cmd") {
bail!("Suspicious metadata key found: '{}'", key);
}
}
// Validate tensor names
for tensor in &gguf.tensors {
if tensor.name.contains("..") || tensor.name.starts_with('/') {
bail!("Invalid tensor name: '{}'", tensor.name);
}
}
Ok(())
}
async fn sandboxed_load_test(&self, path: &Path) -> Result<()> {
let path_str = path.to_string_lossy();
// Spawn separate process with restricted permissions
let mut child = tokio::process::Command::new("kandil-sandbox")
.arg("load-test")
.arg(&*path_str)
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.kill_on_drop(true)
.spawn()?;
let result = tokio::time::timeout(Duration::from_secs(30), child.wait_with_output()).await;
match result {
Ok(Ok(output)) if output.status.success() => Ok(()),
Ok(Ok(output)) => {
bail!("Sandboxed load test failed: {}", String::from_utf8_lossy(&output.stderr))
}
Ok(Err(e)) => Err(e.into()),
Err(_) => {
child.kill().ok();
bail!("Sandboxed load test timed out (possible corrupted model)")
}
}
}
}
```
### **7.2 API Safety Checker**
```rust
// src/model/security/api_checker.rs
pub struct APISafetyChecker;
impl APISafetyChecker {
pub async fn validate_endpoint(profile: &ModelProfile) -> Result<ValidationReport> {
let mut report = ValidationReport::default();
// 1. DNS and connectivity check
let url = Url::parse(&profile.base_url)?;
if let Err(e) = tokio::net::lookup_host(url.host_str().unwrap()).await {
bail!("DNS resolution failed: {}", e);
}
// 2. TLS certificate validation
let client = reqwest::Client::builder()
.danger_accept_invalid_certs(false) // Enforce valid certs
.build()?;
// 3. Test endpoint with minimal request
let start = Instant::now();
let test_res = client
.post(&profile.base_url)
.json(&json!({
"model": profile.name,
"messages": [{"role": "user", "content": "test"}],
"max_tokens": 1
}))
.send()
.await;
report.latency_ms = start.elapsed().as_millis();
match test_res {
Ok(res) => {
report.status_code = res.status().as_u16();
// Check rate limit headers
if let Some(limit) = res.headers().get("x-ratelimit-limit") {
report.rate_limit = limit.to_str().ok().map(|s| s.to_string());
}
// Check cost headers
if let Some(cost) = res.headers().get("x-cost-estimate") {
report.estimated_cost = cost.to_str().ok().and_then(|s| s.parse().ok());
}
// Validate response format
let body: serde_json::Value = res.json().await?;
Self::validate_response_format(&body, &profile.protocol)?;
}
Err(e) => {
bail!("API endpoint unreachable: {}", e);
}
}
Ok(report)
}
}
```
---
## **8. User Experience: Interactive Setup**
### **8.1 Guided Configuration**
```bash
$ kandil setup
⨠Kandil Code Setup Wizard
đĨī¸ Hardware Detected:
CPU: AMD Ryzen 9 7950X (16 cores)
RAM: 64GB DDR5
GPU: NVIDIA RTX 4090 (24GB VRAM)
Disk: 1TB NVMe (342GB free)
â Choose your primary model provider:
1) Local Models (Ollama) - Best for privacy
2) Alibaba Qwen - Best cost/performance
3) Anthropic Claude - Best quality
4) Google Gemini - Best for multi-modal
> 2
đ Please provide your DashScope API key:
âšī¸ Get key from: https://dashscope.console.aliyun.com/apiKey
? Enter key (hidden): ************
â
Key validated
â Which models to install?
â
qwen2.5-coder-7b (4.5GB, recommended)
â
qwen2.5-coder-14b (9.2GB, recommended)
â qwen2.5-coder-32b (18.2GB)
â qwen-max (API only)
> Enter
đĻ Downloading qwen2.5-coder-7b...
â
Checksum verified
â
Load test passed
â
Model ready
đĻ Downloading qwen2.5-coder-14b...
â
Checksum verified
â
Load test passed
â
Model ready
â
Setup complete! Try: kandil /ask "Hello world in Rust"
```
### **8.2 Model Listing with Status**
```bash
$ kandil model list
â
Local Models (Ollama):
llama3.1:8b 4.5GB â
Ready CPU inference
qwen2.5-coder:7b 4.5GB â
Ready GPU accelerated
qwen2.5-coder:14b 9.2GB â
Ready GPU accelerated
mistral-large:123b 68GB â Incompatible (Need 102GB RAM)
â
Cloud Models (Configured):
qwen2.5-coder-32b API â
Ready Low latency
claude-3.5-sonnet API â
Ready Rate: 40 req/min
gemini-1.5-pro API â ī¸ Slow Latency: 850ms
â
Other Available:
deepseek-coder:33b API â Not configured
gpt-4-turbo API â Not configured
đĄ Run 'kandil model add <name>' to configure
```
---
## **9. Testing & Quality Assurance**
### **9.1 Integration Test Suite**
```rust
// tests/model_integration.rs
#[tokio::test]
async fn test_full_model_lifecycle() {
// Setup test registry
let registry = UniversalModelRegistry::new();
// Test 1: Add model
registry.add_model("qwen-test").await.unwrap();
// Test 2: Verify credentials stored
let creds = ModelCredentials::load("qwen").unwrap();
assert_eq!(creds.api_key.expose_secret(), "test-key");
// Test 3: Hardware check
let report = registry.check_compatibility("qwen-test").await.unwrap();
assert!(!report.blocked);
// Test 4: API validation
let profile = registry.get_profile("qwen-test").await.unwrap();
let validation = APISafetyChecker::validate_endpoint(&profile).await.unwrap();
assert_eq!(validation.status_code, 200);
// Test 5: Actual completion
let result = registry.complete("qwen-test", "Hello").await.unwrap();
assert!(!result.is_empty());
// Cleanup
registry.remove_model("qwen-test").await.unwrap();
}
#[test]
fn test_malicious_model_detection() {
let validator = ModelSecurityValidator;
// Create fake malicious GGUF
let malicious_file = create_malicious_gguf();
let result = block_on(validator.validate_model_file(&malicious_file, "fake_hash"));
assert!(result.is_err());
assert!(result.unwrap_err().to_string().contains("Suspicious metadata"));
}
```
### **9.2 Rate Limit Testing**
```rust
// tests/rate_limit.rs
#[tokio::test]
async fn test_rate_limit_protection() {
let registry = UniversalModelRegistry::new();
// Spawn many concurrent requests
let mut tasks = vec![];
for _ in 0..100 {
tasks.push(registry.complete("qwen-test", "test"));
}
let results = futures::future::join_all(tasks).await;
// Should not all succeed due to rate limiting
let successes = results.iter().filter(|r| r.is_ok()).count();
let rate_limited = results.iter().filter(|r| {
r.as_ref().err().map(|e| e.to_string().contains("rate limit")).unwrap_or(false)
}).count();
assert!(rate_limited > 0, "Should have rate limited some requests");
assert!(successes <= 50, "Should not exceed rate limit");
}
```
---
## **10. Configuration Schema**
### **10.1 User Config File**
```toml
# ~/.config/kandil/config.toml
# Auto-generated, versioned config
version = 2
[defaults]
model = "qwen2.5-coder-7b" # Auto-detected if not set
theme = "dark"
telemetry = false
[api] # API endpoints (safe to commit, no keys)
qwen = { base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1", protocol = "openai" }
anthropic = { base_url = "https://api.anthropic.com/v1/messages", protocol = "anthropic" }
[models] # Per-model overrides
[qwen2.5-coder-7b]
max_tokens = 4096
temperature = 0.2
top_p = 0.9
[cache]
ttl_seconds = 3600
max_size_mb = 100
[limits]
max_concurrent_requests = 5
rate_limit_retry = true
```
### **10.2 Model Metadata Storage**
```json
// ~/.local/share/kandil/models/qwen32b.metadata.json
{
"schema_version": 2,
"name": "qwen2.5-coder-32b",
"source": "huggingface:Qwen/Qwen2.5-Coder-32B-Instruct-GGUF",
"sha256": "ef3a3b2c4d5e6f7890a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3",
"size_bytes": 18200000000,
"verified_at": "2025-01-15T10:30:00Z",
"capabilities": ["code", "reasoning", "long-context"],
"context_window": 32768,
"last_used": "2025-01-15T11:45:00Z",
"usage_count": 42,
"average_latency_ms": 340
}
```
---
## **11. Implementation Roadmap**
### **Phase 0: Foundation (Weeks 1-2)**
- [ ] Create `UniversalModelRegistry` with built-in profiles
- [ ] Implement hardware detection (`sysinfo`, `nvml-wrapper`)
- [ ] Set up secure credential storage (`keyring`, `secrecy`)
- [ ] Add connection pooling (`reqwest` with custom config)
### **Phase 1: Core Commands (Weeks 3-4)**
- [ ] Implement `kandil model add <name>` with interactive prompts
- [ ] Hardware compatibility checker
- [ ] API key validation
- [ ] Basic model listing
### **Phase 2: Security Hardening (Weeks 5-6)**
- [ ] GGUF file validator with checksums
- [ ] Sandboxed load testing
- [ ] API safety checker
- [ ] Rate limit implementation
### **Phase 3: Discovery & Caching (Weeks 7-8)**
- [ ] Cached discovery engine
- [ ] Ollama registry integration
- [ ] Hugging Face registry
- [ ] Fuzzy name resolution
### **Phase 4: UX Polish (Weeks 9-10)**
- [ ] Interactive setup wizard
- [ ] Enhanced CLI output with emojis
- [ ] Model performance metrics tracking
- [ ] Configuration versioning
### **Phase 5: Testing & Release (Weeks 11-12)**
- [ ] Integration test suite
- [ ] Rate limit testing
- [ ] Security test suite (malicious files)
- [ ] Documentation and examples
---
## **12. Command Reference (Final)**
```bash
# Model Management
kandil model add <name> # Add model (interactive, secure)
kandil model add <name> --manual # Manual configuration for custom models
kandil model list # List all models with status
kandil model rm <name> # Remove model
kandil model test <name> # Test model connectivity
kandil model benchmark <name> # Benchmark latency/throughput
# Configuration
kandil setup # Interactive initial setup
kandil config show # Show current configuration
kandil config set <key> <value> # Set config value
kandil config reload # Reload from disk
# Usage (AI commands)
kandil /ask "question" # Ask question (auto-select model)
kandil /ask "question" --model <name> # Use specific model
kandil /model use <name> # Set default for session
kandil /model default <name> # Set default for project
kandil /refactor <file> # Refactor code
kandil /test <path> # Generate tests
kandil /review <path> # Code review
# Advanced
kandil model discover <query> # Search for models
kandil model aliases # Show available aliases
kandil model update <name> # Update to latest version
```
---
## **13. Key Security Guarantees**
This enhanced plan ensures:
1. **No API Key Exposure**: Keys never appear in CLI, logs, or files
2. **Malicious Model Prevention**: Checksum validation + sandboxed loading
3. **Hardware Safety**: Block models that could cause OOM crashes
4. **Network Safety**: TLS validation, rate limiting, circuit breakers
5. **Supply Chain Security**: Curated profiles for known models
6. **User Consent**: Explicit confirmation for large downloads/incompatible models
---
**This enhanced plan transforms the original proposal into a production-ready system that prioritizes security, reliability, and user experience while maintaining the ambitious goal of universal model access.**