# 🤖 Model Loading Guide for VectorLite

This guide explains how to find and load real BERT models for your VectorLite project.

## 1. Finding the Right Model

### 🎯 **Best Models for VectorLite**

| Model | Dimensions | Size | Speed | Quality | Best For |
|-------|------------|------|-------|---------|----------|
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | 22MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | **Recommended** - Fast & efficient |
| `sentence-transformers/all-mpnet-base-v2` | 768 | 420MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | High quality similarity |
| `microsoft/DialoGPT-small` | 768 | 117MB | ⚡⚡ | ⭐⭐⭐ | Conversational text |
| `distilbert-base-uncased` | 768 | 250MB | ⚡⚡ | ⭐⭐⭐⭐ | General purpose |

### 🔍 **How to Research Models**

1. **Visit Hugging Face Model Hub**: https://huggingface.co/models
2. **Filter by**:
   - Library: `sentence-transformers`
   - Task: `sentence-similarity`
   - Language: `en`
3. **Check model cards** for:
   - Performance metrics (accuracy, speed)
   - Model size and memory requirements
   - Supported languages
   - Example usage

### 📊 **Model Selection Criteria**

- **Size**: Smaller = faster download, less memory
- **Speed**: Important for real-time search
- **Quality**: Better embeddings = better search results
- **Dimensions**: 384 is more efficient than 768
- **Specialization**: Some models are better for specific domains
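
If you want to confirm a candidate's output dimension before committing to a full download, its `config.json` (a few KB) is enough. A minimal sketch, assuming the file has already been fetched as in Step 2 below:

```rust
use serde_json::Value;

/// Reads the embedding width from a downloaded config.json.
/// `config_path` is whatever path hf-hub returned for "config.json".
fn embedding_dimension(config_path: &str) -> Result<u64, Box<dyn std::error::Error>> {
    let config: Value = serde_json::from_str(&std::fs::read_to_string(config_path)?)?;
    // BERT-style configs expose the width as "hidden_size" (384 for all-MiniLM-L6-v2)
    config["hidden_size"]
        .as_u64()
        .ok_or_else(|| "config.json has no hidden_size field".into())
}
```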

## 2. Loading Models in Code

### 🚀 **Step-by-Step Implementation**

#### Step 1: Add Dependencies

Add these to your `Cargo.toml`:

```toml
[dependencies]
candle-core = { version = "0.9.1", features = ["accelerate"] }
candle-nn = "0.9.1"
candle-transformers = "0.9.1"
tokenizers = "0.19"
hf-hub = "0.3"
tokio = { version = "1.0", features = ["full"] }
serde_json = "1.0"
```

#### Step 2: Download Model Files

```rust
use hf_hub::api::tokio::Api;
use std::path::PathBuf;

async fn download_model(model_id: &str) -> Result<(PathBuf, PathBuf, PathBuf), Box<dyn std::error::Error>> {
    let api = Api::new()?;
    let repo = api.model(model_id.to_string());

    // Download (or reuse from the local cache) the files needed to rebuild the model
    let config_path = repo.get("config.json").await?;
    let tokenizer_path = repo.get("tokenizer.json").await?;
    let model_path = repo.get("pytorch_model.bin").await?;

    println!("✅ Model files downloaded to: {:?}", config_path.parent().unwrap());
    Ok((config_path, tokenizer_path, model_path))
}
```
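
With the helper returning the downloaded paths, calling it from a Tokio entry point is straightforward (the model id below is just the recommended default):

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (config_path, tokenizer_path, model_path) =
        download_model("sentence-transformers/all-MiniLM-L6-v2").await?;
    println!("config: {config_path:?}");
    println!("tokenizer: {tokenizer_path:?}");
    println!("weights: {model_path:?}");
    Ok(())
}
```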

#### Step 3: Load Tokenizer

```rust
use tokenizers::Tokenizer;

fn load_tokenizer(tokenizer_path: &str) -> Result<Tokenizer, Box<dyn std::error::Error>> {
    let tokenizer = Tokenizer::from_file(tokenizer_path)?;
    Ok(tokenizer)
}
```

#### Step 4: Load Model Config

```rust
use candle_transformers::models::bert::Config;
use serde_json;

fn load_config(config_path: &str) -> Result<Config, Box<dyn std::error::Error>> {
    let config_str = std::fs::read_to_string(config_path)?;
    let config: Config = serde_json::from_str(&config_str)?;
    Ok(config)
}
```

#### Step 5: Load Model Weights

```rust
use candle_core::{Device, DType};
use candle_nn::VarBuilder;
use candle_transformers::models::bert::{BertModel, Config};

fn load_model(model_path: &str, config: &Config, device: &Device) -> Result<BertModel, Box<dyn std::error::Error>> {
    // VarBuilder lazily maps the PyTorch checkpoint into candle tensors
    let vb = VarBuilder::from_pth(model_path, DType::F32, device)?;
    let model = BertModel::load(vb, config)?; // load() takes the VarBuilder by value
    Ok(model)
}
```

#### Step 6: Generate Embeddings

```rust
use candle_core::{IndexOp, Tensor};

fn generate_embedding(
    text: &str,
    tokenizer: &Tokenizer,
    model: &BertModel,
    device: &Device,
) -> Result<Vec<f64>, Box<dyn std::error::Error>> {
    // 1. Tokenize
    let encoding = tokenizer.encode(text, true)?;
    let token_ids = encoding.get_ids();

    // 2. Convert to tensors and add a batch dimension
    let input_ids = Tensor::new(token_ids, device)?.unsqueeze(0)?;
    let token_type_ids = input_ids.zeros_like()?;

    // 3. Run through the model: output shape is (batch, seq_len, hidden_size)
    let hidden_states = model.forward(&input_ids, &token_type_ids, None)?;

    // 4. Extract the [CLS] token (first token of the first sequence)
    let cls_embedding = hidden_states.i((0, 0))?;

    // 5. Convert to Vec<f64>
    let embedding: Vec<f64> = cls_embedding
        .to_vec1::<f32>()?
        .into_iter()
        .map(|x| x as f64)
        .collect();

    // 6. L2 normalize
    let norm: f64 = embedding.iter().map(|x| x * x).sum::<f64>().sqrt();
    let normalized: Vec<f64> = if norm > 0.0 {
        embedding.iter().map(|x| x / norm).collect()
    } else {
        embedding
    };

    Ok(normalized)
}
```
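
Because the embeddings are L2-normalized, cosine similarity between two texts reduces to a plain dot product. A small usage sketch (the query and document strings are placeholders):

```rust
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    // Both vectors are already L2-normalized, so the dot product is the cosine
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn demo(tokenizer: &Tokenizer, model: &BertModel, device: &Device) -> Result<(), Box<dyn std::error::Error>> {
    let query = generate_embedding("How do I reset my password?", tokenizer, model, device)?;
    let doc = generate_embedding("Steps for changing your account password", tokenizer, model, device)?;
    println!("similarity: {:.4}", cosine_similarity(&query, &doc));
    Ok(())
}
```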

## 3. Complete Working Example

See `examples/real_model_loading.rs` for a complete implementation.
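
If you only need to see how the pieces fit together, here is a condensed sketch that wires up the helper functions from Section 2 (the bundled example remains the reference implementation):

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = Device::Cpu;

    // Steps 2-5: fetch the files and rebuild the model in memory
    let (config_path, tokenizer_path, model_path) =
        download_model("sentence-transformers/all-MiniLM-L6-v2").await?;
    let tokenizer = load_tokenizer(tokenizer_path.to_str().unwrap())?;
    let config = load_config(config_path.to_str().unwrap())?;
    let model = load_model(model_path.to_str().unwrap(), &config, &device)?;

    // Step 6: embed some text
    let embedding = generate_embedding("VectorLite keeps embeddings in memory", &tokenizer, &model, &device)?;
    println!("{}-dimensional embedding", embedding.len());
    Ok(())
}
```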

### 🏃 **Quick Start**

```bash
# Run the example
cargo run --example real_model_loading

# This will:
# 1. Download the model from Hugging Face
# 2. Load it into memory
# 3. Generate embeddings for sample texts
```

## 4. Production Considerations

### 💾 **Caching Models**

```rust
use std::path::{Path, PathBuf};

fn get_model_cache_path(model_id: &str) -> PathBuf {
    let cache_dir = std::env::var("HF_HOME")
        .unwrap_or_else(|_| ".cache/huggingface".to_string());
    Path::new(&cache_dir).join("hub").join(model_id)
}
```
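
Note that `hf-hub` already reuses previously downloaded files, so an explicit check mainly matters for offline startup. A small sketch building on the helper above (it assumes the simple `hub/<model_id>` layout used there and the `pytorch_model.bin` file name from Step 2):

```rust
fn weights_if_cached(model_id: &str) -> Option<PathBuf> {
    // Return the local weights path only if a previous run already downloaded it
    let path = get_model_cache_path(model_id).join("pytorch_model.bin");
    path.exists().then_some(path)
}
```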

### 🚀 **Performance Optimization**

1. **Use GPU if available**:
   ```rust
   let device = Device::cuda_if_available(0)?; // falls back to Device::Cpu when no GPU is present
   ```

2. **Batch processing**:
   ```rust
   fn generate_embeddings_batch(texts: &[String]) -> Result<Vec<Vec<f64>>, Box<dyn std::error::Error>> {
       // Process multiple texts in a single forward pass (see the sketch after this list)
       todo!()
   }
   ```

3. **Reduced-precision weights** (smaller memory footprint):
   ```rust
   // Load the weights in half precision (F16) to roughly halve memory use
   let weights = VarBuilder::from_pth(model_path, DType::F16, device)?;
   ```
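
A fuller sketch of the batched path, assuming the same `BertModel` as in Section 2; it pads every sequence to the longest in the batch and masks the padded positions:

```rust
use candle_core::{Device, Tensor};
use candle_transformers::models::bert::BertModel;
use tokenizers::{PaddingParams, Tokenizer};

fn generate_embeddings_batch(
    texts: &[String],
    tokenizer: &mut Tokenizer,
    model: &BertModel,
    device: &Device,
) -> Result<Vec<Vec<f64>>, Box<dyn std::error::Error>> {
    // Pad every sequence in the batch to the longest one
    tokenizer.with_padding(Some(PaddingParams::default()));
    let encodings = tokenizer.encode_batch(texts.to_vec(), true)?;

    // Stack token ids and attention masks into (batch, seq_len) tensors
    let ids = encodings
        .iter()
        .map(|e| Tensor::new(e.get_ids(), device))
        .collect::<Result<Vec<_>, _>>()?;
    let masks = encodings
        .iter()
        .map(|e| Tensor::new(e.get_attention_mask(), device))
        .collect::<Result<Vec<_>, _>>()?;
    let input_ids = Tensor::stack(&ids, 0)?;
    let attention_mask = Tensor::stack(&masks, 0)?;
    let token_type_ids = input_ids.zeros_like()?;

    // One forward pass for the whole batch: (batch, seq_len, hidden_size)
    let hidden = model.forward(&input_ids, &token_type_ids, Some(&attention_mask))?;

    // Take the [CLS] embedding (first token) of every row and convert to f64
    let cls = hidden.narrow(1, 0, 1)?.squeeze(1)?; // (batch, hidden_size)
    let rows: Vec<Vec<f32>> = cls.to_vec2()?;
    Ok(rows
        .into_iter()
        .map(|row| row.into_iter().map(|x| x as f64).collect())
        .collect())
}
```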

### 🔧 **Error Handling**

```rust
use thiserror::Error; // requires the `thiserror` crate in Cargo.toml

#[derive(Error, Debug)]
pub enum ModelError {
    #[error("Failed to download model: {0}")]
    DownloadFailed(String),
    #[error("Failed to load tokenizer: {0}")]
    TokenizerError(String),
    #[error("Failed to load model: {0}")]
    ModelLoadError(String),
    #[error("Inference failed: {0}")]
    InferenceError(String),
}
```
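
These variants can then wrap the lower-level errors at each step. A small sketch of the download step (again assuming `thiserror` is in `Cargo.toml`):

```rust
use hf_hub::api::tokio::Api;
use std::path::PathBuf;

// Wraps hf-hub errors into the crate-level ModelError
async fn download_weights(model_id: &str) -> Result<PathBuf, ModelError> {
    let api = Api::new().map_err(|e| ModelError::DownloadFailed(e.to_string()))?;
    api.model(model_id.to_string())
        .get("pytorch_model.bin")
        .await
        .map_err(|e| ModelError::DownloadFailed(e.to_string()))
}
```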

## 5. Next Steps

1. **Choose your model** based on your needs
2. **Implement the loading code** using the examples above
3. **Test with your data** to ensure quality
4. **Optimize for performance** based on your use case
5. **Add error handling** for production use

## 🎯 **Recommended Path**

For VectorLite, I recommend starting with `sentence-transformers/all-MiniLM-L6-v2` because:
- ✅ Small and fast
- ✅ Designed for similarity search
- ✅ 384 dimensions (efficient)
- ✅ Great performance on most tasks

This will give you a solid foundation that you can optimize later!