rag-module 0.6.7

# RAG Module Publishing and Usage Guide

## Overview

This document explains how to use the RAG module after converting from embedded Qdrant to Qdrant server mode, how to publish the module as a cargo crate, and how end users (backend developers) integrate it into their applications to serve their UI/frontend.

**Key Point**: When you publish your RAG module as a cargo crate, users will install it in their **backend servers**, NOT in their frontend/UI. The UI communicates with the backend via REST APIs that the backend developer creates using your RAG module.

---

## Table of Contents

1. [Architecture Overview](#architecture-overview)
2. [Converting to Qdrant Server Mode](#converting-to-qdrant-server-mode)
3. [Publishing Your RAG Module](#publishing-your-rag-module)
4. [How Backend Developers Use Your Published Crate](#how-backend-developers-use-your-published-crate)
5. [Do You Need to Expose Qdrant APIs?](#do-you-need-to-expose-qdrant-apis)
6. [Complete Usage Example](#complete-usage-example)
7. [Configuration Options](#configuration-options)
8. [Deployment Scenarios](#deployment-scenarios)
9. [FAQ](#faq)

---

## Architecture Overview

### Recommended Architecture

```
┌─────────────────┐         ┌──────────────────────┐         ┌─────────────────────┐
│                 │  HTTP   │                      │  Rust   │                     │
│  Frontend (UI)  ├────────►│  Backend Server      ├────────►│   RAG Module        │
│  (React/Vue/    │         │  (Actix/Axum/Rocket) │  API    │   (Your Crate)      │
│   Angular)      │         │                      │  Calls  │                     │
│                 │         │  - REST endpoints    │         │  - save_chat()      │
│  User's browser │         │  - Business logic    │         │  - get_history()    │
│                 │         │  - Auth middleware   │         │  - search()         │
└─────────────────┘         └──────────────────────┘         └──────────┬──────────┘
                                                                         │
                                                                         │ gRPC/HTTP
                                                                         │ (6334/6333)
                                                                         ↓
                                                               ┌─────────────────────┐
                                                               │  Qdrant Server      │
                                                               │  (EC2/Docker/K8s)   │
                                                               │                     │
                                                               │  - Vector storage   │
                                                               │  - Vector search    │
                                                               │  - Persistence      │
                                                               └─────────────────────┘
```

### Component Responsibilities

| Component | Responsibility | Technology | Owner |
|-----------|---------------|------------|-------|
| **Frontend** | User interface, forms, display | React/Vue/Angular | End User (UI Developer) |
| **Backend Server** | REST APIs, business logic, auth | Actix-web/Axum/Rocket | End User (Backend Developer) |
| **RAG Module** | Chat history, encryption, search | Rust (your crate) | **You** (Crate Publisher) |
| **Qdrant Server** | Vector storage and retrieval | Qdrant | Infrastructure Team |

### Key Insight

**You provide**: The RAG module as a cargo crate with high-level APIs for chat management
**Users provide**: Backend server with REST endpoints for their specific UI needs
**Infrastructure**: Qdrant server (deployed separately on EC2/Docker/K8s)

**You do NOT need to expose Qdrant APIs directly** - your RAG module abstracts all Qdrant interactions.

---

## Converting to Qdrant Server Mode

### Step 1: Implement QdrantClientVectorStore

Create a new struct that implements your `VectorStore` trait using `qdrant-client`:

```rust
// src/vector_store/qdrant_client_store.rs

use qdrant_client::qdrant::point_id::PointIdOptions;
use qdrant_client::qdrant::{
    CreateCollectionBuilder, Distance, PointId, PointStruct, SearchPointsBuilder,
    UpsertPointsBuilder, VectorParamsBuilder,
};
use qdrant_client::Qdrant;
use crate::vector_store::{VectorStore, Document, SearchOptions, SearchResult};
use anyhow::Result;

pub struct QdrantClientVectorStore {
    client: Qdrant,
    collection_name: String,
}

impl QdrantClientVectorStore {
    /// `url` must point at Qdrant's gRPC endpoint (port 6334 by default).
    pub async fn new(url: &str, collection_name: &str) -> Result<Self> {
        let client = Qdrant::from_url(url).build()?;

        Ok(Self {
            client,
            collection_name: collection_name.to_string(),
        })
    }
}

#[async_trait::async_trait]
impl VectorStore for QdrantClientVectorStore {
    async fn create_collection(&mut self, vector_size: usize) -> Result<()> {
        // The client accepts the builder directly.
        let collection_config = CreateCollectionBuilder::new(&self.collection_name)
            .vectors_config(VectorParamsBuilder::new(vector_size as u64, Distance::Cosine));

        self.client.create_collection(collection_config).await?;
        Ok(())
    }

    async fn upsert(&mut self, documents: Vec<Document>) -> Result<()> {
        // Assumes `doc.id` converts into a point ID and `doc.metadata` into a payload.
        let points: Vec<PointStruct> = documents.into_iter()
            .map(|doc| {
                PointStruct::new(
                    PointId::from(doc.id),
                    doc.vector,
                    doc.metadata,
                )
            })
            .collect();

        // `wait(true)` returns only after the points have been written.
        self.client
            .upsert_points(UpsertPointsBuilder::new(&self.collection_name, points).wait(true))
            .await?;
        Ok(())
    }

    async fn search(&self, query_vector: Vec<f32>, options: SearchOptions) -> Result<Vec<SearchResult>> {
        let search_request = SearchPointsBuilder::new(
            &self.collection_name,
            query_vector,
            options.limit.unwrap_or(10) as u64,
        )
        .with_payload(true); // Payloads are not returned unless requested

        let results = self.client.search_points(search_request).await?;

        let search_results = results.result.into_iter()
            .map(|point| {
                // Point IDs are either numeric or UUID strings; avoid panicking on a missing ID.
                let id = match point.id.and_then(|id| id.point_id_options) {
                    Some(PointIdOptions::Num(n)) => n.to_string(),
                    Some(PointIdOptions::Uuid(uuid)) => uuid,
                    None => String::new(),
                };
                SearchResult {
                    id,
                    score: point.score,
                    payload: point.payload,
                }
            })
            .collect();

        Ok(search_results)
    }

    // Implement other VectorStore trait methods...
}
```

### Step 2: Add Auto-Detection Logic

Update your RAG module initialization to auto-detect mode:

```rust
// src/lib.rs

use crate::vector_store::qdrant_embedded_store::QdrantEmbeddedVectorStore;
use crate::vector_store::qdrant_client_store::QdrantClientVectorStore;
use crate::vector_store::VectorStore;
use anyhow::Result;
use std::env;
use std::path::PathBuf;

impl RagModule {
    pub async fn new(data_dir: &str) -> Result<Self> {
        let vector_store: Box<dyn VectorStore> = if let Ok(qdrant_url) = env::var("QDRANT_URL") {
            println!("🌐 Using Qdrant server mode: {}", qdrant_url);
            Box::new(QdrantClientVectorStore::new(&qdrant_url, "chat_history").await?)
        } else {
            println!("💾 Using Qdrant embedded mode");
            Box::new(QdrantEmbeddedVectorStore::new(data_dir).await?)
        };

        Ok(Self {
            vector_store,
            data_dir: PathBuf::from(data_dir),
            // ... other fields
        })
    }
}
```
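The detection step is easier to unit-test when the decision is separated from reading the environment. The following is a minimal sketch, not the module's actual code: `QdrantMode` and `detect_mode` are hypothetical names, and treating an empty or whitespace-only `QDRANT_URL` as unset is an assumption you may not want.

```rust
use std::path::PathBuf;

/// Which Qdrant backend the module should use (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum QdrantMode {
    /// Remote Qdrant server reachable at `url`.
    Server { url: String },
    /// Embedded storage rooted at `data_dir`.
    Embedded { data_dir: PathBuf },
}

/// Pure decision function. `qdrant_url` is whatever
/// `std::env::var("QDRANT_URL").ok()` returned at startup.
pub fn detect_mode(qdrant_url: Option<String>, data_dir: &str) -> QdrantMode {
    match qdrant_url.map(|u| u.trim().to_string()) {
        // Assumption: a blank value counts as "not set".
        Some(url) if !url.is_empty() => QdrantMode::Server { url },
        _ => QdrantMode::Embedded {
            data_dir: PathBuf::from(data_dir),
        },
    }
}
```

`RagModule::new` could then call `detect_mode(env::var("QDRANT_URL").ok(), data_dir)` and construct the matching store, while tests pass the URL in directly without touching process-wide environment variables.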

### Step 3: Update Cargo.toml

Add Qdrant client dependencies:

```toml
[dependencies]
# Existing dependencies...
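qdrant = { version = "0.11", optional = true }         # Embedded mode
qdrant-client = { version = "1.11", optional = true }  # Server mode
tonic = "0.12"
prost = "0.13"

[features]
# Both backends are enabled by default so QDRANT_URL auto-detection works out of the box.
# A feature can only switch on a dependency that is marked `optional = true`.
default = ["embedded", "server"]
embedded = ["dep:qdrant"]
server = ["dep:qdrant-client"]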
```

### Step 4: Test Both Modes

```bash
# Test embedded mode (existing)
cargo test

# Test server mode
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant  # -d keeps the terminal free
export QDRANT_URL="http://localhost:6334"  # gRPC port used by the Rust qdrant-client
cargo test
```

**Result**: Same API, different backend. No code changes needed in examples or user code.
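
A common server-mode mistake is pointing `QDRANT_URL` at the REST port. The Rust `qdrant-client` crate speaks gRPC, which Qdrant serves on port 6334 by default, while 6333 is the REST port. Below is a small sketch of a startup check that could emit a warning; `explicit_port` and `looks_like_rest_port` are hypothetical helpers, and the parsing is deliberately simple rather than a full URL parser.

```rust
/// Extracts the explicit port from a URL such as `http://host:6334`.
/// Returns `None` when no port is present or it is not a valid number.
pub fn explicit_port(url: &str) -> Option<u16> {
    // Drop the scheme, then anything after the authority (path, query, fragment).
    let without_scheme = url.split_once("://").map_or(url, |(_, rest)| rest);
    let authority = without_scheme
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    let (_, port) = authority.rsplit_once(':')?;
    port.parse().ok()
}

/// True when the URL targets Qdrant's default REST port (6333)
/// rather than the default gRPC port (6334).
pub fn looks_like_rest_port(url: &str) -> bool {
    explicit_port(url) == Some(6333)
}
```

A deployment that remaps ports would make this check misleading, so it is better suited to a warning than a hard error.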

---

## Publishing Your RAG Module

### Step 1: Prepare Cargo.toml

```toml
[package]
name = "rag-module"
version = "0.1.0"
edition = "2021"
authors = ["Your Name <your.email@example.com>"]
description = "RAG (Retrieval-Augmented Generation) module with encrypted chat history and semantic search"
license = "MIT OR Apache-2.0"
repository = "https://github.com/yourusername/rag-module"
documentation = "https://docs.rs/rag-module"
keywords = ["rag", "vector-search", "chat", "encryption", "qdrant"]
categories = ["database", "cryptography", "web-programming"]
readme = "README.md"

[dependencies]
# Your dependencies...

[[example]]
name = "complete_chat_example"
path = "examples/complete_chat_example.rs"
required-features = []
```

### Step 2: Create README.md

````markdown
# RAG Module

Encrypted chat history and semantic search using Qdrant vector database.

## Features

- 🔐 AES-256-GCM encryption for chat data
- 🗄️ Dual collection architecture (chat + estate data)
- 🔍 Semantic search with BGE-M3 embeddings
- 🌐 Qdrant embedded mode OR server mode
- 📊 Session management with context IDs
- 🔑 macOS Keychain integration

## Quick Start

```rust
use rag_module::{RagModule, StartSessionOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize RAG module
    let mut rag = RagModule::new("./data").await?;
    rag.initialize().await?;

    // Start a chat session
    let session = rag.start_session(StartSessionOptions {
        user_id: "user123".to_string(),
        chat_title: Some("My Chat".to_string()),
        context_id: None,
    }).await?;

    // Add messages
    rag.add_prompt(&session.id, "Hello!", "user123").await?;
    rag.add_response(&session.id, "Hi there!", "user123").await?;

    // Retrieve history
    let history = rag.get_session_chat_history(&session.context_id).await?;

    Ok(())
}
```

## Qdrant Server Mode

```bash
# Set environment variable (gRPC port, 6334 by default)
export QDRANT_URL="http://your-qdrant-server:6334"

# Same code works!
cargo run --example complete_chat_example
```
````

### Step 3: Publish to crates.io

```bash
# Login to crates.io
cargo login

# Dry run to check for issues
cargo publish --dry-run

# Publish
cargo publish
```

---

## How Backend Developers Use Your Published Crate

### Step 1: Add Dependency

Backend developers add your crate to their `Cargo.toml`:

```toml
[dependencies]
rag-module = "0.1"
actix-web = "4"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
```

### Step 2: Initialize RAG Module in Backend

```rust
// backend/src/main.rs

use actix_web::{web, App, HttpServer, HttpResponse};
use rag_module::RagModule;
use std::sync::Arc;
use tokio::sync::Mutex;

// Application state
struct AppState {
    rag: Arc<Mutex<RagModule>>,
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Initialize RAG module (auto-detects Qdrant mode via QDRANT_URL env var)
    let mut rag = RagModule::new("./data")
        .await
        .expect("Failed to initialize RAG module");
    rag.initialize().await.expect("Failed to initialize collections");

    let app_state = web::Data::new(AppState {
        rag: Arc::new(Mutex::new(rag)),
    });

    // Start HTTP server with REST endpoints
    HttpServer::new(move || {
        App::new()
            .app_data(app_state.clone())
            .route("/api/chat/history/{user_id}", web::get().to(get_chat_history))
            .route("/api/chat/message", web::post().to(send_message))
            .route("/api/search", web::post().to(search_messages))
    })
    .bind("0.0.0.0:8080")?
    .run()
    .await
}
```

### Step 3: Create REST Endpoints

```rust
// backend/src/handlers.rs

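use actix_web::{web, HttpResponse};
use serde::{Deserialize, Serialize};
use serde_json::json; // needed for the json! error bodies below

use crate::AppState; // defined in main.rs

#[derive(Deserialize)]
struct SendMessageRequest {
    user_id: String,
    session_id: String,
    content: String,
    role: String, // "user" or "assistant"
}

// Request body for POST /api/search. The field types are an assumption;
// match them to the signature of `search_estate_data`.
#[derive(Deserialize)]
struct SearchRequest {
    text: String,
    limit: usize,
}

#[derive(Serialize)]
struct ChatHistoryResponse {
    user_id: String,
    contexts: serde_json::Value,
}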

// GET /api/chat/history/:user_id
async fn get_chat_history(
    user_id: web::Path<String>,
    state: web::Data<AppState>,
) -> HttpResponse {
    let rag = state.rag.lock().await;

    match rag.get_decrypted_chat_history(&user_id).await {
        Ok(history) => HttpResponse::Ok().json(history),
        Err(e) => HttpResponse::InternalServerError().json(json!({
            "error": e.to_string()
        })),
    }
}

// POST /api/chat/message
async fn send_message(
    req: web::Json<SendMessageRequest>,
    state: web::Data<AppState>,
) -> HttpResponse {
    let mut rag = state.rag.lock().await;

    let result = if req.role == "user" {
        rag.add_prompt(&req.session_id, &req.content, &req.user_id).await
    } else {
        rag.add_response(&req.session_id, &req.content, &req.user_id).await
    };

    match result {
        Ok(message_id) => HttpResponse::Ok().json(json!({
            "message_id": message_id,
            "status": "success"
        })),
        Err(e) => HttpResponse::InternalServerError().json(json!({
            "error": e.to_string()
        })),
    }
}

// POST /api/search
async fn search_messages(
    query: web::Json<SearchRequest>,
    state: web::Data<AppState>,
) -> HttpResponse {
    let rag = state.rag.lock().await;

    match rag.search_estate_data(&query.text, query.limit).await {
        Ok(results) => HttpResponse::Ok().json(results),
        Err(e) => HttpResponse::InternalServerError().json(json!({
            "error": e.to_string()
        })),
    }
}
```
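
Note that `send_message` treats every role other than `"user"` as an assistant response, so a typo such as `"usr"` would silently be stored as a response. One way to tighten this, sketched here with a hypothetical `Role` enum that is not part of the crate, is to parse the role up front and reject unknown values:

```rust
use std::str::FromStr;

/// Who authored a chat message (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

impl FromStr for Role {
    type Err = String;

    /// Accepts exactly "user" or "assistant"; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "user" => Ok(Role::User),
            "assistant" => Ok(Role::Assistant),
            other => Err(format!("unknown role: {other:?}")),
        }
    }
}
```

In the handler, `req.role.parse::<Role>()` would then decide between `add_prompt` and `add_response`, and the `Err` case could map to `HttpResponse::BadRequest()`.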

### Step 4: Frontend Calls Backend REST API

```typescript
// frontend/src/api/chat.ts

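// Minimal message shape used by the UI; see the API spec for the full field list.
interface Message {
  content: string;
}

export interface ChatHistoryResponse {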
  user_id: string;
  contexts: Record<string, {
    context_id: string;
    session_id: string;
    chat_title: string | null;
    conversations: Array<{
      prompt: Message;
      response: Message;
    }>;
  }>;
}

// Fetch chat history
export async function getChatHistory(userId: string): Promise<ChatHistoryResponse> {
  const response = await fetch(`http://localhost:8080/api/chat/history/${userId}`);
  return response.json();
}

// Send message
export async function sendMessage(
  sessionId: string,
  userId: string,
  content: string,
  role: 'user' | 'assistant'
) {
  const response = await fetch('http://localhost:8080/api/chat/message', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ session_id: sessionId, user_id: userId, content, role }),
  });
  return response.json();
}
```

### Step 5: React Component Example

```tsx
// frontend/src/components/ChatHistory.tsx

import React, { useEffect, useState } from 'react';
import { getChatHistory, type ChatHistoryResponse } from '../api/chat';

export function ChatHistory({ userId }: { userId: string }) {
  const [history, setHistory] = useState<ChatHistoryResponse | null>(null);

  useEffect(() => {
    getChatHistory(userId).then(setHistory);
  }, [userId]);

  if (!history) return <div>Loading...</div>;

  return (
    <div className="chat-history">
      {Object.values(history.contexts).map((context) => (
        <div key={context.context_id} className="conversation">
          <h3>{context.chat_title || 'Untitled Chat'}</h3>
          {context.conversations.map((conv, idx) => (
            <div key={idx} className="conversation-pair">
              <div className="user-message">{conv.prompt.content}</div>
              <div className="assistant-message">{conv.response.content}</div>
            </div>
          ))}
        </div>
      ))}
    </div>
  );
}
```

---

## Do You Need to Expose Qdrant APIs?

### Short Answer: **NO**

Your RAG module already provides high-level abstractions that handle all Qdrant interactions. Backend developers never need to interact with Qdrant directly.

### Why Not?

| If You Expose Qdrant APIs | If You Use RAG Module APIs |
|---------------------------|---------------------------|
| ❌ UI developers need to understand vectors | ✅ UI developers work with chat messages |
| ❌ Need to implement encryption in UI | ✅ Encryption handled by RAG module |
| ❌ Complex Qdrant queries in frontend | ✅ Simple REST calls to backend |
| ❌ Security risk (direct DB access) | ✅ Backend controls access |
| ❌ Tight coupling to Qdrant | ✅ Can swap Qdrant for other DB |

### What Your RAG Module Abstracts

```
┌─────────────────────────────────────────────────────────────┐
│  RAG Module High-Level APIs                                 │
│  (What backend developers use)                              │
├─────────────────────────────────────────────────────────────┤
│  - start_session(options)                                   │
│  - add_prompt(session_id, content, user_id)                 │
│  - add_response(session_id, content, user_id)               │
│  - get_session_chat_history(context_id)                     │
│  - get_decrypted_chat_history(user_id)                      │
│  - search_estate_data(query, limit)                         │
│  - get_query_response_pairs(context_id, limit)              │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│  Internal RAG Module Logic                                  │
│  (Hidden from users)                                        │
├─────────────────────────────────────────────────────────────┤
│  - Encryption/Decryption (AES-256-GCM)                      │
│  - Keychain integration                                     │
│  - Vector embedding generation                              │
│  - Document ID management                                   │
│  - Metadata formatting                                      │
│  - Collection routing (chat vs estate)                      │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│  Qdrant Low-Level APIs                                      │
│  (Completely abstracted away)                               │
├─────────────────────────────────────────────────────────────┤
│  - create_collection(vector_config)                         │
│  - upsert_points(collection, points)                        │
│  - search_points(collection, query_vector, limit)           │
│  - delete_points(collection, filter)                        │
│  - scroll(collection, filter, limit)                        │
└─────────────────────────────────────────────────────────────┘
```

### What Backend Developers Get

When they import your crate, they get:

✅ **High-level chat APIs** - `add_prompt()`, `get_session_chat_history()`
✅ **Automatic encryption** - Transparent to the user
✅ **Session management** - Context IDs, chat titles
✅ **Semantic search** - Just pass text, get results
✅ **Mode flexibility** - Works with embedded or server Qdrant
✅ **Type safety** - Rust structs and enums

They do NOT need to know:
- How vectors are stored
- How encryption works
- Qdrant query syntax
- Collection schemas
- Vector dimensions

---

## Complete Usage Example

### Infrastructure Setup

```bash
# 1. Deploy Qdrant server on EC2 (see QDRANT_EMBEDDED_VS_SERVER.md)
docker run -p 6333:6333 -p 6334:6334 \
  -v /mnt/qdrant-data:/qdrant/storage \
  qdrant/qdrant

# 2. Note the EC2 public IP
EC2_IP="54.123.45.67"
```

### Backend Server Setup

```toml
# backend/Cargo.toml
[dependencies]
rag-module = "0.1"
actix-web = "4"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
env_logger = "0.10"
dotenv = "0.15"
```

```bash
# backend/.env
QDRANT_URL=http://54.123.45.67:6334
RUST_LOG=info
```

```rust
// backend/src/main.rs
use actix_web::{middleware::Logger, web, App, HttpResponse, HttpServer};
use rag_module::RagModule;
use serde_json::json;
use std::sync::Arc;
use tokio::sync::Mutex;

struct AppState {
    rag: Arc<Mutex<RagModule>>,
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Load .env first so RUST_LOG and QDRANT_URL are visible to everything below
    dotenv::dotenv().ok();
    env_logger::init();

    // RAG module auto-detects Qdrant server from QDRANT_URL env var
    let mut rag = RagModule::new("./backend-data")
        .await
        .expect("Failed to initialize RAG");
    rag.initialize().await.expect("Failed to init collections");

    let state = web::Data::new(AppState {
        rag: Arc::new(Mutex::new(rag)),
    });

    println!("🚀 Backend server running on http://0.0.0.0:8080");

    HttpServer::new(move || {
        App::new()
            .app_data(state.clone())
            .wrap(Logger::default())
            .route("/health", web::get().to(health_check))
            .route("/api/chat/history/{user_id}", web::get().to(get_chat_history))
            .route("/api/chat/message", web::post().to(send_message))
            .route("/api/chat/session", web::post().to(start_session))
            .route("/api/search", web::post().to(search))
    })
    .bind("0.0.0.0:8080")?
    .run()
    .await
}

async fn health_check() -> HttpResponse {
    HttpResponse::Ok().json(json!({ "status": "healthy" }))
}

// ... implement handlers ...
```

### Frontend Setup

```bash
# Run inside frontend/ (adds the packages to package.json)
npm install axios react-query
```

```typescript
// frontend/src/api/client.ts
import axios from 'axios';

const API_BASE = process.env.REACT_APP_API_URL || 'http://localhost:8080';

export const chatApi = {
  getHistory: (userId: string) =>
    axios.get(`${API_BASE}/api/chat/history/${userId}`).then(r => r.data),

  sendMessage: (data: { session_id: string; user_id: string; content: string; role: string }) =>
    axios.post(`${API_BASE}/api/chat/message`, data).then(r => r.data),

  startSession: (data: { user_id: string; chat_title?: string }) =>
    axios.post(`${API_BASE}/api/chat/session`, data).then(r => r.data),
};
```

```tsx
// frontend/src/App.tsx
import { useQuery, useMutation } from 'react-query';
import { chatApi } from './api/client';

function App() {
  const userId = 'user123';

  // Fetch chat history
  const { data: history } = useQuery(['chatHistory', userId], () =>
    chatApi.getHistory(userId)
  );

  // Send message mutation
  const sendMessage = useMutation(chatApi.sendMessage);

  return (
    <div className="App">
      <h1>Chat Application</h1>
      {history && (
        <div>
          {Object.values(history.contexts).map((ctx: any) => (
            <div key={ctx.context_id}>
              <h2>{ctx.chat_title}</h2>
              {ctx.conversations.map((conv: any, i: number) => (
                <div key={i}>
                  <p><strong>You:</strong> {conv.prompt.content}</p>
                  <p><strong>AI:</strong> {conv.response.content}</p>
                </div>
              ))}
            </div>
          ))}
        </div>
      )}
    </div>
  );
}
```

### Running the Complete Stack

```bash
# Terminal 1: Qdrant server (EC2 or local Docker)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Terminal 2: Backend server
cd backend
export QDRANT_URL="http://localhost:6334"  # gRPC port used by the Rust client
cargo run --release

# Terminal 3: Frontend
cd frontend
npm start
```

**Data Flow**:
1. User types message in React UI
2. React calls `POST /api/chat/message` on backend
3. Backend calls `rag.add_prompt()` from your crate
4. Your crate encrypts message and stores in Qdrant server
5. Backend returns success to React
6. React refreshes chat history

---

## Configuration Options

Backend developers can configure your RAG module in multiple ways:

### Option 1: Environment Variable (Recommended)

```bash
# .env file
QDRANT_URL=http://54.123.45.67:6334
ENCRYPTION_KEY_ID=my-app-encryption-key
```

```rust
// Automatically detected
let rag = RagModule::new("./data").await?;
```

### Option 2: Programmatic Configuration

```rust
// Explicit server mode
let rag = RagModule::with_qdrant_server(
    "./data",
    "http://54.123.45.67:6334" // gRPC port
).await?;

// Explicit embedded mode
let rag = RagModule::with_qdrant_embedded("./data").await?;
```

### Option 3: Config File

```toml
# rag-config.toml
[qdrant]
mode = "server"
url = "http://54.123.45.67:6334"

[encryption]
key_id = "my-app-key"
keychain_service = "com.mycompany.myapp"

[collections]
chat_collection = "chat_history"
estate_collection = "aws_estate"
```

```rust
let config = RagConfig::from_file("rag-config.toml")?;
let rag = RagModule::from_config(config).await?;
```

---

## Deployment Scenarios

### Scenario 1: Small Startup (Single Server)

```
┌─────────────────────────────────────────────────────────┐
│  EC2 Instance (t3.large)                                │
│                                                         │
│  ┌──────────────┐   ┌──────────────┐   ┌────────────┐ │
│  │  Backend     │   │  Qdrant      │   │  Nginx     │ │
│  │  (Port 8080) │──▶│  (Port 6333) │   │  (Port 80) │ │
│  └──────────────┘   └──────────────┘   └────────────┘ │
│                                              ▲          │
└──────────────────────────────────────────────┼──────────┘
                                               │
                                    ┌──────────┴──────────┐
                                    │  Users' Browsers    │
                                    │  (React App)        │
                                    └─────────────────────┘
```

**Setup**:
```bash
# On EC2
docker-compose up -d        # Starts Qdrant
./target/release/backend    # Starts backend server (built with `cargo build --release`)
nginx                       # Reverse proxy
```

**Cost**: ~$50-100/month

### Scenario 2: Growing Company (Separate Services)

```
                    ┌──────────────┐
                    │  CloudFront  │
                    │  (CDN)       │
                    └──────┬───────┘
                           │
                    ┌──────┴───────┐
                    │  S3          │
                    │  (React App) │
                    └──────────────┘
                           │
                           │ API calls
                           ▼
┌────────────────────────────────────────────────────────┐
│  ECS/EKS Cluster                                       │
│                                                        │
│  ┌──────────────┐   ┌──────────────┐   ┌───────────┐ │
│  │  Backend     │   │  Backend     │   │  Backend  │ │
│  │  Replica 1   │   │  Replica 2   │   │  Replica 3│ │
│  └──────┬───────┘   └──────┬───────┘   └─────┬─────┘ │
│         │                  │                  │       │
└─────────┼──────────────────┼──────────────────┼───────┘
          │                  │                  │
          └──────────────────┴──────────────────┘
                             │
                      ┌──────┴──────┐
                      │  Qdrant     │
                      │  EC2/ECS    │
                      │  (Dedicated)│
                      └─────────────┘
```

**Setup**:
```yaml
# docker-compose.yml
version: '3.8'
services:
  backend:
    image: mycompany/backend:latest
    environment:
      QDRANT_URL: http://qdrant.internal:6334
    deploy:
      replicas: 3
```

**Cost**: ~$500-1000/month

### Scenario 3: Enterprise (High Availability)

```
                    ┌──────────────┐
                    │  CloudFront  │
                    └──────┬───────┘
                           │
                    ┌──────┴───────┐
                    │  API Gateway │
                    │  (Rate limit)│
                    └──────┬───────┘
                           │
        ┌──────────────────┴──────────────────┐
        │  Load Balancer (ALB)                │
        └──────┬────────────────────┬─────────┘
               │                    │
    ┌──────────┴────────┐  ┌────────┴──────────┐
    │  ECS/EKS          │  │  ECS/EKS          │
    │  Cluster (US-East)│  │  Cluster (US-West)│
    │                   │  │                   │
    │  Backend Replicas │  │  Backend Replicas │
    └──────────┬────────┘  └─────────┬─────────┘
               │                     │
    ┌──────────┴─────────┐  ┌────────┴─────────┐
    │  Qdrant Cluster    │  │  Qdrant Cluster  │
    │  (Multi-node)      │  │  (Multi-node)    │
    │  - Node 1          │  │  - Node 1        │
    │  - Node 2          │  │  - Node 2        │
    │  - Node 3          │  │  - Node 3        │
    └────────────────────┘  └──────────────────┘
```

**Cost**: $5,000-20,000/month

---

## FAQ

### Q1: Do I need to modify my RAG module code when switching to Qdrant server?

**A**: No, if you implement the auto-detection pattern shown above. Your examples like `complete_chat_example.rs` will work with both modes without any code changes.

### Q2: Can users run my RAG module in embedded mode for testing?

**A**: Yes. If they don't set the `QDRANT_URL` environment variable, the module uses embedded mode automatically.

```bash
# Embedded mode (for local development/testing)
cargo run --example complete_chat_example

# Server mode (for production)
export QDRANT_URL="http://production-qdrant:6334"
cargo run --example complete_chat_example
```

### Q3: Do frontend developers need to install Rust?

**A**: No. The frontend is JavaScript/TypeScript calling REST APIs. Only backend developers need Rust.

### Q4: Can the backend be written in Python instead of Rust?

**A**: Not directly with your Rust crate. You would need to either:
1. Create Python bindings using PyO3
2. Wrap your Rust backend in a REST API and have Python call it
3. Rewrite the RAG module in Python (not recommended)

### Q5: How do I handle authentication?

**A**: Backend developers add auth middleware in their server:

```rust
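use actix_web::{
    body::MessageBody,
    dev::{ServiceRequest, ServiceResponse},
    error::ErrorUnauthorized,
    middleware::{from_fn, Next}, // available since actix-web 4.9
    web, App, Error, HttpServer,
};

async fn auth_middleware(
    req: ServiceRequest,
    next: Next<impl MessageBody>,
) -> Result<ServiceResponse<impl MessageBody>, Error> {
    // Reject requests that carry no readable Authorization header
    let token = req
        .headers()
        .get("Authorization")
        .and_then(|value| value.to_str().ok())
        .ok_or_else(|| ErrorUnauthorized("missing Authorization header"))?;
    // `verify_jwt` is a placeholder for the backend's own token validation
    verify_jwt(token)?;
    next.call(req).await
}

HttpServer::new(|| {
    App::new()
        .wrap(from_fn(auth_middleware))
        .route("/api/chat/history/{user_id}", web::get().to(get_chat_history))
})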

This is NOT part of your RAG module - it's the backend developer's responsibility.

### Q6: What if users want to use a different vector database (not Qdrant)?

**A**: Thanks to your `VectorStore` trait abstraction, they can implement:

```rust
impl VectorStore for PineconeVectorStore { ... }
impl VectorStore for WeaviateVectorStore { ... }
impl VectorStore for PostgresVectorStore { ... }
```

And pass it to your RAG module if you expose a constructor that accepts `Box<dyn VectorStore>`.
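
As an illustration of what such an implementation involves, here is a minimal in-memory store. It is a sketch under stated assumptions: the `VectorStore` trait below is a simplified, synchronous stand-in (the crate's real trait is async and has more methods), and `InMemoryVectorStore` is a hypothetical type useful mainly for tests.

```rust
/// Simplified, synchronous stand-in for the crate's `VectorStore` trait.
pub trait VectorStore {
    fn upsert(&mut self, id: String, vector: Vec<f32>);
    fn search(&self, query: &[f32], limit: usize) -> Vec<(String, f32)>;
}

/// Keeps all points in a Vec and scans them linearly on search.
#[derive(Default)]
pub struct InMemoryVectorStore {
    points: Vec<(String, Vec<f32>)>,
}

/// Cosine similarity; returns 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl VectorStore for InMemoryVectorStore {
    fn upsert(&mut self, id: String, vector: Vec<f32>) {
        // Replace an existing point with the same id, otherwise append.
        if let Some(pos) = self.points.iter().position(|(existing, _)| *existing == id) {
            self.points[pos].1 = vector;
        } else {
            self.points.push((id, vector));
        }
    }

    fn search(&self, query: &[f32], limit: usize) -> Vec<(String, f32)> {
        let mut scored: Vec<(String, f32)> = self
            .points
            .iter()
            .map(|(id, v)| (id.clone(), cosine(query, v)))
            .collect();
        // Highest similarity first.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(limit);
        scored
    }
}
```

Because callers only depend on the trait, swapping this for a Qdrant-, Pinecone-, or Postgres-backed store would not change the code above the storage layer.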

### Q7: How do I version the API when publishing updates?

**A**: Use semantic versioning:

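- `0.1.0` → `0.1.1`: Bug fixes and backwards-compatible additions
- `0.1.0` → `0.2.0`: Breaking changes while the crate is pre-1.0 (Cargo treats a `0.x` minor bump as incompatible)
- `1.0.0` onward: patch for bug fixes, minor for backwards-compatible features, major for breaking changes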

```toml
# Users specify version constraints
[dependencies]
rag-module = "0.1"  # Allow 0.1.x updates
# or
rag-module = "=0.1.5"  # Exact version (a bare "0.1.5" means >=0.1.5, <0.2.0)
```
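
Cargo reads a bare version string as a caret requirement, which allows updates up to the next change in the left-most non-zero component. The sketch below models that rule for full `major.minor.patch` triples only; `caret_matches` is an illustrative helper, not a Cargo API, and it ignores pre-release tags and partial versions such as `"0.1"`.

```rust
/// A full `major.minor.patch` version.
type Version = (u64, u64, u64);

/// Returns true when `candidate` satisfies the caret requirement `^req`,
/// which is what a bare string like "0.1.5" means in Cargo.toml.
pub fn caret_matches(req: Version, candidate: Version) -> bool {
    // Tuples compare lexicographically, which matches version ordering here.
    if candidate < req {
        return false;
    }
    let (major, minor, patch) = req;
    // The exclusive upper bound bumps the left-most non-zero component.
    let upper = if major > 0 {
        (major + 1, 0, 0)
    } else if minor > 0 {
        (0, minor + 1, 0)
    } else {
        (0, 0, patch + 1)
    };
    candidate < upper
}
```

For example, a user who depends on `"0.1.5"` will pick up `0.1.9` automatically but never `0.2.0`, so breaking changes before 1.0 belong in a minor bump.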

### Q8: Should I expose the encryption keys in my API?

**A**: No. Your RAG module handles encryption internally using macOS Keychain. Users don't need to manage keys directly. For cross-platform support, you might add a configuration option:

```rust
pub struct EncryptionConfig {
    pub key_id: String,
    pub keychain_service: Option<String>,  // macOS only
    pub key_file_path: Option<PathBuf>,    // Linux/Windows
}
```
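
One possible way to consume such a configuration is to resolve it to a single key source at startup. This is only a sketch: `KeySource` and `resolve_key_source` are hypothetical names, the struct is repeated so the example is self-contained, and the precedence shown (Keychain first on macOS, key file otherwise) is an assumption rather than the module's behavior.

```rust
use std::path::PathBuf;

pub struct EncryptionConfig {
    pub key_id: String,
    pub keychain_service: Option<String>, // macOS only
    pub key_file_path: Option<PathBuf>,   // Linux/Windows
}

/// Where the encryption key is loaded from (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum KeySource {
    Keychain { service: String },
    File { path: PathBuf },
}

/// Picks a key source for the current platform.
/// `is_macos` is passed in so the logic can be tested on any OS.
pub fn resolve_key_source(cfg: &EncryptionConfig, is_macos: bool) -> Result<KeySource, String> {
    match (is_macos, &cfg.keychain_service, &cfg.key_file_path) {
        // Assumption: prefer the Keychain whenever it is available and configured.
        (true, Some(service), _) => Ok(KeySource::Keychain {
            service: service.clone(),
        }),
        // Otherwise fall back to a key file if one is configured.
        (_, _, Some(path)) => Ok(KeySource::File { path: path.clone() }),
        _ => Err(format!(
            "no usable key source configured for key '{}'",
            cfg.key_id
        )),
    }
}
```

In production code the flag would typically come from `cfg!(target_os = "macos")`, so the choice is fixed at compile time.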

### Q9: What about rate limiting and quotas?

**A**: That's handled by the backend developer, not your RAG module:

```rust
use actix_governor::{Governor, GovernorConfigBuilder};

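// `per_second(10)` replenishes one request every 10 seconds per client;
// `burst_size(20)` allows up to 20 requests in a burst.
let governor_conf = GovernorConfigBuilder::default()
    .per_second(10)
    .burst_size(20)
    .finish()
    .unwrap();

// `move` is required because the factory closure must own `governor_conf`
HttpServer::new(move || {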
    App::new()
        .wrap(Governor::new(&governor_conf))
        .route("/api/chat/message", web::post().to(send_message))
})
```

### Q10: How do I test the published crate before releasing?

**A**: Use local path dependencies:

```toml
# Backend's Cargo.toml for testing
[dependencies]
rag-module = { path = "../rag-module" }
```

Or publish to a test registry:
```bash
cargo publish --registry my-test-registry
```

---

## Summary

### What You Publish
- **Cargo crate** with high-level RAG APIs
- **Documentation** (README, docs.rs, examples)
- **Dual mode support** (embedded/server)
- **Encryption** (built-in, transparent)

### What Users Provide
- **Backend server** (Actix/Axum/Rocket)
- **REST endpoints** for their specific UI needs
- **Authentication** and authorization logic
- **Business logic** around chat features

### What's Separate
- **Qdrant server** (deployed on EC2/Docker/K8s)
- **Frontend** (React/Vue/Angular calling backend REST APIs)

### Key Takeaway
**You do NOT expose Qdrant APIs**. Your RAG module is the abstraction layer that backend developers use to build REST APIs for their frontends. The architecture is:

```
UI → Backend (REST) → Your RAG Module → Qdrant
```

NOT:

```
UI → Qdrant (❌ Never do this)
```

---

## Next Steps

1. ✅ Implement `QdrantClientVectorStore` with auto-detection
2. ✅ Test both modes thoroughly
3. ✅ Write comprehensive README
4. ✅ Add examples for common use cases
5. ✅ Publish to crates.io
6. ✅ Create example backend server (optional but helpful)
7. ✅ Write migration guide for embedded → server conversion

---

## Additional Resources

- See `docs/QDRANT_EMBEDDED_VS_SERVER.md` for detailed comparison
- See `docs/UI_HANDOFF_get_decrypted_chat_history.md` for API specs
- See `examples/complete_chat_example.rs` for usage patterns
- See Qdrant docs: https://qdrant.tech/documentation/
- See crates.io publishing guide: https://doc.rust-lang.org/cargo/reference/publishing.html

---

**Document Version**: 1.0
**Last Updated**: 2025-11-03
**Author**: RAG Module Development Team