# Sanctum Deployment Guide
This guide covers deployment scenarios for Sanctum's production-ready Qdrant adapter across various environments.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Local Development](#local-development)
- [Docker Compose](#docker-compose)
- [Kubernetes](#kubernetes)
- [Cloud Deployments](#cloud-deployments)
- [Production Best Practices](#production-best-practices)
- [Monitoring](#monitoring)
- [Backup and Recovery](#backup-and-recovery)
- [Troubleshooting](#troubleshooting)
- [Cost Optimization](#cost-optimization)
## Prerequisites
### For Qdrant Deployment
- Docker 20.10+ (for Docker deployments)
- Kubernetes 1.21+ (for K8s deployments)
- Minimum 2GB RAM for Qdrant
- Sufficient disk space (estimate ~6 KB per unquantized 1536-dimension float32 vector, or ~1.5 KB with int8 quantization)
### Resource Estimation
| Vectors | Dimensions | Storage | RAM |
|---------|------------|---------|-----|
| 10,000 | 1536 | ~15 MB | 512 MB |
| 100,000 | 1536 | ~150 MB | 1 GB |
| 1,000,000 | 1536 | ~1.5 GB | 4 GB |
| 10,000,000 | 1536 | ~15 GB | 16 GB |
## Local Development
### Using InMemory Adapter
The simplest option for development - no infrastructure needed:
```yaml
# config.yml
sanctum:
  enabled: true
  adapter_type: "in_memory"
```
```rust
use paladin::infrastructure::adapters::sanctum::InMemorySanctum;

#[tokio::main]
async fn main() {
    let sanctum = InMemorySanctum::new();
    // Ready to use immediately
}
```
### Local Qdrant Instance
For testing Qdrant locally:
```bash
# Pull and run Qdrant
docker run -p 6333:6333 -p 6334:6334 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage" \
  qdrant/qdrant:latest
```
```yaml
# config.yml
sanctum:
  enabled: true
  adapter_type: "qdrant"
  qdrant:
    url: "http://localhost:6334"
    collection_name: "dev_memories"
    vector_dimension: 1536
```
Access the Qdrant dashboard at http://localhost:6333/dashboard
## Docker Compose
### Basic Setup
```yaml
# docker-compose.yml
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:v1.7.4
    container_name: paladin-qdrant
    ports:
      - "6333:6333"  # HTTP API
      - "6334:6334"  # gRPC API
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      QDRANT__SERVICE__HTTP_PORT: 6333
      QDRANT__SERVICE__GRPC_PORT: 6334
    restart: unless-stopped

  paladin:
    build: .
    container_name: paladin-app
    depends_on:
      - qdrant
    environment:
      APP_SANCTUM_ENABLED: "true"
      APP_SANCTUM_ADAPTER_TYPE: "qdrant"
      APP_SANCTUM_QDRANT_URL: "http://qdrant:6334"
      APP_SANCTUM_QDRANT_COLLECTION_NAME: "paladin_memories"
      APP_SANCTUM_QDRANT_VECTOR_DIMENSION: "1536"
    volumes:
      - ./config.yml:/app/config.yml
    restart: unless-stopped

volumes:
  qdrant_data:
    driver: local
```
Start services:
```bash
docker-compose up -d
```
Verify Qdrant health:
```bash
curl http://localhost:6333/healthz
```
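The `APP_SANCTUM_*` variables in the Compose file mirror the `sanctum.qdrant` keys in `config.yml`. This guide does not show how Paladin merges the two, so the following is only a minimal sketch of reading and validating those variables. The `QdrantSettings` struct and the `from_lookup` / `from_env` functions are hypothetical names, not part of Paladin's API.

```rust
/// Hypothetical settings struct mirroring the `sanctum.qdrant` keys in config.yml.
#[derive(Debug, PartialEq)]
pub struct QdrantSettings {
    pub url: String,
    pub collection_name: String,
    pub vector_dimension: u64,
}

/// Builds settings from a key lookup (for example the process environment).
/// Returns an error naming the first missing or malformed variable.
pub fn from_lookup<F>(get: F) -> Result<QdrantSettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    let required = |key: &str| get(key).ok_or_else(|| format!("missing {key}"));
    let url = required("APP_SANCTUM_QDRANT_URL")?;
    let collection_name = required("APP_SANCTUM_QDRANT_COLLECTION_NAME")?;
    let vector_dimension = required("APP_SANCTUM_QDRANT_VECTOR_DIMENSION")?
        .parse::<u64>()
        .map_err(|e| format!("invalid APP_SANCTUM_QDRANT_VECTOR_DIMENSION: {e}"))?;
    Ok(QdrantSettings { url, collection_name, vector_dimension })
}

/// Convenience wrapper that reads from the real process environment.
pub fn from_env() -> Result<QdrantSettings, String> {
    from_lookup(|key| std::env::var(key).ok())
}
```

Taking the lookup as a closure keeps the parsing logic testable without mutating the process environment. Failing fast on a malformed `VECTOR_DIMENSION` surfaces a misconfigured container at startup rather than at the first store operation.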
### Production Docker Compose
Enhanced with resource limits and monitoring:
```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:v1.7.4
    container_name: paladin-qdrant-prod
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage
      - ./qdrant-config.yaml:/qdrant/config/production.yaml
    environment:
      QDRANT__SERVICE__HTTP_PORT: 6333
      QDRANT__SERVICE__GRPC_PORT: 6334
      QDRANT__LOG_LEVEL: INFO
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G
    restart: always
    healthcheck:
      # Assumes curl is available in the container; the stock qdrant/qdrant image may not ship it
      test: ["CMD", "curl", "-f", "http://localhost:6333/healthz"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  paladin:
    build:
      context: .
      dockerfile: Dockerfile.prod
    container_name: paladin-app-prod
    depends_on:
      qdrant:
        condition: service_healthy
    environment:
      APP_SANCTUM_ENABLED: "true"
      APP_SANCTUM_ADAPTER_TYPE: "qdrant"
      APP_SANCTUM_QDRANT_URL: "http://qdrant:6334"
      APP_SANCTUM_QDRANT_COLLECTION_NAME: "production_memories"
      APP_SANCTUM_QDRANT_VECTOR_DIMENSION: "1536"
      RUST_LOG: "info,paladin=debug"
    volumes:
      - ./config.prod.yml:/app/config.yml:ro
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  qdrant_data:
    driver: local
```
## Kubernetes
### Qdrant StatefulSet
```yaml
# k8s/qdrant-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: paladin
spec:
  selector:
    app: qdrant
  ports:
    - name: http
      port: 6333
      targetPort: 6333
    - name: grpc
      port: 6334
      targetPort: 6334
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: paladin
spec:
  serviceName: qdrant
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.7.4
          ports:
            - containerPort: 6333
              name: http
            - containerPort: 6334
              name: grpc
          env:
            - name: QDRANT__SERVICE__HTTP_PORT
              value: "6333"
            - name: QDRANT__SERVICE__GRPC_PORT
              value: "6334"
            - name: QDRANT__LOG_LEVEL
              value: "INFO"
          volumeMounts:
            - name: qdrant-storage
              mountPath: /qdrant/storage
          resources:
            requests:
              memory: "2Gi"
              cpu: "500m"
            limits:
              memory: "8Gi"
              cpu: "4000m"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 6333
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /readyz
              port: 6333
            initialDelaySeconds: 10
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: qdrant-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "standard"
        resources:
          requests:
            storage: 50Gi
```
### Paladin Deployment
```yaml
# k8s/paladin-deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: paladin-config
  namespace: paladin
data:
  config.yml: |
    sanctum:
      enabled: true
      adapter_type: "qdrant"
      qdrant:
        url: "http://qdrant:6334"
        collection_name: "k8s_memories"
        vector_dimension: 1536
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: paladin
  namespace: paladin
spec:
  replicas: 3
  selector:
    matchLabels:
      app: paladin
  template:
    metadata:
      labels:
        app: paladin
    spec:
      containers:
        - name: paladin
          image: paladin:latest
          ports:
            - containerPort: 8080
          env:
            - name: APP_SANCTUM_ENABLED
              value: "true"
            - name: APP_SANCTUM_ADAPTER_TYPE
              value: "qdrant"
            - name: APP_SANCTUM_QDRANT_URL
              value: "http://qdrant:6334"
            - name: APP_SANCTUM_QDRANT_COLLECTION_NAME
              value: "k8s_memories"
            - name: APP_SANCTUM_QDRANT_VECTOR_DIMENSION
              value: "1536"
          volumeMounts:
            - name: config
              mountPath: /app/config.yml
              subPath: config.yml
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: paladin-config
```
Deploy to Kubernetes:
```bash
# Create namespace
kubectl create namespace paladin
# Apply configurations
kubectl apply -f k8s/qdrant-statefulset.yaml
kubectl apply -f k8s/paladin-deployment.yaml
# Verify deployment
kubectl get pods -n paladin
kubectl logs -n paladin -l app=paladin
```
## Cloud Deployments
### AWS (EKS + Qdrant)
#### Option 1: Self-Hosted on EKS
Use the Kubernetes manifests above with an EKS-specific storage class:
```yaml
# Use AWS EBS for storage
volumeClaimTemplates:
  - metadata:
      name: qdrant-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "gp3"  # AWS EBS GP3 (assumes a StorageClass named gp3 exists in the cluster)
      resources:
        requests:
          storage: 100Gi
```
#### Option 2: Qdrant Cloud
```yaml
# config.yml
sanctum:
  enabled: true
  adapter_type: "qdrant"
  qdrant:
    url: "https://your-cluster.qdrant.io:6334"
    collection_name: "aws_memories"
    vector_dimension: 1536
```
Set API key via environment:
```bash
export QDRANT_API_KEY=your_api_key_here
```
### GCP (GKE + Qdrant)
Use GCP persistent disk:
```yaml
volumeClaimTemplates:
  - metadata:
      name: qdrant-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "standard-rwo"  # GCP persistent disk
      resources:
        requests:
          storage: 100Gi
```
### Azure (AKS + Qdrant)
Use Azure managed disk:
```yaml
volumeClaimTemplates:
  - metadata:
      name: qdrant-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "managed-premium"  # Azure premium SSD
      resources:
        requests:
          storage: 100Gi
```
## Production Best Practices
### 1. High Availability
**Qdrant Cluster Mode** (v1.2.0+):
```yaml
# qdrant-config.yaml
cluster:
  enabled: true
  consensus:
    tick_period_ms: 100
  p2p:
    port: 6335
```
Deploy multiple Qdrant replicas:
```yaml
replicas: 3 # Minimum for HA
```
### 2. Resource Allocation
**CPU Guidelines**:
- Development: 0.5-1 CPU
- Production: 2-4 CPUs
- High load: 4-8 CPUs
**Memory Guidelines**:
- Base: 2 GB + (vectors × dimension × 4 bytes)
- Example: 1M vectors × 1536 dim × 4 bytes ≈ 6 GB, plus the 2 GB base = 8 GB
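
The memory guideline above is simple arithmetic and can be sketched as a small helper. This is an illustration of the formula only, assuming unquantized float32 vectors and decimal gigabytes; the function names are made up for the example.

```rust
/// Bytes per dimension for unquantized float32 vectors.
const BYTES_PER_F32: u64 = 4;
/// Fixed base allowance from the guideline (2 GB, decimal).
const BASE_BYTES: u64 = 2_000_000_000;

/// Raw size of the vectors alone, in bytes.
pub fn raw_vector_bytes(vectors: u64, dimension: u64) -> u64 {
    vectors * dimension * BYTES_PER_F32
}

/// Guideline estimate: base allowance plus raw vector size.
pub fn estimated_memory_bytes(vectors: u64, dimension: u64) -> u64 {
    BASE_BYTES + raw_vector_bytes(vectors, dimension)
}

/// Formats a byte count as decimal gigabytes with one decimal place.
pub fn to_gb(bytes: u64) -> String {
    format!("{:.1} GB", bytes as f64 / 1e9)
}
```

For 1M vectors at 1536 dimensions, `raw_vector_bytes` gives 6,144,000,000 bytes, which matches the "~6 GB" figure in the example. HNSW index and payload overhead are not modeled here.
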
**Storage**:
- Use SSD for production (NVMe preferred)
- Plan for 2x growth capacity
- Enable compression (built into Qdrant)
### 3. Network Configuration
**Firewall Rules**:
- Port 6333: HTTP API (internal only)
- Port 6334: gRPC API (application access)
- Port 6335: P2P cluster communication (Qdrant cluster only)
**TLS Configuration**:
```yaml
service:
  http_port: 6333
  grpc_port: 6334
  enable_tls: true

# Certificate paths live in the top-level tls section
tls:
  cert: /path/to/cert.pem
  key: /path/to/key.pem
```
### 4. Collection Configuration
**Optimal Settings**:
```rust
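use qdrant_client::qdrant::{
    quantization_config::Quantization, vectors_config::Config, CreateCollection, Distance,
    HnswConfigDiff, QuantizationConfig, QuantizationType, ScalarQuantization, VectorParams,
    VectorsConfig,
};

// Configure collection for production.
// Type and field names follow the qdrant-client 1.x gRPC types; verify against your client version.
let collection_config = CreateCollection {
    collection_name: "production_memories".to_string(),
    vectors_config: Some(VectorsConfig {
        config: Some(Config::Params(VectorParams {
            size: 1536,
            distance: Distance::Cosine.into(),
            hnsw_config: Some(HnswConfigDiff {
                m: Some(16),             // Number of edges per node (higher = better recall, more memory)
                ef_construct: Some(200), // Build-time accuracy (higher = better quality, slower build)
                full_scan_threshold: Some(10000),
                ..Default::default()
            }),
            quantization_config: Some(QuantizationConfig {
                quantization: Some(Quantization::Scalar(ScalarQuantization {
                    r#type: QuantizationType::Int8.into(), // Reduce memory by 4x
                    quantile: Some(0.99),
                    always_ram: Some(true),
                })),
            }),
            on_disk: Some(false), // Keep vectors in RAM for speed
            ..Default::default()
        })),
    }),
    // ... other settings
    ..Default::default()
};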
```
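To make the "reduce memory by 4x" claim concrete, here is a minimal, std-only sketch of what int8 scalar quantization does to a single vector: each 4-byte float is mapped to a 1-byte code within the vector's value range. It illustrates the idea only and is not Qdrant's implementation. In particular it scales by the plain min/max, whereas the `quantile: 0.99` setting clips outliers before scaling. All names are invented for the example.

```rust
/// Quantized vector: one byte per dimension plus the range needed to decode it.
pub struct QuantizedVector {
    pub codes: Vec<u8>,
    pub min: f32,
    pub step: f32,
}

/// Maps each f32 to a code in 0..=255 by linear scaling over [min, max].
pub fn quantize(values: &[f32]) -> QuantizedVector {
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against a zero range (all values equal) to avoid dividing by zero.
    let step = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = values
        .iter()
        .map(|v| ((v - min) / step).round().clamp(0.0, 255.0) as u8)
        .collect();
    QuantizedVector { codes, min, step }
}

/// Reconstructs approximate f32 values from the codes.
pub fn dequantize(q: &QuantizedVector) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.step).collect()
}
```

Each reconstructed value differs from the original by at most half a step, which is the accuracy traded for storing one byte per dimension instead of four.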
### 5. Security
**Authentication**:
```yaml
# qdrant-config.yaml
service:
  api_key: ${QDRANT_API_KEY}  # Substitute at deploy time, or set QDRANT__SERVICE__API_KEY in the environment instead
```
**Network Policies (Kubernetes)**:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: qdrant-network-policy
  namespace: paladin
spec:
  podSelector:
    matchLabels:
      app: qdrant
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: paladin
      ports:
        - protocol: TCP
          port: 6334
```
### 6. Backup Strategy
**Automated Snapshots**:
```bash
# Create snapshot
curl -X POST 'http://localhost:6333/collections/paladin_memories/snapshots'
# List snapshots
curl 'http://localhost:6333/collections/paladin_memories/snapshots'
# Download snapshot
curl -O 'http://localhost:6333/collections/paladin_memories/snapshots/snapshot-2024-01-30.snapshot'
```
**Kubernetes CronJob**:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: qdrant-backup
  namespace: paladin
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: curlimages/curl:latest
              command:
                - sh
                - -c
                - |
                  curl -X POST http://qdrant:6333/collections/paladin_memories/snapshots
                  # Upload to S3/GCS/Azure Storage
          restartPolicy: OnFailure
```
## Monitoring
### Metrics to Track
**Qdrant Metrics**:
- Collection size (number of vectors)
- Search latency (p50, p95, p99)
- Memory usage
- CPU utilization
- Disk I/O
**Application Metrics**:
- Store operation latency
- Search operation latency
- Error rates
- Cache hit rates
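
The latency percentiles listed above (p50, p95, p99) can be computed from raw samples when a metrics backend is not yet in place. The following is a minimal sketch using the nearest-rank method; the function name and the millisecond unit are assumptions for the example.

```rust
/// Nearest-rank percentile of latency samples (here in milliseconds).
/// `percent` must be in 1..=100. Returns `None` for empty input or an out-of-range percent.
pub fn percentile(samples: &[u64], percent: usize) -> Option<u64> {
    if samples.is_empty() || percent == 0 || percent > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: rank = ceil(percent / 100 * n), computed with integers.
    let rank = (percent * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}
```

Nearest-rank always returns an observed sample, which keeps the result easy to explain in an incident review. Histogram-based estimates, as typically used by Prometheus, interpolate instead and may differ slightly.
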
### Prometheus Integration
```yaml
# prometheus-config.yaml
scrape_configs:
  - job_name: 'qdrant'
    static_configs:
      - targets: ['qdrant:6333']
    metrics_path: '/metrics'
```
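Before wiring up dashboards, it can help to sanity-check what the `/metrics` endpoint returns. The sketch below pulls a single value out of Prometheus text-format output using only the standard library. The function name is hypothetical, and the metric names used with it should be taken from your own instance's output rather than assumed.

```rust
/// Returns the value of the first sample whose metric name equals `name`.
/// Handles optional `{labels}` and a trailing timestamp; skips `#` comment lines.
pub fn metric_value(exposition: &str, name: &str) -> Option<f64> {
    for line in exposition.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // The metric name ends at the first '{' (labels) or whitespace.
        let Some(name_end) = line.find(|c: char| c == '{' || c.is_whitespace()) else {
            continue;
        };
        if &line[..name_end] != name {
            continue;
        }
        // The value follows the closing '}' when labels are present, else the name.
        let rest = match line.rfind('}') {
            Some(i) => &line[i + 1..],
            None => &line[name_end..],
        };
        return rest.split_whitespace().next()?.parse().ok();
    }
    None
}
```

This only returns the first matching sample, so for labelled metrics with several series it is a spot check rather than an aggregation.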
### Grafana Dashboard
Key panels:
1. **Search Performance**: p95 latency over time
2. **Storage Growth**: Collection size trend
3. **Resource Usage**: CPU/Memory utilization
4. **Error Rates**: Failed operations per minute
## Backup and Recovery
### Full Backup
```bash
#!/bin/bash
# backup-qdrant.sh
set -euo pipefail

COLLECTION="paladin_memories"
BACKUP_DIR="/backups/$(date +%Y%m%d)"
QDRANT_URL="http://localhost:6333"

# Ensure the dated backup directory exists
mkdir -p "${BACKUP_DIR}"

# Create snapshot
SNAPSHOT=$(curl -sf -X POST "${QDRANT_URL}/collections/${COLLECTION}/snapshots" | jq -r '.result.name')

# Download snapshot
curl -sf -o "${BACKUP_DIR}/${SNAPSHOT}" \
  "${QDRANT_URL}/collections/${COLLECTION}/snapshots/${SNAPSHOT}"

# Upload to S3
aws s3 cp "${BACKUP_DIR}/${SNAPSHOT}" \
  "s3://paladin-backups/qdrant/${COLLECTION}/${SNAPSHOT}"
```
### Restore from Backup
```bash
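#!/bin/bash
# restore-qdrant.sh <snapshot-file>
set -euo pipefail

COLLECTION="paladin_memories"
SNAPSHOT_FILE="$1"
QDRANT_URL="http://localhost:6333"

# Upload the snapshot; Qdrant recovers the collection from the uploaded file
curl -sf -X POST "${QDRANT_URL}/collections/${COLLECTION}/snapshots/upload" \
  -F "snapshot=@${SNAPSHOT_FILE}"

# Alternative: recover from a snapshot that Qdrant itself can reach.
# "location" must be a URL or file:// URI visible to the Qdrant node, not a client-side path.
# SNAPSHOT_URL is a placeholder variable for that location.
# curl -sf -X PUT "${QDRANT_URL}/collections/${COLLECTION}/snapshots/recover" \
#   -H "Content-Type: application/json" \
#   -d "{\"location\": \"${SNAPSHOT_URL}\"}"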
```
### Disaster Recovery Plan
1. **Regular Backups**: Daily automated snapshots
2. **Off-site Storage**: Copy to cloud storage (S3/GCS/Azure)
3. **Test Restores**: Monthly restore validation
4. **RPO/RTO**: Define acceptable data loss and recovery time
5. **Runbook**: Document recovery procedures
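
Once an RPO is defined, the plan above can be checked mechanically: if the newest backup is older than the RPO, the objective is already violated. A minimal sketch, assuming the caller supplies the timestamp of the newest snapshot (for example from object-storage metadata); the function name is hypothetical.

```rust
use std::time::{Duration, SystemTime};

/// Returns true when the newest backup is older than the recovery point objective.
pub fn rpo_violated(newest_backup: SystemTime, now: SystemTime, rpo: Duration) -> bool {
    match now.duration_since(newest_backup) {
        Ok(age) => age > rpo,
        // A backup timestamped in the future (clock skew) is not treated as stale.
        Err(_) => false,
    }
}
```

With daily snapshots, an RPO slightly above 24 hours leaves headroom for a job that runs late. Alerting on this check catches silently failing backup jobs before a restore is needed.
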
## Troubleshooting
### High Memory Usage
**Symptoms**: OOM kills, swapping
**Solutions**:
1. Enable quantization to reduce memory 4x:
```rust
quantization_config: Some(QuantizationConfig {
    quantization: Some(Quantization::Scalar(ScalarQuantization {
        r#type: QuantizationType::Int8.into(),
        quantile: Some(0.99),
        always_ram: Some(true),
    })),
})
```
2. Move vectors to disk:
```rust
on_disk: Some(true), // field of VectorParams
```
3. Increase node resources
### Slow Search Performance
**Symptoms**: Search > 500ms consistently
**Solutions**:
1. Increase HNSW ef parameter:
```rust
ef_construct: Some(200), // build-time setting in the collection's HNSW config
```
2. Tune search parameters:
```rust
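// `params` field of the search request (qdrant-client 1.x naming)
params: Some(SearchParams {
    hnsw_ef: Some(128), // higher = better recall, slower search
    exact: Some(false), // keep approximate (HNSW) search
    ..Default::default()
})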
```
3. Add filters to reduce search space
### Connection Timeouts
**Symptoms**: "Failed to connect to Qdrant"
**Solutions**:
1. Verify Qdrant is running:
```bash
curl http://localhost:6333/healthz
```
2. Check network connectivity:
```bash
telnet qdrant 6334
```
3. Increase timeouts:
```rust
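// Builder API as in qdrant-client 1.x; adjust to your client version.
let client = QdrantClient::from_url("http://localhost:6334")
    .with_timeout(Duration::from_secs(30)) // std::time::Duration
    .build()?;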
```
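Where `telnet` is not installed (as is common in minimal container images), the connectivity check from step 2 can be approximated with Rust's standard library. This is a minimal sketch: `can_connect` is a hypothetical helper, and it only verifies that a TCP connection can be opened, not that Qdrant answers gRPC requests.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` (e.g. "qdrant:6334") succeeds within `timeout`.
/// Tries every address the host resolves to; unresolvable input yields false.
pub fn can_connect(addr: &str, timeout: Duration) -> bool {
    match addr.to_socket_addrs() {
        Ok(mut addrs) => addrs.any(|a| TcpStream::connect_timeout(&a, timeout).is_ok()),
        Err(_) => false,
    }
}
```

A `false` result for `qdrant:6334` combined with a healthy HTTP check on port 6333 usually points at a network policy or firewall rule rather than at Qdrant itself.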
## Cost Optimization
### Resource Right-Sizing
**Start Small**:
- 2 GB RAM for <100K vectors
- 4 GB RAM for <1M vectors
- Scale based on metrics
### Storage Optimization
**Techniques**:
1. **Quantization**: Reduce memory/storage by 75%
2. **Compression**: Built into Qdrant (ZSTD)
3. **Pruning**: Delete old/unused memories
### Cloud Cost Management
**Tips**:
- Use spot/preemptible instances for non-critical workloads
- Scale down non-prod environments off-hours
- Use Qdrant Cloud for predictable costs
- Monitor and set budget alerts
---
**Next Steps**:
- [Migration Guide](sanctum-migration.md)
- [Main Documentation](../user-guides/sanctum-vector-memory.md)
- [Performance Tuning](../operations/performance-tuning.md)