# vecboost 0.1.2

High-performance embedding vector service written in Rust.
<div align="center">

<img src="image/vecboost.png" alt="VecBoost Logo" width="200"/>

[![Rust 2024](https://img.shields.io/badge/Rust-2024-dea584?logo=rust&style=for-the-badge)](https://www.rust-lang.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg?style=for-the-badge)](https://opensource.org/licenses/MIT) [![Version 0.1.2](https://img.shields.io/badge/Version-0.1.2-green.svg?style=for-the-badge)](https://github.com/Kirky-X/vecboost) [![Rustc 1.85+](https://img.shields.io/badge/Rustc-1.85+-orange.svg?style=for-the-badge)](https://www.rust-lang.org/)

*A high-performance, production-ready embedding vector service written in Rust. VecBoost provides efficient text vectorization with support for multiple inference engines, GPU acceleration, and enterprise-grade features.*

</div>

---

## ✨ Core Features

| Category | Features |
|----------|----------|
| **🚀 Performance** | Optimized Rust codebase with batch processing and concurrent request handling |
| **🔧 Multi-Engine** | Support for Candle (native Rust) and ONNX Runtime inference engines |
| **🎮 GPU Support** | Native CUDA (NVIDIA), Metal (Apple Silicon), and ROCm (AMD) acceleration |
| **📊 Smart Caching** | Multi-tier caching with LRU, LFU, ARC, and KV cache strategies |
| **🔐 Enterprise Security** | JWT authentication, CSRF protection, role-based access control, and audit logging |
| **⚡ Rate Limiting** | Configurable rate limiting with token bucket algorithm (global/IP/user/API key) |
| **📈 Priority Queue** | Request prioritization with configurable priority weights and weighted fair queuing |
| **🌐 Dual APIs** | gRPC and HTTP/REST interfaces with OpenAPI/Swagger documentation |
| **📦 Cloud Ready** | Production deployment configurations for Kubernetes, Docker, and cloud platforms |
| **📈 Observability** | Prometheus metrics, health checks, structured logging, and Grafana dashboards |

> **💡 Quick Start**: Get up and running in 2 minutes! [See Quick Start](#-quick-start)

## 🚀 Quick Start

### 📋 Prerequisites

| Requirement | Version | Description |
|-------------|---------|-------------|
| **Rust** | 1.85+ | Edition 2024 requires Rust 1.85 or newer |
| **Cargo** | 1.85+ | Comes with Rust |
| **CUDA Toolkit** | 12.x | Optional, for NVIDIA GPU support |
| **Metal SDK** | Latest | Optional, for Apple Silicon GPU |

> **💡 Tip**: Run `rustc --version` to verify your Rust installation.

### 🔧 Installation

```bash
# 1. Clone the repository
git clone https://github.com/Kirky-X/vecboost.git
cd vecboost

# 2. Build with default features (CPU only)
cargo build --release

# 3. Build with GPU support
#    Linux (CUDA):
cargo build --release --features cuda

#    macOS (Metal):
cargo build --release --features metal

# 4. Build with all features enabled
cargo build --release --features cuda,onnx,grpc,auth,redis
```
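
The GPU feature flags above typically select the compute backend at compile time. A minimal sketch of the pattern, using Rust's `cfg!` macro (the function name is illustrative, not VecBoost's actual API):

```rust
/// Report which compute backend this binary was compiled for.
/// `cuda` and `metal` correspond to the Cargo features used in the
/// build commands above; with no GPU feature enabled, the build is CPU-only.
fn compiled_device() -> &'static str {
    if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "metal") {
        "metal"
    } else {
        "cpu"
    }
}

fn main() {
    println!("inference device: {}", compiled_device());
}
```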

### ⚙️ Configuration

```bash
# Copy and customize the configuration
cp config.toml config_custom.toml
# Edit config_custom.toml with your settings
```

### ▶️ Running

```bash
# Run with default configuration
./target/release/vecboost

# Run with custom configuration
./target/release/vecboost --config config_custom.toml
```

> **✅ Success**: The service will start on `http://localhost:9002` by default.

### 🐳 Docker

```bash
# Build the image
docker build -t vecboost:latest .

# Run the container
docker run -p 9002:9002 -p 50051:50051 \
  -v $(pwd)/config.toml:/app/config.toml \
  -v $(pwd)/models:/app/models \
  vecboost:latest
```

## 📖 Documentation

| Document | Description | Link |
|----------|-------------|------|
| **📋 User Guide** | Detailed usage instructions, configuration, and deployment | [USER_GUIDE.md](USER_GUIDE.md) |
| **🔌 API Reference** | Complete REST API and gRPC documentation | [API_REFERENCE.md](API_REFERENCE.md) |
| **🏗️ Architecture** | System design, components, and data flow | [ARCHITECTURE.md](ARCHITECTURE.md) |
| **🤝 Contributing** | Contribution guidelines and best practices | [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md) |

## 🔌 API Usage

### 🌐 HTTP REST API

**Generate embeddings via HTTP:**

```bash
curl -X POST http://localhost:9002/api/v1/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}'
```

**Response:**

```json
{
  "embedding": [0.123, 0.456, 0.789, ...],
  "dimension": 1024,
  "processing_time_ms": 15.5
}
```
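
Once embeddings come back, a common client-side operation is ranking texts by cosine similarity. A minimal Rust sketch (pure client-side math, independent of VecBoost's API):

```rust
/// Cosine similarity between two embedding vectors of the same dimension.
/// Returns a value in [-1, 1]; 0.0 is used for degenerate zero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // avoid division by zero for empty embeddings
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let query = [0.123_f32, 0.456, 0.789];
    let doc = [0.111_f32, 0.444, 0.777];
    println!("similarity = {:.4}", cosine_similarity(&query, &doc));
}
```

The service's `ComputeSimilarity` gRPC method (shown below) covers the same operation server-side.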

### 📡 gRPC API

The service exposes a gRPC interface on port `50051` (configurable):

```protobuf
service EmbeddingService {
  // Single text embedding
  rpc Embed(EmbedRequest) returns (EmbedResponse);

  // Batch text embeddings
  rpc EmbedBatch(BatchEmbedRequest) returns (BatchEmbedResponse);

  // Compute similarity between vectors
  rpc ComputeSimilarity(SimilarityRequest) returns (SimilarityResponse);
}
```

### 📚 OpenAPI Documentation

Access interactive API documentation:

| Tool | URL |
|------|-----|
| **Swagger UI** | `http://localhost:9002/swagger-ui/` |
| **ReDoc** | `http://localhost:9002/redoc/` |

## ⚙️ Configuration

### Key Configuration Options

```toml
[server]
host = "0.0.0.0"
port = 9002

[model]
model_repo = "BAAI/bge-m3"  # HuggingFace model ID
use_gpu = true
batch_size = 32
expected_dimension = 1024

[embedding]
cache_enabled = true
cache_size = 1024

[auth]
enabled = true
jwt_secret = "your-secret-key"
```
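
The Docker Compose example later in this README sets `VECBOOST_*` environment variables (`VECBOOST_JWT_SECRET`, `VECBOOST_LOG_LEVEL`), suggesting environment overrides of TOML values. A sketch of that resolution pattern; whether every key supports an override is an assumption, and the variable name below is illustrative:

```rust
use std::env;

/// Resolve a setting, preferring an environment override when present
/// and falling back to the TOML default otherwise.
fn resolve(env_key: &str, toml_default: &str) -> String {
    env::var(env_key).unwrap_or_else(|_| toml_default.to_string())
}

fn main() {
    // Falls back to the TOML default when the variable is unset.
    let port = resolve("VECBOOST_SERVER_PORT", "9002");
    println!("port = {port}");
}
```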

| Section | Key | Default | Description |
|---------|-----|---------|-------------|
| **server** | `host` | `"0.0.0.0"` | Bind address |
| | `port` | `9002` | HTTP server port |
| **model** | `model_repo` | `"BAAI/bge-m3"` | HuggingFace model ID |
| | `use_gpu` | `false` | Enable GPU acceleration |
| | `batch_size` | `32` | Batch processing size |
| **embedding** | `cache_enabled` | `true` | Enable response caching |
| | `cache_size` | `1024` | Maximum cache entries |
| **auth** | `enabled` | `false` | Enable authentication |
| | `jwt_secret` | - | JWT signing secret |

> **📖 Full Configuration**: See [`config.toml`](config.toml) for all available options.

## 🏗️ Architecture

```mermaid
graph TB
    subgraph Client_Layer["Client Layer"]
        Client[Client Requests]
    end

    subgraph Gateway["Gateway Layer"]
        HTTP["HTTP/gRPC Endpoints"]
        Auth["Auth (JWT/CSRF)"]
        RateLim["Rate Limiting"]
    end

    subgraph Pipeline["Request Pipeline"]
        Queue["Priority Queue"]
        Workers["Request Workers"]
        Response["Response Channel"]
    end

    subgraph Service["Embedding Service"]
        Text["Text Chunking"]
        Engine["Inference Engine"]
        Cache["Vector Cache LRU/LFU/ARC/KV"]
    end

    subgraph Engines["Inference Engines"]
        Candle["Candle (Native Rust)"]
        ONNX["ONNX Runtime"]
    end

    subgraph Device["Compute Devices"]
        CPU["CPU"]
        CUDA["CUDA GPU"]
        Metal["Metal GPU"]
    end

    Client --> HTTP
    HTTP --> Auth
    HTTP --> RateLim
    Auth --> Queue
    RateLim --> Queue

    Queue --> Workers
    Workers --> Text
    Workers --> Response

    Text --> Engine
    Engine --> Cache

    Engine --> Candle
    Engine --> ONNX

    Candle --> CPU
    Candle --> CUDA
    ONNX --> CPU
    ONNX --> Metal
```

## 📦 Project Structure

```
vecboost/
├── src/                          # Core source code
│   ├── audit/          # Audit logging & compliance
│   ├── auth/           # Authentication (JWT, CSRF, RBAC)
│   ├── cache/          # Multi-tier caching (LRU, LFU, ARC, KV)
│   ├── config/         # Configuration management
│   ├── device/         # Device management (CPU, CUDA, Metal, ROCm)
│   ├── engine/         # Inference engines (Candle, ONNX Runtime)
│   ├── grpc/           # gRPC server & protocol
│   ├── metrics/        # Prometheus metrics & observability
│   ├── model/          # Model downloading, loading & recovery
│   ├── pipeline/       # Request pipeline, priority & scheduling
│   ├── rate_limit/     # Rate limiting (token bucket, sliding window)
│   ├── routes/         # HTTP routes & handlers
│   ├── security/       # Security utilities (encryption, sanitization)
│   ├── service/        # Core embedding service & business logic
│   └── text/           # Text processing (chunking, tokenization)
├── examples/           # Example programs
│   └── gpu/            # GPU-specific examples & benchmarks
├── proto/              # gRPC protocol definitions (`.proto` files)
├── deployments/        # Kubernetes & Docker deployment configs
├── tests/              # Integration & performance tests
└── config.toml         # Default configuration file
```
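
The `src/text/` module handles chunking long inputs before embedding. A minimal sketch of fixed-size chunking with overlap; real chunkers typically operate on tokens rather than characters, and the window sizes here are illustrative:

```rust
/// Split `text` into windows of `window` characters, each overlapping the
/// previous one by `overlap` characters, so context carries across chunks.
fn chunk(text: &str, window: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < window, "overlap must be smaller than the window");
    let chars: Vec<char> = text.chars().collect();
    let step = window - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + window).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // final partial window
        }
        start += step;
    }
    chunks
}

fn main() {
    for c in chunk("the quick brown fox jumps", 10, 3) {
        println!("{c:?}");
    }
}
```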

## 🎯 Performance Benchmarks

| Metric | CPU | GPU (CUDA) | Notes |
|--------|-----|------------|-------|
| **Embedding Dimension** | Up to 4096 | Up to 4096 | Model dependent |
| **Max Batch Size** | 64 | 256 | Memory dependent |
| **Requests/Second** | 1,000+ | 10,000+ | Throughput |
| **Latency (p50)** | < 25ms | < 5ms | Single request |
| **Latency (p99)** | < 100ms | < 50ms | Single request |
| **Cache Hit Ratio** | > 90% | > 90% | With 1024 entries |
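
The cache-hit numbers above depend on the eviction strategy. A minimal LRU sketch in Rust illustrating the idea; the production cache layer (LRU/LFU/ARC/KV) is more sophisticated, and this type is illustrative only:

```rust
use std::collections::HashMap;

/// A toy LRU cache mapping text keys to embedding vectors.
struct LruCache {
    capacity: usize,
    map: HashMap<String, Vec<f32>>,
    order: Vec<String>, // least-recently used at the front
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: Vec::new() }
    }

    /// A hit refreshes the entry's recency; a miss returns None.
    fn get(&mut self, key: &str) -> Option<&Vec<f32>> {
        if self.map.contains_key(key) {
            self.order.retain(|k| k != key);
            self.order.push(key.to_string()); // most recently used at the back
            self.map.get(key)
        } else {
            None
        }
    }

    /// Inserting into a full cache evicts the least-recently-used entry.
    fn put(&mut self, key: String, value: Vec<f32>) {
        if self.map.len() >= self.capacity && !self.map.contains_key(&key) {
            let evicted = self.order.remove(0);
            self.map.remove(&evicted);
        }
        self.order.retain(|k| k != &key);
        self.order.push(key.clone());
        self.map.insert(key, value);
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.put("hello".into(), vec![0.1, 0.2]);
    println!("cached: {}", cache.get("hello").is_some());
}
```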

### 🚀 Optimization Features

- **⚡ Batch Processing**: Dynamic batching with configurable wait timeout
- **💾 Memory Pool**: Pre-allocated tensor buffers to reduce allocation overhead
- **🔄 Zero-Copy**: Shared references where possible
- **📊 Adaptive Batching**: Automatic batch size adjustment based on load
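
The adaptive-batching idea above can be sketched as a simple load-driven controller; the thresholds and function below are illustrative, not VecBoost's actual policy:

```rust
/// Pick the next batch size from queue pressure: grow when the backlog
/// outpaces the current batch, shrink when the queue is draining.
fn next_batch_size(current: usize, queue_depth: usize, max: usize) -> usize {
    if queue_depth > current * 2 {
        (current * 2).min(max) // backlog growing: amortize more per pass
    } else if queue_depth < current / 2 {
        (current / 2).max(1) // queue draining: favor latency
    } else {
        current
    }
}

fn main() {
    println!("next batch: {}", next_batch_size(32, 100, 256));
}
```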

## 🔒 Security Features

| Layer | Feature | Description |
|-------|---------|-------------|
| **🔐 Authentication** | JWT Tokens | Configurable expiration, refresh tokens |
| **👥 Authorization** | Role-Based Access | User tiers: free, basic, pro, enterprise |
| **📝 Audit Logging** | Request Tracking | User, action, resource, IP, timestamp |
| **⚡ Rate Limiting** | Multi-Layer | Global, per-IP, per-user, per-API key |
| **🔒 Encryption** | AES-256-GCM | Sensitive data at rest |
| **🛡️ Input Sanitization** | XSS/CSRF Protection | Request validation & sanitization |

> **⚠️ Security Best Practice**: Always use HTTPS in production and rotate JWT secrets regularly.
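
The token-bucket limiter referenced above admits bursts up to the bucket's capacity while enforcing a steady refill rate. A minimal sketch with time passed in explicitly so the refill logic is easy to follow; capacity and rate here are illustrative, not VecBoost defaults:

```rust
/// A toy token bucket: requests spend tokens, which refill at a fixed rate.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill_secs: 0.0 }
    }

    /// Try to spend one token at time `now_secs`; returns whether the
    /// request is admitted.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        let elapsed = now_secs - self.last_refill_secs;
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::new(2.0, 1.0);
    println!("allowed: {}", bucket.try_acquire(0.0));
}
```

A production limiter keys separate buckets by IP, user, or API key, matching the multi-layer scheme above.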

## 📈 Observability

| Tool | Endpoint | Description |
|------|----------|-------------|
| **Prometheus** | `/metrics` | Metrics endpoint for Prometheus scraping |
| **Health Check** | `/health` | Service liveness and readiness probe |
| **Detailed Health** | `/health/detailed` | Full health status with component checks |
| **OpenAPI Docs** | `/swagger-ui/` | Interactive Swagger UI documentation |
| **Grafana** | - | Pre-configured dashboards in `deployments/` |

### 📊 Key Metrics

- `vecboost_requests_total` - Total request count by endpoint
- `vecboost_embedding_latency_seconds` - Embedding generation latency
- `vecboost_cache_hit_ratio` - Cache hit ratio percentage
- `vecboost_batch_size` - Current batch processing size
- `vecboost_gpu_memory_bytes` - GPU memory usage

## 🚀 Deployment Options

### ☸️ Kubernetes

```bash
# Deploy to Kubernetes
kubectl apply -f deployments/kubernetes/

# Deploy with GPU support
kubectl apply -f deployments/kubernetes/gpu-deployment.yaml

# View deployment status
kubectl get pods -n vecboost
```

| Resource | Description |
|----------|-------------|
| `configmap.yaml` | Configuration as code |
| `deployment.yaml` | Main deployment manifest |
| `gpu-deployment.yaml` | GPU node selector deployment |
| `hpa.yaml` | Horizontal Pod Autoscaler |
| `model-cache.yaml` | Persistent volume for model caching |
| `service.yaml` | Cluster IP service |

> **📖 Full Guide**: See the [Deployment Guide](deployments/kubernetes/README.md) for detailed instructions.

### 🐳 Docker Compose

```yaml
version: '3.8'

services:
  vecboost:
    image: vecboost:latest
    ports:
      - "9002:9002"    # HTTP API
      - "50051:50051"  # gRPC
      - "9090:9090"    # Prometheus metrics
    volumes:
      - ./config.toml:/app/config.toml
      - ./models:/app/models
      - ./logs:/app/logs
    environment:
      - VECBOOST_JWT_SECRET=${JWT_SECRET}
      - VECBOOST_LOG_LEVEL=info
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

## 🤝 Contributing

Contributions are welcome! Please read our [Contributing Guide](docs/CONTRIBUTING.md) for details.

### 🛠️ Development Setup

```bash
# Install development dependencies
cargo install cargo-audit
rustup component add clippy rustfmt

# Run tests
cargo test --all-features

# Run linter
cargo clippy --all-targets --all-features -- -D warnings

# Format code
cargo fmt --all
```

## 📄 License

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

| Project | Description | Link |
|---------|-------------|------|
| **Candle** | Native Rust ML framework | [GitHub](https://github.com/huggingface/candle) |
| **ONNX Runtime** | Cross-platform ML inference runtime | [Website](https://onnxruntime.ai/) |
| **Hugging Face Hub** | Model repository and distribution | [Website](https://huggingface.co/models) |
| **Axum** | Ergonomic web framework for Rust | [GitHub](https://github.com/tokio-rs/axum) |
| **Tonic** | gRPC implementation for Rust | [GitHub](https://github.com/hyperium/tonic) |

---

<div align="center">

**⭐ Star us on GitHub if you find VecBoost useful!**

[![GitHub stars](https://img.shields.io/github/stars/Kirky-X/vecboost?style=social)](https://github.com/Kirky-X/vecboost)

</div>