# VecBoost

<p align="left">
    <img src="https://img.shields.io/badge/Rust-2024-orange?logo=rust&style=flat-square" alt="Rust Edition">
    <img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square" alt="MIT License">
    <img src="https://img.shields.io/badge/Version-0.1.0-green.svg?style=flat-square" alt="Version">
</p>

A high-performance, production-ready embedding vector service written in Rust. VecBoost provides efficient text vectorization with support for multiple inference engines, GPU acceleration, and enterprise-grade features.

## ✨ Features

- **🚀 High Performance**: Optimized Rust codebase with batch processing and concurrent request handling
- **🔧 Multiple Engines**: Support for Candle (native Rust) and ONNX Runtime inference engines
- **🎮 GPU Acceleration**: Native CUDA support (NVIDIA) and Metal support (Apple Silicon)
- **📊 Smart Caching**: Multi-tier caching with LRU, LFU, and KV cache strategies
- **🔐 Enterprise Security**: JWT authentication, CSRF protection, and audit logging
- **⚡ Rate Limiting**: Configurable rate limiting with token bucket algorithm
- **📈 Priority Queue**: Request prioritization with configurable priority weights
- **🌐 Dual APIs**: gRPC and HTTP/REST interfaces with OpenAPI documentation
- **📦 Kubernetes Ready**: Production deployment configurations included
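For intuition, the token-bucket algorithm behind the rate-limiting feature can be sketched as follows. This is an illustrative Python sketch, not VecBoost's internal implementation; the class and parameter names are hypothetical.

```python
import time

class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)
print([bucket.allow() for _ in range(5)])  # first 3 requests pass, the rest are rejected
```

Bursts up to `capacity` are admitted immediately; sustained traffic is throttled to `rate` requests per second.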

## 🚀 Quick Start

### Prerequisites

- Rust 1.85+ (edition 2024)
- CUDA Toolkit 12.x (for GPU support on Linux)
- Metal (for GPU support on macOS)

### Installation

```bash
# Clone the repository
git clone https://github.com/Kirky-X/vecboost.git
cd vecboost

# Build with default features (CPU only)
cargo build --release

# Build with CUDA support (Linux)
cargo build --release --features cuda

# Build with Metal support (macOS)
cargo build --release --features metal

# Build with all features
cargo build --release --features cuda,onnx,grpc,auth,redis
```

### Configuration

Copy the example configuration and customize:

```bash
cp config.toml config_custom.toml
# Edit config_custom.toml with your settings
```

### Running

```bash
# Run with default configuration
./target/release/vecboost

# Run with custom configuration
./target/release/vecboost --config config_custom.toml
```

The service will start on `http://localhost:9002` by default.

### Docker

```bash
# Build the image
docker build -t vecboost:latest .

# Run the container
docker run -p 9002:9002 -v $(pwd)/config.toml:/app/config.toml vecboost:latest
```

## 📖 Documentation

- [📋 User Guide](USER_GUIDE.md) - Detailed usage instructions
- [🔌 API Reference](API_REFERENCE.md) - REST API and gRPC documentation
- [🏗️ Architecture](ARCHITECTURE.md) - System design and components
- [🤝 Contributing](docs/CONTRIBUTING.md) - Contribution guidelines

## 🔌 API Usage

### HTTP REST API

Generate embeddings via HTTP:

```bash
curl -X POST http://localhost:9002/api/v1/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}'
```

Response:

```json
{
  "embedding": [0.123, 0.456, ...],
  "dimension": 1024,
  "processing_time_ms": 15.5
}
```

### gRPC API

The service also exposes a gRPC interface on port 50051 (configurable):

```protobuf
service EmbeddingService {
  rpc Embed(EmbedRequest) returns (EmbedResponse);
  rpc EmbedBatch(BatchEmbedRequest) returns (BatchEmbedResponse);
  rpc ComputeSimilarity(SimilarityRequest) returns (SimilarityResponse);
}
```

### OpenAPI Documentation

Access the interactive API documentation at:
- Swagger UI: `http://localhost:9002/swagger-ui/`
- ReDoc: `http://localhost:9002/redoc/`

## ⚙️ Configuration

### Key Configuration Options

```toml
[server]
host = "0.0.0.0"
port = 9002

[model]
model_repo = "BAAI/bge-m3"  # HuggingFace model ID
use_gpu = true
batch_size = 32
expected_dimension = 1024

[embedding]
cache_enabled = true
cache_size = 1024

[auth]
enabled = true
jwt_secret = "your-secret-key"
```

See the default [`config.toml`](config.toml) for all available options.

## 🏗️ Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                      VecBoost Service                        │
├──────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐   │
│  │  HTTP/gRPC  │  │  Auth Layer │  │  Rate Limiting      │   │
│  │  Endpoints  │  │  (JWT/CSRF) │  │  (Token Bucket)     │   │
│  └─────────────┘  └─────────────┘  └─────────────────────┘   │
│         │                │                   │               │
│         └────────────────┴───────────────────┘               │
│                          │                                   │
│  ┌──────────────────────────────────────────────────────┐    │
│  │              Request Pipeline                        │    │
│  │  ┌─────────┐  ┌───────────┐  ┌─────────────────┐    │    │
│  │  │ Priority│  │ Request   │  │ Response        │    │    │
│  │  │ Queue   │→ │ Workers   │→ │ Channel         │    │    │
│  │  └─────────┘  └───────────┘  └─────────────────┘    │    │
│  └──────────────────────────────────────────────────────┘    │
│                          │                                   │
│  ┌──────────────────────────────────────────────────────┐    │
│  │              Embedding Service                       │    │
│  │  ┌─────────┐  ┌───────────┐  ┌─────────────────┐    │    │
│  │  │ Text    │  │ Inference │  │ Vector Cache    │    │    │
│  │  │ Chunking│→ │ Engine    │→ │ (LRU/LFU/KV)    │    │    │
│  │  └─────────┘  └───────────┘  └─────────────────┘    │    │
│  └──────────────────────────────────────────────────────┘    │
│                          │                                   │
│  ┌──────────────────────────────────────────────────────┐    │
│  │              Inference Engine                        │    │
│  │    ┌─────────────┐  ┌─────────────┐                  │    │
│  │    │   Candle    │  │    ONNX     │                  │    │
│  │    │  (Native)   │  │  Runtime    │                  │    │
│  │    └─────────────┘  └─────────────┘                  │    │
│  └──────────────────────────────────────────────────────┘    │
│                          │                                   │
│         ┌────────────────┼────────────────┐                  │
│         ▼                ▼                ▼                  │
│  ┌──────────┐     ┌──────────┐     ┌──────────┐              │
│  │   CPU    │     │   CUDA   │     │  Metal   │              │
│  └──────────┘     └──────────┘     └──────────┘              │
└──────────────────────────────────────────────────────────────┘
```
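The priority-queue stage of the request pipeline can be sketched as a min-heap with a FIFO tie-breaker, so that equal-priority requests keep their arrival order. This is an illustrative Python sketch; the class and request names are hypothetical, not VecBoost internals:

```python
import heapq
import itertools

class PriorityQueue:
    """Requests pop in priority order (lower = more urgent); FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves arrival order

    def push(self, priority: int, request: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def pop(self) -> str:
        _, _, request = heapq.heappop(self._heap)
        return request

q = PriorityQueue()
q.push(2, "batch-job")
q.push(0, "interactive-query")
q.push(1, "background-reindex")
print([q.pop() for _ in range(3)])  # ['interactive-query', 'background-reindex', 'batch-job']
```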

## 📦 Project Structure

```
vecboost/
├── src/
│   ├── audit/          # Audit logging
│   ├── auth/           # Authentication (JWT, CSRF)
│   ├── cache/          # Multi-tier caching (LRU, LFU, KV)
│   ├── config/         # Configuration management
│   ├── device/         # Device management (CPU, CUDA, Metal)
│   ├── engine/         # Inference engines (Candle, ONNX)
│   ├── grpc/           # gRPC server
│   ├── metrics/        # Prometheus metrics
│   ├── model/          # Model downloading and management
│   ├── pipeline/       # Request pipeline and prioritization
│   ├── rate_limit/     # Rate limiting
│   ├── routes/         # HTTP routes
│   ├── security/       # Security utilities
│   ├── service/        # Core embedding service
│   └── text/           # Text processing and tokenization
├── examples/gpu/       # GPU example programs
├── proto/              # gRPC protocol definitions
├── deployments/        # Kubernetes deployment configs
├── tests/              # Integration tests
└── config.toml         # Default configuration
```

## 🎯 Performance

| Metric | Value |
|--------|-------|
| Embedding Dimension | Up to 4096 |
| Batch Size | Up to 256 |
| Requests/Second | 1000+ (CPU) |
| Latency (p99) | < 50ms (GPU) |
| Cache Hit Ratio | > 90% (with 1024 entries) |
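The cache-hit figure depends on workload locality and the eviction policy in use. For reference, the LRU strategy from the caching tier behaves like this minimal sketch (illustrative Python, not the service's internal code):

```python
from collections import OrderedDict

class LruCache:
    """Least-recently-used cache: at capacity, evicts the stalest entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict[str, list[float]] = OrderedDict()

    def get(self, key: str):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key: str, value) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LruCache(capacity=2)
cache.put("a", [0.1])
cache.put("b", [0.2])
cache.get("a")           # "a" becomes most recently used
cache.put("c", [0.3])    # evicts "b", the least recently used entry
print(list(cache.data))  # ['a', 'c']
```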

## 🔒 Security

- **Authentication**: JWT tokens with configurable expiration
- **Authorization**: Role-based access control
- **Audit Logging**: All requests logged with user and action details
- **Rate Limiting**: Per-IP, per-user, and global rate limits
- **Encryption**: AES-256-GCM for sensitive data at rest
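For testing JWT authentication against a development deployment, an HS256 bearer token can be minted and verified with only the Python standard library. This is an illustrative sketch; production clients should use a maintained JWT library and enforce expiry claims:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def jwt_hs256(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_hs256(token: str, secret: str) -> bool:
    header, body, sig = token.split(".")
    expected = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)

token = jwt_hs256({"sub": "alice", "exp": 1900000000}, "your-secret-key")
print(verify_hs256(token, "your-secret-key"))  # True
print(verify_hs256(token, "wrong-secret"))     # False
```

The `"sub"` and `"exp"` claims here are examples; which claims the service actually checks is configuration-dependent.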

## 📈 Monitoring

- **Prometheus Metrics**: `/metrics` endpoint for Prometheus scraping
- **Health Checks**: `/health` endpoint for liveness/readiness
- **OpenAPI Docs**: Swagger UI at `/swagger-ui/`
- **Grafana Dashboards**: Pre-configured dashboards in `deployments/`

## 🚀 Deployment

### Kubernetes

```bash
# Deploy to Kubernetes
kubectl apply -f deployments/kubernetes/
```

See [Deployment Guide](deployments/kubernetes/README.md) for detailed instructions.

### Docker Compose

```yaml
services:
  vecboost:
    image: vecboost:latest
    ports:
      - "9002:9002"
    volumes:
      - ./config.toml:/app/config.toml
    environment:
      - MODEL_REPO=BAAI/bge-m3
```

## 🤝 Contributing

Contributions are welcome! Please read our [Contributing Guide](docs/CONTRIBUTING.md) for details.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- [Candle](https://github.com/huggingface/candle) - Native Rust ML framework
- [ONNX Runtime](https://onnxruntime.ai/) - Cross-platform ML inference
- [Hugging Face Hub](https://huggingface.co/models) - Model repository