# VecBoost
<p align="left">
<img src="https://img.shields.io/badge/Rust-2024-orange?logo=rust&style=flat-square" alt="Rust Edition">
<img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square" alt="MIT License">
<img src="https://img.shields.io/badge/Version-0.1.0-green.svg?style=flat-square" alt="Version">
</p>
A high-performance, production-ready embedding vector service written in Rust. VecBoost provides efficient text vectorization with support for multiple inference engines, GPU acceleration, and enterprise-grade features.
## Features
- **High Performance**: Optimized Rust codebase with batch processing and concurrent request handling
- **Multiple Engines**: Support for Candle (native Rust) and ONNX Runtime inference engines
- **GPU Acceleration**: Native CUDA support (NVIDIA) and Metal support (Apple Silicon)
- **Smart Caching**: Multi-tier caching with LRU, LFU, and KV cache strategies
- **Enterprise Security**: JWT authentication, CSRF protection, and audit logging
- **Rate Limiting**: Configurable rate limiting with token bucket algorithm
- **Priority Queue**: Request prioritization with configurable priority weights
- **Dual APIs**: gRPC and HTTP/REST interfaces with OpenAPI documentation
- **Kubernetes Ready**: Production deployment configurations included
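The rate limiting feature is based on the token bucket algorithm. As a rough, language-agnostic illustration of how a token bucket behaves (a Python sketch, not VecBoost's actual Rust implementation; the capacity and rate values are made up):

```python
import time

class TokenBucket:
    """Illustrative token bucket: at most `capacity` tokens, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)
print([bucket.allow() for _ in range(4)])  # → [True, True, True, False]
```

Each request costs one token, so bursts up to `capacity` are absorbed while sustained traffic is capped at `rate` requests per second.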
## Quick Start
### Prerequisites
- Rust 1.85+ (edition 2024)
- CUDA Toolkit 12.x (for GPU support on Linux)
- Metal (for GPU support on macOS)
### Installation
```bash
# Clone the repository
git clone https://github.com/Kirky-X/vecboost.git
cd vecboost
# Build with default features (CPU only)
cargo build --release
# Build with CUDA support (Linux)
cargo build --release --features cuda
# Build with Metal support (macOS)
cargo build --release --features metal
# Build with all features
cargo build --release --features cuda,onnx,grpc,auth,redis
```
### Configuration
Copy the example configuration and customize:
```bash
cp config.toml config_custom.toml
# Edit config_custom.toml with your settings
```
### Running
```bash
# Run with default configuration
./target/release/vecboost
# Run with custom configuration
./target/release/vecboost --config config_custom.toml
```
The service will start on `http://localhost:9002` by default.
### Docker
```bash
# Build the image
docker build -t vecboost:latest .
# Run the container
docker run -p 9002:9002 -v $(pwd)/config.toml:/app/config.toml vecboost:latest
```
## Documentation
- [User Guide](USER_GUIDE.md) - Detailed usage instructions
- [API Reference](API_REFERENCE.md) - REST API and gRPC documentation
- [Architecture](ARCHITECTURE.md) - System design and components
- [Contributing](docs/CONTRIBUTING.md) - Contribution guidelines
## API Usage
### HTTP REST API
Generate embeddings via HTTP:
```bash
curl -X POST http://localhost:9002/api/v1/embed \
-H "Content-Type: application/json" \
-d '{"text": "Hello, world!"}'
```
Response:
```json
{
"embedding": [0.123, 0.456, ...],
"dimension": 1024,
"processing_time_ms": 15.5
}
```
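A common next step after fetching embeddings is comparing them with cosine similarity. The sketch below is plain stdlib Python with made-up low-dimensional vectors; for the 1024-dimensional vectors this service returns, you would typically use numpy instead:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0 (orthogonal)
```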
### gRPC API
The service also exposes a gRPC interface on port 50051 (configurable):
```protobuf
service EmbeddingService {
rpc Embed(EmbedRequest) returns (EmbedResponse);
rpc EmbedBatch(BatchEmbedRequest) returns (BatchEmbedResponse);
rpc ComputeSimilarity(SimilarityRequest) returns (SimilarityResponse);
}
```
### OpenAPI Documentation
Access the interactive API documentation at:
- Swagger UI: `http://localhost:9002/swagger-ui/`
- ReDoc: `http://localhost:9002/redoc/`
## Configuration
### Key Configuration Options
```toml
[server]
host = "0.0.0.0"
port = 9002
[model]
model_repo = "BAAI/bge-m3" # HuggingFace model ID
use_gpu = true
batch_size = 32
expected_dimension = 1024
[embedding]
cache_enabled = true
cache_size = 1024
[auth]
enabled = true
jwt_secret = "your-secret-key"
```
See the default [config.toml](config.toml) for the full set of options.
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                       VecBoost Service                       │
├──────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐   │
│  │  HTTP/gRPC  │  │ Auth Layer  │  │    Rate Limiting    │   │
│  │  Endpoints  │  │ (JWT/CSRF)  │  │   (Token Bucket)    │   │
│  └──────┬──────┘  └──────┬──────┘  └──────────┬──────────┘   │
│         └────────────────┼────────────────────┘              │
│                          ▼                                   │
│  ┌───────────────────────────────────────────────────────┐   │
│  │                   Request Pipeline                    │   │
│  │  ┌──────────┐    ┌───────────┐    ┌───────────────┐   │   │
│  │  │ Priority │ →  │  Request  │ →  │   Response    │   │   │
│  │  │  Queue   │    │  Workers  │    │    Channel    │   │   │
│  │  └──────────┘    └───────────┘    └───────────────┘   │   │
│  └───────────────────────────┬───────────────────────────┘   │
│                              ▼                               │
│  ┌───────────────────────────────────────────────────────┐   │
│  │                   Embedding Service                   │   │
│  │  ┌──────────┐    ┌───────────┐    ┌───────────────┐   │   │
│  │  │   Text   │ →  │ Inference │ →  │ Vector Cache  │   │   │
│  │  │ Chunking │    │  Engine   │    │ (LRU/LFU/KV)  │   │   │
│  │  └──────────┘    └───────────┘    └───────────────┘   │   │
│  └───────────────────────────┬───────────────────────────┘   │
│                              ▼                               │
│  ┌───────────────────────────────────────────────────────┐   │
│  │                   Inference Engine                    │   │
│  │      ┌─────────────┐          ┌─────────────┐         │   │
│  │      │   Candle    │          │    ONNX     │         │   │
│  │      │  (Native)   │          │   Runtime   │         │   │
│  │      └─────────────┘          └─────────────┘         │   │
│  └───────────────────────────┬───────────────────────────┘   │
│            ┌─────────────────┼─────────────────┐             │
│            ▼                 ▼                 ▼             │
│      ┌──────────┐      ┌──────────┐      ┌──────────┐        │
│      │   CPU    │      │   CUDA   │      │  Metal   │        │
│      └──────────┘      └──────────┘      └──────────┘        │
└──────────────────────────────────────────────────────────────┘
```
## Project Structure
```
vecboost/
├── src/
│   ├── audit/        # Audit logging
│   ├── auth/         # Authentication (JWT, CSRF)
│   ├── cache/        # Multi-tier caching (LRU, LFU, KV)
│   ├── config/       # Configuration management
│   ├── device/       # Device management (CPU, CUDA, Metal)
│   ├── engine/       # Inference engines (Candle, ONNX)
│   ├── grpc/         # gRPC server
│   ├── metrics/      # Prometheus metrics
│   ├── model/        # Model downloading and management
│   ├── pipeline/     # Request pipeline and prioritization
│   ├── rate_limit/   # Rate limiting
│   ├── routes/       # HTTP routes
│   ├── security/     # Security utilities
│   ├── service/      # Core embedding service
│   └── text/         # Text processing and tokenization
├── examples/gpu/     # GPU example programs
├── proto/            # gRPC protocol definitions
├── deployments/      # Kubernetes deployment configs
├── tests/            # Integration tests
└── config.toml       # Default configuration
```
## Performance
| Metric | Value |
|--------|-------|
| Embedding Dimension | Up to 4096 |
| Batch Size | Up to 256 |
| Requests/Second | 1000+ (CPU) |
| Latency (p99) | < 50 ms (GPU) |
| Cache Hit Ratio | > 90% (with 1024 entries) |
## Security
- **Authentication**: JWT tokens with configurable expiration
- **Authorization**: Role-based access control
- **Audit Logging**: All requests logged with user and action details
- **Rate Limiting**: Per-IP, per-user, and global rate limits
- **Encryption**: AES-256-GCM for sensitive data at rest
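When debugging authentication issues, it can help to inspect a JWT's payload, which is just URL-safe base64-encoded JSON. The sketch below decodes without verifying the signature; the claim names (`sub`, `exp`) are illustrative, not necessarily the claims VecBoost issues:

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT for inspection."""
    payload_b64 = token.split(".")[1]
    # JWTs use URL-safe base64 without padding; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a toy header.payload.signature token to demonstrate.
header = base64.urlsafe_b64encode(b'{"alg":"HS256","typ":"JWT"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(b'{"sub":"alice","exp":1700000000}').rstrip(b"=").decode()
token = f"{header}.{payload}.signature"
print(decode_jwt_payload(token))  # → {'sub': 'alice', 'exp': 1700000000}
```

Never trust a payload decoded this way for authorization decisions; only the server-side signature check makes the claims trustworthy.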
## Monitoring
- **Prometheus Metrics**: `/metrics` endpoint for Prometheus scraping
- **Health Checks**: `/health` endpoint for liveness/readiness
- **OpenAPI Docs**: Swagger UI at `/swagger-ui/`
- **Grafana Dashboards**: Pre-configured dashboards in `deployments/`
## Deployment
### Kubernetes
```bash
# Deploy to Kubernetes
kubectl apply -f deployments/kubernetes/
```
See [Deployment Guide](deployments/kubernetes/README.md) for detailed instructions.
### Docker Compose
```yaml
services:
vecboost:
image: vecboost:latest
ports:
- "9002:9002"
volumes:
- ./config.toml:/app/config.toml
environment:
- MODEL_REPO=BAAI/bge-m3
```
## Contributing
Contributions are welcome! Please read our [Contributing Guide](docs/CONTRIBUTING.md) for details.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- [Candle](https://github.com/huggingface/candle) - Native Rust ML framework
- [ONNX Runtime](https://onnxruntime.ai/) - Cross-platform ML inference
- [Hugging Face Hub](https://huggingface.co/models) - Model repository