docs.rs failed to build vecboost-0.1.0
VecBoost
A high-performance, production-ready embedding vector service written in Rust. VecBoost provides efficient text vectorization with support for multiple inference engines, GPU acceleration, and enterprise-grade features.
Features
- High Performance: Optimized Rust codebase with batch processing and concurrent request handling
- Multiple Engines: Support for Candle (native Rust) and ONNX Runtime inference engines
- GPU Acceleration: Native CUDA support (NVIDIA) and Metal support (Apple Silicon)
- Smart Caching: Multi-tier caching with LRU, LFU, and KV cache strategies
- Enterprise Security: JWT authentication, CSRF protection, and audit logging
- Rate Limiting: Configurable rate limiting with token bucket algorithm
- Priority Queue: Request prioritization with configurable priority weights
- Dual APIs: gRPC and HTTP/REST interfaces with OpenAPI documentation
- Kubernetes Ready: Production deployment configurations included
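The token-bucket rate limiting mentioned above can be sketched as follows. This is an illustrative model only, not VecBoost's actual implementation: tokens refill at a steady rate up to a burst capacity, and each request consumes one token.

```rust
use std::time::Instant;

// Illustrative token bucket: `capacity` bounds burst size,
// `refill_per_sec` sets the sustained request rate.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last_refill: Instant::now() }
    }

    // Refill tokens based on elapsed time, then try to take one token.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Allow a burst of 2 requests, then 1 request/second sustained.
    let mut bucket = TokenBucket::new(2.0, 1.0);
    assert!(bucket.try_acquire());
    assert!(bucket.try_acquire());
    assert!(!bucket.try_acquire()); // bucket drained; third request rejected
    println!("rate limiter ok");
}
```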
Quick Start
Prerequisites
- Rust 1.85+ (edition 2024)
- CUDA Toolkit 12.x (for GPU support on Linux)
- Metal (for GPU support on macOS)
Installation
```shell
# Clone the repository (repository URL not given in this README)
git clone <repository-url>
cd vecboost

# Build with default features (CPU only)
cargo build --release

# Build with CUDA support (Linux) -- feature name assumed
cargo build --release --features cuda

# Build with Metal support (macOS) -- feature name assumed
cargo build --release --features metal

# Build with all features
cargo build --release --all-features
```
Configuration
Copy the example configuration and customize:
```shell
cp config.toml config_custom.toml
# Edit config_custom.toml with your settings
```
Running
```shell
# Run with default configuration
cargo run --release

# Run with custom configuration -- flag name assumed
cargo run --release -- --config config_custom.toml
```
The service will start on http://localhost:9002 by default.
Docker
```shell
# Build the image
docker build -t vecboost:latest .

# Run the container
docker run -p 9002:9002 vecboost:latest
```
Documentation
- User Guide - Detailed usage instructions
- API Reference - REST API and gRPC documentation
- Architecture - System design and components
- Contributing - Contribution guidelines
API Usage
HTTP REST API
Generate embeddings via HTTP:
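A hedged example request; the endpoint path (`/embed`) and the request field names are assumptions for illustration, not confirmed by this README:

```shell
curl -s -X POST http://localhost:9002/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, VecBoost!"}'
```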
Response:
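A sketch of what the response body might look like (field names and values are illustrative assumptions):

```json
{
  "embedding": [0.0123, -0.0456, 0.0789],
  "dimension": 1024,
  "model": "BAAI/bge-m3"
}
```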
gRPC API
The service also exposes a gRPC interface on port 50051 (configurable):
```proto
service EmbeddingService {
  rpc Embed(EmbedRequest) returns (EmbedResponse);
  rpc EmbedBatch(BatchEmbedRequest) returns (BatchEmbedResponse);
  rpc ComputeSimilarity(SimilarityRequest) returns (SimilarityResponse);
}
```
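The `ComputeSimilarity` RPC compares two embedding vectors. Cosine similarity is the usual metric for embeddings (the service's actual metric is not stated in this README); a minimal sketch:

```rust
// Cosine similarity between two embedding vectors (illustrative only;
// the actual SimilarityResponse semantics are defined by the service).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0_f32, 0.0, 1.0];
    let b = [1.0_f32, 0.0, 1.0];
    let c = [0.0_f32, 1.0, 0.0];
    // Identical vectors score 1.0; orthogonal vectors score 0.0.
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);
}
```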
OpenAPI Documentation
Access the interactive API documentation at:
- Swagger UI: http://localhost:9002/swagger-ui/
- ReDoc: http://localhost:9002/redoc/
Configuration
Key Configuration Options
```toml
# Section and key names below are reconstructed for illustration --
# consult the bundled config.toml for the authoritative names.
[server]
host = "0.0.0.0"
port = 9002

[model]
repo = "BAAI/bge-m3"   # HuggingFace model ID
normalize = true
batch_size = 32
max_length = 1024

[cache]
enabled = true
capacity = 1024

[auth]
enabled = true
jwt_secret = "your-secret-key"
```
See Configuration Guide for all options.
Architecture
```text
+---------------------------------------------------------------+
|                        VecBoost Service                       |
+---------------------------------------------------------------+
|  +-------------+   +-------------+   +---------------------+  |
|  |  HTTP/gRPC  |   | Auth Layer  |   |    Rate Limiting    |  |
|  |  Endpoints  |   | (JWT/CSRF)  |   |   (Token Bucket)    |  |
|  +-------------+   +-------------+   +---------------------+  |
|         |                 |                     |             |
|         +-----------------+---------------------+             |
|                           v                                   |
|  +---------------------------------------------------------+  |
|  |                    Request Pipeline                     |  |
|  |  Priority Queue -> Request Workers -> Response Channel  |  |
|  +---------------------------------------------------------+  |
|                           v                                   |
|  +---------------------------------------------------------+  |
|  |                   Embedding Service                     |  |
|  |  Text Chunking -> Inference Engine -> Vector Cache      |  |
|  |                                       (LRU/LFU/KV)      |  |
|  +---------------------------------------------------------+  |
|                           v                                   |
|  +---------------------------------------------------------+  |
|  |                    Inference Engine                     |  |
|  |       Candle (Native)        |       ONNX Runtime       |  |
|  +---------------------------------------------------------+  |
|                           v                                   |
|    +-----------+       +-----------+       +-----------+      |
|    |    CPU    |       |   CUDA    |       |   Metal   |      |
|    +-----------+       +-----------+       +-----------+      |
+---------------------------------------------------------------+
```
Project Structure
```text
vecboost/
├── src/
│   ├── audit/        # Audit logging
│   ├── auth/         # Authentication (JWT, CSRF)
│   ├── cache/        # Multi-tier caching (LRU, LFU, KV)
│   ├── config/       # Configuration management
│   ├── device/       # Device management (CPU, CUDA, Metal)
│   ├── engine/       # Inference engines (Candle, ONNX)
│   ├── grpc/         # gRPC server
│   ├── metrics/      # Prometheus metrics
│   ├── model/        # Model downloading and management
│   ├── pipeline/     # Request pipeline and prioritization
│   ├── rate_limit/   # Rate limiting
│   ├── routes/       # HTTP routes
│   ├── security/     # Security utilities
│   ├── service/      # Core embedding service
│   └── text/         # Text processing and tokenization
├── examples/gpu/     # GPU example programs
├── proto/            # gRPC protocol definitions
├── deployments/      # Kubernetes deployment configs
├── tests/            # Integration tests
└── config.toml       # Default configuration
```
Performance
| Metric | Value |
|---|---|
| Embedding Dimension | Up to 4096 |
| Batch Size | Up to 256 |
| Requests/Second | 1000+ (CPU) |
| Latency (p99) | < 50ms (GPU) |
| Cache Hit Ratio | > 90% (with 1024 entries) |
Security
- Authentication: JWT tokens with configurable expiration
- Authorization: Role-based access control
- Audit Logging: All requests logged with user and action details
- Rate Limiting: Per-IP, per-user, and global rate limits
- Encryption: AES-256-GCM for sensitive data at rest
Monitoring
- Prometheus Metrics: /metrics endpoint for Prometheus scraping
- Health Checks: /health endpoint for liveness/readiness
- OpenAPI Docs: Swagger UI at /swagger-ui/
- Grafana Dashboards: Pre-configured dashboards in deployments/
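Once the service is running, the documented endpoints can be probed directly (port 9002 is the stated default):

```shell
curl -s http://localhost:9002/health    # liveness/readiness status
curl -s http://localhost:9002/metrics   # Prometheus text-format metrics
```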
Deployment
Kubernetes
```shell
# Deploy to Kubernetes -- manifests live in deployments/
kubectl apply -f deployments/
```
See Deployment Guide for detailed instructions.
Docker Compose
```yaml
services:
  vecboost:
    image: vecboost:latest
    ports:
      - "9002:9002"
    volumes:
      - ./config.toml:/app/config.toml
    environment:
      - MODEL_REPO=BAAI/bge-m3
```
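Assuming the snippet above is saved as docker-compose.yml, the stack can be started and checked with:

```shell
docker compose up -d
docker compose logs -f vecboost   # follow startup logs
```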
Contributing
Contributions are welcome! Please read our Contributing Guide for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Candle - Native Rust ML framework
- ONNX Runtime - Cross-platform ML inference
- Hugging Face Hub - Model repository