docs.rs failed to build vecboost-0.1.0
VecBoost
A high-performance, production-ready embedding vector service written in Rust. VecBoost provides efficient text vectorization with support for multiple inference engines, GPU acceleration, and enterprise-grade features.
Features
- High Performance: Optimized Rust codebase with batch processing and concurrent request handling
- Multiple Engines: Support for Candle (native Rust) and ONNX Runtime inference engines
- GPU Acceleration: Native CUDA support (NVIDIA) and Metal support (Apple Silicon)
- Smart Caching: Multi-tier caching with LRU, LFU, and KV cache strategies
- Enterprise Security: JWT authentication, CSRF protection, and audit logging
- Rate Limiting: Configurable rate limiting with token bucket algorithm
- Priority Queue: Request prioritization with configurable priority weights
- Dual APIs: gRPC and HTTP/REST interfaces with OpenAPI documentation
- Kubernetes Ready: Production deployment configurations included
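The token-bucket rate limiting mentioned above can be sketched as follows. This is an illustrative model only, not VecBoost's actual implementation: tokens refill at a steady rate up to a burst capacity, and each request consumes one token.

```rust
use std::time::Instant;

// Illustrative token bucket: `capacity` bounds burst size,
// `refill_per_sec` sets the sustained request rate.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last_refill: Instant::now() }
    }

    // Refill tokens based on elapsed time, then try to take one token.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Allow a burst of 2 requests, then 1 request/second sustained.
    let mut bucket = TokenBucket::new(2.0, 1.0);
    assert!(bucket.try_acquire());
    assert!(bucket.try_acquire());
    assert!(!bucket.try_acquire()); // bucket drained; third request rejected
    println!("rate limiter ok");
}
```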
Quick Start
Prerequisites
- Rust 1.85+ (edition 2024)
- CUDA Toolkit 12.x (for GPU support on Linux)
- Metal (for GPU support on macOS)
Installation
```shell
# Clone the repository (repository URL not given in this README)
git clone <repository-url>
cd vecboost

# Build with default features (CPU only)
cargo build --release

# Build with CUDA support (Linux) -- feature name assumed
cargo build --release --features cuda

# Build with Metal support (macOS) -- feature name assumed
cargo build --release --features metal

# Build with all features
cargo build --release --all-features
```
Configuration
Copy the example configuration and customize:
```shell
cp config.toml config_custom.toml
# Edit config_custom.toml with your settings
```
Running
```shell
# Run with default configuration
cargo run --release

# Run with custom configuration -- flag name assumed
cargo run --release -- --config config_custom.toml
```
The service will start on http://localhost:9002 by default.
Docker
```shell
# Build the image
docker build -t vecboost:latest .

# Run the container
docker run -p 9002:9002 vecboost:latest
```
Documentation
- User Guide - Detailed usage instructions
- API Reference - REST API and gRPC documentation
- Architecture - System design and components
- Contributing - Contribution guidelines
API Usage
HTTP REST API
Generate embeddings via HTTP:
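A hedged example request; the endpoint path (`/embed`) and the request field names are assumptions for illustration, not confirmed by this README:

```shell
curl -s -X POST http://localhost:9002/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, VecBoost!"}'
```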
Response:
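A sketch of what the response body might look like (field names and values are illustrative assumptions):

```json
{
  "embedding": [0.0123, -0.0456, 0.0789],
  "dimension": 1024,
  "model": "BAAI/bge-m3"
}
```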
gRPC API
The service also exposes a gRPC interface on port 50051 (configurable):
```proto
service EmbeddingService {
  rpc Embed(EmbedRequest) returns (EmbedResponse);
  rpc EmbedBatch(BatchEmbedRequest) returns (BatchEmbedResponse);
  rpc ComputeSimilarity(SimilarityRequest) returns (SimilarityResponse);
}
```
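The `ComputeSimilarity` RPC compares two embedding vectors. Cosine similarity is the usual metric for embeddings (the service's actual metric is not stated in this README); a minimal sketch:

```rust
// Cosine similarity between two embedding vectors (illustrative only;
// the actual SimilarityResponse semantics are defined by the service).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0_f32, 0.0, 1.0];
    let b = [1.0_f32, 0.0, 1.0];
    let c = [0.0_f32, 1.0, 0.0];
    // Identical vectors score 1.0; orthogonal vectors score 0.0.
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);
}
```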
OpenAPI Documentation
Access the interactive API documentation at:
- Swagger UI: http://localhost:9002/swagger-ui/
- ReDoc: http://localhost:9002/redoc/
Configuration
Key Configuration Options
```toml
# Section and key names below are reconstructed for illustration --
# consult the bundled config.toml for the authoritative names.
[server]
host = "0.0.0.0"
port = 9002

[model]
repo = "BAAI/bge-m3"   # HuggingFace model ID
normalize = true
batch_size = 32
max_length = 1024

[cache]
enabled = true
capacity = 1024

[auth]
enabled = true
jwt_secret = "your-secret-key"
```
See Configuration Guide for all options.
Architecture
```text
+---------------------------------------------------------------+
|                        VecBoost Service                       |
+---------------------------------------------------------------+
|  +-------------+   +-------------+   +---------------------+  |
|  |  HTTP/gRPC  |   | Auth Layer  |   |    Rate Limiting    |  |
|  |  Endpoints  |   | (JWT/CSRF)  |   |   (Token Bucket)    |  |
|  +-------------+   +-------------+   +---------------------+  |
|         |                 |                     |             |
|         +-----------------+---------------------+             |
|                           v                                   |
|  +---------------------------------------------------------+  |
|  |                    Request Pipeline                     |  |
|  |  Priority Queue -> Request Workers -> Response Channel  |  |
|  +---------------------------------------------------------+  |
|                           v                                   |
|  +---------------------------------------------------------+  |
|  |                   Embedding Service                     |  |
|  |  Text Chunking -> Inference Engine -> Vector Cache      |  |
|  |                                       (LRU/LFU/KV)      |  |
|  +---------------------------------------------------------+  |
|                           v                                   |
|  +---------------------------------------------------------+  |
|  |                    Inference Engine                     |  |
|  |       Candle (Native)        |       ONNX Runtime       |  |
|  +---------------------------------------------------------+  |
|                           v                                   |
|    +-----------+       +-----------+       +-----------+      |
|    |    CPU    |       |   CUDA    |       |   Metal   |      |
|    +-----------+       +-----------+       +-----------+      |
+---------------------------------------------------------------+
```
Project Structure
```text
vecboost/
├── src/
│   ├── audit/        # Audit logging
│   ├── auth/         # Authentication (JWT, CSRF)
│   ├── cache/        # Multi-tier caching (LRU, LFU, KV)
│   ├── config/       # Configuration management
│   ├── device/       # Device management (CPU, CUDA, Metal)
│   ├── engine/       # Inference engines (Candle, ONNX)
│   ├── grpc/         # gRPC server
│   ├── metrics/      # Prometheus metrics
│   ├── model/        # Model downloading and management
│   ├── pipeline/     # Request pipeline and prioritization
│   ├── rate_limit/   # Rate limiting
│   ├── routes/       # HTTP routes
│   ├── security/     # Security utilities
│   ├── service/      # Core embedding service
│   └── text/         # Text processing and tokenization
├── examples/gpu/     # GPU example programs
├── proto/            # gRPC protocol definitions
├── deployments/      # Kubernetes deployment configs
├── tests/            # Integration tests
└── config.toml       # Default configuration
```
Performance
| Metric | Value |
|---|---|
| Embedding Dimension | Up to 4096 |
| Batch Size | Up to 256 |
| Requests/Second | 1000+ (CPU) |
| Latency (p99) | < 50ms (GPU) |
| Cache Hit Ratio | > 90% (with 1024 entries) |
Security
- Authentication: JWT tokens with configurable expiration
- Authorization: Role-based access control
- Audit Logging: All requests logged with user and action details
- Rate Limiting: Per-IP, per-user, and global rate limits
- Encryption: AES-256-GCM for sensitive data at rest
Monitoring
- Prometheus Metrics: /metrics endpoint for Prometheus scraping
- Health Checks: /health endpoint for liveness/readiness
- OpenAPI Docs: Swagger UI at /swagger-ui/
- Grafana Dashboards: Pre-configured dashboards in deployments/
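Once the service is running, the documented endpoints can be probed directly (port 9002 is the stated default):

```shell
curl -s http://localhost:9002/health    # liveness/readiness status
curl -s http://localhost:9002/metrics   # Prometheus text-format metrics
```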
Deployment
Kubernetes
```shell
# Deploy to Kubernetes -- manifests live in deployments/
kubectl apply -f deployments/
```
See Deployment Guide for detailed instructions.
Docker Compose
```yaml
services:
  vecboost:
    image: vecboost:latest
    ports:
      - "9002:9002"
    volumes:
      - ./config.toml:/app/config.toml
    environment:
      - MODEL_REPO=BAAI/bge-m3
```
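Assuming the snippet above is saved as docker-compose.yml, the stack can be started and checked with:

```shell
docker compose up -d
docker compose logs -f vecboost   # follow startup logs
```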
Contributing
Contributions are welcome! Please read our Contributing Guide for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Candle - Native Rust ML framework
- ONNX Runtime - Cross-platform ML inference
- Hugging Face Hub - Model repository