# Halldyll Deploy Pods
[Crates.io](https://crates.io/crates/halldyll_deploy_pods) · [Docs.rs](https://docs.rs/halldyll_deploy_pods) · [License: MIT](https://opensource.org/licenses/MIT) · [Rust](https://www.rust-lang.org)
**A declarative, idempotent, and reconcilable deployment system for [RunPod](https://runpod.io) GPU pods.**
Think of it as **Terraform/Kubernetes for RunPod** — define your GPU infrastructure as code, and let Halldyll handle the rest.
## Features
- **Declarative** — Define your infrastructure in a simple YAML file
- **Idempotent** — Run `apply` multiple times, get the same result
- **Drift Detection** — Automatically detect and fix configuration drift
- **Reconciliation Loop** — Continuously converge to desired state
- **State Management** — Track deployments locally or on S3
- **Multi-environment** — Support for dev, staging, prod environments
- **Guardrails** — Cost limits, GPU limits, TTL auto-stop
- **Auto Model Download** — Automatically download HuggingFace models on pod startup
- **Inference Engines** — Auto-start vLLM, TGI, or Ollama with your models
## Installation
### From Crates.io
```bash
cargo install halldyll_deploy_pods
```
### From Source
```bash
git clone https://github.com/Mr-soloDev/halldyll_deploy_pods.git
cd halldyll_deploy_pods
cargo install --path .
```
## Quick Start
### 1. Initialize a new project
```bash
halldyll init my-project
cd my-project
```
### 2. Configure your deployment
Edit `halldyll.deploy.yaml`:
```yaml
project:
  name: "my-ml-stack"
  environment: "prod"
  cloud_type: SECURE

state:
  backend: local

pods:
  - name: "inference"
    gpu:
      type: "NVIDIA A40"
      count: 1
    runtime:
      image: "vllm/vllm-openai:latest"
      env:
        MODEL_NAME: "meta-llama/Llama-3-8B"
    ports:
      - "8000/http"
    volumes:
      - name: "hf-cache"
        mount: "/root/.cache/huggingface"
        persistent: true
```
### 3. Set your RunPod API key
```bash
export RUNPOD_API_KEY="your-api-key"
```
### 4. Deploy!
```bash
halldyll plan # Preview changes
halldyll apply # Deploy to RunPod
halldyll status # Check deployment status
```
## Commands
| Command | Description |
|---------|-------------|
| `halldyll init [path]` | Initialize a new project |
| `halldyll validate` | Validate configuration file |
| `halldyll plan` | Show deployment plan (dry-run) |
| `halldyll apply` | Apply the deployment plan |
| `halldyll status` | Show current deployment status |
| `halldyll reconcile` | Auto-fix drift from desired state |
| `halldyll drift` | Detect configuration drift |
| `halldyll destroy` | Destroy all deployed resources |
| `halldyll logs <pod>` | View pod logs |
| `halldyll state` | Manage deployment state |
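
A typical session chains these commands together. The sketch below uses only the commands listed above and assumes an initialized project with `RUNPOD_API_KEY` set (`inference` is the pod name from the Quick Start example):

```bash
# Validate the config, preview the plan, then deploy
halldyll validate
halldyll plan
halldyll apply
halldyll status

# Later: detect drift and converge back to the desired state
halldyll drift
halldyll reconcile

# Inspect a pod's logs, then tear everything down
halldyll logs inference
halldyll destroy
```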
## Configuration Reference
### Project Configuration
```yaml
project:
  name: "my-project"        # Required: unique project name
  environment: "dev"        # Optional: dev, staging, prod (default: dev)
  region: "EU"              # Optional: EU, US, etc.
  cloud_type: SECURE        # Optional: SECURE or COMMUNITY
  compute_type: GPU         # Optional: GPU or CPU
```
### State Backend
```yaml
state:
  backend: local            # local or s3
  # For S3:
  bucket: "my-state-bucket"
  prefix: "halldyll/my-project"
  region: "us-east-1"
```
### Pod Configuration
```yaml
pods:
  - name: "my-pod"
    gpu:
      type: "NVIDIA A40"          # GPU type
      count: 1                    # Number of GPUs
      min_vram_gb: 40             # Optional: minimum VRAM
      fallback:                   # Optional: fallback GPU types
        - "NVIDIA L40S"
        - "NVIDIA RTX A6000"
    ports:
      - "22/tcp"                  # SSH
      - "8000/http"               # HTTP endpoint
    volumes:
      - name: "data"
        mount: "/data"
        persistent: true
        size_gb: 100
    runtime:
      image: "runpod/pytorch:2.1.0-py3.10-cuda11.8.0"
      env:
        MY_VAR: "value"
    health_check:
      endpoint: "/health"
      port: 8000
      interval_secs: 30
      timeout_secs: 5
```
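
The `health_check` block above declares the HTTP endpoint, port, and polling interval used to judge pod health. You can hit the same endpoint yourself to verify a deployment from outside; `<pod-host>` below is only a placeholder for whatever address RunPod exposes for the pod's HTTP port:

```bash
# Manual spot check of the configured health endpoint (placeholder host)
curl -fsS "http://<pod-host>:8000/health" && echo "pod is healthy"
```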
### Model Configuration (Auto-download and Start)
```yaml
pods:
  - name: "llm-server"
    gpu:
      type: "NVIDIA A40"
      count: 1
    runtime:
      image: "vllm/vllm-openai:latest"
    ports:
      - "8000/http"
    # Models are automatically downloaded and engines started
    models:
      - id: "llama-3-8b"
        provider: huggingface       # huggingface, bundle, or custom
        repo: "meta-llama/Meta-Llama-3-8B-Instruct"
        load:
          engine: vllm              # vllm, tgi, ollama, or transformers
          quant: awq                # Optional: awq, gptq, fp8
          max_seq_len: 8192         # Optional: max sequence length
          options:                  # Optional: engine-specific options
            tensor-parallel-size: 1
```
### Supported Inference Engines
| Engine | Description | Auto-start | Best For |
|--------|-------------|------------|----------|
| `vllm` | High-performance LLM serving | Yes | Production LLM APIs, OpenAI-compatible |
| `tgi` | HuggingFace Text Generation Inference | Yes | HuggingFace models, streaming |
| `ollama` | Easy-to-use LLM runner | Yes | Local development, quick testing |
| `transformers` | HuggingFace Transformers library | No | Custom scripts, fine-tuning |
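
Once a vLLM-backed pod is up, you can call its OpenAI-compatible API directly. This is only a sketch: `<pod-host>` stands for whatever address RunPod assigns to the pod's `8000/http` port, and the `model` field must match the repo the engine was started with:

```bash
# Query the OpenAI-compatible chat endpoint served by vLLM (placeholder host)
curl "http://<pod-host>:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```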
### Multi-Model Deployment Example
Deploy different models on different pods:
```yaml
pods:
  # LLM API Server
  - name: "llm-api"
    gpu:
      type: "NVIDIA A40"
      count: 1
    runtime:
      image: "vllm/vllm-openai:latest"
    ports:
      - "8000/http"
    models:
      - id: "llama-3-8b"
        provider: huggingface
        repo: "meta-llama/Meta-Llama-3-8B-Instruct"
        load:
          engine: vllm
          max_seq_len: 8192

  # Embedding Server
  - name: "embeddings"
    gpu:
      type: "NVIDIA RTX 4090"
      count: 1
    runtime:
      image: "ghcr.io/huggingface/text-embeddings-inference:latest"
    ports:
      - "8080/http"
    models:
      - id: "bge-large"
        provider: huggingface
        repo: "BAAI/bge-large-en-v1.5"
        load:
          engine: tgi

  # Vision Model
  - name: "vision-api"
    gpu:
      type: "NVIDIA A40"
      count: 1
    runtime:
      image: "ghcr.io/huggingface/text-generation-inference:latest"
    ports:
      - "8000/http"
    models:
      - id: "llava"
        provider: huggingface
        repo: "llava-hf/llava-v1.6-mistral-7b-hf"
        load:
          engine: tgi
```
### Quantization Options
Reduce memory usage with quantization:
```yaml
models:
  - id: "llama-70b-awq"
    provider: huggingface
    repo: "TheBloke/Llama-2-70B-Chat-AWQ"
    load:
      engine: vllm
      quant: awq            # 4-bit AWQ quantization
      max_seq_len: 4096
```
| Method | Memory Savings | Quality | Speed |
|--------|----------------|---------|-------|
| `awq` | ~75% | High | Fast |
| `gptq` | ~75% | High | Medium |
| `fp8` | ~50% | Very High | Fast |
### Guardrails (Optional)
```yaml
guardrails:
  max_hourly_cost: 10.0       # Maximum hourly cost in USD
  max_gpus: 4                 # Maximum total GPUs
  ttl_hours: 24               # Auto-stop after N hours
  allow_gpu_fallback: false   # Allow fallback to other GPU types
```
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                  halldyll.deploy.yaml                    │
│                     (Desired State)                      │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                ConfigParser + Validator                  │
└───────────────────────┬─────────────────────────────────┘
                        │
         ┌──────────────┴──────────────┐
         ▼                             ▼
┌─────────────────┐           ┌─────────────────┐
│   StateStore    │           │   PodObserver   │
│  (Local or S3)  │           │  (RunPod API)   │
└────────┬────────┘           └────────┬────────┘
         │                             │
         └──────────────┬──────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│                       DiffEngine                         │
│              (Compare Desired vs Observed)               │
└───────────────────────┬─────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────┐
│                       Reconciler                         │
│             (Execute Plan → Converge State)              │
└─────────────────────────────────────────────────────────┘
```
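
In CLI terms, `halldyll plan` and `halldyll drift` roughly correspond to the DiffEngine (comparing desired vs. observed state), while `halldyll apply` and `halldyll reconcile` hand the resulting plan to the Reconciler. Because apply and reconcile are idempotent, one simple way to keep a deployment continuously converged is to re-run reconciliation on a schedule; the loop below is an illustration, not a built-in feature:

```bash
# Illustrative convergence loop: correct any drift every 5 minutes
export RUNPOD_API_KEY="your-api-key"
while true; do
  halldyll reconcile
  sleep 300
done
```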
## Library Usage
You can also use Halldyll as a library in your Rust projects:
```rust
use halldyll_deploy_pods::{
    ConfigParser, ConfigValidator, DeployConfig,
    RunPodClient, PodProvisioner, PodObserver, PodExecutor,
    Reconciler, StateStore, LocalStateStore,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse configuration
    let config = ConfigParser::parse_file("halldyll.deploy.yaml")?;

    // Validate
    ConfigValidator::validate(&config)?;

    // Create RunPod client
    let client = RunPodClient::new(&std::env::var("RUNPOD_API_KEY")?)?;

    // Create provisioner and deploy with auto model setup
    let provisioner = PodProvisioner::new(client.clone());
    let (pod, setup_result) = provisioner.create_pod_with_setup(
        &config.pods[0],
        &config.project,
        "config-hash",
    ).await?;

    // Check model setup results
    if let Some(result) = setup_result {
        println!("Setup: {}", result.summary());
    }

    Ok(())
}
```
## Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| `RUNPOD_API_KEY` | Your RunPod API key | Yes |
| `HF_TOKEN` | HuggingFace API token (for gated models like Llama) | For gated models |
| `HALLDYLL_CONFIG` | Path to config file | No |
| `AWS_ACCESS_KEY_ID` | AWS credentials (for S3 state) | No |
| `AWS_SECRET_ACCESS_KEY` | AWS credentials (for S3 state) | No |
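
For example, a deployment that serves a gated Llama model and stores its state in S3 would export something like the following before running `halldyll apply` (all values are placeholders):

```bash
export RUNPOD_API_KEY="your-runpod-api-key"
export HF_TOKEN="hf_..."                          # needed for gated models such as Llama
export AWS_ACCESS_KEY_ID="your-access-key-id"     # only for the S3 state backend
export AWS_SECRET_ACCESS_KEY="your-secret-key"    # only for the S3 state backend
export HALLDYLL_CONFIG="./halldyll.deploy.yaml"   # optional: non-default config path
```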
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Author
**Geryan Roy** ([@Mr-soloDev](https://github.com/Mr-soloDev))
- Email: geryan.roy@icloud.com
## Acknowledgments
- [RunPod](https://runpod.io) for the amazing GPU cloud platform
- Inspired by Terraform, Kubernetes, and other declarative infrastructure tools