# Halldyll Starter RunPod
[![Crates.io](https://img.shields.io/crates/v/halldyll_starter_runpod.svg)](https://crates.io/crates/halldyll_starter_runpod)
[![Documentation](https://docs.rs/halldyll_starter_runpod/badge.svg)](https://docs.rs/halldyll_starter_runpod)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
A comprehensive Rust library for managing RunPod GPU pods with automatic provisioning, state management, and orchestration.
## Features
- **REST API Client** - Create, start, and stop pods via the RunPod REST API
- **GraphQL Client** - Full access to the RunPod GraphQL API for advanced operations
- **State Management** - Persist pod state and compute idempotent action plans
- **Orchestration** - High-level pod management with automatic reconciliation
- **Fully Configurable** - All settings via environment variables (`.env` file)
- **Strict Linting** - Production-ready code with comprehensive lint rules
## Installation
### From crates.io (recommended)
```toml
[dependencies]
halldyll_starter_runpod = "0.1"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```
### From GitHub
```toml
[dependencies]
halldyll_starter_runpod = { git = "https://github.com/Mr-soloDev/halldyll-starter" }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```
## Configuration
Create a `.env` file in your project root:
```env
# Required
RUNPOD_API_KEY=your_api_key_here
RUNPOD_IMAGE_NAME=runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel
# Optional - Pod Configuration
RUNPOD_POD_NAME=my-gpu-pod
RUNPOD_GPU_TYPE_IDS=NVIDIA A40
RUNPOD_GPU_COUNT=1
RUNPOD_CONTAINER_DISK_GB=20
RUNPOD_VOLUME_GB=50
RUNPOD_VOLUME_MOUNT_PATH=/workspace
RUNPOD_PORTS=22/tcp,8888/http
# Optional - Timeouts
RUNPOD_HTTP_TIMEOUT_MS=30000
RUNPOD_READY_TIMEOUT_MS=300000
RUNPOD_POLL_INTERVAL_MS=5000
# Optional - API URLs
RUNPOD_REST_URL=https://rest.runpod.io/v1
RUNPOD_GRAPHQL_URL=https://api.runpod.io/graphql
# Optional - Behavior
RUNPOD_RECONCILE_MODE=reuse
```
### Environment Variables Reference
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `RUNPOD_API_KEY` | ✓ | - | RunPod API key |
| `RUNPOD_IMAGE_NAME` | ✓ | - | Container image (e.g., `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel`) |
| `RUNPOD_POD_NAME` | | `halldyll-pod` | Name for the pod |
| `RUNPOD_GPU_TYPE_IDS` | | `NVIDIA A40` | Comma-separated GPU types (e.g., `NVIDIA A40,NVIDIA RTX 4090`) |
| `RUNPOD_GPU_COUNT` | | `1` | Number of GPUs |
| `RUNPOD_CONTAINER_DISK_GB` | | `20` | Container disk size in GB |
| `RUNPOD_VOLUME_GB` | | `0` | Persistent volume size (0 = no volume) |
| `RUNPOD_VOLUME_MOUNT_PATH` | | `/workspace` | Mount path for persistent volume |
| `RUNPOD_PORTS` | | `22/tcp,8888/http` | Exposed ports (format: `port/protocol`) |
| `RUNPOD_HTTP_TIMEOUT_MS` | | `30000` | HTTP request timeout (ms) |
| `RUNPOD_READY_TIMEOUT_MS` | | `300000` | Pod ready timeout (ms) |
| `RUNPOD_POLL_INTERVAL_MS` | | `5000` | Poll interval for readiness (ms) |
| `RUNPOD_RECONCILE_MODE` | | `reuse` | `reuse` or `recreate` existing pods |
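Before wiring up the library, it can help to sanity-check that the required variables are actually visible to your process. A minimal sketch using only the standard library; it assumes you either export the variables in your shell or load the `.env` file first (e.g. with a crate like `dotenvy`):
```rust
use std::env;

fn main() {
    // The two variables without defaults; everything else falls back to the table above.
    for key in ["RUNPOD_API_KEY", "RUNPOD_IMAGE_NAME"] {
        match env::var(key) {
            Ok(_) => println!("{key} is set"),
            Err(_) => eprintln!("{key} is missing"),
        }
    }
}
```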
### Pod Naming & Multiple Pods
The orchestrator uses the pod name to identify and reuse existing pods:
- **Same name** → Reuses the existing pod (starts it if stopped)
- **Different name** → Creates a new pod
To run multiple pods simultaneously, simply use different names:
```env
# Development pod
RUNPOD_POD_NAME=dev-pod
# Production pod
RUNPOD_POD_NAME=prod-pod
# ML training pod
RUNPOD_POD_NAME=training-pod
```
Each unique name creates a separate pod on RunPod.
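As an illustration, a single binary can drive several pods by overriding the name between config loads. A sketch, where the pod names are placeholders and `ensure_ready_pod` is the orchestrator call shown in the Usage section below:
```rust
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Each unique name maps to its own pod on RunPod.
    for name in ["dev-pod", "training-pod"] {
        // `set_var` is safe on edition 2021; wrap it in `unsafe` on edition 2024.
        std::env::set_var("RUNPOD_POD_NAME", name);
        let cfg = RunpodOrchestratorConfig::from_env()?;
        let orchestrator = RunpodOrchestrator::new(cfg)?;
        let pod = orchestrator.ensure_ready_pod().await?;
        println!("{name} ready: {}", pod.id);
    }
    Ok(())
}
```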
## Usage
### Quick Start with Orchestrator
The orchestrator provides the simplest way to get a ready-to-use pod:
```rust
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load config from .env
    let cfg = RunpodOrchestratorConfig::from_env()?;
    let orchestrator = RunpodOrchestrator::new(cfg)?;

    // Get a ready pod (creates, starts, or reuses as needed)
    let pod = orchestrator.ensure_ready_pod().await?;
    println!("Pod ready: {} at {}", pod.name, pod.public_ip);

    // Get SSH connection info
    if let Some((host, port)) = pod.ssh_endpoint() {
        println!("SSH: ssh -p {} user@{}", port, host);
    }

    // Get Jupyter URL
    if let Some(url) = pod.jupyter_endpoint() {
        println!("Jupyter: {}", url);
    }

    Ok(())
}
```
### Stopping & Terminating Pods
```rust
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodOrchestratorConfig::from_env()?;
    let orchestrator = RunpodOrchestrator::new(cfg)?;

    // Get a ready pod
    let pod = orchestrator.ensure_ready_pod().await?;
    println!("Pod running: {}", pod.id);

    // Do your work...

    // Stop the pod (keeps config, can restart later, stops billing)
    orchestrator.stop_pod(&pod.id).await?;
    println!("Pod stopped!");

    // Or stop by name (uses RUNPOD_POD_NAME from .env)
    // orchestrator.stop_current_pod().await?;

    // Or terminate completely (deletes the pod)
    // orchestrator.terminate(&pod.id).await?;
    // orchestrator.terminate_current_pod().await?;

    Ok(())
}
```
### Auto-stop after timeout
```rust
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodOrchestratorConfig::from_env()?;
    let orchestrator = RunpodOrchestrator::new(cfg)?;

    let pod = orchestrator.ensure_ready_pod().await?;
    println!("Pod running for max 1 hour...");

    // Auto-stop after 1 hour
    tokio::select! {
        _ = tokio::time::sleep(Duration::from_secs(3600)) => {
            println!("Timeout reached, stopping pod...");
            orchestrator.stop_pod(&pod.id).await?;
        }
        // Or wait for your task to complete
        // result = your_long_running_task() => { ... }
    }

    Ok(())
}
```
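A filled-in version of the commented-out second branch might look like the sketch below, where `run_training` is a placeholder for your own async workload:
```rust
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
use std::time::Duration;

// Placeholder for your actual workload.
async fn run_training() -> Result<(), Box<dyn std::error::Error>> {
    tokio::time::sleep(Duration::from_secs(10)).await;
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodOrchestratorConfig::from_env()?;
    let orchestrator = RunpodOrchestrator::new(cfg)?;
    let pod = orchestrator.ensure_ready_pod().await?;

    // Race the workload against a one-hour deadline, then stop the pod either way.
    tokio::select! {
        _ = tokio::time::sleep(Duration::from_secs(3600)) => {
            println!("Timeout reached, stopping pod...");
        }
        result = run_training() => {
            println!("Task finished (ok = {})", result.is_ok());
        }
    }
    orchestrator.stop_pod(&pod.id).await?;
    Ok(())
}
```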
### Low-Level Provisioner
For direct pod creation:
```rust
use halldyll_starter_runpod::{RunpodProvisioner, RunpodProvisionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodProvisionConfig::from_env()?;
    let provisioner = RunpodProvisioner::new(cfg)?;

    let pod = provisioner.create_pod().await?;
    println!("Created pod: {}", pod.id);

    Ok(())
}
```
### Pod Starter (Start/Stop)
For managing existing pods:
```rust
use halldyll_starter_runpod::{RunpodStarter, RunpodStarterConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodStarterConfig::from_env()?;
    let starter = RunpodStarter::new(cfg)?;

    // Start a pod
    let status = starter.start("pod_id_here").await?;
    println!("Pod status: {}", status.desired_status);

    // Stop a pod
    let status = starter.stop("pod_id_here").await?;
    println!("Pod stopped: {}", status.desired_status);

    Ok(())
}
```
### GraphQL Client
For advanced operations:
```rust
use halldyll_starter_runpod::{RunpodClient, RunpodClientConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodClientConfig::from_env()?;
    let client = RunpodClient::new(cfg)?;

    // List all pods
    let pods = client.list_pods().await?;
    for pod in pods {
        println!("Pod: {} ({})", pod.id, pod.desired_status);
    }

    // List available GPU types
    let gpus = client.list_gpu_types().await?;
    for gpu in gpus {
        println!("GPU: {} - Available: {}", gpu.display_name, gpu.available_count);
    }

    Ok(())
}
```
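A common follow-on is to check availability before provisioning, e.g. to fall back to a different GPU type. A sketch using only the `display_name` and `available_count` fields from the example above:
```rust
use halldyll_starter_runpod::{RunpodClient, RunpodClientConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodClientConfig::from_env()?;
    let client = RunpodClient::new(cfg)?;

    // Pick the first GPU type that currently has capacity.
    let gpus = client.list_gpu_types().await?;
    match gpus.iter().find(|g| g.available_count > 0) {
        Some(gpu) => println!("Would provision with: {}", gpu.display_name),
        None => println!("No GPUs available right now"),
    }
    Ok(())
}
```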
### State Management
For persistent state and reconciliation:
```rust
use halldyll_starter_runpod::{RunPodState, JsonFileStateStore, PlannedAction};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = JsonFileStateStore::new("./pod_state.json");

    // Load existing state
    let mut state = store.load()?.unwrap_or_default();

    // Record a pod
    state.record_pod("pod-123", "my-pod", "runpod/pytorch:latest");

    // Compute reconciliation plan
    let action = state.reconcile("my-pod", "runpod/pytorch:latest");
    match action {
        PlannedAction::DoNothing(id) => println!("Pod {} is ready", id),
        PlannedAction::Start(id) => println!("Need to start pod {}", id),
        PlannedAction::Create => println!("Need to create new pod"),
    }

    // Save state
    store.save(&state)?;
    Ok(())
}
```
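To close the loop, the plan can be wired to the lower-level clients from the earlier sections. A sketch, assuming the `Start` and `DoNothing` variants carry the pod id as shown above:
```rust
use halldyll_starter_runpod::{
    JsonFileStateStore, PlannedAction, RunpodProvisionConfig, RunpodProvisioner,
    RunpodStarter, RunpodStarterConfig,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = JsonFileStateStore::new("./pod_state.json");
    let mut state = store.load()?.unwrap_or_default();

    // Turn the reconciliation plan into an API call.
    match state.reconcile("my-pod", "runpod/pytorch:latest") {
        PlannedAction::DoNothing(id) => println!("Pod {} is already running", id),
        PlannedAction::Start(id) => {
            let starter = RunpodStarter::new(RunpodStarterConfig::from_env()?)?;
            starter.start(&id).await?;
        }
        PlannedAction::Create => {
            let provisioner = RunpodProvisioner::new(RunpodProvisionConfig::from_env()?)?;
            let pod = provisioner.create_pod().await?;
            state.record_pod(&pod.id, "my-pod", "runpod/pytorch:latest");
        }
    }

    store.save(&state)?;
    Ok(())
}
```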
## Modules
| Module | Description |
|--------|-------------|
| `runpod_provisioner` | Create new pods via REST API |
| `runpod_starter` | Start/stop existing pods via REST API |
| `runpod_state` | State persistence and reconciliation |
| `runpod_client` | GraphQL client for advanced operations |
| `runpod_orchestrator` | High-level pod management |
## GPU Types
Common GPU types available on RunPod:
| GPU | Type ID |
|-----|---------|
| NVIDIA A40 | `NVIDIA A40` |
| NVIDIA A100 80GB | `NVIDIA A100 80GB PCIe` |
| NVIDIA RTX 4090 | `NVIDIA GeForce RTX 4090` |
| NVIDIA RTX 3090 | `NVIDIA GeForce RTX 3090` |
| NVIDIA L40S | `NVIDIA L40S` |
Use `client.list_gpu_types()` to get the full list with availability.
## Running the Example
```bash
# Clone the project
git clone https://github.com/Mr-soloDev/halldyll-starter.git
cd halldyll-starter
# Create your .env file
cp .env.example .env
# Edit .env with your API key and settings
# Run the example
cargo run
```
## Building
```bash
# Debug build
cargo build
# Release build
cargo build --release
# Check without building
cargo check
# Run with all lints
cargo clippy -- -D warnings
```
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
MIT