# Creating a Custom HPC Profile
This tutorial walks you through creating a custom HPC profile for a cluster that Torc doesn't have
built-in support for.
## Before You Start
> **Request Built-in Support First!**
>
> If your HPC system is widely used, consider requesting that Torc developers add it as a built-in
> profile. This benefits everyone using that system.
>
> Open an issue at
> [github.com/NatLabRockies/torc/issues](https://github.com/NatLabRockies/torc/issues) with:
>
> - Your HPC system name and organization
> - Partition names and their resource limits (CPUs, memory, walltime, GPUs)
> - How to detect the system (environment variable or hostname pattern)
> - Any special requirements (minimum nodes, exclusive partitions, etc.)
>
> Built-in profiles are maintained by the Torc team and stay up-to-date as systems change.
## Dynamic Slurm Support (The Easiest Way)
Before creating a custom profile, try using Torc's **Dynamic Slurm Support**. Torc can automatically
query your cluster to discover its partitions and resource limits.
If you are on a Slurm system, you can use Torc immediately without any configuration:
1. **Auto-detection**: Torc automatically falls back to dynamic Slurm detection if no other profile
matches.
2. **Explicit use**: You can force dynamic detection by using `--hpc-profile slurm` in any command.
Verify it works on your system:
```bash
# Show partitions detected from your Slurm cluster
torc hpc partitions slurm
```
If the detected partitions look correct, you don't need to create a custom profile! You can jump
straight to [Step 7: Use Your Profile](#step-7-use-your-profile) using `slurm` as the profile name.
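For example, the Step 7 commands shown later in this tutorial work unchanged with the dynamic profile name substituted (assuming your account is `my_project`):
```bash
# Generate schedulers with the dynamic 'slurm' profile, then submit
torc slurm generate --account my_project --profile slurm -o workflow_slurm.yaml workflow.yaml
torc submit workflow_slurm.yaml
```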
## When to Create a Custom Profile
Create a custom profile when:
- Your HPC isn't supported and you need to use it immediately
- You have a private or internal cluster
- You want to test profile configurations before submitting upstream
## Quick Start: Auto-Generate from Slurm
If you're on a Slurm cluster, you can automatically generate a profile from the cluster
configuration:
```bash
# Generate profile from current Slurm cluster
torc hpc generate
# Specify a custom name
torc hpc generate --name mycluster --display-name "My Research Cluster"
# Skip standby/preemptible partitions
torc hpc generate --skip-stdby
# Save to a file
torc hpc generate --skip-stdby -o mycluster-profile.toml
```
This queries `sinfo` and `scontrol` to extract:
- Partition names, CPUs, memory, and time limits
- GPU configuration from GRES
- Node sharing settings
- Hostname-based detection pattern
The generated profile can be added directly to your config file. You may want to review and adjust:
- `requires_explicit_request`: Set to `true` for partitions that shouldn't be auto-selected
- `description`: Add human-readable descriptions for each partition
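As a sketch of what those adjustments might look like, here is a hypothetical generated partition entry (the partition name `debug` and its limits are made up; `description` and `requires_explicit_request` are the lines you would add by hand):
```toml
[[client.hpc.custom_profiles.mycluster.partitions]]
name = "debug"                         # hypothetical partition from the generator
cpus_per_node = 48
memory_mb = 192000
max_walltime_secs = 3600               # 1 hour
description = "Short debug runs only"  # human-readable note you add
requires_explicit_request = true       # keep it out of auto-selection
```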
After generation, skip to [Step 4: Verify the Profile](#step-4-verify-the-profile).
## Manual Profile Creation
If automatic generation isn't available or you need more control, follow these steps.
### Step 1: Gather Partition Information
Collect information about your HPC's partitions. On most Slurm systems:
```bash
# List all partitions (summary view)
sinfo -s
# Detailed partition info:
#   %P = partition, %c = CPUs per node, %m = memory per node (MB),
#   %l = time limit, %G = generic resources (GRES, e.g. GPUs)
sinfo -o "%P %c %m %l %G"
```
For this tutorial, let's say your cluster "ResearchCluster" has these partitions:
| Partition | CPUs/Node | Memory/Node | Max Walltime | GPUs    |
|-----------|-----------|-------------|--------------|---------|
| `batch`   | 48        | 192 GB      | 72 hours     | -       |
| `short`   | 48        | 192 GB      | 4 hours      | -       |
| `gpu`     | 32        | 256 GB      | 48 hours     | 4x A100 |
| `himem`   | 48        | 1024 GB     | 48 hours     | -       |
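On a real cluster, the `sinfo` command above reports the raw values behind such a table. An illustrative (made-up) listing consistent with these partitions might look like this, with memory in MB and time limits in Slurm's `days-hours:minutes:seconds` form:
```
PARTITION CPUS MEMORY  TIMELIMIT   GRES
batch     48   192000  3-00:00:00  (null)
short     48   192000  4:00:00     (null)
gpu       32   256000  2-00:00:00  gpu:a100:4
himem     48   1048576 2-00:00:00  (null)
```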
### Step 2: Identify Detection Method
Determine how Torc can detect when you're on this system. Common methods:
**Environment variable** (most common):
```bash
echo $CLUSTER_NAME # e.g., "research"
echo $SLURM_CLUSTER # e.g., "researchcluster"
```
**Hostname pattern**:
```bash
hostname # e.g., "login01.research.edu"
```
For this tutorial, we'll use the environment variable `CLUSTER_NAME=research`.
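Before editing any configuration, a quick shell check (plain POSIX shell, nothing Torc-specific) confirms the variable has the value the profile will look for:
```bash
# Confirm the detection variable is set as expected on a login node
if [ "$CLUSTER_NAME" = "research" ]; then
    echo "CLUSTER_NAME matches; the 'research' profile should auto-detect here"
else
    echo "CLUSTER_NAME is '${CLUSTER_NAME:-unset}'; adjust detect_env_var to match"
fi
```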
### Step 3: Create the Configuration File
Create or edit your Torc configuration file:
```bash
# Linux
mkdir -p ~/.config/torc
nano ~/.config/torc/config.toml
# macOS
mkdir -p ~/Library/Application\ Support/torc
nano ~/Library/Application\ Support/torc/config.toml
```
Add your custom profile:
```toml
# Custom HPC Profile for ResearchCluster
[client.hpc.custom_profiles.research]
display_name = "Research Cluster"
description = "University Research HPC System"
detect_env_var = "CLUSTER_NAME=research"
default_account = "my_project"
# Batch partition - general purpose
[[client.hpc.custom_profiles.research.partitions]]
name = "batch"
cpus_per_node = 48
memory_mb = 192000 # 192 GB in MB
max_walltime_secs = 259200 # 72 hours in seconds
shared = false
# Short partition - quick jobs
[[client.hpc.custom_profiles.research.partitions]]
name = "short"
cpus_per_node = 48
memory_mb = 192000
max_walltime_secs = 14400 # 4 hours
shared = true # Allows sharing nodes
# GPU partition
[[client.hpc.custom_profiles.research.partitions]]
name = "gpu"
cpus_per_node = 32
memory_mb = 256000 # 256 GB
max_walltime_secs = 172800 # 48 hours
gpus_per_node = 4
gpu_type = "A100"
shared = false
# High memory partition
[[client.hpc.custom_profiles.research.partitions]]
name = "himem"
cpus_per_node = 48
memory_mb = 1048576 # 1024 GB (1 TB)
max_walltime_secs = 172800 # 48 hours
shared = false
```
## Step 4: Verify the Profile
Check that Torc recognizes your profile:
```bash
# List all profiles
torc hpc list
```
You should see your custom profile:
```
Known HPC profiles:
╭──────────┬──────────────────┬────────────┬──────────╮
│ Name     │ Display Name     │ Partitions │ Detected │
├──────────┼──────────────────┼────────────┼──────────┤
│ kestrel  │ NLR Kestrel      │ 15         │          │
│ research │ Research Cluster │ 4          │ ✓        │
╰──────────┴──────────────────┴────────────┴──────────╯
```
View the partitions:
```bash
torc hpc partitions research
```
```
Partitions for research:
╭─────────┬───────────┬───────────┬──────────────┬──────────╮
│ Name    │ CPUs/Node │ Mem/Node  │ Max Walltime │ GPUs     │
├─────────┼───────────┼───────────┼──────────────┼──────────┤
│ batch   │ 48        │ 192 GB    │ 72h          │ -        │
│ short   │ 48        │ 192 GB    │ 4h           │ -        │
│ gpu     │ 32        │ 256 GB    │ 48h          │ 4 (A100) │
│ himem   │ 48        │ 1024 GB   │ 48h          │ -        │
╰─────────┴───────────┴───────────┴──────────────┴──────────╯
```
## Step 5: Test Partition Matching
Verify that Torc correctly matches resource requirements to partitions:
```bash
# Should match 'short' partition
torc hpc match research --cpus 8 --memory 16g --walltime 02:00:00
# Should match 'gpu' partition
torc hpc match research --cpus 16 --memory 64g --walltime 08:00:00 --gpus 2
# Should match 'himem' partition
torc hpc match research --cpus 24 --memory 512g --walltime 24:00:00
```
## Step 6: Test Scheduler Generation
Create a test workflow to verify scheduler generation:
```yaml
# test_workflow.yaml
name: profile_test
description: Test custom HPC profile
resource_requirements:
  - name: standard
    num_cpus: 16
    memory: 64g
    runtime: PT2H
  - name: gpu_compute
    num_cpus: 16
    num_gpus: 2
    memory: 128g
    runtime: PT8H
jobs:
  - name: preprocess
    command: echo "preprocessing"
    resource_requirements: standard
  - name: train
    command: echo "training"
    resource_requirements: gpu_compute
    depends_on: [preprocess]
```
Generate schedulers:
```bash
torc slurm generate --account my_project --profile research test_workflow.yaml
```
You should see the generated workflow with appropriate schedulers for each partition.
## Step 7: Use Your Profile
Now you can submit workflows using your custom profile:
```bash
# Auto-detect the profile (if on the cluster)
torc slurm generate --account my_project -o workflow_slurm.yaml workflow.yaml
torc submit workflow_slurm.yaml
# Or explicitly specify the profile
torc slurm generate --account my_project --profile research -o workflow_slurm.yaml workflow.yaml
torc submit workflow_slurm.yaml
```
## Advanced Configuration
### Hostname-Based Detection
If your cluster doesn't set a unique environment variable, use hostname detection:
```toml
[client.hpc.custom_profiles.research]
display_name = "Research Cluster"
detect_hostname = ".*\\.research\\.edu" # Regex pattern
```
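To sanity-check the pattern itself (this only exercises the regex, not Torc's detection logic), you can run it against your login node's hostname with `grep -E`:
```bash
# Prints the hostname and "pattern matches" if the regex matches it
hostname | grep -E '.*\.research\.edu' && echo "pattern matches" || echo "no match"
```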
### Minimum Node Requirements
Some partitions require a minimum number of nodes:
```toml
[[client.hpc.custom_profiles.research.partitions]]
name = "large_scale"
cpus_per_node = 128
memory_mb = 512000
max_walltime_secs = 172800
min_nodes = 16 # Must request at least 16 nodes
```
### Explicit Request Partitions
Some partitions shouldn't be auto-selected:
```toml
[[client.hpc.custom_profiles.research.partitions]]
name = "priority"
cpus_per_node = 48
memory_mb = 192000
max_walltime_secs = 86400
requires_explicit_request = true # Only used when explicitly requested
```
## Troubleshooting
### Profile Not Detected
If `torc hpc detect` doesn't find your profile:
1. Check the environment variable or hostname:
```bash
echo $CLUSTER_NAME
hostname
```
2. Verify the detection pattern in your config matches exactly
3. Test with explicit profile specification:
```bash
torc hpc show research
```
### No Partition Found for Job
If `torc slurm generate` can't find a matching partition:
1. Check if any partition satisfies all requirements:
```bash
torc hpc match research --cpus 32 --memory 128g --walltime 08:00:00
```
2. Verify memory is specified in MB in the config (not GB)
3. Verify walltime is in seconds (not hours)
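For example, a partition intended to accept 128 GB, 8-hour jobs must use the converted values (the partition name here is hypothetical; the conversions follow the conventions used earlier in this tutorial):
```toml
[[client.hpc.custom_profiles.research.partitions]]
name = "example"            # hypothetical partition
cpus_per_node = 48
memory_mb = 128000          # 128 GB expressed in MB, not 128
max_walltime_secs = 28800   # 8 hours expressed in seconds, not 8
```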
### Configuration File Location
Torc looks for config files in these locations:
- **Linux**: `~/.config/torc/config.toml`
- **macOS**: `~/Library/Application Support/torc/config.toml`
- **Windows**: `%APPDATA%\torc\config.toml`
You can also use the `TORC_CONFIG` environment variable to specify a custom path.
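For example (the path is only an illustration):
```bash
# Point Torc at a config file outside the default locations
export TORC_CONFIG=/shared/projects/myteam/torc-config.toml
torc hpc list   # should now list profiles from that file
```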
## Contributing Your Profile
If your HPC is used by others, please contribute it upstream:
1. Fork the [Torc repository](https://github.com/NatLabRockies/torc)
2. Add your profile as a new module in `src/client/hpc/` (see `kestrel.rs` for an example)
3. Add tests for your profile
4. Submit a pull request
Or simply [open an issue](https://github.com/NatLabRockies/torc/issues) with your partition
information and we'll add it for you.
## See Also
- [Working with HPC Profiles](./hpc-profiles.md) - General HPC profile usage
- [HPC Profiles Reference](./hpc-profiles-reference.md) - Complete configuration options
- [Slurm Overview](./slurm-workflows.md) - Simplified Slurm approach