# Slurm Overview
This document explains how Torc simplifies running workflows on Slurm-based HPC systems. The key
insight is that **you don't need to understand Slurm schedulers or workflow actions** to run
workflows on HPC systems—Torc handles this automatically.
## The Simple Approach
Running a workflow on Slurm requires just two things:
1. **Define your jobs with resource requirements**
2. **Submit with `slurm generate` + `submit`**
That's it. Torc will analyze your workflow, generate appropriate Slurm configurations, and submit
everything for execution.
> **⚠️ Important:** The `slurm generate` + `submit` command uses heuristics to auto-generate Slurm
> schedulers and workflow actions. For complex workflows with unusual dependency patterns, the
> generated configuration may not be optimal and could result in suboptimal allocation timing.
> **Always preview the configuration first** using `torc slurm generate` (see
> [Previewing Generated Configuration](#previewing-generated-configuration)) before submitting
> production workflows.
### Example Workflow
Here's a complete workflow specification that runs on Slurm:
```yaml
name: data_analysis_pipeline
description: Analyze experimental data with preprocessing, training, and evaluation
resource_requirements:
- name: light
num_cpus: 4
memory: 8g
runtime: PT30M
- name: compute
num_cpus: 32
memory: 64g
runtime: PT2H
- name: gpu
num_cpus: 16
num_gpus: 2
memory: 128g
runtime: PT4H
jobs:
- name: preprocess
command: python preprocess.py --input data/ --output processed/
resource_requirements: light
- name: train_model
command: python train.py --data processed/ --output model/
resource_requirements: gpu
depends_on: [preprocess]
- name: evaluate
command: python evaluate.py --model model/ --output results/
resource_requirements: compute
depends_on: [train_model]
- name: generate_report
command: python report.py --results results/
resource_requirements: light
depends_on: [evaluate]
```
### Submitting the Workflow
```bash
torc slurm generate --account myproject workflow.yaml
torc submit workflow.yaml
```
Torc will:
1. Detect which HPC system you're on (e.g., NLR Kestrel)
2. Match each job's requirements to appropriate partitions
3. Generate Slurm scheduler configurations
4. Create workflow actions that stage resource allocation based on dependencies
5. Submit the workflow for execution
## How It Works
When you use `slurm generate` + `submit`, Torc performs intelligent analysis of your workflow:
### 1. Per-Job Scheduler Generation
Each job gets its own Slurm scheduler configuration based on its resource requirements. This means:
- Jobs are matched to the most appropriate partition
- Memory, CPU, and GPU requirements are correctly specified
- Walltime is set to the partition's maximum (explained below)
### 2. Staged Resource Allocation
Torc analyzes job dependencies and creates **staged workflow actions**:
- **Jobs without dependencies** trigger `on_workflow_start` — resources are allocated immediately
- **Jobs with dependencies** trigger `on_jobs_ready` — resources are allocated only when the job
becomes ready to run
This prevents wasting allocation time on resources that aren't needed yet. For example, in the
workflow above:
- `preprocess` resources are allocated at workflow start
- `train_model` resources are allocated when `preprocess` completes
- `evaluate` resources are allocated when `train_model` completes
- `generate_report` resources are allocated when `evaluate` completes
### 3. Walltime Calculation
By default, Torc sets the walltime to **1.5× your longest job's runtime** (capped at the partition's
maximum). This provides headroom for jobs that run slightly longer than expected.
You can customize this behavior:
- `--walltime-strategy max-job-runtime` (default): Uses longest job runtime × multiplier
- `--walltime-strategy max-partition-time`: Uses the partition's maximum walltime
- `--walltime-multiplier 2.0`: Change the safety multiplier (default: 1.5)
See [Walltime Strategy Options](#walltime-strategy-options) for details.
### 4. HPC Profile Knowledge
Torc includes built-in knowledge of HPC systems like NLR Kestrel, including:
- Available partitions and their resource limits
- GPU configurations
- Memory and CPU specifications
- Special requirements (e.g., minimum node counts for high-bandwidth partitions)
> **Using an unsupported HPC?** Please
> [request built-in support](https://github.com/NatLabRockies/torc/issues) so everyone benefits. You
> can also [create a custom profile](./custom-hpc-profile.md) for immediate use.
## Resource Requirements Specification
Resource requirements are the key to the simplified workflow. Define them once and reference them
from jobs:
```yaml
resource_requirements:
- name: small
num_cpus: 4
num_gpus: 0
num_nodes: 1
memory: 8g
runtime: PT1H
- name: gpu_training
num_cpus: 32
num_gpus: 4
num_nodes: 1
memory: 256g
runtime: PT8H
```
### Fields
| Field | Description | Example |
| ----------- | ------------------------- | ------------------- |
| `name` | Reference name for jobs | `"compute"` |
| `num_cpus` | CPU cores required | `32` |
| `num_gpus` | GPUs required (0 if none) | `2` |
| `num_nodes` | Nodes required | `1` |
| `memory` | Memory with unit suffix | `"64g"`, `"512m"` |
| `runtime` | ISO8601 duration | `"PT2H"`, `"PT30M"` |
### Runtime Format
Use ISO8601 duration format:
- `PT30M` — 30 minutes
- `PT2H` — 2 hours
- `PT1H30M` — 1 hour 30 minutes
- `P1D` — 1 day
- `P2DT4H` — 2 days 4 hours
## Job Dependencies
Define dependencies explicitly or implicitly through file/data relationships:
### Explicit Dependencies
```yaml
jobs:
- name: step1
command: ./step1.sh
resource_requirements: small
- name: step2
command: ./step2.sh
resource_requirements: small
depends_on: [step1]
- name: step3
command: ./step3.sh
resource_requirements: small
depends_on: [step1, step2] # Waits for both
```
### Implicit Dependencies (via Files)
```yaml
files:
- name: raw_data
path: /data/raw.csv
- name: processed_data
path: /data/processed.csv
jobs:
- name: process
command: python process.py
input_files: [raw_data]
output_files: [processed_data]
resource_requirements: compute
- name: analyze
command: python analyze.py
input_files: [processed_data] # Creates implicit dependency on 'process'
resource_requirements: compute
```
## Previewing Generated Configuration
> **Recommended Practice:** Always preview the generated configuration before submitting to Slurm,
> especially for complex workflows. This allows you to verify that schedulers and actions are
> appropriate for your workflow structure.
### Viewing the Execution Plan
Before generating schedulers, visualize how your workflow will execute in stages:
```bash
torc workflows execution-plan workflow.yaml
```
This shows the execution stages, which jobs run at each stage, and (if schedulers are defined) when
Slurm allocations are requested. See
[Visualizing Workflow Structure](../../core/workflows/visualizing-workflows.md) for detailed
examples.
### Generating Slurm Configuration
Preview what Torc will generate:
```bash
torc slurm generate --account myproject --profile kestrel workflow.yaml
```
This outputs the complete workflow with generated schedulers and actions:
#### Scheduler Grouping Options
By default, Torc creates **one scheduler per unique `resource_requirements` name**. This means if
you have three jobs with three different resource requirement definitions (e.g., `cpu`, `memory`,
`mixed`), you get three schedulers—even if all three would fit on the same partition.
The `--group-by` option controls how jobs are grouped into schedulers:
```bash
# Default: one scheduler per resource_requirements name
torc slurm generate --account myproject workflow.yaml
torc slurm generate --account myproject --group-by resource-requirements workflow.yaml
# Result: 3 schedulers (cpu_scheduler, memory_scheduler, mixed_scheduler)
# Group by partition: one scheduler per partition
torc slurm generate --account myproject --group-by partition workflow.yaml
# Result: 1 scheduler (short_scheduler) if all jobs fit on the "short" partition
```
**When to use `--group-by partition`:**
- Your workflow has many small resource requirement definitions that all fit on the same partition
- You want to minimize Slurm queue overhead by reducing the number of allocations
- Jobs have similar characteristics and can share nodes efficiently
**When to use `--group-by resource-requirements` (default):**
- Jobs have significantly different resource profiles that benefit from separate allocations
- You want fine-grained control over which jobs share resources
- You're debugging and want clear separation between job types
When grouping by partition, the scheduler uses the **maximum** resource values from all grouped
requirements (max memory, max CPUs, max runtime, etc.) to ensure all jobs can run.
#### Walltime Strategy Options
The `--walltime-strategy` option controls how Torc calculates the walltime for generated schedulers:
```bash
# Default: use max job runtime with a safety multiplier (1.5x)
torc slurm generate --account myproject workflow.yaml
torc slurm generate --account myproject --walltime-strategy max-job-runtime workflow.yaml
# Use the partition's maximum allowed walltime
torc slurm generate --account myproject --walltime-strategy max-partition-time workflow.yaml
```
**Walltime strategies:**
| `max-job-runtime` | Uses the longest job's runtime × multiplier (default: 1.5x). Capped at partition max. |
| `max-partition-time` | Uses the partition's maximum walltime. More conservative but may impact queue scheduling. |
**Customizing the multiplier:**
The `--walltime-multiplier` option (default: 1.5) provides a safety margin when using
`max-job-runtime`:
```bash
# Use 2x the max job runtime for extra buffer
torc slurm generate --account myproject --walltime-multiplier 2.0 workflow.yaml
# Use exact job runtime (no buffer - use with caution)
torc slurm generate --account myproject --walltime-multiplier 1.0 workflow.yaml
```
**When to use `max-job-runtime` (default):**
- You want better queue scheduling (shorter walltime requests often get prioritized)
- Your job runtime estimates are reasonably accurate
- You prefer the Torc runner to exit early rather than holding idle allocations
**When to use `max-partition-time`:**
- Your job runtimes are highly variable or unpredictable
- You consistently underestimate job runtimes
- Queue priority is not a concern
```yaml
name: data_analysis_pipeline
# ... original content ...
jobs:
- name: preprocess
command: python preprocess.py --input data/ --output processed/
resource_requirements: light
scheduler: preprocess_scheduler
# ... more jobs ...
slurm_schedulers:
- name: preprocess_scheduler
account: myproject
mem: 8g
nodes: 1
walltime: "04:00:00"
- name: train_model_scheduler
account: myproject
mem: 128g
nodes: 1
gres: "gpu:2"
walltime: "04:00:00"
# ... more schedulers ...
actions:
- trigger_type: on_workflow_start
action_type: schedule_nodes
scheduler: preprocess_scheduler
scheduler_type: slurm
num_allocations: 1
- trigger_type: on_jobs_ready
action_type: schedule_nodes
jobs: [train_model]
scheduler: train_model_scheduler
scheduler_type: slurm
num_allocations: 1
# ... more actions ...
```
Save the output to inspect or modify before submission:
```bash
torc slurm generate --account myproject workflow.yaml -o workflow_with_schedulers.yaml
```
## Torc Server Considerations
The Torc server must be accessible to compute nodes. Options include:
1. **Shared server** (Recommended): A team member allocates a dedicated server in the HPC
environment
2. **Login node**: Suitable for small workflows with few, long-running jobs
For large workflows with many short jobs, a dedicated server prevents overloading login nodes.
## Best Practices
### 1. Focus on Resource Requirements
Spend time accurately defining resource requirements. Torc handles the rest:
```yaml
resource_requirements:
# Be specific about what each job type needs
- name: io_heavy
num_cpus: 4
memory: 32g # High memory for data loading
runtime: PT1H
- name: compute_heavy
num_cpus: 64
memory: 16g # Less memory, more CPU
runtime: PT4H
```
### 2. Use Meaningful Names
Name resource requirements by their purpose, not by partition:
```yaml
# Good - describes the workload
resource_requirements:
- name: data_preprocessing
- name: model_training
- name: inference
# Avoid - ties you to specific infrastructure
resource_requirements:
- name: short_partition
- name: gpu_h100
```
### 3. Group Similar Jobs
Jobs with similar requirements can share resource requirement definitions:
```yaml
resource_requirements:
- name: quick_task
num_cpus: 2
memory: 4g
runtime: PT15M
jobs:
- name: validate_input
command: ./validate.sh
resource_requirements: quick_task
- name: check_output
command: ./check.sh
resource_requirements: quick_task
depends_on: [main_process]
```
### 4. Test Locally First
Validate your workflow logic locally before submitting to HPC:
```bash
# Run locally (without Slurm)
torc run workflow.yaml
# Then submit to HPC
torc slurm generate --account myproject workflow.yaml
torc submit workflow.yaml
```
## Limitations and Caveats
The auto-generation in `torc slurm generate` uses heuristics that work well for common workflow
patterns but may not be optimal for all cases:
### When Auto-Generation Works Well
- **Linear pipelines**: A → B → C → D
- **Fan-out patterns**: One job unblocks many (e.g., preprocess → 100 work jobs)
- **Fan-in patterns**: Many jobs unblock one (e.g., 100 work jobs → postprocess)
- **Simple DAGs**: Clear dependency structures with distinct resource tiers
### When to Use Manual Configuration
Consider using `torc slurm generate` to preview and manually adjust, or define schedulers manually,
when:
- **Complex dependency graphs**: Multiple interleaved dependency patterns
- **Shared schedulers**: You want multiple jobs to share the same Slurm allocation
- **Custom timing**: Specific requirements for when allocations should be requested
- **Resource optimization**: Fine-tuning to minimize allocation waste
- **Multi-node jobs**: Jobs requiring coordination across multiple nodes (see
[Multi-Node Jobs](./multi-node-jobs.md))
### What Could Go Wrong
Without previewing, auto-generation might:
1. **Request allocations too early**: Wasting queue time waiting for dependencies
2. **Request allocations too late**: Adding latency to job startup
3. **Create suboptimal scheduler groupings**: Not sharing allocations when beneficial
4. **Miss optimization opportunities**: Not recognizing patterns that could share resources
**Best Practice**: For production workflows, always run `torc slurm generate` first, review the
output, and submit the reviewed configuration with `torc submit`.
## Advanced: Manual Scheduler Configuration
For advanced users who need fine-grained control, you can define schedulers and actions manually.
See [Advanced Slurm Configuration](./slurm.md) for details.
Common reasons for manual configuration:
- Non-standard partition requirements
- Custom Slurm directives (e.g., `--constraint`)
- Multi-node jobs with specific topology requirements
- Reusing allocations across multiple jobs for efficiency
## Troubleshooting
### "No partition found for job"
Your resource requirements exceed what's available. Check:
- Memory doesn't exceed partition limits
- Runtime doesn't exceed partition walltime
- GPU count is available on GPU partitions
Use `torc hpc partitions <profile>` to see available resources.
### Jobs Not Starting
Ensure the Torc server is accessible from compute nodes:
```bash
# From a compute node
curl $TORC_API_URL/health
```
### Wrong Partition Selected
Use `torc hpc match` to see which partitions match your requirements:
```bash
torc hpc match kestrel --cpus 32 --memory 64g --walltime 02:00:00 --gpus 2
```
## See Also
- [Visualizing Workflow Structure](../../core/workflows/visualizing-workflows.md) — Execution plans
and DAG visualization
- [HPC Profiles](./hpc-profiles.md) — Detailed HPC profile usage
- [Advanced Slurm Configuration](./slurm.md) — Manual Slurm scheduler setup
- [Resource Requirements Reference](../../core/reference/resources.md) — Complete specification
- [Workflow Actions](../design/workflow-actions.md) — Understanding actions