torc 0.20.7

Workflow management system

# Working with HPC Profiles

HPC (High-Performance Computing) profiles provide pre-configured knowledge about specific HPC
systems, including their partitions, resource limits, and optimal settings. Torc uses this
information to automatically match job requirements to appropriate partitions.

## Overview

HPC profiles contain:

- **Partition definitions**: Available queues with their resource limits (CPUs, memory, walltime,
  GPUs)
- **Detection rules**: How to identify when you're on a specific HPC system
- **Default settings**: Account names and other system-specific defaults

Built-in profiles are available for systems like NLR's Kestrel. You can also define custom profiles
for private clusters.

## Listing Available Profiles

View all known HPC profiles:

```bash
torc hpc list
```

Example output:

```
Known HPC profiles:

╭─────────┬──────────────┬────────────┬──────────╮
│ Name    │ Display Name │ Partitions │ Detected │
├─────────┼──────────────┼────────────┼──────────┤
│ kestrel │ NLR Kestrel  │ 15         │ ✓        │
╰─────────┴──────────────┴────────────┴──────────╯
```

The "Detected" column shows whether Torc has detected that you are currently on that system.

## Dynamic Slurm Profiles

For Slurm-based clusters without a built-in profile, Torc can dynamically generate a profile by
querying the cluster itself. This means you can use Torc on almost any Slurm system without manual
configuration.

To use dynamic Slurm detection, you can:

1. **Explicitly request it**: Use `--hpc-profile slurm` in any command that requires a profile.
2. **Let Torc auto-detect it**: If you're on a Slurm system and haven't specified a profile or
   matched a built-in one, Torc will automatically fall back to dynamic Slurm detection.
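As a sketch of the first option, a submission command could force dynamic detection. This assumes `torc submit-slurm` accepts `--hpc-profile` like other profile-aware commands, and `my-workflow.json` is a placeholder workflow file:

```shell
# Force a dynamically generated Slurm profile when submitting
# (my-workflow.json is a hypothetical workflow specification)
torc submit-slurm --hpc-profile slurm my-workflow.json
```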

Dynamic profiles are generated by:

- Running `sinfo` to discover partitions, CPU/memory limits, and GRES (GPUs).
- Running `scontrol show partition` to find shared node settings and default QOS.
- Heuristically inferring GPU types if not explicitly reported by Slurm.
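To preview the raw data a dynamic profile is built from, you can run equivalent standard Slurm queries by hand (the exact queries Torc issues may differ):

```shell
# Partition, CPUs per node, memory per node (MB), max walltime, and GRES (e.g. GPUs)
sinfo --format "%P %c %m %l %G"

# Per-partition details such as OverSubscribe (shared nodes) and default QOS
scontrol show partition
```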

## Detecting the Current System

Torc can automatically detect which HPC system you're on:

```bash
torc hpc detect
```

Torc uses a prioritized detection strategy:

1. **Built-in Profiles**: Matches known systems via environment variables or hostname patterns.
2. **Custom Profiles**: Matches your configured custom profiles.
3. **Dynamic Slurm**: If Slurm commands (`sinfo`) are available, generates a profile from the
   current cluster.

## Viewing Profile Details

See detailed information about a specific profile:

```bash
torc hpc show kestrel
```

You can also view the dynamically detected Slurm profile:

```bash
torc hpc show slurm
```

## Viewing Available Partitions

List all partitions for a profile:

```bash
torc hpc partitions kestrel
```

For the current Slurm cluster:

```bash
torc hpc partitions slurm
```

## Finding Matching Partitions

Find partitions that can satisfy specific resource requirements:

```bash
torc hpc match --cpus 32 --memory 64g --walltime 02:00:00
```

If no profile is specified, Torc uses the profile detected for the current system (including a
dynamically generated Slurm profile).
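Because `--hpc-profile` works with any command that requires a profile, you can also pin the match to a specific profile rather than relying on detection (a sketch, assuming the flag is accepted here as described under Dynamic Slurm Profiles):

```shell
# Match against the built-in Kestrel profile explicitly
torc hpc match --hpc-profile kestrel --cpus 32 --memory 64g --walltime 02:00:00
```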

## Custom HPC Profiles

If your HPC system doesn't have a built-in profile, you have three options:

1. **Use Dynamic Slurm Detection** (Easiest): Let Torc automatically discover your cluster's
   capabilities.
2. **Generate and Customize a Profile**: Run `torc hpc generate` to create a TOML template based on
   your cluster, then customize it in your config file.
3. **Request Built-in Support**: If your HPC system is widely used,
   [open an issue](https://github.com/NatLabRockies/torc/issues) requesting built-in support.

### Quick Example

Define custom profiles in your configuration file:

```toml
# ~/.config/torc/config.toml

[client.hpc.custom_profiles.mycluster]
display_name = "My Research Cluster"
description = "Internal research HPC system"
detect_env_var = "MY_CLUSTER=research"
default_account = "default_project"

[[client.hpc.custom_profiles.mycluster.partitions]]
name = "compute"
cpus_per_node = 64
memory_mb = 256000          # MB (~256 GB per node)
max_walltime_secs = 172800  # 48 hours
shared = false

[[client.hpc.custom_profiles.mycluster.partitions]]
name = "gpu"
cpus_per_node = 32
memory_mb = 128000          # MB (~128 GB per node)
max_walltime_secs = 86400   # 24 hours
gpus_per_node = 4
gpu_type = "A100"
shared = false
```
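Note that `max_walltime_secs` is given in seconds. A quick shell arithmetic check helps avoid off-by-a-factor mistakes; the values above correspond to 48 and 24 hours:

```shell
# Walltimes expressed in seconds, matching the partitions above
echo $((48 * 60 * 60))   # 172800
echo $((24 * 60 * 60))   # 86400
```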

See [Configuration Reference](../../core/reference/configuration.md) for full configuration options.

## Using Profiles with Slurm Workflows

HPC profiles are used by Slurm-related commands to automatically generate scheduler configurations.
See [Advanced Slurm Configuration](./slurm.md) for details on:

- `torc submit-slurm` - Submit workflows with auto-generated schedulers
- `torc workflows create-slurm` - Create workflows with auto-generated schedulers

## See Also

- [Advanced Slurm Configuration](./slurm.md)
- [Custom HPC Profile Tutorial](./custom-hpc-profile.md)
- [HPC Profiles Reference](./hpc-profiles-reference.md)
- [Configuration Reference](../../core/reference/configuration.md)
- [Resource Requirements Reference](../../core/reference/resources.md)