
# cluster

An element in `[[cluster]]` is a **table** that defines the configuration of a single
cluster.

For example:
```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"
[[cluster.partition]]
name = "shared"
maximum_cpus_per_job = 127
maximum_gpus_per_job = 0
[[cluster.partition]]
name = "gpu-shared"
minimum_gpus_per_job = 1
[[cluster.partition]]
name = "compute"
require_cpus_multiple_of = 128
maximum_gpus_per_job = 0
[[cluster.partition]]
name = "debug"
maximum_gpus_per_job = 0
prevent_auto_select = true
```

## name

`cluster.name`: **string** - The name of the cluster.

## identify

`cluster.identify`: **table** - Set a condition to identify when **row** is executing
on this cluster. The table **must** have one of the following keys:

* `by_environment`: **array** of two strings - Identify the cluster when the environment
  variable `by_environment[0]` is set and equal to `by_environment[1]`.
* `always`: **boolean** - Set to `true` to always identify this cluster. When `false`,
  this cluster may only be chosen by an explicit `--cluster` option.

> [!CAUTION]
> The *first* cluster in the list that sets `identify.always = true` will prevent
> any later cluster from being identified (except by explicit `--cluster=name`).
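
The sketch below shows one way to combine the two `identify` keys so that the caution
above does not bite. The cluster names and the `CLUSTER_NAME` value are hypothetical,
and the ordering assumes that clusters are checked in the order listed:

```toml
# Identified only when the environment variable CLUSTER_NAME equals "cluster1".
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

# Never identified automatically; chosen only with an explicit `--cluster=testing`.
[[cluster]]
name = "testing"
identify.always = false
scheduler = "bash"

# Fallback entry. It is listed last so that `always = true` does not
# prevent the clusters above from being identified.
[[cluster]]
name = "workstation"
identify.always = true
scheduler = "bash"
```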

## scheduler

`cluster.scheduler`: **string** - Set the job scheduler to use on this cluster. Must
be one of:

* `"slurm"`
* `"bash"`

## slurm_gpus_per_task

`cluster.slurm_gpus_per_task`: **string** - Set the `sbatch` command line option that
selects the number of GPUs per task (used only by the `slurm` scheduler). When omitted,
`slurm_gpus_per_task` defaults to `--gpus-per-task=`.

## submit_options

`cluster.submit_options`: **array** of **strings** - Scheduler submission options that
are passed to every job on this cluster.
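
As an illustration, a cluster entry that sets both of the options above might look like
the following. The cluster name and the chosen `sbatch` option are placeholders; use
whatever options your site requires:

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"
# Same as the default; shown only to make the setting explicit.
slurm_gpus_per_task = "--gpus-per-task="
# Passed to every job submitted on this cluster (illustrative value).
submit_options = ["--mail-type=FAIL"]
```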

## partition

`cluster.partition`: **array** of **tables** - Define the scheduler partitions that
**row** may select from when submitting jobs. **Row** will check the partitions in the
order provided and choose the *first* partition where the job matches all the
provided conditions. All conditions are optional.
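
To make the first-match rule concrete, here is the example from the top of this page
again, annotated with how a few hypothetical jobs would be placed. The job sizes are
invented for illustration; the outcomes follow from the conditions documented below:

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

# 4 CPUs, 0 GPUs: matches here (4 <= 127 and 0 <= 0).
[[cluster.partition]]
name = "shared"
maximum_cpus_per_job = 127
maximum_gpus_per_job = 0

# 8 CPUs, 2 GPUs: fails "shared" (2 > 0), matches here (2 >= 1).
[[cluster.partition]]
name = "gpu-shared"
minimum_gpus_per_job = 1

# 256 CPUs, 0 GPUs: fails "shared" (256 > 127) and "gpu-shared" (0 < 1),
# matches here (256 % 128 == 0 and 0 <= 0).
[[cluster.partition]]
name = "compute"
require_cpus_multiple_of = 128
maximum_gpus_per_job = 0

# 200 CPUs, 0 GPUs: fails every partition above (200 % 128 != 0) and this
# one is never selected automatically, so no partition is auto-selected.
[[cluster.partition]]
name = "debug"
maximum_gpus_per_job = 0
prevent_auto_select = true
```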

### name

`cluster.partition.name`: **string** - The name of the partition as it should be passed
to the cluster batch submission command.

### maximum_cpus_per_job

`cluster.partition.maximum_cpus_per_job`: **integer** - The maximum number of CPUs that
can be used by a single job on this partition:
```plaintext
total_cpus <= maximum_cpus_per_job
```

### require_cpus_multiple_of

`cluster.partition.require_cpus_multiple_of`: **integer** - All jobs submitted to this
partition **must** use an integer multiple of the given number of CPUs:
```plaintext
total_cpus % require_cpus_multiple_of == 0
```

### warn_cpus_not_multiple_of

`cluster.partition.warn_cpus_not_multiple_of`: **integer** - All jobs submitted to this
partition **should** use an integer multiple of the given number of CPUs:
```plaintext
if total_cpus % warn_cpus_not_multiple_of != 0:
  warn! ...
```

This is a nonblocking variant of `require_cpus_multiple_of` that allows for submission
of jobs that underutilize resources.
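
A short fragment contrasting the two settings, meant to sit under an existing
`[[cluster]]` entry. The partition names and the value 128 are illustrative:

```toml
# A 192-CPU job does not match this partition: 192 % 128 == 64.
[[cluster.partition]]
name = "compute-strict"
require_cpus_multiple_of = 128

# The same 192-CPU job matches this partition, with a warning.
[[cluster.partition]]
name = "compute-lenient"
warn_cpus_not_multiple_of = 128
```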

### memory_per_cpu_mb

`cluster.partition.memory_per_cpu_mb`: **integer** - CPU jobs submitted to this partition
will pass this option to the scheduler. For example, Slurm schedulers will set
`--mem-per-cpu=<memory_per_cpu_mb>M`.
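
For instance, with the illustrative value below (placed under an existing `[[cluster]]`
entry), a Slurm scheduler would be passed `--mem-per-cpu=2000M`:

```toml
[[cluster.partition]]
name = "shared"
maximum_gpus_per_job = 0
# Value is in megabytes; 2000 is a made-up figure for this sketch.
memory_per_cpu_mb = 2000
```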

### cpus_per_node

`cluster.partition.cpus_per_node`: **integer** - Number of CPUs per node.

When `cpus_per_node` is not set, **row** will request `n_processes` tasks. In this case,
some schedulers are free to spread tasks among any number of nodes (for example, shared
partitions on Slurm schedulers).

When `cpus_per_node` is set, **row** will **also** request the minimal number of nodes
needed to satisfy `n_nodes * cpus_per_node >= total_cpus`. This may result in longer
queue times, but will lead to more stable performance for users.

> [!TIP]
> Set `cpus_per_node` only when all nodes in the partition have the same number
> of CPUs.
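
The node-count rule above can be worked through on a small fragment. The partition name
and the 128-CPU node size are assumptions for this sketch:

```toml
[[cluster.partition]]
name = "compute"
maximum_gpus_per_job = 0
cpus_per_node = 128
# Smallest n_nodes with n_nodes * 128 >= total_cpus:
#   total_cpus = 128 -> 1 node
#   total_cpus = 300 -> 3 nodes (2 * 128 = 256 < 300 <= 384 = 3 * 128)
```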

### minimum_gpus_per_job

`cluster.partition.minimum_gpus_per_job`: **integer** - The minimum number of GPUs that
must be used by a single job on this partition:
```plaintext
total_gpus >= minimum_gpus_per_job
```

### maximum_gpus_per_job

`cluster.partition.maximum_gpus_per_job`: **integer** - The maximum number of GPUs that
can be used by a single job on this partition:
```plaintext
total_gpus <= maximum_gpus_per_job
```

### require_gpus_multiple_of

`cluster.partition.require_gpus_multiple_of`: **integer** - All jobs submitted to this
partition **must** use an integer multiple of the given number of GPUs:
```plaintext
total_gpus % require_gpus_multiple_of == 0
```

### warn_gpus_not_multiple_of

`cluster.partition.warn_gpus_not_multiple_of`: **integer** - All jobs submitted to this
partition **should** use an integer multiple of the given number of GPUs:
```plaintext
if total_gpus % warn_gpus_not_multiple_of != 0:
  warn! ...
```

This is a nonblocking variant of `require_gpus_multiple_of` that allows for submission
of jobs that underutilize resources.

### memory_per_gpu_mb

`cluster.partition.memory_per_gpu_mb`: **integer** - GPU jobs submitted to this partition
will pass this option to the scheduler. For example, Slurm schedulers will set
`--mem-per-gpu=<memory_per_gpu_mb>M`.

### gpus_per_node

`cluster.partition.gpus_per_node`: **integer** - Number of GPUs per node. Like
`cpus_per_node` but used when jobs request GPUs.
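
A GPU partition that combines several of the keys above might be sketched as follows.
All values are illustrative, and the node count in the comment assumes that
`gpus_per_node` follows the same rule as `cpus_per_node`, with
`n_nodes * gpus_per_node >= total_gpus`:

```toml
[[cluster.partition]]
name = "gpu"
minimum_gpus_per_job = 1
# Slurm schedulers would be passed --mem-per-gpu=64000M.
memory_per_gpu_mb = 64000
# Under the assumption stated above, a 6-GPU job would request 2 nodes
# (1 * 4 = 4 < 6 <= 8 = 2 * 4).
gpus_per_node = 4
```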

### prevent_auto_select

`cluster.partition.prevent_auto_select`: **boolean** - Set to `true` to prevent **row**
from automatically selecting this partition.

### account_suffix

`cluster.partition.account_suffix`: **string** - A suffix added to the account name when
submitting jobs to this partition. Useful when clusters define separate `account-cpu`
and `account-gpu` accounts.
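
A minimal sketch for a cluster with separate CPU and GPU accounts, placed under an
existing `[[cluster]]` entry. The partition names and suffix strings are hypothetical,
and the comments assume the suffix is appended verbatim to the account name:

```toml
# A job submitted under a hypothetical account "abc123" would use "abc123-cpu".
[[cluster.partition]]
name = "shared"
maximum_gpus_per_job = 0
account_suffix = "-cpu"

# The same account would become "abc123-gpu" on this partition.
[[cluster.partition]]
name = "gpu-shared"
minimum_gpus_per_job = 1
account_suffix = "-gpu"
```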