torc 0.22.2

Workflow management system
# Quick Start (Remote Workers)

This guide walks you through running a Torc workflow on multiple remote machines via SSH. Jobs are
distributed across workers without requiring an HPC scheduler like Slurm.

For local execution, see [Quick Start (Local)](../getting-started/quick-start-local.md). For
HPC/Slurm execution, see [Quick Start (HPC)](../getting-started/quick-start-hpc.md).

## Prerequisites

- SSH key-based authentication to all remote machines (no password prompts)
- Torc installed on all machines with **matching versions**
- Torc server accessible from all machines

## Start the Server

Start a Torc server. By default, it binds to `0.0.0.0` so it's accessible from remote machines:

```console
torc-server run --database torc.db --port 8080
```

> **Security Note:** The server starts without authentication and is accessible from any machine
> that can reach this host. For networks with untrusted users, see
> [Authentication]../specialized/admin/authentication.md to secure your server.

## Create a Workflow

Save this as `workflow.yaml`:

```yaml
name: distributed_hello
description: Distributed hello world workflow

jobs:
  - name: job 1
    command: echo "Hello from $(hostname)!"
  - name: job 2
    command: echo "Hello again from $(hostname)!"
  - name: job 3
    command: echo "And once more from $(hostname)!"
```

## Create the Workflow on the Server

```console
torc create workflow.yaml
```

Note the workflow ID in the output.

## Add Remote Workers

Add remote machines as workers. Each address uses the format `[user@]hostname[:port]`:

```console
torc remote add-workers <workflow-id> user@host1 user@host2 user@host3
```

Or add workers from a file (one address per line, `#` for comments):

```console
torc remote add-workers-from-file workers.txt <workflow-id>
```

## Run Workers on Remote Machines

Start workers on all registered remote machines via SSH:

```console
torc remote run <workflow-id>
```

This will:

1. Check SSH connectivity to all machines
2. Verify all machines have the same torc version
3. Start a worker process on each machine (detached via `nohup`)
4. Report which workers started successfully

## Check Worker Status

Monitor which workers are still running:

```console
torc remote status <workflow-id>
```

## View Workflow Progress

Check job status from any machine:

```console
torc jobs list <workflow-id>
```

Or use the interactive TUI:

```console
torc tui
```

## Collect Logs

After the workflow completes, collect logs from all workers:

```console
torc remote collect-logs <workflow-id> --local-output-dir ./logs
```

This creates a tarball for each worker containing:

- Worker logs: `torc_worker_<workflow_id>.log`
- Job stdout/stderr: `job_stdio/job_*.o` and `job_stdio/job_*.e`
- Resource utilization data (if enabled): `resource_utilization/resource_metrics_*.db`

## Stop Workers

If you need to stop workers before the workflow completes:

```console
torc remote stop <workflow-id>
```

Add `--force` to send SIGKILL instead of SIGTERM.

## Next Steps

- [CLI Cheat Sheet]../core/reference/cli-cheatsheet.md - Quick reference for all common commands
- [Remote Workers Guide]./remote-workers.md - Detailed configuration and troubleshooting
- [Creating Workflows]../../core/workflows/creating-workflows.md - Workflow specification format
- [Resource Monitoring]../../core/monitoring/resource-monitoring.md - Track CPU/memory usage per
  job