Offload
A flexible parallel test runner written in Rust with pluggable execution providers. By Imbue AI.
Features
- Parallel execution across multiple sandboxes (local processes or remote environments)
- Pluggable providers: local, default (custom shell commands), and Modal
- Multiple test frameworks: pytest, cargo test, or any custom runner
- Automatic retry with flaky test detection
- JUnit XML reporting
- LPT scheduling when historical timing data is available, with round-robin fallback
- Group-level filtering to split tests into groups with different filters and retry policies
- Environment variable expansion in config values (`${VAR}` and `${VAR:-default}`)
- Bundled script references using `@filename.ext` syntax in commands
Installation
From crates.io:
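A typical crates.io install, assuming the crate is published under the name `offload` (not confirmed by this README):

```shell
cargo install offload
```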
From source:
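A sketch of a from-source install; the repository URL is not given in this README, so a placeholder is used:

```shell
git clone <repo-url> offload   # replace <repo-url> with the actual repository
cd offload
cargo install --path .
```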
Prerequisites
Core:
- Rust toolchain (`cargo`) to install Offload
For Modal providers (`type = "modal"` or `type = "default"` with `@modal_sandbox.py`):
- uv — the bundled `modal_sandbox.py` is invoked via `uv run`, which auto-installs its dependencies (`modal`, `click`)
- Python >= 3.10
- A Modal account — authenticate with `modal token new`
For the pytest framework (local test discovery):
- Python and pytest installed locally — Offload runs `pytest --collect-only` on the local machine to discover tests
- The configured Python runner (e.g. `uv`, `poetry`, `python`) must be on PATH
For the cargo framework:
- cargo-nextest — Offload runs `cargo nextest list` for test discovery. Install with `cargo install cargo-nextest`
For the default framework:
- Whatever tools your `discover_command` and `run_command` invoke
Invariants and Expectations
Offload relies on a stable relationship between test discovery, execution, and result reporting. Understanding these expectations is essential when using the default framework or debugging test ID mismatches.
Discovery
Each group triggers its own discovery call. The discovered test IDs become the canonical identifiers for the entire run.
- pytest: Runs `{command} --collect-only -q` (or `{python} {extra_args} -m pytest --collect-only -q` in legacy mode) locally and parses one test ID per line from stdout. Output format: `path/to/test.py::TestClass::test_method`. Group `filters` are appended as extra pytest args (e.g. `-m 'not slow'`).
- cargo: Runs `cargo nextest list --message-format json` locally and parses test IDs from the JSON output. Test IDs are formatted as `{binary_id} {test_name}`. Group `filters` are appended as extra nextest args.
- default: Runs `discover_command` through `sh -c` and reads one test ID per line from stdout. The `{filters}` placeholder is replaced with the group's filter string (or an empty string). Lines starting with `#` are ignored.
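The default-framework discovery rules above can be sketched in Python. This is an illustrative model, not Offload's actual (Rust) implementation:

```python
import subprocess

def discover(discover_command: str, filters: str) -> list[str]:
    """Model of default-framework discovery: substitute {filters},
    run through `sh -c`, take one test ID per stdout line, and
    ignore blank lines and lines starting with '#'."""
    cmd = discover_command.replace("{filters}", filters)
    result = subprocess.run(
        ["sh", "-c", cmd], capture_output=True, text=True, check=True
    )
    ids = []
    for line in result.stdout.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids

# Example: a discover command that just echoes a static list.
print(discover("printf '# comment\\ntest_a\\ntest_b\\n' {filters}", ""))
# → ['test_a', 'test_b']
```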
Test ID Matching
Offload matches discovered test IDs to JUnit XML results using a `test_id_format` string that controls how the JUnit XML `name` and `classname` attributes are combined into a test ID. For example, `"{name}"` uses just the `name` attribute; `"{classname} {name}"` joins them with a space. This is the most common source of "Not Run" errors.
- The JUnit attributes produced by the test runner must match the test ID from discovery after applying `test_id_format`. If they don't match, Offload reports the test as "Not Run".
- pytest: The format is fixed internally as `"{name}"`. The `_set_junit_test_id` conftest fixture writes the full nodeid into the JUnit `name` attribute so it matches the `pytest --collect-only` output. Users do not configure this field.
- cargo: The format is fixed internally as `"{classname} {name}"`, where classname is the binary ID and name is the test function. Users do not configure this field.
- default: The `test_id_format` field is a required configuration option. Set it to match how your test runner populates the JUnit XML `name` and `classname` attributes.
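The matching step above amounts to plain placeholder substitution. A minimal sketch (illustrative, not Offload's code) of how a `test_id_format` turns JUnit attributes into a test ID:

```python
def junit_test_id(test_id_format: str, name: str, classname: str) -> str:
    """Build a test ID from JUnit XML attributes by substituting
    {name} and {classname} into the configured format string."""
    return test_id_format.replace("{name}", name).replace("{classname}", classname)

# cargo-style format: classname is the binary ID, name is the test function.
tid = junit_test_id("{classname} {name}", "test_add", "my-crate")
assert tid == "my-crate test_add"

# A result only matches if it reproduces an ID from discovery;
# otherwise the test would be reported as "Not Run".
discovered = {"my-crate test_add"}
print("matched" if tid in discovered else "Not Run")  # → matched
```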
Result Reporting
After execution, Offload collects results via one of two mechanisms:
- JUnit XML (recommended): The test command writes a JUnit XML file. For the `default` framework, configure `result_file` with the path and use `{result_file}` in `run_command`. For pytest and cargo, Offload generates the `--junitxml` / nextest JUnit flags automatically.
- Exit code fallback (default framework only): If no `result_file` is configured, Offload infers pass/fail from the command's exit code. This loses per-test granularity — all tests are reported under a synthetic `all_tests` ID, and flaky test detection will not work.
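A hypothetical `default`-framework fragment wiring these pieces together; the script names and paths are illustrative, not from this README:

```toml
[framework]
type = "default"
discover_command = "./list_tests.sh {filters}"                # hypothetical script
run_command = "./run_tests.sh --junit {result_file} {tests}"  # hypothetical script
result_file = "results/junit.xml"
test_id_format = "{classname}::{name}"
```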
Retry and Flaky Test Behavior
- Tests are retried up to `retry_count` times (configured per group).
- Retries run in parallel across available sandboxes.
- If any retry attempt passes, the test is reported as passed.
- A test that passes after a failure is marked as flaky (exit code 2).
- Without JUnit XML result files, retries cannot identify individual test failures and may behave incorrectly.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1 | One or more tests failed, or tests were not run |
| 2 | All tests passed, but some were flaky (passed only on retry) |
Output Files
After a test run, Offload writes per-batch log files to `{output_dir}/logs/`:
| File | Meaning |
|---|---|
| `batch-{N}.success` | All tests in batch N passed |
| `batch-{N}.failure` | One or more tests in batch N failed |
| `batch-{N}.error` | Infrastructure error (e.g., sandbox failure, JUnit download failure) |
| `batch-{N}.cancelled` | Batch was cancelled before completion (e.g., early stopping) |
Each file contains stdout and stderr from all tests in that batch. Stderr lines are prefixed with `[stderr]`.
The `{output_dir}` defaults to `test-results` and is configurable via `[report]` `output_dir`.
Quick Start
- Initialize a configuration file:
- Edit `offload.toml` as needed for your project.
- Run tests:
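The steps above can be sketched as shell commands; the `init` flags are optional and documented in the CLI reference below:

```shell
offload init -p local -f pytest   # generate offload.toml
"$EDITOR" offload.toml            # adjust for your project
offload run                       # run tests in parallel
```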
CLI Reference
Global Flags
| Flag | Description |
|---|---|
| `-c, --config PATH` | Configuration file path (default: `offload.toml`) |
| `-v, --verbose` | Enable verbose output |
offload run
Run tests in parallel.
| Flag | Description |
|---|---|
| `--parallel N` | Override maximum parallel sandboxes |
| `--collect-only` | Discover tests without running them |
| `--copy-dir LOCAL:REMOTE` | Copy a directory into each sandbox (repeatable) |
| `--env KEY=VALUE` | Set an environment variable in sandboxes (repeatable) |
| `--no-cache` | Skip cached image lookup during prepare (forces fresh build) |
offload collect
Discover tests without running them.
| Flag | Description |
|---|---|
| `-f, --format text\|json` | Output format (default: text) |
offload validate
Validate the configuration file and print a summary of settings.
offload init
Generate a new offload.toml configuration file.
| Flag | Description |
|---|---|
| `-p, --provider TYPE` | Provider type: local, default (default: local) |
| `-f, --framework TYPE` | Framework type: pytest, cargo, default (default: pytest) |
offload logs
View per-test results from the most recent run. Reads the JUnit XML report at `{output_dir}/{junit_file}` (default: `test-results/junit.xml`).
| Flag | Description |
|---|---|
| `--failures` | Show only failed tests |
| `--errors` | Show only errored tests |
| `--test ID` | Show only the test with this exact ID (repeatable) |
| `--test-regex PATTERN` | Show only tests whose ID matches this regex (substring match) |
All flags compose with AND logic. For example, `offload logs --failures --test-regex "test_math"` shows only failed tests whose ID contains `test_math`.
With no flags, all test results are printed. Each test is separated by a banner:
```
=== tests/test_math.py::test_add [PASSED] ===

=== tests/test_math.py::test_div [FAILED] ===
AssertionError: expected 2 got 3
tests/test_math.py:10: in test_div
    assert 1 / 0 == 2
E   AssertionError: expected 2 got 3
```
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1 | Test failures or tests not run |
| 2 | Flaky tests only (passed on retry) |
Configuration Reference
Configuration is stored in a TOML file (default: offload.toml).
[offload] -- Core Settings
| Field | Type | Default | Description |
|---|---|---|---|
| `max_parallel` | integer | 10 | Maximum number of parallel sandboxes |
| `test_timeout_secs` | integer | 900 | Timeout per test batch in seconds |
| `working_dir` | string | (cwd) | Working directory for test execution |
| `sandbox_project_root` | string | required | Project root path on the remote sandbox (exported as OFFLOAD_ROOT) |
| `sandbox_init_cmd` | string | (none) | Optional command to run during image build, after cwd/copy-dirs are applied |
[provider] -- Execution Provider
The type field selects the provider. One of: local, default, modal.
type = "local"
Run tests as local child processes.
| Field | Type | Default | Description |
|---|---|---|---|
| `working_dir` | string | (cwd) | Working directory for spawned processes |
| `env` | table | `{}` | Environment variables for test processes |
| `shell` | string | `/bin/sh` | Shell used to execute commands |
type = "default"
Custom shell commands for sandbox lifecycle management. Commands use placeholder variables that are replaced via simple string substitution at runtime.
| Field | Type | Default | Description |
|---|---|---|---|
| `prepare_command` | string | (none) | Runs once before sandbox creation. Must print an image ID as its last line of stdout (e.g. `im-rlXozWoN3Q9TWD8I6fnxm5`) |
| `create_command` | string | required | Creates a sandbox. Must print a sandbox ID to stdout (e.g. `sb-xyz123`). `{image_id}` is replaced with the output of `prepare_command` |
| `exec_command` | string | required | Runs a command inside a sandbox. `{sandbox_id}` is replaced with the sandbox ID from `create_command`. `{command}` is replaced with the full shell-escaped command string (program + args + env vars as a single quoted argument) |
| `destroy_command` | string | required | Destroys a sandbox. `{sandbox_id}` is replaced with the sandbox ID |
| `download_command` | string | (none) | Downloads files from a sandbox. `{sandbox_id}` is replaced with the sandbox ID. `{paths}` is replaced with space-separated 'remote':'local' pairs |
| `working_dir` | string | (cwd) | Working directory for lifecycle commands |
| `timeout_secs` | integer | 3600 | Timeout for remote commands in seconds |
| `copy_dirs` | list | `[]` | Directories to copy into the image ("local:remote" format) |
| `env` | table | `{}` | Environment variables for test processes |
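Placeholder substitution in these lifecycle commands is simple string replacement. A minimal Python sketch (illustrative, not Offload's actual code) of expanding an `exec_command` template:

```python
def expand(template: str, values: dict[str, str]) -> str:
    """Replace each {placeholder} token in a lifecycle command
    template with its runtime value, via plain string substitution."""
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    return template

cmd = expand(
    "uv run @modal_sandbox.py exec {sandbox_id} {command}",
    {"sandbox_id": "sb-xyz123", "command": "'pytest -q'"},
)
assert cmd == "uv run @modal_sandbox.py exec sb-xyz123 'pytest -q'"
```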
type = "modal"
Simplified Modal sandbox provider. Internally generates the appropriate Modal CLI commands.
| Field | Type | Default | Description |
|---|---|---|---|
| `dockerfile` | string | (none) | Path to Dockerfile for building the sandbox image |
| `include_cwd` | boolean | false | Copy the current working directory into the image |
| `copy_dirs` | list | `[]` | Directories to copy into the image ("local:remote" format) |
[framework] -- Test Framework
The type field selects the framework. One of: pytest, cargo, default.
type = "pytest"
| Field | Type | Default | Description |
|---|---|---|---|
| `paths` | list | `["tests"]` | Directories to search for tests |
| `command` | string | (none) | Full command prefix for pytest invocation (e.g. `"uv run pytest"`). When set, replaces `python`/`extra_args`/`-m pytest` |
| `run_args` | string | (none) | Extra arguments for test execution only (not discovery). Only used when `command` is set |
| `extra_args` | list | `[]` | (Legacy) Additional pytest arguments for discovery. Ignored when `command` is set |
| `python` | string | `"python"` | (Legacy) Python interpreter. Ignored when `command` is set |
type = "cargo"
No additional configuration. Requires cargo-nextest.
type = "default"
Custom shell commands for test discovery and execution.
| Field | Type | Default | Description |
|---|---|---|---|
| `discover_command` | string | required | Command that outputs one test ID per line to stdout. Must contain the `{filters}` placeholder |
| `run_command` | string | required | Command template; `{tests}` is replaced with space-separated test IDs. `{result_file}` is replaced with the result file path if configured |
| `result_file` | string | (none) | Path to the JUnit XML result file produced by the test runner |
| `working_dir` | string | (cwd) | Working directory for test commands |
| `test_id_format` | string | required | Format for test IDs from JUnit XML (`{name}`, `{classname}`) |
[groups.NAME] -- Test Groups
At least one group is required. Each group runs its own test discovery with its filters.
| Field | Type | Default | Description |
|---|---|---|---|
| `retry_count` | integer | 0 | Number of times to retry failed tests |
| `filters` | string | `""` | Filter string passed to the framework during discovery. For pytest: pytest args (e.g. `-m 'not slow'`). For cargo: nextest list args. For default: substituted into the `{filters}` placeholder in `discover_command` |
Failed tests that pass on retry are marked as "flaky" (exit code 2).
[report] -- Reporting
| Field | Type | Default | Description |
|---|---|---|---|
| `output_dir` | string | `"test-results"` | Directory for report files |
| `junit` | boolean | true | Enable JUnit XML output |
| `junit_file` | string | `"junit.xml"` | Filename for JUnit XML output |
Example Configurations
Example configuration files are included in the repository root.
Local Cargo Tests (offload.toml)
```toml
[offload]
max_parallel = 4
test_timeout_secs = 300
sandbox_project_root = "."

[provider]
type = "local"
working_dir = "."

[framework]
type = "cargo"

[groups.default]
retry_count = 0

[report]
output_dir = "test-results"
```
Pytest on Modal (offload-pytest-default.toml)
```toml
[offload]
max_parallel = 4
test_timeout_secs = 600
sandbox_project_root = "/app"

[provider]
type = "default"
prepare_command = "uv run @modal_sandbox.py prepare --include-cwd examples/Dockerfile"
create_command = "uv run @modal_sandbox.py create {image_id}"
exec_command = "uv run @modal_sandbox.py exec {sandbox_id} {command}"
destroy_command = "uv run @modal_sandbox.py destroy {sandbox_id}"
download_command = "uv run @modal_sandbox.py download {sandbox_id} {paths}"
timeout_secs = 600

[framework]
type = "pytest"
paths = ["examples/tests"]
command = "uv run pytest"

[groups.default]
retry_count = 2
filters = "-m 'not slow' -k 'not test_flaky'"

[groups.slow]
retry_count = 3
filters = "-m 'slow'"

[groups.flaky]
retry_count = 5
filters = "-k test_flaky"

[report]
output_dir = "test-results"
```
Cargo Tests on Modal (offload-cargo-modal.toml)
```toml
[offload]
max_parallel = 4
test_timeout_secs = 600
sandbox_project_root = "/app"

[provider]
type = "modal"
dockerfile = ".devcontainer/Dockerfile"
include_cwd = true

[framework]
type = "cargo"

[groups.default]
retry_count = 1

[report]
output_dir = "test-results"
```
Pytest on Modal (offload-pytest-default.toml from mng)
```toml
[offload]
max_parallel = 40
test_timeout_secs = 60
sandbox_project_root = "/code/mng"
sandbox_init_cmd = "git apply /offload-upload/patch --allow-empty && uv sync --all-packages"

[provider]
type = "default"
prepare_command = "uv run @modal_sandbox.py prepare --include-cwd --cached libs/mng/imbue/mng/resources/Dockerfile"
create_command = "uv run @modal_sandbox.py create {image_id}"
exec_command = "uv run @modal_sandbox.py exec {sandbox_id} {command}"
destroy_command = "uv run @modal_sandbox.py destroy {sandbox_id}"
download_command = "uv run @modal_sandbox.py download {sandbox_id} {paths}"
timeout_secs = 600

[framework]
type = "pytest"
paths = ["libs/mng/tests"]
command = "uv run pytest"

[groups.default]
retry_count = 0
filters = "-m 'not acceptance and not release'"

[report]
output_dir = "test-results"
junit = true
junit_file = "junit.xml"
```
This demonstrates using `sandbox_init_cmd` to run setup commands during image build. The `sandbox_init_cmd` applies a patch and syncs packages after the working directory is copied into the image, enabling the use of the native `pytest` framework instead of the `default` framework with inline setup commands.
Bundled Scripts
Commands in configuration can reference bundled scripts using `@filename.ext` syntax. For example, `uv run @modal_sandbox.py create {image_id}` references the bundled `modal_sandbox.py` script. Scripts are extracted to a cache directory on first use.
Image Cache
When using the `modal` provider or a `default` provider with a `prepare_command`, the bundled `modal_sandbox.py` script caches the image ID in `.offload-image-cache` at the project root. Delete this file to force a fresh image build on the next run. You can also pass `--no-cache` to `offload run` to skip cached image lookup.
Environment Variable Expansion
Configuration values support environment variable expansion:
- `${VAR}` -- required; fails if VAR is not set
- `${VAR:-default}` -- uses default if VAR is not set
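The two expansion forms above can be modeled in a few lines of Python. This is an illustrative sketch of the stated semantics, not Offload's implementation:

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}; group 2 is the default, if any.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")

def expand_env(value: str, env=os.environ) -> str:
    """Expand ${VAR} (required) and ${VAR:-default} (fallback) forms."""
    def repl(match):
        var, default = match.group(1), match.group(2)
        if var in env:
            return env[var]
        if default is not None:  # ${VAR:-default} form
            return default
        raise KeyError(f"required environment variable {var} is not set")
    return _PATTERN.sub(repl, value)

print(expand_env("region=${REGION:-us-east}", env={}))  # → region=us-east
```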
Self-Testing
Offload can run its own test suite on Modal:
This requires a valid Modal API key.
License
All Rights Reserved. See LICENSE for details.