"""Abstract base classes for the GPU-comparison benchmark suite.
A `BatchTask` describes one workload (e.g. "geodetic → ECEF on N points").
A `BatchConfig` describes how to execute it (which backend, which dtype).
The runner combines tasks and configs into cells, sweeps each (task, config)
across the task's batch-size ladder, and emits one `CellResult` per cell.
"""
"""One (backend, dtype, parallelism) tuple a task can be run under.
The four configs the suite ships with are constructed in
``benchmarks.gpu_comparison.implementations`` and passed into each
task's ``configs`` field. Tasks may declare a narrower list — e.g.,
a 20x20 force-model task that astrojax cannot run yet declares only
the ``brahe-rust-rayon`` config so the scheduler skips the missing
cells with ``config_not_supported_by_task``.
"""
:
: # "f64" or "f32"
: # "rust" | "astrojax-cpu" | "astrojax-gpu" | "astrojax-multigpu"
"""Abstract definition of one benchmark workload."""
: # e.g. "coordinates.geodetic_to_ecef"
: # e.g. "coordinates"
:
:
"""Geometric ladder of batch sizes to sweep, ascending."""
"""Deterministic, JSON-serialisable inputs of the requested batch size."""
"""Number of warmup calls before timed iterations begin."""
return 3
"""Smallest batch size for which the astrojax-multigpu config runs.
Below this threshold pmap inter-device communication dominates and
multi-GPU is slower than single-GPU. The scheduler emits
``below_multigpu_min_batch`` for cells under this size.
"""
return 100_000
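
# Illustrative sketch only, not part of the suite: one way the skip rules
# described in the docstrings above could be expressed. The helper names
# `geometric_ladder` and `skip_reason`, their parameters, and the factor-of-10
# ladder are assumptions made for this example. The skip-reason strings, the
# backend labels, and the 100_000 default are taken from the text above.

```python
from typing import Iterable, List, Optional

# Backend label as listed in the config comment above.
MULTIGPU_BACKEND = "astrojax-multigpu"


def geometric_ladder(start: int, stop: int, factor: int = 10) -> List[int]:
    """Hypothetical helper: ascending sizes start, start*factor, ... up to stop."""
    if start < 1 or factor < 2:
        raise ValueError("start must be >= 1 and factor must be >= 2")
    sizes: List[int] = []
    n = start
    while n <= stop:
        sizes.append(n)
        n *= factor
    return sizes


def skip_reason(
    backend: str,
    batch_size: int,
    supported_backends: Iterable[str],
    multigpu_min_batch: int = 100_000,
) -> Optional[str]:
    """Hypothetical helper: return a skip reason for a cell, or None to run it."""
    # A task may declare a narrower list of configs than the suite ships with.
    if backend not in set(supported_backends):
        return "config_not_supported_by_task"
    # Multi-GPU cells below the threshold are skipped rather than timed.
    if backend == MULTIGPU_BACKEND and batch_size < multigpu_min_batch:
        return "below_multigpu_min_batch"
    return None


if __name__ == "__main__":
    backends = ["rust", "astrojax-multigpu"]
    for size in geometric_ladder(1_000, 1_000_000):
        print(size, skip_reason(MULTIGPU_BACKEND, size, backends))
```

# Returning None for "run this cell" keeps the skip decision separate from
# timing, so a runner could record the reason string without invoking the task.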