Module compute


Compute backend detection and GPU-accelerated batch stepping.

The CUDA accelerator works on SoA (Structure-of-Arrays) buffers extracted from the ABM agent store. The flow is:

1. Extract agent fields into flat Vec<f32> columns via SoaExtractable
2. Upload columns to GPU device memory
3. Launch a user-provided PTX kernel that processes all agents in parallel
4. Download results back to host
5. Write columns back into the agent store

When CUDA is unavailable (the crate was built without the cuda feature, or no device is present at runtime), the same SoA buffers are processed on the CPU by a user-provided closure.
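The CPU path (extract, process, write back) can be sketched as follows. This is a minimal illustration, not the crate's API: the `Agent` type, the `ToColumns` trait, and `cpu_step_sketch` are hypothetical stand-ins, and the real `SoaExtractable` trait and `cpu_batch_step` signature may look different.

```rust
// Hypothetical agent type; not part of the crate.
#[derive(Debug, Clone, PartialEq)]
struct Agent {
    x: f32,
    v: f32,
}

// Hypothetical stand-in for an SoA extraction trait.
trait ToColumns: Sized {
    /// Number of f32 fields per agent.
    const ARITY: usize;
    /// Copy each field of every agent into its own flat column.
    fn extract(agents: &[Self]) -> Vec<Vec<f32>>;
    /// Copy column values back into the agents, in the same order.
    fn write_back(agents: &mut [Self], cols: &[Vec<f32>]);
}

impl ToColumns for Agent {
    const ARITY: usize = 2;

    fn extract(agents: &[Self]) -> Vec<Vec<f32>> {
        vec![
            agents.iter().map(|a| a.x).collect(),
            agents.iter().map(|a| a.v).collect(),
        ]
    }

    fn write_back(agents: &mut [Self], cols: &[Vec<f32>]) {
        for (i, a) in agents.iter_mut().enumerate() {
            a.x = cols[0][i];
            a.v = cols[1][i];
        }
    }
}

/// CPU path: extract columns, run the user closure over them, write back.
fn cpu_step_sketch<A, F>(agents: &mut [A], kernel: F)
where
    A: ToColumns,
    F: Fn(&mut [Vec<f32>], usize),
{
    let mut cols = A::extract(agents);
    debug_assert_eq!(cols.len(), A::ARITY);
    let n = agents.len();
    kernel(&mut cols[..], n);
    A::write_back(agents, &cols);
}
```

The closure receives every column plus the agent count, which mirrors what a GPU kernel would receive as device pointers and `n`.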

§Determinism and backend selection

cpu_batch_step is replayable when:

SoA extraction order is deterministic for the chosen store and workload
the supplied CPU kernel is itself deterministic
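
One way to check replayability is to run the same kernel twice from identical initial columns and compare the raw bit patterns of the results. The sketch below assumes a hypothetical two-column layout (column 0 is position, column 1 is velocity); `kernel` and `run_bits` are illustrative names, not crate functions.

```rust
/// A deterministic kernel: each output depends only on that agent's inputs,
/// and agents are visited in a fixed order.
fn kernel(cols: &mut [Vec<f32>], n: usize) {
    for i in 0..n {
        let v = cols[1][i];
        cols[0][i] += 0.1 * v;
    }
}

/// Run `steps` steps from the same initial columns and return the raw
/// bit patterns of column 0, so comparisons are bitwise, not approximate.
fn run_bits(initial: &[Vec<f32>], steps: usize) -> Vec<u32> {
    let mut cols = initial.to_vec();
    let n = cols[0].len();
    for _ in 0..steps {
        kernel(&mut cols[..], n);
    }
    cols[0].iter().map(|x| x.to_bits()).collect()
}
```

Comparing `to_bits()` output rather than float values avoids tolerance questions entirely. Both conditions still have to hold: a kernel like this one is only replayable if the columns it receives are extracted in the same order on every run.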

auto_batch_step and auto_device_step do not guarantee a fixed backend across machines or runs, because backend selection depends on:

compile-time cuda support
runtime device availability
the RUSTSIM_BACKEND environment variable
fallback to CPU after a CUDA failure
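
The first three inputs can be pictured as a small pure function. This is a sketch under stated assumptions: the `Backend` enum and both functions are hypothetical, and the values that RUSTSIM_BACKEND accepts are not documented here, so the "cpu" override below is an assumption made only for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Cuda,
}

/// Pick a backend from the three inputs named in the list above.
/// ASSUMPTION: an override value of "cpu" forces the CPU path; the real
/// accepted values of RUSTSIM_BACKEND may differ.
fn choose_backend(
    compiled_with_cuda: bool,
    device_present: bool,
    env_override: Option<&str>,
) -> Backend {
    if let Some(v) = env_override {
        if v.eq_ignore_ascii_case("cpu") {
            return Backend::Cpu;
        }
    }
    if compiled_with_cuda && device_present {
        Backend::Cuda
    } else {
        Backend::Cpu
    }
}

/// Same decision, reading the override from the process environment.
fn choose_backend_from_env(compiled_with_cuda: bool, device_present: bool) -> Backend {
    let env = std::env::var("RUSTSIM_BACKEND").ok();
    choose_backend(compiled_with_cuda, device_present, env.as_deref())
}
```

Because two of the three inputs are only known at runtime, the same binary can legitimately pick different backends on different machines, which is why a fixed backend is not guaranteed.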

Exact bitwise equivalence between CPU and CUDA results is not guaranteed. Floating-point behavior, execution order, and kernel implementation details may differ across backends.
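
Execution order alone is enough to change a result, because floating-point addition is not associative. The sketch below sums the same three f32 values in two orders; `sum_in_order` and `approx_eq` are illustrative helpers, not crate functions.

```rust
/// Sum values strictly left to right, as a sequential CPU loop would.
fn sum_in_order(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0_f32, |acc, &x| acc + x)
}

/// Absolute-tolerance comparison, one option for cross-backend checks.
fn approx_eq(a: f32, b: f32, tol: f32) -> bool {
    (a - b).abs() <= tol
}

/// Same values, two orders. Near 1e8 adjacent f32 values are 8 apart,
/// so adding 1.0 to 1e8 rounds back to 1e8 and the 1.0 is lost.
fn order_dependent_sums() -> (f32, f32) {
    let a = sum_in_order(&[1.0e8, 1.0, -1.0e8]); // 1.0 absorbed, then cancelled
    let b = sum_in_order(&[1.0e8, -1.0e8, 1.0]); // cancel first, 1.0 survives
    (a, b)
}
```

A parallel reduction on a GPU combines partial sums in a different order than a sequential loop, so tests that compare backends usually need a tolerance such as `approx_eq` rather than `==`.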

§CUDA safety and failure surfaces

The only unsafe operations in this module are CUDA kernel launches via cudarc. Those launches rely on the following invariants:

block_size > 0
the PTX kernel signature matches the launched argument tuple
each device buffer points to a valid uploaded SoA column
the kernel performs bounds checks for idx < n
the kernel does not read or write out of bounds
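
The first and last two invariants interact: the grid is normally rounded up to a whole number of blocks, so some threads get an index past the end and must be masked off by the `idx < n` check. The sketch below computes launch dimensions and emulates the guarded indexing on the CPU. Both functions are hypothetical illustrations and do not call cudarc.

```rust
/// Compute (grid_size, block_size) for n agents, rejecting block_size == 0.
/// For n == 0 the grid is 0; a caller would skip the launch in that case.
fn launch_dims(n: usize, block_size: u32) -> Result<(u32, u32), String> {
    if block_size == 0 {
        return Err("block_size must be > 0".to_string());
    }
    // Ceiling division: enough blocks to cover all n agents.
    let blocks = (n as u64 + block_size as u64 - 1) / block_size as u64;
    let grid = u32::try_from(blocks)
        .map_err(|_| format!("grid size {} does not fit in u32", blocks))?;
    Ok((grid, block_size))
}

/// CPU emulation of a guarded kernel that doubles each element.
/// grid * block may exceed col.len(); the guard skips the extra threads.
fn emulate_guarded_kernel(col: &mut [f32], grid: u32, block: u32) {
    let n = col.len();
    for block_idx in 0..grid {
        for thread_idx in 0..block {
            let idx = block_idx as usize * block as usize + thread_idx as usize;
            if idx < n {
                col[idx] *= 2.0;
            }
        }
    }
}
```

With 10 agents and a block size of 4, three blocks launch 12 threads, and the guard discards the last two. In Rust an unguarded access would panic; in a PTX kernel it would be an out-of-bounds read or write on device memory.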

Failure surfaces are explicit Err(String) results from:

CUDA device initialization
PTX load/module lookup
host-to-device transfer
invalid launch configuration such as block_size == 0
unsupported SoA arity outside 1..=8
kernel launch / synchronization
device-to-host transfer

auto_batch_step and auto_device_step treat those CUDA errors as runtime fallback triggers and continue on CPU.
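
The fallback pattern can be sketched as a match on the GPU attempt's `Result<(), String>`. Everything here is hypothetical: `UsedBackend`, `step_with_fallback`, and `fake_gpu_step` are illustrative names, and the sketch assumes a failed GPU attempt leaves the host columns unmodified.

```rust
#[derive(Debug, PartialEq)]
enum UsedBackend {
    Cuda,
    Cpu,
}

/// Try the GPU path first; on any Err, run the CPU closure instead.
/// ASSUMPTION: a failed GPU attempt has not modified `cols`.
fn step_with_fallback<G, C>(cols: &mut [Vec<f32>], gpu: G, cpu: C) -> UsedBackend
where
    G: FnOnce(&mut [Vec<f32>]) -> Result<(), String>,
    C: FnOnce(&mut [Vec<f32>]),
{
    match gpu(&mut *cols) {
        Ok(()) => UsedBackend::Cuda,
        Err(_reason) => {
            // A real implementation might log `_reason` before falling back.
            cpu(cols);
            UsedBackend::Cpu
        }
    }
}

/// Stand-in for a GPU step that always fails, for demonstration only.
fn fake_gpu_step(cols: &mut [Vec<f32>]) -> Result<(), String> {
    if !(1..=8).contains(&cols.len()) {
        return Err(format!("unsupported SoA arity {}", cols.len()));
    }
    Err("no CUDA device".to_string())
}
```

Returning which backend actually ran lets callers detect a silent fallback, which matters when results are compared across runs.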

§Persistent Device Store

For multi-step runs, use DeviceSoaStore to avoid per-step SoA extraction overhead. This mirrors FLAME GPU 2’s design, where agent data lives on the GPU across steps.
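
The idea of a persistent store can be sketched on the host: build the columns once, step them many times in place, and read them out once at the end. `ResidentColumns` is a hypothetical stand-in that keeps data in host memory; it does not reflect the actual `DeviceSoaStore` API, which is not shown on this page.

```rust
/// Hypothetical column store that stays alive across steps.
/// Here the columns live in host memory; a device store would hold
/// GPU buffers instead.
struct ResidentColumns {
    cols: Vec<Vec<f32>>,
    n: usize,
}

impl ResidentColumns {
    /// Take ownership of already-extracted columns (done once per run).
    fn new(cols: Vec<Vec<f32>>) -> Result<Self, String> {
        let n = cols.first().map_or(0, |c| c.len());
        if cols.iter().any(|c| c.len() != n) {
            return Err("all columns must have the same length".to_string());
        }
        Ok(Self { cols, n })
    }

    /// Advance one step in place; no extraction or write-back happens here.
    fn step<F: Fn(&mut [Vec<f32>], usize)>(&mut self, kernel: F) {
        kernel(&mut self.cols[..], self.n);
    }

    /// Hand the columns back (done once, after the last step).
    fn into_columns(self) -> Vec<Vec<f32>> {
        self.cols
    }
}
```

Compared with extracting and writing back on every step, this moves the conversion cost outside the step loop, so it is paid once per run instead of once per step.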

Structs§

AccelStepResult: Result of a GPU (or CPU-fallback) batch step.

Enums§

ComputeBackend: Represents the available compute backend.

Functions§

auto_batch_step: Automatically choose CUDA or CPU for a batch step.
auto_device_step: Step a DeviceSoaStore using CUDA or CPU.
cpu_batch_step: CPU-side batch step over SoA columns.
cpu_batch_step_f64: CPU-side batch step over f64 SoA columns.
cuda_batch_step: CUDA batch step over SoA columns.
cuda_batch_step_pinned: CUDA batch step over SoA columns using pinned host memory and dedicated non-default CUDA streams for host/device transfer overlap.
detect_backend: Probe the system for CUDA availability.
