Module compute

Compute backend detection and GPU-accelerated batch stepping.

The CUDA accelerator works on SoA (Structure-of-Arrays) buffers extracted from the ABM agent store. The flow is:

  1. Extract agent fields into flat Vec<f32> columns via SoaExtractable
  2. Upload columns to GPU device memory
  3. Launch a user-provided PTX kernel that processes all agents in parallel
  4. Download results back to host
  5. Write columns back into the agent store

When CUDA is unavailable (no cuda feature or no device), the same SoA buffers are processed on CPU via a user-provided closure.
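The five steps above can be sketched on the CPU path roughly as follows. This is a minimal illustration, not the crate's implementation: the `Agent` type and the `extract`, `step_on_cpu`, and `write_back` functions are hypothetical names, and the real `SoaExtractable` trait may have a different shape.

```rust
/// Hypothetical agent type, used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct Agent {
    x: f32,
    vx: f32,
}

/// Step 1: extract each field into its own flat column (SoA layout).
/// Column 0 holds `x`, column 1 holds `vx`.
fn extract(agents: &[Agent]) -> Vec<Vec<f32>> {
    vec![
        agents.iter().map(|a| a.x).collect(),
        agents.iter().map(|a| a.vx).collect(),
    ]
}

/// Steps 2-4 collapse on the CPU path: there is no upload or download,
/// only a caller-supplied closure applied once per agent index.
fn step_on_cpu<F>(columns: &mut [Vec<f32>], n: usize, kernel: F)
where
    F: Fn(&mut [Vec<f32>], usize),
{
    for idx in 0..n {
        kernel(&mut *columns, idx);
    }
}

/// Step 5: write the processed columns back into the agents.
fn write_back(agents: &mut [Agent], columns: &[Vec<f32>]) {
    for (i, agent) in agents.iter_mut().enumerate() {
        agent.x = columns[0][i];
        agent.vx = columns[1][i];
    }
}
```

A caller would run `extract`, then `step_on_cpu` with a closure such as `|c, i| { let dx = c[1][i] * dt; c[0][i] += dx; }`, then `write_back`.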

§Determinism and backend selection

cpu_batch_step produces identical results on replay when both of the following hold:

  • SoA extraction order is deterministic for the chosen store and workload
  • the supplied CPU kernel is itself deterministic

auto_batch_step and auto_device_step do not guarantee a fixed backend across machines or runs, because backend selection depends on:

  • compile-time cuda support
  • runtime device availability
  • the RUSTSIM_BACKEND environment variable
  • CUDA failure fallback to CPU
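One way to picture how the first three inputs could combine is the sketch below. It is an assumption-laden illustration: `Backend`, `choose_backend`, and the override string `"cpu"` are hypothetical, and the values that `RUSTSIM_BACKEND` actually accepts are not documented in this section.

```rust
/// Hypothetical stand-in for the module's backend enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Cuda,
}

/// Pick a backend from the three inputs listed above.
/// `env_override` models the contents of an environment variable;
/// the string "cpu" is a placeholder value, not a documented one.
fn choose_backend(
    compiled_with_cuda: bool,
    device_available: bool,
    env_override: Option<&str>,
) -> Backend {
    // CUDA is usable only if it was compiled in AND a device is present.
    let cuda_usable = compiled_with_cuda && device_available;
    match env_override {
        // An explicit request for CPU always wins.
        Some("cpu") => Backend::Cpu,
        // Otherwise prefer CUDA whenever it is usable.
        _ if cuda_usable => Backend::Cuda,
        // Everything else lands on CPU.
        _ => Backend::Cpu,
    }
}
```

Because two of the three inputs are only known at run time, the same binary can pick different backends on different machines, which is why a fixed backend cannot be assumed.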

Exact bitwise equivalence between CPU and CUDA results is not guaranteed. Floating-point behavior, execution order, and kernel implementation details may differ across backends.
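Execution order alone is enough to change a floating-point result, because f32 addition is not associative. The sketch below is a standalone illustration and not code from this module: it sums the same four values sequentially (as a simple CPU loop might) and pairwise (as a parallel reduction might).

```rust
/// Sequential left-to-right sum, as a straightforward CPU loop would do.
fn sum_left_to_right(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

/// Pairwise (tree) sum, the shape a parallel reduction typically takes.
fn sum_pairwise(xs: &[f32]) -> f32 {
    match xs.len() {
        0 => 0.0,
        1 => xs[0],
        n => {
            let (left, right) = xs.split_at(n / 2);
            sum_pairwise(left) + sum_pairwise(right)
        }
    }
}

/// Same inputs for both orderings. Near 1e8 the spacing between
/// adjacent f32 values is 8, so adding 1.0 to 1e8 is rounded away.
fn demo_values() -> [f32; 4] {
    [1.0e8, 1.0, -1.0e8, 1.0]
}
```

With these inputs the sequential sum is 1.0 and the pairwise sum is 0.0, even though both are valid IEEE 754 evaluations of the same expression. Comparisons between backends therefore need a tolerance rather than bitwise equality.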

§CUDA safety and failure surfaces

The only unsafe operations in this module are CUDA kernel launches via cudarc. Those launches rely on the following invariants:

  • block_size > 0
  • the PTX kernel signature matches the launched argument tuple
  • each device buffer points to a valid uploaded SoA column
  • the kernel performs bounds checks for idx < n
  • the kernel does not read or write out of bounds
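The `block_size > 0` and `idx < n` invariants can be illustrated with a host-side emulation of a one-dimensional launch. This is a sketch under stated assumptions: `launch_dims` and `emulate_launch` are hypothetical helpers, and no cudarc call is involved.

```rust
/// Compute (grid, block) for `n` agents, rejecting block_size == 0.
fn launch_dims(n: usize, block_size: u32) -> Result<(u32, u32), String> {
    if block_size == 0 {
        return Err("block_size must be greater than 0".to_string());
    }
    let bs = block_size as usize;
    // Ceiling division without overflow; CUDA rejects a grid of zero
    // blocks, so at least one block is always requested.
    let blocks = (n / bs + usize::from(n % bs != 0)).max(1);
    if blocks > u32::MAX as usize {
        return Err(format!("{} agents need too many blocks", n));
    }
    Ok((blocks as u32, block_size))
}

/// Emulate the launch on the host: every "thread" computes its global
/// index and applies the bounds check a real kernel must perform.
/// Returns how many threads were past the end and did nothing.
fn emulate_launch(column: &mut [f32], block_size: u32) -> Result<usize, String> {
    let n = column.len();
    let (grid, block) = launch_dims(n, block_size)?;
    let mut idle_threads = 0;
    for b in 0..grid as usize {
        for t in 0..block as usize {
            let idx = b * block as usize + t;
            if idx < n {
                column[idx] *= 2.0; // in-bounds work
            } else {
                idle_threads += 1; // guarded: no out-of-bounds access
            }
        }
    }
    Ok(idle_threads)
}
```

Because the grid is rounded up to whole blocks, the last block usually contains threads whose index is at or beyond `n`. Without the `idx < n` guard those threads would touch memory outside the column.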

Failure surfaces are explicit Err(String) results from:

  • CUDA device initialization
  • PTX load/module lookup
  • host-to-device transfer
  • invalid launch configuration such as block_size == 0
  • unsupported SoA arity outside 1..=8
  • kernel launch / synchronization
  • device-to-host transfer

auto_batch_step and auto_device_step treat those CUDA errors as runtime fallback triggers and continue on CPU.
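The fallback behaviour can be pictured as the pattern below. It is a hedged sketch, not the signature of `auto_batch_step`: `UsedBackend` and `step_with_fallback` are hypothetical, and the sketch assumes a failing GPU path leaves the host columns untouched (plausible here, since the download is the last stage of the flow).

```rust
/// Hypothetical marker for which path actually ran.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UsedBackend {
    Cuda,
    Cpu,
}

/// Try the GPU path first; on any Err(String), run the CPU path instead.
/// Assumption: a failing `gpu` call has not modified `columns`.
fn step_with_fallback<G, C>(columns: &mut [Vec<f32>], gpu: G, cpu: C) -> UsedBackend
where
    G: FnOnce(&mut [Vec<f32>]) -> Result<(), String>,
    C: FnOnce(&mut [Vec<f32>]),
{
    match gpu(&mut *columns) {
        Ok(()) => UsedBackend::Cuda,
        Err(_reason) => {
            // The error is treated as a fallback trigger, not returned.
            cpu(&mut *columns);
            UsedBackend::Cpu
        }
    }
}
```

Returning which backend ran, as this sketch does, lets a caller log or assert on the path taken when reproducibility matters.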

§Persistent Device Store

For multi-step runs, use DeviceSoaStore to avoid per-step SoA extraction overhead. This mirrors FLAME GPU 2's design, in which agent data lives on the GPU across steps.
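The idea of extracting once and stepping many times can be sketched as below. This is not the `DeviceSoaStore` API: `ResidentColumns` and its methods are hypothetical, and ordinary host `Vec`s stand in for device memory.

```rust
/// Hypothetical stand-in for a persistent column store.
/// Host vectors play the role of device buffers in this sketch.
struct ResidentColumns {
    columns: Vec<Vec<f32>>, // column 0: position, column 1: velocity
    steps: usize,
}

impl ResidentColumns {
    /// Extract (and, on a real GPU path, upload) exactly once.
    fn from_rows(rows: &[[f32; 2]]) -> Self {
        ResidentColumns {
            columns: vec![
                rows.iter().map(|r| r[0]).collect(),
                rows.iter().map(|r| r[1]).collect(),
            ],
            steps: 0,
        }
    }

    /// Advance all agents in place; no extraction or write-back happens here.
    fn step(&mut self, dt: f32) {
        for i in 0..self.columns[0].len() {
            let dx = self.columns[1][i] * dt;
            self.columns[0][i] += dx;
        }
        self.steps += 1;
    }

    /// Convert back to rows only when the host needs the results.
    fn to_rows(&self) -> Vec<[f32; 2]> {
        (0..self.columns[0].len())
            .map(|i| [self.columns[0][i], self.columns[1][i]])
            .collect()
    }
}
```

With this shape, a run of k steps pays for one extraction and one write-back instead of k of each.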

Structs§

AccelStepResult
Result of a GPU (or CPU-fallback) batch step.

Enums§

ComputeBackend
Represents the available compute backend.

Functions§

auto_batch_step
Automatically choose CUDA or CPU for a batch step.
auto_device_step
Step a DeviceSoaStore using CUDA or CPU.
cpu_batch_step
CPU-side batch step over SoA columns.
cpu_batch_step_f64
CPU-side batch step over f64 SoA columns.
cuda_batch_step
CUDA batch step over SoA columns.
cuda_batch_step_pinned
CUDA batch step over SoA columns using pinned host memory and dedicated non-default CUDA streams for host/device transfer overlap.
detect_backend
Probe the system for CUDA availability.