Compute backend detection and GPU-accelerated batch stepping.
The CUDA accelerator works on SoA (Structure-of-Arrays) buffers extracted from the ABM agent store. The flow is:
- Extract agent fields into flat Vec<f32> columns via SoaExtractable
- Upload columns to GPU device memory
- Launch a user-provided PTX kernel that processes all agents in parallel
- Download results back to host
- Write columns back into the agent store
When CUDA is unavailable (no cuda feature or no device), the same SoA
buffers are processed on CPU via a user-provided closure.
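The extract, process, write-back round trip on the CPU path can be sketched as follows. This is a minimal illustration, not the module's implementation: the Agent type and the functions extract, write_back, step_on_cpu, and integrate are hypothetical names, and the real extraction goes through SoaExtractable, whose signature is not shown here.

```rust
/// Hypothetical agent type, used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Agent {
    pub x: f32,
    pub vx: f32,
}

/// Extract each field into its own flat column (the SoA layout).
pub fn extract(agents: &[Agent]) -> (Vec<f32>, Vec<f32>) {
    let xs: Vec<f32> = agents.iter().map(|a| a.x).collect();
    let vxs: Vec<f32> = agents.iter().map(|a| a.vx).collect();
    (xs, vxs)
}

/// Write the columns back into the agents, in extraction order.
pub fn write_back(agents: &mut [Agent], xs: &[f32], vxs: &[f32]) {
    for ((agent, &x), &vx) in agents.iter_mut().zip(xs).zip(vxs) {
        agent.x = x;
        agent.vx = vx;
    }
}

/// CPU path: run a caller-supplied closure over the extracted columns.
pub fn step_on_cpu<F>(agents: &mut [Agent], mut kernel: F)
where
    F: FnMut(&mut [f32], &mut [f32]),
{
    let (mut xs, mut vxs) = extract(agents);
    kernel(&mut xs, &mut vxs);
    write_back(agents, &xs, &vxs);
}

/// Example kernel body: advance each position by velocity times `dt`.
pub fn integrate(xs: &mut [f32], vxs: &[f32], dt: f32) {
    for (x, vx) in xs.iter_mut().zip(vxs.iter()) {
        *x += *vx * dt;
    }
}
```

A caller would pass a closure such as `|xs, vxs| integrate(xs, vxs, 0.5)` to step_on_cpu. On the GPU path the same columns would be uploaded, processed by the PTX kernel, and downloaded in place of the closure call.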
§Determinism and backend selection
cpu_batch_step is replayable when:
- SoA extraction order is deterministic for the chosen store and workload
- the supplied CPU kernel is itself deterministic
auto_batch_step and auto_device_step do not guarantee a fixed backend
across machines or runs, because backend selection depends on:
- compile-time cuda support
- runtime device availability
- the RUSTSIM_BACKEND environment variable
- CUDA failure fallback to CPU
Exact bitwise equivalence between CPU and CUDA results is not guaranteed. Floating-point behavior, execution order, and kernel implementation details may differ across backends.
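One reason bitwise equivalence cannot be assumed is that f32 addition is not associative, so a different execution order can change the result. The sketch below compares a left-to-right sum with a pairwise (tree-shaped) sum on the CPU. Both functions are hypothetical illustrations and do not represent how either backend actually reduces values.

```rust
/// Left-to-right sum, the order a simple sequential loop would use.
pub fn sum_sequential(values: &[f32]) -> f32 {
    values.iter().fold(0.0_f32, |acc, &v| acc + v)
}

/// Pairwise (tree) sum, similar in shape to a parallel reduction.
pub fn sum_pairwise(values: &[f32]) -> f32 {
    match values.len() {
        0 => 0.0,
        1 => values[0],
        n => {
            // Split in half and add the two partial sums.
            let (left, right) = values.split_at(n / 2);
            sum_pairwise(left) + sum_pairwise(right)
        }
    }
}

/// Inputs chosen so that the small terms are absorbed differently
/// depending on the order of additions (1.0 is below half an ulp of 1e8).
pub fn example_values() -> [f32; 4] {
    [1.0e8, 1.0, -1.0e8, 1.0]
}
```

For these inputs the sequential sum evaluates to 1.0 and the pairwise sum to 0.0, even though both are deterministic on their own. Replay comparisons are therefore best made within one backend, or with a tolerance when crossing backends.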
§CUDA safety and failure surfaces
The only unsafe operations in this module are CUDA kernel launches via
cudarc. Those launches rely on the following invariants:
- block_size > 0
- the PTX kernel signature matches the launched argument tuple
- each device buffer points to a valid uploaded SoA column
- the kernel performs bounds checks for idx < n
- the kernel does not read or write out of bounds
Failure surfaces are explicit Err(String) results from:
- CUDA device initialization
- PTX load/module lookup
- host-to-device transfer
- invalid launch configuration such as block_size == 0
- unsupported SoA arity outside 1..=8
- kernel launch / synchronization
- device-to-host transfer
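The launch-configuration and arity checks, together with the idx < n guard expected of the kernel, can be illustrated on the CPU. The sketch below is an assumption-laden illustration: launch_dims, check_arity, and emulate_guarded_launch are hypothetical helpers, the error messages are invented, and the nested loops only emulate how block and thread indices map to a flat index.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: compute (grid_size, block_size) for `n` agents,
/// rejecting the invalid configuration block_size == 0.
pub fn launch_dims(n: usize, block_size: u32) -> Result<(u32, u32), String> {
    if block_size == 0 {
        return Err("block_size must be greater than 0".to_string());
    }
    let bs = block_size as usize;
    // Round up so that a partially filled last block still covers the tail.
    let blocks = n / bs + usize::from(n % bs != 0);
    let grid_size = u32::try_from(blocks)
        .map_err(|_| format!("grid too large for {} agents", n))?;
    Ok((grid_size, block_size))
}

/// Hypothetical helper: accept only 1 to 8 SoA columns.
pub fn check_arity(columns: usize) -> Result<(), String> {
    if (1..=8).contains(&columns) {
        Ok(())
    } else {
        Err(format!("unsupported SoA arity {}; expected 1..=8", columns))
    }
}

/// CPU emulation of a guarded kernel: every (block, thread) pair computes
/// a flat index and touches the data only when idx < n.
pub fn emulate_guarded_launch(data: &mut [f32], grid_size: u32, block_size: u32) {
    let n = data.len();
    for block in 0..grid_size as usize {
        for thread in 0..block_size as usize {
            let idx = block * block_size as usize + thread;
            if idx < n {
                data[idx] *= 2.0;
            }
        }
    }
}
```

Because the grid is rounded up, the last block usually contains threads whose index is past the end of the data. The guard is what keeps those threads from reading or writing out of bounds.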
auto_batch_step and auto_device_step treat those CUDA errors as runtime
fallback triggers and continue on CPU.
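The fallback behavior amounts to trying the GPU path and running the CPU kernel when that attempt returns Err(String). A minimal sketch of the pattern follows. UsedBackend and step_with_fallback are hypothetical names and do not reflect the actual signatures of auto_batch_step or auto_device_step. The sketch also assumes that a failed GPU attempt leaves the host columns unmodified.

```rust
/// Hypothetical marker for which path actually ran.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UsedBackend {
    Cuda,
    Cpu,
}

/// Try the GPU path first; on any error, run the CPU kernel instead.
/// Assumption: `try_gpu` does not modify `columns` when it returns Err.
pub fn step_with_fallback<G, C>(
    columns: &mut [Vec<f32>],
    try_gpu: G,
    cpu_kernel: C,
) -> UsedBackend
where
    G: FnOnce(&mut [Vec<f32>]) -> Result<(), String>,
    C: FnOnce(&mut [Vec<f32>]),
{
    match try_gpu(&mut *columns) {
        Ok(()) => UsedBackend::Cuda,
        Err(_reason) => {
            // A real caller might log `_reason` before continuing on CPU.
            cpu_kernel(columns);
            UsedBackend::Cpu
        }
    }
}
```

Returning which backend ran lets a caller detect that a run silently moved to CPU, which matters when results are later compared across runs.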
§Persistent Device Store
For multi-step runs, use DeviceSoaStore
to avoid per-step SoA extraction overhead. This mirrors FlameGPU2’s design
where agent data lives on the GPU across steps.
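The benefit of a persistent store is that extraction and write-back happen once per run instead of once per step. The sketch below shows that shape on the host only. ResidentColumns and run_resident are hypothetical stand-ins, not the DeviceSoaStore API, and agents are modelled as plain (x, vx) tuples for brevity.

```rust
/// Hypothetical stand-in for a store whose columns persist across steps.
pub struct ResidentColumns {
    pub xs: Vec<f32>,
    pub vxs: Vec<f32>,
}

impl ResidentColumns {
    /// Extract once, at the start of a multi-step run.
    pub fn from_agents(agents: &[(f32, f32)]) -> Self {
        ResidentColumns {
            xs: agents.iter().map(|a| a.0).collect(),
            vxs: agents.iter().map(|a| a.1).collect(),
        }
    }

    /// One step operating directly on the resident columns.
    pub fn step(&mut self, dt: f32) {
        for (x, vx) in self.xs.iter_mut().zip(&self.vxs) {
            *x += vx * dt;
        }
    }

    /// Write back once, at the end of the run.
    pub fn write_back(&self, agents: &mut [(f32, f32)]) {
        for ((agent, &x), &vx) in agents.iter_mut().zip(&self.xs).zip(&self.vxs) {
            *agent = (x, vx);
        }
    }
}

/// Run `steps` steps with a single extraction and a single write-back.
pub fn run_resident(agents: &mut [(f32, f32)], steps: usize, dt: f32) {
    let mut store = ResidentColumns::from_agents(agents);
    for _ in 0..steps {
        store.step(dt);
    }
    store.write_back(agents);
}
```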
Structs§
- AccelStepResult: Result of a GPU (or CPU-fallback) batch step.
Enums§
- ComputeBackend: Represents the available compute backend.
Functions§
- auto_batch_step: Automatically choose CUDA or CPU for a batch step.
- auto_device_step: Step a DeviceSoaStore using CUDA or CPU.
- cpu_batch_step: CPU-side batch step over SoA columns.
- cpu_batch_step_f64: CPU-side batch step over f64 SoA columns.
- cuda_batch_step: CUDA batch step over SoA columns.
- cuda_batch_step_pinned: CUDA batch step over SoA columns using pinned host memory and dedicated non-default CUDA streams for host/device transfer overlap.
- detect_backend: Probe the system for CUDA availability.