//! CUDA arm for rustsim-crowd — GPU kernels for the hot-path pedestrian
//! models.
//!
//! # Scope
//!
//! Closes the P0-1 "GPU path via `DeviceSoaStore`" item from
//! `docs/rustsim-crowd.md`. Every 2-D crowd model now ships a CUDA
//! arm: **Social Force**, **Generalized Centrifugal Force**,
//! **Collision-Free Speed**, **Anticipation Velocity**, and
//! **Optimal Steps**. Social Force ships stateless, device-resident,
//! and grid-accelerated variants; the other four models ship stateless
//! and device-resident variants so production tick loops can avoid
//! per-step host/device transfer.
//!
//! # Design
//!
//! - One CUDA thread per pedestrian. Every thread reads the full `*_in`
//! position / velocity / radius column set (old state) and writes its
//! own row of `*_out` columns (new state). This double-buffered layout
//! sidesteps the intra-step read/write hazard without any explicit
//! synchronisation.
//! - Pair interactions are O(n²) inside the kernel. A device-side
//! uniform-grid neighbour query is the natural next refinement
//! (mirroring the CPU `NeighborGrid`), but even the O(n²) kernel is
//! comfortably faster than the CPU `step_scratch` path for
//! `N ≳ 2 000` because the GPU handles the quadratic term in
//! massively parallel fashion.
//! - Precision: internal arithmetic is `f32`. Helbing's SFM is
//! numerically tolerant: `f32` rounding error stays well below 1e-4 m,
//! in line with FLAMEGPU2 and other production SFM GPU implementations.
//! - Wire format: plain scalar PTX arguments, no CUDA constant memory,
//! no textures. The kernel is compiled at runtime via
//! `cudarc::nvrtc::compile_ptx`, so the build succeeds without a
//! local CUDA toolkit. At run time the driver must be present, along
//! with the NVRTC shared library that `compile_ptx` calls into.
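//!
//! The one-thread-per-pedestrian, double-buffered layout described above
//! can be sketched on the CPU as follows. This is a minimal illustration
//! and not the shipped kernel: the `Columns` struct, the function names,
//! and the toy exponential repulsion (strength `2.0`, range `0.3`) are
//! assumptions made for the example.
//!
//! ```rust
//! // Structure-of-arrays snapshot of pedestrian state (hypothetical layout).
//! #[derive(Clone, Debug, PartialEq)]
//! pub struct Columns {
//!     pub x: Vec<f32>,
//!     pub y: Vec<f32>,
//!     pub vx: Vec<f32>,
//!     pub vy: Vec<f32>,
//!     pub radius: Vec<f32>,
//! }
//!
//! // Work of one GPU thread for pedestrian `i`: read every row of `input`
//! // (old state), write only row `i` of `output` (new state).
//! fn step_one(i: usize, input: &Columns, output: &mut Columns, dt: f32) {
//!     let (mut fx, mut fy) = (0.0f32, 0.0f32);
//!     // O(n) scan per pedestrian, hence O(n²) pair interactions per step.
//!     for j in 0..input.x.len() {
//!         if j == i {
//!             continue;
//!         }
//!         let dx = input.x[i] - input.x[j];
//!         let dy = input.y[i] - input.y[j];
//!         let dist = (dx * dx + dy * dy).sqrt().max(1e-6);
//!         // Toy repulsion that decays with the gap between the two discs.
//!         let gap = dist - (input.radius[i] + input.radius[j]);
//!         let mag = 2.0 * (-gap / 0.3).exp();
//!         fx += mag * dx / dist;
//!         fy += mag * dy / dist;
//!     }
//!     output.vx[i] = input.vx[i] + fx * dt;
//!     output.vy[i] = input.vy[i] + fy * dt;
//!     output.x[i] = input.x[i] + output.vx[i] * dt;
//!     output.y[i] = input.y[i] + output.vy[i] * dt;
//!     output.radius[i] = input.radius[i];
//! }
//!
//! // Host loop standing in for the kernel launch. `input` is never
//! // written, so the order in which rows are processed does not matter.
//! pub fn step_double_buffered(input: &Columns, output: &mut Columns, dt: f32) {
//!     for i in 0..input.x.len() {
//!         step_one(i, input, output, dt);
//!     }
//! }
//! ```
//!
//! A caller would allocate `output` with the same length as `input`
//! (for instance by cloning it), call `step_double_buffered`, and swap
//! the two buffers before the next tick.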
//!
//! # Runtime requirements
//!
//! - `cuda` feature enabled.
//! - An NVIDIA GPU with a compatible driver at execution time. If no
//! device is found or CUDA initialisation fails, every entry point
//! returns `Err(String)` and the caller is expected to fall back to
//! the CPU path (see [`social_force::step_with_fallback`] for the
//! convenience wrapper that does exactly that).
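//!
//! The fallback contract above can be sketched generically as follows.
//! This is an illustration of the pattern only: `Backend` and
//! `step_with_fallback_sketch` are hypothetical names, and the real
//! signature of `social_force::step_with_fallback` may differ.
//!
//! ```rust
//! // Which backend produced the result (hypothetical helper type).
//! #[derive(Debug, PartialEq, Clone, Copy)]
//! pub enum Backend {
//!     Gpu,
//!     Cpu,
//! }
//!
//! // Try the GPU closure first; if it reports an error (no device,
//! // failed CUDA initialisation, ...), run the CPU closure instead.
//! pub fn step_with_fallback_sketch<T>(
//!     gpu: impl FnOnce() -> Result<T, String>,
//!     cpu: impl FnOnce() -> T,
//! ) -> (T, Backend) {
//!     match gpu() {
//!         Ok(out) => (out, Backend::Gpu),
//!         // The error text is dropped here; a real caller might log it.
//!         Err(_reason) => (cpu(), Backend::Cpu),
//!     }
//! }
//! ```
//!
//! Returning the backend alongside the result lets a tick loop notice
//! that it is running on the CPU and stop retrying the GPU path.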
use CudaContext;
use Any;
use ;
use Arc;
pub