# Benchmarks
The primitive showcase benchmark measures every registered `primitive.*` op at four problem sizes: 1,024, 10,240, 102,400, and 1,048,576 elements (the N=1K, N=10K, N=100K, and N=1M columns). CPU numbers come from direct Rust scalar kernels. GPU numbers in the table are end-to-end WGSL dispatch through `wgpu`, including upload, dispatch, readback, and map. The consolidated JSON, `benches/RESULTS.json`, also records kernel-only timings for each row.
The crossover story is visible in the shape of the table. Tiny inputs are dominated by GPU launch and transfer overhead, so the CPU wins every operation at 1,024 elements. Expensive arithmetic such as `gcd` and `lcm` crosses over at 10,240 elements even with upload and download included. Cheap single-instruction primitives do not cross over at any measured size (crossover `>1048576`); they need larger batches or fused pipelines to amortize transfers. Their kernel-only timings in `benches/RESULTS.json` are the GPU throughput signal, while this table is the shipped end-to-end user cost.
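
The amortization argument above can be made concrete with a simple cost model. The sketch below is not taken from the benchmark code: it assumes a GPU dispatch costs a fixed overhead plus a constant marginal cost per element, and that the CPU cost per element is flat. The function names and the example constants are hypothetical and only illustrate the shape of the curve.

```rust
/// Amortized cost per element for a device with a fixed per-dispatch
/// overhead and a constant marginal cost per element (assumed model).
fn amortized_cost(fixed_overhead: f64, per_element: f64, n: usize) -> f64 {
    fixed_overhead / n as f64 + per_element
}

/// Smallest batch size at which the amortized GPU cost is at or below the
/// flat CPU cost per element. Returns `None` when the GPU's marginal cost
/// is not lower than the CPU's, because no batch size can then pay off.
fn break_even(fixed_overhead: f64, gpu_per_element: f64, cpu_per_element: f64) -> Option<usize> {
    if cpu_per_element <= gpu_per_element {
        return None;
    }
    let n = fixed_overhead / (cpu_per_element - gpu_per_element);
    Some(n.ceil() as usize)
}

fn main() {
    // Hypothetical constants in arbitrary time units, not measured values.
    let overhead = 50_000.0;
    let gpu_marginal = 2.0;

    // An expensive op: the CPU cost per element is far above the GPU's.
    println!("expensive op: {:?}", break_even(overhead, gpu_marginal, 30.0));

    // A cheap op: the CPU cost per element is below the GPU's marginal cost.
    println!("cheap op: {:?}", break_even(overhead, gpu_marginal, 1.0));

    // The overhead term shrinks as the batch grows.
    for n in [1_024, 10_240, 102_400, 1_048_576] {
        println!("n = {n}: {:.2}", amortized_cost(overhead, gpu_marginal, n));
    }
}
```

Under this model an end-to-end crossover exists only when the GPU's marginal cost, transfers included, falls below the CPU's cost per element. That is consistent with the text's point that cheap primitives need fusion, which spreads one upload and readback across several ops.
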
Generated by `cargo bench --bench primitives_showcase`.
| op_id | N=1K CPU | N=1K GPU | N=10K CPU | N=10K GPU | N=100K CPU | N=100K GPU | N=1M CPU | N=1M GPU | crossover |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| primitive.bitwise.and | 1.14 | 53.0 | 1.09 | 6.39 | 1.08 | 2.55 | 1.08 | 2.16 | >1048576 |
| primitive.bitwise.clz | 1.13 | 48.0 | 1.07 | 5.80 | 1.07 | 1.80 | 1.07 | 1.60 | >1048576 |
| primitive.bitwise.ctz | 1.14 | 43.7 | 1.08 | 5.68 | 1.07 | 1.83 | 0.89 | 1.61 | >1048576 |
| primitive.bitwise.extract_bits | 1.32 | 46.6 | 1.26 | 7.00 | 1.25 | 2.33 | 1.25 | 2.16 | >1048576 |
| primitive.bitwise.insert_bits | 1.48 | 49.3 | 1.42 | 6.45 | 1.42 | 2.32 | 1.42 | 2.44 | >1048576 |
| primitive.bitwise.not | 1.12 | 46.4 | 1.07 | 5.86 | 1.07 | 1.84 | 1.07 | 1.84 | >1048576 |
| primitive.bitwise.or | 1.18 | 60.9 | 1.13 | 6.95 | 1.12 | 2.82 | 1.09 | 3.38 | >1048576 |
| primitive.bitwise.popcount | 1.21 | 50.9 | 1.16 | 6.00 | 1.15 | 1.96 | 1.10 | 1.93 | >1048576 |
| primitive.bitwise.reverse_bits | 1.34 | 42.5 | 1.28 | 6.31 | 1.26 | 1.69 | 1.29 | 2.06 | >1048576 |
| primitive.bitwise.rotl | 1.19 | 42.9 | 1.13 | 5.70 | 1.13 | 2.46 | 1.08 | 2.30 | >1048576 |
| primitive.bitwise.rotr | 1.19 | 42.9 | 1.13 | 5.49 | 1.12 | 2.50 | 1.09 | 2.54 | >1048576 |
| primitive.bitwise.shl | 1.14 | 41.5 | 1.09 | 6.01 | 0.90 | 2.30 | 1.09 | 2.98 | >1048576 |
| primitive.bitwise.shr | 1.18 | 41.4 | 1.13 | 5.60 | 1.12 | 2.67 | 1.09 | 2.53 | >1048576 |
| primitive.bitwise.xor | 1.20 | 41.8 | 1.13 | 5.46 | 1.11 | 2.27 | 1.09 | 2.67 | >1048576 |
| primitive.compare.eq | 1.13 | 42.1 | 1.08 | 5.73 | 1.08 | 2.27 | 1.10 | 2.42 | >1048576 |
| primitive.compare.ge | 1.13 | 42.2 | 1.08 | 5.64 | 1.07 | 2.12 | 1.09 | 2.14 | >1048576 |
| primitive.compare.gt | 1.13 | 43.6 | 1.08 | 5.53 | 1.08 | 2.24 | 1.09 | 2.61 | >1048576 |
| primitive.compare.le | 1.14 | 53.4 | 1.07 | 5.58 | 1.08 | 2.52 | 1.09 | 2.85 | >1048576 |
| primitive.compare.logical_not | 1.57 | 41.3 | 1.08 | 4.81 | 1.08 | 1.68 | 1.08 | 1.92 | >1048576 |
| primitive.compare.lt | 1.21 | 49.1 | 1.13 | 6.06 | 1.13 | 2.97 | 1.08 | 2.57 | >1048576 |
| primitive.compare.ne | 1.14 | 42.8 | 1.09 | 6.29 | 1.08 | 2.39 | 1.09 | 2.81 | >1048576 |
| primitive.compare.select | 1.38 | 48.9 | 1.41 | 5.71 | 1.08 | 3.01 | 1.08 | 2.46 | >1048576 |
| primitive.float.f32_abs | 1.15 | 39.4 | 1.08 | 4.80 | 1.07 | 1.69 | 1.07 | 1.33 | >1048576 |
| primitive.float.f32_add | 1.13 | 48.1 | 1.08 | 5.63 | 1.07 | 2.32 | 0.90 | 2.10 | >1048576 |
| primitive.float.f32_cos | 1.37 | 38.5 | 1.34 | 4.72 | 1.33 | 1.66 | 1.42 | 2.45 | >1048576 |
| primitive.float.f32_div | 1.47 | 44.5 | 1.09 | 5.96 | 1.35 | 2.28 | 0.91 | 2.04 | >1048576 |
| primitive.float.f32_mul | 1.40 | 51.6 | 1.08 | 5.84 | 1.08 | 2.17 | 1.09 | 2.61 | >1048576 |
| primitive.float.f32_neg | 1.35 | 44.4 | 1.19 | 5.78 | 0.72 | 1.66 | 0.85 | 1.67 | >1048576 |
| primitive.float.f32_sin | 1.35 | 51.2 | 1.40 | 6.43 | 1.38 | 1.90 | 1.41 | 1.68 | >1048576 |
| primitive.float.f32_sqrt | 1.18 | 41.0 | 1.12 | 4.70 | 1.12 | 1.65 | 1.08 | 1.48 | >1048576 |
| primitive.float.f32_sub | 1.13 | 59.6 | 1.07 | 6.67 | 1.07 | 2.37 | 1.08 | 2.27 | >1048576 |
| primitive.math.abs | 1.14 | 40.0 | 1.08 | 5.51 | 1.07 | 1.88 | 1.08 | 1.52 | >1048576 |
| primitive.math.abs_diff | 1.18 | 55.4 | 1.13 | 6.87 | 1.12 | 2.52 | 1.08 | 2.62 | >1048576 |
| primitive.math.add | 0.96 | 53.1 | 0.90 | 6.11 | 0.89 | 2.65 | 1.08 | 2.04 | >1048576 |
| primitive.math.add_sat | 1.18 | 149 | 1.12 | 6.67 | 1.12 | 2.47 | 1.09 | 2.23 | >1048576 |
| primitive.math.clamp | 1.12 | 49.8 | 1.07 | 6.33 | 1.07 | 2.55 | 1.08 | 2.09 | >1048576 |
| primitive.math.div | 1.14 | 50.5 | 1.08 | 6.17 | 1.10 | 2.40 | 1.10 | 2.05 | >1048576 |
| primitive.math.gcd | 26.2 | 51.6 | 31.8 | 6.52 | 31.4 | 2.49 | 31.4 | 2.35 | 10240 |
| primitive.math.lcm | 22.4 | 48.4 | 26.8 | 6.46 | 27.9 | 2.38 | 28.0 | 2.26 | 10240 |
| primitive.math.max | 1.12 | 50.3 | 1.07 | 6.08 | 1.06 | 2.27 | 1.10 | 2.13 | >1048576 |
| primitive.math.min | 1.14 | 49.8 | 1.13 | 6.19 | 1.12 | 2.73 | 1.11 | 2.29 | >1048576 |
| primitive.math.mod | 1.38 | 49.7 | 1.09 | 6.37 | 1.08 | 2.89 | 1.10 | 2.28 | >1048576 |
| primitive.math.mul | 0.98 | 50.8 | 0.91 | 6.05 | 0.91 | 2.41 | 0.92 | 2.87 | >1048576 |
| primitive.math.neg | 1.14 | 44.7 | 1.08 | 5.52 | 1.08 | 1.96 | 1.08 | 1.67 | >1048576 |
| primitive.math.negate | 1.14 | 41.7 | 1.08 | 5.70 | 1.07 | 1.98 | 1.08 | 1.60 | >1048576 |
| primitive.math.sign | 1.14 | 50.0 | 1.08 | 5.41 | 1.11 | 1.97 | 1.07 | 1.66 | >1048576 |
| primitive.math.sub | 1.20 | 46.4 | 1.13 | 6.00 | 1.12 | 2.25 | 1.13 | 2.33 | >1048576 |
| primitive.math.sub_sat | 1.20 | 52.5 | 1.13 | 6.40 | 1.07 | 2.35 | 1.08 | 2.18 | >1048576 |
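
The `crossover` column can be reproduced from the row values. The sketch below assumes that crossover means the smallest benchmarked size at which the GPU figure is at or below the CPU figure, and that lower values are better. The function name `crossover` and the comparison rule are assumptions for illustration; the benchmark's own definition may differ.

```rust
/// Problem sizes used by the showcase table, in elements.
const SIZES: [usize; 4] = [1_024, 10_240, 102_400, 1_048_576];

/// Returns the smallest benchmarked size at which the GPU figure is at or
/// below the CPU figure, or `None` if the GPU never catches up
/// (shown as `>1048576` in the table). This rule is an assumption.
fn crossover(cpu: &[f64; 4], gpu: &[f64; 4]) -> Option<usize> {
    for i in 0..SIZES.len() {
        if gpu[i] <= cpu[i] {
            return Some(SIZES[i]);
        }
    }
    None
}

fn main() {
    // Row values copied from the table above.
    let rows = [
        ("primitive.math.gcd", [26.2, 31.8, 31.4, 31.4], [51.6, 6.52, 2.49, 2.35]),
        ("primitive.bitwise.and", [1.14, 1.09, 1.08, 1.08], [53.0, 6.39, 2.55, 2.16]),
    ];

    for (op_id, cpu, gpu) in rows {
        match crossover(&cpu, &gpu) {
            Some(n) => println!("{op_id}: {n}"),
            None => println!("{op_id}: >{}", SIZES[SIZES.len() - 1]),
        }
    }
}
```
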