//! # kaio-macros
//!
//! Proc macro crate for KAIO. Provides the `#[gpu_kernel]` attribute
//! macro that transforms Rust function syntax into PTX codegen +
//! typed launch wrappers.
//!
//! This crate is not intended to be used directly — use `kaio` and
//! import via `kaio::prelude::*`.
use proc_macro::TokenStream;
use proc_macro2::TokenStream as TokenStream2;
use syn::ItemFn;
use parse_kernel_config;
use parse_body;
use parse_kernel_signature;
/// Marks a function as a GPU kernel compiled to PTX.
///
/// # DSL, not compiled Rust
///
/// The function body uses Rust syntax but is **not compiled by rustc**.
/// The proc macro parses it into KAIO's own IR (`KernelStmt`) and emits
/// PTX text directly. No LLVM, no MIR, no borrow checker runs on the
/// kernel body.
///
/// This has an important consequence for `&mut [T]` parameters: in
/// standard Rust, `&mut T` carries a `noalias` guarantee — the compiler
/// assumes exclusive access. In a GPU kernel, thousands of threads
/// execute the same function body concurrently, all accessing the same
/// buffer. Because the body never reaches rustc's backend, no `noalias`
/// attribute is emitted — ptxas sees a plain `.u64` param. There is no
/// UB from the aliasing mismatch, but the `&mut` syntax is misleading:
/// correctness depends on the kernel author writing disjoint access
/// patterns (e.g. each thread writing only `out[idx]` for its own unique
/// `idx`, behind an `if idx < n` bounds guard), not on compiler-enforced
/// uniqueness.
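///
/// As a hedged illustration of that difference, the sketch below contrasts
/// a disjoint write pattern with a racy one. The kernel names
/// `copy_disjoint` and `copy_racy` are hypothetical, and the sketch
/// assumes the DSL accepts a `let`-bound builtin value as an index. Since
/// no borrow checker runs on the body, neither kernel is expected to be
/// rejected on aliasing grounds, but only the first writes each output
/// element from exactly one thread:
///
/// ```ignore
/// use kaio::prelude::*;
///
/// // Disjoint: each in-bounds thread writes only its own element.
/// #[gpu_kernel(block_size = 256)]
/// fn copy_disjoint(a: &[f32], out: &mut [f32], n: u32) {
///     let idx = thread_idx_x() + block_idx_x() * block_dim_x();
///     if idx < n {
///         out[idx] = a[idx];
///     }
/// }
///
/// // Racy: all in-bounds threads of a block write the same element, so
/// // the value left in `out[b]` depends on which thread writes last.
/// #[gpu_kernel(block_size = 256)]
/// fn copy_racy(a: &[f32], out: &mut [f32], n: u32) {
///     let idx = thread_idx_x() + block_idx_x() * block_dim_x();
///     let b = block_idx_x();
///     if idx < n {
///         out[b] = a[idx];
///     }
/// }
/// ```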
///
/// A future release will accept `*mut [T]` / `*const [T]` as the
/// primary kernel parameter syntax to better communicate this. See
/// RFC-0001 in the repository for the design direction.
///
/// The kernel body cannot call Rust functions declared outside the
/// kernel. The supported syntax subset includes: arithmetic,
/// comparisons, bitwise ops, short-circuit `&&`/`||`, compound
/// assignment, `if`/`else`, `for`/`while` loops, `let` bindings, and
/// KAIO GPU builtins (`thread_idx_x()`, `shared_mem!`, etc.).
///
/// # Attributes
///
/// - `block_size = N` (required): Number of threads per block. Must be
///   a power of 2 in the range `[1, 1024]`.
///
/// # Example
///
/// ```ignore
/// use kaio::prelude::*;
///
/// #[gpu_kernel(block_size = 256)]
/// fn vector_add(a: &[f32], b: &[f32], out: &mut [f32], n: u32) {
///     let idx = thread_idx_x() + block_idx_x() * block_dim_x();
///     if idx < n {
///         out[idx] = a[idx] + b[idx];
///     }
/// }
/// ```