//! # kaio-macros
//!
//! Proc macro crate for KAIO. Provides the `#[gpu_kernel]` attribute
//! macro, which transforms Rust function syntax into PTX code generation
//! and typed launch wrappers.
//!
//! This crate is not intended to be used directly. Depend on `kaio`
//! instead and import the macro via `kaio::prelude::*`.
use TokenStream;
use TokenStream as TokenStream2;
use ItemFn;
use parse_kernel_config;
use parse_body;
use parse_kernel_signature;
/// Marks a function as a GPU kernel compiled to PTX.
///
/// # Parameter syntax
///
/// Kernel parameters are written as `*const [T]` (primary) or `&[T]`
/// (sugar) for read-only slices, and `*mut [T]` (primary) or `&mut [T]`
/// (sugar) for read-write slices. Both forms lower to identical PTX.
/// The pointer form is recommended because it accurately signals
/// "device pointer, no aliasing contract" (see RFC-0001). The reference
/// form is accepted as permanent ergonomic sugar and will not be
/// deprecated.
///
/// Scalar types (`f32`, `f64`, `i32`, `u32`, `i64`, `u64`, `bool`) are
/// passed by value.
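///
/// Since every spelling of a slice parameter lowers to the same PTX
/// parameter (a plain `.u64` device address), the mapping can be
/// sketched as a lookup table. This is a minimal illustration, not
/// KAIO's codegen: `ParamTy` and `ptx_param_type` are hypothetical, and
/// the scalar rows follow common PTX type conventions as an assumption.
/// `bool` is left out because its lowering is not documented here. Only
/// the "every slice becomes `.u64`" rule comes from this documentation.
///
/// ```rust
/// // Hypothetical model of a kernel parameter's surface syntax.
/// #[derive(Debug, Clone, Copy, PartialEq)]
/// enum ParamTy {
///     ConstPtrSlice, // *const [T]
///     RefSlice,      // &[T]
///     MutPtrSlice,   // *mut [T]
///     MutRefSlice,   // &mut [T]
///     F32,
///     F64,
///     I32,
///     U32,
///     I64,
///     U64,
/// }
///
/// // Assumed PTX `.param` type for each form (scalar rows are assumptions).
/// fn ptx_param_type(ty: ParamTy) -> &'static str {
///     match ty {
///         // All four slice spellings collapse to one 64-bit device address.
///         ParamTy::ConstPtrSlice
///         | ParamTy::RefSlice
///         | ParamTy::MutPtrSlice
///         | ParamTy::MutRefSlice => ".u64",
///         ParamTy::F32 => ".f32",
///         ParamTy::F64 => ".f64",
///         ParamTy::I32 => ".s32",
///         ParamTy::U32 => ".u32",
///         ParamTy::I64 => ".s64",
///         ParamTy::U64 => ".u64",
///     }
/// }
/// ```
///
/// Because `*const [T]` and `&[T]` (and likewise `*mut [T]` and
/// `&mut [T]`) hit the same match arm, the choice between them changes
/// only how the signature reads, not the emitted PTX.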
///
/// # DSL, not compiled Rust
///
/// The function body uses Rust syntax but is **not compiled by rustc**.
/// The proc macro parses it into KAIO's own IR (`KernelStmt`) and emits
/// PTX text directly. No LLVM, no MIR, no borrow checker runs on the
/// kernel body. ptxas sees a plain `.u64` param for every slice
/// parameter regardless of which surface syntax you wrote.
///
/// Thousands of threads execute the kernel body concurrently, all
/// accessing the same device buffers. Correctness depends on writing
/// disjoint access patterns (e.g. `if idx < n` bounds guards), not on
/// compiler-enforced uniqueness.
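///
/// The role of an `if idx < n` guard can be checked with a sequential
/// CPU emulation of a 1-D launch. This is a sketch for illustration
/// only. `write_counts` is a hypothetical helper, the loops run threads
/// one after another rather than concurrently, and the grid size is
/// assumed to be `n` rounded up to a whole number of blocks. It counts
/// how many times each output element is written when every thread
/// computes `thread_idx_x() + block_idx_x() * block_dim_x()`.
///
/// ```rust
/// // Hypothetical helper: emulate every (block, thread) pair of a 1-D
/// // launch and count the writes that land on each output element.
/// fn write_counts(n: u32, block_dim: u32) -> Vec<u32> {
///     // Round up so the grid covers all `n` elements (assumed launch policy).
///     let grid_dim = (n + block_dim - 1) / block_dim;
///     let mut counts = vec![0u32; n as usize];
///     for block_idx in 0..grid_dim {
///         for thread_idx in 0..block_dim {
///             let idx = thread_idx + block_idx * block_dim;
///             // Bounds guard: surplus threads in the last block write nothing.
///             if idx < n {
///                 counts[idx as usize] += 1;
///             }
///         }
///     }
///     counts
/// }
/// ```
///
/// With `n = 1000` and `block_dim = 256`, the grid has 4 blocks (1024
/// threads). Every element is written exactly once, and the guard
/// filters out the 24 surplus threads, which would otherwise index past
/// the end of the buffer.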
///
/// The kernel body cannot call Rust functions defined outside of it.
/// The supported syntax subset includes: arithmetic, comparisons,
/// bitwise ops, short-circuit `&&`/`||`, compound assignment,
/// `if`/`else`, `for`/`while` loops, `let` bindings, and KAIO GPU
/// builtins (`thread_idx_x()`, `shared_mem!`, etc.).
///
/// # Attributes
///
/// - `block_size = N` (required): number of threads per block. Must be
///   a power of 2 in the range `[1, 1024]`.
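///
/// The `block_size` constraint can be expressed as a small check. This
/// is a hypothetical sketch, not the macro's actual attribute parser:
/// `validate_block_size` and its error message are illustrative only.
///
/// ```rust
/// // Hypothetical check mirroring the documented `block_size` constraint.
/// fn validate_block_size(n: u32) -> Result<u32, String> {
///     // `u32::is_power_of_two` returns false for 0, which enforces the
///     // lower bound of 1; the upper bound is checked explicitly.
///     if n.is_power_of_two() && n <= 1024 {
///         Ok(n)
///     } else {
///         Err(format!("block_size must be a power of 2 in [1, 1024], got {n}"))
///     }
/// }
/// ```
///
/// Under this rule `1`, `256`, and `1024` are accepted, while `0`,
/// `1000`, and `2048` are rejected.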
///
/// # Example
///
/// ```ignore
/// use kaio::prelude::*;
///
/// #[gpu_kernel(block_size = 256)]
/// fn vector_add(a: *const [f32], b: *const [f32], out: *mut [f32], n: u32) {
///     let idx = thread_idx_x() + block_idx_x() * block_dim_x();
///     if idx < n {
///         out[idx] = a[idx] + b[idx];
///     }
/// }
/// ```