pub struct KernelBuilder { /* private fields */ }
Builder for constructing complete PTX kernel modules.
KernelBuilder follows the fluent builder pattern: chain configuration
methods, supply a body closure, and call build to produce the final
PTX text.
§Example
use oxicuda_ptx::builder::KernelBuilder;
use oxicuda_ptx::arch::SmVersion;
use oxicuda_ptx::ir::PtxType;

let ptx = KernelBuilder::new("vector_add")
    .target(SmVersion::Sm80)
    .param("a", PtxType::U64)
    .param("b", PtxType::U64)
    .param("c", PtxType::U64)
    .param("n", PtxType::U32)
    .body(|b| {
        let tid = b.global_thread_id_x();
        let n_reg = b.load_param_u32("n");
        b.if_lt_u32(tid, n_reg, |b| {
            b.comment("kernel body goes here");
        });
        b.ret();
    })
    .build()
    .expect("PTX generation failed");

assert!(ptx.contains(".entry vector_add"));
assert!(ptx.contains(".target sm_80"));

Implementations
impl KernelBuilder
pub fn new(name: &str) -> Self
Creates a new kernel builder with the given kernel name.
The default target is SmVersion::Sm80 (Ampere). Call target
to override.
pub const fn target(self, sm: SmVersion) -> Self
Sets the target GPU architecture for this kernel.
This determines the .target and .version directives in the
generated PTX, and also controls which instructions the
BodyBuilder may emit.
pub fn param(self, name: &str, ty: PtxType) -> Self
Adds a kernel parameter with the given name and type.
Parameters are emitted in declaration order in the .entry signature.
Common types: PtxType::U64 for pointers, PtxType::U32 / PtxType::F32
for scalar arguments.
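As a rough illustration of what "declaration order" means for the signature, the sketch below renders a PTX-style `.entry` line from a list of parameters. `render_signature` is a hypothetical helper written for this example, not part of `oxicuda_ptx`, and the exact text the library emits may differ.

```rust
/// Hypothetical helper (not part of oxicuda_ptx): renders a PTX-style
/// `.entry` signature from (name, type) pairs, preserving slice order.
fn render_signature(kernel: &str, params: &[(&str, &str)]) -> String {
    let list: Vec<String> = params
        .iter()
        .map(|(name, ty)| format!(".param .{ty} {name}"))
        .collect();
    // Parameters appear in exactly the order they were declared.
    format!(".visible .entry {kernel}({})", list.join(", "))
}
```

Because the launch-side argument list has to match this order, reordering `param` calls changes the kernel's calling convention.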
Declares a static shared memory allocation.
This generates a .shared .align declaration at the top of the
kernel body. The total size is count * ty.size_bytes() bytes.
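The size formula above can be checked with a small standalone sketch. `ElemTy` is a hypothetical stand-in for a few `PtxType` variants, and `shared_bytes` is an illustrative helper; neither is part of the crate.

```rust
/// Hypothetical stand-in for a handful of `PtxType` variants.
#[derive(Clone, Copy)]
enum ElemTy {
    U8,
    F16,
    F32,
    U64,
}

impl ElemTy {
    /// Element width in bytes for each scalar type.
    const fn size_bytes(self) -> usize {
        match self {
            ElemTy::U8 => 1,
            ElemTy::F16 => 2,
            ElemTy::F32 => 4,
            ElemTy::U64 => 8,
        }
    }
}

/// Total bytes of a static shared allocation: count * ty.size_bytes().
fn shared_bytes(ty: ElemTy, count: usize) -> usize {
    count * ty.size_bytes()
}
```

For example, `shared_bytes(ElemTy::F32, 256)` evaluates to 1024, so a 256-element `f32` tile occupies 1 KiB of shared memory per block.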
pub const fn max_threads_per_block(self, n: u32) -> Self
Sets the .maxntid directive, hinting to ptxas the maximum
number of threads per block this kernel will be launched with.
This can improve register allocation and occupancy planning.
pub fn body<F>(self, f: F) -> Self
where
    F: FnOnce(&mut BodyBuilder<'_>) + 'static,
Supplies the body closure that generates the kernel’s instructions.
The closure receives a mutable reference to a BodyBuilder which
provides the instruction emission API (loads, stores, arithmetic,
control flow, tensor core ops, etc.).
pub fn build(self) -> Result<String, PtxGenError>
Consumes the builder and generates the complete PTX module text.
§Errors
Returns PtxGenError::MissingBody if no body closure was provided.
Returns PtxGenError::FormatError if string formatting fails.
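To make the error contract concrete, here is a minimal sketch of a consuming builder whose `build` fails when no body was supplied. `MiniBuilder` and `GenError` are hypothetical stand-ins written for this example; they mirror the two error cases listed above but are not the crate's implementation, and the emitted text is only illustrative.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for `PtxGenError`, with only the two cases above.
#[derive(Debug, PartialEq)]
enum GenError {
    MissingBody,
    FormatError,
}

/// Hypothetical miniature builder; the closure writes straight into a String.
struct MiniBuilder {
    name: String,
    body: Option<Box<dyn FnOnce(&mut String)>>,
}

impl MiniBuilder {
    fn new(name: &str) -> Self {
        Self { name: name.to_string(), body: None }
    }

    /// Stores the body closure; `'static` lets it be boxed and run later.
    fn body<F: FnOnce(&mut String) + 'static>(mut self, f: F) -> Self {
        self.body = Some(Box::new(f));
        self
    }

    /// Consumes the builder, so it cannot be reused after `build`.
    fn build(self) -> Result<String, GenError> {
        let body = self.body.ok_or(GenError::MissingBody)?;
        let mut out = String::new();
        writeln!(out, ".visible .entry {}()", self.name)
            .map_err(|_| GenError::FormatError)?;
        out.push_str("{\n");
        body(&mut out);
        out.push_str("}\n");
        Ok(out)
    }
}
```

Calling `MiniBuilder::new("k").build()` without a body returns `Err(GenError::MissingBody)`. With the real builder, matching on the `Result` instead of calling `expect` lets the caller report a missing body as a configuration error rather than a panic.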