pub enum ControlOp {
    SetP {
        dst: Register,
        cmp_op: CmpOp,
        lhs: Operand,
        rhs: Operand,
        ty: PtxType,
    },
    SetPAnd {
        dst: Register,
        cmp_op: CmpOp,
        lhs: Operand,
        rhs: Operand,
        ty: PtxType,
        src_pred: Register,
    },
    BraPred {
        pred: Register,
        target: String,
        negate: bool,
    },
    Bra {
        target: String,
    },
    Ret,
    BarSync {
        barrier_id: u32,
    },
    ShflSyncDown {
        dst: Register,
        src: Register,
        delta: Operand,
        c: u32,
        mask: u32,
    },
    ShflSyncUp {
        dst: Register,
        src: Register,
        delta: Operand,
        c: u32,
        mask: u32,
    },
    ShflSyncBfly {
        dst: Register,
        src: Register,
        lane_mask: Operand,
        c: u32,
        mask: u32,
    },
}
Control flow PTX instruction variants.
Variants
SetP
Set predicate from comparison: setp.{cmp_op}{ty} pred, lhs, rhs;
Compares lhs and rhs and writes the result to a predicate register.
Example: setp.ge.u32 %p1, %r1, %r2;
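As a rough illustration of the template above, the sketch below assembles the instruction text from plain strings. The function `format_setp` and its string parameters are hypothetical stand-ins for the crate's `Register`, `CmpOp`, `Operand`, and `PtxType` types, whose real APIs are not shown on this page. It also assumes the type suffix carries its own leading dot, as the `{cmp_op}{ty}` template suggests.

```rust
/// Hypothetical helper (not part of this crate): builds
/// `setp.{cmp_op}{ty} pred, lhs, rhs;` from plain strings.
/// Assumption: `ty` includes its leading dot, e.g. ".u32".
fn format_setp(cmp_op: &str, ty: &str, pred: &str, lhs: &str, rhs: &str) -> String {
    format!("setp.{}{} {}, {}, {};", cmp_op, ty, pred, lhs, rhs)
}
```

Calling `format_setp("ge", ".u32", "%p1", "%r1", "%r2")` reproduces the example line shown above.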
SetPAnd
Set predicate from comparison ANDed with a source predicate:
setp.{cmp_op}.and{ty} pred, lhs, rhs, src_pred;
Computes pred = (lhs CmpOp rhs) AND src_pred in one instruction.
Used for compact edge-tile bounds checking — combines a row check
with an existing col-check predicate without a separate and.pred.
Sprint 6.7 (multi-warp matmul_tc edge tiles) is the first user.
Example: setp.lt.and.u32 %p3, %r5, %r10, %p2;
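The edge-tile pattern described above can be modelled on the host to show what the fused instruction computes. This is only a sketch of the semantics: `setp_lt_and_u32` and `in_bounds` are hypothetical names, and the real kernels emit PTX rather than calling Rust functions.

```rust
/// Host-side model of `setp.lt.and.u32 p, lhs, rhs, src_pred;`
/// i.e. p = (lhs < rhs) AND src_pred. Hypothetical name, for illustration.
fn setp_lt_and_u32(lhs: u32, rhs: u32, src_pred: bool) -> bool {
    (lhs < rhs) && src_pred
}

/// Edge-tile bounds check: an element is valid only if both its
/// row and its column fall inside the matrix.
fn in_bounds(row: u32, col: u32, rows: u32, cols: u32) -> bool {
    // Step 1 (a plain setp): the column check produces a predicate.
    let col_ok = col < cols;
    // Step 2 (the fused setp.and): the row check is ANDed with the
    // column predicate, so no separate and.pred is needed.
    setp_lt_and_u32(row, rows, col_ok)
}
```

With a 4x4 matrix, `in_bounds(2, 3, 4, 4)` is true, while a row or column index of 4 yields false.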
BraPred
Predicated branch: @{pred} bra {target}; or @!{pred} bra {target};
Branches to target label if pred is true (or false when negated).
Uses PtxWriter::line() instead of instruction() because the
@pred mnemonic target; format doesn’t fit the comma-separated
operand pattern.
Examples:
@%p1 bra $L__BB0_2; (branch if pred is true)
@!%p1 bra IF_END_0; (branch if pred is false; Phase 2 if/else)
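The two forms differ only in the `!` after the guard's `@`. A minimal sketch of that formatting rule follows; `format_bra_pred` is a hypothetical free function that takes strings, not the crate's actual emit path through `PtxWriter::line()`.

```rust
/// Hypothetical helper (not part of this crate): formats a predicated
/// branch as one whole line, since the `@pred` guard precedes the mnemonic
/// instead of appearing in a comma-separated operand list.
fn format_bra_pred(pred: &str, target: &str, negate: bool) -> String {
    // A negated guard is written `@!%p`, a plain guard `@%p`.
    let bang = if negate { "!" } else { "" };
    format!("@{}{} bra {};", bang, pred, target)
}
```

Passing `negate = true` with `%p1` and `IF_END_0` gives the second example line above.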
Bra
Unconditional branch: bra {target};
Not used in vector_add but included for Phase 3 loop support.
Ret
Return from kernel: ret;
BarSync
Block-level barrier synchronization: bar.sync {barrier_id};
All threads in the block must reach this instruction before any
can proceed. Barrier 0 is the conventional default.
Example: bar.sync 0;
ShflSyncDown
Warp shuffle down: shfl.sync.down.b32 dst, src, delta, c, membermask;
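Each thread reads src from the lane whose ID is delta higher than its
own (lane + delta) within the warp; a lane whose source would exceed the
clamp keeps its own value. The c operand packs the clamp value in bits 4:0
and the segment mask in bits 12:8 (see PTX ISA 8.7 S9.7.8).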
Example: shfl.sync.down.b32 %r2, %r1, 1, 31, 0xFFFFFFFF;
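To make the data movement concrete, here is a host-side model of a shuffle-down and the tree reduction it is typically used for. It is a sketch under stated assumptions, not this crate's API: `shfl_down` and `warp_sum` are hypothetical names, all 32 lanes are assumed active (membermask 0xFFFFFFFF), and the segment mask packed into c is assumed to be zero so that only the clamp matters.

```rust
const WARP_SIZE: usize = 32;

/// Models `shfl.sync.down.b32` for one full warp.
/// Lane `i` reads lane `i + delta`; if that source lies beyond the clamp,
/// the lane keeps its own value. Assumes segment mask = 0, all lanes active.
fn shfl_down(src: &[u32; WARP_SIZE], delta: usize, c: usize) -> [u32; WARP_SIZE] {
    let clamp = c & 0x1f; // only the low 5 bits of c hold the clamp value
    let mut dst = [0u32; WARP_SIZE];
    for lane in 0..WARP_SIZE {
        let j = lane + delta;
        dst[lane] = if j <= clamp { src[j] } else { src[lane] };
    }
    dst
}

/// Tree reduction with deltas 16, 8, 4, 2, 1.
/// Afterwards lane 0 holds the sum of all 32 inputs; the other lanes
/// hold partial sums and are not meaningful.
fn warp_sum(mut vals: [u32; WARP_SIZE]) -> u32 {
    let mut delta = WARP_SIZE / 2;
    while delta > 0 {
        let shuffled = shfl_down(&vals, delta, 31);
        for lane in 0..WARP_SIZE {
            vals[lane] = vals[lane].wrapping_add(shuffled[lane]);
        }
        delta /= 2;
    }
    vals[0]
}
```

The clamp of 31 matches the example instruction above: with it, any source lane up to the last lane of the warp is accepted.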
ShflSyncUp
Warp shuffle up: shfl.sync.up.b32 dst, src, delta, c, membermask;
Each thread reads src from the lane whose ID is delta lower than its
own (lane - delta); lanes with no such source keep their own value.
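A common use of shuffle-up is a warp-wide inclusive prefix sum. The model below is a sketch, not this crate's API: `shfl_up` and `warp_inclusive_scan` are hypothetical names, and it assumes a full warp with c = 0 (no segmentation), so a lane whose source index would be negative simply keeps its own value.

```rust
const WARP_SIZE: usize = 32;

/// Models `shfl.sync.up.b32` for one full warp, assuming c = 0.
/// Lane `i` reads lane `i - delta`; lanes with `i < delta` keep their own value.
fn shfl_up(src: &[u32; WARP_SIZE], delta: usize) -> [u32; WARP_SIZE] {
    let mut dst = *src;
    for lane in delta..WARP_SIZE {
        dst[lane] = src[lane - delta];
    }
    dst
}

/// Inclusive scan with deltas 1, 2, 4, 8, 16.
/// A lane only accumulates when its shuffle source was in range.
fn warp_inclusive_scan(mut vals: [u32; WARP_SIZE]) -> [u32; WARP_SIZE] {
    let mut delta = 1;
    while delta < WARP_SIZE {
        let shuffled = shfl_up(&vals, delta);
        for lane in delta..WARP_SIZE {
            vals[lane] = vals[lane].wrapping_add(shuffled[lane]);
        }
        delta *= 2;
    }
    vals
}
```

Skipping the add for out-of-range lanes matters: those lanes receive their own value back, and adding it would double-count.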
ShflSyncBfly
Warp shuffle butterfly (XOR): shfl.sync.bfly.b32 dst, src, lane_mask, c, membermask;
Each thread reads from the thread at lane XOR lane_mask.
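The XOR pattern lets every lane end up with the full reduction result, not just lane 0. The following host-side model is a sketch under assumptions: `shfl_bfly` and `warp_all_reduce_sum` are hypothetical names, the warp is full, and c is taken to be 31 with a zero segment mask so every XOR partner is in range.

```rust
const WARP_SIZE: usize = 32;

/// Models `shfl.sync.bfly.b32` for one full warp, assuming c = 31.
/// Lane `i` reads lane `i XOR lane_mask` (only the low 5 bits of the mask are used).
fn shfl_bfly(src: &[u32; WARP_SIZE], lane_mask: usize) -> [u32; WARP_SIZE] {
    let mut dst = [0u32; WARP_SIZE];
    for lane in 0..WARP_SIZE {
        let partner = lane ^ (lane_mask & (WARP_SIZE - 1));
        dst[lane] = src[partner];
    }
    dst
}

/// Butterfly all-reduce with masks 16, 8, 4, 2, 1.
/// Each step pairs lanes symmetrically, so all lanes finish with the total.
fn warp_all_reduce_sum(mut vals: [u32; WARP_SIZE]) -> [u32; WARP_SIZE] {
    let mut mask = WARP_SIZE / 2;
    while mask > 0 {
        let shuffled = shfl_bfly(&vals, mask);
        for lane in 0..WARP_SIZE {
            vals[lane] = vals[lane].wrapping_add(shuffled[lane]);
        }
        mask /= 2;
    }
    vals
}
```

Compared with a shuffle-down reduction, this costs the same number of steps but leaves the sum in every lane, which avoids a follow-up broadcast.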