Module intrinsics

amdgpu compiler intrinsics.

Intrinsics defined for the amdgpu LLVM backend. Availability of intrinsics varies depending on the target architecture.

Functions

ballot
Returns a bitfield (i32 or i64) containing the result of its i1 argument in all active lanes, and zero in all inactive lanes.
dispatch_id
Returns the id of the dispatch that is currently executing.
ds_bpermute
Gather data across all lanes in a wavefront.
ds_permute
Scatter data across all lanes in a wavefront.
endpgm
Stop execution of the wavefront.
global_atomic_cond_sub
Conditional atomic subtraction.
global_atomic_csub
Clamping atomic subtraction.
groupstaticsize
Returns the number of LDS bytes statically allocated for this program.
inverse_ballot
Indexes into the value with the current lane id and returns, for each lane, whether the corresponding bit is set.
mbcnt_hi
Masked bit count, high 32 lanes.
mbcnt_lo
Masked bit count, low 32 lanes.
perm
Permute a 64-bit value.
permlane16_swap
Provides direct access to the v_permlane16_swap_b32 instruction on supported targets.
permlane16_u32
Performs arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
permlane16_var
Performs arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
permlane32_swap
Provides direct access to the v_permlane32_swap_b32 instruction on supported targets.
permlane64_u32
Swap value between upper and lower 32 lanes in a wavefront.
permlanex16_u32
Performs arbitrary gather-style operation across two rows (each 16 contiguous lanes) of the second input operand.
permlanex16_var
Performs arbitrary gather-style operation across two rows (each 16 contiguous lanes) of the second input operand.
readfirstlane_u32
Get value from the first active lane in the wavefront.
readfirstlane_u64
Get value from the first active lane in the wavefront.
readlane_u32
Get value from the lane at index lane in the wavefront.
readlane_u64
Get value from the lane at index lane in the wavefront.
s_barrier
Synchronize all wavefronts in a workgroup.
s_get_waveid_in_workgroup
Get the index of the current wavefront in the workgroup.
s_memrealtime
Measures time based on a fixed frequency.
s_sethalt
Stop execution of the kernel.
s_sleep
Sleeps for approximately count * 64 cycles.
update_dpp
Represents the update.dpp operation in AMDGPU. It takes an old value, a source operand, a DPP control operand, a row mask, a bank mask, and a bound control. This operation is equivalent to a sequence of v_mov_b32 operations.
wave_id
Get the index of the current wavefront in the workgroup.
wavefrontsize
Returns the number of threads in a wavefront.
workgroup_id_x
Returns the x coordinate of the workgroup index within the dispatch.
workgroup_id_y
Returns the y coordinate of the workgroup index within the dispatch.
workgroup_id_z
Returns the z coordinate of the workgroup index within the dispatch.
workitem_id_x
Returns the x coordinate of the workitem index within the workgroup.
workitem_id_y
Returns the y coordinate of the workitem index within the workgroup.
workitem_id_z
Returns the z coordinate of the workitem index within the workgroup.
writelane_u32
Return value for the lane at index lane in the wavefront. Return default for all other lanes.
writelane_u64
Return value for the lane at index lane in the wavefront. Return default for all other lanes.
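
The lane-level semantics described above can be easier to follow with a host-side model. The sketch below is plain Rust that runs on a CPU; it does not call any function from this module. Every name ending in `_model` is hypothetical, as are the signatures, the explicit `exec` mask parameter (bit i set means lane i is active), and the `base` accumulator on the bit-count helpers. The permute helpers take lane indices directly, which is a simplification: how the real intrinsics encode lane addresses is not stated on this page.

```rust
// Host-side model of a few wavefront operations. Illustrative only:
// the names, signatures and the `exec` parameter are assumptions.

/// Models `ballot`: bit i holds lane i's predicate if the lane is active,
/// and zero if the lane is inactive.
pub fn ballot_model(preds: &[bool], exec: u64) -> u64 {
    assert!(preds.len() <= 64);
    preds.iter().enumerate().fold(0u64, |acc, (lane, &p)| {
        if p && (exec >> lane) & 1 == 1 {
            acc | (1u64 << lane)
        } else {
            acc
        }
    })
}

/// Models `inverse_ballot`: each lane reads its own bit of `mask`.
pub fn inverse_ballot_model(mask: u64, lanes: usize) -> Vec<bool> {
    assert!(lanes <= 64);
    (0..lanes).map(|lane| (mask >> lane) & 1 == 1).collect()
}

/// Models `mbcnt_lo`: counts set bits of `mask` below `lane`, restricted to
/// lanes 0..32, and adds them to `base` (assumed accumulator operand).
pub fn mbcnt_lo_model(mask: u32, base: u32, lane: u32) -> u32 {
    let below = if lane >= 32 { u32::MAX } else { (1u32 << lane) - 1 };
    base.wrapping_add((mask & below).count_ones())
}

/// Models `mbcnt_hi`: same idea for lanes 32..64.
pub fn mbcnt_hi_model(mask: u32, base: u32, lane: u32) -> u32 {
    let below = if lane < 32 {
        0
    } else if lane >= 64 {
        u32::MAX
    } else {
        (1u32 << (lane - 32)) - 1
    };
    base.wrapping_add((mask & below).count_ones())
}

/// Models the gather in `ds_bpermute`: lane i reads from lane idx[i].
pub fn bpermute_model(idx: &[usize], src: &[u32]) -> Vec<u32> {
    assert_eq!(idx.len(), src.len());
    idx.iter().map(|&i| src[i % src.len()]).collect()
}

/// Models the scatter in `ds_permute`: lane i writes to lane idx[i].
/// In this model, lanes that nobody writes to stay 0 and, on a collision,
/// the higher-numbered writer wins.
pub fn permute_model(idx: &[usize], src: &[u32]) -> Vec<u32> {
    assert_eq!(idx.len(), src.len());
    let mut dst = vec![0u32; src.len()];
    for (lane, &i) in idx.iter().enumerate() {
        dst[i % src.len()] = src[lane];
    }
    dst
}

/// Models `readfirstlane_u32`: the value held by the lowest active lane,
/// or `None` if no modelled lane is active.
pub fn readfirstlane_model(vals: &[u32], exec: u64) -> Option<u32> {
    vals.get(exec.trailing_zeros() as usize).copied()
}
```

With the same index vector `[1, 2, 3, 0]` and source `[10, 20, 30, 40]`, the gather model yields `[20, 30, 40, 10]` while the scatter model yields `[40, 10, 20, 30]`, which is the difference between the two permute directions. Chaining the low and high bit-count helpers over the two halves of a ballot mask gives each lane the number of lower-numbered lanes whose bit is set, a common way to compute a compacted output index.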