Module intrinsics

amdgpu compiler intrinsics.

Intrinsics defined for the amdgpu LLVM backend. Availability of intrinsics varies depending on the target architecture.

Functions

ballot: Returns a bitfield (i32 or i64) containing the result of its i1 argument in all active lanes, and zero in all inactive lanes.
dispatch_id: Returns the id of the dispatch that is currently executing.
ds_bpermute^⚠: Gather data across all lanes in a wavefront.
ds_permute^⚠: Scatter data across all lanes in a wavefront.
endpgm: Stop execution of the wavefront.
global_atomic_cond_sub^⚠: Conditional atomic subtraction.
global_atomic_csub^⚠: Clamping atomic subtraction.
groupstaticsize: Returns the number of LDS bytes statically allocated for this program.
inverse_ballot: Indexes into the value with the current lane id and returns, for each lane, whether the corresponding bit is set.
mbcnt_hi: Masked bit count, high 32 lanes.
mbcnt_lo: Masked bit count, low 32 lanes.
perm^⚠: Permute a 64-bit value.
permlane16_swap^⚠: Provide direct access to v_permlane16_swap_b32 instruction on supported targets.
permlane16_u32^⚠: Performs arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
permlane16_var^⚠: Performs arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
permlane32_swap^⚠: Provide direct access to v_permlane32_swap_b32 instruction on supported targets.
permlane64_u32^⚠: Swap value between upper and lower 32 lanes in a wavefront.
permlanex16_u32^⚠: Performs arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
permlanex16_var^⚠: Performs arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
readfirstlane_u32: Get value from the first active lane in the wavefront.
readfirstlane_u64: Get value from the first active lane in the wavefront.
readlane_u32^⚠: Get value from the lane at index lane in the wavefront.
readlane_u64^⚠: Get value from the lane at index lane in the wavefront.
s_barrier: Synchronize all wavefronts in a workgroup.
s_get_waveid_in_workgroup: Get the index of the current wavefront in the workgroup.
s_memrealtime: Measures time based on a fixed frequency.
s_sethalt: Stop execution of the kernel.
s_sleep: Sleeps for approximately count * 64 cycles.
update_dpp^⚠: Represents the update.dpp operation in AMDGPU. Takes an old value, a source operand, a DPP control operand, a row mask, a bank mask, and a bound control. Equivalent to a sequence of v_mov_b32 operations.
wave_id: Get the index of the current wavefront in the workgroup.
wavefrontsize: Returns the number of threads in a wavefront.
workgroup_id_x: Returns the x coordinate of the workgroup index within the dispatch.
workgroup_id_y: Returns the y coordinate of the workgroup index within the dispatch.
workgroup_id_z: Returns the z coordinate of the workgroup index within the dispatch.
workitem_id_x: Returns the x coordinate of the workitem index within the workgroup.
workitem_id_y: Returns the y coordinate of the workitem index within the workgroup.
workitem_id_z: Returns the z coordinate of the workitem index within the workgroup.
writelane_u32^⚠: Return value for the lane at index lane in the wavefront. Return default for all other lanes.
writelane_u64^⚠: Return value for the lane at index lane in the wavefront. Return default for all other lanes.
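
The descriptions of ballot, inverse_ballot, mbcnt_lo and mbcnt_hi are terse, so a small host-side model may help. The sketch below runs on the CPU and only mimics the lane semantics described above. The function names, the explicit lane and active-mask parameters, and the fixed 64-lane wavefront are assumptions made for illustration; they are not the signatures of the intrinsics in this module.

```rust
// Host-side model only: these functions mimic the documented lane semantics
// on the CPU. They are hypothetical helpers, NOT this module's intrinsics.

/// Number of lanes modelled here (assumption; see `wavefrontsize` on a device).
const WAVE_SIZE: u32 = 64;

/// Model of `ballot`: bit `i` of the result is set when lane `i` is active
/// and its predicate is true. Inactive lanes contribute zero.
fn ballot_model(predicates: &[bool; 64], active: u64) -> u64 {
    let mut mask = 0u64;
    for lane in 0..WAVE_SIZE {
        let is_active = (active >> lane) & 1 == 1;
        if is_active && predicates[lane as usize] {
            mask |= 1u64 << lane;
        }
    }
    mask
}

/// Model of `inverse_ballot`: each lane reads its own bit of `mask`.
fn inverse_ballot_model(mask: u64, lane: u32) -> bool {
    (mask >> lane) & 1 == 1
}

/// Model of `mbcnt_lo`: adds to `base` the number of set bits in `mask_lo`
/// that belong to lanes below `lane`, considering only lanes 0..32.
fn mbcnt_lo_model(mask_lo: u32, base: u32, lane: u32) -> u32 {
    let below = if lane >= 32 { u32::MAX } else { (1u32 << lane) - 1 };
    base + (mask_lo & below).count_ones()
}

/// Model of `mbcnt_hi`: the same count for lanes 32..64. `lane` must be < 64.
fn mbcnt_hi_model(mask_hi: u32, base: u32, lane: u32) -> u32 {
    let below = if lane < 32 { 0 } else { (1u32 << (lane - 32)) - 1 };
    base + (mask_hi & below).count_ones()
}

/// Chains the two counts to get the number of set bits below `lane`
/// across the whole 64-bit mask (a prefix rank within the wavefront).
fn rank_model(mask: u64, lane: u32) -> u32 {
    let lo = mbcnt_lo_model(mask as u32, 0, lane);
    mbcnt_hi_model((mask >> 32) as u32, lo, lane)
}
```

In this model, feeding the result of ballot_model into rank_model gives each lane whose predicate is true a dense index among those lanes. Chaining the low and high counts in this way is a common pattern for stream compaction within a wavefront.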
