pub struct ElementwiseTemplate {
pub op: ElementwiseOp,
pub precision: PtxType,
pub target: SmVersion,
}
Template for generating elementwise PTX kernels.
Combines an ElementwiseOp, a precision (PtxType), and a target
architecture (SmVersion) to produce a complete PTX module string.
The generated kernel handles global thread indexing and bounds checking.
For complex activations (GELU, sigmoid, SiLU), the template emits
approximate PTX instruction sequences using ex2.approx and rcp.approx.
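The two behaviours described above can be illustrated on the CPU. The following is a minimal sketch, not the PTX the template emits: it assumes the usual one-thread-per-element indexing (`block index * block size + thread index`) and assumes sigmoid is computed by rewriting e^(-x) as 2^(-x * log2(e)), which is the common way to pair `ex2.approx` with `rcp.approx`. The function names `sigmoid_via_exp2` and `launch_sigmoid` are hypothetical and do not exist in the crate.

```rust
use std::f32::consts::LOG2_E;

/// Sigmoid expressed with a base-2 exponential and a reciprocal.
/// Assumption: this mirrors how an `ex2.approx` + `rcp.approx` pair is
/// typically used; the template's actual instruction sequence may differ.
fn sigmoid_via_exp2(x: f32) -> f32 {
    // e^(-x) == 2^(-x * log2(e)); `exp2` stands in for `ex2.approx.f32`.
    let e = (-x * LOG2_E).exp2();
    // The division stands in for `rcp.approx.f32` applied to (1 + e).
    1.0 / (1.0 + e)
}

/// CPU emulation of a 1-D kernel launch (hypothetical helper).
/// Each (block, thread) pair computes a global index and skips the work
/// when the index falls outside the data, as a bounds-checked kernel would.
fn launch_sigmoid(input: &[f32], output: &mut [f32], grid_dim: usize, block_dim: usize) {
    let n = input.len().min(output.len());
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            // Global thread index.
            let idx = block_idx * block_dim + thread_idx;
            // Bounds check: surplus threads in the last block do nothing.
            if idx < n {
                output[idx] = sigmoid_via_exp2(input[idx]);
            }
        }
    }
}
```

Launching with more threads than elements (for example, 2 blocks of 4 threads over 5 elements) is safe here because of the bounds check. On real hardware the approximate instructions trade a small amount of accuracy for speed, which this CPU version does not model.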
Fields
op: ElementwiseOp
The elementwise operation to generate.
precision: PtxType
The data precision for computation (e.g., PtxType::F32).
target: SmVersion
The target GPU architecture.
Implementations
impl ElementwiseTemplate
pub const fn new(op: ElementwiseOp, precision: PtxType, target: SmVersion) -> Self
Creates a new elementwise template with the given parameters.
pub fn kernel_name(&self) -> String
Returns the kernel function name derived from the operation and precision.
The name follows the pattern elementwise_{op}_{type}, for example
elementwise_add_f32 or elementwise_relu_f16.
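The naming pattern can be sketched with stand-in types. This is an illustration only: `Op`, `Ty`, `op_suffix`, and `ty_suffix` are hypothetical and are not part of the crate, whose `ElementwiseOp` and `PtxType` define their own variants and may derive the suffixes differently.

```rust
// Stand-in enums for illustration; NOT the crate's `ElementwiseOp`/`PtxType`.
#[derive(Clone, Copy)]
enum Op {
    Add,
    Relu,
}

#[derive(Clone, Copy)]
enum Ty {
    F16,
    F32,
}

/// Lowercase operation suffix (hypothetical helper).
fn op_suffix(op: Op) -> &'static str {
    match op {
        Op::Add => "add",
        Op::Relu => "relu",
    }
}

/// Lowercase type suffix (hypothetical helper).
fn ty_suffix(ty: Ty) -> &'static str {
    match ty {
        Ty::F16 => "f16",
        Ty::F32 => "f32",
    }
}

/// Builds a name following the documented `elementwise_{op}_{type}` pattern.
fn kernel_name(op: Op, ty: Ty) -> String {
    format!("elementwise_{}_{}", op_suffix(op), ty_suffix(ty))
}
```

Because the name depends only on the operation and the precision, two templates that differ only in target architecture would share a kernel name under this pattern.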
pub fn generate(&self) -> Result<String, PtxGenError>
Generates the complete PTX module text for this elementwise operation.
Errors
Returns PtxGenError if the precision type is unsupported for the
requested operation or if PTX text generation fails.