Link-time optimisation for JIT-linking multiple PTX modules.
This module wraps the CUDA linker API (cuLinkCreate, cuLinkAddData,
cuLinkAddFile, cuLinkComplete, cuLinkDestroy) for combining
multiple PTX, cubin, or fatbin inputs into a single linked binary.
§Platform behaviour
On macOS (where NVIDIA dropped CUDA support), all linker operations use a synthetic in-memory implementation. PTX inputs are accumulated and concatenated into a synthetic cubin blob so that the full API surface can be exercised in tests without a GPU.
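The accumulate-and-concatenate behaviour described above can be sketched roughly as follows. This is an illustration only, not the crate's actual implementation: `SyntheticLinker`, `SYNTHETIC_MAGIC`, the blob layout, and the `String` error type are all hypothetical names and choices made for the sketch.

```rust
/// Hypothetical stand-in for a linker backend that needs no GPU.
/// (Assumption: the real crate's internals may look quite different.)
#[derive(Debug, Default)]
pub struct SyntheticLinker {
    /// (name, PTX source) pairs, kept in the order they were added.
    inputs: Vec<(String, String)>,
}

/// Assumed marker prefix so tests can recognise a synthetic blob.
pub const SYNTHETIC_MAGIC: &[u8] = b"SYNCUBIN";

impl SyntheticLinker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record a PTX input; nothing is compiled or validated beyond
    /// rejecting blank input.
    pub fn add_ptx(&mut self, ptx: &str, name: &str) -> Result<(), String> {
        if ptx.trim().is_empty() {
            return Err(format!("empty PTX input: {name}"));
        }
        self.inputs.push((name.to_owned(), ptx.to_owned()));
        Ok(())
    }

    /// Concatenate every recorded input into one byte blob:
    /// the magic prefix, then each input preceded by a name comment.
    pub fn complete(self) -> Result<Vec<u8>, String> {
        if self.inputs.is_empty() {
            return Err("no inputs to link".to_owned());
        }
        let mut blob = SYNTHETIC_MAGIC.to_vec();
        for (name, ptx) in &self.inputs {
            blob.extend_from_slice(format!("\n// {name}\n").as_bytes());
            blob.extend_from_slice(ptx.as_bytes());
        }
        Ok(blob)
    }
}
```

A backend like this would typically be selected with `#[cfg(target_os = "macos")]`, so the same calling code compiles on every platform; whether this crate does so is an assumption.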
§Example
let opts = LinkerOptions::default();
let mut linker = Linker::new(opts)?;
linker.add_ptx(r#"
.version 7.0
.target sm_70
.address_size 64
.visible .entry kernel_a() { ret; }
"#, "module_a.ptx")?;
linker.add_ptx(r#"
.version 7.0
.target sm_70
.address_size 64
.visible .entry kernel_b() { ret; }
"#, "module_b.ptx")?;
let linked = linker.complete()?;
println!("cubin size: {} bytes", linked.cubin_size());
Structs§
- LinkedModule - The output of a successful link operation.
- Linker - RAII wrapper around the CUDA link state (CUlinkState).
- LinkerOptions - Options controlling the JIT linker’s behaviour.
Enums§
- FallbackStrategy - Strategy when an exact binary match is not found for the target GPU.
- LinkInputType - The type of input data being added to the linker.
- OptimizationLevel - JIT optimisation level for the linker.