Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
cuTile Rust Macros
Procedural macros behind cutile::module. They expand cuTile Rust modules into:
- Type-checkable Rust items that drive rustc's validation of kernel code.
- An AST capture (
__module_ast_self()) plus a linker-registry entry that hands the original generic source to the JIT compiler. - Host-side kernel launch functions for
#[entry]points.
Two-track design: macro output is for rustc, not the JIT
The macro runs at Rust compile time and emits three kinds of output:
- Rank-instance instantiation and shadow-trait dispatch. Items that use a const-generic-array (CGA) generic —
const X: [i32; N]— get materialized as concrete Rust that rustc can type-check. See Rank-instance instantiation and Shadow dispatch below. - AST capture and registry registration.
__module_ast_self()stores the pre-expansion source text viaSpan::source_text()and re-parses it at runtime; alinkme::distributed_sliceentry registers the module into a global registry the JIT consults. The JIT sees the original genericfn foo<const S: [i32; N]>(...) { … real body … }, not macro-emitted expansions. See Linker-registry module discovery below. - Kernel launchers for
#[entry]functions.
The separation matters: items emitted in (1) exist only to make rustc type-check kernel bodies and give good error messages. The JIT does its own rank-instance specialization from the original source, independent of anything the macro emits. Consequently, the macro never has to grow to support new user-code language features — adding support for (say) user-defined structs or methods is the JIT's job. The macro only has to handle whatever Rust patterns appear in op signatures inside _core.rs.
Linker-registry module discovery
#[cutile::module] emits two items used by the JIT to discover this module at runtime:
// Per-module AST builder. Returns just *this* module's `Module` value
// (re-parsed source text + span base). No recursive dep aggregation.
// Self-registers this module into the global registry at link time.
static __CUTILE_MODULE_ENTRY_FOO: CutileModuleEntry =
CutileModuleEntry ;
linkme's #[distributed_slice] exploits linker-section concatenation: each registration emits a static value tagged with a platform-specific section attribute (#[link_section = "..."]). At link time the linker concatenates every object file's same-named sections into one contiguous region; at runtime CUTILE_MODULES iterates that region as a slice of entries. The contract is between the macro and the linker, not Rust's module system — every crate that emits cuTile module entries with the agreed slice name participates in the same registry, regardless of visibility, semver, or path hierarchy.
JIT-time dep discovery
CUDATileModules::from_kernel(kernel) walks the kernel's use statements iteratively against the registry:
- Seed the working set with the kernel module.
- For each
use path::*;(oruse path::item;) in the kernel, find the longest prefix that's registered inCUTILE_MODULES. If found and not yet visited, call itsbuildclosure to produce that module's AST, add it to the working set, recurse on its ownusestatements. - Continue until the working set stabilizes (cycle detection by visited absolute paths).
- Build a
NameResolverfrom the final working set.
crate::* paths are resolved against the owning module's crate-root segment (extracted from its module_path!()-derived absolute path). Unregistered paths (std::, half::{bf16, f16}, plain Rust submodules) are skipped silently.
Re-exports need manual aliases
The registry uses string paths and can't follow Rust re-exports. If a crate exposes a module at a public path different from its canonical module_path!(), e.g.:
// canonical: `cutile::_core::core`
pub use core; // public: `cutile::core`
then a use cutile::core::*; in user code won't match the registered cutile::_core::core entry. Until macro support lands, register an alias entry by hand next to the pub use:
static __CUTILE_REEXPORT_CORE: CutileModuleEntry =
CutileModuleEntry ;
Caveats
- Static linking only. Modules inside a
dlopened.soaren't reachable from the host'sCUTILE_MODULESiteration. - Platform coverage.
linkmesupports Linux, macOS, Windows, and Wasm. Currently verified on Linux only. - Mis-registered paths are silent misses. A wrong
absolute_pathproduces a JIT-time lookup miss rather than a link-time error. Mitigated by deriving the path frommodule_path!()at the static's location, which matches what the JIT looks up by construction.
Rank-instance instantiation (rank_instantiation.rs)
For items annotated with #[cuda_tile::variadic_struct], #[cuda_tile::variadic_impl], or #[cuda_tile::variadic_trait_impl], the macro emits one rank instance per rank, replacing each const X: [i32; N] with scalar consts X_0, X_1, …, X_{R-1} and suffixing CGA-bearing path segments accordingly.
// Source
// Macro output (abridged):
// … through Tile_4
Inherent impls instantiate the same way; their method bodies get walked by a syn::visit_mut::VisitMut pass (RankInstantiator) that substitutes the active CGA bindings into types, expression paths, struct literals, and const_shape! / const_array! macros.
Shadow dispatch (shadow_dispatch.rs)
Any function annotated with #[cuda_tile::variadic_op(...)], or any trait declared with #[cuda_tile::variadic_trait(...)], is treated as rank-polymorphic. Instead of emitting rank-suffixed callables that the user has to spell out, the macro synthesizes a shadow trait — a single CGA-erased rank-polymorphic trait that doesn't exist in user source — together with one impl per rank instance and (for free fns) a free-fn wrapper that delegates through the shadow trait. User code calls the unsuffixed name and rustc resolves it via normal trait lookup against the shadow's impls. The "shadow" framing captures the synthesis: the trait and impls are auxiliary scaffolding mirroring the user's variadic op, not part of the user's declared API.
Example — addf (same-shape case):
// Source
;
// Macro output (abridged — trait + wrapper shown; 7 rank-instance impls elided)
// ... ranks 0..=6
User code writes the call naturally; rustc resolves it through the trait:
Signature patterns the emitter handles
The emitter recognizes three shapes for the return type and emits the right trait form for each:
- Same-shape (case 3a). Return matches the first shape-bearing arg. Method returns
Self. Examples:addf,subf, unary math,fma, comparisons that preserve shape. - Bound return (case 3b). Return differs from
Selfbut every generic in the return is also referenced by some argument. Method returnsSelf::Outwith an associated type. Examples:reshape,broadcast,constant, comparisons withboolreturn. - Free return (case 3c). Return contains a generic not present in any argument (an associated type would break coherence).
Outbecomes a trait generic that the caller ascribes at the return site. Examples:permute,reduce_sum,bitcast,exti.
The emitter also handles: nested-type recursion (Option<Tile<bool, S>>), reborrow on reference receivers (&Tensor, &mut Tensor), return-only lifetimes (lifted to trait generics), rank-dependent argument types (idx: [i32; N] promoted to a trait generic), and literal-CGA patterns (Tile<E, {[]}> → Tile_0<E>).
Debugging
Generate backtraces on macro panics:
Dump generated kernel launcher code to a directory:
Inspect macro-emitted code for a specific module:
Testing