cu-embed 0.1.0

Compile CUDA kernels with nvcc, embed cubin/PTX artifacts, and load the best module at runtime.
docs.rs failed to build cu-embed-0.1.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

cu-embed

cu-embed compiles CUDA .cu sources with nvcc during build.rs, embeds the generated artifacts into your Rust binary, and loads the best module for the installed NVIDIA GPU at runtime.

It is designed to work well with cudarc, rust-embed, and Nix-based builds.

What it does

  • Compiles one or more .cu files into per-architecture CUBINs.
  • Generates a conservative PTX fallback for forward compatibility.
  • Writes a manifest describing the embedded assets.
  • Lets your runtime prefer exact-match CUBINs, then compatible CUBINs, then PTX.

The target machine only needs a suitable NVIDIA driver. It does not need a matching CUDA toolkit installed.

Build-time usage

In your crate's Cargo.toml:

[build-dependencies]
cu-embed = "0.1"

Then in build.rs:

fn main() {
    cu_embed::build::Builder::new()
        .source_dir("src/kernels")
        .build()
        .expect("failed to build embedded CUDA kernels");
}

Runtime usage

Your runtime crate needs a rust-embed asset type pointed at $CU_EMBED_ASSET_DIR:

use cu_embed::EmbeddedCudaModules;
use rust_embed::RustEmbed;

#[derive(RustEmbed)]
#[folder = "$CU_EMBED_ASSET_DIR"]
struct EmbeddedCudaAssets;

Then load modules through EmbeddedCudaModules:

let modules = EmbeddedCudaModules::<EmbeddedCudaAssets>::new()?;
let module = modules.load_module(&ctx, "my_kernel")?;

EmbeddedCudaModules selects artifacts in this order:

  1. exact sm_* CUBIN
  2. same-major lower-minor CUBIN
  3. embedded PTX fallback

Configuration

  • CU_EMBED_CUDA_ARCHES: override the CUBIN architectures, for example sm_80,sm_90.
  • CU_EMBED_PTX_ARCH: override the PTX target, for example compute_80.
  • NVCC, NVCC_CCBIN, CUDA_PATH: standard CUDA toolchain overrides used by the builder.

Example

See examples/add_scalar.rs for a complete runtime example.

License

Licensed under either of:

at your option.