Crate cuda_std

CUDA Standard Library

The CUDA Standard Library provides a curated set of abstractions for writing performant, reliable, and understandable GPU kernels using the Rustc NVVM backend.

This library will build on non-nvptx targets, and on targets not using the NVVM backend. However, it will not be usable there: attempting to use most of its functions will cause linker errors. This is normally not a problem, because the kernel attribute automatically cfg-gates the annotated function for nvptx64 or nvptx, so no “actual” functions from this crate should be used when compiling for a non-nvptx target.
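The cfg-gating described above is ordinary Rust conditional compilation. The following is a minimal plain-Rust sketch (it does not use cuda_std), assuming the device target is identified by target_arch = "nvptx" or "nvptx64" as stated above. The device module and its scale function are hypothetical; on the host they are compiled out entirely, so they can never produce unresolved-symbol linker errors.

```rust
// Sketch of target-based cfg-gating; plain Rust, no cuda_std involved.

/// Device-only code: this module exists only when compiling for nvptx/nvptx64.
#[cfg(any(target_arch = "nvptx", target_arch = "nvptx64"))]
#[allow(dead_code)]
mod device {
    // Hypothetical device-side helper, used only for illustration.
    pub fn scale(x: f32) -> f32 {
        x * 2.0
    }
}

/// Returns true when this crate is being compiled for an NVPTX target.
fn is_device_build() -> bool {
    cfg!(any(target_arch = "nvptx", target_arch = "nvptx64"))
}

fn main() {
    if is_device_build() {
        println!("compiled for an NVPTX target");
    } else {
        println!("compiled for the host; device-only code was cfg-gated out");
    }
}
```

On a host build, is_device_build() returns false and the device module is simply absent from the compiled crate.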

This crate cannot be used with the LLVM PTX backend either, because it relies heavily on external functions implicitly defined by the NVVM backend, as well as on internal attributes.

Structure

This library tries to follow the structure of the Rust standard library to some degree, where different concepts are separated into their own modules.

The Prelude

To simplify imports, we provide a prelude module that contains GPU analogues of standard library structures, as well as common imports such as thread.

Re-exports

pub use float::GpuFloat;

pub use half;

pub use vek;

Modules

float

Trait for float intrinsics that make floats work in no_std GPU environments.

intrinsics

Raw libdevice math intrinsics.

io

Utilities for printing to stdout from GPU threads.

mem

Support for allocating memory, and for using the alloc crate, through CUDA memory allocation system calls.

misc

Misc functions that do not exactly fit into other categories.

prelude

ptr

CUDA-specific pointer handling logic.

shared

Static and dynamic shared memory handling.

thread

Functions for dealing with the parallel thread execution model employed by CUDA.
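In CUDA's execution model, each thread in a 1D grid of 1D blocks usually derives the element it works on from its block index, the block size and its index within the block. The following is a CPU-side sketch of that standard formula (block_idx * block_dim + thread_idx); the three values are ordinary parameters standing in for CUDA's blockIdx.x, blockDim.x and threadIdx.x, and the sketch does not call any cuda_std function. simulate_vector_add is a hypothetical helper that "launches" every thread sequentially.

```rust
// CPU simulation of 1D thread indexing; no GPU or cuda_std involved.

/// Global index of a thread in a 1D grid of 1D blocks.
fn global_index_1d(block_idx: u32, block_dim: u32, thread_idx: u32) -> u32 {
    block_idx * block_dim + thread_idx
}

/// Runs every (block, thread) pair in turn; each "thread" adds one pair of
/// elements, as a vector-add kernel would.
fn simulate_vector_add(a: &[f32], b: &[f32], grid_dim: u32, block_dim: u32) -> Vec<f32> {
    let mut c = vec![0.0; a.len()];
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            let i = global_index_1d(block_idx, block_dim, thread_idx) as usize;
            // Bounds check: the grid may contain more threads than elements.
            if i < a.len() {
                c[i] = a[i] + b[i];
            }
        }
    }
    c
}

fn main() {
    let a: [f32; 5] = [1.0, 2.0, 3.0, 4.0, 5.0];
    let b: [f32; 5] = [10.0, 20.0, 30.0, 40.0, 50.0];
    let block_dim = 2;
    // Round the grid size up so that grid_dim * block_dim >= a.len().
    let grid_dim = (a.len() as u32 + block_dim - 1) / block_dim;
    let c = simulate_vector_add(&a, &b, grid_dim, block_dim);
    println!("{:?}", c); // [11.0, 22.0, 33.0, 44.0, 55.0]
}
```

The bounds check matters because the grid size is rounded up: here 3 blocks of 2 threads give 6 threads for 5 elements, and the last thread must do nothing.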

warp

Functions that work over warps of threads.
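A common warp-level operation is a sum reduction built from "shuffle down" exchanges, where each lane reads a register value from the lane delta positions above it (CUDA C++ exposes this as __shfl_down_sync). The following CPU simulation is a sketch of that technique only: each array element plays the role of one lane's register, the shfl_down helper here is hypothetical, and no cuda_std warp function is assumed. It assumes a warp size of 32.

```rust
// CPU simulation of a warp-level tree reduction using shuffle-down semantics.

const WARP_SIZE: usize = 32;

/// One shuffle-down step: lane i receives the value held by lane i + delta.
/// Lanes whose source would fall outside the warp keep their own value,
/// matching CUDA's behavior for out-of-range source lanes.
fn shfl_down(lanes: &[i32; WARP_SIZE], delta: usize) -> [i32; WARP_SIZE] {
    let mut out = *lanes;
    for i in 0..WARP_SIZE {
        if i + delta < WARP_SIZE {
            out[i] = lanes[i + delta];
        }
    }
    out
}

/// Halves the shuffle distance each step; after log2(32) = 5 steps,
/// lane 0 holds the sum of all 32 lanes (other lanes hold partial values).
fn warp_reduce_sum(mut lanes: [i32; WARP_SIZE]) -> i32 {
    let mut delta = WARP_SIZE / 2;
    while delta > 0 {
        let shifted = shfl_down(&lanes, delta);
        for i in 0..WARP_SIZE {
            lanes[i] += shifted[i];
        }
        delta /= 2;
    }
    lanes[0]
}

fn main() {
    let mut lanes = [0i32; WARP_SIZE];
    for (i, v) in lanes.iter_mut().enumerate() {
        *v = i as i32; // lane i contributes the value i
    }
    println!("{}", warp_reduce_sum(lanes)); // 0 + 1 + ... + 31 = 496
}
```

Only lane 0's result is meaningful; the upper lanes accumulate duplicated values, which is harmless because lane 0 never reads them in a later step.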

Macros

assert_eq

Asserts that two expressions are equal, and if they are not, returns an AssertionFailed error to the application that launched the kernel.

assert_ne

Asserts that two expressions are not equal, and if they are equal, returns an AssertionFailed error to the application that launched the kernel.

print

Alternative to print! which works on CUDA. See print for more info.

println

Alternative to println! which works on CUDA. See print for more info.

shared_array

Statically allocates a buffer large enough for len elements of array_type, yielding a *mut array_type that points to uninitialized shared memory. len must be a constant expression.
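A typical use of a shared buffer is a block-level reduction: every thread writes its value into shared memory, then threads combine pairs of slots in rounds separated by block-wide barriers (__syncthreads in CUDA C++). The following is a CPU simulation of that pattern, not a use of shared_array! itself: the "shared memory" is a plain array, each loop round stands in for the code between two barriers, and BLOCK_DIM is an assumed block size.

```rust
// CPU simulation of a block-level tree reduction through a shared buffer.

const BLOCK_DIM: usize = 8; // assumed block size; must be a power of two here

fn block_reduce_sum(input: &[u32; BLOCK_DIM]) -> u32 {
    // A buffer from shared_array! starts uninitialized, so each thread first
    // writes its own slot. (Zeroed here only because safe Rust requires it.)
    let mut shared = [0u32; BLOCK_DIM];
    for tid in 0..BLOCK_DIM {
        shared[tid] = input[tid];
    }
    // --- barrier: all writes are visible before anyone reads ---
    let mut stride = BLOCK_DIM / 2;
    while stride > 0 {
        // Threads with tid < stride each add one element from the upper half.
        // They read only slots >= stride, which no thread writes this round.
        for tid in 0..stride {
            shared[tid] += shared[tid + stride];
        }
        // --- barrier before the next round ---
        stride /= 2;
    }
    shared[0]
}

fn main() {
    let input = [1, 2, 3, 4, 5, 6, 7, 8];
    println!("{}", block_reduce_sum(&input)); // 36
}
```

On a real GPU the barriers are essential: without them, a thread could read a slot before the thread responsible for it has written its partial sum.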

Structs

bf16

A 16-bit floating point type implementing the bfloat16 format.

f16

A 16-bit floating point type implementing the IEEE 754-2008 standard binary16 (a.k.a. half) format.
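The two formats differ in how they split their 16 bits: bfloat16 has 1 sign bit, 8 exponent bits and 7 mantissa bits (the same exponent range as f32), while binary16 has 1 sign bit, 5 exponent bits and 10 mantissa bits. Because bf16 is simply the upper half of an f32, its layout can be shown with plain bit manipulation. This is a sketch, not the half crate's implementation: half's conversions round to the nearest representable value, whereas the hypothetical helper below truncates, so results can differ in the last bit. Converting to f16 requires re-biasing the exponent and is not shown.

```rust
// Sketch of the bfloat16 bit layout using f32 bit manipulation only.

/// Truncating f32 -> bf16 conversion: keep the sign, all 8 exponent bits and
/// the top 7 mantissa bits. Caveat: a NaN whose payload sits only in the low
/// 16 bits would become infinity under plain truncation.
fn f32_to_bf16_bits_truncate(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// bf16 -> f32 conversion is exact: pad the low 16 mantissa bits with zeros.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    let pi = std::f32::consts::PI;
    let bits = f32_to_bf16_bits_truncate(pi);
    println!("{:#06x} -> {}", bits, bf16_bits_to_f32(bits)); // 0x4049 -> 3.140625
}
```

The round trip shows the precision cost: with only 7 stored mantissa bits, pi becomes 3.140625.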

Traits

FloatExt

Extension trait for f32 and f64 that provides high-level functions backed by low-level intrinsics for common math operations. You should generally prefer these functions over “manual” implementations, because they are often much faster.

Attribute Macros

address_space

Tells the codegen to place a static or static mut in a specific memory address space. This is mostly for internal use and/or advanced users, because the codegen and cuda_std handle address space placement implicitly. Improper use of this macro can cause unexpected or undefined behavior.

externally_visible

Tells the codegen that this function is externally visible and should not be removed even if no kernel uses it. This is usually needed when linking with other PTX/cubin files.

gpu_only

Creates a CPU version of the function that panics, and cfg-gates the real function so that it is compiled only for nvptx/nvptx64.
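The following hand-written sketch shows the shape of the pattern gpu_only is described as producing: the real body exists only on nvptx/nvptx64, and every other target gets a function with the same signature that panics when called. The macro's actual expansion may differ, and device_double is a hypothetical function used only for illustration.

```rust
// Hand-written sketch of a GPU-only function with a panicking CPU stand-in.

/// Real implementation: compiled only for NVPTX targets.
#[cfg(any(target_arch = "nvptx", target_arch = "nvptx64"))]
pub fn device_double(x: f32) -> f32 {
    x * 2.0
}

/// CPU version: same signature, so host code still type-checks and links,
/// but calling it is a runtime error.
#[cfg(not(any(target_arch = "nvptx", target_arch = "nvptx64")))]
pub fn device_double(_x: f32) -> f32 {
    panic!("device_double can only be called on the GPU");
}

fn main() {
    // On the host the call compiles, but panics at run time.
    let result = std::panic::catch_unwind(|| device_double(1.0));
    println!("host call panicked: {}", result.is_err());
}
```

Keeping the signature identical on both sides is what lets shared host/device code compile for either target, while still failing loudly if device-only logic is accidentally run on the CPU.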

kernel

Registers a function as a GPU kernel.