Crate cuda_std

§CUDA Standard Library

The CUDA Standard Library provides a curated set of abstractions for writing performant, reliable, and understandable GPU kernels using the Rustc NVVM backend.

This library builds on non-nvptx targets and on targets that do not use the NVVM backend, but it is not usable there: attempting to use most of its functions produces linker errors. Because the kernel attribute macro automatically cfg-gates the annotated function to nvptx64 or nvptx, no “actual” functions from this crate should end up being used when compiling for a non-nvptx target.

This crate cannot be used with the LLVM PTX backend either, because it relies heavily on external functions implicitly defined by the NVVM backend, as well as on internal attributes.
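
The cfg-gating described above can be pictured with plain Rust attributes. The following is only a sketch of the general pattern, not the actual expansion of the kernel macro; the module `device` and the functions `scale` and `build_side` are hypothetical names invented for illustration.

```rust
// Hypothetical device-only module (not part of cuda_std): it is compiled
// only when the target architecture is nvptx or nvptx64, so on any other
// target the item simply does not exist.
#[cfg(any(target_arch = "nvptx", target_arch = "nvptx64"))]
#[allow(dead_code)]
mod device {
    pub fn scale(x: f32) -> f32 {
        x * 2.0
    }
}

/// Reports which side of the build this code was compiled for.
/// `cfg!` evaluates the same predicate at compile time and yields a bool.
pub fn build_side() -> &'static str {
    if cfg!(any(target_arch = "nvptx", target_arch = "nvptx64")) {
        "device"
    } else {
        "host"
    }
}
```

On a host build, any call to `device::scale` would fail to compile because the module is absent, which is the kind of early failure the gating is meant to produce.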

§Structure

This library tries to follow the structure of the Rust standard library to some degree, where different concepts are separated into their own modules.

§The Prelude

In order to simplify imports, we provide a prelude module which contains GPU analogues to standard library structures as well as common imports such as thread.

Re-exports§

pub use float::GpuFloat;
pub use half;
pub use vek;

Modules§

atomic
Atomic types for modifying numbers from multiple threads in a sound way.
cfg
Utilities for configuring code based on the specified compute capability.
float
Trait for float intrinsics for making floats work in no_std gpu environments.
intrinsics
Raw libdevice math intrinsics.
io
Utilities for printing to stdout from GPU threads.
mem
Support for allocating memory and using alloc through CUDA memory allocation system calls.
misc
Misc functions that do not exactly fit into other categories.
prelude
ptr
CUDA-specific pointer handling logic.
shared
Static and Dynamic shared memory handling.
thread
Functions for dealing with the parallel thread execution model employed by CUDA.
warp
Functions that work over warps of threads.

Macros§

assert_eq
Asserts that two expressions are equal and returns an AssertionFailed error to the application that launched the kernel if they are not.
assert_ne
Asserts that two expressions are not equal and returns an AssertionFailed error to the application that launched the kernel if they are equal.
print
Alternative to print! which works on CUDA. See print for more info.
println
Alternative to println! which works on CUDA. See print for more info.
shared_array
Statically allocates a buffer large enough for len elements of array_type, yielding a *mut array_type that points to uninitialized shared memory. len must be a constant expression.

Structs§

bf16
A 16-bit floating point type implementing the bfloat16 format.
f16
A 16-bit floating point type implementing the IEEE 754-2008 standard binary16 format, a.k.a. half.
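
To make the bfloat16 layout concrete: a bfloat16 value has the same sign bit and 8 exponent bits as an f32, but keeps only the top 7 mantissa bits, so its bit pattern is the upper half of an f32. The sketch below uses only the Rust standard library; the function names are hypothetical and this is not how the bf16 type listed above is implemented. In particular, it truncates, whereas production conversions typically round to nearest.

```rust
/// Truncating conversion from f32 to bfloat16 bits (illustrative only):
/// keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits.
pub fn f32_to_bf16_bits_truncate(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Widens bfloat16 bits back to f32 by zero-filling the low 16 mantissa bits.
/// This direction is exact: every bfloat16 value is representable as an f32.
pub fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}
```

Values whose low 16 mantissa bits are already zero, such as 1.0 or -2.5, survive the round trip unchanged; other values lose precision.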

Traits§

FloatExt
Extension trait for f32 and f64 which provides high level functions for low level intrinsics for common math operations. You should generally use these functions over “manual” implementations because they are often much faster.

Attribute Macros§

address_space
Notifies the codegen to put a static/static mut inside of a specific memory address space. This is mostly for internal use and/or advanced users, as the codegen and cuda_std handle address space placement implicitly. Improper use of this macro could yield weird or undefined behavior.
externally_visible
Notifies the codegen that this function is externally visible and should not be removed if it is not used by a kernel. Usually used for linking with other PTX/cubin files.
gpu_only
Creates a CPU version of the function which panics, and cfg-gates the original function so that it is compiled only for nvptx/nvptx64.
kernel
Registers a function as a gpu kernel.
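
The behavior described for gpu_only, a real implementation on the GPU and a panicking stand-in on the CPU, can be approximated by hand with two cfg-gated definitions. This is a sketch under that assumption and not the macro's actual output; `device_scale` is a hypothetical function invented for illustration.

```rust
// GPU build: the real body. Hypothetical function, not part of cuda_std.
#[cfg(any(target_arch = "nvptx", target_arch = "nvptx64"))]
pub fn device_scale(x: f32) -> f32 {
    x * 2.0
}

// CPU build: same name and signature, but the body panics, so host code
// that mentions the function still compiles and links, and only fails
// if the function is actually called.
#[cfg(not(any(target_arch = "nvptx", target_arch = "nvptx64")))]
pub fn device_scale(_x: f32) -> f32 {
    panic!("device_scale is only available on nvptx/nvptx64 targets");
}
```

Compared with omitting the function entirely on the CPU, the stub trades a compile-time error for a runtime panic, which keeps shared host and device code building at the cost of later detection.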