Crate cuda_std

CUDA Standard Library

The CUDA Standard Library provides a curated set of abstractions for writing performant, reliable, and understandable GPU kernels using the Rustc NVVM backend.

This library will build on non-nvptx targets and on targets that do not use the NVVM backend. However, it will not be usable there: attempting to use most of its functions will produce linker errors. The kernel attribute automatically cfg-gates the function it annotates to nvptx64 or nvptx, so no “actual” functions from this crate should be used when compiling for a non-nvptx target.
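Conceptually, cfg-gating a function for nvptx amounts to something like the pair of hand-written definitions below, where a host build gets a stub that panics instead of the real body. This is a sketch of the idea only, not the attribute's actual expansion; saxpy_element is a hypothetical function, and only the 64-bit nvptx64 target is checked here for brevity.

```rust
/// Hypothetical device function, gated by hand: the real body is
/// compiled only when targeting nvptx64.
#[cfg(target_arch = "nvptx64")]
pub fn saxpy_element(a: f32, x: f32, y: f32) -> f32 {
    // Device-side body: computes a * x + y on the GPU.
    a * x + y
}

/// On every other target a stub with the same signature is compiled
/// instead, so host code still type-checks but panics if it calls it.
#[cfg(not(target_arch = "nvptx64"))]
pub fn saxpy_element(_a: f32, _x: f32, _y: f32) -> f32 {
    panic!("saxpy_element can only run on a CUDA device")
}
```

Keeping the signature identical on both sides is what lets shared host/device code compile for either target.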

This crate cannot be used with the LLVM PTX backend either, because it relies heavily on external functions implicitly defined by the NVVM backend, as well as on internal attributes.

Structure

This library tries to follow the structure of the Rust standard library to some degree, where different concepts are separated into their own modules.

The Prelude

To simplify imports, we provide a prelude module that contains GPU analogues of standard library structures, as well as common imports such as the thread module.

Re-exports

pub use float::GpuFloat;
pub use half;
pub use vek;

Modules

Trait for float intrinsics that make floats work in no_std GPU environments.

Raw libdevice math intrinsics.

Utilities for printing to stdout from GPU threads.

Support for allocating memory, and for using alloc, through CUDA memory allocation system calls.

Misc functions that do not exactly fit into other categories.

CUDA-specific pointer handling logic.

Shared memory handling. Currently only macros.

Functions for dealing with the parallel thread execution model employed by CUDA.

Functions that work over warps of threads.
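To make the thread and warp models more concrete, the CPU sketch below emulates two common patterns: computing a thread's global index in a 1-D launch (blockIdx.x * blockDim.x + threadIdx.x) and the shuffle-down tree reduction used to sum values across the 32 lanes of a warp. All function names here are hypothetical and are not part of cuda_std; on a GPU the indices come from hardware and lanes run in lockstep, whereas here loops stand in for threads. A warp size of 32 is assumed.

```rust
const WARP_SIZE: usize = 32;

/// Hypothetical helper: a thread's flattened index in a 1-D launch.
fn global_index_1d(block_idx: usize, block_dim: usize, thread_idx: usize) -> usize {
    block_idx * block_dim + thread_idx
}

/// Hypothetical CPU emulation of a vector-add kernel launch: every
/// (block, thread) pair runs the same body, and threads whose index falls
/// past the end of the data do nothing, like the bounds check in a real kernel.
fn vector_add_emulated(a: &[f32], b: &[f32], block_dim: usize) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    let n = a.len();
    let mut c = vec![0.0f32; n];
    // Round up so the grid covers every element.
    let grid_dim = (n + block_dim - 1) / block_dim;
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            let i = global_index_1d(block_idx, block_dim, thread_idx);
            if i < n {
                c[i] = a[i] + b[i];
            }
        }
    }
    c
}

/// Hypothetical CPU emulation of a shuffle-down sum across one warp.
/// At each step lane i adds the value held by lane i + offset; after
/// log2(32) = 5 steps, lane 0 holds the sum of all 32 inputs.
fn warp_reduce_sum_emulated(lanes: [i32; WARP_SIZE]) -> i32 {
    let mut vals = lanes;
    let mut offset = WARP_SIZE / 2;
    while offset > 0 {
        // Lanes exchange values simultaneously, so read from a snapshot.
        let snapshot = vals;
        for lane in 0..WARP_SIZE - offset {
            vals[lane] = snapshot[lane] + snapshot[lane + offset];
        }
        // Lanes whose partner would be out of range keep their value;
        // those lanes never feed into lane 0's result.
        offset /= 2;
    }
    vals[0]
}
```

The reduction needs only five exchange steps instead of 31 sequential additions, which is why warp-level primitives are the usual building block for fast reductions.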

Macros

Asserts that two expressions are equal and returns an AssertionFailed error to the application that launched the kernel if they are not.

Asserts that two expressions are not equal and returns an AssertionFailed error to the application that launched the kernel if they are equal.

Alternative to print! which works on CUDA. See print for more info.

Alternative to println! which works on CUDA. See print for more info.

Statically allocates a buffer large enough for len elements of array_type, yielding a *mut array_type that points to uninitialized shared memory. len must be a constant expression.
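Because the shared-memory macro yields a raw pointer to uninitialized memory, each thread must write its slot before any thread reads it, normally with a block-wide barrier between the two phases. The CPU sketch below mimics that write-then-read pattern using MaybeUninit as a stand-in for the shared buffer; block_sum_emulated and BLOCK_DIM are hypothetical, and the sequential loops stand in for threads, so no real barrier is needed here.

```rust
use std::mem::MaybeUninit;

const BLOCK_DIM: usize = 8;

/// Hypothetical CPU stand-in for a block-wide sum staged in "shared memory".
/// Like the macro's buffer, the storage starts uninitialized.
fn block_sum_emulated(inputs: [u32; BLOCK_DIM]) -> u32 {
    // Stand-in for the statically allocated shared buffer.
    let mut storage: [MaybeUninit<u32>; BLOCK_DIM] = [MaybeUninit::uninit(); BLOCK_DIM];
    // MaybeUninit<u32> has the same layout as u32, so this cast is valid.
    let ptr: *mut u32 = storage.as_mut_ptr() as *mut u32;

    // Phase 1: each "thread" writes only its own slot.
    for tid in 0..BLOCK_DIM {
        unsafe { ptr.add(tid).write(inputs[tid]) };
    }

    // On a GPU, a block-wide barrier would go here so that no thread
    // reads a slot before its owner has written it.

    // Phase 2: "thread 0" reads every slot, all of which are now initialized.
    let mut sum = 0;
    for tid in 0..BLOCK_DIM {
        sum += unsafe { ptr.add(tid).read() };
    }
    sum
}
```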

Structs

A 16-bit floating point type implementing the bfloat16 format.

A 16-bit floating point type implementing the IEEE 754-2008 standard binary16 a.k.a. half format.
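The two formats split their 16 bits differently: bfloat16 keeps the f32 sign bit and all 8 exponent bits with 7 fraction bits, while binary16 has 5 exponent bits and 10 fraction bits. Because bfloat16 is essentially the upper half of an f32 bit pattern, converting is mostly a shift. The half crate re-exported above provides these conversions (for example half::bf16::from_f32); the standalone sketch below, with hypothetical helper names, only illustrates the bit layout and is not the half crate's implementation.

```rust
/// Hypothetical helper: f32 -> bfloat16 bits with round-to-nearest-even.
fn f32_to_bf16_bits(x: f32) -> u16 {
    if x.is_nan() {
        // Return a canonical quiet NaN so rounding cannot turn a NaN into infinity.
        return 0x7FC0;
    }
    let bits = x.to_bits();
    // Add 0x7FFF plus the lowest kept bit so exact ties round to an even result.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    ((bits + rounding_bias) >> 16) as u16
}

/// Widening back to f32 is exact: shift the 16 bits into the high half.
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}
```

For example, an f32 pi rounds to the bfloat16 bit pattern 0x4049, which widens back to 3.140625: the exponent range is preserved but only about 2 to 3 decimal digits of precision remain.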

Traits

Extension trait for f32 and f64 that provides high-level functions wrapping low-level intrinsics for common math operations. You should generally prefer these functions over “manual” implementations because they are often much faster.
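An extension trait adds methods to the primitive float types without modifying them, which matters in no_std code where methods such as f32::sqrt from std are unavailable. The sketch below shows the general pattern with a hypothetical trait, FloatExtSketch, implemented for both f32 and f64 through one macro. It is not cuda_std's trait; in a GPU crate the method bodies would forward to device math intrinsics, while this host-side sketch forwards to std instead.

```rust
/// Hypothetical extension trait illustrating the pattern.
trait FloatExtSketch {
    /// Reciprocal square root, 1 / sqrt(x).
    fn rsqrt_sketch(self) -> Self;
}

// One macro implements the trait for every listed float type.
macro_rules! impl_float_ext {
    ($($t:ty),*) => {
        $(impl FloatExtSketch for $t {
            fn rsqrt_sketch(self) -> Self {
                // A GPU build would call a device intrinsic here instead.
                1.0 / self.sqrt()
            }
        })*
    };
}

impl_float_ext!(f32, f64);
```

With the trait in scope, callers write 4.0f32.rsqrt_sketch() exactly as if it were an inherent method.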

Attribute Macros

Notifies the codegen to put a static or static mut inside a specific memory address space. This is mostly for internal use and/or advanced users, since the codegen and cuda_std handle address space placement implicitly. Improper use of this macro can yield unexpected or undefined behavior.

Notifies the codegen that this function is externally visible and should not be removed if it is not used by a kernel. Usually used for linking with other PTX/cubin files.

Creates a CPU version of the function that panics, and cfg-gates the original function to nvptx/nvptx64 only.

Registers a function as a GPU kernel.