Struct Program

pub struct Program { /* private fields */ }

Abstraction that contains everything to run an OpenCL kernel on a GPU.

Most methods are the same as those of crate::cuda::Program, so code written against this API works with OpenCL as well as CUDA kernels.

Implementations

impl Program

pub fn device_name(&self) -> &str

Returns the name of the GPU, e.g. “GeForce RTX 3090”.

pub fn from_opencl(device: &Device, src: &str) -> Result<Program, GPUError>

Creates a program for a specific device from OpenCL source code.

Examples found in repository

examples/add.rs (line 17)

fn opencl(device: &Device) -> Program {
    let opencl_kernel = include_str!("./add.cl");
    let opencl_device = device.opencl_device().unwrap();
    let opencl_program = opencl::Program::from_opencl(opencl_device, opencl_kernel).unwrap();
    Program::Opencl(opencl_program)
}

pub fn from_binary(device: &Device, bin: Vec<u8>) -> Result<Program, GPUError>

Creates a program for a specific device from a compiled OpenCL binary.

pub unsafe fn create_buffer<T>(&self, length: usize) -> Result<Buffer<T>, GPUError>

Creates a new buffer that can be used for input/output with the GPU.

The length is the number of elements to create.

It is usually used to create buffers that are initialized by the GPU. To transfer data directly from the host to the GPU, use the safe Program::create_buffer_from_slice instead.

Safety

This function isn’t actually unsafe. It is marked as unsafe only because the CUDA version of it is unsafe, which keeps both APIs symmetric.

pub fn create_buffer_from_slice<T>(&self, slice: &[T]) -> Result<Buffer<T>, GPUError>

Creates a new buffer on the GPU and initializes it with the contents of the given slice.

pub fn create_kernel(&self, name: &str, global_work_size: usize, local_work_size: usize) -> Result<Kernel<'_>, GPUError>

Returns the kernel with the given name, configured with the given work sizes.

The global_work_size does not follow the OpenCL definition: it is not the total number of threads. Instead it follows CUDA’s definition and is the number of thread groups, each containing local_work_size threads. So the total number of threads is global_work_size * local_work_size.
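The work-size convention above can be checked with a little arithmetic. The following is a minimal, self-contained sketch that does not use the crate itself; the helper names total_threads and groups_for are hypothetical and exist only for illustration.

```rust
/// Total number of threads launched under the convention described above:
/// `global_work_size` counts thread groups, not individual threads.
fn total_threads(global_work_size: usize, local_work_size: usize) -> usize {
    global_work_size * local_work_size
}

/// Hypothetical helper: the number of thread groups needed to cover
/// `num_elements` items when each group runs `local_work_size` threads.
/// Rounds up, so the last group may be only partially used.
fn groups_for(num_elements: usize, local_work_size: usize) -> usize {
    assert!(local_work_size > 0, "local_work_size must be non-zero");
    (num_elements + local_work_size - 1) / local_work_size
}
```

For example, covering 1000 elements with groups of 256 threads needs groups_for(1000, 256) = 4 groups, which launches total_threads(4, 256) = 1024 threads. A kernel launched this way would presumably need its own bounds check for the 24 surplus threads.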

pub fn write_from_buffer<T>(&self, buffer: &mut Buffer<T>, data: &[T]) -> Result<(), GPUError>

Writes the contents of the host slice data into an existing buffer on the GPU.

pub fn read_into_buffer<T>(&self, buffer: &Buffer<T>, data: &mut [T]) -> Result<(), GPUError>

Reads data from a buffer on the GPU into the existing host slice data.

pub fn run<F, R, E, A>(&self, fun: F, arg: A) -> Result<R, E>
where F: FnOnce(&Self, A) -> Result<R, E>, E: From<GPUError>,

Runs some code in the context of the program.

The closure takes the program as a parameter, so the same function body can be used for both the OpenCL and the CUDA code path. The only difference is the type of the program.
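The shape of run, and the role of the E: From<GPUError> bound, can be illustrated without a GPU. The sketch below uses stand-in types: this Program and GPUError are simplified mocks, not the crate's real types, and AppError, check_device, and describe are hypothetical names. The mock run simply calls the closure with the program and the argument, which is an assumption about the behavior rather than a copy of the crate's implementation.

```rust
/// Stand-in for the crate's `GPUError` (the real type differs).
#[derive(Debug)]
struct GPUError(String);

/// Stand-in for `Program`, holding only a device name.
struct Program {
    device_name: String,
}

impl Program {
    fn device_name(&self) -> &str {
        &self.device_name
    }

    /// Same signature as the documented `run`; this mock just hands
    /// `self` and `arg` to the closure.
    fn run<F, R, E, A>(&self, fun: F, arg: A) -> Result<R, E>
    where
        F: FnOnce(&Self, A) -> Result<R, E>,
        E: From<GPUError>,
    {
        fun(self, arg)
    }
}

/// Hypothetical application error that can wrap a GPU error.
#[derive(Debug)]
enum AppError {
    Gpu(GPUError),
    EmptyInput,
}

impl From<GPUError> for AppError {
    fn from(err: GPUError) -> Self {
        AppError::Gpu(err)
    }
}

/// Hypothetical fallible step that reports a `GPUError`.
fn check_device(program: &Program) -> Result<(), GPUError> {
    if program.device_name().is_empty() {
        Err(GPUError("device has no name".to_string()))
    } else {
        Ok(())
    }
}

/// Function body passed to `run`. Because `AppError: From<GPUError>`,
/// the `?` operator converts GPU errors into application errors.
fn describe(program: &Program, input: &[u32]) -> Result<String, AppError> {
    if input.is_empty() {
        return Err(AppError::EmptyInput);
    }
    check_device(program)?;
    Ok(format!("{} elements on {}", input.len(), program.device_name()))
}
```

With these stand-ins, program.run(describe, &data[..]) returns whatever describe returns, with any GPUError already converted into AppError. Letting the caller choose E this way means the function body can mix GPU failures with its own error cases.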