A Rust implementation of the Khronos OpenCL API.
OpenCL (Open Computing Language) is a framework for general purpose parallel programming across heterogeneous devices including CPUs, GPUs, DSPs, FPGAs and other processors or hardware accelerators.
The OpenCL Specification has evolved over time and not all device vendors support all OpenCL features.
OpenCL 3.0 is a unified specification that adds little new functionality to previous OpenCL versions.
It specifies that all OpenCL 1.2 features are mandatory, while all OpenCL 2.x and OpenCL 3.0 features are now optional.
See OpenCL Description.
The OpenCL Specification considers OpenCL as four models:
The physical OpenCL hardware: a host containing one or more OpenCL platforms, each connected to one or more OpenCL devices.
An OpenCL application running on the host creates an OpenCL environment called a context on a single platform to process data on one or more of the OpenCL devices connected to the platform.
An OpenCL program consists of OpenCL kernel functions that can run on OpenCL devices within a context.
OpenCL programs must be created (and most must be built) for a context before their OpenCL kernel functions can be created from them. The exception is “built-in” kernels, which don’t need to be built (or compiled and linked).
OpenCL kernels are controlled by an OpenCL application that runs on the host, see Execution Model.
OpenCL 1.2 memory is divided into two fundamental memory regions: host memory and device memory.
OpenCL kernels run on device memory; an OpenCL application must write host memory to device memory for OpenCL kernels to process. An OpenCL application must also read results from device memory to host memory after a kernel has completed execution.
OpenCL 2.0 shared virtual memory (svm) is shared between the host and device(s) and synchronised by OpenCL, eliminating the explicit transfer of data between host and device memory regions.
An OpenCL application creates at least one OpenCL command_queue for each OpenCL device (or sub-device) within its OpenCL context.
OpenCL kernel executions and OpenCL 1.2 memory reads and writes are “enqueued” by the OpenCL application on each command_queue. An application can wait for all “enqueued” commands to finish on a command_queue or it can wait for specific events to complete. Normally command_queues run commands in the order that they are given. However, events can be used to execute kernels out-of-order.
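The write, execute, read sequence described above can be sketched with a toy model in plain Rust. This is only an illustration of the OpenCL 1.2 flow on an in-order queue: `DeviceBuffer`, `Command`, `InOrderQueue` and `run_in_order_example` are hypothetical names invented for this sketch and are not part of any OpenCL binding.

```rust
// A toy, std-only model of the OpenCL 1.2 flow. None of these types come
// from a real OpenCL binding; they only mimic the order of operations.

/// Hypothetical stand-in for a buffer that lives in device memory.
struct DeviceBuffer {
    data: Vec<f32>,
}

/// Hypothetical commands an application might enqueue.
enum Command {
    /// Copy host data into the device buffer.
    Write(Vec<f32>),
    /// Run a "kernel" over every element of the device buffer.
    Kernel(fn(f32) -> f32),
    /// Copy the device buffer back to the host.
    Read,
}

/// Hypothetical in-order queue: commands run in the order they were enqueued.
struct InOrderQueue {
    pending: Vec<Command>,
}

impl InOrderQueue {
    fn new() -> Self {
        InOrderQueue { pending: Vec::new() }
    }

    fn enqueue(&mut self, command: Command) {
        self.pending.push(command);
    }

    /// Runs all pending commands (like waiting for the whole queue) and
    /// returns the host copy produced by the last `Read`, if any.
    fn finish(&mut self, device: &mut DeviceBuffer) -> Option<Vec<f32>> {
        let mut host_result = None;
        for command in self.pending.drain(..) {
            match command {
                Command::Write(host_data) => device.data = host_data,
                Command::Kernel(f) => {
                    for x in device.data.iter_mut() {
                        *x = f(*x);
                    }
                }
                Command::Read => host_result = Some(device.data.clone()),
            }
        }
        host_result
    }
}

/// The "kernel" used by the example: doubles one element.
fn double(x: f32) -> f32 {
    x * 2.0
}

fn run_in_order_example() -> Vec<f32> {
    let mut device = DeviceBuffer { data: Vec::new() };
    let mut queue = InOrderQueue::new();
    queue.enqueue(Command::Write(vec![1.0, 2.0, 3.0]));
    queue.enqueue(Command::Kernel(double));
    queue.enqueue(Command::Read);
    queue.finish(&mut device).expect("a Read command was enqueued")
}
```

Calling `run_in_order_example()` writes three values to the modelled device, doubles them there and reads them back. With shared virtual memory the explicit `Write` and `Read` steps would not be needed.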
The platform model has three objects: Platform, Device and Context.
Of these three objects, the OpenCL Context is by far the most important. Each application must create a Context from the most appropriate Devices available on one of the Platforms on the host system that the application is running on.
Most example OpenCL applications just choose the first available Platform
and Device for their Context. However, since many systems have multiple
platforms and devices, the first Platform and Device are unlikely to
provide the best performance.
For example, on a system with an APU (combined CPU and GPU, e.g. Intel i7) and a discrete graphics card (e.g. Nvidia GTX 1070), OpenCL may find either the integrated GPU or the GPU on the graphics card first.
OpenCL applications often require the performance of discrete graphics cards or specific OpenCL features, such as svm or double/half floating point precision. In such cases, it is necessary to query the Platforms and Devices to choose the most appropriate Devices for the application before creating the Context.
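The selection step can be sketched as a filter followed by a ranking. The sketch below assumes the application has already queried each device and collected the results into a hypothetical `DeviceInfo` struct. The field names, the `Requirements` type and the ranking rule (discrete GPUs first, then compute units) are assumptions made for illustration, not part of this crate.

```rust
/// Hypothetical summary of the properties an application might query
/// from each device; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct DeviceInfo {
    name: String,
    is_discrete_gpu: bool,
    supports_svm: bool,
    supports_fp64: bool,
    compute_units: u32,
}

/// Hypothetical application requirements.
struct Requirements {
    needs_svm: bool,
    needs_fp64: bool,
}

/// Picks a device that meets the requirements, preferring discrete GPUs
/// and then the highest compute-unit count. Returns `None` if no device
/// qualifies.
fn choose_device<'a>(devices: &'a [DeviceInfo], req: &Requirements) -> Option<&'a DeviceInfo> {
    devices
        .iter()
        .filter(|d| (!req.needs_svm || d.supports_svm) && (!req.needs_fp64 || d.supports_fp64))
        .max_by_key(|d| (d.is_discrete_gpu, d.compute_units))
}

/// Made-up device list: the integrated GPU is found first, as in the
/// example above.
fn example_devices() -> Vec<DeviceInfo> {
    vec![
        DeviceInfo {
            name: "integrated GPU".to_string(),
            is_discrete_gpu: false,
            supports_svm: true,
            supports_fp64: false,
            compute_units: 24,
        },
        DeviceInfo {
            name: "discrete GPU".to_string(),
            is_discrete_gpu: true,
            supports_svm: false,
            supports_fp64: true,
            compute_units: 15,
        },
    ]
}
```

With these made-up devices, an application with no special requirements gets the discrete GPU even though it is listed second, while one that needs svm falls back to the integrated GPU.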
The OpenCL programming model has two objects: Program and Kernel.
OpenCL Program objects can be created from OpenCL source code, built-in kernels, binaries and intermediate language binaries. Depending upon how an OpenCL Program object was created, it may need to be built (or compiled and linked) before the Kernels in it can be created.
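The build-before-kernel rule can be modelled as a small state check. The types below (`ProgramOrigin`, `ModelProgram`) are hypothetical and exist only in this sketch. It follows the rule stated earlier that built-in kernels are the only exception to building, which is a simplification of what a real OpenCL implementation checks.

```rust
/// Hypothetical description of how a program object was created.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProgramOrigin {
    Source,
    Binary,
    IntermediateLanguage,
    BuiltInKernels,
}

impl ProgramOrigin {
    /// Assumption for this sketch: only built-in kernels skip the build.
    fn needs_build(self) -> bool {
        !matches!(self, ProgramOrigin::BuiltInKernels)
    }
}

/// Hypothetical program object that tracks whether it has been built.
struct ModelProgram {
    origin: ProgramOrigin,
    built: bool,
}

impl ModelProgram {
    fn new(origin: ProgramOrigin) -> Self {
        ModelProgram { origin, built: false }
    }

    /// Stands in for building (or compiling and linking) the program.
    fn build(&mut self) {
        self.built = true;
    }

    /// Returns the kernel name on success, or an error message if the
    /// program still needs to be built.
    fn create_kernel(&self, name: &str) -> Result<String, String> {
        if self.origin.needs_build() && !self.built {
            Err(format!("program must be built before creating kernel `{}`", name))
        } else {
            Ok(name.to_string())
        }
    }
}
```

In this model a program created from source refuses to create a kernel until `build` has been called, while one created from built-in kernels can create kernels immediately.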
The OpenCL memory model consists of five objects:
A Buffer is a contiguous block of memory used for general purpose data.
An Image holds data for one, two or three dimensional images.
A Sampler describes how a Kernel is to sample an Image, see Sampler objects.
Shared Virtual Memory enables the host and kernels executing on devices to directly share data without explicitly transferring it.
The OpenCL execution model has two objects: CommandQueue and Event.
OpenCL commands to transfer memory and execute kernels on devices are performed via CommandQueues.
Each OpenCL device (and sub-device) must have at least one command_queue associated with it, so that commands may be enqueued onto the device.
There are several OpenCL CommandQueue “enqueue_” methods to transfer data between host and device memory, map SVM memory and execute kernels. All the “enqueue_” methods accept an event_wait_list parameter and return an Event that can be used to monitor and control out-of-order execution of kernels on a CommandQueue, see Event Objects.
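How an event wait list constrains out-of-order execution can be illustrated with a toy scheduler in plain Rust. `EventId`, `Enqueued`, `OutOfOrderQueue` and `run_event_example` are hypothetical names for this sketch and do not reflect the crate's actual types or signatures. The model deliberately scans commands from the back to show that only the wait lists, not the enqueue order, decide what may run.

```rust
use std::collections::HashSet;

/// Hypothetical event handle: the index of the command that produced it.
type EventId = usize;

/// Hypothetical enqueued command with a label and an event wait list.
struct Enqueued {
    label: &'static str,
    wait_list: Vec<EventId>,
}

/// Hypothetical out-of-order queue.
struct OutOfOrderQueue {
    commands: Vec<Enqueued>,
}

impl OutOfOrderQueue {
    fn new() -> Self {
        OutOfOrderQueue { commands: Vec::new() }
    }

    /// Enqueues a command and returns the event that completes with it.
    fn enqueue(&mut self, label: &'static str, wait_list: &[EventId]) -> EventId {
        self.commands.push(Enqueued { label, wait_list: wait_list.to_vec() });
        self.commands.len() - 1
    }

    /// Runs each command once every event in its wait list has completed
    /// and returns the labels in the order the commands ran.
    fn finish(&self) -> Vec<&'static str> {
        let mut completed: HashSet<EventId> = HashSet::new();
        let mut order = Vec::new();
        while completed.len() < self.commands.len() {
            let before = completed.len();
            // Scan newest-first, so enqueue order alone guarantees nothing.
            for (id, cmd) in self.commands.iter().enumerate().rev() {
                let ready = !completed.contains(&id)
                    && cmd.wait_list.iter().all(|e| completed.contains(e));
                if ready {
                    completed.insert(id);
                    order.push(cmd.label);
                }
            }
            // Guard against wait lists that can never be satisfied.
            assert!(completed.len() > before, "unsatisfiable wait list");
        }
        order
    }
}

fn run_event_example() -> Vec<&'static str> {
    let mut queue = OutOfOrderQueue::new();
    let write_a = queue.enqueue("write a", &[]);
    let write_b = queue.enqueue("write b", &[]);
    let kernel = queue.enqueue("kernel", &[write_a, write_b]);
    let _read = queue.enqueue("read", &[kernel]);
    queue.finish()
}
```

In this model the two writes are independent and run in reverse enqueue order, but the kernel still waits for both of them and the read waits for the kernel, because their wait lists say so.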
Custom Result type to output OpenCL error text.