Crate opencl3

A Rust implementation of the Khronos OpenCL API.

Description

This crate provides a relatively simple, object-based model of the OpenCL 3.0 API.
It is built upon the cl3 crate, which provides a functional interface to the OpenCL API.

OpenCL (Open Computing Language) is a framework for general-purpose parallel programming across heterogeneous devices, including CPUs, GPUs, DSPs, FPGAs and other processors or hardware accelerators.

It is often considered an open-standard alternative to Nvidia’s proprietary Compute Unified Device Architecture (CUDA) for performing general-purpose computing on GPUs, see GPGPU.

The OpenCL Specification has evolved over time and not all device vendors support all OpenCL features.

OpenCL 3.0 is a unified specification that adds little new functionality to previous OpenCL versions.
It specifies that all OpenCL 1.2 features are mandatory, while all OpenCL 2.x and OpenCL 3.0 features are now optional.

See OpenCL Description.

OpenCL Architecture

The OpenCL Specification describes OpenCL in terms of four models:

Platform Model
The physical OpenCL hardware: a host containing one or more OpenCL platforms, each connected to one or more OpenCL devices.
An OpenCL application running on the host creates an OpenCL environment, called a context, on a single platform to process data on one or more of the OpenCL devices connected to that platform.
Programming Model
An OpenCL program consists of OpenCL kernel functions that can run on OpenCL devices within a context.
OpenCL programs must be created (and most must be built) for a context before their OpenCL kernel functions can be created from them. The exception is “built-in” kernels, which don’t need to be built (or compiled and linked).
OpenCL kernels are controlled by an OpenCL application that runs on the host, see Execution Model.
Memory Model
OpenCL 1.2 memory is divided into two fundamental memory regions: host memory and device memory.
OpenCL kernels operate on device memory, so an OpenCL application must write host memory to device memory for OpenCL kernels to process. An OpenCL application must also read results from device memory to host memory after a kernel has completed execution.
OpenCL 2.0 shared virtual memory (SVM) is shared between the host and device(s) and synchronised by OpenCL, eliminating the explicit transfer of memory between host and device memory regions.
Execution Model
An OpenCL application creates at least one OpenCL command_queue for each OpenCL device (or sub-device) within its OpenCL context.
OpenCL kernel executions and OpenCL 1.2 memory reads and writes are “enqueued” by the OpenCL application on each command_queue. An application can wait for all “enqueued” commands to finish on a command_queue or it can wait for specific events to complete. Normally command_queues run commands in the order that they are given. However, events can be used to execute kernels out-of-order.
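
The OpenCL 1.2 flow described above (write host data to device memory, run a kernel, read the results back) can be sketched with a small std-only model. `DeviceBuffer`, `saxpy_kernel` and `run_saxpy` are hypothetical names made up for this illustration; they are not part of this crate and no OpenCL device is involved. In a real application each step would be a command enqueued on a command_queue.

```rust
/// Hypothetical stand-in for a block of device memory.
/// The host cannot touch `data` directly; it has to copy in and out.
pub struct DeviceBuffer {
    data: Vec<f32>,
}

impl DeviceBuffer {
    /// Models allocating `len` elements of device memory.
    pub fn new(len: usize) -> Self {
        Self { data: vec![0.0; len] }
    }

    /// Models an enqueued write: host memory -> device memory.
    /// Panics if `host` and the buffer differ in length.
    pub fn write(&mut self, host: &[f32]) {
        self.data.copy_from_slice(host);
    }

    /// Models an enqueued read: device memory -> host memory.
    /// Panics if `host` and the buffer differ in length.
    pub fn read(&self, host: &mut [f32]) {
        host.copy_from_slice(&self.data);
    }
}

/// Models a kernel computing z[i] = a * x[i] + y[i].
/// On a real device each index would be handled by its own work-item.
pub fn saxpy_kernel(a: f32, x: &DeviceBuffer, y: &DeviceBuffer, z: &mut DeviceBuffer) {
    for i in 0..z.data.len() {
        z.data[i] = a * x.data[i] + y.data[i];
    }
}

/// Runs the three steps in the order an in-order queue would:
/// write the inputs, execute the kernel, read the result.
pub fn run_saxpy(a: f32, xs: &[f32], ys: &[f32]) -> Vec<f32> {
    assert_eq!(xs.len(), ys.len(), "inputs must have the same length");
    let mut x = DeviceBuffer::new(xs.len());
    let mut y = DeviceBuffer::new(ys.len());
    let mut z = DeviceBuffer::new(xs.len());

    x.write(xs); // host -> device
    y.write(ys); // host -> device
    saxpy_kernel(a, &x, &y, &mut z); // the kernel sees device memory only

    let mut result = vec![0.0; xs.len()];
    z.read(&mut result); // device -> host
    result
}
```

For instance, `run_saxpy(2.0, &[1.0, 2.0, 3.0], &[10.0, 20.0, 30.0])` returns `[12.0, 24.0, 36.0]`. With shared virtual memory the two explicit copy steps would disappear, because host and device would work on the same allocation.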

OpenCL Objects

Platform Model

The platform model has three objects: Platform, Device and Context.

Of these three objects, the OpenCL Context is by far the most important. Each application must create a Context from the most appropriate Devices available on one of the Platforms on the host system on which the application is running.

Most example OpenCL applications just choose the first available Platform and Device for their Context. However, since many systems have multiple platforms and devices, the first Platform and Device are unlikely to provide the best performance.
For example, on a system with an APU (combined CPU and GPU, e.g. Intel i7) and a discrete graphics card (e.g. Nvidia GTX 1070), OpenCL may find either the integrated GPU or the GPU on the graphics card first.

OpenCL applications often require the performance of discrete graphics cards or specific OpenCL features, such as SVM or double/half-precision floating point. In such cases, it is necessary to query the Platforms and Devices to choose the most appropriate Devices for the application before creating the Context.

The Platform and Device modules contain structures and methods to simplify querying the host system Platforms and Devices to create a Context.
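
The selection step described above can be sketched independently of any OpenCL calls. The following is a minimal, std-only model: `DeviceInfo`, `DeviceKind`, `Requirements` and `pick_device` are hypothetical names invented for this illustration and are not types from this crate. In a real application the fields would be filled in from Platform and Device queries, and the ranking rule (discrete GPU first, then integrated GPU, then CPU) is only one possible policy.

```rust
/// Hypothetical summary of one device, filled in from platform/device queries.
#[derive(Debug, Clone, PartialEq)]
pub struct DeviceInfo {
    pub name: String,
    pub kind: DeviceKind,
    pub compute_units: u32,
    pub supports_svm: bool,
    pub supports_fp64: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceKind {
    Cpu,
    IntegratedGpu,
    DiscreteGpu,
}

/// Features the application cannot run without.
#[derive(Debug, Clone, Copy, Default)]
pub struct Requirements {
    pub svm: bool,
    pub fp64: bool,
}

/// Assumed preference order: a higher rank is preferred.
fn kind_rank(kind: DeviceKind) -> u8 {
    match kind {
        DeviceKind::Cpu => 0,
        DeviceKind::IntegratedGpu => 1,
        DeviceKind::DiscreteGpu => 2,
    }
}

/// Returns the preferred device that satisfies `req`, or `None` if no device does.
/// Devices are first filtered by required features, then ranked by kind,
/// with the number of compute units as a tie-breaker.
pub fn pick_device(devices: &[DeviceInfo], req: Requirements) -> Option<&DeviceInfo> {
    devices
        .iter()
        .filter(|d| (!req.svm || d.supports_svm) && (!req.fp64 || d.supports_fp64))
        .max_by_key(|d| (kind_rank(d.kind), d.compute_units))
}
```

Filtering on required features before ranking matters: a discrete GPU that lacks a feature the application needs should lose to an integrated GPU that has it, rather than being chosen because it happens to be listed first.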

Programming Model

The OpenCL programming model has two objects: Program and Kernel.

OpenCL Kernel functions are contained in OpenCL Programs.

Kernels are usually defined as functions in OpenCL Program source code. However, OpenCL Devices may contain built-in Kernels; e.g. some Intel GPUs have built-in motion estimation kernels.

OpenCL Program objects can be created from OpenCL source code, built-in kernels, binaries and intermediate language binaries. Depending upon how an OpenCL Program object was created, it may need to be built (or compiled and linked) before the Kernels in it can be created.

All the Kernels in a Program can be created together or they can be created individually, by name.
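
Creating Kernels individually requires their names. OpenCL reports the kernels of a built program through the `CL_PROGRAM_KERNEL_NAMES` query as a single semicolon-separated string. The helper below is a std-only sketch for splitting such a string; `split_kernel_names` is a hypothetical name, not a function of this crate, and the kernel names in the usage note are made up.

```rust
/// Splits the value of the `CL_PROGRAM_KERNEL_NAMES` program query, which
/// OpenCL defines as a semicolon-separated list of kernel names.
/// Surrounding whitespace and NUL terminators are stripped, and empty
/// entries are skipped.
pub fn split_kernel_names(names: &str) -> Vec<&str> {
    names
        .split(';')
        .map(|n| n.trim_matches(|c: char| c == '\0' || c.is_whitespace()))
        .filter(|n| !n.is_empty())
        .collect()
}
```

For instance, `split_kernel_names("saxpy_float;reduce_sum")` yields `["saxpy_float", "reduce_sum"]`, and each entry could then be used to create one Kernel by name.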

Memory Model

The OpenCL memory model consists of five objects: Buffer, Image, Sampler, SVM and Pipe.

Buffer, Image and Sampler are OpenCL 1.2 (i.e. mandatory) objects;
SVM and Pipe are OpenCL 2.0 (i.e. optional) objects.

A Buffer is a contiguous block of memory used for general purpose data.
An Image holds data for one-, two- or three-dimensional images.
A Sampler describes how a Kernel is to sample an Image, see Sampler objects.

Shared Virtual Memory enables the host and kernels executing on devices to directly share data without explicitly transferring it.

Pipes store memory as FIFOs between Kernels. Pipes are not accessible from the host.
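
The FIFO behaviour of a Pipe can be pictured with a bounded queue. The sketch below is a std-only model that captures only two properties: packets are read in the order they were written, and the pipe has a fixed maximum number of packets. `PipeModel`, `write_packet` and `read_packet` are hypothetical names for this illustration; real Pipes live in device memory and are only read and written by Kernels.

```rust
use std::collections::VecDeque;

/// Hypothetical model of a pipe holding at most `max_packets` packets.
pub struct PipeModel<T> {
    packets: VecDeque<T>,
    max_packets: usize,
}

impl<T> PipeModel<T> {
    /// Creates an empty pipe with a fixed capacity.
    pub fn new(max_packets: usize) -> Self {
        Self {
            packets: VecDeque::with_capacity(max_packets),
            max_packets,
        }
    }

    /// Producer side: hands the packet back as `Err` when the pipe is full.
    pub fn write_packet(&mut self, packet: T) -> Result<(), T> {
        if self.packets.len() == self.max_packets {
            Err(packet)
        } else {
            self.packets.push_back(packet);
            Ok(())
        }
    }

    /// Consumer side: packets come out in the order they were written.
    /// Returns `None` when the pipe is empty.
    pub fn read_packet(&mut self) -> Option<T> {
        self.packets.pop_front()
    }
}
```

In this model a producer Kernel corresponds to calls to `write_packet` and a consumer Kernel to calls to `read_packet`; the host takes part in neither.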

Execution Model

The OpenCL execution model has two objects: CommandQueue and Event.

OpenCL commands to transfer memory and execute kernels on devices are performed via CommandQueues.

Each OpenCL device (and sub-device) must have at least one command_queue associated with it, so that commands may be enqueued on to the device.

There are several OpenCL CommandQueue “enqueue_” methods to transfer data between host and device memory, map SVM memory and execute kernels. All the “enqueue_” methods accept an event_wait_list parameter and return an Event that can be used to monitor and control out-of-order execution of kernels on a CommandQueue, see Event Objects.
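
How an event_wait_list constrains out-of-order execution can be sketched as a dependency graph. The following std-only model groups commands into "waves" that may run concurrently. `Command` and `waves` are hypothetical names for this illustration, and wait lists are modelled as indices of earlier commands rather than as real Event objects.

```rust
/// Hypothetical stand-in for an enqueued command.
pub struct Command {
    pub name: &'static str,
    /// Indices of earlier commands whose events this command waits on.
    pub wait_list: Vec<usize>,
}

/// Groups commands into "waves": every command in a wave depends only on
/// commands in earlier waves, so an out-of-order queue is free to run the
/// members of one wave in any order or concurrently.
pub fn waves(commands: &[Command]) -> Vec<Vec<&'static str>> {
    let mut level = vec![0usize; commands.len()];
    let mut out: Vec<Vec<&'static str>> = Vec::new();
    for (i, cmd) in commands.iter().enumerate() {
        // An event only exists once its command has been enqueued, so a
        // wait list can only refer to commands with a smaller index.
        let lvl = cmd
            .wait_list
            .iter()
            .map(|&e| {
                assert!(e < i, "wait list must refer to earlier commands");
                level[e] + 1
            })
            .max()
            .unwrap_or(0); // no dependencies: runnable immediately
        level[i] = lvl;
        if out.len() <= lvl {
            out.resize(lvl + 1, Vec::new());
        }
        out[lvl].push(cmd.name);
    }
    out
}
```

For two writes with empty wait lists, a kernel that waits on both writes and a read that waits on the kernel, `waves` returns three groups: the two writes together, then the kernel, then the read. On an in-order queue the same commands would simply run one after another in enqueue order.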

Modules

Type Aliases

Result
Custom Result type to output OpenCL error text.