Crate parenchyma
Provides a simple, unified API for running highly parallel computations on different devices across different GPGPU frameworks, allowing you to swap your backend at runtime.
Parenchyma began as a hard fork of Collenchyma, a now-defunct project started at Autumn.
Abstract
Code is often executed on the CPU, but can be executed on other devices, such as GPUs and accelerators. These devices are accessible through GPGPU frameworks. Most interfaces are complicated, making the use of these devices a painful experience. Some of the pain points when writing such code for a particular device are:
- portability: not only do frameworks have different interfaces, devices support different versions and machines might have different hardware - all of this leads to code that will be executable only on a very specific set of machines and platforms.
- learning curve: executing code on a device through a framework is quite different to running code on the native CPU and comes with a lot of hurdles. OpenCL's 1.2 specification, for example, has close to 400 pages.
- custom code: integrating support for devices into your project requires writing a lot of low-level code, e.g., kernels, memory management, and general business logic.
Writing code for non-CPU devices is often a good choice, as these devices can execute operations a lot faster than native CPUs. GPUs, for example, can execute operations roughly one to two orders of magnitude faster, thanks to better support for parallelizing operations.
Parenchyma eliminates the pain points of writing device code, so you can run your code like any other code without needing to learn about kernels, events, or memory synchronization. Parenchyma also allows you to deploy your code with ease to servers, desktops and mobile devices, all while enabling your code to make full use of the underlying hardware.
Architecture
The single entry point of Parenchyma is a Backend. A backend is agnostic over the device it runs operations on. In order to be agnostic over the device, such as native host CPU, GPUs, accelerators or any other devices, the backend needs to be agnostic over the framework as well. The framework is important, as it provides the interface to execute operations on devices, among other things. Since different vendors of hardware use different frameworks, it becomes important that the backend is agnostic over the framework. This allows us to run computations on any machine without having to worry about hardware availability, which gives us the freedom to write code once and deploy it on different machines where it will execute on the most potent hardware by default.
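The idea of a backend that is agnostic over its framework can be sketched with a trait object. This is a minimal illustration only: the `Framework` trait, the `FakeGpu` type and the `add` and `double` functions below are hypothetical stand-ins invented for this sketch, not Parenchyma's actual definitions.

```rust
// Hypothetical sketch: these names echo the docs but are NOT Parenchyma's API.

/// Anything that can execute operations on some device.
trait Framework {
    fn name(&self) -> &'static str;
    /// Element-wise addition, standing in for "an operation".
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32>;
}

/// The host CPU.
struct Native;

/// A pretend GPU framework (assumption: a real one would launch a kernel).
struct FakeGpu;

impl Framework for Native {
    fn name(&self) -> &'static str {
        "native"
    }
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

impl Framework for FakeGpu {
    fn name(&self) -> &'static str {
        "fake-gpu"
    }
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        // Computed on the CPU here, purely for illustration.
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

/// The backend only knows that it holds *some* framework.
struct Backend {
    framework: Box<dyn Framework>,
}

impl Backend {
    fn new(framework: Box<dyn Framework>) -> Self {
        Backend { framework }
    }
    fn framework_name(&self) -> &'static str {
        self.framework.name()
    }
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        self.framework.add(a, b)
    }
}

/// User code is written once against `Backend`, never against a framework.
fn double(backend: &Backend, xs: &[f32]) -> Vec<f32> {
    backend.add(xs, xs)
}

fn main() {
    let backends = [
        Backend::new(Box::new(Native)),
        Backend::new(Box::new(FakeGpu)),
    ];
    for backend in &backends {
        println!("{}: {:?}", backend.framework_name(), double(backend, &[1.0, 2.5]));
    }
}
```

Because `double` depends only on `Backend`, swapping the framework at runtime does not require touching the calling code.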
Frameworks
The default framework is simply the host CPU for common computation. To make use of other devices such as GPUs, you may choose a GPGPU framework (such as OpenCL or CUDA) to access the processing capabilities of the device(s).
Extensions
Operations are introduced by a Parenchyma extension. An extension extends your backend with ready-to-execute operations. All you need to do is add the Parenchyma extension crate(s) to your Cargo.toml file alongside the Parenchyma crate. Your backend will then be extended with operations provided by the extension(s). The interface is simply the language you're using to work with Parenchyma. For example, you'd simply call backend.dot(..) using Rust and a BLAS extension. Whether the dot operation is executed on one GPU, multiple GPUs or on a CPU device depends solely on how you configured the backend.
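One common way to get this "extended backend" feel in Rust is an extension trait: a separate crate defines a trait and implements it for the backend type, so the new methods appear on the backend once the trait is in scope. The sketch below assumes that pattern. The `Backend` struct and the `Blas` trait here are simplified, hypothetical stand-ins, and the sketch does not claim that Parenchyma's extensions are implemented this way.

```rust
// Hypothetical stand-in for the core crate's backend type.
struct Backend {
    device: &'static str,
}

/// Hypothetical extension trait. In a real project it would live in a
/// separate extension crate listed in Cargo.toml next to the core crate.
trait Blas {
    fn dot(&self, x: &[f32], y: &[f32]) -> f32;
}

impl Blas for Backend {
    fn dot(&self, x: &[f32], y: &[f32]) -> f32 {
        // A real extension would dispatch to the configured device;
        // this sketch always computes on the CPU.
        x.iter().zip(y).map(|(a, b)| a * b).sum()
    }
}

fn main() {
    let backend = Backend { device: "native" };
    // `dot` is callable directly on the backend because `Blas` is in scope.
    let d = backend.dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]);
    println!("dot on {} = {}", backend.device, d);
}
```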
Packages
The concept of Parenchyma extensions has one more component - the Package trait. Unlike code executed on the native CPU, code for other devices has to be compiled and built manually at runtime, and this build step makes up a significant part of a framework. We need an instance that can be initialized at runtime to hold the state and compiled operations - which is the package's main purpose.
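As a rough illustration of a package that is built at runtime and then holds the compiled operations, consider the sketch below. Everything in it is an assumption made for the example: the `Package` trait shape, the `NnPackage` type, the `run` method, and the use of boxed closures in place of real compiled kernels.

```rust
use std::collections::HashMap;

/// Stand-in for an operation compiled at runtime (assumption: a real
/// package would hold compiled kernels or programs instead of closures).
type CompiledOp = Box<dyn Fn(f32) -> f32>;

/// Hypothetical trait shape: a package is built once, at runtime,
/// for a given device.
trait Package: Sized {
    fn build(device: &str) -> Result<Self, String>;
}

/// Holds the state and the "compiled" operations after the build step.
struct NnPackage {
    device: String,
    ops: HashMap<&'static str, CompiledOp>,
}

impl Package for NnPackage {
    fn build(device: &str) -> Result<Self, String> {
        if device.is_empty() {
            return Err("no device selected".to_string());
        }
        let mut ops: HashMap<&'static str, CompiledOp> = HashMap::new();
        ops.insert("sigmoid", Box::new(|x: f32| 1.0 / (1.0 + (-x).exp())));
        ops.insert("relu", Box::new(|x: f32| x.max(0.0)));
        Ok(NnPackage { device: device.to_string(), ops })
    }
}

impl NnPackage {
    /// Looks up a previously built operation and runs it.
    fn run(&self, op: &str, x: f32) -> Option<f32> {
        self.ops.get(op).map(|f| f(x))
    }
}

fn main() {
    let package = NnPackage::build("fake-gpu-0").expect("build failed");
    println!("built for {}", package.device);
    println!("relu(-2.0) = {:?}", package.run("relu", -2.0));
}
```

The point of the design is that the expensive build happens once, and every later call only looks up an already prepared operation.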
Memory
The last piece of Parenchyma is the memory. An operation happens over data, but this data needs to be accessible to the device on which the operation is executed. That memory space needs to be allocated on the device and then, in a later step, synced from the host to the device or from the device back to the host. Thanks to the Tensor type, we do not have to care about memory management between devices for the execution of operations. The tensor tracks and automatically manages data and its memory across devices, typically the host and one device. Memory can also be passed around to different backends. Operations take tensors as arguments and handle the synchronization and allocation for you.
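The bookkeeping described above can be sketched as a tensor that remembers which locations hold an up-to-date copy and transfers data only when a read or write needs it. This is a simplified model under stated assumptions: `TrackedTensor`, `Location` and the `transfers` counter are hypothetical names for this sketch, and a plain `Vec<f32>` stands in for real device memory.

```rust
use std::collections::HashMap;

/// Where a copy of the data lives (hypothetical, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Location {
    Host,
    Device(u32),
}

/// Keeps only up-to-date copies, keyed by location.
struct TrackedTensor {
    copies: HashMap<Location, Vec<f32>>,
    /// Counts simulated host/device transfers.
    transfers: usize,
}

impl TrackedTensor {
    fn from_host(data: Vec<f32>) -> Self {
        let mut copies = HashMap::new();
        copies.insert(Location::Host, data);
        TrackedTensor { copies, transfers: 0 }
    }

    /// Ensures `loc` holds a current copy, transferring only if needed.
    fn sync_to(&mut self, loc: Location) {
        if !self.copies.contains_key(&loc) {
            // Every stored copy is current, so any of them can be the source.
            let data = self
                .copies
                .values()
                .next()
                .expect("at least one copy exists")
                .clone();
            self.copies.insert(loc, data);
            self.transfers += 1;
        }
    }

    /// Read access: syncs lazily, then returns the copy at `loc`.
    fn read(&mut self, loc: Location) -> &[f32] {
        self.sync_to(loc);
        &self.copies[&loc]
    }

    /// Write access: syncs, then invalidates every other copy.
    fn write(&mut self, loc: Location) -> &mut Vec<f32> {
        self.sync_to(loc);
        self.copies.retain(|other, _| *other == loc);
        self.copies.get_mut(&loc).expect("copy was just synced")
    }
}

fn main() {
    let gpu = Location::Device(0);
    let mut tensor = TrackedTensor::from_host(vec![1.0, 2.0]);

    tensor.read(gpu); // first read on the device triggers a transfer
    tensor.read(gpu); // second read reuses the existing copy
    assert_eq!(tensor.transfers, 1);

    for x in tensor.write(gpu).iter_mut() {
        *x *= 2.0; // the host copy is now stale and has been dropped
    }

    // Reading on the host syncs the data back from the device.
    assert_eq!(tensor.read(Location::Host).to_vec(), vec![2.0, 4.0]);
    assert_eq!(tensor.transfers, 2);
}
```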
Example
    extern crate parenchyma as pa;
    extern crate parenchyma_nn as pann;

    use pa::{Backend, Native, OpenCL, SharedTensor};

    // `main` returns a `Result` so that the `?` operator can be used below.
    // This assumes every fallible call here yields an error convertible into `pa::Error`.
    fn main() -> Result<(), pa::Error> {
        let ref native: Backend = Backend::new::<Native>()?;
        // Initialize an OpenCL or CUDA backend packaged with the NN extension.
        let ref backend = pann::Backend::new::<OpenCL>()?;

        // Initialize two `SharedTensor`s.
        let shape = 1;
        let ref x = SharedTensor::<f32>::with(backend, shape, vec![3.5])?;
        let ref mut result = SharedTensor::<f32>::new(shape);

        // Run the sigmoid operation, provided by the NN extension, on
        // your OpenCL/CUDA enabled GPU (or CPU, which is possible through OpenCL)
        backend.sigmoid(x, result)?;

        // Print the result: `[0.97068775] shape=[1], strides=[1]`
        println!("{:?}", result.read(native)?.as_native()?);

        Ok(())
    }
Development
At the moment, Parenchyma itself will provide Rust APIs for the important frameworks - OpenCL and CUDA.
If a framework isn't specified, the backend will try to use the most potent framework given the underlying hardware - which would probably be in this order: CUDA -> OpenCL -> Native. The process might take longer, as every framework needs to be checked and devices need to be loaded in order to identify the best setup. The time it takes to go through that process is a reasonable compromise, as it allows you to deploy a Parenchyma-backed application to almost any machine - servers, desktops, mobile devices, etc.
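The fallback order can be sketched as a simple preference list. The `FrameworkKind` enum and the `select` function are hypothetical names for this example, and the `available` slice stands in for the result of actually probing the machine, which is the slow part mentioned above.

```rust
/// Hypothetical list of frameworks, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FrameworkKind {
    Cuda,
    OpenCl,
    Native,
}

/// Picks the most preferred framework among those found on the machine.
/// Assumption: the native host CPU is always usable, so it is the fallback.
fn select(available: &[FrameworkKind]) -> FrameworkKind {
    const PREFERENCE: [FrameworkKind; 3] = [
        FrameworkKind::Cuda,
        FrameworkKind::OpenCl,
        FrameworkKind::Native,
    ];
    PREFERENCE
        .iter()
        .copied()
        .find(|kind| available.contains(kind))
        .unwrap_or(FrameworkKind::Native)
}

fn main() {
    // Pretend probing found OpenCL and the host CPU, but no CUDA device.
    let found = [FrameworkKind::Native, FrameworkKind::OpenCl];
    println!("selected: {:?}", select(&found));
}
```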
Modules
changelog | Project changelog (YEAR-MONTH-DAY)
native | Native backend support.
opencl | OpenCL backend support - heterogeneous computing.
utility | Helper functions and traits
Structs
Backend | The heart of Parenchyma - provides an interface for running parallel computations on one or more devices.
Error | The core error type used in Parenchyma.
Hardware | Hardware can be GPUs, multi-core CPUs or DSPs, Cell/B.E. processor or whatever else is supported by the provided framework. The struct holds all important information about the hardware. To execute code on hardware, turn hardware into a [
Native | The native framework
OpenCL | Provides the OpenCL framework.
Shape | Describes the shape of a tensor.
SharedTensor | A shared tensor for framework-agnostic, memory-aware, n-dimensional storage.
Unextended | A marker type for unextended backends/contexts.
Enums
ComputeDevice | A wrapper around the various compute devices.
ErrorKind | A set of general categories.
HardwareKind | General categories for devices, used to identify the type of a device.
Memory | Provides a representation for memory across different frameworks.
Traits
Alloc | Allocator
BoxContext | Initialize a context, box it, and then return it.
Build | Builds a package and provides the functionality for turning a library into backend-specific, executable operations tailored for the target framework.
Context | Contexts are the heart of both OpenCL and CUDA applications. Contexts provide a container for objects such as memory, command-queues, programs/modules and kernels.
Device | A device capable of processing data.
ExtensionPackage | Provides the generic functionality for a backend-specific implementation of a library.
Framework | A trait implemented for all frameworks.
Synch | Synchronizer
Viewable | A viewable device.
Type Definitions
Result | A specialized