Crate parenchyma

Provides a simple, unified API for running highly parallel computations on different devices across different GPGPU frameworks, allowing you to swap your backend at runtime.

Parenchyma began as a hard fork of Collenchyma, a now-defunct project started at Autumn.

Abstract

Code is often executed on the CPU, but can be executed on other devices, such as GPUs and accelerators. These devices are accessible through GPGPU frameworks. Most interfaces are complicated, making the use of these devices a painful experience. Some of the pain points when writing such code for a particular device are:

portability: not only do frameworks have different interfaces, devices support different versions and machines might have different hardware - all of this leads to code that will be executable only on a very specific set of machines and platforms.
learning curve: executing code on a device through a framework is quite different from running code on the native CPU and comes with a lot of hurdles. The OpenCL 1.2 specification, for example, has close to 400 pages.
custom code: integrating support for devices into your project requires writing a lot of low-level code, e.g., kernels, memory management, and general business logic.

Writing code for non-CPU devices is often a good choice, as these devices can execute operations a lot faster than native CPUs. GPUs, for example, can execute operations roughly one to two orders of magnitude faster, thanks to better support for parallelizing operations.

Parenchyma eliminates the pain points of writing device code, so you can run your code like any other code without needing to learn about kernels, events, or memory synchronization. Parenchyma also allows you to deploy your code with ease to servers, desktops, and mobile devices, all while enabling your code to make full use of the underlying hardware.

Architecture

The single entry point of Parenchyma is a Backend. A backend is agnostic over the device it runs operations on, whether that is the native host CPU, a GPU, an accelerator, or any other device. To be agnostic over the device, the backend must also be agnostic over the framework, because the framework provides the interface for executing operations on devices and different hardware vendors use different frameworks. This lets us run computations on any machine without worrying about hardware availability: we write code once and deploy it on different machines, where it executes on the most potent hardware by default.
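The idea of a backend that stays agnostic over its framework can be sketched with a trait object. This is a minimal, self-contained illustration and not Parenchyma's actual API: `ToyFramework`, `ToyCpu`, `ToyGpu`, and `ToyBackend` are hypothetical names invented for this sketch, and the "GPU" simply runs on the host.

```rust
/// Hypothetical stand-in for a framework: it knows how to run operations.
trait ToyFramework {
    fn name(&self) -> &'static str;
    fn sum(&self, data: &[f32]) -> f32;
}

struct ToyCpu;
struct ToyGpu;

impl ToyFramework for ToyCpu {
    fn name(&self) -> &'static str { "cpu" }
    fn sum(&self, data: &[f32]) -> f32 { data.iter().sum() }
}

impl ToyFramework for ToyGpu {
    fn name(&self) -> &'static str { "gpu" }
    // A real framework would launch a kernel; this sketch reduces in chunks on the host.
    fn sum(&self, data: &[f32]) -> f32 {
        data.chunks(2).map(|c| c.iter().sum::<f32>()).sum()
    }
}

/// Hypothetical backend: callers never see which framework is behind it.
struct ToyBackend {
    framework: Box<dyn ToyFramework>,
}

impl ToyBackend {
    /// The framework is chosen at runtime, not at compile time.
    fn new(prefer_gpu: bool) -> ToyBackend {
        let framework: Box<dyn ToyFramework> =
            if prefer_gpu { Box::new(ToyGpu) } else { Box::new(ToyCpu) };
        ToyBackend { framework }
    }

    fn sum(&self, data: &[f32]) -> f32 { self.framework.sum(data) }
    fn framework_name(&self) -> &'static str { self.framework.name() }
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0];
    for &prefer_gpu in &[false, true] {
        let backend = ToyBackend::new(prefer_gpu);
        println!("{}: {}", backend.framework_name(), backend.sum(&data));
    }
}
```

The calling code in `main` is identical for both frameworks; only the value passed to `ToyBackend::new` changes.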

Frameworks

The default framework is simply the host CPU for common computation. To make use of other devices such as GPUs, you may choose a GPGPU framework (such as OpenCL or CUDA) to access the processing capabilities of the device(s).

Extensions

Operations are introduced by a Parenchyma extension. An extension extends your backend with ready-to-execute operations. All you need to do is add the Parenchyma extension crate(s) to your Cargo.toml file alongside the Parenchyma crate. Your backend will then be extended with the operations provided by the extension(s). The interface is simply the language you're using to work with Parenchyma. For example, you'd simply call backend.dot(..) using Rust and a BLAS extension. Whether the dot operation is executed on one GPU, multiple GPUs, or a CPU device depends solely on how you configured the backend.
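One common way for a separate crate to add methods to an existing type in Rust is the extension-trait pattern. The sketch below illustrates that pattern only; whether Parenchyma extensions are implemented this way is an assumption. The modules `toy_core` and `toy_blas` stand in for two crates, and `ToyBackend` and `ToyBlas` are hypothetical names.

```rust
// Hypothetical core crate: a backend with no operations of its own.
mod toy_core {
    pub struct ToyBackend {
        pub device: &'static str,
    }
}

// Hypothetical extension crate: adds a `dot` operation through a trait.
mod toy_blas {
    use super::toy_core::ToyBackend;

    pub trait ToyBlas {
        fn dot(&self, x: &[f32], y: &[f32]) -> f32;
    }

    impl ToyBlas for ToyBackend {
        fn dot(&self, x: &[f32], y: &[f32]) -> f32 {
            // A real extension would dispatch to the configured device here.
            x.iter().zip(y).map(|(a, b)| a * b).sum()
        }
    }
}

// Bringing the trait into scope makes `dot` callable on the backend.
use toy_blas::ToyBlas;
use toy_core::ToyBackend;

fn main() {
    let backend = ToyBackend { device: "cpu" };
    let result = backend.dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]);
    println!("dot on {} = {}", backend.device, result);
}
```

Without the `use toy_blas::ToyBlas;` line the call to `backend.dot(..)` would not compile, which mirrors the idea that an operation only exists once its extension has been added.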

Packages

The concept of Parenchyma extensions has one more component - the Package trait. As opposed to executing code on the native CPU, other devices need to compile and build the extension manually at runtime, which makes up a significant part of a framework. We need an instance that can be initialized at runtime to hold the state and compiled operations - which is the package's main purpose.
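The role described above, an object that is built at runtime and then holds compiled operations, can be sketched as follows. This is an illustration under stated assumptions and not the crate's Package trait: `ToyPackage` and `ToyKernel` are hypothetical, and "compiling" is reduced to validating and storing the source.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled kernel.
#[derive(Debug, PartialEq)]
struct ToyKernel {
    name: String,
    source_len: usize,
}

/// Hypothetical package: built once at runtime, then holds compiled operations.
struct ToyPackage {
    kernels: HashMap<String, ToyKernel>,
}

impl ToyPackage {
    /// "Compiles" every (name, source) pair; an empty source is a build error.
    fn build(sources: &[(&str, &str)]) -> Result<ToyPackage, String> {
        let mut kernels = HashMap::new();
        for &(name, source) in sources {
            if source.trim().is_empty() {
                return Err(format!("kernel `{}` has no source", name));
            }
            let kernel = ToyKernel { name: name.to_string(), source_len: source.len() };
            kernels.insert(name.to_string(), kernel);
        }
        Ok(ToyPackage { kernels })
    }

    /// Looks up a previously built kernel by name.
    fn kernel(&self, name: &str) -> Option<&ToyKernel> {
        self.kernels.get(name)
    }
}

fn main() {
    let package = ToyPackage::build(&[("sigmoid", "/* device source */")]).unwrap();
    println!("{:?}", package.kernel("sigmoid"));
}
```

Building once and looking kernels up afterwards keeps the runtime compilation cost out of every individual operation call.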

Memory

The last piece of Parenchyma is the memory. An operation happens over data, but this data needs to be accessible to the device on which the operation is executed. That memory space needs to be allocated on the device and then, in a later step, synced from the host to the device or from the device back to the host. Thanks to the Tensor type, we do not have to care about memory management between devices for the execution of operations. The tensor tracks and automatically manages data and its memory across devices, typically the host and one other device. Memory can also be passed around to different backends. Operations take tensors as arguments and handle the synchronization and allocation for you.
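The bookkeeping described above can be sketched with a tensor that keeps one buffer per location and copies only when the requested copy is stale. This is a simplified illustration, not the SharedTensor implementation: `ToyTensor` and `Location` are hypothetical, and "device memory" is an ordinary `Vec` on the host.

```rust
/// Hypothetical memory locations; a real tensor could track many devices.
#[derive(Clone, Copy)]
enum Location {
    Host,
    Device,
}

/// Hypothetical tensor that tracks which copies are up to date.
struct ToyTensor {
    host: Vec<f32>,
    device: Vec<f32>, // stands in for device memory
    host_valid: bool,
    device_valid: bool,
    syncs: usize, // number of copies performed, for illustration
}

impl ToyTensor {
    fn from_host(data: Vec<f32>) -> ToyTensor {
        let device = vec![0.0; data.len()];
        ToyTensor { host: data, device, host_valid: true, device_valid: false, syncs: 0 }
    }

    /// Copies data to `at` only if the copy there is stale.
    fn sync_to(&mut self, at: Location) {
        match at {
            Location::Host if !self.host_valid => {
                self.host.copy_from_slice(&self.device);
                self.host_valid = true;
                self.syncs += 1;
            }
            Location::Device if !self.device_valid => {
                self.device.copy_from_slice(&self.host);
                self.device_valid = true;
                self.syncs += 1;
            }
            _ => {}
        }
    }

    /// Read access: syncs if needed and leaves the other copy valid.
    fn read(&mut self, at: Location) -> &[f32] {
        self.sync_to(at);
        match at {
            Location::Host => &self.host[..],
            Location::Device => &self.device[..],
        }
    }

    /// Write access: syncs if needed, then marks the other copy stale.
    fn write(&mut self, at: Location) -> &mut [f32] {
        self.sync_to(at);
        match at {
            Location::Host => { self.device_valid = false; &mut self.host[..] }
            Location::Device => { self.host_valid = false; &mut self.device[..] }
        }
    }
}

fn main() {
    let mut t = ToyTensor::from_host(vec![1.0, 2.0]);
    // "Device" operation: doubles every element in device memory.
    for v in t.write(Location::Device) {
        *v *= 2.0;
    }
    let host = t.read(Location::Host).to_vec();
    println!("{:?} after {} syncs", host, t.syncs);
}
```

The caller never copies anything explicitly; the two syncs in `main` (host to device before the write, device to host before the read) happen as a side effect of asking for access.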

Example

extern crate parenchyma as pa;
extern crate parenchyma_nn as pann;
 
use pa::{Backend, Native, OpenCL, SharedTensor};
 
// `?` needs a `Result` return type; `pa::Error` is the crate's core error type.
fn main() -> Result<(), pa::Error> {
    // Initialize a native (host CPU) backend, used below to read the result.
    let ref native: Backend = Backend::new::<Native>()?;
    // Initialize an OpenCL or CUDA backend packaged with the NN extension.
    let ref backend = pann::Backend::new::<OpenCL>()?;

    // Initialize two `SharedTensor`s.
    let shape = 1;
    let ref x = SharedTensor::<f32>::with(backend, shape, vec![3.5])?;
    let ref mut result = SharedTensor::<f32>::new(shape);

    // Run the sigmoid operation, provided by the NN extension, on
    // your OpenCL/CUDA enabled GPU (or CPU, which is possible through OpenCL)
    backend.sigmoid(x, result)?;

    // Print the result: `[0.97068775] shape=[1], strides=[1]`
    println!("{:?}", result.read(native)?.as_native()?);
    Ok(())
}

Development

At the moment, Parenchyma itself will provide Rust APIs for the important frameworks - OpenCL and CUDA.

If a framework isn't specified, the backend will try to use the most potent framework given the underlying hardware - which would probably be in this order: CUDA -> OpenCL -> Native. The process might take longer, as every framework needs to be checked and devices need to be loaded in order to identify the best setup. The time it takes to go through that process is a reasonable compromise, as it allows you to deploy a Parenchyma-backed application to almost any machine - servers, desktops, mobile devices, etc.
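The fallback order described above amounts to walking a preference list and taking the first framework that is available. The sketch below shows that selection logic only; `ToyFramework`, `PREFERENCE`, and `select_framework` are hypothetical names, and the availability probe is a plain closure rather than real driver detection.

```rust
/// Hypothetical framework identifiers.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyFramework {
    Cuda,
    OpenCl,
    Native,
}

/// Preference order from the text: CUDA -> OpenCL -> Native.
const PREFERENCE: [ToyFramework; 3] =
    [ToyFramework::Cuda, ToyFramework::OpenCl, ToyFramework::Native];

/// Returns the first framework, in preference order, that the probe reports as available.
fn select_framework<F: Fn(ToyFramework) -> bool>(is_available: F) -> Option<ToyFramework> {
    PREFERENCE.iter().copied().find(|&fw| is_available(fw))
}

fn main() {
    // Pretend probe: this machine has no CUDA driver but does have OpenCL.
    let probe = |fw: ToyFramework| fw != ToyFramework::Cuda;
    println!("{:?}", select_framework(probe));
}
```

Because every candidate may have to be probed before one succeeds, the cost of this search grows with the number of frameworks, which is the startup delay the paragraph above refers to.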

Modules

changelog	Project changelog (YEAR-MONTH-DAY)
native	Native backend support.
opencl	OpenCL backend support - heterogeneous computing.
utility	Helper functions and traits

Structs

Backend	The heart of Parenchyma - provides an interface for running parallel computations on one or more devices.
Error	The core error type used in Parenchyma.
Hardware	Hardware can be GPUs, multi-core CPUs, DSPs, Cell/B.E. processors, or whatever else is supported by the provided framework. The struct holds all important information about the hardware. To execute code on hardware, turn the hardware into a Device.
Native	The native framework
OpenCL	Provides the OpenCL framework.
Shape	Describes the shape of a tensor.
SharedTensor	A shared tensor for framework-agnostic, memory-aware, n-dimensional storage.
Unextended	A marker type for unextended backends/contexts.

Enums

ComputeDevice	A wrapper around the various compute devices.
ErrorKind	A set of general categories.
HardwareKind	General categories for devices, used to identify the type of a device.
Memory	Provides a representation for memory across different frameworks.

Traits

Alloc	Allocator
BoxContext	Initialize a context, box it, and then return it.
Build	Builds a package and provides the functionality for turning a library into backend-specific, executable operations tailored for the target framework.
Context	Contexts are the heart of both OpenCL and CUDA applications. Contexts provide a container for objects such as memory, command-queues, programs/modules and kernels.
Device	A device capable of processing data.
ExtensionPackage	Provides the generic functionality for a backend-specific implementation of a library.
Framework	A trait implemented for all frameworks. `Framework`s contain a list of all available devices as well as other objects specific to the implementor.
Synch	Synchronizer
Viewable	A viewable device.

Type Definitions

Result	A specialized Result type.