//! Provides a simple, unified API for running highly parallel computations on different
//! devices across different GPGPU frameworks, allowing you to swap your backend at runtime.
//!
//! Parenchyma began as a hard fork of [Collenchyma], a now-defunct project started at [Autumn].
//!
//! ## Abstract
//!
//! Code is often executed on the CPU, but can be executed on other devices, such as GPUs
//! and accelerators. These devices are accessible through GPGPU frameworks. Most interfaces are 
//! complicated, making the use of these devices a painful experience. Some of the pain points when 
//! writing such code for a particular device are:
//!
//! * portability: frameworks have different interfaces, devices support different versions,
//! and machines differ in hardware - all of which leads to code that is executable only on a
//! very specific set of machines and platforms.
//! * learning curve: executing code on a device through a framework is quite different from
//! running code on the native CPU and comes with a lot of hurdles. OpenCL's 1.2 specification,
//! for example, runs close to 400 pages.
//! * custom code: integrating support for devices into your project requires writing a lot of
//! low-level code, e.g., kernels, memory management, and general business logic.
//!
//! Writing code for non-CPU devices is often a good choice, as these devices can execute
//! operations a lot faster than native CPUs. GPUs, for example, can execute operations roughly
//! one to two orders of magnitude faster, thanks to their better support for parallelizing
//! operations.
//!
//! Parenchyma eliminates the pain points of writing device code, so you can run your code like
//! any other code without needing to learn about kernels, events, or memory synchronization.
//! Parenchyma also allows you to deploy your code with ease to servers, desktops, and mobile
//! devices, all while enabling your code to make full use of the underlying hardware.
//!
//! ## Architecture
//!
//! The single entry point of Parenchyma is a [Backend](./struct.Backend.html). A
//! backend is agnostic over the device it runs operations on. In order to be agnostic over the
//! device - whether the native host CPU, a GPU, an accelerator, or any other device - the
//! backend needs to be agnostic over the framework as well. The framework is important, as it
//! provides the interface for executing operations on devices, among other things. Since
//! different hardware vendors use different frameworks, framework-agnosticism lets us run
//! computations on any machine without having to worry about hardware availability, which gives
//! us the freedom to write code once and deploy it on different machines, where it will execute
//! on the most potent hardware by default.
//!
//! ### Frameworks
//!
//! The default framework is the native host CPU, used for common computation. To make use of
//! other devices such as GPUs, you may choose a GPGPU framework (such as OpenCL or CUDA) to
//! access the processing capabilities of the device(s).
//!
//! ### Extensions
//!
//! Operations are introduced by a Parenchyma extension. An extension extends your backend with 
//! ready-to-execute operations. All you need to do is add the Parenchyma extension crate(s)
//! to your `Cargo.toml` file alongside the Parenchyma crate. Your backend will then be extended with
//! operations provided by the extension(s). The interface is simply the language you're using
//! to work with Parenchyma. For example, with Rust and a BLAS extension you'd simply call
//! `backend.dot(..)`. Whether the dot operation is executed on one GPU, multiple GPUs, or on
//! a CPU device depends solely on how you configured the backend.
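//!
//! A hedged sketch of that flow - `parenchyma_blas` is a hypothetical extension crate here,
//! and the exact `dot` signature is illustrative, not confirmed:
//!
//! ```ignore
//! use parenchyma::{OpenCL, SharedTensor};
//!
//! // The extension crate provides a backend extended with BLAS operations.
//! let ref backend = parenchyma_blas::Backend::new::<OpenCL>()?;
//!
//! let ref x = SharedTensor::<f32>::with(backend, 3, vec![1.0, 2.0, 3.0])?;
//! let ref y = SharedTensor::<f32>::with(backend, 3, vec![4.0, 5.0, 6.0])?;
//! let ref mut result = SharedTensor::<f32>::new(1);
//!
//! // Whether this runs on one GPU, several, or the CPU depends on the backend.
//! backend.dot(x, y, result)?;
//! ```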
//!
//! ### Packages
//!
//! The concept of Parenchyma extensions has one more component - the
//! [Package](./trait.ExtensionPackage.html) trait. As opposed to executing code on the native
//! CPU, other devices need to compile and build the extension at runtime, which makes up a
//! significant part of a framework. We need an instance that can be initialized at runtime to
//! hold the state and the compiled operations - which is the package's main purpose.
//!
//! ### Memory
//!
//! The last piece of Parenchyma is the memory. An operation happens over data, but this data needs
//! to be accessible to the device on which the operation is executed. That memory space needs to be
//! allocated on the device and then, in a later step, synced from the host to the device or from
//! the device back to the host. Thanks to the [Tensor](./struct.SharedTensor.html) type, we do not
//! have to care about memory management between devices for the execution of operations. The tensor
//! tracks and automatically manages data and its memory across devices, which is often the host and
//! the device. Memory can also be passed around to different backends. Operations take tensors
//! as arguments while handling the synchronization and allocation for you.
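//!
//! A minimal sketch of that flow, using the same calls as the example below (`gpu` and
//! `native` stand for two previously constructed backends):
//!
//! ```ignore
//! // Allocated and filled via the GPU backend...
//! let ref x = SharedTensor::<f32>::with(gpu, 1, vec![3.5])?;
//! // ...and synchronized back to the host on demand when read there.
//! let host_view = x.read(native)?.as_native()?;
//! ```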
//!
//! ## Example
//!
//! ```ignore
//! extern crate parenchyma as pa;
//! extern crate parenchyma_nn as pann;
//! 
//! use pa::{Backend, Native, OpenCL, SharedTensor};
//! 
//! fn main() {
//!     run().unwrap();
//! }
//!
//! fn run() -> Result<(), pa::Error> {
//!     let ref native: Backend = Backend::new::<Native>()?;
//!     // Initialize an OpenCL or CUDA backend packaged with the NN extension.
//!     let ref backend = pann::Backend::new::<OpenCL>()?;
//!
//!     // Initialize two `SharedTensor`s.
//!     let shape = 1;
//!     let ref x = SharedTensor::<f32>::with(backend, shape, vec![3.5])?;
//!     let ref mut result = SharedTensor::<f32>::new(shape);
//!
//!     // Run the sigmoid operation, provided by the NN extension, on
//!     // your OpenCL/CUDA-enabled GPU (or the CPU, which is possible through OpenCL).
//!     backend.sigmoid(x, result)?;
//!
//!     // Prints `[0.97068775] shape=[1], strides=[1]`.
//!     println!("{:?}", result.read(native)?.as_native()?);
//!
//!     Ok(())
//! }
//! ```
//!
//! ## Development
//!
//! Parenchyma itself will provide Rust APIs for the most important
//! frameworks - OpenCL and CUDA.
//!
//! If a framework isn't specified, the backend will try to use the most potent framework given
//! the underlying hardware - probably in this order: CUDA -> OpenCL -> Native. The process
//! might take longer, as every framework needs to be checked and devices need to be loaded in
//! order to identify the best setup. The time it takes to go through that process is a
//! reasonable compromise, as it allows you to deploy a Parenchyma-backed application to almost
//! any machine - servers, desktops, mobile devices, etc.
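//!
//! In the meantime, a fallback can be written by hand - a hedged sketch that tries OpenCL
//! first and falls back to the native host if no OpenCL platform is available:
//!
//! ```ignore
//! use parenchyma::{Backend, Native, OpenCL};
//!
//! let backend = Backend::new::<OpenCL>()
//!     .or_else(|_| Backend::new::<Native>())?;
//! ```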
//!
//! [Collenchyma]: https://github.com/autumnai/collenchyma
//! [Autumn]: https://github.com/autumnai

#![deny(missing_docs, missing_debug_implementations, unused_import_braces, unused_qualifications)]
#![feature(associated_consts, pub_restricted)]
#![feature(libc, unsize, untagged_unions)]

#[macro_use] extern crate enum_primitive;
#[macro_use] extern crate lazy_static;
#[macro_use] extern crate log;

extern crate libc;
extern crate libloading as lib;
extern crate ndarray;

pub mod changelog;
pub mod utility;

pub use self::frameworks::{native, opencl};

pub use self::backend::Backend;
pub use self::context::Context;
pub use self::error::{Error, ErrorKind, Result};
pub use self::extension::{Build, ExtensionPackage, Unextended};
pub use self::framework::{BoxContext, Framework};
pub use self::frameworks::native::Native;
pub use self::frameworks::opencl::OpenCL;
pub use self::hardware::{Alloc, ComputeDevice, Device, Hardware, HardwareKind, Synch, Viewable};
pub use self::memory::Memory;
pub use self::tensor::{Shape, SharedTensor};

mod backend;
mod context;
mod error;
mod extension;
mod framework;
mod frameworks;
mod hardware;
mod memory;
mod tensor;