//! Emu is a framework/compiler for GPU acceleration and GPU programming. It is
//! first and foremost a procedural macro that looks at a subset of safe Rust
//! code and attempts to offload parts of it to a GPU.
//!
//! As an example of how you could use Emu, let's start off with a simple
//! vector-by-scalar multiplication. We can implement this in pure Rust as
//! follows.
//! ```
//! fn main() {
//!     let mut data = vec![0.1; 1000];
//!
//!     for i in 0..1000 {
//!         data[i] = data[i] * 10.0;
//!     }
//! }
//! ```
//! Emu lets you run parts of your program on the GPU by declaring
//! things you want the GPU to do. These declarations can tell the GPU to
//! load data, launch computation, etc. Here are appropriate declarations for
//! this program.
//! ```
//! # extern crate em;
//! # use em::*;
//! fn main() {
//!     let mut data = vec![0.1; 1000];
//!
//!     gpu_do!(load(data));
//!     gpu_do!(launch());
//!     for i in 0..1000 {
//!         data[i] = data[i] * 10.0;
//!     }
//!     gpu_do!(read(data));
//! }
//! ```
//! On their own, these declarations don't do anything: each `gpu_do!()`
//! expands to nothing. To have Emu interpret them and act on them, we need to
//! tell Emu to use the GPU for a piece of code. We do this by tagging the
//! function we are working on with `#[gpu_use]`. Here's how to do that.
//! ```
//! # extern crate em;
//! # use em::*;
//! #[gpu_use]
//! fn main() {
//!     let mut data = vec![0.1; 1000];
//!
//!     gpu_do!(load(data));
//!     gpu_do!(launch());
//!     for i in 0..1000 {
//!         data[i] = data[i] * 10.0;
//!     }
//!     gpu_do!(read(data));
//! }
//! ```
//! And now Emu will actually look through your code, load data onto the GPU,
//! launch code on the GPU, and read back from the GPU. This example should
//! give you a sense of how you can use Emu and what Emu tries to do.
//! Ultimately, what Emu comes down to is two parts.
//! 1. Passing - passing the GPU around (function to function), with `#[gpu_use]`
//! 2. Accelerating - using the GPU to load/read data, launch with `gpu_do!()`
//!
//! You've actually seen both passing and accelerating in the above example.
//! To get a better idea of how to do each, look at the corresponding
//! documentation.
//! 1. Passing - look at docs for `#[gpu_use]`
//! 2. Accelerating - look at docs for `gpu_do!()`
//!
//! Once you understand passing and accelerating, you understand Emu. These
//! are the main high-level ideas of GPU programming with Emu.

pub use emu_macro::gpu_use;
pub use ocl;

/// A container that holds information needed for interacting with a GPU using OpenCL.
///
/// You should really only use this if you intend to drop down to low-level OpenCL for maximum performance.
/// Buffers and programs are stored in hash tables. Programs are indexed by their source code.
/// Buffers are indexed by a `*const [f32]`. Given a value `data`, you can get the `*const [f32]` index with `get_buffer_key!(data)`.
///
/// Note that `data` must have an `as_slice()` method defined for its type. As an example, `data` could be of type `Vec<f32>`.
pub struct Gpu {
    pub device: ocl::Device,
    pub context: ocl::Context,
    pub queue: ocl::Queue,
    pub buffers: std::collections::HashMap<*const [f32], ocl::Buffer<f32>>,
    pub programs: std::collections::HashMap<String, ocl::Program>, // TODO cache kernels instead of programs if possible
                                                                   // kernels can be cached instead of programs, if it is easy to change the dims and args of a kernel
}

/// A macro for getting the key to access a `Buffer` in the `buffers` field of a `Gpu`.
///
/// Given a value `data`, you can get the `*const [f32]` index with `get_buffer_key!(data)`.
/// Note that `data` must have an `as_slice()` method defined for its type. As an example, `data` could be of type `Vec<f32>`.
/// This should really only be used if you want to drop down to low-level OpenCL for maximum performance gain.
///
/// Here's a quick example.
/// ```
/// # extern crate em;
/// # use em::*;
/// #[gpu_use] // this inserts a "let gpu = Gpu { ... };" at the start of the main function
/// fn main() {
///     let data = vec![0.0; 1000];
///     gpu_do!(load(data));
///     let buffer: &ocl::Buffer<f32> = gpu.buffers.get(&get_buffer_key!(data)).unwrap();
///
///     // do something with buffer...
/// }
/// ```
#[macro_export]
macro_rules! get_buffer_key {
    ($i:ident) => {
        ($i.as_slice() as *const [f32])
    };
}

/// A macro for declaring a thing that the GPU should do.
///
/// By declaring things that the GPU should do, this macro essentially serves
/// as the "accelerating" part of Emu. It assumes a GPU is in scope and
/// focuses on simply using that GPU to accelerate. Here's an example of usage.
///
/// ```
/// # extern crate em;
/// # use em::*;
/// #[gpu_use] // removing this will effectively switch to "no GPU"
/// fn main() {
///     let mut data = vec![0.1; 1000];
///
///     gpu_do!(load(data)); // load data to the GPU
///     // now that data is loaded, we should not re-allocate it (by changing
///     // its size) between the launches and reads that use the data
///     gpu_do!(launch()); // launch the next thing encountered by the compiler
///     // the next thing is a for loop so Emu compiles it into a "kernel" and
///     // launches the kernel on the GPU
///     for i in 0..1000 {
///         data[i] = data[i] * 10.0;
///     }
///     gpu_do!(read(data)); // read data back from GPU
/// }
/// ```
/// Concretely, there are 3 (only 3 at the moment) commands to the GPU that
/// can be declared.
/// 1. Loading to the GPU with `gpu_do!(load(data))`
/// 2. Reading from the GPU with `gpu_do!(read(data))`
/// 3. Launching on the GPU with `gpu_do!(launch())`
///
/// Note that `data` must be an identifier. The only hard requirement for `data` is
/// that it must have the following 2 methods.
/// - `fn as_slice(&self) -> &[f32]`
/// - `fn as_mut_slice(&mut self) -> &mut [f32]`
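///
/// As a minimal sketch of a type that satisfies this requirement, consider a
/// hypothetical `Signal` wrapper around a `Vec<f32>`. The name `Signal` and its
/// `samples` field are made up for illustration and are not part of Emu. The
/// `Index`/`IndexMut` impls are an assumption about how one might make
/// `data[i]` yield an `f32` for such a type.
/// ```rust
/// use std::ops::{Index, IndexMut};
///
/// // Hypothetical wrapper type, not part of Emu.
/// struct Signal {
///     samples: Vec<f32>,
/// }
///
/// impl Signal {
///     // The two methods required of anything passed to `gpu_do!()`.
///     fn as_slice(&self) -> &[f32] {
///         &self.samples
///     }
///
///     fn as_mut_slice(&mut self) -> &mut [f32] {
///         &mut self.samples
///     }
/// }
///
/// // Indexing returns an `f32`, matching what a loop body would expect.
/// impl Index<usize> for Signal {
///     type Output = f32;
///     fn index(&self, i: usize) -> &f32 {
///         &self.samples[i]
///     }
/// }
///
/// impl IndexMut<usize> for Signal {
///     fn index_mut(&mut self, i: usize) -> &mut f32 {
///         &mut self.samples[i]
///     }
/// }
///
/// fn main() {
///     let mut data = Signal { samples: vec![0.5; 4] };
///     data[0] = data[0] * 10.0;
///     data.as_mut_slice()[1] = 2.0;
///     assert_eq!(data.as_slice().len(), 4);
/// }
/// ```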
///
/// There is a soft requirement that the data should represent a list of
/// `f32`s and indexing it with `data[i]` should return an `f32`. But this is
/// really just to ensure that when we lift code from CPU to GPU it is
/// functionally equivalent in a sane way. Also, note that no invocation of
/// `gpu_do!()` will ever expand to anything unless the function it's being
/// used in is tagged with `#[gpu_use]`.
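///
/// A small sketch of what that means in practice. The local `gpu_do!` below is
/// a stand-in that copies the empty arms of the macro defined in this crate, so
/// the snippet compiles without Emu. The `scale` function is hypothetical. With
/// no `#[gpu_use]` on the function, every declaration vanishes and the loop
/// runs as ordinary CPU code.
/// ```rust
/// // Stand-in with the same empty expansions as this crate's `gpu_do!`.
/// macro_rules! gpu_do {
///     (load($i:ident)) => {};
///     (read($i:ident)) => {};
///     (launch()) => {};
/// }
///
/// // Hypothetical function, deliberately not tagged with `#[gpu_use]`.
/// fn scale(mut data: Vec<f32>) -> Vec<f32> {
///     gpu_do!(load(data));
///     gpu_do!(launch());
///     for i in 0..data.len() {
///         data[i] = data[i] * 10.0;
///     }
///     gpu_do!(read(data));
///     data
/// }
///
/// fn main() {
///     let scaled = scale(vec![0.5; 4]);
///     assert_eq!(scaled, vec![5.0; 4]);
/// }
/// ```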
///
/// There is also a requirement that once data is loaded, it should not be
/// re-allocated on the CPU between the launches and reads that make use of it.
/// So basically just make sure you don't resize it.
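///
/// One way to see why, assuming (as described for `get_buffer_key!()`) that a
/// loaded buffer is looked up by a `*const [f32]` derived from `as_slice()`:
/// that pointer carries the slice's length as well as its address, so a resized
/// `Vec` no longer produces the key it was loaded under. The sketch below uses
/// only `std`. `buffer_key` is a hypothetical helper that mirrors the macro's
/// expansion, and the `HashMap` of strings stands in for the real buffer table.
/// ```rust
/// use std::collections::HashMap;
///
/// // Hypothetical helper mirroring the expansion of `get_buffer_key!(data)`.
/// fn buffer_key(data: &Vec<f32>) -> *const [f32] {
///     data.as_slice() as *const [f32]
/// }
///
/// fn main() {
///     let mut data = vec![0.1_f32; 1000];
///
///     // Stand-in for `gpu.buffers`: remember what was "loaded" under this key.
///     let mut buffers: HashMap<*const [f32], &str> = HashMap::new();
///     buffers.insert(buffer_key(&data), "device buffer for data");
///
///     // Same allocation, same length: the lookup succeeds.
///     assert!(buffers.contains_key(&buffer_key(&data)));
///
///     // Changing the length changes the key, so the lookup now misses.
///     data.push(0.1);
///     assert!(!buffers.contains_key(&buffer_key(&data)));
/// }
/// ```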
///
/// And in case the example doesn't make this clear, `gpu_do!(launch())`
/// attempts to launch the expression/piece of code that follows it on the
/// GPU. Now, you can't just put any code you want there. Only a very small
/// subset of Rust code can be launched. Anything outside of this subset will
/// result in a compile-time error that explains what was outside of the subset.
#[macro_export]
macro_rules! gpu_do {
    (load($i:ident)) => {};
    (read($i:ident)) => {};
    (launch()) => {};
}