Crate ocl_convolution
OpenCL-accelerated 2D convolutions.
Convolution is a fundamental building block in signal processing. This crate focuses on 2D convolutions (i.e., the signal is a still image) in the context of deep learning (more precisely, convolutional neural networks). The deep learning context means that a single convolution may involve many filters (on the order of hundreds), and the input may contain many channels (on the order of hundreds or thousands) rather than the traditional 3 or 4. Such convolutions are computationally heavy and can be effectively accelerated with OpenCL.
Features
The crate implements convolutions on two numerical formats:
- Single-precision floats (`f32`)
- Signed 8-bit integers with 32-bit multiply-add accumulator (this format is frequently denoted `int8/32` in deep learning literature). Quantization parameters are applied uniformly to the entire layer.
In both cases, dilated and grouped convolutions are supported.
Implementation details
The implementation uses an output-stationary workflow (see, e.g., this paper for the definition); that is, each element of the output tensor is computed in a single run of the OpenCL kernel. This minimizes memory overhead, but may not be the fastest algorithm.
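To make the output-stationary idea concrete, here is a minimal CPU-only sketch in plain Rust. It is not the crate's OpenCL kernel: the `Dims` type and the `output_element` and `convolve` functions are hypothetical names introduced for illustration, and the sketch assumes unit strides, no padding, no dilation, a single group, a channels-last signal and filters laid out as `[filter, row, column, channel]`. The point it illustrates is that every output element is produced by one self-contained accumulation, which is the unit of work a single kernel run would perform.

```rust
/// Dimensions of a channels-last (HWC) signal. Hypothetical helper type.
#[derive(Clone, Copy)]
struct Dims {
    height: usize,
    width: usize,
    channels: usize,
}

/// Computes a single output element at spatial position `(y, x)` for filter `k`.
/// All multiply-adds contributing to this element happen here, so no partial
/// sums need to be stored between calls.
fn output_element(
    signal: &[f32],
    dims: Dims,
    filters: &[f32],
    kernel: usize,
    k: usize,
    y: usize,
    x: usize,
) -> f32 {
    let mut acc = 0.0_f32;
    for i in 0..kernel {
        for j in 0..kernel {
            for c in 0..dims.channels {
                let s = signal[((y + i) * dims.width + (x + j)) * dims.channels + c];
                let f = filters[((k * kernel + i) * kernel + j) * dims.channels + c];
                acc += s * f;
            }
        }
    }
    acc
}

/// Convolves the signal with `filter_count` square filters of size `kernel`.
/// The output is channels-last as well, with one channel per filter.
fn convolve(
    signal: &[f32],
    dims: Dims,
    filters: &[f32],
    kernel: usize,
    filter_count: usize,
) -> Vec<f32> {
    assert_eq!(signal.len(), dims.height * dims.width * dims.channels);
    assert_eq!(filters.len(), filter_count * kernel * kernel * dims.channels);
    assert!(kernel >= 1 && kernel <= dims.height && kernel <= dims.width);

    let out_h = dims.height - kernel + 1;
    let out_w = dims.width - kernel + 1;
    let mut output = Vec::with_capacity(out_h * out_w * filter_count);
    for y in 0..out_h {
        for x in 0..out_w {
            for k in 0..filter_count {
                output.push(output_element(signal, dims, filters, kernel, k, y, x));
            }
        }
    }
    output
}
```

On a GPU, the three outer loops in `convolve` would roughly correspond to the work-item grid, while the body of `output_element` would be the per-item computation. The only buffers involved are the signal, the filters and the output, which is where the low memory overhead comes from.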
Examples
Floating-point convolution
    use ndarray::Array4;
    use rand::{Rng, thread_rng};
    use ocl_convolution::{Convolution, FeatureMap, Params};

    let convolution = Convolution::f32(3)?.build(Params {
        strides: [1, 1],
        pads: [0; 4],
        dilation: [1, 1],
        groups: 1,
    })?;

    // Generate random signal with 6x6 spatial dims and 3 channels.
    let mut rng = thread_rng();
    let signal = Array4::from_shape_fn([1, 6, 6, 3], |_| rng.gen_range(-1.0, 1.0));
    // Construct two 3x3 spatial filters.
    let filters = Array4::from_shape_fn([2, 3, 3, 3], |_| rng.gen_range(-1.0, 1.0));
    // Perform the convolution. The output should have 4x4 spatial dims
    // and contain 2 channels (1 per each filter). The output layout will
    // be the same as in the signal.
    let output = convolution.compute(
        // `FeatureMap` wraps `ArrayView4` with information about
        // memory layout (which is "channels-last" / NHWC in this case).
        FeatureMap::nhwc(&signal),
        &filters,
    )?;
    assert_eq!(output.shape(), [1, 4, 4, 2]);

    // For increased efficiency, we may pin filter memory.
    // This is especially useful when the same filters are convolved
    // with multiple signals.
    let convolution = convolution.with_filters(&filters)?;
    let new_output = convolution.compute(FeatureMap::nhwc(&signal))?;
    assert_eq!(output, new_output);
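The 4x4 output size in the example above follows from the usual convolution size arithmetic. The helper below is a sketch of that arithmetic for one spatial axis. `output_size` and its parameter names are hypothetical and not part of the crate's API, and the floor-based rounding is the common convention rather than something this page confirms for the crate.

```rust
/// Output size along one spatial axis, or `None` if the (dilated) kernel
/// does not fit into the padded input. Hypothetical helper for illustration.
fn output_size(
    input: usize,
    kernel: usize,
    stride: usize,
    pad_before: usize,
    pad_after: usize,
    dilation: usize,
) -> Option<usize> {
    if kernel == 0 || stride == 0 || dilation == 0 {
        return None;
    }
    // A dilated kernel spans `dilation * (kernel - 1) + 1` input elements.
    let effective_kernel = dilation * (kernel - 1) + 1;
    let padded = input + pad_before + pad_after;
    if padded < effective_kernel {
        return None;
    }
    // Assumption: positions that do not fit a whole kernel are dropped (floor).
    Some((padded - effective_kernel) / stride + 1)
}
```

With the parameters from the example (input 6, kernel 3, stride 1, no padding, dilation 1), `output_size(6, 3, 1, 0, 0, 1)` gives `Some(4)`. With dilation 2 instead, the kernel spans 5 elements and the result is `Some(2)`.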
Quantized convolution
    use ndarray::Array4;
    use rand::{Rng, thread_rng};
    use ocl_convolution::{Convolution, I8Params, FeatureMap, Params};

    const BIT_SHIFT: u8 = 16;
    let params = I8Params {
        common: Params::default(),
        // These params are found by profiling; here, they are
        // chosen randomly.
        bit_shift: BIT_SHIFT,
        scale: I8Params::convert_scale(BIT_SHIFT, 0.1),
        output_bias: -10,
        signal_bias: 20,
        filter_bias: -5,
    };
    let convolution = Convolution::i8(3)?.build(params)?;

    // Generate random signal with 6x6 spatial dims and 3 channels.
    let mut rng = thread_rng();
    let signal = Array4::from_shape_fn([1, 6, 6, 3], |_| rng.gen_range(-127, 127));
    // Construct two 3x3 spatial filters.
    let filters = Array4::from_shape_fn([2, 3, 3, 3], |_| rng.gen_range(-127, 127));
    // Perform the convolution. The output should have 4x4 spatial dims
    // and contain 2 channels (1 per each filter).
    let output = convolution.compute(
        FeatureMap::nhwc(&signal),
        &filters,
    )?;
    assert_eq!(output.shape(), [1, 4, 4, 2]);
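For readers unfamiliar with the `int8/32` format, the sketch below shows one common way a 32-bit accumulator can be mapped back to an 8-bit output using a fixed-point multiplier and a bit shift. It is a generic scheme and not necessarily the formula the crate uses: `to_fixed_point` and `requantize` are hypothetical functions (the former is only a stand-in, not `I8Params::convert_scale`), and the roles of `signal_bias` and `filter_bias` are deliberately left out because this page does not define them.

```rust
/// Converts a floating-point scale into a fixed-point multiplier with
/// `bit_shift` fractional bits. Hypothetical stand-in for illustration.
fn to_fixed_point(bit_shift: u8, scale: f32) -> i32 {
    (scale * (1_i64 << bit_shift) as f32).round() as i32
}

/// Maps a 32-bit accumulator to an 8-bit output: multiply by the fixed-point
/// scale, shift the fractional bits away, add the output bias, then saturate.
/// Assumed generic scheme, not the crate's confirmed behavior.
fn requantize(acc: i32, multiplier: i32, bit_shift: u8, output_bias: i32) -> i8 {
    // Widen to 64 bits so that the product cannot overflow.
    let scaled = (i64::from(acc) * i64::from(multiplier)) >> bit_shift;
    let biased = scaled + i64::from(output_bias);
    biased.clamp(i64::from(i8::MIN), i64::from(i8::MAX)) as i8
}
```

For example, with 16 fractional bits a scale of 0.5 becomes the multiplier 32768, and `requantize(100, 32768, 16, -10)` yields 40. Accumulator values that would fall outside the `i8` range saturate at -128 or 127 instead of wrapping.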
Structs
Convolution | Convolution without pinned memory. |
ConvolutionBuilder | Convolution builder. The same builder can be used to create multiple |
FeatureMap | Feature map, i.e., a signal or output of the convolution operation. |
FeatureMapShape | Shape of a FeatureMap. |
FiltersConvolution | Convolution with pinned filters memory. |
I8Params | Params for the quantized convolution. |
Params | General convolution parameters. |
PinnedConvolution | Convolution with pinned memory for filters, signal and output. |
Enums
Layout | Memory layout of a FeatureMap. |
Traits
ConvElement | Supported element types for convolutions. |
WithParams | Type that can be associated with convolution parameters. |