OpenCL-accelerated 2D convolutions.

Convolution is a fundamental building block in signal processing. This crate focuses on 2D convolutions (i.e., the signal is a still image) in the context of deep learning (more precisely, convolutional neural networks). The deep learning context means that a single convolution may apply many filters (on the order of hundreds), and that the input may contain many channels (on the order of hundreds or thousands) rather than the traditional 3 or 4. Such convolutions are computationally heavy and can be effectively accelerated with OpenCL.

Features

The crate implements convolutions on two numerical formats:

  • Single-precision floats (f32)
  • Signed 8-bit integers with a 32-bit multiply-add accumulator (this format is frequently denoted int8/32 in deep learning literature). Quantization parameters are applied uniformly to the entire layer.

In both cases, dilated and grouped convolutions are supported.
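
As a rough illustration of how strides, padding, dilation and groups affect tensor shapes, the sketch below uses the standard shape arithmetic for convolutions. The helper names `output_size` and `channels_per_group` are hypothetical and are not part of this crate's API; the crate may validate shapes differently.

```rust
/// Output size along one spatial axis for a strided, padded, dilated convolution.
/// Returns `None` if the (dilated) kernel does not fit into the padded input.
/// Hypothetical helper, not part of the crate.
fn output_size(
    input: usize,
    kernel: usize,
    stride: usize,
    pads: (usize, usize),
    dilation: usize,
) -> Option<usize> {
    if kernel == 0 || stride == 0 || dilation == 0 {
        return None;
    }
    // A dilated kernel spans `dilation * (kernel - 1) + 1` input positions.
    let effective_kernel = dilation * (kernel - 1) + 1;
    let padded_input = input + pads.0 + pads.1;
    let free_space = padded_input.checked_sub(effective_kernel)?;
    Some(free_space / stride + 1)
}

/// Number of input channels seen by each filter in a grouped convolution.
/// Returns `None` if the channels cannot be split evenly between groups.
/// Hypothetical helper, not part of the crate.
fn channels_per_group(channels: usize, groups: usize) -> Option<usize> {
    if groups == 0 || channels % groups != 0 {
        return None;
    }
    Some(channels / groups)
}
```

For instance, `output_size(6, 3, 1, (0, 0), 1)` yields `Some(4)`, which matches a 6x6 signal convolved with a 3x3 filter producing a 4x4 output. With a dilation of 2, the same filter spans 5 input positions per axis and the output shrinks to 2x2.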

Implementation details

The implementation uses an output-stationary workflow (see, e.g., this paper for the definition); that is, each element of the output tensor is computed in a single run of the OpenCL kernel. This minimizes memory overhead, but may not be the fastest algorithm.
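
To make the output-stationary idea concrete, here is a minimal CPU sketch in plain Rust. It is not the crate's OpenCL kernel: it assumes a single sample in NHWC layout, unit stride, no padding, no dilation and a single group, and the function name `conv2d_output_stationary` is hypothetical. Each iteration of the three outer loops stands in for one work item that accumulates one output element locally and writes it once. As is usual in deep learning, the filter is not flipped (strictly speaking, this is cross-correlation).

```rust
/// Naive convolution for a single NHWC sample with unit stride, no padding,
/// no dilation and a single group. `signal` has shape [h, w, c] and
/// `filters` has shape [k, size, size, c], both flattened in row-major order.
/// Hypothetical reference function, not part of the crate.
fn conv2d_output_stationary(
    signal: &[f32],
    (h, w, c): (usize, usize, usize),
    filters: &[f32],
    (k, size): (usize, usize),
) -> Vec<f32> {
    assert_eq!(signal.len(), h * w * c);
    assert_eq!(filters.len(), k * size * size * c);
    assert!(size >= 1 && size <= h && size <= w);

    let (out_h, out_w) = (h - size + 1, w - size + 1);
    let mut output = vec![0.0; out_h * out_w * k];
    // Each (y, x, f) triple owns exactly one output element.
    for y in 0..out_h {
        for x in 0..out_w {
            for f in 0..k {
                // The partial sum lives in a local variable; no intermediate
                // buffers are needed besides the output itself.
                let mut acc = 0.0_f32;
                for dy in 0..size {
                    for dx in 0..size {
                        for ch in 0..c {
                            let s = signal[((y + dy) * w + (x + dx)) * c + ch];
                            let wt = filters[((f * size + dy) * size + dx) * c + ch];
                            acc += s * wt;
                        }
                    }
                }
                // The output element is written exactly once.
                output[(y * out_w + x) * k + f] = acc;
            }
        }
    }
    output
}
```

The only allocation is the output tensor itself, which is where the low memory overhead comes from. The trade-off is that neighboring output elements re-read overlapping parts of the signal.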

Examples

Floating-point convolution

use ndarray::Array4;
use rand::{Rng, thread_rng};
use ocl_convolution::{Convolution, FeatureMap, Params};

let convolution = Convolution::f32(3)?.build(Params {
    strides: [1, 1],
    pads: [0; 4],
    dilation: [1, 1],
    groups: 1,
})?;

// Generate random signal with 6x6 spatial dims and 3 channels.
let mut rng = thread_rng();
let signal = Array4::from_shape_fn([1, 6, 6, 3], |_| rng.gen_range(-1.0..=1.0));
// Construct two 3x3 spatial filters.
let filters = Array4::from_shape_fn([2, 3, 3, 3], |_| rng.gen_range(-1.0..=1.0));
// Perform the convolution. The output must have 4x4 spatial dims
// and contain 2 channels (one per filter). The output layout will
// be the same as that of the signal.
let output = convolution.compute(
    // `FeatureMap` wraps `ArrayView4` with information about
    // memory layout (which is "channels-last" / NHWC in this case).
    FeatureMap::nhwc(&signal),
    &filters,
)?;
assert_eq!(output.shape(), [1, 4, 4, 2]);

// For increased efficiency, we may pin filter memory.
// This is especially useful when the same filters are convolved
// with multiple signals.
let convolution = convolution.with_filters(&filters)?;
let new_output = convolution.compute(FeatureMap::nhwc(&signal))?;
assert_eq!(output, new_output);

Quantized convolution

use ndarray::Array4;
use rand::{Rng, thread_rng};
use ocl_convolution::{Convolution, I8Params, FeatureMap, Params};

const BIT_SHIFT: u8 = 16;
let params = I8Params {
    common: Params::default(),
    // These params are normally found by profiling; here, they are
    // chosen arbitrarily.
    bit_shift: BIT_SHIFT,
    scale: I8Params::convert_scale(BIT_SHIFT, 0.1),
    output_bias: -10,
    signal_bias: 20,
    filter_bias: -5,
};
let convolution = Convolution::i8(3)?.build(params)?;

// Generate random signal with 6x6 spatial dims and 3 channels.
let mut rng = thread_rng();
let signal = Array4::from_shape_fn([1, 6, 6, 3], |_| rng.gen_range(-127..=127));
// Construct two 3x3 spatial filters.
let filters = Array4::from_shape_fn([2, 3, 3, 3], |_| rng.gen_range(-127..=127));
// Perform the convolution. The output must have 4x4 spatial dims
// and contain 2 channels (one per filter).
let output = convolution.compute(
    FeatureMap::nhwc(&signal),
    &filters,
)?;
assert_eq!(output.shape(), [1, 4, 4, 2]);
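
For intuition about what an int8/32 computation with a bit shift and a fixed-point scale can look like, here is a minimal sketch of one common requantization scheme in plain Rust. It is an assumption for illustration only: the exact formula, rounding mode and sign conventions for the biases used by this crate are not stated here and may differ. The helpers `to_fixed_point` and `quantized_dot` are hypothetical names, not part of the crate's API.

```rust
/// Converts a floating-point scale into a fixed-point multiplier with
/// `bit_shift` fractional bits. Hypothetical helper; assumed to be similar
/// in spirit to, but not necessarily the same as, `I8Params::convert_scale`.
fn to_fixed_point(bit_shift: u8, scale: f32) -> i32 {
    (scale * (1_i64 << bit_shift) as f32).round() as i32
}

/// Dot product of 8-bit values accumulated in 32 bits, then requantized
/// back to 8 bits. The way the biases are applied is an assumption.
fn quantized_dot(
    signal: &[i8],
    filter: &[i8],
    signal_bias: i32,
    filter_bias: i32,
    output_bias: i32,
    scale: i32,
    bit_shift: u8,
) -> i8 {
    // Multiply-add in 32 bits: each product of two biased 8-bit values
    // fits comfortably, and so does the sum for realistic filter sizes.
    let acc: i32 = signal
        .iter()
        .zip(filter)
        .map(|(&s, &f)| (i32::from(s) + signal_bias) * (i32::from(f) + filter_bias))
        .sum();
    // Rescale with the fixed-point multiplier; widen to 64 bits so that
    // the intermediate product cannot overflow.
    let scaled = (i64::from(acc) * i64::from(scale)) >> bit_shift;
    let biased = scaled + i64::from(output_bias);
    // Saturate to the 8-bit output range.
    biased.clamp(i64::from(i8::MIN), i64::from(i8::MAX)) as i8
}
```

Replacing floating-point multiplication with an integer multiply followed by a right shift keeps the whole computation in integer arithmetic, which is the main point of the int8/32 format.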

Structs

Convolution without pinned memory.

Convolution builder. The same builder can be used to create multiple Convolutions which share the same spatial size.

Feature map, i.e., a signal or output of the convolution operation.

Convolution with pinned filter memory. Pinning memory increases efficiency at the cost of making the convolution less flexible.

Params for the quantized convolution.

General convolution parameters.

Convolution with pinned memory for filters, signal and output. Pinning memory increases efficiency at the cost of making the convolution less flexible.

Enums

Memory layout of a FeatureMap.

Traits

Supported element types for convolutions.