Skip to main content

Module conv

Module conv 

Source
Expand description

Image Convolution Engine

§What is convolution?

Convolution slides a filter kernel over an image, computing a weighted sum
at each position. Each output pixel is the dot product of the kernel with
the underlying image patch:

  Input image patch         Kernel           Output pixel
  ┌────┬────┬────┐      ┌────┬────┬────┐
  │ a  │ b  │ c  │      │ k₀ │ k₁ │ k₂ │     out = a·k₀ + b·k₁ + c·k₂
  ├────┼────┼────┤   ⊙  ├────┼────┼────┤        + d·k₃ + e·k₄ + f·k₅
  │ d  │ e  │ f  │      │ k₃ │ k₄ │ k₅ │        + g·k₆ + h·k₇ + i·k₈
  ├────┼────┼────┤      ├────┼────┼────┤
  │ g  │ h  │ i  │      │ k₆ │ k₇ │ k₈ │     Then clamped to [0, 255]
  └────┴────┴────┘      └────┴────┴────┘

The kernel "slides" horizontally then vertically across the whole image,
producing one output pixel per valid position.

§Three-tier performance

                   ┌──────────────────┐
                   │  try_separable()  │
                   └────────┬─────────┘
                            │
              ┌─────────────┴─────────────┐
              │                           │
         Separable                     Not separable
              │                           │
              ▼                           ▼
  ┌──────────────────────┐    ┌──────────────────────┐
  │  separable_convolve  │    │      convolve        │
  │  (two 1D passes)     │    │    (one 2D pass)      │
  │                      │    │                      │
  │  Input ────┬──────── │    │  Input image          │
  │            │horizontal│    │  7×7 kernel = 49 ops │
  │            ▼   pass   │    │  per output pixel     │
  │        Temp buffer    │    │                      │
  │   (width reduced by   │    └──────────────────────┘
  │    filter-1+2·pad)    │
  │            │          │    Legend:
  │            │vertical  │    7×7 separable = 14 ops
  │            ▼  pass    │    per pixel — 3.5× faster
  │        Output image   │
  └──────────────────────┘

  Both paths are parallelised over output rows via rayon.

§Threading model

Each output row is completely independent — zero data dependencies. Rayon splits the output buffer into per-row slices and processes them in parallel across all available cores.

  Output buffer (hc rows × wc cols × 4 bytes RGBA)
  ┌──────────────────────────────────────┐
  │ Row 0 ──────▶ Thread 0               │
  │ Row 1 ──────▶ Thread 1               │  Output rows are
  │ Row 2 ──────▶ Thread 2               │  processed in
  │ Row 3 ──────▶ Thread 3               │  parallel — each
  │   ...                                │  thread reads from
  │ Row hc-1 ───▶ Thread N               │  the input image
  └──────────────────────────────────────┘  (immutable, safe)

§Output size formula

output_width  = (W - Fw + 2·P) / S + 1
output_height = (H - Fh + 2·P) / S + 1

Functions§

convolution
Applies convolution to an image using the given filter.