What is it?
image-conv applies convolution filters to images — the mathematical operation behind edge detection, blurring, sharpening, and denoising. A small matrix (kernel) slides over every pixel, multiplying overlapping values and summing the results.
Input patch (3×3) Kernel (Sobel-X) Output pixel
┌────┬────┬────┐ ┌────┬────┬────┐
│ a │ b │ c │ │ +1 │ 0 │ −1 │ out = a·1 + b·0 + c·(−1)
├────┼────┼────┤ ⊙ ├────┼────┼────┤ + d·2 + e·0 + f·(−2)
│ d │ e │ f │ │ +2 │ 0 │ −2 │ + g·1 + h·0 + i·(−1)
├────┼────┼────┤ ├────┼────┼────┤
│ g │ h │ i │ │ +1 │ 0 │ −1 │ → clamped to [0, 255]
└────┴────┴────┘ └────┴────┴────┘
Performance
Three optimisation tiers are applied automatically — no user configuration needed.
Benchmark Results (4-core machine)
| Kernel | Size | Original | Current | Speedup |
|---|---|---|---|---|
| Sobel-X | 3×3 | 58.4 ms | 18.5 ms | 3.2× |
| Laplacian | 3×3 | 58.2 ms | 26.6 ms | 2.2× |
| Gaussian | 7×7 | 202.7 ms | 33.3 ms | 6.1× |
| Gaussian | 9×9 | 320.7 ms | 33.9 ms | 9.5× |
| Gaussian | 15×15 | 836.5 ms | 46.4 ms | 18.0× |
Image size: 1000×1000, stride=1, no padding (worst-case). Lower is better.
How the speedups work
| Tier | Technique | What it does | Typical gain |
|---|---|---|---|
| 1 | Separable detection | Decomposes Gaussian/Sobel kernels into two 1D passes: O(N²) → O(2N) ops per pixel | 3–9× |
| 2 | Rayon threading | Parallelises over output rows across all CPU cores | ~2–4× (scales with cores) |
| 3 | Clean inner loops | Branch-free iteration, direct memory access, pre-allocated buffers | ~10–20% |
The tiers compound — a 15×15 Gaussian gets 8.8× from separable + 2× from rayon = ~18× total.
Example outputs
| Original |
|---|
![]() |
| Sobel-X | Sobel-Y |
|---|---|
![]() |
![]() |
| Scharr-X | Scharr-Y |
|---|---|
![]() |
![]() |
| Laplacian | Median (3×3) |
|---|---|
![]() |
![]() |
| Gaussian (7×7) | Denoise (5×5) |
|---|---|
![]() |
![]() |
Quick start
Add to your Cargo.toml:
[]
= "1.0"
= "0.3"
use conv;
use ;
use ;
API reference
Filter
A 2D convolution kernel stored as a flat row-major Vec<f32>.
// Create from flat buffer (row 0, then row 1, ...)
let kernel = vec!;
let filter = from; // width=3, height=3
// Or create zero-initialised
let mut f = new;
f.set_value_at_pos;
// Inspect
println!; // 5
println!; // 5
f.display; // pretty-print as table
PaddingType
Controls border handling:
| Variant | Behaviour | Output size |
|---|---|---|
UNIFORM(n) |
Pad n black pixels on all sides |
(in − filter + 2·n) / stride + 1 |
NONE |
No padding, output shrinks | (in − filter) / stride + 1 |
UNIFORM(1): NONE:
┌···········┐ ┌─────────┐
┆ 0 0 0 0 0 ┆ │ * * * * │ filter 3×3
┆ 0 * * * 0 ┆ │ * * * * │ input 5×5 → output 3×3
┆ 0 * * * 0 ┆ │ * * * * │
┆ 0 * * * 0 ┆ └─────────┘
┆ 0 0 0 0 0 ┆
└···········┘
input 5×5 → output 5×5
conv::convolution()
stride = 0terminates the process with an error.- Stride values that don't evenly divide
(input − filter + 2·pad)produce a warning.
How it works
convolution(img, filter, stride, padding)
│
├─ stride = 0? → ERROR
│
├─ try_separable()
│ ├─ Yes → separable_convolve() 2× 1D pass (O(fw+fh) per pixel)
│ │ ├─ Horizontal: 1D conv per row (rayon-parallel)
│ │ └─ Vertical: 1D conv per column (rayon-parallel)
│ │
│ └─ No → convolve() 1× 2D pass (O(fw·fh) per pixel)
│ └─ Standard 2D sliding window (rayon-parallel)
│
└─ padding
├─ UNIFORM(n) → pad image with black border, then convolve
└─ NONE → convolve directly (zero-copy, faster)
Separable kernel decomposition
Many common kernels factor into an outer product of two 1D vectors:
[ 1 2 1 ] [ 1 ]
[ 2 4 2 ] = [ 2 ] × [ 1 2 1 ]
[ 1 2 1 ] [ 1 ]
3×3 kernel col row
3×1 1×3
Instead of 9 ops per pixel: two 1D passes = 6 ops (1.5× faster)
Scale: 7×7 kernel: 49 ops → 14 ops (3.5× faster)
The library auto-detects separable kernels by finding the largest absolute value, extracting its row and column, and verifying the outer product matches every element within floating-point tolerance.
Threading model
Each output row is completely independent — zero data dependencies between rows. Rayon splits the output buffer into per-row slices and processes them in parallel across all available CPU cores.
Output buffer (H rows × W cols × 4 bytes RGBA)
┌──────────────────────────────────────┐
│ Row 0 ──▶ Thread 0 │
│ Row 1 ──▶ Thread 1 │ All threads read
│ Row 2 ──▶ Thread 2 │ from the same input
│ ... │ image (immutable).
│ Row H−1 ──▶ Thread N │ No locks, no contention.
└──────────────────────────────────────┘
Conversion helpers
// PhotonImage ↔ image::DynamicImage
let dyn_img: DynamicImage = photon_to_dynamic;
let photon_img: PhotonImage = dynamic_to_photon;
Project structure
src/
├── lib.rs # Filter struct, PaddingType enum, separable detection,
│ # DynamicImage ↔ PhotonImage conversion helpers
└── conv.rs # Convolution engine: convolve(), separable_convolve(),
# convolution() — with rayon + documentation
tests/
├── conv_test.rs # Integration tests (all filter types + separable correctness)
└── filter_tests.rs # Unit tests (Filter init, element assignment)
benches/
└── conv_bench.rs # Criterion benchmarks (3×3 through 15×15, separable vs 2D)
Running benchmarks
Results are stored in target/criterion/. Open target/criterion/report/index.html for interactive charts.
Contributing
- Open an issue describing your proposed change
- Fork, develop, and test (
cargo test --release) - Run
cargo fmtbefore submitting - Submit a pull request
License
Apache 2.0 — see LICENSE.








