1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
//! 2D convolution sub-dialect.
//!
//! ROADMAP H3 - Im2col/direct-conv decision by shape and memory
//! budget. Substrate ships `conv2d_3x3_direct`: direct 2D
//! convolution with a fixed 3x3 kernel and unit stride. Im2col
//! variant + the shape-driven decision wrapper land beside this
//! primitive once the direct-conv reference is verified.
//!
//! ## Why direct conv first
//!
//! Direct convolution is the algorithmic ground truth: every
//! element of the output is computed by the canonical sum
//! `out[y, x] = sum_{ky=0..3, kx=0..3} input[y+ky-1, x+kx-1] * kernel[ky, kx]`.
//! Im2col's contribution is to reshape this into a matmul that
//! reuses the existing `matmul` primitive's tile / vectorisation
//! work; the parity gate is "im2col output equals direct-conv
//! output". The direct-conv primitive provides that parity gate.
pub use conv2d_3x3_direct;
pub use im2col_3x3;
/// Decision wrapper: choose direct conv vs im2col + matmul based
/// on image area. Crossover threshold derived from a simple memory
/// vs compute tradeoff: im2col materialises an `H·W·9` patch matrix
/// (vs `H·W` for the input), so it pays an extra `8·H·W·sizeof(f32)`
/// of memory traffic. The matmul tile/vectorisation win recovers
/// that cost once the per-pixel work amortises across enough output
/// pixels - empirically the crossover is around 64x64 (4096
/// pixels). Below that threshold direct conv wins.
///
/// Returns the same Program as `conv2d_3x3_direct(input, kernel,
/// output, h, w)` regardless of the decision; the choice is
/// expressed via the Region's `generator` ident so a downstream
/// pass can route the dispatch differently if the runtime chooses
/// to honour the hint.
///
/// # Errors
///
/// Returns `Err` when `h * w` overflows `u32`.