//! Tensor Unit pipeline engines.
//!
//! This module owns the **pipeline adjacency matrix**: all `CanApplyXxx`
//! marker traits and the `impl CanApplyXxx for PositionYyy {}` edges that
//! gate which source typestate can enter which engine.
//!
//! Each engine submodule owns its full surface: the `PositionXxx` marker and
//! `XxxTensor` type alias it produces, the `verify_*` helper, and the inherent
//! impl on `TuTensor<P: CanApplyXxx, ...>` that carries the entry method.
//!
//! # Pipeline graph
//!
//! Each `XxxTensor` block below lists *every* outgoing edge from that
//! typestate. The set is normative: it must equal the `impl CanApplyYyy for
//! PositionXxx {}` lines below — that is the wire-up. `commit` / `commit_view`
//! are only available from flit-normalized positions (Collect onwards); the
//! pre-Collect stages (Begin / Fetch / Switch) must go through `collect` first.
//!
//! ```text
//! BeginTensor (PositionBegin)
//! └── fetch → FetchTensor
//!
//! FetchTensor (PositionFetch)
//! ├── fetch_mask → FetchMaskTensor
//! ├── fetch_table_lookup → FetchTableLookupTensor
//! ├── fetch_cast → FetchCastTensor
//! ├── switch → SwitchTensor (Fetch Adapter skipped)
//! └── collect → CollectTensor (Fetch Adapter skipped)
//!
//! FetchMaskTensor (PositionFetchMask)
//! ├── fetch_table_lookup → FetchTableLookupTensor
//! ├── fetch_cast → FetchCastTensor
//! ├── switch → SwitchTensor
//! └── collect → CollectTensor
//!
//! FetchTableLookupTensor (PositionFetchTableLookup)
//! ├── fetch_cast → FetchCastTensor
//! ├── switch → SwitchTensor
//! └── collect → CollectTensor
//!
//! FetchCastTensor (PositionFetchCast)
//! ├── switch → SwitchTensor
//! └── collect → CollectTensor
//!
//! SwitchTensor (PositionSwitch)
//! └── collect → CollectTensor
//!
//! CollectTensor (PositionCollect)
//! ├── to_trf → TrfTensor
//! ├── to_vrf → VrfTensor
//! ├── contract_outer(trf) → ContractOuterTensor
//! │     ─contract_packet→ ContractPacketTensor
//! │     ─contract_time→ ContractTimeTensor
//! │     ─contract_lane→ ContractTensor
//! ├── cast → CastTensor
//! ├── transpose → TransposeTensor
//! ├── vector_init → VectorInitTensor (handed to `crate::engine::vector`)
//! ├── commit_trim → CommitTrimTensor
//! ├── commit_cast → CommitCastTensor
//! ├── commit_valid_count_pack → CommitValidCountPackTensor
//! ├── commit → DmTensor
//! └── commit_view → (writes to existing view)
//!
//! ContractTensor (PositionContraction)
//! ├── cast → CastTensor
//! ├── transpose → TransposeTensor
//! ├── vector_init → VectorInitTensor
//! ├── commit_trim → CommitTrimTensor
//! ├── commit_cast → CommitCastTensor
//! ├── commit_valid_count_pack → CommitValidCountPackTensor
//! ├── commit → DmTensor
//! └── commit_view → (writes to existing view)
//!
//! VectorFinalTensor (PositionVectorFinal — produced by `VectorTensor::vector_final`)
//! ├── cast → CastTensor
//! ├── transpose → TransposeTensor
//! ├── to_vrf → VrfTensor
//! ├── commit_trim → CommitTrimTensor
//! ├── commit_cast → CommitCastTensor
//! ├── commit_valid_count_pack → CommitValidCountPackTensor
//! ├── commit → DmTensor
//! └── commit_view → (writes to existing view)
//!
//! CastTensor (PositionCast)
//! ├── transpose → TransposeTensor
//! ├── commit_trim → CommitTrimTensor
//! ├── commit_cast → CommitCastTensor
//! ├── commit_valid_count_pack → CommitValidCountPackTensor
//! ├── commit → DmTensor
//! └── commit_view → (writes to existing view)
//!
//! TransposeTensor (PositionTranspose)
//! ├── commit_trim → CommitTrimTensor
//! ├── commit_cast → CommitCastTensor
//! ├── commit_valid_count_pack → CommitValidCountPackTensor
//! ├── commit → DmTensor
//! └── commit_view → (writes to existing view)
//!
//! CommitTrimTensor (PositionCommitTrim)
//! ├── commit_cast → CommitCastTensor
//! ├── commit_valid_count_pack → CommitValidCountPackTensor
//! └── commit → DmTensor
//!
//! CommitCastTensor (PositionCommitCast)
//! └── commit → DmTensor
//!
//! CommitValidCountPackTensor (PositionCommitValidCountPack)
//! └── commit → DmTensor
//! ```
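//!
//! # Adjacency sketch
//!
//! The gating described above can be illustrated with a minimal, self-contained
//! sketch. Everything in it is an assumption made for illustration: `Tensor` is
//! a hypothetical stand-in for `TuTensor`, the position markers are redefined
//! locally, and `CanApplySwitch` / `CanApplyCollect` are names guessed from the
//! `CanApplyXxx` pattern rather than copied from this module.
//!
//! ```rust
//! use std::marker::PhantomData;
//!
//! // Typestate markers; names mirror the graph above, definitions are stand-ins.
//! pub struct PositionFetch;
//! pub struct PositionSwitch;
//! pub struct PositionCollect;
//!
//! // Marker traits following the `CanApplyXxx` pattern (exact names assumed).
//! pub trait CanApplySwitch {}
//! pub trait CanApplyCollect {}
//!
//! // Edges: Fetch -> Switch, Fetch -> Collect, Switch -> Collect.
//! impl CanApplySwitch for PositionFetch {}
//! impl CanApplyCollect for PositionFetch {}
//! impl CanApplyCollect for PositionSwitch {}
//!
//! // Hypothetical stand-in for `TuTensor`; it carries only the position marker.
//! pub struct Tensor<P> {
//!     _pos: PhantomData<P>,
//! }
//!
//! impl<P> Tensor<P> {
//!     fn at() -> Self {
//!         Tensor { _pos: PhantomData }
//!     }
//! }
//!
//! // The entry method exists only for positions that have the matching edge.
//! impl<P: CanApplySwitch> Tensor<P> {
//!     pub fn switch(self) -> Tensor<PositionSwitch> {
//!         Tensor::at()
//!     }
//! }
//!
//! impl<P: CanApplyCollect> Tensor<P> {
//!     pub fn collect(self) -> Tensor<PositionCollect> {
//!         Tensor::at()
//!     }
//! }
//!
//! // Walks the Fetch -> Switch -> Collect path from the graph.
//! pub fn fetch_to_collect(t: Tensor<PositionFetch>) -> Tensor<PositionCollect> {
//!     t.switch().collect()
//! }
//! ```
//!
//! In this sketch, calling `.switch()` on the result of `fetch_to_collect` would
//! be rejected at compile time, because there is no
//! `impl CanApplySwitch for PositionCollect {}` edge.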
// Re-exports so `use crate::engine::*` (and the prelude) bring engine-facing
// types into scope.
pub use *;
pub use *;
pub use ;
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;
use cratePositionVectorFinal;
use crate;
/// Size of a single flit in bytes.
///
/// Data flows through the switching network in flit-sized units.
/// Both the Collect Engine and the Cast Engine normalize packets to exactly one flit.
pub const FLIT_BYTES: usize = 32;
pub
pub
// ============================================================================
// `CanApplyXxx` marker traits — pipeline adjacency.
//
// `impl CanApplyXxx for PositionYyy {}` reads as "the `Yyy` typestate can enter
// the `Xxx` engine". These are the *only* edges in the pipeline graph; adding
// or removing one here is how the topology changes.
// ============================================================================
/// Source positions that can enter the Fetch Sequencer stage.
/// Source positions that can enter the Fetch Adapter's masking stage.
/// Source positions that can enter the Fetch Adapter's table-lookup stage.
/// Source positions that can enter the Fetch Adapter's type-casting stage
/// (which also folds in zero-point subtraction at the hardware level).
/// Source positions that can enter the Switch Engine.
/// Source positions that can enter the Collect Engine.
/// Source positions that can store to the TRF.
/// Source positions that can store to the VRF.
/// Source positions that can enter the Outer stage (Contraction Engine entry).
/// Source positions that can enter the Vector Engine.
/// Source positions that can enter the Cast Engine.
/// Source positions that can enter the Transpose Engine.
/// Source positions that can enter the Commit Adapter's trimming stage.
/// Source positions that can enter the Commit Adapter's type-casting
/// stage (which folds in an optional ReLU at the hardware level).
/// Source positions that can enter the Commit Adapter's
/// valid-count-packing stage.
/// Source positions that can commit to data memory.
///
/// Only positions with a flit-normalized (32-byte) packet can commit — the
/// pre-Collect stages (`Begin`, `Fetch`, `Switch`) are excluded.
// Commit Adapter pipeline (per HW spec):
// Main: trim → cast(+ReLU) → commit
// Sub: trim → valid_count_pack → commit
//
// `trim` is mandatory and runs first: it is the only adapter stage reachable
// off the source engines. `cast` / `valid_count_pack` chain after `trim`, and
// `commit` / `commit_view` (the sequencer stage) are reachable only from a
// post-trim adapter position, so every commit is trimmed first.