ViT sensor encoder with rectangular patch embedding and MAP pooling.
§Input / output contract
| Tensor | Shape | Description |
|---|---|---|
| Input | (B, T, C) | Batch of normalised sensor sequences |
| Output | (B, D) | L2-normalised per-sample embeddings |
where B = batch size, T = 1440 time steps, C = 34 channels,
D = 768 embedding dimension.
§Patch grid
The (T, C) sensor grid is divided into (T/ph, C/pw) non-overlapping
rectangular patches of size (ph, pw) = (10, 2):
T = 1440 ──► 144 patches along time axis
C = 34 ──► 17 patches along channel axis (ceil(34/2) = 17)
Total = 144 × 17 = 2448 patch tokens
Each patch is linearly projected to D = 768 via a Conv2d layer.
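The patch-count arithmetic above can be checked with a small standalone sketch. The helper names `patches_along` and `patch_grid` are hypothetical and not part of this module; the ceiling division is an assumption taken from the `ceil(34/2)` note and does not affect the result here, since both axes divide evenly.

```rust
/// Number of patches along one axis, rounding up so that a ragged final
/// patch would still be counted (hypothetical helper, not from this module).
fn patches_along(len: usize, patch: usize) -> usize {
    assert!(patch > 0, "patch size must be non-zero");
    (len + patch - 1) / patch
}

/// Returns (time patches, channel patches, total tokens) for a (T, C) grid
/// split into (ph, pw) patches.
fn patch_grid(t: usize, c: usize, ph: usize, pw: usize) -> (usize, usize, usize) {
    let nt = patches_along(t, ph);
    let nc = patches_along(c, pw);
    (nt, nc, nt * nc)
}
```

Calling `patch_grid(1440, 34, 10, 2)` reproduces the 144, 17, and 2448 figures listed above.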
Structs§
- EncoderBlock - Pre-norm ViT transformer block.
- EncoderBlockRecord - The record type for the module.
- EncoderBlockRecordItem - The record item type for the module.
- MAPHead - Pools a patch sequence to a single vector via a learnable probe.
- MAPHeadRecord - The record type for the module.
- MAPHeadRecordItem - The record item type for the module.
- MlpBlock - Feed-forward MLP: Linear(D, mlp_dim) → GELU → Dropout → Linear(mlp_dim, D).
- MlpBlockRecord - The record type for the module.
- MlpBlockRecordItem - The record item type for the module.
- MultiHeadSelfAttention - Scaled dot-product multi-head self-attention with optional chunked computation.
- MultiHeadSelfAttentionRecord - The record type for the module.
- MultiHeadSelfAttentionRecordItem - The record item type for the module.
- PatchEmbedding - Projects rectangular sensor patches into the ViT embedding space.
- PatchEmbeddingRecord - The record type for the module.
- PatchEmbeddingRecordItem - The record item type for the module.
- SensorEncoder - Vision Transformer sensor encoder.
- SensorEncoderRecord - The record type for the module.
- SensorEncoderRecordItem - The record item type for the module.
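To illustrate what pooling a patch sequence with a probe means, here is a minimal single-head sketch in plain Rust. It is not the MAPHead implementation: the function name `map_pool` is hypothetical, and the sketch assumes no query/key/value projections, a single head, and no trailing MLP, any of which the real module may include. In a trained model the probe would be a learned parameter; here it is simply passed in.

```rust
/// Single-head attention pooling with a probe vector (illustrative only).
/// `tokens` holds N rows of dimension D; `probe` has dimension D.
fn map_pool(tokens: &[Vec<f32>], probe: &[f32]) -> Vec<f32> {
    let d = probe.len();
    let scale = (d as f32).sqrt();
    // Scaled dot-product score between the probe and every token.
    let scores: Vec<f32> = tokens
        .iter()
        .map(|t| t.iter().zip(probe).map(|(a, b)| a * b).sum::<f32>() / scale)
        .collect();
    // Numerically stable softmax over the token axis.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let denom: f32 = exps.iter().sum();
    // The softmax-weighted sum of tokens is the pooled vector.
    let mut pooled = vec![0.0f32; d];
    for (t, e) in tokens.iter().zip(&exps) {
        let w = e / denom;
        for (p, x) in pooled.iter_mut().zip(t) {
            *p += w * x;
        }
    }
    pooled
}
```

With an all-zero probe every token gets the same weight, so the result reduces to mean pooling. A non-zero probe shifts weight toward the tokens it aligns with.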
Functions§
- l2_normalize - L2-normalise each row of (B, D) to unit norm.
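Row-wise L2 normalisation can be sketched on plain vectors as follows. This is an illustration under stated assumptions, not the signature or body of `l2_normalize`: the name `l2_normalize_rows`, the use of `Vec<f32>` rows instead of a tensor, and the `eps` guard are all hypothetical.

```rust
/// Scales each row to unit L2 norm, in place. `eps` guards against
/// division by zero for all-zero rows (an assumption of this sketch).
fn l2_normalize_rows(rows: &mut [Vec<f32>], eps: f32) {
    for row in rows.iter_mut() {
        // Euclidean norm of the row, clamped from below by eps.
        let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt().max(eps);
        for x in row.iter_mut() {
            *x /= norm;
        }
    }
}
```

For example, the row `[3.0, 4.0]` has norm 5 and becomes `[0.6, 0.8]`, while an all-zero row is left as zeros.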