Module sensor_encoder

ViT sensor encoder with rectangular patch embedding and MAP pooling.

§Input / output contract

Tensor	Shape	Description
Input	`(B, T, C)`	Batch of normalised sensor sequences
Output	`(B, D)`	L2-normalised per-sample embeddings

where B = batch size, T = 1440 time steps, C = 34 channels, D = 768 embedding dimension.
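The L2-normalisation in the output contract can be illustrated with a minimal std-only sketch. The function name `l2_normalize` and the `eps` guard are assumptions made for this example; they are not taken from the module, which may normalise differently (for instance on tensors, per batch row).

```rust
/// L2-normalise one embedding in place and return its original norm.
///
/// `eps` is a hypothetical guard against division by zero; the module's
/// own handling of zero vectors is not documented here.
pub fn l2_normalize(v: &mut [f32], eps: f32) -> f32 {
    // Euclidean norm: sqrt of the sum of squares.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    // Clamp the divisor so an all-zero vector stays all-zero.
    let denom = norm.max(eps);
    for x in v.iter_mut() {
        *x /= denom;
    }
    norm
}
```

Applied to each of the B rows of a `(B, D)` output, this leaves every non-zero embedding with unit length, so a dot product between two embeddings equals their cosine similarity.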

The (T, C) sensor grid is divided into a (T/ph) × (C/pw) grid of non-overlapping rectangular patches, each of size (ph, pw) = (10, 2):

T = 1440 ──► 144 patches along time axis
C =   34 ──►  17 patches along channel axis  (ceil(34/2) = 17)
Total = 144 × 17 = 2448 patch tokens
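The patch arithmetic above can be checked with a small helper. This is a sketch only: `patch_grid` and `num_patches` are hypothetical names, and rejecting shapes that do not tile exactly is a choice made for this example. The module itself may pad instead.

```rust
/// Patch-grid dimensions for a (t, c) sensor grid cut into (ph, pw) patches.
///
/// Returns `None` when a patch dimension is zero or the patches do not
/// tile the grid exactly (this sketch does not pad).
pub fn patch_grid(t: usize, c: usize, ph: usize, pw: usize) -> Option<(usize, usize)> {
    if ph == 0 || pw == 0 || t % ph != 0 || c % pw != 0 {
        return None;
    }
    Some((t / ph, c / pw))
}

/// Total number of patch tokens produced by the grid.
pub fn num_patches(t: usize, c: usize, ph: usize, pw: usize) -> Option<usize> {
    patch_grid(t, c, ph, pw).map(|(nt, nc)| nt * nc)
}
```

With the values stated in this section, `patch_grid(1440, 34, 10, 2)` yields `Some((144, 17))` and `num_patches(1440, 34, 10, 2)` yields `Some(2448)`.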

Each patch is linearly projected to D = 768 via a Conv2d layer.
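A convolution whose kernel size and stride both equal the patch size is equivalent to flattening each patch and applying one shared linear layer. Assuming the Conv2d here is configured that way (the source does not state the kernel or stride), the projection can be sketched in plain Rust. The names `extract_patches` and `project_patch`, the row-major input layout, and the time-major patch ordering are all assumptions of this example.

```rust
/// Cut a row-major (t, c) grid into non-overlapping (ph, pw) patches.
/// Patches are emitted time-major: every channel patch of the first
/// time window, then every channel patch of the second, and so on.
pub fn extract_patches(x: &[f32], t: usize, c: usize, ph: usize, pw: usize) -> Vec<Vec<f32>> {
    assert_eq!(x.len(), t * c, "input must hold t * c values");
    assert!(
        ph > 0 && pw > 0 && t % ph == 0 && c % pw == 0,
        "patches must tile the grid exactly"
    );
    let mut patches = Vec::with_capacity((t / ph) * (c / pw));
    for pt in 0..t / ph {
        for pc in 0..c / pw {
            let mut patch = Vec::with_capacity(ph * pw);
            for i in 0..ph {
                for j in 0..pw {
                    // Row-major index of element (pt*ph + i, pc*pw + j).
                    patch.push(x[(pt * ph + i) * c + (pc * pw + j)]);
                }
            }
            patches.push(patch);
        }
    }
    patches
}

/// Project one flattened patch: out[d] = bias[d] + sum_k weight[d][k] * patch[k].
/// `weight` has one row per output dimension, each of length ph * pw.
pub fn project_patch(patch: &[f32], weight: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
    weight
        .iter()
        .zip(bias)
        .map(|(row, b)| b + row.iter().zip(patch).map(|(w, p)| w * p).sum::<f32>())
        .collect()
}
```

For the shapes in this section, each patch would hold 10 × 2 = 20 values and `weight` would have 768 rows of length 20, giving one 768-dimensional token per patch.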

EncoderBlock: Pre-norm ViT transformer block.
EncoderBlockRecord: The record type for the module.
EncoderBlockRecordItem: The record item type for the module.
MAPHead: Pools a patch sequence to a single vector via a learnable probe.
MAPHeadRecord: The record type for the module.
MAPHeadRecordItem: The record item type for the module.
MlpBlock: Feed-forward MLP: Linear(D, mlp_dim) → GELU → Dropout → Linear(mlp_dim, D).
MlpBlockRecord: The record type for the module.
MlpBlockRecordItem: The record item type for the module.
MultiHeadSelfAttention: Scaled dot-product multi-head self-attention with optional chunked computation.
MultiHeadSelfAttentionRecord: The record type for the module.
MultiHeadSelfAttentionRecordItem: The record item type for the module.
PatchEmbedding: Projects rectangular sensor patches into the ViT embedding space.
PatchEmbeddingRecord: The record type for the module.
PatchEmbeddingRecordItem: The record item type for the module.
SensorEncoder: Vision Transformer sensor encoder.
SensorEncoderRecord: The record type for the module.
SensorEncoderRecordItem: The record item type for the module.
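To illustrate what pooling "via a learnable probe" means, the following is a heavily simplified, single-head sketch: one probe vector scores every patch token, the scores go through a softmax, and the tokens are averaged with those weights. The function name `probe_pool` is hypothetical, and the sketch omits the query/key/value projections, multiple heads, and any MLP or normalisation the actual `MAPHead` may contain.

```rust
/// Single-head attention pooling over a token sequence.
///
/// Weights are softmax(probe · token / sqrt(d)); the result is the
/// weighted sum of the tokens. Simplified: no learned projections.
pub fn probe_pool(tokens: &[Vec<f32>], probe: &[f32]) -> Vec<f32> {
    assert!(!tokens.is_empty(), "need at least one token");
    let d = probe.len();
    let scale = (d as f32).sqrt();

    // Scaled dot-product score of the probe against every token.
    let scores: Vec<f32> = tokens
        .iter()
        .map(|t| {
            assert_eq!(t.len(), d, "token and probe dimensions must match");
            t.iter().zip(probe).map(|(a, b)| a * b).sum::<f32>() / scale
        })
        .collect();

    // Numerically stable softmax: subtract the maximum before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Weighted sum of tokens gives a single pooled vector of length d.
    let mut pooled = vec![0.0f32; d];
    for (t, e) in tokens.iter().zip(&exps) {
        let w = e / total;
        for (p, x) in pooled.iter_mut().zip(t) {
            *p += w * x;
        }
    }
    pooled
}
```

With an all-zero probe every token receives the same weight, so the result reduces to the mean of the tokens. A trained probe instead learns which patches to emphasise.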