
Module sensor_encoder

ViT sensor encoder with rectangular patch embedding and MAP pooling.

§Input / output contract

Tensor   Shape       Description
Input    (B, T, C)   Batch of normalised sensor sequences
Output   (B, D)      L2-normalised per-sample embeddings

where B = batch size, T = 1440 time steps, C = 34 channels, D = 768 embedding dimension.

§Patch grid

The (T, C) sensor grid is divided into a (T/ph) × (C/pw) grid of non-overlapping rectangular patches, each of size (ph, pw) = (10, 2), where ph spans time steps and pw spans channels:

T = 1440 ──► 144 patches along time axis
C =   34 ──►  17 patches along channel axis  (ceil(34/2) = 17)
Total = 144 × 17 = 2448 patch tokens

Each patch is linearly projected to D = 768 via a Conv2d layer.
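
The patch-count arithmetic above can be reproduced with a small standalone sketch. The names `patch_grid` and `token_index` are hypothetical helpers and not part of this module, and the row-major (time-major) token ordering is an assumption that the documentation does not confirm.

```rust
/// Number of patches along the time axis, along the channel axis, and in total.
/// Hypothetical helper for illustration; not part of `sensor_encoder`.
fn patch_grid(t: usize, c: usize, ph: usize, pw: usize) -> (usize, usize, usize) {
    assert!(ph > 0 && pw > 0, "patch dimensions must be non-zero");
    // Ceiling division, so a trailing partial patch would still be counted.
    // For T = 1440, ph = 10 and C = 34, pw = 2 both divisions are exact.
    let n_t = (t + ph - 1) / ph;
    let n_c = (c + pw - 1) / pw;
    (n_t, n_c, n_t * n_c)
}

/// Flat token index of the patch at (row, col).
/// Assumes row-major ordering with the time axis as rows (an assumption).
fn token_index(row: usize, col: usize, n_c: usize) -> usize {
    row * n_c + col
}
```

With the values documented here, `patch_grid(1440, 34, 10, 2)` yields 144 patches along time, 17 along channels, and 2448 tokens in total.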

Structs§

EncoderBlock
Pre-norm ViT transformer block.
EncoderBlockRecord
The record type for the module.
EncoderBlockRecordItem
The record item type for the module.
MAPHead
Pools a patch sequence to a single vector via a learnable probe.
MAPHeadRecord
The record type for the module.
MAPHeadRecordItem
The record item type for the module.
MlpBlock
Feed-forward MLP: Linear(D, mlp_dim) → GELU → Dropout → Linear(mlp_dim, D).
MlpBlockRecord
The record type for the module.
MlpBlockRecordItem
The record item type for the module.
MultiHeadSelfAttention
Scaled dot-product multi-head self-attention with optional chunked computation.
MultiHeadSelfAttentionRecord
The record type for the module.
MultiHeadSelfAttentionRecordItem
The record item type for the module.
PatchEmbedding
Projects rectangular sensor patches into the ViT embedding space.
PatchEmbeddingRecord
The record type for the module.
PatchEmbeddingRecordItem
The record item type for the module.
SensorEncoder
Vision Transformer sensor encoder.
SensorEncoderRecord
The record type for the module.
SensorEncoderRecordItem
The record item type for the module.
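
To illustrate the idea behind MAPHead (pooling a patch sequence with a learnable probe), here is a deliberately simplified sketch in plain Rust. It assumes a single attention head with no query, key, value, or output projections and no trailing MLP, so it is not the module's implementation. The function name `probe_attention_pool` is hypothetical.

```rust
/// Pool N tokens of width D into one D-vector.
/// Weights are softmax(probe · token / sqrt(D)) over the N tokens.
/// Simplified single-head sketch; the real MAPHead may differ substantially.
fn probe_attention_pool(tokens: &[Vec<f32>], probe: &[f32]) -> Vec<f32> {
    let d = probe.len();
    assert!(d > 0, "probe must be non-empty");
    assert!(!tokens.is_empty(), "need at least one token");
    assert!(tokens.iter().all(|t| t.len() == d), "token width must match probe");

    // Scaled dot-product score between the probe and every token.
    let scale = (d as f32).sqrt();
    let scores: Vec<f32> = tokens
        .iter()
        .map(|t| t.iter().zip(probe).map(|(a, b)| a * b).sum::<f32>() / scale)
        .collect();

    // Numerically stable softmax: subtract the maximum score before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Weighted sum of the tokens.
    let mut pooled = vec![0.0f32; d];
    for (t, e) in tokens.iter().zip(&exps) {
        let w = e / total;
        for (p, x) in pooled.iter_mut().zip(t) {
            *p += w * x;
        }
    }
    pooled
}
```

A zero probe gives every token the same weight, so the result reduces to mean pooling. A trained probe shifts weight towards the tokens it aligns with.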

Functions§

l2_normalize
L2-normalise each row of (B, D) to unit norm.
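
As a rough illustration of row-wise L2 normalisation, the sketch below works on plain `Vec<f32>` rows rather than on the tensor type the module presumably uses. The name `l2_normalize_rows` and the `eps` guard against division by zero are assumptions; whether the real `l2_normalize` uses such a guard is not stated.

```rust
/// Scale each row to unit Euclidean norm.
/// Standalone sketch on nested Vecs; not the module's tensor-based function.
fn l2_normalize_rows(rows: &[Vec<f32>], eps: f32) -> Vec<Vec<f32>> {
    rows.iter()
        .map(|row| {
            // Euclidean norm of the row.
            let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();
            // Clamp the divisor so an all-zero row stays zero instead of becoming NaN.
            let denom = norm.max(eps);
            row.iter().map(|x| x / denom).collect()
        })
        .collect()
}
```

For example, the row `[3.0, 4.0]` has norm 5 and normalises to approximately `[0.6, 0.8]`.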