Patch embedding for images.
Implements the patchification step from RFC-002 (Encoder Module).
Patch embedding is the first stage of a Vision Transformer: it converts a raw image into a sequence of token vectors through a learned projection.
[B, C, H, W] ──reshape──► [B, grid_h·grid_w, C·patch_h·patch_w] ──linear──► [B, S, D]
Here S = grid_h·grid_w is the number of patches and D = embed_dim.
Steps:
- Divide the image into non-overlapping patches of size (patch_h, patch_w).
- Flatten each patch to a vector of length C × patch_h × patch_w.
- Project through a learned linear layer to embed_dim.
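The three steps above can be sketched in plain Rust without any tensor library. This is an illustrative sketch only, not this crate's implementation: the function names `patchify` and `project`, the flat row-major `[C, H, W]` buffer, and the channel-major ordering inside each flattened patch are all assumptions made for the example.

```rust
/// Hypothetical helper: splits one image, stored as a flat row-major
/// `[C, H, W]` buffer, into non-overlapping patches. Each patch is flattened
/// to `C * patch_h * patch_w` values (channel, then row, then column).
/// Returns `grid_h * grid_w` patches in row-major grid order.
fn patchify(
    image: &[f32],
    (c, h, w): (usize, usize, usize),
    (patch_h, patch_w): (usize, usize),
) -> Vec<Vec<f32>> {
    assert_eq!(image.len(), c * h * w, "buffer does not match [C, H, W]");
    assert!(h % patch_h == 0 && w % patch_w == 0, "image must tile evenly");
    let (grid_h, grid_w) = (h / patch_h, w / patch_w);

    let mut patches = Vec::with_capacity(grid_h * grid_w);
    for gy in 0..grid_h {
        for gx in 0..grid_w {
            let mut patch = Vec::with_capacity(c * patch_h * patch_w);
            for ch in 0..c {
                for py in 0..patch_h {
                    // Copy one row of this patch from the source image.
                    let row = gy * patch_h + py;
                    let start = ch * h * w + row * w + gx * patch_w;
                    patch.extend_from_slice(&image[start..start + patch_w]);
                }
            }
            patches.push(patch);
        }
    }
    patches
}

/// Hypothetical helper: applies a linear projection `y = W x + b` to one
/// flattened patch. `weight` holds `embed_dim` rows of length `patch.len()`.
fn project(patch: &[f32], weight: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
    weight
        .iter()
        .zip(bias)
        .map(|(row, b)| row.iter().zip(patch).map(|(w, x)| w * x).sum::<f32>() + b)
        .collect()
}
```

For a 3 × 224 × 224 image with 16 × 16 patches, `patchify` would yield 14 · 14 = 196 patches, each of length 3 · 16 · 16 = 768. Calling `project` on every patch then gives the [S, D] token sequence for one image.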
For video, see crate::video, which uses 3-D tubelet embedding instead.
Structs
- PatchEmbedding - Patch embedding module.
- PatchEmbeddingConfig - Configuration for patch embedding.
- PatchEmbeddingRecord - The record type for the module.
- PatchEmbeddingRecordItem - The record item type for the module.