Module patch

Expand description

Patch embedding for images.

Implements the patchification step from RFC-002 (Encoder Module).

Patch embedding is the first stage of a Vision Transformer: it converts a raw image into a sequence of learnable token vectors.

[B, C, H, W]  ──reshape──►  [B, grid_h·grid_w, C·patch_h·patch_w]  ──linear──►  [B, S, D]

Steps:

For video, see crate::video which uses 3-D tubelet embedding instead.

Structs§