Module patch

Patch embedding for images.

Implements the patchification step from RFC-002 (Encoder Module).

Patch embedding is the first stage of a Vision Transformer: it converts a raw image into a sequence of token vectors by passing each image patch through a learned linear projection.

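[B, C, H, W]  ──reshape──►  [B, grid_h·grid_w, C·patch_h·patch_w]  ──linear──►  [B, S, D]

Here grid_h = H / patch_h, grid_w = W / patch_w, S = grid_h·grid_w is the sequence length, and D = embed_dim.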
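The shape arithmetic in the diagram can be checked with a small standalone helper. This is a sketch in plain Rust, not this module's API: the function name `patch_embed_shape` is hypothetical, and it assumes that H and W must be exact multiples of the patch size.

```rust
/// Computes the two shapes on the right-hand side of the diagram:
/// the flattened-patch shape [B, S, C*patch_h*patch_w] and the
/// embedded shape [B, S, D].
///
/// Hypothetical helper for illustration; not part of this module.
pub fn patch_embed_shape(
    input: [usize; 4],     // [B, C, H, W]
    patch: (usize, usize), // (patch_h, patch_w)
    embed_dim: usize,      // D
) -> Option<([usize; 3], [usize; 3])> {
    let [b, c, h, w] = input;
    let (ph, pw) = patch;

    // Assumption of this sketch: non-overlapping patches must tile the
    // image exactly, so reject sizes that do not divide evenly.
    if ph == 0 || pw == 0 || h % ph != 0 || w % pw != 0 {
        return None;
    }

    let (grid_h, grid_w) = (h / ph, w / pw);
    let seq_len = grid_h * grid_w; // S
    let patch_dim = c * ph * pw;   // length of one flattened patch

    Some(([b, seq_len, patch_dim], [b, seq_len, embed_dim]))
}
```

For a batch of two 224×224 RGB images with 16×16 patches, the grid is 14×14, so S = 196 and each flattened patch has 3·16·16 = 768 elements.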

Steps:

  1. Divide the image into non-overlapping patches of size (patch_h, patch_w).
  2. Flatten each patch to a vector of length C × patch_h × patch_w.
  3. Project through a learned linear layer to embed_dim.
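The three steps above can be sketched on plain slices for a single image. This is an illustration only, not the module's implementation: the function names `patchify` and `project`, the row-major [C, H, W] buffer layout, the channel-major ordering of elements inside each flattened patch, and the presence of a bias term are all assumptions of this sketch.

```rust
/// Steps 1 and 2: split one image stored row-major as [C, H, W] into
/// non-overlapping patches and flatten each to length C * ph * pw.
/// Returns S = (H / ph) * (W / pw) patches in row-major grid order.
pub fn patchify(
    image: &[f32],
    c: usize,
    h: usize,
    w: usize,
    ph: usize,
    pw: usize,
) -> Vec<Vec<f32>> {
    assert_eq!(image.len(), c * h * w, "image buffer must hold C*H*W values");
    assert!(
        ph > 0 && pw > 0 && h % ph == 0 && w % pw == 0,
        "patch size must divide the image size"
    );

    let (grid_h, grid_w) = (h / ph, w / pw);
    let mut patches = Vec::with_capacity(grid_h * grid_w);

    for gy in 0..grid_h {
        for gx in 0..grid_w {
            let mut patch = Vec::with_capacity(c * ph * pw);
            // Assumed element order inside a patch: channel, then row, then column.
            for ch in 0..c {
                for py in 0..ph {
                    let row = gy * ph + py;
                    let start = ch * h * w + row * w + gx * pw;
                    patch.extend_from_slice(&image[start..start + pw]);
                }
            }
            patches.push(patch);
        }
    }
    patches
}

/// Step 3: apply a linear layer to every flattened patch.
/// `weight` has shape [embed_dim][patch_dim]; `bias` has length embed_dim.
pub fn project(patches: &[Vec<f32>], weight: &[Vec<f32>], bias: &[f32]) -> Vec<Vec<f32>> {
    patches
        .iter()
        .map(|x| {
            weight
                .iter()
                .zip(bias)
                .map(|(row, b)| row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>() + b)
                .collect()
        })
        .collect()
}
```

On a 1×4×4 image holding the values 0 through 15 with 2×2 patches, `patchify` yields four patches, the first being `[0, 1, 4, 5]`. A tensor library would typically express the same operation as a reshape and permute followed by a batched matrix multiply rather than explicit loops.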

For video input, see crate::video, which uses 3-D tubelet embedding instead.

Structs

PatchEmbedding
Patch embedding module.
PatchEmbeddingConfig
Configuration for patch embedding.
PatchEmbeddingRecord
The record type for the module.
PatchEmbeddingRecordItem
The record item type for the module.