Module data2vec

data2vec — Baevski et al. 2022, ICML.

Unified self-supervised learning via teacher-student masked prediction:

  1. A teacher network (EMA of the student) encodes the full, unmasked input and produces target representations.
  2. The student encoder receives the masked input and predicts the teacher’s representations at masked positions.
  3. The loss is the smooth-L1 (Huber) loss between L2-normalised student predictions and L2-normalised teacher targets, averaged over masked tokens only.

  θ_teacher ← m · θ_teacher + (1−m) · θ_student        [EMA update]
  target_j  ← target_j / (‖target[:,j]‖₂ + ε)          [per-dimension L2 norm over the batch]
  L         = mean huber(student_pred − target, β)     [masked positions only]
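
The three update rules above can be sketched in plain Rust as follows. This is a minimal illustration, not this module's implementation: the function names (`ema_update`, `normalize_targets`, `smooth_l1`, `masked_loss`), the `Vec<Vec<f32>>` row-major layout, and the exact smooth-L1 form (quadratic below β, linear above, as in the data2vec paper) are all assumptions made for the example.

```rust
/// EMA update: teacher <- m * teacher + (1 - m) * student.
fn ema_update(teacher: &mut [f32], student: &[f32], m: f32) {
    assert_eq!(teacher.len(), student.len());
    for (t, s) in teacher.iter_mut().zip(student) {
        *t = m * *t + (1.0 - m) * *s;
    }
}

/// Divide each feature column j by its L2 norm taken over all rows, plus eps.
/// `targets` is row-major (assumed layout): one Vec per row, equal lengths.
fn normalize_targets(targets: &mut [Vec<f32>], eps: f32) {
    let dim = targets.first().map_or(0, |row| row.len());
    for j in 0..dim {
        let norm = targets.iter().map(|row| row[j] * row[j]).sum::<f32>().sqrt();
        for row in targets.iter_mut() {
            row[j] /= norm + eps;
        }
    }
}

/// Smooth-L1 of one residual (assumed form): 0.5 * x^2 / beta when |x| < beta,
/// otherwise |x| - 0.5 * beta.
fn smooth_l1(x: f32, beta: f32) -> f32 {
    let a = x.abs();
    if a < beta { 0.5 * a * a / beta } else { a - 0.5 * beta }
}

/// Mean smooth-L1 over the elements of masked rows only.
/// Returns 0.0 when no row is masked.
fn masked_loss(pred: &[Vec<f32>], target: &[Vec<f32>], mask: &[bool], beta: f32) -> f32 {
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for ((p, t), &masked) in pred.iter().zip(target).zip(mask) {
        if !masked {
            continue; // unmasked positions contribute nothing
        }
        for (a, b) in p.iter().zip(t) {
            sum += smooth_l1(a - b, beta);
            count += 1;
        }
    }
    if count == 0 { 0.0 } else { sum / count as f32 }
}
```

With m close to 1 the teacher changes slowly, which is what keeps the targets stable while the student trains. The `eps` term only guards against division by zero for an all-zero column.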

Reference: “data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language”, Baevski et al., ICML 2022.

Structs§

Data2VecConfig
Hyper-parameters for the data2vec training objective.
Data2VecResult
Output of a single data2vec loss computation.
Data2VecState
Mutable state that tracks the teacher EMA parameter vector and training step.

Functions§

data2vec_batch_loss
Compute the mean data2vec loss over a batch of samples.
data2vec_loss
Compute the data2vec loss for a single sample.
data2vec_mask
Generate a boolean mask of length n_tokens with exactly floor(n_tokens × mask_ratio) positions set to true (= masked).
huber_loss
Per-element Huber (smooth-L1) loss, averaged over all elements.
normalize_teacher_targets
Normalise teacher representations along the batch dimension in-place.
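
As an illustration of the masking contract described for data2vec_mask (exactly floor(n_tokens × mask_ratio) positions set to true), the sketch below picks the masked positions with a partial Fisher-Yates shuffle. Everything here is hypothetical: `sketch_mask`, its `seed` parameter, and the `xorshift64` helper are stand-ins chosen to keep the example dependency-free, and the real function's signature and random source may differ.

```rust
/// Hypothetical stand-in for a real RNG: one 64-bit xorshift step.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Return a mask of length `n_tokens` with exactly
/// floor(n_tokens * mask_ratio) entries set to true (= masked).
fn sketch_mask(n_tokens: usize, mask_ratio: f64, seed: u64) -> Vec<bool> {
    // Clamp the ratio so the masked count can never exceed n_tokens.
    let n_masked = ((n_tokens as f64) * mask_ratio.clamp(0.0, 1.0)).floor() as usize;
    let mut idx: Vec<usize> = (0..n_tokens).collect();
    let mut state = seed.max(1); // xorshift must not start from 0

    // Partial Fisher-Yates: only the first n_masked slots need shuffling.
    // The modulo introduces a slight bias, which is acceptable for a sketch.
    for i in 0..n_masked {
        let j = i + (xorshift64(&mut state) % (n_tokens - i) as u64) as usize;
        idx.swap(i, j);
    }

    let mut mask = vec![false; n_tokens];
    for &i in &idx[..n_masked] {
        mask[i] = true;
    }
    mask
}
```

Selecting indices without replacement guarantees the exact count, whereas drawing each position independently with probability mask_ratio would only match it on average.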