Transformer models.

Modules

ALBERT (Lan et al., 2020).

BERT (Devlin et al., 2018).

RoBERTa (Liu et al., 2019) and XLM-RoBERTa (Conneau et al., 2019).

Word embeddings with sinusoidal position embeddings (see the sketch after this list).

SqueezeBERT (Iandola et al., 2020) combined with ALBERT (Lan et al., 2020) cross-layer parameter sharing.

SqueezeBERT (Iandola et al., 2020).
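
The sinusoidal module pairs word embeddings with the fixed position encoding of Vaswani et al. (2017), where even dimensions use a sine and odd dimensions a cosine over geometrically increasing wavelengths. Below is a minimal pure-Rust sketch of that encoding; the function name and plain-`Vec` return type are illustrative and not this crate's API:

```rust
/// Fixed sinusoidal position embeddings (Vaswani et al., 2017):
/// PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
/// PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
fn sinusoidal_embeddings(n_positions: usize, d_model: usize) -> Vec<Vec<f32>> {
    (0..n_positions)
        .map(|pos| {
            (0..d_model)
                .map(|dim| {
                    // Each sin/cos pair shares the wavelength of its even index.
                    let i = (dim / 2) as f32;
                    let angle = pos as f32
                        / 10_000f32.powf(2.0 * i / d_model as f32);
                    if dim % 2 == 0 { angle.sin() } else { angle.cos() }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let embeddings = sinusoidal_embeddings(512, 768);
    // Position 0 encodes as [sin(0), cos(0), ...] = [0.0, 1.0, 0.0, 1.0, ...].
    assert_eq!(embeddings[0][0], 0.0);
    assert_eq!(embeddings[0][1], 1.0);
}
```

Because the encoding is deterministic, it adds no trainable parameters and can be computed for positions beyond those seen during training.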

Structs

Hidden layer output and attention (see the combined sketch below).

Enums

Output of a BERT layer.
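
To show how the hidden-layer struct and the layer-output enum plausibly fit together, here is a hedged sketch: the embedding layer yields only hidden states, while encoder layers also carry attention weights. All names (`HiddenLayer`, `LayerOutput`) and the plain-`Vec` field types are assumptions; the crate itself presumably stores tensor types rather than flat vectors.

```rust
/// Hypothetical sketch; the real crate presumably wraps tensor types.

/// Hidden states of one encoder layer together with its attention weights.
pub struct HiddenLayer {
    /// Hidden representations, shape [batch, seq_len, hidden_size] (flattened).
    pub output: Vec<f32>,
    /// Attention weights, shape [batch, heads, seq_len, seq_len] (flattened).
    pub attention: Vec<f32>,
}

/// Output of a single BERT layer: the embedding layer produces only
/// hidden states, whereas encoder layers also produce attention.
pub enum LayerOutput {
    /// Output of the embedding layer.
    Embedding(Vec<f32>),
    /// Output of an encoder layer, including attention.
    EncoderWithAttention(HiddenLayer),
}

impl LayerOutput {
    /// The hidden states, regardless of which layer produced them.
    pub fn output(&self) -> &[f32] {
        match self {
            LayerOutput::Embedding(output) => output,
            LayerOutput::EncoderWithAttention(layer) => &layer.output,
        }
    }
}

fn main() {
    let layer = LayerOutput::EncoderWithAttention(HiddenLayer {
        output: vec![0.1; 4],
        attention: vec![0.25; 4],
    });
    assert_eq!(layer.output().len(), 4);
}
```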

Traits

Encoder networks.
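
A hedged sketch of what an encoder-network trait could look like: a single method mapping token-piece identifiers and an attention mask to one output per layer. The trait name, method signature, and the dummy implementation are assumptions for illustration only, not this crate's actual API.

```rust
/// Placeholder for the per-layer output type (see the sketch above).
pub struct LayerOutput;

/// An encoder network: maps token pieces to one output per layer.
pub trait Encoder {
    /// Encode piece identifiers into one output per layer.
    /// `piece_ids` and `attention_mask` are a single flattened
    /// `[batch * seq_len]` view for simplicity.
    fn encode(&self, piece_ids: &[i64], attention_mask: &[bool]) -> Vec<LayerOutput>;
}

/// Dummy implementation that returns an empty output per layer.
struct DummyEncoder {
    n_layers: usize,
}

impl Encoder for DummyEncoder {
    fn encode(&self, _piece_ids: &[i64], _attention_mask: &[bool]) -> Vec<LayerOutput> {
        (0..self.n_layers).map(|_| LayerOutput).collect()
    }
}

fn main() {
    let encoder = DummyEncoder { n_layers: 12 };
    let outputs = encoder.encode(&[101, 2054, 102], &[true, true, true]);
    assert_eq!(outputs.len(), 12);
}
```

Abstracting the models behind one trait lets downstream code treat BERT, ALBERT, RoBERTa, and SqueezeBERT variants interchangeably.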