Pointer Network.
Reference: Vinyals, O., Fortunato, M. & Jaitly, N. (2015). Pointer Networks. NeurIPS 28 (arXiv 1506.03134). https://arxiv.org/abs/1506.03134.
§Model
A Pointer Network is a sequence-to-sequence model whose attention mechanism
points to positions in the input rather than emitting a token from a fixed
output vocabulary. The output vocabulary therefore has the same size as the
(variable) input length n, which is what combinatorial tasks such as sorting,
convex hull and the travelling-salesman problem require, since their outputs
are indices into the input.
Given encoder hidden states e_1 … e_n and a decoder query d_i, the
content-based attention score for pointing at input position j is

    u^i_j = vᵀ tanh(W1 e_j + W2 d_i)

and the pointer distribution over input positions is p^i = softmax(u^i).
Greedy decoding emits argmax_j p^i_j at each step.
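The scoring and greedy-decoding step can be sketched in plain Rust as follows. This is a minimal illustration, not this module's API: the function names (`matvec`, `softmax`, `pointer_distribution`, `argmax`), the use of `f64`, and the row-major `Vec<Vec<f64>>` layout for W1 and W2 are all assumptions made for the example.

```rust
/// Multiply a row-major matrix (one Vec per row) by a vector.
fn matvec(w: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

/// Numerically stable softmax: subtract the maximum before exponentiating.
fn softmax(u: &[f64]) -> Vec<f64> {
    let max = u.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = u.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Pointer distribution p^i over input positions for one decoder query d_i,
/// using u^i_j = v^T tanh(W1 e_j + W2 d_i). Hypothetical helper for illustration.
fn pointer_distribution(
    w1: &[Vec<f64>],
    w2: &[Vec<f64>],
    v: &[f64],
    enc: &[Vec<f64>],
    d: &[f64],
) -> Vec<f64> {
    // W2 d_i does not depend on j, so compute it once per decoding step.
    let w2d = matvec(w2, d);
    let scores: Vec<f64> = enc
        .iter()
        .map(|e_j| {
            let w1e = matvec(w1, e_j);
            v.iter()
                .zip(w1e.iter().zip(&w2d))
                .map(|(vk, (a, b))| vk * (a + b).tanh())
                .sum::<f64>()
        })
        .collect();
    softmax(&scores)
}

/// Greedy decoding: index of the largest probability (first index on ties).
/// Returns None for an empty distribution instead of panicking.
fn argmax(p: &[f64]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (j, &pj) in p.iter().enumerate() {
        if best.map_or(true, |(_, b)| pj > b) {
            best = Some((j, pj));
        }
    }
    best.map(|(j, _)| j)
}
```

Calling `pointer_distribution` once per decoder query and taking `argmax` of the result yields one emitted input index per step. The sketch assumes that the dimensions of W1, W2, v, the encoder states and the query agree; `zip` silently truncates on a mismatch, so real code would validate shapes first.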
Here the encoder states are provided directly (or produced by a minimal
Elman/tanh RNN encoder, PointerNetwork::encode), and the decoder queries
are likewise provided per step. The module is therefore a faithful CPU
reference for the pointer attention head and its training objective
(teacher-forced NLL, with a gradient verified against finite differences)
that does not commit to any one recurrent cell. Production code never
panics: all fallible paths return SeqError.
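As an illustration of what a finite-difference gradient check for the teacher-forced NLL involves, here is a minimal sketch restricted to the parameter v. It is not this module's implementation: the function names are hypothetical, the hidden features h^i_j = tanh(W1 e_j + W2 d_i) are assumed to be precomputed and passed in as `feats[i][j]`, and, unlike the module described above, the sketch indexes directly and would panic on an out-of-range target.

```rust
/// log(sum_j exp(u_j)), computed stably by factoring out the maximum.
fn log_sum_exp(u: &[f64]) -> f64 {
    let max = u.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    max + u.iter().map(|x| (x - max).exp()).sum::<f64>().ln()
}

/// Scores u^i_j = v . h^i_j for one decoding step (h^i_j precomputed).
fn scores(v: &[f64], feats_i: &[Vec<f64>]) -> Vec<f64> {
    feats_i
        .iter()
        .map(|h| h.iter().zip(v).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

/// Teacher-forced NLL: -sum_i log p^i[t_i], with p^i = softmax(u^i).
fn nll(v: &[f64], feats: &[Vec<Vec<f64>>], targets: &[usize]) -> f64 {
    feats
        .iter()
        .zip(targets)
        .map(|(f_i, &t)| {
            let u = scores(v, f_i);
            log_sum_exp(&u) - u[t]
        })
        .sum::<f64>()
}

/// Analytic gradient with respect to v:
/// dL/dv = sum_i sum_j (p^i_j - [j == t_i]) * h^i_j.
fn grad_v(v: &[f64], feats: &[Vec<Vec<f64>>], targets: &[usize]) -> Vec<f64> {
    let mut g = vec![0.0; v.len()];
    for (f_i, &t) in feats.iter().zip(targets) {
        let u = scores(v, f_i);
        let lse = log_sum_exp(&u);
        for (j, h_j) in f_i.iter().enumerate() {
            let coeff = (u[j] - lse).exp() - if j == t { 1.0 } else { 0.0 };
            for (g_k, h_k) in g.iter_mut().zip(h_j) {
                *g_k += coeff * h_k;
            }
        }
    }
    g
}

/// Central finite-difference estimate of the same gradient.
fn fd_grad_v(v: &[f64], feats: &[Vec<Vec<f64>>], targets: &[usize], eps: f64) -> Vec<f64> {
    (0..v.len())
        .map(|k| {
            let mut plus = v.to_vec();
            let mut minus = v.to_vec();
            plus[k] += eps;
            minus[k] -= eps;
            (nll(&plus, feats, targets) - nll(&minus, feats, targets)) / (2.0 * eps)
        })
        .collect()
}
```

Comparing `grad_v` against `fd_grad_v` component by component, with a small `eps` and a loose tolerance, is the usual way to catch sign or indexing mistakes in a hand-derived gradient. The same pattern extends to W1 and W2 by perturbing one matrix entry at a time.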
Structs§
- PointerGrad - Gradients of the teacher-forced NLL with respect to the attention parameters.
- PointerNetwork - A Pointer Network attention head with optional Elman encoder.