Module pointer_network


Pointer Network.

Reference: Vinyals, O., Fortunato, M. & Jaitly, N. (2015). Pointer Networks. NeurIPS 28 (arXiv 1506.03134). https://arxiv.org/abs/1506.03134.

§Model

A Pointer Network is a sequence-to-sequence model whose attention mechanism points to positions in the input rather than emitting a token from a fixed output vocabulary. The size of the output vocabulary therefore equals the (variable) input length n, which is exactly what combinatorial tasks such as sorting, convex hull computation and the travelling-salesman problem require.

Given encoder hidden states e_1 … e_n and a decoder query d_i, the content-based attention score for pointing at input position j is

u^i_j = vᵀ tanh(W1 e_j + W2 d_i)

and the pointer distribution over input positions is p^i = softmax(u^i). Greedy decoding emits argmax_j p^i_j at each step.
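The score, softmax and greedy step can be sketched in a few lines of plain Rust. This is an illustrative sketch only, not this module's API: the free functions `dot`, `matvec`, `pointer_scores`, `softmax` and `argmax`, and the choice of `Vec<Vec<f64>>` rows for W1 and W2, are assumptions made for the example.

```rust
/// Hypothetical helper (not part of this crate): dot product of two slices.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical helper: matrix-vector product, with `m` stored as a list of rows.
fn matvec(m: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    m.iter().map(|row| dot(row, x)).collect()
}

/// u_j = v^T tanh(W1 e_j + W2 d) for every encoder state e_j and one query d.
fn pointer_scores(
    w1: &[Vec<f64>],
    w2: &[Vec<f64>],
    v: &[f64],
    enc: &[Vec<f64>],
    d: &[f64],
) -> Vec<f64> {
    // W2 d does not depend on j, so compute it once per decoding step.
    let w2d = matvec(w2, d);
    enc.iter()
        .map(|e| {
            let w1e = matvec(w1, e);
            let score: f64 = w1e
                .iter()
                .zip(&w2d)
                .zip(v)
                .map(|((a, b), vk)| vk * (a + b).tanh())
                .sum::<f64>();
            score
        })
        .collect()
}

/// Softmax with max subtraction for numerical stability.
fn softmax(u: &[f64]) -> Vec<f64> {
    let max = u.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = u.iter().map(|x| (x - max).exp()).collect();
    let z: f64 = exps.iter().sum();
    exps.iter().map(|x| x / z).collect()
}

/// Greedy pointer: index of the largest probability, or None for empty input.
fn argmax(p: &[f64]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (j, &pj) in p.iter().enumerate() {
        if best.map_or(true, |(_, b)| pj > b) {
            best = Some((j, pj));
        }
    }
    best.map(|(j, _)| j)
}
```

Calling `pointer_scores` once per decoder query and feeding the result through `softmax` and `argmax` gives one greedy decoding step. Returning `Option` from `argmax` keeps the empty-input case panic-free, in the same spirit as the module's error handling.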

Here the encoder states are supplied directly or produced by a minimal Elman (tanh) RNN encoder, PointerNetwork::encode, and the decoder queries are likewise supplied per step. The module is therefore a faithful CPU reference for the pointer attention head and its training objective (teacher-forced NLL, with the gradient verified against finite differences), without committing to any one recurrent cell. Production code never panics: all fallible paths return SeqError.
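As a rough illustration of the training objective, the sketch below computes the NLL for a single teacher-forced step, its analytic gradient with respect to v only, and a central finite-difference estimate to compare against. It is written under stated assumptions and does not reproduce this module's code: the features h_j = tanh(W1 e_j + W2 d_i) are taken as precomputed, `SketchError` is a hypothetical stand-in for `SeqError` (whose variants are not shown here), and all function names are invented for the example.

```rust
/// Hypothetical stand-in for the crate's `SeqError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    Empty,
    ShapeMismatch,
    TargetOutOfRange,
}

/// Log-softmax with max subtraction for numerical stability.
fn log_softmax(u: &[f64]) -> Vec<f64> {
    let max = u.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let lse = max + u.iter().map(|x| (x - max).exp()).sum::<f64>().ln();
    u.iter().map(|x| x - lse).collect()
}

/// One-step NLL, -log p_target, and its gradient w.r.t. v, where
/// u_j = v . h_j and h_j = tanh(W1 e_j + W2 d_i) is assumed precomputed.
fn nll_and_grad_v(
    v: &[f64],
    h: &[Vec<f64>],
    target: usize,
) -> Result<(f64, Vec<f64>), SketchError> {
    if h.is_empty() {
        return Err(SketchError::Empty);
    }
    if h.iter().any(|hj| hj.len() != v.len()) {
        return Err(SketchError::ShapeMismatch);
    }
    if target >= h.len() {
        return Err(SketchError::TargetOutOfRange);
    }
    let u: Vec<f64> = h
        .iter()
        .map(|hj| hj.iter().zip(v).map(|(a, b)| a * b).sum::<f64>())
        .collect();
    let logp = log_softmax(&u);
    // dNLL/du_j = p_j - 1[j == target]; chain through du_j/dv = h_j.
    let mut grad = vec![0.0; v.len()];
    for (j, hj) in h.iter().enumerate() {
        let delta = logp[j].exp() - if j == target { 1.0 } else { 0.0 };
        for (g, hk) in grad.iter_mut().zip(hj) {
            *g += delta * hk;
        }
    }
    Ok((-logp[target], grad))
}

/// Central finite-difference estimate of the same gradient.
fn fd_grad_v(
    v: &[f64],
    h: &[Vec<f64>],
    target: usize,
    eps: f64,
) -> Result<Vec<f64>, SketchError> {
    let mut out = Vec::with_capacity(v.len());
    for k in 0..v.len() {
        let mut plus = v.to_vec();
        let mut minus = v.to_vec();
        plus[k] += eps;
        minus[k] -= eps;
        let (lp, _) = nll_and_grad_v(&plus, h, target)?;
        let (lm, _) = nll_and_grad_v(&minus, h, target)?;
        out.push((lp - lm) / (2.0 * eps));
    }
    Ok(out)
}
```

Comparing the two gradients component by component, with a small tolerance, is the usual shape of a finite-difference check. Invalid inputs such as an out-of-range target come back as `Err` values rather than panics.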

Structs§

PointerGrad: Gradients of the teacher-forced NLL with respect to the attention parameters.
PointerNetwork: A Pointer Network attention head with optional Elman encoder.