A GumbelTopBucket is a bucket for drawing indices from a discrete
distribution defined by a vector of scores, much like sampling from a softmax
over those scores. The difference is that the GumbelTopBucket adds independent
Gumbel noise to each score and then draws the indices with the largest noisy
scores. This makes sampling cheap, as the softmax does not have to be
recomputed for each draw. The particular
feature of this bucket is that it never draws the same index twice, even
if several scores are equal, which makes it suitable for sampling without
replacement. Note that this comes at a memory cost: a full vector of noisy
scores has to be stored in addition to the original scores.
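
The idea described above can be illustrated with a minimal sketch. It is not the actual GumbelTopBucket implementation: the class name `GumbelTopBucketSketch`, the `draw` method, and the `seed` parameter are hypothetical, and the scores are assumed to be unnormalized log-probabilities (logits). Noise is added once at construction, and each draw returns the next index in descending order of noisy score.

```python
import math
import random


class GumbelTopBucketSketch:
    """Hypothetical illustration only; not the real GumbelTopBucket API."""

    def __init__(self, scores, seed=None):
        rng = random.Random(seed)
        # Original scores, assumed here to be unnormalized log-probabilities.
        self.scores = list(scores)
        # The extra vector of noisy scores: score + Gumbel(0, 1) noise.
        self.noisy = [s + self._gumbel(rng) for s in self.scores]
        # Indices ordered from largest to smallest noisy score.
        self._order = sorted(
            range(len(self.scores)),
            key=self.noisy.__getitem__,
            reverse=True,
        )
        self._next = 0

    @staticmethod
    def _gumbel(rng):
        # Gumbel(0, 1) sample via -log(-log(U)), with U uniform in (0, 1).
        u = rng.random()
        while u == 0.0:  # random() can return 0.0, which would break log()
            u = rng.random()
        return -math.log(-math.log(u))

    def draw(self):
        """Return the next index; no index is ever returned twice."""
        if self._next >= len(self._order):
            raise IndexError("bucket is exhausted")
        idx = self._order[self._next]
        self._next += 1
        return idx


if __name__ == "__main__":
    bucket = GumbelTopBucketSketch([0.0, 0.0, 1.5, -2.0], seed=0)
    # Four draws yield each of the four indices exactly once.
    print([bucket.draw() for _ in range(4)])
```

In this sketch all the randomness is spent up front, so each draw is a simple lookup. The price is the additional `noisy` vector (plus, here, an ordering of the indices) held alongside the original scores.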