Creates unsigned and signed division functions optimized for dividing integers with the same
bitwidth as the largest operand in an asymmetrically sized division. For example, x86-64 has an
assembly instruction that can divide a 128-bit integer by a 64-bit integer, provided the quotient
fits in 64 bits. The 128-bit version of this algorithm would use that fast hardware division to
construct a full 128-bit by 128-bit division.
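
As a rough illustration of how a full-width division can be assembled from an asymmetric one, here is a minimal sketch. The names `div_128_by_64` and `u128_div_rem_asymmetric` are hypothetical, and `div_128_by_64` only emulates the hardware instruction in software so that the example runs anywhere. The estimate-and-correct step for large divisors follows the well-known approach from Hacker's Delight; it is not necessarily what the generated functions do.

```rust
/// Hypothetical stand-in for a hardware 128-by-64-bit division.
/// Like the instruction it models, it is only valid when the quotient
/// fits in 64 bits, i.e. when the high half of `duo` is less than `div`.
fn div_128_by_64(duo: u128, div: u64) -> (u64, u64) {
    debug_assert!((duo >> 64) < (div as u128));
    ((duo / (div as u128)) as u64, (duo % (div as u128)) as u64)
}

/// Full 128-by-128-bit division built on `div_128_by_64`.
/// Returns `None` when `div` is zero.
pub fn u128_div_rem_asymmetric(duo: u128, div: u128) -> Option<(u128, u128)> {
    if div == 0 {
        return None;
    }
    let duo_hi = (duo >> 64) as u64;
    let duo_lo = duo as u64;
    let div_hi = (div >> 64) as u64;
    let div_lo = div as u64;

    if div_hi == 0 {
        if duo_hi < div_lo {
            // The quotient fits in 64 bits: a single asymmetric division.
            let (quo, rem) = div_128_by_64(duo, div_lo);
            return Some((quo as u128, rem as u128));
        }
        // Divide the high half first, then feed its remainder into a
        // second division whose quotient is guaranteed to fit in 64 bits.
        let quo_hi = duo_hi / div_lo;
        let rem_hi = duo_hi % div_lo;
        let (quo_lo, rem) =
            div_128_by_64(((rem_hi as u128) << 64) | (duo_lo as u128), div_lo);
        return Some((((quo_hi as u128) << 64) | (quo_lo as u128), rem as u128));
    }

    // `div` needs more than 64 bits, so the quotient is below 2^64.
    // Estimate it from the 64 most significant bits of `div`.
    let div_lz = div_hi.leading_zeros();
    let div_sig = (div >> (64 - div_lz)) as u64; // top bit is set
    let (est, _) = div_128_by_64(duo >> 1, div_sig);
    let mut quo = est >> (63 - div_lz);
    // The estimate is exact or one too large; step down, then correct upward.
    if quo != 0 {
        quo -= 1;
    }
    let mut rem = duo - (quo as u128) * div;
    if rem >= div {
        quo += 1;
        rem -= div;
    }
    Some((quo as u128, rem))
}
```

For instance, `u128_div_rem_asymmetric(100, 7)` takes the single-division path, while a divisor such as `(1 << 64) + 1` exercises the estimate-and-correct path.
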
Creates unsigned and signed division functions that use binary long division, designed for
computer architectures without division instructions. These functions perform well on
microarchitectures with large branch miss penalties and on architectures that cannot
predicate instructions. For architectures with predicated instructions, one of the algorithms
described in the documentation of these functions is probably faster, and a custom
assembly routine should be used instead.
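
For readers unfamiliar with binary long division, the following is a minimal shift-and-subtract sketch for 32-bit integers. The function names are hypothetical, and this is the plain textbook form; it makes no attempt at the branch-related tuning described above. The signed wrapper assumes truncating division (quotient rounds toward zero, remainder takes the sign of the dividend) and wraps on `i32::MIN / -1` instead of panicking.

```rust
/// Unsigned shift-and-subtract division. Returns `None` for a zero divisor.
pub fn u32_div_rem_binary_long(duo: u32, div: u32) -> Option<(u32, u32)> {
    if div == 0 {
        return None;
    }
    if duo < div {
        return Some((0, duo));
    }
    // Align the most significant bit of `div` with that of `duo`.
    let mut shl = div.leading_zeros() - duo.leading_zeros();
    let mut sub = div << shl;
    let mut rem = duo;
    let mut quo = 0u32;
    loop {
        // Produce one quotient bit per iteration, from high to low.
        if rem >= sub {
            rem -= sub;
            quo |= 1u32 << shl;
        }
        if shl == 0 {
            return Some((quo, rem));
        }
        shl -= 1;
        sub >>= 1;
    }
}

/// Signed division layered on the unsigned routine.
pub fn i32_div_rem_binary_long(duo: i32, div: i32) -> Option<(i32, i32)> {
    let (q, r) = u32_div_rem_binary_long(duo.unsigned_abs(), div.unsigned_abs())?;
    // The quotient is negative when the operand signs differ.
    let quo = if (duo < 0) != (div < 0) {
        (q as i32).wrapping_neg()
    } else {
        q as i32
    };
    // The remainder follows the sign of the dividend.
    let rem = if duo < 0 { (r as i32).wrapping_neg() } else { r as i32 };
    Some((quo, rem))
}
```

The loop runs at most 32 times, once per possible quotient bit, which is why the approach suits targets with no division instruction at all.
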
Creates unsigned and signed division functions that use a combination of hardware division and
binary long division to divide integers larger than hardware division alone can handle. These
functions are intended for microarchitectures that have division hardware, but whose
multiplication hardware is not fast enough for impl_trifecta to be faster.
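
One way to picture the combination is a 64-bit division on a target that only has 32-bit hardware division. The sketch below is a simplified assumption of how the work could be split, not the generated code: the names `u64_binary_long` and `u64_div_rem_delegate` are hypothetical, the `u32` `/` and `%` operators stand in for the hardware divider, and a real implementation would handle more cases in hardware.

```rust
/// Shift-and-subtract fallback for 64-bit operands. Requires `div != 0`.
fn u64_binary_long(duo: u64, div: u64) -> (u64, u64) {
    if duo < div {
        return (0, duo);
    }
    let mut shl = div.leading_zeros() - duo.leading_zeros();
    let mut sub = div << shl;
    let mut rem = duo;
    let mut quo = 0u64;
    loop {
        if rem >= sub {
            rem -= sub;
            quo |= 1u64 << shl;
        }
        if shl == 0 {
            return (quo, rem);
        }
        shl -= 1;
        sub >>= 1;
    }
}

/// 64-bit division that delegates to 32-bit hardware division where it can.
/// Returns `None` when `div` is zero.
pub fn u64_div_rem_delegate(duo: u64, div: u64) -> Option<(u64, u64)> {
    if div == 0 {
        return None;
    }
    let duo_hi = (duo >> 32) as u32;
    let duo_lo = duo as u32;
    let div_hi = (div >> 32) as u32;
    let div_lo = div as u32;

    if div_hi == 0 && duo_hi == 0 {
        // Both operands fit in 32 bits: hardware division does everything.
        return Some(((duo_lo / div_lo) as u64, (duo_lo % div_lo) as u64));
    }
    if div_hi == 0 {
        // Hardware division handles the high half of the dividend.
        let quo_hi = duo_hi / div_lo;
        let rem_hi = duo_hi % div_lo;
        // The leftover dividend has a quotient below 2^32; finish it
        // with binary long division.
        let rest = ((rem_hi as u64) << 32) | (duo_lo as u64);
        let (quo_lo, rem) = u64_binary_long(rest, div);
        return Some((((quo_hi as u64) << 32) | quo_lo, rem));
    }
    // The divisor is wider than the hardware supports: binary long division.
    Some(u64_binary_long(duo, div))
}
```

No multiplication appears anywhere in this sketch, which matches the stated use case of targets with slow multipliers.
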
Creates unsigned and signed division functions optimized for division of integers with bitwidths
larger than the largest hardware integer division supported. These functions use large radix
division algorithms that require both fast division and very fast widening multiplication on the
target microarchitecture. If the target lacks either, impl_delegate should be used instead.
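
To give a feel for what a large radix step looks like, here is a simplified sketch that divides 128-bit integers using 64-bit division. It is an illustration under stated assumptions, not the algorithm the generated functions use: the name `u128_div_rem_large_radix` is hypothetical, and the product `part * div` is written as a plain `u128` multiplication where a tuned implementation would use a widening multiply. Each iteration estimates many quotient bits at once, and the divisor's leading bits are rounded up so the estimate never overshoots.

```rust
/// Division by repeated underestimation. Returns `None` when `div` is zero.
pub fn u128_div_rem_large_radix(duo: u128, div: u128) -> Option<(u128, u128)> {
    if div == 0 {
        return None;
    }
    // Keep up to 32 significant bits of the divisor. When low bits are
    // dropped, round up so that every quotient estimate is too small
    // rather than too large.
    let div_bits = 128 - div.leading_zeros();
    let div_shift = div_bits.saturating_sub(32);
    let div_sig = if div_shift == 0 {
        div as u64
    } else {
        (div >> div_shift) as u64 + 1
    };

    let mut rem = duo;
    let mut quo = 0u128;
    while rem >= div {
        // Keep up to 64 significant bits of the running remainder.
        let rem_bits = 128 - rem.leading_zeros();
        let rem_shift = rem_bits.saturating_sub(64);
        let rem_sig = (rem >> rem_shift) as u64;

        // One 64-bit division yields a multi-bit partial quotient.
        let est = (rem_sig / div_sig) as u128;
        let mut part = if rem_shift >= div_shift {
            est << (rem_shift - div_shift)
        } else {
            est >> (div_shift - rem_shift)
        };
        if part == 0 {
            // `rem >= div` holds here, so at least one subtraction is safe.
            part = 1;
        }
        // `part` never exceeds the true quotient, so this cannot underflow.
        rem -= part * div;
        quo += part;
    }
    Some((quo, rem))
}
```

Because every pass costs a division plus a multiplication, the approach only pays off when both operations are fast on the target, which is the condition stated above.
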