Fast division by powers of 10, in 64-bit and 128-bit.
Division is slow. However it can be optimized if the divisor is divided repeatly.
One of the use cases is divided by powers of 10. For example in a decimal crate.
This crate provides easy APIs for division by powers of 10. You do not need to touch the magic numbers.
Algorithm
We provides 2 kinds of division.
The first is single-word. The divident is as wide as the divisor. The are both 64-bit or 128-bit.
We use the algorithm in Granlund & Montgomery's paper: Division by Invariant Integers using Multiplication. Here is a good tutorial for this algorithm.
The second is double-word. The divident is twice wide as the divisor, and the quotient is the same wide as the divisor. For example, the divident is 256-bit while the divisor and the quotient are both 128-bit. Or the divident is 128-bit while the divisor and the quotient are both 64-bit.
We use the algorithm in Moller and Granlund's paper: Improved division by invariant integers.
Performance
The performance improvement brought by this crate is highly dependent on the CPU model. For example, on some Intel CPUs where the division instruction is not well optimized, this crate may deliver up to a 10X performance boost. However, on Apple M-series chips, the improvement tends to be minimal. Therefore, it is strongly recommended that you run benchmarks on your own machine to determine whether to adopt this crate.
To run the benchmark: