Crate morton_encoding[−][src]
morton_encoding
is a crate that helps interleave the bits of numbers,
(called “coordinates” in the source code)
thereby creating a socalled “Morton encoding” (also known as a
“Zorder encoding”) in whose final result (“Key”)
all bits of the same significance are together.
This is helpful for linearising the points
of data structures whilst preserving
some measure of locality. Zorder encoding isn’t as good at preserving
locality as Hilbert encoding, but it is much easier to compute;
nonetheless, for Hilbert encoding, the user may check out the lindel
crate which will be hopefully released soon.
Usage
The user is cordially advised to look for solutions in the following order:
 By far the most ergonomic, performant, and panicfree usage
is to use the
morton_encode
andmorton_decode
functions with arrays of primitive unsigned integers, as follows:
let input = [5u8, 4, 12, 129]; let output = morton_encode(input); assert_eq!(output, 268447241); let reassembled_input = morton_decode(output); assert_eq!(input, reassembled_input);

If 128 bits are not enough, the user should use the
lindel
crate (which will be released hopefully shortly) which will contain thecreate_lineariseable_data_type
macro. This will unfortunately require manual implementation of any trait that the macro doesn’t implement automatically, such as subtraction. 
If the coordinates come at the end of a long iterator chain of length
<=8
, the user should use the_generic
functions, taking care to not compare Keys resulting from unequallength iterators. 
Finally, if performance is of no concern at all, the user can make use of the
arbitrary_bit_size
module or of the_generic
functions for arbitrary iterator length.
Contents
The functions contained herein are broadly split into four categories:
 “Bloating functions”, that interleave a number’s bits with zeroes;
 “Shrinking functions”, that perform the opposite operation;
 “Morton encoding functions”, that interleave some numbers’ bits with each other;
 “Morton decoding functions”, that perform the opposite operation.
For more detailed information, please refer to each individual function’s documentation.
std
vs no_std
std
is entirely optional for this crate, and is only useful if
the user needs to use BigUint
s. Currently, the crate
(or its documentation, in any event) has currently been compiled with std
on, meaning that it can interleave arbitrary amounts of arbitrarily large integers, as long as the user only runs it within an operating system.
Similar crates
A quick sift through crates.rs comes up with the following 3 similar crates:
morton
: Just as fast as ours, but only works for[u16; 2] <> u32
.bitwise
: On my machine it doesn’t even compile.
 It hasn’t been implemented for more than 3 dimensions.
 Its portable implementation, when copypasted, is 3.5x slower than ours.
 Its nonportable implementation uses the
core::arch::x86_64::_pdep_u64
primitive, so we wrote some extra implementations that used it and benchmarked them. To our surprise, it was consistently slightly slower than our own code!
zdex
: Despite not built with performance in mind, its speed is within a few orders of magnitude from ours, its results are almost correct, and the API ought to be usable with enough dedication and experimentation. It does, however, needstd
whereas ours doesn’t.
Operation
This code works by splitting each coordinate’s bits in half, and pushing
them apart; then each half is split into quarters, and pushed apart again;
and so on until each bit is n1
bits apart from its neighbours. From
there, the complete key is computed by a mere shiftandbitwiseor.
A simplified version of this algorithm (and, in fact, the inspiration
for this crate) can be found here.
Performance
This crate’s performace depends crucially on three things:
 The CPU’s ability to shift numbers by more than one bit at a time (which some microcontrollers can’t do)
 The compiler’s ability to utilise compiletime information in order to fold all the constants
 The result of the operations fitting within a single register.
Assuming, then, that the compiler can optimise to the best of its ability,
the code is reasonably fast, at least for an operation whose hardware
implementation only needs wires. For the transformation between
[u{N}; M] <> u{N*M}
, the time needed is O(M*log(N))
.
The program generally needs about 20 machine instructions per coordinate,
though this can certainly vary.
A full list of the machine instructions needed for each transformation can be found below; each column lists the operation first, then the total amount of machine instructions, then the amount of machine instructions per coordinate rounded up.
u16 > [u8; 2] : 23 12
[u8; 2] > u16 : 27 14
u32 > [u16; 2]: 28 14
u64 > [u32; 2]: 32 16
[u16; 2] > u32 : 34 17
[u8; 3] > u32 : 41 14
u32 > [u8; 3]: 48 16
[u32; 2] > u64 : 48 24
[u8; 4] > u32 : 55 14
u32 > [u8; 4]: 55 14
[u16; 3] > u64 : 56 19
u64 > [u16; 3]: 62 21
[u8; 5] > u64 : 64 13
u64 > [u16; 4]: 68 17
[u16; 4] > u64 : 71 18
u64 > [u8; 5]: 72 15
[u8; 6] > u64 : 79 14
u64 > [u8; 6]: 87 15
[u64; 2] > u128 : 93 47
[u8; 7] > u64 : 94 14
u64 > [u8; 7]: 102 15
[u8; 8] > u64 : 102 13
u128 > [u64; 2]: 110 55
u128 > [u16; 5]: 131 27
u64 > [u8; 8]: 132 17
[u32; 3] > u128 : 142 48
u128 > [u32; 3]: 142 48
[u8; 9] > u128 : 149 17
[u32; 4] > u128 : 152 38
u128 > [u32; 4]: 153 39
[u16; 5] > u128 : 159 32
u128 > [u16; 6]: 185 31
[u8; 10] > u128 : 190 19
u128 > [u8; 9]: 194 22
[u16; 6] > u128 : 213 36
u128 > [u8; 10]: 216 22
u128 > [u16; 7]: 217 31
[u16; 7] > u128 : 230 33
[u8; 11] > u128 : 244 23
u128 > [u8; 11]: 247 23
[u16; 8] > u128 : 253 32
u128 > [u16; 8]: 255 32
u128 > [u8; 12]: 266 23
[u8; 12] > u128 : 268 23
[u8; 13] > u128 : 285 22
u128 > [u8; 13]: 287 23
u128 > [u8; 14]: 308 22
[u8; 14] > u128 : 318 23
u128 > [u8; 15]: 332 23
[u8; 15] > u128 : 361 25
u128 > [u8; 16]: 382 24
[u8; 16] > u128 : 419 27
As we can see, anything that doesn’t involve u128
s generally takes fewer
than 20 machine code instructions per coordinate; the actual spread
ranges between 12 and 24 instructions. For calculations that include
u128
s and above, the instructions needed increase linearly with the
bit length.
Modules
arbitrary_bit_size  The most general, but least performant, ways to perform Morton encoding and decoding. The functions contained herein work correctly, but the author makes absolutely no guarantees about how performant they are. 
Traits
IdealKey  Given a data type and a number, it yields the smallest unsigned
integer that’s at least 
ValidKey  Guarantees that a given data structure is suitable for a Morton Key. 
Functions
bloat_custom  “Bloats” a given number by interleaving its bits with zeroes. 
bloat_custom_checked  Fallibly “bloats” a given number, interleaving its bits with zeroes. 
morton_decode  The most ergonomic way to perform the Morton decoding operation. 
morton_decode_array  Receives a 
morton_decode_generic  Receives a 
morton_encode  The most ergonomic way to perform the Morton encoding operation. 
morton_encode_array  Encodes an array of 
morton_encode_generic  Receives an iterator of 
morton_encode_generic_checked  Receives an iterator of 
nz  A convenience function. 
shrink_custom  “Shrinks” a given number, only keeping a userprovided amount of its bits. 
shrink_custom_checked  Fallibly “shrinks” a given number, only keeping a certain amount of its bits. 