Expand description
morton_encoding
is a crate that helps interleave the bits of numbers,
(called “co-ordinates” in the source code)
thereby creating a so-called “Morton encoding” (also known as a
“Z-order encoding”) in whose final result (“Key”)
all bits of the same significance are together.
This is helpful for linearising the points
of data structures whilst preserving
some measure of locality. Z-order encoding isn’t as good at preserving
locality as Hilbert encoding, but it is much easier to compute.
(For Hilbert encoding please refer to the
(lindel
crate instead.)
§Usage
The user is cordially advised to look for solutions in the following order:
- By far the most ergonomic, performant, and panic-free usage
is to use the
morton_encode
andmorton_decode
functions with arrays of primitive unsigned integers, as follows:
let input = [5u8, 4, 12, 129];
let output = morton_encode(input);
assert_eq!(output, 268447241);
let reassembled_input = morton_decode(output);
assert_eq!(input, reassembled_input);
-
If 128 bits are not enough, the user should use
lindel
, which contains thecreate_lineariseable_data_type
macro. This will unfortunately require manual implementation of any trait that the macro doesn’t implement automatically, such as subtraction. -
If the co-ordinates come at the end of a long iterator chain of length
<=8
, the user should use the_generic
functions, taking care to not compare Keys resulting from unequal-length iterators. -
Finally, if performance is of no concern at all, the user can make use of the
arbitrary_bit_size
module or of the_generic
functions for arbitrary iterator length.
§Contents
The functions contained herein are broadly split into four categories:
- “Bloating functions”, that interleave a number’s bits with zeroes;
- “Shrinking functions”, that perform the opposite operation;
- “Morton encoding functions”, that interleave some numbers’ bits with each other;
- “Morton decoding functions”, that perform the opposite operation.
For more detailed information, please refer to each individual function’s documentation.
§std
vs no_std
std
is entirely optional for this crate, and is only useful if
the user needs to use BigUint
s. Currently, the crate
(or its documentation, in any event) has currently been compiled with std
on, meaning that it can interleave arbitrary amounts of arbitrarily large integers, as long as the user only runs it within an operating system.
§Similar crates
A quick sift through crates.rs comes up with the following 3 similar crates:
morton
: Just as fast as ours, but only works for[u16; 2] <-> u32
.bitwise
:- On my machine it doesn’t even compile.
- It hasn’t been implemented for more than 3 dimensions.
- Its portable implementation, when copy-pasted, is 3.5x slower than ours.
- Its non-portable implementation uses the
core::arch::x86_64::_pdep_u64
primitive, so we wrote some extra implementations that used it and benchmarked them. To our surprise, it was consistently slightly slower than our own code!
zdex
: Despite not built with performance in mind, its speed is within a few orders of magnitude from ours, its results are almost correct, and the API ought to be usable with enough dedication and experimentation. It does, however, needstd
whereas ours doesn’t.
§Operation
This code works by splitting each coordinate’s bits in half, and pushing
them apart; then each half is split into quarters, and pushed apart again;
and so on until each bit is n-1
bits apart from its neighbours. From
there, the complete key is computed by a mere shift-and-bitwise-or.
A simplified version of this algorithm (and, in fact, the inspiration
for this crate) can be found here.
§Performance
This crate’s performace depends crucially on three things:
- The CPU’s ability to shift numbers by more than one bit at a time (which some microcontrollers can’t do)
- The compiler’s ability to utilise compile-time information in order to fold all the constants
- The result of the operations fitting within a single register.
Assuming, then, that the compiler can optimise to the best of its ability,
the code is reasonably fast, at least for an operation whose hardware
implementation only needs wires. For the transformation between
[u{N}; M] <-> u{N*M}
, the time needed is O(M*log(N))
.
The program generally needs about 20 machine instructions per coordinate,
though this can certainly vary.
A full list of the machine instructions needed for each transformation can be found below; each column lists the operation first, then the total amount of machine instructions, then the amount of machine instructions per co-ordinate rounded up.
u16 -> [u8; 2] : 23 12
[u8; 2] -> u16 : 27 14
u32 -> [u16; 2]: 28 14
u64 -> [u32; 2]: 32 16
[u16; 2] -> u32 : 34 17
[u8; 3] -> u32 : 41 14
u32 -> [u8; 3]: 48 16
[u32; 2] -> u64 : 48 24
[u8; 4] -> u32 : 55 14
u32 -> [u8; 4]: 55 14
[u16; 3] -> u64 : 56 19
u64 -> [u16; 3]: 62 21
[u8; 5] -> u64 : 64 13
u64 -> [u16; 4]: 68 17
[u16; 4] -> u64 : 71 18
u64 -> [u8; 5]: 72 15
[u8; 6] -> u64 : 79 14
u64 -> [u8; 6]: 87 15
[u64; 2] -> u128 : 93 47
[u8; 7] -> u64 : 94 14
u64 -> [u8; 7]: 102 15
[u8; 8] -> u64 : 102 13
u128 -> [u64; 2]: 110 55
u128 -> [u16; 5]: 131 27
u64 -> [u8; 8]: 132 17
[u32; 3] -> u128 : 142 48
u128 -> [u32; 3]: 142 48
[u8; 9] -> u128 : 149 17
[u32; 4] -> u128 : 152 38
u128 -> [u32; 4]: 153 39
[u16; 5] -> u128 : 159 32
u128 -> [u16; 6]: 185 31
[u8; 10] -> u128 : 190 19
u128 -> [u8; 9]: 194 22
[u16; 6] -> u128 : 213 36
u128 -> [u8; 10]: 216 22
u128 -> [u16; 7]: 217 31
[u16; 7] -> u128 : 230 33
[u8; 11] -> u128 : 244 23
u128 -> [u8; 11]: 247 23
[u16; 8] -> u128 : 253 32
u128 -> [u16; 8]: 255 32
u128 -> [u8; 12]: 266 23
[u8; 12] -> u128 : 268 23
[u8; 13] -> u128 : 285 22
u128 -> [u8; 13]: 287 23
u128 -> [u8; 14]: 308 22
[u8; 14] -> u128 : 318 23
u128 -> [u8; 15]: 332 23
[u8; 15] -> u128 : 361 25
u128 -> [u8; 16]: 382 24
[u8; 16] -> u128 : 419 27
As we can see, anything that doesn’t involve u128
s generally takes fewer
than 20 machine code instructions per co-ordinate; the actual spread
ranges between 12 and 24 instructions. For calculations that include
u128
s and above, the instructions needed increase linearly with the
bit length.
Modules§
- arbitrary_
bit_ size - The most general, but least performant, ways to perform Morton encoding and decoding. The functions contained herein work correctly, but the author makes absolutely no guarantees about how performant they are.
Traits§
- Ideal
Key - Given a data type and a number, it yields the smallest unsigned
integer that’s at least
N
times larger. - Valid
Key - Guarantees that a given data structure is suitable for a Morton Key.
Functions§
- bloat_
custom - “Bloats” a given number by interleaving its bits with zeroes.
- bloat_
custom_ checked - Fallibly “bloats” a given number, interleaving its bits with zeroes.
- morton_
decode - The most ergonomic way to perform the Morton decoding operation.
- morton_
decode_ array - Receives a
Key
value and unscrambles it into an array. - morton_
decode_ generic - Receives a
Key
value and returns an iterator ofCoor
values that were decoded from it. - morton_
encode - The most ergonomic way to perform the Morton encoding operation.
- morton_
encode_ array - Encodes an array of
N
Coordinates
into aKey
. - morton_
encode_ generic - Receives an iterator of
Coor
values and encodes them all in aKey
value. - morton_
encode_ generic_ checked - Receives an iterator of
Coor
values and encodes them all in aOption<Key>
value. - nz
- A convenience function.
- shrink_
custom - “Shrinks” a given number, only keeping a user-provided amount of its bits.
- shrink_
custom_ checked - Fallibly “shrinks” a given number, only keeping a certain amount of its bits.