Crate morton_encoding[][src]

morton_encoding is a crate that helps interleave the bits of numbers, (called “co-ordinates” in the source code) thereby creating a so-called “Morton encoding” (also known as a “Z-order encoding”) in whose final result (“Key”) all bits of the same significance are together. This is helpful for linearising the points of data structures whilst preserving some measure of locality. Z-order encoding isn’t as good at preserving locality as Hilbert encoding, but it is much easier to compute. (For Hilbert encoding please refer to the (lindel crate instead.)

Usage

The user is cordially advised to look for solutions in the following order:

  1. By far the most ergonomic, performant, and panic-free usage is to use the morton_encode and morton_decode functions with arrays of primitive unsigned integers, as follows:
let input = [5u8, 4, 12, 129];
let output = morton_encode(input);
assert_eq!(output, 268447241);
let reassembled_input = morton_decode(output);
assert_eq!(input, reassembled_input);
  1. If 128 bits are not enough, the user should use lindel, which contains the create_lineariseable_data_type macro. This will unfortunately require manual implementation of any trait that the macro doesn’t implement automatically, such as subtraction.

  2. If the co-ordinates come at the end of a long iterator chain of length <=8, the user should use the _generic functions, taking care to not compare Keys resulting from unequal-length iterators.

  3. Finally, if performance is of no concern at all, the user can make use of the arbitrary_bit_size module or of the _generic functions for arbitrary iterator length.

Contents

The functions contained herein are broadly split into four categories:

  • “Bloating functions”, that interleave a number’s bits with zeroes;
  • “Shrinking functions”, that perform the opposite operation;
  • “Morton encoding functions”, that interleave some numbers’ bits with each other;
  • “Morton decoding functions”, that perform the opposite operation.

For more detailed information, please refer to each individual function’s documentation.

std vs no_std

std is entirely optional for this crate, and is only useful if the user needs to use BigUints. Currently, the crate (or its documentation, in any event) has currently been compiled with std on, meaning that it can interleave arbitrary amounts of arbitrarily large integers, as long as the user only runs it within an operating system.

Similar crates

A quick sift through crates.rs comes up with the following 3 similar crates:

  • morton: Just as fast as ours, but only works for [u16; 2] <-> u32.
  • bitwise:
    1. On my machine it doesn’t even compile.
    2. It hasn’t been implemented for more than 3 dimensions.
    3. Its portable implementation, when copy-pasted, is 3.5x slower than ours.
    4. Its non-portable implementation uses the core::arch::x86_64::_pdep_u64 primitive, so we wrote some extra implementations that used it and benchmarked them. To our surprise, it was consistently slightly slower than our own code!
  • zdex: Despite not built with performance in mind, its speed is within a few orders of magnitude from ours, its results are almost correct, and the API ought to be usable with enough dedication and experimentation. It does, however, need std whereas ours doesn’t.

Operation

This code works by splitting each coordinate’s bits in half, and pushing them apart; then each half is split into quarters, and pushed apart again; and so on until each bit is n-1 bits apart from its neighbours. From there, the complete key is computed by a mere shift-and-bitwise-or. A simplified version of this algorithm (and, in fact, the inspiration for this crate) can be found here.

Performance

This crate’s performace depends crucially on three things:

  1. The CPU’s ability to shift numbers by more than one bit at a time (which some microcontrollers can’t do)
  2. The compiler’s ability to utilise compile-time information in order to fold all the constants
  3. The result of the operations fitting within a single register.

Assuming, then, that the compiler can optimise to the best of its ability, the code is reasonably fast, at least for an operation whose hardware implementation only needs wires. For the transformation between [u{N}; M] <-> u{N*M}, the time needed is O(M*log(N)). The program generally needs about 20 machine instructions per coordinate, though this can certainly vary.

A full list of the machine instructions needed for each transformation can be found below; each column lists the operation first, then the total amount of machine instructions, then the amount of machine instructions per co-ordinate rounded up.
   u16   -> [u8; 2] :  23     12
[u8; 2]  ->   u16   :  27     14
   u32   -> [u16; 2]:  28     14
   u64   -> [u32; 2]:  32     16
[u16; 2] ->   u32   :  34     17
[u8; 3]  ->   u32   :  41     14
   u32   -> [u8;  3]:  48     16
[u32; 2] ->   u64   :  48     24
[u8; 4]  ->   u32   :  55     14
   u32   -> [u8;  4]:  55     14
[u16; 3] ->   u64   :  56     19
   u64   -> [u16; 3]:  62     21
[u8; 5]  ->   u64   :  64     13
   u64   -> [u16; 4]:  68     17
[u16; 4] ->   u64   :  71     18
   u64   -> [u8;  5]:  72     15
[u8; 6]  ->   u64   :  79     14
   u64   -> [u8;  6]:  87     15
[u64; 2] ->   u128  :  93     47
[u8; 7]  ->   u64   :  94     14
   u64   -> [u8;  7]: 102     15
[u8; 8]  ->   u64   : 102     13
  u128   -> [u64; 2]: 110     55
  u128   -> [u16; 5]: 131     27
   u64   -> [u8;  8]: 132     17
[u32; 3] ->   u128  : 142     48
  u128   -> [u32; 3]: 142     48
[u8; 9]  ->   u128  : 149     17
[u32; 4] ->   u128  : 152     38
  u128   -> [u32; 4]: 153     39
[u16; 5] ->   u128  : 159     32
  u128   -> [u16; 6]: 185     31
[u8; 10] ->   u128  : 190     19
  u128   -> [u8; 9]:  194     22
[u16; 6] ->   u128  : 213     36
  u128   -> [u8; 10]: 216     22
  u128   -> [u16; 7]: 217     31
[u16; 7] ->   u128  : 230     33
[u8; 11] ->   u128  : 244     23
  u128   -> [u8; 11]: 247     23
[u16; 8] ->   u128  : 253     32
  u128   -> [u16; 8]: 255     32
  u128   -> [u8; 12]: 266     23
[u8; 12] ->   u128  : 268     23
[u8; 13] ->   u128  : 285     22
  u128   -> [u8; 13]: 287     23
  u128   -> [u8; 14]: 308     22
[u8; 14] ->   u128  : 318     23
  u128   -> [u8; 15]: 332     23
[u8; 15] ->   u128  : 361     25
  u128   -> [u8; 16]: 382     24
[u8; 16] ->   u128  : 419     27

As we can see, anything that doesn’t involve u128s generally takes fewer than 20 machine code instructions per co-ordinate; the actual spread ranges between 12 and 24 instructions. For calculations that include u128s and above, the instructions needed increase linearly with the bit length.

Modules

arbitrary_bit_size

The most general, but least performant, ways to perform Morton encoding and decoding. The functions contained herein work correctly, but the author makes absolutely no guarantees about how performant they are.

Traits

IdealKey

Given a data type and a number, it yields the smallest unsigned integer that’s at least N times larger.

ValidKey

Guarantees that a given data structure is suitable for a Morton Key.

Functions

bloat_custom

“Bloats” a given number by interleaving its bits with zeroes.

bloat_custom_checked

Fallibly “bloats” a given number, interleaving its bits with zeroes.

morton_decode

The most ergonomic way to perform the Morton decoding operation.

morton_decode_array

Receives a Key value and unscrambles it into an array.

morton_decode_generic

Receives a Key value and returns an iterator of Coor values that were decoded from it.

morton_encode

The most ergonomic way to perform the Morton encoding operation.

morton_encode_array

Encodes an array of N Coordinates into a Key.

morton_encode_generic

Receives an iterator of Coor values and encodes them all in a Key value.

morton_encode_generic_checked

Receives an iterator of Coor values and encodes them all in a Option<Key> value.

nz

A convenience function.

shrink_custom

“Shrinks” a given number, only keeping a user-provided amount of its bits.

shrink_custom_checked

Fallibly “shrinks” a given number, only keeping a certain amount of its bits.