Crate pessimize

This crate aims to implement minimally costly optimization barriers for every architecture that has asm!() support (currently x86(_64), 32-bit ARM, AArch64 and RISC-V, more possible on nightly via the asm_experimental_arch unstable feature).

You can use these barriers to prevent the compiler from optimizing out selected redundant or unnecessary computations in situations where such optimization is undesirable. The most typical usage scenario is microbenchmarking, but there might also be applications to cryptography or low-level development, where optimization must also be controlled.
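As a rough illustration of the microbenchmarking scenario, the sketch below hides both the input and the output of a timed computation so that the compiler can neither precompute the result nor discard it. It does not use this crate: `hide_stand_in`, `sum_of_squares` and `time_sum_of_squares` are hypothetical names, and the barrier is approximated with `std::hint::black_box`, whose guarantees are weaker than the ones this crate aims for.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a value-hiding barrier (NOT this crate's code).
/// It returns its input unchanged while asking the optimizer to forget it.
fn hide_stand_in<T>(x: T) -> T {
    black_box(x)
}

/// Work under test: sum of the squares of 0..n, using wrapping arithmetic.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, i| acc.wrapping_add(i.wrapping_mul(i)))
}

/// Runs the work `iterations` times and returns the elapsed time and a checksum.
fn time_sum_of_squares(iterations: u32, n: u64) -> (Duration, u64) {
    let start = Instant::now();
    let mut total = 0u64;
    for _ in 0..iterations {
        // Hide the input so the result cannot be computed at compile time.
        let input = hide_stand_in(n);
        // Hide the output so the computation cannot be discarded as unused.
        let output = hide_stand_in(sum_of_squares(input));
        total = total.wrapping_add(output);
    }
    (start.elapsed(), total)
}
```

Without the two barriers, an optimizing compiler would be free to hoist the computation out of the loop or fold it into a constant, in which case the measured time would say little about `sum_of_squares` itself.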

Implementations

The barriers will be implemented for any type from core/std that either…

  • Can be shoved into CPU registers, with “natural” target registers dictated by normal ABI calling conventions.
  • Can be losslessly converted back and forth to a set of values that have this property, in a manner that is easily optimized out.
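To make the "shoved into CPU registers" criterion more concrete, here is a minimal sketch of what a register-based barrier can look like for a single integer type. It is an assumption about the general technique, not a copy of this crate's implementation: `hide_u64` is a hypothetical name, only x86_64 and AArch64 take the inline assembly path, and other targets fall back to `std::hint::black_box`.

```rust
/// Hypothetical register-based barrier for `u64` (illustrative only).
/// The asm block contains no instructions, only an assembler comment that
/// mentions the operand, but the compiler must assume that the register
/// may come back holding a different value.
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
pub fn hide_u64(mut x: u64) -> u64 {
    // SAFETY: no instruction is emitted, so no memory, stack or flags are touched.
    unsafe {
        core::arch::asm!(
            "/* {0} */",
            inout(reg) x,
            options(nomem, nostack, preserves_flags)
        );
    }
    x
}

/// Fallback for other targets, with weaker guarantees.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn hide_u64(x: u64) -> u64 {
    std::hint::black_box(x)
}
```

Because the value stays in a general-purpose register and no instruction is emitted, the run-time cost of such a barrier is close to zero, which is the sense in which the barriers are meant to be minimally costly.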

Any type which is not directly supported can still be subjected to an optimization barrier by taking a reference to it and subjecting that reference to an optimization barrier, at the cost of causing the value to be spilled to memory. If the nightly default_impl feature is enabled, the crate will provide a default Pessimize impl that does this for you.
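The reference-based fallback described above can be sketched as follows. This is an illustration under assumptions rather than this crate's code: `assume_read_by_ref` and `LargeState` are hypothetical names, only x86_64 and AArch64 use inline assembly, and other targets fall back to `std::hint::black_box`.

```rust
/// Hypothetical barrier through a reference (illustrative only).
/// Only the address travels in a register; because the asm block is allowed
/// to read memory, the pointee must be materialized in memory beforehand.
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
pub fn assume_read_by_ref<T>(value: &T) {
    let ptr: *const T = value;
    // SAFETY: the template is only a comment; `readonly` promises no writes.
    unsafe {
        core::arch::asm!(
            "/* {0} */",
            in(reg) ptr,
            options(readonly, nostack, preserves_flags)
        );
    }
}

/// Fallback for other targets, with weaker guarantees.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn assume_read_by_ref<T>(value: &T) {
    std::hint::black_box(value);
}

/// Hypothetical type that is too large to fit in a single register.
pub struct LargeState {
    pub samples: [f64; 32],
}
```

The spill to memory is the price of generality here: the barrier works for any sized type, but the value can no longer live purely in registers across the barrier.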

You can tell which types implement Pessimize on your compiler target by running cargo doc and checking the implementor list of Pessimize and BorrowPessimize.

To implement Pessimize for your own types, you should consider implementing PessimizeCast and BorrowPessimize, which make the job a bit easier. Pessimize is automatically implemented for any type that implements BorrowPessimize.

Semantics

For pointer-like entities, optimization barriers other than hide can have the side effect of causing the compiler to assume that global and thread-local variables might have been accessed with semantics similar to those of the pointer itself. This reduces the optimizations that the compiler can apply to such variables, so hide should be favored whenever global or thread-local variables are used (or you don’t know whether they are used).

In general, barriers other than hide have more avenues for surprising behavior (see their documentation for details), so you should strive to do what you want with hide if possible, and only reach for other barriers where the extra expressive power of these primitives is truly needed.

While the barriers will accept zero-sized types such as PhantomData, they will only be effective for those that access global or thread-local state, like std::alloc::System does. That is because without such external state, zero-sized objects do not own or provide access to any information, so the compiler can trivially infer that the optimization barrier cannot read or modify any internal state. Implementations of Pessimize on such types are only provided to ease automatic derivation of Pessimize for composite types like tuples (and hopefully custom structs too in the future).
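The point about zero-sized types carrying no information can be checked with the standard library alone. The helper name `zero_sized_examples` below is hypothetical and the snippet does not depend on this crate.

```rust
use std::alloc::System;
use std::marker::PhantomData;
use std::mem::size_of;

/// Returns the sizes, in bytes, of three zero-sized types:
/// the unit type, a PhantomData marker, and the System allocator handle.
fn zero_sized_examples() -> (usize, usize, usize) {
    (
        size_of::<()>(),
        size_of::<PhantomData<u64>>(),
        size_of::<System>(),
    )
}
```

All three occupy no storage, so there is no register or memory content for a barrier to hide. System differs from the other two only because its methods reach global allocator state.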

The documentation of the top-level functions (hide, assume_read, consume, assume_accessed and assume_accessed_imut) contains more details on the optimization barrier that each of them implements.

When to use this crate

You should consider using this crate over core::hint::black_box, or third-party cousins thereof, because…

  • It works on stable Rust.
  • It has a better-defined API contract with stronger guarantees (unlike core::hint::black_box, where “do nothing” is a valid implementation).
  • It exposes finer-grained operations, which clarify your code’s intent and reduce harmful side-effects.

The main drawbacks of this crate’s approach are that…

  • It only works on selected hardware architectures (though they are the ones on which you are most likely to run benchmarks, and it should get better over time as more inline assembly architectures get stabilized).
  • It needs a lot of tricky unsafe code.

Modules

  • Hardware-specific functionality

Traits

  • Extract references to Pessimize values from references to Self (Pessimize impl helper)
  • Optimization barriers provided by this crate
  • Convert Self back and forth to a Pessimize impl (Pessimize impl helper)

Functions

  • Force the compiler to assume that any data transitively reachable via a pointer/reference has been read, and modified if Rust rules allow for it.
  • Variant of assume_accessed for internally mutable types
  • Assume that all global and thread-local variables have been read and modified
  • Assume that all global and thread-local variables have been read
  • Force the compiler to assume that a value, and data transitively reachable via that value (for pointers/refs), is being used if Rust rules allow for it.
  • Like assume_read, but by value
  • Re-emit the input value as its output (identity function), but force the compiler to assume that it is a completely different value.
  • Implementation of BorrowPessimize::assume_accessed_impl for types where there is a way to get a T::Pessimized from an &mut T
  • Implementation of BorrowPessimize::assume_accessed_impl for types where there is a cheap way to extract the inner T from an &mut T
  • Implementation of BorrowPessimize::with_pessimize for Copy types