Crate safe_arch

A crate that safely exposes arch intrinsics via cfg.

This crate lets you safely use CPU intrinsics: the things in core::arch.

  • Most of them are 100% safe to use as long as the CPU feature is available, like addition and multiplication.
  • Some of them require that you uphold extra requirements, such as alignment, which we enforce via the type system when necessary.
  • Some of them are absolutely not safe at all because they cause UB at the LLVM level, so those things are not exposed here.
  • Some of them are pointless to expose here because the core crate already provides the same functionality in a cross-platform way, so we skip those.
  • This crate works purely via cfg and compile time feature selection; there are no runtime checks added. This means that if you want to do runtime feature detection and then dynamically call an intrinsic if it happens to be available, then this crate sadly isn't for you.
  • This crate aims to be as minimal as possible: it just exposes each intrinsic as a safe function with an easier to understand name and some minimal docs. Building higher level abstractions on top of the intrinsics is the domain of other crates.
  • That said, each raw SIMD type is newtype'd as a wrapper (with a pub field) so that better trait impls can be provided.
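
As a rough sketch of the pattern described in the list above (not this crate's actual source), a cfg-gated newtype and safe wrapper could look like the following. The names `M128Sketch`, `add_m128_sketch`, and `add4` are hypothetical. The scalar fallback exists only so the sketch builds on other targets; the crate itself provides no fallback.

```rust
// Hypothetical illustration only; none of these names are exported by safe_arch.

#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
#[allow(unused_unsafe)]
mod sketch {
    use core::arch::x86_64::{__m128, _mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    /// Newtype over the raw SIMD type, with a pub field so the raw value
    /// stays reachable.
    #[derive(Clone, Copy)]
    #[repr(transparent)]
    pub struct M128Sketch(pub __m128);

    impl M128Sketch {
        pub fn from_array(a: [f32; 4]) -> Self {
            // SAFETY: `sse` is enabled for this build (see the cfg above) and
            // the unaligned load reads exactly four f32 values from `a`.
            Self(unsafe { _mm_loadu_ps(a.as_ptr()) })
        }

        pub fn to_array(self) -> [f32; 4] {
            let mut out = [0.0_f32; 4];
            // SAFETY: `out` has room for exactly four f32 values.
            unsafe { _mm_storeu_ps(out.as_mut_ptr(), self.0) };
            out
        }
    }

    /// Safe wrapper: lanewise f32 addition.
    pub fn add_m128_sketch(a: M128Sketch, b: M128Sketch) -> M128Sketch {
        // SAFETY: the only precondition is that `sse` is available, which the
        // cfg on this module guarantees at compile time.
        M128Sketch(unsafe { _mm_add_ps(a.0, b.0) })
    }
}

/// Only exists in this form when the feature is compiled in.
#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use self::sketch::{add_m128_sketch, M128Sketch};
    add_m128_sketch(M128Sketch::from_array(a), M128Sketch::from_array(b)).to_array()
}

/// Scalar stand-in so the sketch still compiles on other targets.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

The `unsafe` blocks are sound only because the surrounding cfg guarantees the feature was selected at compile time, which is the whole trade this crate makes.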

Current Support

This grows slowly because there are just so many intrinsics.

  • Intel (x86 / x86_64)
    • 128-bit: sse, sse2, sse3, ssse3

Compile Time CPU Target Features

At the time of writing, Rust enables the sse and sse2 CPU features by default for all i686 (x86) and x86_64 builds. Those CPU features are built into the design of x86_64, and you'd need a super old x86 CPU for it to not support at least sse and sse2, so they're a safe bet for the language to enable all the time. In fact, because the standard library is compiled with them enabled, simply trying to disable those features would actually cause ABI issues and fill your program with UB (link).

If you want additional CPU features available at compile time you'll have to enable them with an additional arg to rustc. For a feature named name you pass -C target-feature=+name, such as -C target-feature=+sse3 for sse3.

You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you're building a program you'll only run on your own system.
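
One way to confirm which features a given set of flags actually enabled is to ask cfg from inside the program. The sketch below is only an illustration; `compiled_in_features` is a hypothetical helper, not part of this crate.

```rust
/// Lists the SSE-family target features that were enabled when this code
/// was compiled. Hypothetical helper for illustration.
pub fn compiled_in_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    // Each `cfg!` resolves to a constant `true` or `false` at compile time.
    if cfg!(target_feature = "sse") {
        found.push("sse");
    }
    if cfg!(target_feature = "sse2") {
        found.push("sse2");
    }
    if cfg!(target_feature = "sse3") {
        found.push("sse3");
    }
    if cfg!(target_feature = "ssse3") {
        found.push("ssse3");
    }
    found
}
```

Building with something like `RUSTFLAGS="-C target-feature=+sse3" cargo run` should add sse3 to the returned list, while a default x86_64 build should report only sse and sse2.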

It's sometimes hard to know if your target platform will support a given feature set, but the Steam Hardware Survey is generally taken as a guide to what you can expect people to have available. If you click "Other Settings" it'll expand into a list of CPU target features and how common they are. These days, it seems that sse3 can be safely assumed, and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff above 128-bit isn't as common yet; give it another few years.

Please note that executing a program on a CPU that doesn't support the target features it was compiled for is Undefined Behavior.

Currently, Rust doesn't actually support an easy way for you to check that a feature enabled at compile time is actually available at runtime. There is the "feature_detected" family of macros (such as is_x86_feature_detected!), but if you enable a feature at compile time they will evaluate to a constant true instead of actually deferring the check for the feature to runtime. This means that, if you did want a check at the start of your program to confirm that all the assumed features are present, and to error out when the assumptions don't hold, you can't use that macro. You have to use CPUID and check manually. Hopefully we can make that process easier in a future version of this crate.
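
A minimal sketch of such a manual check, assuming an x86_64 target and covering only the SSE family, might look like this. `missing_runtime_features` is a hypothetical name and not something this crate provides. The bit positions are the CPUID leaf 1 feature flags.

```rust
/// Returns the features that were enabled at compile time but that the
/// running CPU does not report via CPUID. Hypothetical helper.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn missing_runtime_features() -> Vec<&'static str> {
    use core::arch::x86_64::__cpuid;

    // SAFETY: the CPUID instruction is always available on x86_64.
    let leaf1 = unsafe { __cpuid(1) };

    // (name, enabled at compile time, reported by the CPU at runtime)
    let checks = [
        ("sse", cfg!(target_feature = "sse"), (leaf1.edx & (1 << 25)) != 0),
        ("sse2", cfg!(target_feature = "sse2"), (leaf1.edx & (1 << 26)) != 0),
        ("sse3", cfg!(target_feature = "sse3"), (leaf1.ecx & (1 << 0)) != 0),
        ("ssse3", cfg!(target_feature = "ssse3"), (leaf1.ecx & (1 << 9)) != 0),
        ("sse4.1", cfg!(target_feature = "sse4.1"), (leaf1.ecx & (1 << 19)) != 0),
        ("sse4.2", cfg!(target_feature = "sse4.2"), (leaf1.ecx & (1 << 20)) != 0),
    ];

    let mut missing = Vec::new();
    for (name, compiled, present) in checks.iter().copied() {
        if compiled && !present {
            missing.push(name);
        }
    }
    missing
}

/// On other architectures there is nothing to check in this sketch.
#[cfg(not(target_arch = "x86_64"))]
pub fn missing_runtime_features() -> Vec<&'static str> {
    Vec::new()
}
```

A program could call this first thing in `main` and exit with an error if the list is non-empty. Wider features such as AVX would additionally need an operating system support check, which this sketch does not attempt.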

A Note On Working With Cfg

There are two main ways to use cfg:

  • Via an attribute placed on an item, block, or expression:
    • #[cfg(debug_assertions)] println!("hello");
  • Via a macro used within an expression position:
    • if cfg!(debug_assertions) { println!("hello"); }

The difference might seem small but it's actually very important:

  • The attribute form includes or excludes the code before the compiler checks whether the items it names really exist. This means that code configured via attribute can safely name things that don't always exist, as long as those things do exist whenever that code is configured into the build.
  • The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.

This crate uses cfg via the attribute, so the functions it exposes don't exist at all when the appropriate CPU target features aren't enabled. Accordingly, if you plan to call this crate or not depending on what features are enabled in the build you'll also need to control your use of this crate via cfg attribute, not cfg macro.
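
To make the distinction concrete, here is a small sketch using a hypothetical `ssse3_only` function as a stand-in for any item that exists only when a feature is enabled. The attribute form compiles in every build; the macro form, shown commented out, would not compile in builds without ssse3.

```rust
/// Stand-in for an item that only exists when `ssse3` is enabled, the way
/// this crate's functions do. Hypothetical name.
#[cfg(target_feature = "ssse3")]
fn ssse3_only() -> &'static str {
    "ssse3"
}

/// Attribute form: this definition is removed entirely when `ssse3` is
/// off, so the call to `ssse3_only` is never looked up in that build.
#[cfg(target_feature = "ssse3")]
pub fn chosen_path() -> &'static str {
    ssse3_only()
}

/// Attribute form: the replacement used when `ssse3` is off.
#[cfg(not(target_feature = "ssse3"))]
pub fn chosen_path() -> &'static str {
    "fallback"
}

// Macro form: both branches are always compiled, so without `ssse3` the
// name `ssse3_only` cannot be resolved and the build fails.
//
// pub fn chosen_path_broken() -> &'static str {
//     if cfg!(target_feature = "ssse3") { ssse3_only() } else { "fallback" }
// }
```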

Re-exports

pub use intel::*;

Modules

intel

Types and functions for safe x86 / x86_64 intrinsic usage.

Macros

byte_shift_left_u128_immediate_m128i

Shifts all bits in the entire register left by a number of bytes.

byte_shift_right_u128_immediate_m128i

Shifts all bits in the entire register right by a number of bytes.

extract_i16_as_i32_m128i

Gets an i16 value out of an m128i, returns as i32.

insert_i16_from_i32_m128i

Inserts the low 16 bits of an i32 value into an m128i.

shift_left_i16_immediate_m128i

Shifts all i16 lanes left by an immediate.

shift_left_i32_immediate_m128i

Shifts all i32 lanes left by an immediate.

shift_left_i64_immediate_m128i

Shifts both i64 lanes left by an immediate.

shift_right_i16_immediate_m128i

Shifts all i16 lanes right by an immediate.

shift_right_i32_immediate_m128i

Shifts all i32 lanes right by an immediate.

shift_right_u16_immediate_m128i

Shifts all u16 lanes right by an immediate.

shift_right_u32_immediate_m128i

Shifts all u32 lanes right by an immediate.

shift_right_u64_immediate_m128i

Shifts both u64 lanes right by an immediate.

shuffle_i16_high_lanes_m128i

Shuffles the higher i16 lanes, low lanes unaffected.

shuffle_i16_low_lanes_m128i

Shuffles the lower i16 lanes, high lanes unaffected.

shuffle_i32_m128i

Shuffles the i32 lanes around.

shuffle_m128

Shuffles the lanes around.

shuffle_m128d

Shuffles the lanes around.
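
For the right-shift macros listed above, the i16 and u16 variants differ in what gets shifted in at the top of each lane. The following is a plain scalar model, not the crate's implementation. It assumes the naming reflects sign-extending shifts for i16 and zero-filling shifts for u16, as with the underlying SSE2 instructions, and it models only shift counts below 16. The function names are hypothetical.

```rust
/// Scalar model of an i16 lanewise right shift: the sign bit is copied in
/// from the left (arithmetic shift). Only counts below 16 are modelled.
pub fn shift_right_i16_model(lanes: [i16; 8], count: u32) -> [i16; 8] {
    assert!(count < 16, "this model only covers counts below 16");
    lanes.map(|x| x >> count)
}

/// Scalar model of a u16 lanewise right shift: zeros are shifted in from
/// the left (logical shift). Only counts below 16 are modelled.
pub fn shift_right_u16_model(lanes: [u16; 8], count: u32) -> [u16; 8] {
    assert!(count < 16, "this model only covers counts below 16");
    lanes.map(|x| x >> count)
}
```

The same bit pattern gives different results in the two models: the lane `-8_i16` shifted right by 2 stays negative at `-2`, while the equivalent `0xFFF8_u16` becomes `0x3FFE`.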