npsimd 0.3.0

An ergonomic library for architecture-specific vectorization.
//! Well-typed SIMD intrinsics.
//!
//! This module provides basic redefinitions of the SIMD intrinsics from
//! [`core::arch`].  It has several benefits:
//!
//! 1. Missing instructions are implemented using inline assembly.  Inline
//!    assembly is opaque to the optimizer, so these instructions cannot be
//!    optimized the way regular intrinsics can, but this may be worth it if
//!    the instruction is useful enough.  This is a temporary measure --
//!    [`core::arch`] should eventually provide all such instructions as
//!    proper intrinsics.
//!
//! 2. It provides a comprehensive model of Intel's feature-gating conventions,
//!    ensuring at compile time that instructions are used only when the CPU
//!    has been confirmed to support them.  Intrinsics for each generation are
//!    implemented on a type, e.g. [`sse::Use`], which can only be created using
//!    [`RuntimeSupport`] information.
//!
//! 3. Instructions use more appropriate typing.  SIMD vectors are represented
//!    using [`u8x16`], [`u32x4`], [`u64x8`], etc.  Instructions which are
//!    supposed to return booleans return [`bool`] instead of [`i32`].  Where a
//!    magic immediate byte is required, it is well-typed.
//!
//! 4. Intrinsics are deduplicated appropriately.  Some intrinsics can be
//!    expressed in terms of others, and the optimizer is capable of noticing
//!    such patterns.  In these cases, the additional intrinsics are left out.
//!    The exposed API is smaller and more representative of the underlying
//!    instruction set.
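//!
//! As a rough illustration of point 3, the sketch below wraps a comparison
//! intrinsic that reports its result as an `i32` so that it returns a `bool`
//! instead.  It uses only `std::arch`; the name `low_lanes_equal` is
//! hypothetical and is not part of this module's API.
//!
//! ```rust
//! // Hypothetical wrapper: compares the lowest `f32` lane of `a` and `b`.
//! #[cfg(target_arch = "x86_64")]
//! pub fn low_lanes_equal(a: [f32; 4], b: [f32; 4]) -> bool {
//!     use std::arch::x86_64::{_mm_comieq_ss, _mm_loadu_ps};
//!     // SAFETY: SSE is part of the x86_64 baseline, and both pointers refer
//!     // to four readable `f32` values.
//!     unsafe {
//!         let va = _mm_loadu_ps(a.as_ptr());
//!         let vb = _mm_loadu_ps(b.as_ptr());
//!         // The raw intrinsic yields 0 or 1 as an `i32`; expose a `bool`.
//!         _mm_comieq_ss(va, vb) != 0
//!     }
//! }
//!
//! // Portable stand-in so the sketch also builds on other architectures.
//! #[cfg(not(target_arch = "x86_64"))]
//! pub fn low_lanes_equal(a: [f32; 4], b: [f32; 4]) -> bool {
//!     a[0] == b[0]
//! }
//! ```
//!
//! Callers can then write `if low_lanes_equal(a, b) { .. }` without comparing
//! an integer against zero at every call site.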
//!
//! # Usage
//!
//! 1. Construct a [`RuntimeSupport`], using [`RuntimeSupport::detect()`].  This
//!    will interrogate the running CPU for which SIMD features it supports.
//!
//! 2. Pick a SIMD generation to use: SSE, AVX, or AVX-512.  This selects which
//!    instructions will be available and how they are encoded in machine code.
//!    Each generation has a sub-module here, e.g. [`sse`].
//!
//! 3. Construct a feature set (using [`feature_set`]) of features that you need
//!    from this generation.  This can be aliased to a named type for
//!    convenience.
//!
//! 4. Construct the `Use` type for the selected generation, parameterized by
//!    the feature set type.  Call its `new()` constructor (e.g.
//!    [`sse::Use::new()`]) with the [`RuntimeSupport`], which will check
//!    whether the current CPU actually supports the required features.
//!
//! 5. Use the low-level intrinsic methods on the `Use` type.
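//!
//! The steps above follow a "proof token" pattern, which can be sketched with
//! the standard library alone.  This is a minimal model under stated
//! assumptions, not this module's actual API: the `RuntimeSupport` and
//! `UseSse2` types below are hypothetical stand-ins, the feature set is fixed
//! to SSE2 (so steps 2 and 3 collapse into the choice of type), and `add` and
//! `add_scalar` are illustrative helpers.
//!
//! ```rust
//! // Hypothetical stand-in for `RuntimeSupport`: a snapshot of CPU features.
//! #[derive(Clone, Copy, Debug)]
//! pub struct RuntimeSupport {
//!     sse2: bool,
//! }
//!
//! impl RuntimeSupport {
//!     // Step 1: interrogate the running CPU.
//!     #[cfg(target_arch = "x86_64")]
//!     pub fn detect() -> Self {
//!         Self { sse2: std::arch::is_x86_feature_detected!("sse2") }
//!     }
//!
//!     // Other architectures never report SSE2.
//!     #[cfg(not(target_arch = "x86_64"))]
//!     pub fn detect() -> Self {
//!         Self { sse2: false }
//!     }
//! }
//!
//! // Hypothetical stand-in for `sse::Use`: a zero-sized proof that SSE2 is
//! // usable on the running CPU.
//! #[derive(Clone, Copy, Debug)]
//! pub struct UseSse2(());
//!
//! impl UseSse2 {
//!     // Step 4: the only constructor checks the runtime support first.
//!     pub fn new(support: RuntimeSupport) -> Option<Self> {
//!         if support.sse2 { Some(Self(())) } else { None }
//!     }
//!
//!     // Step 5: a safe method; holding `self` justifies the intrinsics.
//!     #[cfg(target_arch = "x86_64")]
//!     pub fn add_u32x4(self, a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
//!         use std::arch::x86_64::*;
//!         let mut out = [0u32; 4];
//!         // SAFETY: `self` exists only if SSE2 was detected, and the
//!         // unaligned loads and store each touch exactly 16 valid bytes.
//!         unsafe {
//!             let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
//!             let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
//!             let sum = _mm_add_epi32(va, vb);
//!             _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
//!         }
//!         out
//!     }
//! }
//!
//! // Scalar fallback with the same wrapping semantics as the SIMD path.
//! pub fn add_scalar(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
//!     [
//!         a[0].wrapping_add(b[0]),
//!         a[1].wrapping_add(b[1]),
//!         a[2].wrapping_add(b[2]),
//!         a[3].wrapping_add(b[3]),
//!     ]
//! }
//!
//! // Uses the SIMD path when the token can be built, the fallback otherwise.
//! pub fn add(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
//!     #[cfg(target_arch = "x86_64")]
//!     {
//!         if let Some(simd) = UseSse2::new(RuntimeSupport::detect()) {
//!             return simd.add_u32x4(a, b);
//!         }
//!     }
//!     add_scalar(a, b)
//! }
//! ```
//!
//! With these definitions, `add([1, 2, 3, 4], [10, 20, 30, 40])` yields
//! `[11, 22, 33, 44]` on either path.  The point of the design is that the
//! `unsafe` block lives in one place, behind a constructor that has already
//! performed the CPU check.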

// Utility Modules:

#[macro_use]
mod macros;

pub mod types;
pub use types::Vector;
pub use crate::intel_vector as vector;

mod impls;
pub use impls::Imm;
pub use crate::intel_imm as imm;

pub mod feats;
pub use feats::RuntimeSupport;
pub use crate::intel_features as feature_set;

// Utility Imports:

// Intel x86 intrinsic imports for documentation.
#[allow(unused_imports)]
#[cfg(target_arch = "x86")]
use core::arch::x86::*;

// Intel x86_64 intrinsic imports for documentation.
#[allow(unused_imports)]
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

use impls::*;
use types::*;
use feats::*;

// Extension Modules:

pub mod sse;
pub use sse::Use as SSE;

pub mod avx;
pub use avx::Use as AVX;