npsimd 0.3.0

An ergonomic library for architecture-specific vectorization.
//! SIMD on Intel.
//!
//! This module provides a safe and idiomatic API for writing vectorized code
//! for Intel processors, using the SSE, AVX, and/or AVX-512 instruction sets.
//! It ensures that the running CPU supports the instructions being executed,
//! making almost every SIMD instruction safe to use.  SIMD vectors are typed
//! appropriately, and custom element types can be defined if necessary.
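//!
//! As a rough sketch of the runtime check described above, written against
//! `std::arch` directly rather than this module's own API (the names `add` and
//! `add_avx2` are hypothetical, and x86-64 is assumed):
//!
//! ```rust
//! use std::arch::x86_64::*;
//!
//! /// Adds two arrays of eight `i32`s lane-wise with AVX2.
//! ///
//! /// # Safety
//! ///
//! /// The caller must ensure that the running CPU supports AVX2.
//! #[target_feature(enable = "avx2")]
//! unsafe fn add_avx2(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
//!     let mut out = [0i32; 8];
//!     // SAFETY: each array is 32 bytes long, and the `loadu`/`storeu`
//!     // instructions have no alignment requirement.
//!     unsafe {
//!         let va = _mm256_loadu_si256(a.as_ptr().cast());
//!         let vb = _mm256_loadu_si256(b.as_ptr().cast());
//!         _mm256_storeu_si256(out.as_mut_ptr().cast(), _mm256_add_epi32(va, vb));
//!     }
//!     out
//! }
//!
//! /// Safe wrapper: uses AVX2 if the CPU has it, and scalar code otherwise.
//! fn add(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
//!     if is_x86_feature_detected!("avx2") {
//!         // SAFETY: AVX2 support was verified on the line above.
//!         unsafe { add_avx2(a, b) }
//!     } else {
//!         // Scalar fallback with the same wrapping semantics as `vpaddd`.
//!         core::array::from_fn(|i| a[i].wrapping_add(b[i]))
//!     }
//! }
//! ```
//!
//! Here `add` can be called from safe code on any x86-64 CPU, because the
//! unsafe call is only reached once the feature check has succeeded.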
//!
//! This interface is aimed at programmers who are already experienced with
//! Intel's SIMD instructions.  Every provided operation is expected to compile
//! to a specific instruction or sequence thereof.  Programmers are expected to
//! know which instructions are available and design their vectorized algorithms
//! accordingly.  For a higher-level API for vectorization, look to a portable
//! SIMD interface, such as [`core::simd`].
//!
//! The following resources are crucial to this kind of SIMD programming:
//!
//! - [The Intel 64 and IA-32 Architectures Software Developer's Manual, Volume
//!   2][sdm-2] provides the complete reference documentation for every Intel
//!   x86 instruction.  It is incredibly useful for exploring the various SIMD
//!   instruction sets and fully understanding complex SIMD instructions.
//!
//!   [sdm-2]: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
//!
//! - [Félix Cloutier's x86 and amd64 instruction reference][fcl] is an online
//!   copy of the Intel Software Developer's Manual, automatically constructed
//!   by a "dump script".  It is useful to quickly look up an instruction, but
//!   should not be used as an authoritative source.
//!
//!   [fcl]: https://www.felixcloutier.com/x86/
//!
//! - [The Intel Intrinsics Guide][guide] correlates Intel C intrinsics to the
//!   underlying instructions, and provides partial information about how each
//!   instruction works and the performance on some platforms.  It is organized
//!   by SIMD instruction set and is useful for exploring the instructions in
//!   each set.
//!
//!   [guide]: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
//!
//! - [uops.info](https://uops.info) provides latency, throughput, and port
//!   usage measurements for x86 instructions across many Intel (and some AMD)
//!   microarchitectures.  It is an incredibly useful tool for understanding
//!   how a sequence of instructions will execute, and is essential for
//!   designing an efficient vectorized algorithm.
//!
//! # Architecture
//!
//! 

#![cfg(any(target_arch = "x86", target_arch = "x86_64"))]

use core::marker::Freeze;

// Utility Modules:

pub mod feats;
use feats::*;

pub mod prims;
use prims::*;

#[macro_use]
mod macros;

mod impls;
use impls::*;

// Basic Types:

mod vector;
pub use vector::*;

mod value;
pub use value::*;

pub mod masks;
use masks::*;

// Extensions:

pub mod sse;

// TODO: Remove
pub mod low;

// Utility Imports:

// Intel x86 intrinsic imports for documentation.
#[allow(unused_imports)]
#[cfg(target_arch = "x86")]
use core::arch::x86::*;

// Intel x86_64 intrinsic imports for documentation.
#[allow(unused_imports)]
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// A SIMD-compatible element.
///
/// # Safety
///
/// A type `T` can soundly implement `Element` if and only if all of the
/// following conditions hold:
///
/// - `T` contains no niches (any bit-pattern forms a valid instance of `T`).
/// - `[T; LEN]` has the same size as `Self::Primitive`.
/// - The alignment of `T` is less than or equal to that of `Self::Primitive`.
/// - `[T; LEN]` can be soundly transmuted to and from `Self::Primitive`.
/// - `T` does not contain interior mutability.
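///
/// As an illustration only (not part of this crate's API), the layout
/// conditions can be checked for `f32` with `LEN = 4`, assuming that `__m128`
/// plays the role of `Primitive` and that the target is x86-64.  The function
/// names are hypothetical:
///
/// ```rust
/// use core::arch::x86_64::__m128;
/// use core::mem::{align_of, size_of, transmute};
///
/// /// Checks the size and alignment conditions for `[f32; 4]` and `__m128`.
/// fn layout_matches() -> bool {
///     size_of::<[f32; 4]>() == size_of::<__m128>()
///         && align_of::<f32>() <= align_of::<__m128>()
/// }
///
/// /// Round-trips an array through the primitive vector type.
/// fn round_trip(a: [f32; 4]) -> [f32; 4] {
///     // SAFETY: both types are 16 bytes, and every bit pattern is a valid
///     // value of either type.
///     let v: __m128 = unsafe { transmute(a) };
///     // SAFETY: as above, in the opposite direction.
///     unsafe { transmute(v) }
/// }
/// ```
///
/// A type with invalid bit patterns, such as `bool`, would fail the first
/// condition even if its size and alignment happened to fit.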
pub unsafe trait Element<const LEN: usize>: Copy + Freeze
where [Self; LEN]: Array<Primitive = Self::Primitive> {
    /// The matching primitive vector type.
    type Primitive: PrimitiveVector;
}

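/// An array of SIMD-compatible elements.
///
/// This is implemented for every `[T; LEN]` where `T: Element<LEN>`, with the
/// same `Primitive` type as the element.
pub unsafe trait Array: Copy + Freeze {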
    type Primitive: PrimitiveVector;
}

unsafe impl<T, const LEN: usize> Array for [T; LEN]
where T: Element<LEN> {
    type Primitive = T::Primitive;
}

/// The ability to load/store a vector from/into memory.
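///
/// The difference between the unaligned and aligned variants can be sketched
/// with the SSE intrinsics from `core::arch`.  This is an illustration under
/// stated assumptions, not this trait's implementation: the `Aligned` wrapper
/// and the function names are hypothetical, and x86-64 is assumed.
///
/// ```rust
/// use core::arch::x86_64::*;
///
/// /// A 16-byte-aligned array, standing in for an aligned vector type.
/// #[repr(C, align(16))]
/// struct Aligned([f32; 4]);
///
/// /// Adds two arrays using loads and stores with no alignment requirement.
/// fn sum_unaligned(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
///     let mut out = [0.0f32; 4];
///     // SAFETY: SSE is part of the x86-64 baseline, every pointer is valid
///     // for 16 bytes, and `movups` accepts any alignment.
///     unsafe {
///         let v = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
///         _mm_storeu_ps(out.as_mut_ptr(), v);
///     }
///     out
/// }
///
/// /// Adds two arrays using loads and stores that require 16-byte alignment.
/// fn sum_aligned(a: &Aligned, b: &Aligned) -> Aligned {
///     let mut out = Aligned([0.0; 4]);
///     // SAFETY: SSE is part of the x86-64 baseline, and `Aligned` guarantees
///     // the 16-byte alignment that `movaps` requires.
///     unsafe {
///         let v = _mm_add_ps(_mm_load_ps(a.0.as_ptr()), _mm_load_ps(b.0.as_ptr()));
///         _mm_store_ps(out.0.as_mut_ptr(), v);
///     }
///     out
/// }
/// ```
///
/// Passing a pointer that is not 16-byte aligned to `_mm_load_ps` or
/// `_mm_store_ps` is undefined behavior, which is why the aligned variant
/// takes a type that carries the alignment guarantee.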
pub unsafe trait Movable<G, const LEN: usize>
where Self: Element<LEN> {
    /// The SIMD extension providing these instructions.
    type Feature: Feature<G>;

    /// Load a primitive vector from unaligned memory.
    unsafe fn load(ptr: &[Self; LEN]) -> Self::Primitive;

    /// Load a primitive vector from aligned memory.
    unsafe fn load_aligned(ptr: &Vector<[Self; LEN]>) -> Self::Primitive;

    /// Store a primitive vector into unaligned memory.
    unsafe fn store(this: Self::Primitive, ptr: &mut [Self; LEN]);

    /// Store a primitive vector into aligned memory.
    unsafe fn store_aligned(this: Self::Primitive, ptr: &mut Vector<[Self; LEN]>);
}