/*
 * Copyright (c) Microsoft Corporation.
 * Licensed under the MIT license.
 */

//! # Wide - Cross-Architecture SIMD
//!
//! This crate attempts to provide (mostly) Miri-compatible, cross-platform SIMD with support
//! for lightweight architecture dispatching.
//!
//! ## Traits
//!
//! SIMD vectors are unusual types because they behave both like scalars and like containers.
//! The primary traits exposed by `wide` are:
//!
//! * [`SIMDVector`]: General trait for working with a SIMD vector, including creation and
//!   data access.
//!
//! * [`SIMDMask`]: Essentially a SIMD boolean. Comparisons between [`SIMDVector`]s are
//!   performed lanewise, with the mask holding the result for each lane. Each
//!   [`SIMDVector`] has an associated mask.
//!
//! * [`Architecture`]: SIMD instructions are architecture-specific. Some server CPUs, such
//!   as newer x86 models, support AVX-512, while most consumer CPUs do not yet support that
//!   instruction set extension.
//!
//!   To allow a single binary to support multiple architectures, `wide` makes the
//!   [`Architecture`] largely explicit in SIMD types.
//!
//!   Generic, cross-architecture algorithms are still supported by using an
//!   [`Architecture`]'s associated SIMD types.
//!
//! A host of secondary SIMD-related traits is also exported, most of them prefixed with
//! `SIMD`. Refer to the documentation on each trait for more information.
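//!
//! As a minimal sketch of how generic algorithms use an architecture's associated types,
//! the following `std`-only example defines a hypothetical trait `Arch` with an associated
//! four-lane vector type and a hypothetical portable backend `ScalarArch`. These names and
//! their methods are illustrative assumptions, not `wide`'s API.
//!
//! ```rust
//! /// Hypothetical stand-in for an architecture trait with an associated vector type.
//! pub trait Arch: Copy {
//!     /// Four-lane `f32` vector type for this architecture.
//!     type F32x4: Copy;
//!
//!     /// Build a vector from an array; an architecture instance is required.
//!     fn from_array(self, values: [f32; 4]) -> Self::F32x4;
//!     /// Lanewise addition.
//!     fn add(self, a: Self::F32x4, b: Self::F32x4) -> Self::F32x4;
//!     /// Copy the lanes back out into an array.
//!     fn to_array(self, v: Self::F32x4) -> [f32; 4];
//! }
//!
//! /// Hypothetical portable backend that stores the lanes in a plain array.
//! #[derive(Clone, Copy)]
//! pub struct ScalarArch;
//!
//! impl Arch for ScalarArch {
//!     type F32x4 = [f32; 4];
//!
//!     fn from_array(self, values: [f32; 4]) -> [f32; 4] {
//!         values
//!     }
//!
//!     fn add(self, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
//!         std::array::from_fn(|i| a[i] + b[i])
//!     }
//!
//!     fn to_array(self, v: [f32; 4]) -> [f32; 4] {
//!         v
//!     }
//! }
//!
//! /// Generic algorithm: written once against `Arch`, then instantiated per backend.
//! pub fn add_arrays<A: Arch>(arch: A, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
//!     let va = arch.from_array(a);
//!     let vb = arch.from_array(b);
//!     arch.to_array(arch.add(va, vb))
//! }
//! ```
//!
//! A second backend would implement `Arch` with a different `F32x4` (for example, an
//! intrinsic register type), and `add_arrays` would work unchanged.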
//!
//! ## Structs
//!
//! Types implementing [`SIMDMask`] can take a variety of architecture-specific shapes.
//! To provide a uniform view, each architecture-specific [`SIMDMask`] is associated with a
//! [`BitMask`], where bit `i` is `1` if lane `i` of the full mask representation evaluates
//! to logical `true`, and `0` otherwise.
//!
//! Masks can be converted to and from their corresponding [`BitMask`] as needed.
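//!
//! The relationship between a full-width mask and its [`BitMask`] can be sketched with
//! `std` alone. This example assumes a hypothetical four-lane mask stored as `[u32; 4]`,
//! where each lane is all ones (`true`) or all zeros (`false`), packed into the low bits
//! of a `u8`. The functions `to_bitmask` and `from_bitmask` are illustrative and do not
//! describe how any particular `wide` backend stores masks.
//!
//! ```rust
//! /// Hypothetical: pack a full-width lane mask so that bit `i` is 1 iff lane `i` is true.
//! pub fn to_bitmask(lanes: [u32; 4]) -> u8 {
//!     let mut bits = 0u8;
//!     for (i, &lane) in lanes.iter().enumerate() {
//!         // Test the most significant bit of each lane, as many SIMD
//!         // "move mask" instructions do for all-ones/all-zeros masks.
//!         if lane >> 31 == 1 {
//!             bits |= 1u8 << i;
//!         }
//!     }
//!     bits
//! }
//!
//! /// Hypothetical: expand a bitmask back into all-ones / all-zeros lanes.
//! pub fn from_bitmask(bits: u8) -> [u32; 4] {
//!     std::array::from_fn(|i| if bits & (1u8 << i) != 0 { u32::MAX } else { 0 })
//! }
//! ```
//!
//! Packing `[u32::MAX, 0, u32::MAX, 0]` with `to_bitmask` yields `0b0101`, and
//! `from_bitmask(0b0101)` restores the original lanes.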
//!
//! ## Safety
//!
//! One source of unsafety in SIMD is the accidental use of an intrinsic that is not supported
//! by the current runtime CPU. `wide` makes this safe with the following strategy:
//!
//! * Each [`SIMDVector`] and [`SIMDMask`] type is uniquely associated with an [`Architecture`].
//!
//! * Constructing a new [`SIMDVector`] or [`SIMDMask`] requires either an instance of its
//!   associated architecture, or a [`SIMDVector`]/[`SIMDMask`] of the same [`Architecture`].
//!
//! * [`Architecture`] instances can only be obtained:
//!
//!   - From an instance of a [`SIMDVector`]/[`SIMDMask`] associated with that [`Architecture`].
//!   - From one of the safe constructors, such as [`arch::dispatch`] or `new_checked`, which
//!     perform the runtime checks needed to ensure compatibility.
//!   - Through an `unsafe` constructor, in which case the caller must guarantee that the
//!     runtime CPU supports the architecture.
//!
//! An [`Architecture`] is therefore needed to bootstrap the use of SIMD, but from then on,
//! the existence of SIMD types for a given [`Architecture`] serves as proof of safety.
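//!
//! The proof-of-safety idea can be sketched with a token type whose only safe constructor
//! performs a runtime CPU check. In this `std`-only sketch, `Avx2Token` and `Vec8` are
//! hypothetical names; the check uses `std`'s `is_x86_feature_detected!` macro on x86-64
//! and reports no support on other targets. For brevity the lanes live in a plain array,
//! whereas a real backend would wrap an intrinsic register type.
//!
//! ```rust
//! /// Hypothetical architecture token. The private field means code outside the
//! /// defining module can only obtain one through the constructors below.
//! #[derive(Clone, Copy, Debug)]
//! pub struct Avx2Token {
//!     _private: (),
//! }
//!
//! impl Avx2Token {
//!     /// Safe constructor: returns a token only if the running CPU supports AVX2.
//!     pub fn new_checked() -> Option<Self> {
//!         #[cfg(target_arch = "x86_64")]
//!         {
//!             if is_x86_feature_detected!("avx2") {
//!                 return Some(Self { _private: () });
//!             }
//!         }
//!         None
//!     }
//!
//!     /// Safety: the caller must guarantee that the running CPU supports AVX2.
//!     pub unsafe fn new_unchecked() -> Self {
//!         Self { _private: () }
//!     }
//! }
//!
//! /// Hypothetical vector type tied to the token: creating one requires a token, and
//! /// every value carries it, so holding a `Vec8` implies AVX2 was verified (or asserted
//! /// through the `unsafe` constructor).
//! #[derive(Clone, Copy, Debug)]
//! pub struct Vec8 {
//!     arch: Avx2Token,
//!     lanes: [f32; 8],
//! }
//!
//! impl Vec8 {
//!     pub fn splat(arch: Avx2Token, value: f32) -> Self {
//!         Self { arch, lanes: [value; 8] }
//!     }
//!
//!     /// Recover the token, so new vectors can be built without re-checking the CPU.
//!     pub fn arch(&self) -> Avx2Token {
//!         self.arch
//!     }
//!
//!     pub fn lanes(&self) -> [f32; 8] {
//!         self.lanes
//!     }
//! }
//! ```
//!
//! Calling `Avx2Token::new_checked()` once at startup and passing the token down mirrors
//! the bootstrap step described above; every later `Vec8` carries the proof with it.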
//!
//! ## Special Architectures
//!
//! Some [`Architecture`]s are special and always available to use safely:
//!
//! * [`arch::Scalar`]: An architecture that emulates SIMD operations with scalar loops.
//!   This architecture is safe because no special hardware intrinsics are invoked.
//!
//! * [`arch::Current`]: The [`Architecture`] that is the closest fit to the current
//!   compilation target. This is not always [`arch::Scalar`]. For example, when compiling
//!   for `x86-64-v3`, [`arch::Current`] is [`arch::x86_64::V3`]. This is safe because it
//!   only uses intrinsics that the compiler is already allowed to use for that target.
//!
//!   The current architecture can be obtained with [`arch::current()`] or the
//!   constant [`crate::ARCH`].
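//!
//! Both special architectures can be approximated in plain Rust. In the sketch below,
//! `ScalarF32x4` emulates a four-lane vector element by element, so it runs on any CPU, and
//! `current_level` uses `cfg!` on `target_feature` to pick the best level the compiler may
//! assume for this build, which is the idea behind [`arch::Current`]. The names are
//! hypothetical, and using `avx512f` and `avx2` as stand-ins for the full `x86-64-v4` and
//! `x86-64-v3` feature sets is a simplifying assumption.
//!
//! ```rust
//! /// Hypothetical four-lane `f32` vector emulated with scalar operations.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! pub struct ScalarF32x4(pub [f32; 4]);
//!
//! impl ScalarF32x4 {
//!     /// Lanewise `self * b + c`, computed as an unfused multiply followed by an add.
//!     pub fn mul_add(self, b: Self, c: Self) -> Self {
//!         Self(std::array::from_fn(|i| self.0[i] * b.0[i] + c.0[i]))
//!     }
//!
//!     /// Horizontal sum of the lanes, accumulated left to right.
//!     pub fn sum(self) -> f32 {
//!         self.0.iter().sum()
//!     }
//! }
//!
//! /// Hypothetical architecture levels, from least to most capable.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! pub enum Level {
//!     Scalar,
//!     V3,
//!     V4,
//! }
//!
//! /// Pick the highest level the compiler is allowed to assume for this build.
//! /// `cfg!` expands to a boolean literal at compile time, so no runtime check occurs.
//! pub const fn current_level() -> Level {
//!     if cfg!(all(target_arch = "x86_64", target_feature = "avx512f")) {
//!         Level::V4
//!     } else if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
//!         Level::V3
//!     } else {
//!         Level::Scalar
//!     }
//! }
//! ```
//!
//! Building with `RUSTFLAGS="-C target-cpu=x86-64-v3"` enables `avx2` but not `avx512f`,
//! so `current_level()` would return `Level::V3` for that build.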
//!
//! # Dev Docs
//!
//! ## Adding a New `TxN` Vector Type
//!
//! 1. Implement the type for the backends in `arch` (you can usually follow and slightly
//!    modify the existing examples).
//!
//! 2. Add the [`Emulated`] implementations for operations that require macro instantiation.
//!
//! 3. Add the type to the [`Architecture`] trait.
//!
//! At each step, be sure to include tests, which should be fairly straightforward.
//!
//! ## Adding a New Implementation to an Existing Trait
//!
//! Follow steps 1-3 of the list above.
//!
//! ## Adding a New Trait
//!
//! 1. If needed, provide a reference implementation in the `reference` module.
//!
//! 2. If it's a relatively simple op, adding a new macro in `test_utils/ops.rs` that
//!    invokes the reference implementation may be all that's needed.
//!
//!    More complicated operations may require their own test harness (see
//!    `test_utils/dot_product.rs`).
//!
//!    Tests should go through the utilities in `test_utils::driver` to ensure adequate
//!    coverage and low compile time.
//!
//! 3. Implement the trait for the needed types: [`Emulated`], the architecture-specific
//!    types, and [`Architecture`].
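//!
//! As a rough illustration of step 2, a test macro for a simple binary op can apply the
//! operation under test and compare every lane against a scalar reference implementation.
//! The macro `check_binary_op` and the functions `reference_add` and `emulated_add` are
//! hypothetical; the actual macros in `test_utils/ops.rs` and the `test_utils::driver`
//! utilities are not reproduced here.
//!
//! ```rust
//! /// Hypothetical scalar reference implementation.
//! fn reference_add(a: f32, b: f32) -> f32 {
//!     a + b
//! }
//!
//! /// Hypothetical four-lane implementation under test.
//! fn emulated_add(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
//!     std::array::from_fn(|i| a[i] + b[i])
//! }
//!
//! // Hypothetical: check a lanewise op against its scalar reference for every
//! // `(a, b)` input pair, panicking with the offending lane on mismatch.
//! macro_rules! check_binary_op {
//!     ($vector_op:expr, $reference:expr, $inputs:expr) => {
//!         for (a, b) in $inputs {
//!             let got = $vector_op(a, b);
//!             for lane in 0..got.len() {
//!                 let want = $reference(a[lane], b[lane]);
//!                 // Compare bit patterns so NaN lanes and signed zeros are checked exactly.
//!                 assert_eq!(
//!                     got[lane].to_bits(),
//!                     want.to_bits(),
//!                     "lane {}: got {}, expected {}",
//!                     lane,
//!                     got[lane],
//!                     want
//!                 );
//!             }
//!         }
//!     };
//! }
//! ```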
//!
//! # Testing and Architectural Levels
//!
//! By default, `wide` only runs tests supported by the current runtime hardware. This
//! allows the tests to pass on a wide variety of machines during development.
//!
//! However, this means that tests targeting an architecture not supported by the runtime
//! hardware will silently pass.
//!
//! To ensure that every test either runs or reports an error when the runtime hardware
//! does not support it, set the environment variable
//! ```text
//! WIDE_TEST_MIN_ARCH="all"
//! ```
//! Various backend-specific values are supported. Note that this variable sets the
//! minimum level of tests that are **required** to run. Tests for higher architecture
//! levels will still be run if supported by the runtime hardware.
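//!
//! The rules above can be summarized as a small decision function. This is a hypothetical
//! sketch rather than the crate's test driver: it assumes the x86-64 levels are totally
//! ordered (`scalar` < `x86-64-v3` < `x86-64-v4`), treats `all` as requiring every level,
//! and leaves out the other backend-specific values.
//!
//! ```rust
//! /// Hypothetical architecture levels, ordered from least to most capable.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
//! pub enum Level {
//!     Scalar,
//!     V3,
//!     V4,
//! }
//!
//! #[derive(Debug, PartialEq, Eq)]
//! pub enum Outcome {
//!     Run,
//!     Skip,
//!     Fail,
//! }
//!
//! /// Parse a `WIDE_TEST_MIN_ARCH`-style value into the minimum required level.
//! pub fn parse_min_arch(value: &str) -> Option<Level> {
//!     match value {
//!         // "all" requires every level, i.e. everything up to the highest one.
//!         "all" | "x86-64-v4" => Some(Level::V4),
//!         "x86-64-v3" => Some(Level::V3),
//!         "scalar" => Some(Level::Scalar),
//!         _ => None,
//!     }
//! }
//!
//! /// Decide what happens to a test targeting `test_level` when the hardware supports
//! /// up to `hw_level` and the environment requires at least `min`.
//! pub fn decide(test_level: Level, hw_level: Level, min: Option<Level>) -> Outcome {
//!     if test_level <= hw_level {
//!         // Supported tests always run, even above the required minimum.
//!         Outcome::Run
//!     } else if matches!(min, Some(m) if test_level <= m) {
//!         // Required but unsupported: fail instead of passing silently.
//!         Outcome::Fail
//!     } else {
//!         // Neither supported nor required: skip (the default behavior).
//!         Outcome::Skip
//!     }
//! }
//! ```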
//!
//! ## x86_64
//!
//! * `x86-64-v4`: Target `wide`'s [`arch::x86_64::V4`] architecture.
//! * `x86-64-v3`: Target `wide`'s [`arch::x86_64::V3`] architecture.
//! * `scalar`: Target the scalar architecture.

mod constant;
pub use constant::{Const, Constant, SupportedLaneCount};

pub(crate) mod reference;
pub use reference::{cast_f16_to_f32, cast_f32_to_f16};

mod traits;
pub use traits::{
    AsSIMD, SIMDAbs, SIMDCast, SIMDDotProduct, SIMDFloat, SIMDMask, SIMDMinMax, SIMDMulAdd,
    SIMDPartialEq, SIMDPartialOrd, SIMDReinterpret, SIMDSelect, SIMDSigned, SIMDSumTree,
    SIMDUnsigned, SIMDVector, ZipUnzip,
};

mod splitjoin;
pub use splitjoin::{LoHi, SplitJoin};

mod bitmask;
pub use bitmask::{BitMask, FromInt};

pub(crate) mod doubled;

mod emulated;
pub use emulated::Emulated;

pub mod lifetime;

/////////////////////////////
// Architecture Resolution //
/////////////////////////////

pub mod arch;
pub use arch::Architecture;

/// The current architecture that is the closest fit for the current compilation target.
pub const ARCH: arch::Current = arch::current();

///////////////////////
// Alias Definitions //
///////////////////////

/// Convenience macro for declaring type aliases to SIMD types.
///
/// There are currently four supported flavors (the examples below use `f32x4` as the
/// example identifier):
///
/// 1. `diskann_wide::alias!(f32x4) => type f32x4 = <diskann_wide::arch::Current as diskann_wide::Architecture>::f32x4`:
///    Aliases the compile-time architecture's type under its own name.
///
/// 2. `diskann_wide::alias!(f32s = f32x4) => type f32s = <diskann_wide::arch::Current as
///    diskann_wide::Architecture>::f32x4`: Aliases the compile-time architecture's type
///    under a custom name.
///
/// 3. `diskann_wide::alias!(f32s = <A>::f32x4) => type f32s = <A as diskann_wide::Architecture>::f32x4`:
///    Aliases a SIMD type from a specific architecture.
///
/// 4. `diskann_wide::alias!(f32s<A> = f32x4) => type f32s<A> = <A as diskann_wide::Architecture>::f32x4`:
///    Aliases a SIMD type in a generic context. This is useful for working around errors
///    like
///    ```text
///    use of generic parameter from outer item
///    ```
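///
/// The expansions above can be reproduced in a self-contained sketch. Here `Arch`,
/// `Portable`, and `my_alias!` are hypothetical stand-ins for [`Architecture`], an
/// architecture type, and this macro (only flavors 3 and 4 are mirrored); the sketch shows
/// the generated `type` items rather than calling `alias!` itself.
///
/// ```rust
/// /// Hypothetical architecture trait with a lowercase associated type, like `f32x4`.
/// pub trait Arch {
///     #[allow(non_camel_case_types)]
///     type f32x4;
/// }
///
/// /// Hypothetical architecture whose `f32x4` is a plain array.
/// pub struct Portable;
///
/// impl Arch for Portable {
///     type f32x4 = [f32; 4];
/// }
///
/// // Hypothetical, simplified mirror of flavors 3 and 4.
/// macro_rules! my_alias {
///     ($var:ident = <$arch:ty>::$type:ident) => {
///         #[allow(non_camel_case_types)]
///         type $var = <$arch as Arch>::$type;
///     };
///     ($var:ident<$arch:ident> = $type:ident) => {
///         #[allow(non_camel_case_types)]
///         type $var<$arch> = <$arch as Arch>::$type;
///     };
/// }
///
/// // Flavor 3: expands to `type f32s = <Portable as Arch>::f32x4;`.
/// my_alias!(f32s = <Portable>::f32x4);
///
/// /// Flavor 4: the alias declares its own parameter `A`, so it can be used inside a
/// /// generic function without referring to the function's generic parameter directly.
/// pub fn zeroed<A>() -> <A as Arch>::f32x4
/// where
///     A: Arch,
///     <A as Arch>::f32x4: Default,
/// {
///     my_alias!(f32v<A> = f32x4);
///     f32v::<A>::default()
/// }
/// ```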
#[macro_export]
macro_rules! alias {
    ($var:ident) => {
        $crate::alias!($var = $var);
    };
    ($var:ident = $type:ident) => {
        $crate::alias!($var = <$crate::arch::Current>::$type);
    };
    ($var:ident = <$arch:ty>::$type:ident) => {
        #[allow(non_camel_case_types)]
        type $var = <$arch as $crate::Architecture>::$type;
    };
    ($var:ident<$arch:ident> = $type:ident) => {
        #[allow(non_camel_case_types)]
        type $var<$arch> = <$arch as $crate::Architecture>::$type;
    };
}

//////////////
// Internal //
//////////////

#[cfg(all(test, any(target_arch = "x86_64", target_arch = "aarch64")))]
const TEST_MIN_ARCH: &str = "WIDE_TEST_MIN_ARCH";

#[cfg(all(test, any(target_arch = "x86_64", target_arch = "aarch64")))]
fn get_test_arch() -> Option<String> {
    match std::env::var(TEST_MIN_ARCH) {
        Ok(v) => Some(v),
        Err(e) => match e {
            std::env::VarError::NotPresent => None,
            std::env::VarError::NotUnicode(s) => panic!("could not parse test arch: {s:?}"),
        },
    }
}

pub(crate) mod helpers;

#[cfg(test)]
pub(crate) mod test_utils;

///////////
// Tests //
///////////

#[cfg(test)]
mod tests {
    use super::*;

    fn generic_architecture<A>(arch: A) -> f32
    where
        A: Architecture,
    {
        alias!(f32s<A> = f32x4);
        f32s::<A>::from_array(arch, [1.0, 2.0, 3.0, 4.0]).sum_tree()
    }

    #[test]
    fn test_generic() {
        assert_eq!(generic_architecture(arch::Scalar), 10.0);
    }
}