diskann_wide/
lib.rs

/*
 * Copyright (c) Microsoft Corporation.
 * Licensed under the MIT license.
 */

//! # Wide - Cross Architecture SIMD
//!
//! This crate attempts to provide (mostly) Miri-compatible, cross-platform SIMD with support
//! for light-weight architecture dispatching.
//!
//! ## Traits
//!
//! SIMD vectors are weird types as they behave both like scalars and containers. The
//! primary traits exposed by `wide` are:
//!
//! * [`SIMDVector`]: General trait for working with a SIMD vector, including creation and
//!   data access.
//!
//! * [`SIMDMask`]: Basically a SIMD boolean. Comparisons between [`SIMDVector`]s are done
//!   lanewise, with the mask containing the results for each lane. Each [`SIMDVector`] has
//!   an associated mask.
//!
//! * [`Architecture`]: SIMD instructions are architecture specific. Some server CPUs like
//!   new(ish) x86 models support AVX512, while most consumer CPUs do not yet support that
//!   instruction set extension.
//!
//!   To allow compilation of single binaries that support multiple architectures, `wide` has
//!   taken the position that the [`Architecture`] is largely explicit when it comes to SIMD
//!   types.
//!
//!   Generic, cross-architecture algorithms are still supported by using an
//!   [`Architecture`]'s associated SIMD types.
//!
//! A host of secondary SIMD related traits are also exported, all prefixed with `SIMD`.
//! Refer to the documentation on each trait for more information.
//!
//! ## Structs
//!
//! Types implementing [`SIMDMask`] can take a variety of architecture specific shapes.
//! To that end, each architecture-specific [`SIMDMask`] is associated with a [`BitMask`],
//! where bit `i` is set to `1` if the corresponding lane in the full mask representation
//! evaluates to a logical `true`, and `0` otherwise.
//!
//! Masks can be converted to and from their corresponding [`BitMask`] as needed.
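//!
//! As a rough, self-contained illustration of this encoding, the sketch below packs
//! per-lane booleans into an integer and back. The helpers `lanes_to_bits` and
//! `bits_to_lanes` are hypothetical stand-ins and are not part of this crate's API:
//!
//! ```rust
//! // Hypothetical helper: bit `i` of the result mirrors lane `i` of the mask.
//! fn lanes_to_bits(lanes: [bool; 4]) -> u8 {
//!     let mut bits = 0u8;
//!     for (i, &lane) in lanes.iter().enumerate() {
//!         if lane {
//!             bits |= 1 << i;
//!         }
//!     }
//!     bits
//! }
//!
//! // Hypothetical inverse: expand the low four bits back into per-lane booleans.
//! fn bits_to_lanes(bits: u8) -> [bool; 4] {
//!     core::array::from_fn(|i| ((bits >> i) & 1) == 1)
//! }
//!
//! fn main() {
//!     let a = [1.0f32, 5.0, 3.0, 8.0];
//!     let b = [2.0f32, 4.0, 3.0, 9.0];
//!     // Emulated lanewise `a < b`: only lanes 0 and 3 compare true.
//!     let lanes: [bool; 4] = core::array::from_fn(|i| a[i] < b[i]);
//!     let bits = lanes_to_bits(lanes);
//!     assert_eq!(bits, 0b1001);
//!     assert_eq!(bits_to_lanes(bits), lanes);
//! }
//! ```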
//!
//! ## Safety
//!
//! One source of unsafety in SIMD is the accidental use of an intrinsic that is not supported
//! by the current runtime CPU. This is made safe in `wide` by using the following strategy:
//!
//! * Each [`SIMDVector`] and [`SIMDMask`] type is uniquely associated with an [`Architecture`].
//!
//! * Construction of a new [`SIMDVector`] or [`SIMDMask`] requires either an instance of its
//!   associated architecture, or a [`SIMDVector`]/[`SIMDMask`] of the same [`Architecture`].
//!
//! * [`Architecture`] instances can only be obtained:
//!
//!   - From an instance of a [`SIMDVector`]/[`SIMDMask`] associated with that [`Architecture`].
//!   - From one of the safe constructors like [`arch::dispatch`] or `new_checked`, which
//!     perform the runtime checks necessary to ensure compatibility.
//!   - Through an `unsafe` constructor, in which case all bets are off.
//!
//! So an [`Architecture`] is needed to bootstrap the use of SIMD, but from then on, the
//! existence of SIMD types for a given [`Architecture`] serves as proof-of-safety.
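//!
//! The bootstrap-then-proof idea can be sketched in a self-contained way as follows. All
//! names here (`avx2_available`, `Avx2Token`, `F32x8`) are hypothetical, and the "vector"
//! is a plain array rather than a hardware register, so only the shape of the pattern is
//! meant to carry over, not this crate's actual types:
//!
//! ```rust
//! // Runtime feature check, with a fallback for non-x86_64 targets.
//! #[cfg(target_arch = "x86_64")]
//! fn avx2_available() -> bool {
//!     std::arch::is_x86_feature_detected!("avx2")
//! }
//!
//! #[cfg(not(target_arch = "x86_64"))]
//! fn avx2_available() -> bool {
//!     false
//! }
//!
//! // Hypothetical zero-sized token: holding one is evidence that the check passed.
//! // In a real crate the private field keeps other modules from forging a token.
//! #[derive(Clone, Copy, Debug)]
//! struct Avx2Token(());
//!
//! impl Avx2Token {
//!     // Safe constructor: only hands out a token after the runtime check succeeds.
//!     fn new_checked() -> Option<Self> {
//!         if avx2_available() { Some(Self(())) } else { None }
//!     }
//!
//!     // Unsafe constructor: the caller promises that AVX2 is supported.
//!     #[allow(dead_code)]
//!     unsafe fn new_unchecked() -> Self {
//!         Self(())
//!     }
//! }
//!
//! // Hypothetical vector type that cannot be built without a token.
//! struct F32x8 {
//!     data: [f32; 8],
//!     arch: Avx2Token,
//! }
//!
//! impl F32x8 {
//!     fn from_array(arch: Avx2Token, data: [f32; 8]) -> Self {
//!         Self { data, arch }
//!     }
//!
//!     // An existing vector can hand back its token, so no second check is needed.
//!     fn arch(&self) -> Avx2Token {
//!         self.arch
//!     }
//!
//!     fn splat_like(&self, value: f32) -> Self {
//!         Self::from_array(self.arch(), [value; 8])
//!     }
//! }
//!
//! fn main() {
//!     match Avx2Token::new_checked() {
//!         Some(arch) => {
//!             let v = F32x8::from_array(arch, [1.0; 8]);
//!             let w = v.splat_like(2.0);
//!             println!("AVX2 available, first lane: {}", w.data[0]);
//!         }
//!         None => println!("AVX2 not available, fall back to a scalar path"),
//!     }
//! }
//! ```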
//!
//! ## Special Architectures
//!
//! Some [`Architecture`]s are special and always available to use safely:
//!
//! * [`arch::Scalar`]: An architecture that uses emulation via loops to implement
//!   SIMD-like operations. This architecture is safe because no special hardware intrinsics
//!   are invoked.
//!
//! * [`arch::Current`]: The [`Architecture`] that is the closest fit to the current
//!   compilation target. This is not always [`arch::Scalar`]. For example, if compiling
//!   for `x86-64-v3`, then [`arch::Current`] will be [`arch::x86_64::V3`]. This is
//!   safe because it only uses intrinsics that are already available for the compiler to use.
//!
//!   The current architecture can be obtained with [`arch::current()`] or the
//!   constant [`crate::ARCH`].
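//!
//! One way such a compile-time selection could be expressed is sketched below. This is
//! not the crate's implementation: the marker types, the `selected` module, and the use of
//! AVX2 plus FMA as a stand-in for the full `x86-64-v3` feature set are all assumptions
//! made for illustration.
//!
//! ```rust
//! // Hypothetical marker types standing in for architecture levels.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! pub struct Scalar;
//!
//! #[allow(dead_code)]
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! pub struct V3;
//!
//! // Assumption: AVX2 + FMA approximate the `x86-64-v3` level.
//! #[cfg(all(target_arch = "x86_64", target_feature = "avx2", target_feature = "fma"))]
//! mod selected {
//!     pub type Current = super::V3;
//!     pub const fn current() -> Current {
//!         super::V3
//!     }
//! }
//!
//! // Everything else falls back to the always-available scalar level.
//! #[cfg(not(all(target_arch = "x86_64", target_feature = "avx2", target_feature = "fma")))]
//! mod selected {
//!     pub type Current = super::Scalar;
//!     pub const fn current() -> Current {
//!         super::Scalar
//!     }
//! }
//!
//! use selected::{current, Current};
//!
//! // Usable in `const` context because the choice is made entirely at compile time.
//! const ARCH: Current = current();
//!
//! fn main() {
//!     // Which type is printed depends on the target features enabled for the build.
//!     let arch: Current = ARCH;
//!     println!("selected architecture: {arch:?}");
//! }
//! ```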
//!
//! # Dev Docs
//!
//! ## Adding a New `TxN` Vector Type
//!
//! 1. Implement the type for the backends in `arch` (you can usually follow and slightly
//!    modify the existing examples).
//!
//! 2. Implement it for `Emulated` wherever the implementation requires macro instantiation.
//!
//! 3. Add the type to the [`Architecture`] trait.
//!
//! At each step, be sure to include tests, which should be fairly straightforward.
//!
//! ## Adding a New Implementation to an Existing Trait
//!
//! Basically do steps 2-4 of the above list.
//!
//! ## Adding a New Trait
//!
//! 1. If needed, provide a reference implementation in the `reference` module.
//!
//! 2. If it's a relatively simple op, adding a new macro in `test_utils/ops.rs` that
//!    invokes the reference implementation may be all that's needed.
//!
//!    More complicated operations may require their own test harness (see
//!    `test_utils/dot_product.rs`).
//!
//!    Tests should go through the utilities in `test_utils::driver` to ensure adequate
//!    coverage and low compile time.
//!
//! 3. Implement the trait for the needed types: [`Emulated`], the architecture-specific
//!    types, and [`Architecture`].
115//! # Testing and Architectural Levels
116//!
117//! By default, `wide` will only run tests supported by the current runtime hardware. This
118//! allows the tests to pass on a wide variety of machines during development.
119//!
120//! However, this can mean that tests targeting architecture not supported by the runtime
121//! hardware will silently succeed.
122//!
123//! To ensure all tests either run, or generate an error if the runtime hardware does not
124//! support a test, set the environment variable
125//! ```text
126//! WIDE_TEST_MIN_ARCH="all"
127//! ```
128//! Various back-end specific values are supported. Note that this variable sets the
129//! minimum level of tests that are **required** to run. Tests for higher architecture
130//! levels will still be run if supported by the runtime hardware.
131//!
132//! ## x86_64
133//!
134//! * `x86-64-v4`: Target Wide's [`arch::x86_64::V4`] architecture.
135//! * `x86-64-v3`: Target Wide's [`arch::x86_64::V3`] architecture.
136//! * `scalar`: Target the scalar architecture.
137
138mod constant;
139pub use constant::{Const, Constant, SupportedLaneCount};
140
141pub(crate) mod reference;
142pub use reference::{cast_f16_to_f32, cast_f32_to_f16};
143
144mod traits;
145pub use traits::{
146    AsSIMD, SIMDAbs, SIMDCast, SIMDDotProduct, SIMDFloat, SIMDMask, SIMDMinMax, SIMDMulAdd,
147    SIMDPartialEq, SIMDPartialOrd, SIMDReinterpret, SIMDSelect, SIMDSigned, SIMDSumTree,
148    SIMDUnsigned, SIMDVector,
149};
150
151mod splitjoin;
152pub use splitjoin::{LoHi, SplitJoin};
153
154mod bitmask;
155pub use bitmask::{BitMask, FromInt};
156
157#[cfg(target_arch = "x86_64")]
158pub(crate) mod doubled;
159
160mod emulated;
161pub use emulated::Emulated;
162
163pub mod lifetime;
164
165/////////////////////////////
166// Architecture Resolution //
167/////////////////////////////
168
169pub mod arch;
170pub use arch::Architecture;
171
172/// The current architecture that is the closest fit for the current compilation target.
173///
174/// The type [`Wide`] is always configured to use this as its associated architecture type.
175pub const ARCH: arch::Current = arch::current();
176
177///////////////////////
178// Alias Definitions //
179///////////////////////
180
181/// Convenience aliases for aliasing SIMD types.
182///
183/// There are currently four supported flavors (the examples below use `f32x4` as an example
184/// identifier:
185///
186/// 1. `diskann_wide::alias!(f32x4) => type f32x4 = <diskann_wide::arch::Current as diskann_wide::Architecture>::f32x4`:
187///    Type alias directly to the compile-time architecture's type.
188///
189/// 2. `diskann_wide::alias!(f32s = f32x4) => type f32s = <diskann_wide::arch::Current as
190///    diskann_wide::Architecture>::f32x4`: Type alias a SIMD type with a custom name.
191///
192/// 3. `diskann_wide::alias!(f32s = <A>::f32x4) => type f32s = <A as diskann_wide::Architecture>::f32x4`:
193///    Type alias a SIMD type from a specific architecture.
194///
195/// 4. `diskann_wide::alias!(f32s<A> = f32x4) => type f32s<A> = <A as diskann_wide::Architecture>::f32x4`:
196///    Type alias a SIMD type in a generic context. This can be useful to work around errors
197///    like
198///    ```text
199///    use of generic parameter from outer item
200///    ```
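///
/// The generic-context workaround in flavor 4 can be illustrated without this crate. In
/// the minimal sketch below, `Arch`, `Scalar`, and `default_lanes` are hypothetical
/// stand-ins (a reduced trait with a single associated type), and the alias is written
/// out by hand instead of being produced by the macro:
///
/// ```rust
/// // Hypothetical, reduced stand-in for an architecture trait.
/// #[allow(non_camel_case_types)]
/// trait Arch {
///     type f32x4: Default;
/// }
///
/// struct Scalar;
///
/// impl Arch for Scalar {
///     type f32x4 = [f32; 4];
/// }
///
/// fn default_lanes<A: Arch>() -> A::f32x4 {
///     // A plain `type f32s = <A as Arch>::f32x4;` is rejected here, because a nested
///     // item cannot refer to the generic parameter `A` of the enclosing function.
///     // Giving the alias its own parameter avoids the problem.
///     #[allow(non_camel_case_types)]
///     type f32s<A> = <A as Arch>::f32x4;
///     f32s::<A>::default()
/// }
///
/// fn main() {
///     assert_eq!(default_lanes::<Scalar>(), [0.0f32; 4]);
/// }
/// ```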
#[macro_export]
macro_rules! alias {
    ($var:ident) => {
        $crate::alias!($var = $var);
    };
    ($var:ident = $type:ident) => {
        // Use `$crate` rather than a hard-coded crate name so the expansion also resolves
        // inside this crate and when the crate is renamed by a dependent.
        $crate::alias!($var = <$crate::arch::Current>::$type);
    };
    ($var:ident = <$arch:ty>::$type:ident) => {
        #[allow(non_camel_case_types)]
        type $var = <$arch as $crate::Architecture>::$type;
    };
    ($var:ident<$arch:ident> = $type:ident) => {
        #[allow(non_camel_case_types)]
        type $var<$arch> = <$arch as $crate::Architecture>::$type;
    };
}

//////////////
// Internal //
//////////////

/// Name of the environment variable that sets the minimum required test architecture.
#[cfg(all(test, target_arch = "x86_64"))]
const TEST_MIN_ARCH: &str = "WIDE_TEST_MIN_ARCH";

/// Read [`TEST_MIN_ARCH`] from the environment.
///
/// Returns `None` if the variable is not set and panics if its value is not valid Unicode.
#[cfg(all(test, target_arch = "x86_64"))]
fn get_test_arch() -> Option<String> {
    match std::env::var(TEST_MIN_ARCH) {
        Ok(v) => Some(v),
        Err(e) => match e {
            std::env::VarError::NotPresent => None,
            std::env::VarError::NotUnicode(s) => panic!("could not parse test arch: {s:?}"),
        },
    }
}

#[cfg(not(target_arch = "aarch64"))]
pub(crate) mod helpers;

#[cfg(test)]
pub(crate) mod test_utils;

///////////
// Tests //
///////////

#[cfg(test)]
mod tests {
    use super::*;

    // Exercise the generic flavor of `alias!` with an arbitrary architecture.
    fn generic_architecture<A>(arch: A) -> f32
    where
        A: Architecture,
    {
        alias!(f32s<A> = f32x4);
        f32s::<A>::from_array(arch, [1.0, 2.0, 3.0, 4.0]).sum_tree()
    }

    #[test]
    fn test_generic() {
        assert_eq!(generic_architecture(arch::Scalar), 10.0);
    }
}