diskann_wide/lib.rs
/*
 * Copyright (c) Microsoft Corporation.
 * Licensed under the MIT license.
 */

//! # Wide - Cross Architecture SIMD
//!
//! This crate attempts to provide (mostly) Miri-compatible, cross-platform SIMD with support
//! for light-weight architecture dispatching.
//!
//! ## Traits
//!
//! SIMD vectors are weird types as they behave both like scalars and containers. The primary
//! traits exposed by `wide` are:
//!
//! * [`SIMDVector`]: General trait for working with a SIMD vector, including creation and
//!   data access.
//!
//! * [`SIMDMask`]: Basically a SIMD boolean. Comparisons between [`SIMDVector`]s are done
//!   lanewise, with the mask containing the results for each lane. Each [`SIMDVector`] has
//!   an associated mask.
//!
//! * [`Architecture`]: SIMD instructions are architecture specific. Some server CPUs like
//!   new(ish) x86 models support AVX512, while most consumer CPUs do not yet support that
//!   instruction set extension.
//!
//! To allow compilation of single binaries that support multiple architectures, `wide` has
//! taken the position that the [`Architecture`] is largely explicit when it comes to SIMD
//! types.
//!
//! Generic, cross-architecture algorithms are still supported by using an [`Architecture`]'s
//! associated SIMD types.
//!
//! A host of secondary SIMD related traits are also exported, all prefixed with `SIMD`.
//! Refer to the documentation on each trait for more information.
//!
//! ## Structs
//!
//! Types implementing [`SIMDMask`] can take a variety of architecture specific shapes.
//! To that end, each architecture-specific [`SIMDMask`] is associated with a [`BitMask`],
//! where bit `i` is set to `1` if the corresponding lane in the full mask representation
//! evaluates to a logical `true`, and `0` otherwise.
//!
//! Masks can be converted to and from their corresponding [`BitMask`] as needed.
//!
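//! As a rough, self-contained illustration of that bit layout (not the crate's actual
//! implementation; `pack_lanes` and `unpack_lanes` are hypothetical helpers), a lanewise
//! boolean mask can be packed into an integer and recovered again:
//!
//! ```rust
//! /// Pack eight lanewise booleans into a `u8`, where bit `i` mirrors lane `i`.
//! fn pack_lanes(lanes: [bool; 8]) -> u8 {
//!     lanes
//!         .iter()
//!         .enumerate()
//!         .fold(0u8, |bits, (i, &lane)| bits | ((lane as u8) << i))
//! }
//!
//! /// Expand a packed `u8` back into eight lanewise booleans.
//! fn unpack_lanes(bits: u8) -> [bool; 8] {
//!     std::array::from_fn(|i| (bits >> i) & 1 == 1)
//! }
//! ```
//!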
//! ## Safety
//!
//! One source of unsafety in SIMD is the accidental use of an intrinsic that is not supported
//! by the current runtime CPU. This is made safe in `wide` by using the following strategy:
//!
//! * Each [`SIMDVector`] and [`SIMDMask`] type is uniquely associated with an [`Architecture`].
//!
//! * Construction of a new [`SIMDVector`] or [`SIMDMask`] requires either an instance of its
//!   associated architecture, or a [`SIMDVector`]/[`SIMDMask`] of the same [`Architecture`].
//!
//! * [`Architecture`] instances can only be obtained:
//!
//!   - From an instance of a [`SIMDVector`]/[`SIMDMask`] associated with that [`Architecture`].
//!   - From one of the safe constructors like [`arch::dispatch`] or `new_checked`, which
//!     perform the runtime checks necessary to ensure compatibility.
//!   - Through an `unsafe` constructor, in which case all bets are off.
//!
//! So an [`Architecture`] is needed to bootstrap the use of SIMD, but from then on, the
//! existence of SIMD types for a given [`Architecture`] serves as proof-of-safety.
//!
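//! The bootstrap pattern above can be sketched in isolation. This is an illustrative model
//! rather than the crate's implementation: `Avx2Token` and `Vector8` are hypothetical names,
//! and the plain array lanes stand in for real intrinsics.
//!
//! ```rust
//! /// Hypothetical architecture token. Holding a value is the proof that the
//! /// runtime check passed (or that a caller vouched for it via `unsafe`).
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! pub struct Avx2Token(());
//!
//! impl Avx2Token {
//!     /// Safe constructor: succeeds only if the running CPU reports AVX2.
//!     pub fn new_checked() -> Option<Self> {
//!         #[cfg(target_arch = "x86_64")]
//!         {
//!             if std::arch::is_x86_feature_detected!("avx2") {
//!                 return Some(Self(()));
//!             }
//!         }
//!         None
//!     }
//!
//!     /// # Safety
//!     ///
//!     /// The caller must guarantee that AVX2 is available at runtime.
//!     pub unsafe fn new_unchecked() -> Self {
//!         Self(())
//!     }
//! }
//!
//! /// Hypothetical vector type that can only be built from a token, so its
//! /// existence implies that a token was obtained first.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! pub struct Vector8 {
//!     arch: Avx2Token,
//!     lanes: [f32; 8],
//! }
//!
//! impl Vector8 {
//!     /// Bootstrap a vector from an architecture token.
//!     pub fn splat(arch: Avx2Token, value: f32) -> Self {
//!         Self { arch, lanes: [value; 8] }
//!     }
//!
//!     /// Recover the token from an existing vector.
//!     pub fn arch(&self) -> Avx2Token {
//!         self.arch
//!     }
//!
//!     /// Copy the lanes out as a plain array.
//!     pub fn to_array(self) -> [f32; 8] {
//!         self.lanes
//!     }
//! }
//! ```
//!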
//! ## Special Architectures
//!
//! Some [`Architecture`]s are special and always available to use safely:
//!
//! * [`arch::Scalar`]: An architecture that uses emulation via loops to implement
//!   SIMD-like operations. This architecture is safe because no special hardware intrinsics
//!   are invoked.
//!
//! * [`arch::Current`]: The [`Architecture`] that is the closest fit to the current
//!   compilation target. This is not always [`arch::Scalar`]. For example, if compiling
//!   for `x86-64-v3`, then [`arch::Current`] will be [`arch::x86_64::V3`]. This is
//!   safe because it only uses intrinsics that are already available for the compiler to use.
//!
//! The current architecture can be obtained with [`arch::current()`] or the
//! constant [`crate::ARCH`].
//!
//! # Dev Docs
//!
//! ## Adding a New `TxN` Vector Type
//!
//! 1. Implement the type for the backends in `arch` (you can usually follow and slightly
//!    modify the existing examples).
//!
//! 2. Implement the type for `Emulated`, covering the implementations that require macro
//!    instantiation.
//!
//! 3. Add the type to the [`Architecture`] trait.
//!
//! At each step, be sure to include tests, which should be fairly straightforward.
//!
//! ## Adding a New Implementation to an Existing Trait
//!
//! Follow the applicable implementation steps of the above list, along with their tests.
//!
//! ## Adding a New Trait
//!
//! 1. If needed, provide a reference implementation in the `reference` module.
//!
//! 2. If it's a relatively simple op, adding a new macro in `test_utils/ops.rs` that
//!    invokes the reference implementation may be all that's needed.
//!
//!    More complicated operations may require their own test harness (see
//!    `test_utils/dot_product.rs`).
//!
//!    Tests should go through the utilities in `test_utils::driver` to ensure adequate
//!    coverage and low compile time.
//!
//! 3. Implement the trait for the needed types: [`Emulated`], the architecture-specific
//!    types, and [`Architecture`].
//!
//! # Testing and Architectural Levels
//!
//! By default, `wide` will only run tests supported by the current runtime hardware. This
//! allows the tests to pass on a wide variety of machines during development.
//!
//! However, this can mean that tests targeting an architecture not supported by the runtime
//! hardware will silently succeed.
//!
//! To ensure all tests either run, or generate an error if the runtime hardware does not
//! support a test, set the environment variable:
//! ```text
//! WIDE_TEST_MIN_ARCH="all"
//! ```
//! Various back-end specific values are supported. Note that this variable sets the
//! minimum level of tests that are **required** to run. Tests for higher architecture
//! levels will still be run if supported by the runtime hardware.
//!
//! ## x86_64
//!
//! * `x86-64-v4`: Target Wide's [`arch::x86_64::V4`] architecture.
//! * `x86-64-v3`: Target Wide's [`arch::x86_64::V3`] architecture.
//! * `scalar`: Target the scalar architecture.

mod constant;
pub use constant::{Const, Constant, SupportedLaneCount};

pub(crate) mod reference;
pub use reference::{cast_f16_to_f32, cast_f32_to_f16};

mod traits;
pub use traits::{
    AsSIMD, SIMDAbs, SIMDCast, SIMDDotProduct, SIMDFloat, SIMDMask, SIMDMinMax, SIMDMulAdd,
    SIMDPartialEq, SIMDPartialOrd, SIMDReinterpret, SIMDSelect, SIMDSigned, SIMDSumTree,
    SIMDUnsigned, SIMDVector,
};

mod splitjoin;
pub use splitjoin::{LoHi, SplitJoin};

mod bitmask;
pub use bitmask::{BitMask, FromInt};

#[cfg(target_arch = "x86_64")]
pub(crate) mod doubled;

mod emulated;
pub use emulated::Emulated;

pub mod lifetime;

/////////////////////////////
// Architecture Resolution //
/////////////////////////////

pub mod arch;
pub use arch::Architecture;

/// The current architecture that is the closest fit for the current compilation target.
///
/// The type [`Wide`] is always configured to use this as its associated architecture type.
pub const ARCH: arch::Current = arch::current();

///////////////////////
// Alias Definitions //
///////////////////////

/// Convenience macro for aliasing SIMD types.
///
/// There are currently four supported flavors (the examples below use `f32x4` as an example
/// identifier):
///
/// 1. `diskann_wide::alias!(f32x4) => type f32x4 = <diskann_wide::arch::Current as diskann_wide::Architecture>::f32x4`:
///    Type alias directly to the compile-time architecture's type.
///
/// 2. `diskann_wide::alias!(f32s = f32x4) => type f32s = <diskann_wide::arch::Current as
///    diskann_wide::Architecture>::f32x4`: Type alias a SIMD type with a custom name.
///
/// 3. `diskann_wide::alias!(f32s = <A>::f32x4) => type f32s = <A as diskann_wide::Architecture>::f32x4`:
///    Type alias a SIMD type from a specific architecture.
///
/// 4. `diskann_wide::alias!(f32s<A> = f32x4) => type f32s<A> = <A as diskann_wide::Architecture>::f32x4`:
///    Type alias a SIMD type in a generic context. This can be useful to work around errors
///    like
///    ```text
///    use of generic parameter from outer item
///    ```
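///
/// That error arises because items declared inside a function body, type aliases included,
/// cannot capture the function's generic parameters. A self-contained sketch of the
/// workaround follows; the `Architecture` trait, `Scalar` type, and `f32x4_size` function
/// here are simplified stand-ins for illustration, not this crate's definitions:
///
/// ```rust
/// /// Stand-in for an architecture trait with an associated SIMD type.
/// #[allow(non_camel_case_types)]
/// trait Architecture {
///     type f32x4;
/// }
///
/// /// Stand-in scalar architecture whose `f32x4` is a plain array.
/// struct Scalar;
///
/// #[allow(non_camel_case_types)]
/// impl Architecture for Scalar {
///     type f32x4 = [f32; 4];
/// }
///
/// /// Size in bytes of the given architecture's `f32x4`.
/// fn f32x4_size<A: Architecture>() -> usize {
///     // `type f32s = <A as Architecture>::f32x4;` would be rejected here (E0401),
///     // because the alias is an inner item and cannot see the outer `A`.
///     // Giving the alias its own parameter, as flavor 4 does, avoids that.
///     #[allow(non_camel_case_types)]
///     type f32s<A> = <A as Architecture>::f32x4;
///     std::mem::size_of::<f32s<A>>()
/// }
/// ```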
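#[macro_export]
macro_rules! alias {
    // `alias!(f32x4)`: alias with the same name as the underlying type.
    ($var:ident) => {
        $crate::alias!($var = $var);
    };
    // `alias!(f32s = f32x4)`: custom name, compile-time architecture.
    // `$crate` keeps the path valid inside this crate and under renamed imports.
    ($var:ident = $type:ident) => {
        $crate::alias!($var = <$crate::arch::Current>::$type);
    };
    // `alias!(f32s = <A>::f32x4)`: custom name, explicit architecture.
    ($var:ident = <$arch:ty>::$type:ident) => {
        #[allow(non_camel_case_types)]
        type $var = <$arch as $crate::Architecture>::$type;
    };
    // `alias!(f32s<A> = f32x4)`: alias that is generic over the architecture.
    ($var:ident<$arch:ident> = $type:ident) => {
        #[allow(non_camel_case_types)]
        type $var<$arch> = <$arch as $crate::Architecture>::$type;
    };
}
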
//////////////
// Internal //
//////////////

#[cfg(all(test, target_arch = "x86_64"))]
const TEST_MIN_ARCH: &str = "WIDE_TEST_MIN_ARCH";

#[cfg(all(test, target_arch = "x86_64"))]
fn get_test_arch() -> Option<String> {
    match std::env::var(TEST_MIN_ARCH) {
        Ok(v) => Some(v),
        Err(e) => match e {
            std::env::VarError::NotPresent => None,
            std::env::VarError::NotUnicode(s) => panic!("could not parse test arch: {s:?}"),
        },
    }
}

#[cfg(not(target_arch = "aarch64"))]
pub(crate) mod helpers;

#[cfg(test)]
pub(crate) mod test_utils;

///////////
// Tests //
///////////

#[cfg(test)]
mod tests {
    use super::*;

    fn generic_architecture<A>(arch: A) -> f32
    where
        A: Architecture,
    {
        alias!(f32s<A> = f32x4);
        f32s::<A>::from_array(arch, [1.0, 2.0, 3.0, 4.0]).sum_tree()
    }

    #[test]
    fn test_generic() {
        assert_eq!(generic_architecture(arch::Scalar), 10.0);
    }
}