diskann_wide/lib.rs
/*
 * Copyright (c) Microsoft Corporation.
 * Licensed under the MIT license.
 */

//! # Wide - Cross Architecture SIMD
//!
//! This crate attempts to provide (mostly) Miri-compatible, cross-platform SIMD with support
//! for lightweight architecture dispatching.
//!
//! ## Traits
//!
//! SIMD vectors are unusual types in that they behave both like scalars and like containers.
//! The primary traits exposed by `wide` are:
//!
//! * [`SIMDVector`]: General trait for working with a SIMD vector, including creation and
//!   data access.
//!
//! * [`SIMDMask`]: Essentially a SIMD boolean. Comparisons between [`SIMDVector`]s are done
//!   lanewise, with the mask containing the result for each lane. Each [`SIMDVector`] has
//!   an associated mask.
//!
//! * [`Architecture`]: SIMD instructions are architecture specific. Some server CPUs like
//!   new(ish) x86 models support AVX512, while most consumer CPUs do not yet support that
//!   instruction set extension.
//!
//! To allow compilation of single binaries that support multiple architectures, `wide` has
//! taken the position that the [`Architecture`] is largely explicit when it comes to SIMD
//! types.
//!
//! Generic, cross-architecture algorithms are still supported by using an [`Architecture`]'s
//! associated SIMD types.
//!
//! A host of secondary SIMD-related traits are also exported, all prefixed with `SIMD`.
//! Refer to the documentation on each trait for more information.
//!
//! ## Structs
//!
//! Types implementing [`SIMDMask`] can take a variety of architecture-specific shapes.
//! To that end, each architecture-specific [`SIMDMask`] is associated with a [`BitMask`],
//! where bit `i` is set to `1` if the corresponding lane in the full mask representation
//! evaluates to a logical `true`, and `0` otherwise.
//!
//! Masks can be converted to and from their corresponding [`BitMask`] as needed.
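//!
//! As a rough, standalone illustration of this lane-to-bit correspondence, a 4-lane
//! comparison result can be packed into the low bits of a `u8` and unpacked again. This is
//! a sketch only and not the crate's [`BitMask`] API: `less_than`, `lanes_to_bits`, and
//! `bits_to_lanes` are hypothetical helpers that operate on plain arrays.
//!
//! ```rust
//! /// Lanewise `a < b`, emulated with a loop over plain arrays (hypothetical helper).
//! fn less_than(a: [f32; 4], b: [f32; 4]) -> [bool; 4] {
//!     std::array::from_fn(|i| a[i] < b[i])
//! }
//!
//! /// Pack per-lane booleans into an integer: bit `i` is 1 when lane `i` is `true`.
//! fn lanes_to_bits(lanes: [bool; 4]) -> u8 {
//!     let mut bits = 0u8;
//!     for (i, &lane) in lanes.iter().enumerate() {
//!         if lane {
//!             bits |= 1 << i;
//!         }
//!     }
//!     bits
//! }
//!
//! /// Unpack the low four bits back into per-lane booleans.
//! fn bits_to_lanes(bits: u8) -> [bool; 4] {
//!     std::array::from_fn(|i| (bits >> i) & 1 == 1)
//! }
//! ```
//!
//! With these helpers, comparing `[1.0, 5.0, 2.0, 9.0]` against `[2.0, 4.0, 3.0, 8.0]`
//! sets lanes 0 and 2, which packs to `0b0101`.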
//!
//! ## Safety
//!
//! One source of unsafety in SIMD is the accidental use of an intrinsic that is not supported
//! by the current runtime CPU. This is made safe in `wide` by using the following strategy:
//!
//! * Each [`SIMDVector`] and [`SIMDMask`] type is uniquely associated with an [`Architecture`].
//!
//! * Construction of a new [`SIMDVector`] or [`SIMDMask`] requires either an instance of its
//!   associated architecture, or a [`SIMDVector`]/[`SIMDMask`] of the same [`Architecture`].
//!
//! * [`Architecture`] instances can only be obtained:
//!
//!   - From an instance of a [`SIMDVector`]/[`SIMDMask`] associated with that [`Architecture`].
//!   - From one of the safe constructors like [`arch::dispatch`] or `new_checked`, which
//!     perform the runtime checks necessary to ensure compatibility.
//!   - Through an `unsafe` constructor, in which case all bets are off.
//!
//! So an [`Architecture`] is needed to bootstrap the use of SIMD, but from then on, the
//! existence of SIMD types for a given [`Architecture`] serves as proof-of-safety.
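//!
//! The bootstrapping idea can be sketched in isolation as a zero-sized "proof token". This
//! is only an illustration of the pattern under simplifying assumptions: `Avx2Token` and
//! `Lanes4` are hypothetical names that do not exist in `wide`, and the sketch stores plain
//! arrays instead of calling real intrinsics.
//!
//! ```rust
//! /// Zero-sized token whose existence is meant to prove that AVX2 is usable.
//! /// The private field prevents construction outside the defining module.
//! #[derive(Clone, Copy, Debug)]
//! pub struct Avx2Token(());
//!
//! impl Avx2Token {
//!     /// Safe constructor: performs the runtime check before handing out the token.
//!     pub fn new_checked() -> Option<Self> {
//!         #[cfg(target_arch = "x86_64")]
//!         {
//!             if std::arch::is_x86_feature_detected!("avx2") {
//!                 return Some(Self(()));
//!             }
//!         }
//!         None
//!     }
//!
//!     /// Unchecked constructor.
//!     ///
//!     /// # Safety
//!     /// The caller must guarantee that AVX2 is supported by the running CPU.
//!     pub unsafe fn new_unchecked() -> Self {
//!         Self(())
//!     }
//! }
//!
//! /// A vector-like type that can only be built from a token.
//! #[derive(Clone, Copy, Debug)]
//! pub struct Lanes4 {
//!     token: Avx2Token,
//!     data: [f32; 4],
//! }
//!
//! impl Lanes4 {
//!     /// Construction requires the token, so a `Lanes4` cannot exist without one.
//!     pub fn from_array(token: Avx2Token, data: [f32; 4]) -> Self {
//!         Self { token, data }
//!     }
//!
//!     /// An existing vector can hand back its token, so no new runtime check is needed.
//!     pub fn arch(self) -> Avx2Token {
//!         self.token
//!     }
//!
//!     /// Copy the lanes back out as a plain array.
//!     pub fn to_array(self) -> [f32; 4] {
//!         self.data
//!     }
//! }
//! ```
//!
//! In this sketch, code that receives a `Lanes4` never needs to repeat the feature check:
//! calling `arch()` on it yields a token that can be used to build further vectors.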
//!
//! ## Special Architectures
//!
//! Some [`Architecture`]s are special and always available to use safely:
//!
//! * [`arch::Scalar`]: An architecture that uses emulation via loops to implement
//!   SIMD-like operations. This architecture is safe because no special hardware intrinsics
//!   are invoked.
//!
//! * [`arch::Current`]: The [`Architecture`] that is the closest fit to the current
//!   compilation target. This is not always [`arch::Scalar`]. For example, if compiling
//!   for `x86-64-v3`, then [`arch::Current`] will be [`arch::x86_64::V3`]. This is
//!   safe because it only uses intrinsics that are already available for the compiler to use.
//!
//! The current architecture can be obtained with [`arch::current()`] or the
//! constant [`crate::ARCH`].
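//!
//! To give a feel for what "emulation via loops" means for the scalar architecture, the
//! following standalone sketch implements a lanewise add and a pairwise horizontal sum over
//! a plain array. The names `ScalarF32x4`, `add`, and `pairwise_sum` are hypothetical and
//! local to this sketch; they only illustrate the general shape and are not the definitions
//! used by [`Emulated`].
//!
//! ```rust
//! /// Four `f32` lanes stored in a plain array (hypothetical type for this sketch).
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! pub struct ScalarF32x4([f32; 4]);
//!
//! impl ScalarF32x4 {
//!     pub fn from_array(data: [f32; 4]) -> Self {
//!         Self(data)
//!     }
//!
//!     /// Lanewise addition, computed one lane per loop iteration.
//!     pub fn add(self, other: Self) -> Self {
//!         let mut out = [0.0f32; 4];
//!         for i in 0..4 {
//!             out[i] = self.0[i] + other.0[i];
//!         }
//!         Self(out)
//!     }
//!
//!     /// Horizontal sum using a pairwise reduction: `(l0 + l1) + (l2 + l3)`.
//!     pub fn pairwise_sum(self) -> f32 {
//!         (self.0[0] + self.0[1]) + (self.0[2] + self.0[3])
//!     }
//! }
//! ```
//!
//! No hardware intrinsics appear anywhere in such code, which is why an architecture built
//! this way can always be used safely.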
//!
//! # Dev Docs
//!
//! ## Adding a New `TxN` Vector Type
//!
//! 1. Implement the type for the backends in `arch` (you can usually follow and slightly
//!    modify the existing examples).
//!
//! 2. Implement the type for [`Emulated`], for the implementations that require macro
//!    instantiation.
//!
//! 3. Add the type to the [`Architecture`] trait.
//!
//! At each step, be sure to include tests, which should be fairly straightforward.
//!
//! ## Adding a New Implementation to an Existing Trait
//!
//! Basically do steps 2-4 of the above list.
//!
//! ## Adding a New Trait
//!
//! 1. If needed, provide a reference implementation in the `reference` module.
//!
//! 2. If it's a relatively simple op, adding a new macro in `test_utils/ops.rs` that
//!    invokes the reference implementation may be all that's needed.
//!
//!    More complicated operations may require their own test harness (see
//!    `test_utils/dot_product.rs`).
//!
//!    Tests should go through the utilities in `test_utils::driver` to ensure adequate
//!    coverage and low compile time.
//!
//! 3. Implement the trait for the needed types: [`Emulated`], the architecture-specific
//!    types, and [`Architecture`].
//!
//! # Testing and Architectural Levels
//!
//! By default, `wide` will only run tests supported by the current runtime hardware. This
//! allows the tests to pass on a wide variety of machines during development.
//!
//! However, this can mean that tests targeting an architecture not supported by the runtime
//! hardware will silently succeed.
//!
//! To ensure that every test either runs or generates an error if the runtime hardware does
//! not support it, set the environment variable:
//! ```text
//! WIDE_TEST_MIN_ARCH="all"
//! ```
//! Various back-end specific values are also supported. Note that this variable sets the
//! minimum level of tests that are **required** to run. Tests for higher architecture
//! levels will still be run if supported by the runtime hardware.
//!
//! ## x86_64
//!
//! * `x86-64-v4`: Target Wide's [`arch::x86_64::V4`] architecture.
//! * `x86-64-v3`: Target Wide's [`arch::x86_64::V3`] architecture.
//! * `scalar`: Target the scalar architecture.
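//!
//! How such a value could be interpreted is sketched below. This is a hypothetical,
//! standalone reading of the variable and not the crate's actual test driver: the `MinArch`
//! enum and the `parse_min_arch` and `min_arch_from_env` functions are invented for
//! illustration, and only `"all"` plus the strings listed above are recognized.
//!
//! ```rust
//! /// Minimum architecture level that tests are required to run (hypothetical).
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! pub enum MinArch {
//!     Scalar,
//!     V3,
//!     V4,
//!     All,
//! }
//!
//! /// Interpret the value of `WIDE_TEST_MIN_ARCH`. `None` means the variable is unset,
//! /// in which case nothing beyond the runtime hardware's capabilities is required.
//! pub fn parse_min_arch(value: Option<&str>) -> Result<Option<MinArch>, String> {
//!     match value {
//!         None => Ok(None),
//!         Some("scalar") => Ok(Some(MinArch::Scalar)),
//!         Some("x86-64-v3") => Ok(Some(MinArch::V3)),
//!         Some("x86-64-v4") => Ok(Some(MinArch::V4)),
//!         Some("all") => Ok(Some(MinArch::All)),
//!         Some(other) => Err(format!("unrecognized WIDE_TEST_MIN_ARCH value: {other:?}")),
//!     }
//! }
//!
//! /// Read the variable from the environment and parse it.
//! pub fn min_arch_from_env() -> Result<Option<MinArch>, String> {
//!     match std::env::var("WIDE_TEST_MIN_ARCH") {
//!         Ok(v) => parse_min_arch(Some(v.as_str())),
//!         Err(std::env::VarError::NotPresent) => parse_min_arch(None),
//!         Err(std::env::VarError::NotUnicode(s)) => Err(format!("not valid Unicode: {s:?}")),
//!     }
//! }
//! ```
//!
//! Rejecting unknown strings, rather than ignoring them, keeps a typo in the variable from
//! silently weakening the required test level.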

mod constant;
pub use constant::{Const, Constant, SupportedLaneCount};

pub(crate) mod reference;
pub use reference::{cast_f16_to_f32, cast_f32_to_f16};

mod traits;
pub use traits::{
    AsSIMD, SIMDAbs, SIMDCast, SIMDDotProduct, SIMDFloat, SIMDMask, SIMDMinMax, SIMDMulAdd,
    SIMDPartialEq, SIMDPartialOrd, SIMDReinterpret, SIMDSelect, SIMDSigned, SIMDSumTree,
    SIMDUnsigned, SIMDVector, ZipUnzip,
};

mod splitjoin;
pub use splitjoin::{LoHi, SplitJoin};

mod bitmask;
pub use bitmask::{BitMask, FromInt};

pub(crate) mod doubled;

mod emulated;
pub use emulated::Emulated;

pub mod lifetime;

/////////////////////////////
// Architecture Resolution //
/////////////////////////////

pub mod arch;
pub use arch::Architecture;

/// The current architecture that is the closest fit for the current compilation target.
pub const ARCH: arch::Current = arch::current();

///////////////////////
// Alias Definitions //
///////////////////////

/// Convenience macro for aliasing SIMD types.
///
/// There are currently four supported flavors (the examples below use `f32x4` as an example
/// identifier):
///
/// 1. `diskann_wide::alias!(f32x4) => type f32x4 = <diskann_wide::arch::Current as diskann_wide::Architecture>::f32x4`:
///    Type alias directly to the compile-time architecture's type.
///
/// 2. `diskann_wide::alias!(f32s = f32x4) => type f32s = <diskann_wide::arch::Current as
///    diskann_wide::Architecture>::f32x4`: Type alias a SIMD type with a custom name.
///
/// 3. `diskann_wide::alias!(f32s = <A>::f32x4) => type f32s = <A as diskann_wide::Architecture>::f32x4`:
///    Type alias a SIMD type from a specific architecture.
///
/// 4. `diskann_wide::alias!(f32s<A> = f32x4) => type f32s<A> = <A as diskann_wide::Architecture>::f32x4`:
///    Type alias a SIMD type in a generic context. This can be useful to work around errors
///    like
///    ```text
///    use of generic parameter from outer item
///    ```
#[macro_export]
macro_rules! alias {
    // Flavor 1: alias with the same name as the SIMD type.
    ($var:ident) => {
        $crate::alias!($var = $var);
    };
    // Flavor 2: custom name, compile-time architecture. `$crate` is used so that the path
    // also resolves when the macro is invoked from inside this crate.
    ($var:ident = $type:ident) => {
        $crate::alias!($var = <$crate::arch::Current>::$type);
    };
    // Flavor 3: custom name, explicitly specified architecture.
    ($var:ident = <$arch:ty>::$type:ident) => {
        #[allow(non_camel_case_types)]
        type $var = <$arch as $crate::Architecture>::$type;
    };
    // Flavor 4: alias that is generic over the architecture.
    ($var:ident<$arch:ident> = $type:ident) => {
        #[allow(non_camel_case_types)]
        type $var<$arch> = <$arch as $crate::Architecture>::$type;
    };
}

//////////////
// Internal //
//////////////

#[cfg(all(test, any(target_arch = "x86_64", target_arch = "aarch64")))]
const TEST_MIN_ARCH: &str = "WIDE_TEST_MIN_ARCH";

// Read the minimum required test architecture from the environment.
// Returns `None` if the variable is unset and panics if its value is not valid Unicode.
#[cfg(all(test, any(target_arch = "x86_64", target_arch = "aarch64")))]
fn get_test_arch() -> Option<String> {
    match std::env::var(TEST_MIN_ARCH) {
        Ok(v) => Some(v),
        Err(std::env::VarError::NotPresent) => None,
        Err(std::env::VarError::NotUnicode(s)) => panic!("could not parse test arch: {s:?}"),
    }
}

pub(crate) mod helpers;

#[cfg(test)]
pub(crate) mod test_utils;

///////////
// Tests //
///////////

#[cfg(test)]
mod tests {
    use super::*;

    fn generic_architecture<A>(arch: A) -> f32
    where
        A: Architecture,
    {
        alias!(f32s<A> = f32x4);
        f32s::<A>::from_array(arch, [1.0, 2.0, 3.0, 4.0]).sum_tree()
    }

    #[test]
    fn test_generic() {
        assert_eq!(generic_architecture(arch::Scalar), 10.0);
    }
}