Quicksim

Quick-sim(d) provides several SIMD-accelerated, drop-in replacements for common algorithms. It's designed to make speeding up trivial parts of your code as easy as possible - no manual SIMD programming required. AVX features are automatically detected at runtime and enabled whenever possible. Quick-sim(d) is not intended as a replacement for Portable SIMD. Instead, it aims to completely hide vectorization logic from the API, keeping usage simple and ergonomic. The project is currently in alpha state, and does not yet provide vectorized implementations for all vectorizable functions in the Rust standard library. The implementations in this crate generally achieve significant speedups compared to naive versions. However, there are some trade-offs. For more information, see the Limitations section.

Note: Iterators are usually fast enough for most cases. Use this crate carefully, and make sure to identify critical hot-paths before replacing all of your implementations with the accelerated functions provided by this crate.

Add to your project

quicksim = "0.1"

Example

use quicksim::prelude::*;

fn main() {
    // Note: SIMD implementations only start to take effect with arrays of 32 elements or more.
    let array: Vec<u32> = vec![0, 1, 2, 3, 4];

    assert_eq!(array.max_simd(), array.iter().max().copied());
    assert_eq!(array.min_simd(), array.iter().min().copied());
    assert_eq!(array.find_simd(4), Some(4));
    assert!(array.contains_simd(4));
}

Limitations

The SIMD implementation becomes effective for arrays with more than 32 items. This means that if your array length is below 32 more than 50% of the time, using the *_simd() functions of this crate will generally be slower on average. Additionally, if no optimized implementation exists for a specific architecture, the SIMD version will always be slower than the regular implementation due to the overhead of checking for SIMD support and falling back to the default version.

Attribution

Many algorithms implemented here are based on those from Algorithmica, which helped me to get deeper into SIMD and implement some of the algorithms in this crate. Check this book out if you're interested in SIMD programming yourself, or if you want to learn more about computers, algorithms and low-level optimizations.

Benchmarks

The SIMD implementation becomes effective for arrays with more than 32 items. For smaller arrays (32 items or fewer), the SIMD versions are actually slower than the standard implementations. Please note that the benchmark results may contain some noise, so they shouldn't be treated as 100% accurate. All measurements were taken on a Ryzen 5 5600x CPU with 32 GB of RAM.

Some benchmarks include two cases: -avg and -worst. These represent the average-case and worst-case performance of the algorithm. For example, the find algorithm’s average case assumes the element is located at index size/2, while its worst case assumes the element is the last item in the collection.

&[u32]

🔍 Find - avg

size	simd (ns)	iter (ns)	speedup
16	2.8042	2.367	0.844091
32	2.1073	4.7695	2.26332
64	2.5788	8.4581	3.27986
150	3.5278	22.757	6.45076
530	9.2504	67.652	7.31341
1028	16.543	126.14	7.62498
5010	85.771	595.31	6.94069
8000	132.41	948.74	7.16517

🔍 Find - worst

size	simd (ns)	iter (ns)	speedup
16	5.1783	4.1654	0.804395
32	2.3703	8.2243	3.46973
64	3.0473	15.68	5.14554
150	11.059	40.107	3.62664
530	20.414	129.42	6.33977
1028	36.596	245.98	6.7215
5010	153.56	1178.3	7.67322
8000	243.77	1880.8	7.71547

Count

size	simd (ns)	iter (ns)	speedup
16	2.3437	2.6018	1.11013
32	2.8135	4.6383	1.64859
64	3.0854	8.9749	2.90883
150	7.3019	20.38	2.79105
530	12.712	73.599	5.78973
1028	18.706	137.6	7.35593
5010	83.04	658.32	7.92775
8000	126.9	1061.6	8.36564

Min

size	simd (ns)	iter (ns)	speedup
16	6.8805	4.4347	0.644532
32	2.5866	7.578	2.92971
64	3.2832	15.214	4.63389
150	10.318	35.093	3.40114
530	13.826	116.56	8.43049
1028	18.393	270.9	14.7284
5010	80.394	1195.6	14.8718
8000	126.19	1885.2	14.9394

Max

size	simd (ns)	iter (ns)	speedup
16	5.3713	4.9116	0.914416
32	2.5786	9.9717	3.8671
64	3.0447	24.564	8.06779
150	12.016	65.061	5.41453
530	14.317	244.71	17.0923
1028	18.343	479.27	26.1282
5010	80.861	2337	28.9014
8000	126.19	3734	29.5903

Contains - avg

size	simd (ns)	iter (ns)	speedup
16	2.5754	2.5422	0.987109
32	1.8953	4.401	2.32206
64	2.3444	8.6579	3.69301
150	3.4732	18.912	5.44512
530	8.1109	68.4	8.4331
1028	17.607	127.71	7.25337
5010	88.245	590.65	6.6933
8000	129.19	941.28	7.28601

Contains - worst

size	simd (ns)	iter (ns)	speedup
16	4.8998	4.9446	1.00914
32	2.1081	8.6596	4.10777
64	2.5934	20.38	7.85841
150	12.114	40.873	3.37403
530	21.691	130.17	6.00111
1028	31.116	249.47	8.01742
5010	154.94	1192.1	7.69395
8000	245.39	1874.9	7.64049

&[f32]

Min

size	simd (ns)	iter (ns)	speedup
16	3.46	3.2625	0.942919
32	2.3321	5.1986	2.22915
64	2.8477	10.93	3.83819
150	6.8676	27.82	4.05091
530	14.967	118.27	7.90205
1028	21.059	234.49	11.1349
5010	79.785	1177.4	14.7572
8000	128.59	1859.9	14.4638

Max

size	simd (ns)	iter (ns)	speedup
16	3.6937	3.2061	0.867991
32	2.3294	5.0832	2.18219
64	3.299	10.929	3.31282
150	6.7782	28.02	4.13384
530	14.808	118.77	8.02066
1028	25.172	234.02	9.29684
5010	120.41	1157.3	9.61133
8000	190.97	1861.4	9.74708

quicksim 0.1.1