quicksim 0.1.1

Drop-in SIMD-accelerated replacements for common Rust algorithms, with automatic runtime AVX detection. Designed for simplicity and performance without requiring manual SIMD programming.
Documentation
# Quicksim
<p align="center">
    <a href="https://crates.io/crates/quicksim"><img alt="Static Badge" src="https://img.shields.io/crates/v/quicksim"></a>
    <a href="https://docs.rs/quicksim/0.1.0/quicksim/"><img alt="Static Badge" src="https://img.shields.io/docsrs/quicksim"></a>
</p>
<br>

Quick-sim(d) provides several <b>SIMD-accelerated, drop-in replacements</b> for common algorithms.<br>
It's designed to make speeding up trivial parts of your code as easy as possible - <b>no manual SIMD programming required</b>.<br>
AVX features are <b>automatically detected at runtime</b> and enabled whenever possible. <br>
<br>
Quick-sim(d) is not intended as a replacement for [Portable SIMD](https://github.com/rust-lang/portable-simd/). Instead, it aims to <b>completely hide vectorization logic</b> from the API, keeping usage simple and ergonomic. <br>
The project is currently in <b>alpha state</b>, and does <b>not yet provide vectorized implementations</b> for all vectorizable functions in the Rust standard library. <br>
<br>
The implementations in this crate generally achieve significant speedups compared to naive versions. However, there are some trade-offs. For more information, see the [Limitations](#Limitations) section.
<br><br>

```
Note: Iterators are usually fast enough for most cases. Use this crate carefully, and make sure to identify critical hot-paths before replacing all of your implementations with the accelerated functions provided by this crate.
```

# Add to your project

```
quicksim = "0.1"
```

# Example
```rust
use quicksim::prelude::*;

fn main() {
    // Note: SIMD implementations only start to take effect with arrays of 32 elements or more.
    let array: Vec<u32> = vec![0, 1, 2, 3, 4];

    assert_eq!(array.max_simd(), array.iter().max().copied());
    assert_eq!(array.min_simd(), array.iter().min().copied());
    assert_eq!(array.find_simd(4), Some(4));
    assert!(array.contains_simd(4));
}
```


# Limitations
The SIMD implementation becomes effective for arrays with more than 32 items. This means that if your array length is below 32 more than 50% of the time, using the `*_simd()` functions of this crate will generally be slower on average.
Additionally, if no optimized implementation exists for a specific architecture, the SIMD version will always be slower than the regular implementation due to the overhead of checking for SIMD support and falling back to the default version.

# Attribution
Many algorithms implemented here are based on those from [Algorithmica](https://en.algorithmica.org/hpc/), which helped me to get deeper into SIMD and implement some of the algorithms in this crate.
Check this book out if you're interested in SIMD programming yourself, or if you want to learn more about computers, algorithms and low-level optimizations.

# Benchmarks

The SIMD implementation becomes effective for arrays with <b>more than 32 items</b>. For smaller arrays (32 items or fewer), the SIMD versions are actually <b>slower</b> than the standard implementations. <br>
Please note that the benchmark results <b>may contain some noise</b>, so they shouldn't be treated as 100% accurate.<br>
All measurements were taken on a <b>Ryzen 5 5600x</b> CPU with <b>32 GB of RAM</b>. <br><br>

Some benchmarks include two cases: `-avg` and `-worst`. These represent the average-case and worst-case performance of the algorithm.
For example, the find algorithm’s average case assumes the element is located at index `size/2`, while its worst case assumes the element is the last item in the collection.

## &[u32]

<table>
<tr><td>

### 🔍 Find - avg
|   size |  simd (ns) | iter (ns) |   speedup |
|--------|----------|----------|-----------|
|     16 |   2.8042 |   <b>2.367</b>  |  0.844091 |
|     32 |   <b>2.1073</b> |   4.7695 |  2.26332  |
|     64 |   <b>2.5788</b> |   8.4581 |  3.27986  |
|    150 |   <b>3.5278</b> |  22.757  |  6.45076  |
|    530 |   <b>9.2504</b> |  67.652  |  7.31341  |
|   1028 |  <b>16.543</b>  | 126.14   |  7.62498  |
|   5010 |  <b>85.771</b>  | 595.31   |  6.94069  |
|   8000 | <b>132.41</b>   | 948.74   |  7.16517  |

</td><td>

### 🔍 Find - worst
|   size |  simd (ns) | iter (ns) |   speedup |
|--------|----------|-----------|-----------|
|     16 |   5.1783 |    <b>4.1654</b> |  0.804395 |
|     32 |   <b>2.3703</b> |    8.2243 |  3.46973  |
|     64 |   <b>3.0473</b> |   15.68   |  5.14554  |
|    150 |  <b>11.059</b>  |   40.107  |  3.62664  |
|    530 |  <b>20.414</b>  |  129.42   |  6.33977  |
|   1028 |  <b>36.596</b>  |  245.98   |  6.7215   |
|   5010 | <b>153.56</b>   | 1178.3    |  7.67322  |
|   8000 | <b>243.77</b>   | 1880.8    |  7.71547  |

</td></tr> </table>

<table>
<tr><td>

### Count
|   size |  simd (ns) | iter (ns) |   speedup |
|-------|----------|-----------|-----------|
|     16 |   <b>2.3437</b> |    2.6018 |   1.11013 |
|     32 |   <b>2.8135</b> |    4.6383 |   1.64859 |
|     64 |   <b>3.0854</b> |    8.9749 |   2.90883 |
|    150 |   <b>7.3019</b> |   20.38   |   2.79105 |
|    530 |  <b>12.712</b>  |   73.599  |   5.78973 |
|   1028 |  <b>18.706</b>  |  137.6    |   7.35593 |
|   5010 |  <b>83.04</b>   |  658.32   |   7.92775 |
|   8000 | <b>126.9</b>    | 1061.6    |   8.36564 |
</td></tr> </table>

<table>
<tr><td>

### Min
|   size |  simd (ns) | iter (ns) |   speedup |
|--------|----------|-----------|----------|
|     16 |   6.8805 |    <b>4.4347</b> |  0.644532 |
|     32 |   <b>2.5866</b> |    7.578  |  2.92971  |
|     64 |   <b>3.2832</b> |   15.214  |  4.63389  |
|    150 |  <b>10.318</b>  |   35.093  |  3.40114  |
|    530 |  <b>13.826</b>  |  116.56   |  8.43049  |
|   1028 |  <b>18.393</b>  |  270.9    | 14.7284   |
|   5010 |  <b>80.394</b>  | 1195.6    | 14.8718   |
|   8000 | <b>126.19</b>   | 1885.2    | 14.9394   |

</td><td>

### Max
|   size |  simd (ns) | iter (ns) |   speedup |
|--------|----------|-----------|-----------|
|     16 |   5.3713 |    <b>4.9116</b> |  0.914416 |
|     32 |   <b>2.5786</b> |    9.9717 |  3.8671   |
|     64 |   <b>3.0447</b> |   24.564  |  8.06779  |
|    150 |  <b>12.016</b>  |   65.061  |  5.41453  |
|    530 |  <b>14.317</b>  |  244.71   | 17.0923   |
|   1028 |  <b>18.343</b>  |  479.27   | 26.1282   |
|   5010 |  <b>80.861</b>  | 2337      | 28.9014   |
|   8000 | <b>126.19</b>   | 3734      | 29.5903   |

</td></tr> </table>

<table>
<tr><td>

### Contains - avg
|   size |  simd (ns) | iter (ns) |   speedup |
|--------|----------|----------|-----------|
|     16 |   2.5754 |   <b>2.5422</b> |  0.987109 |
|     32 |   <b>1.8953</b> |   4.401  |  2.32206  |
|     64 |   <b>2.3444</b> |   8.6579 |  3.69301  |
|    150 |   <b>3.4732</b> |  18.912  |  5.44512  |
|    530 |   <b>8.1109</b> |  68.4    |  8.4331   |
|   1028 |  <b>17.607</b>  | 127.71   |  7.25337  |
|   5010 |  <b>88.245</b>  | 590.65   |  6.6933   |
|   8000 | <b>129.19</b>   | 941.28   |  7.28601  |

</td><td>

### Contains - worst
|   size |  simd (ns) | iter (ns) |   speedup |
|--------|----------|-----------|-----------|
|     16 |   <b>4.8998</b> |    4.9446 |   1.00914 |
|     32 |   <b>2.1081</b> |    8.6596 |   4.10777 |
|     64 |   <b>2.5934</b> |   20.38   |   7.85841 |
|    150 |  <b>12.114</b>  |   40.873  |   3.37403 |
|    530 |  <b>21.691</b>  |  130.17   |   6.00111 |
|   1028 |  <b>31.116</b>  |  249.47   |   8.01742 |
|   5010 | <b>154.94</b>   | 1192.1    |   7.69395 |
|   8000 | <b>245.39</b>   | 1874.9    |   7.64049 |

</td></tr> </table>

## &[f32]

<table>
<tr><td>

### Min
|   size |  simd (ns) | iter (ns) |   speedup |
|--------|----------|-----------|-----------|
|     16 |   3.46   |    <b>3.2625</b> |  0.942919 |
|     32 |   <b>2.3321</b> |    5.1986 |  2.22915  |
|     64 |   <b>2.8477</b> |   10.93   |  3.83819  |
|    150 |   <b>6.8676</b> |   27.82   |  4.05091  |
|    530 |  <b>14.967</b>  |  118.27   |  7.90205  |
|   1028 |  <b>21.059</b>  |  234.49   | 11.1349   |
|   5010 |  <b>79.785</b>  | 1177.4    | 14.7572   |
|   8000 | <b>128.59</b>   | 1859.9    | 14.4638   |

</td><td>

### Max
|   size |  simd (ns) | iter (ns) |   speedup |
|--------|----------|-----------|-----------|
|     16 |   3.6937 |    <b>3.2061</b> |  0.867991 |
|     32 |   <b>2.3294</b> |    5.0832 |  2.18219  |
|     64 |   <b>3.299</b>  |   10.929  |  3.31282  |
|    150 |   <b>6.7782</b> |   28.02   |  4.13384  |
|    530 |  <b>14.808</b>  |  118.77   |  8.02066  |
|   1028 |  <b>25.172</b>  |  234.02   |  9.29684  |
|   5010 | <b>120.41</b>   | 1157.3    |  9.61133  |
|   8000 | <b>190.97</b>   | 1861.4    |  9.74708  |

</td></tr> </table>