§simdly
🚀 A high-performance Rust library that leverages SIMD (Single Instruction, Multiple Data) instructions for fast vectorized computations. It provides efficient implementations of mathematical operations using modern CPU features.
§Features
- SIMD Optimized: Leverages AVX2 (256-bit) and NEON (128-bit) instructions for vector operations
- Memory Efficient: Supports both aligned and unaligned memory access patterns
- Generic Traits: Provides consistent interfaces across different SIMD implementations
- Safe Abstractions: Wraps unsafe SIMD operations in safe, ergonomic APIs
- Cross-Platform: Supports both x86/x86_64 and ARM/AArch64 architectures
- Performance: Optimized for high-throughput numerical computations
§Architecture Support
Currently supports:
- x86/x86_64 with AVX2 (256-bit vectors)
- ARM/AArch64 with NEON (128-bit vectors)
Future support planned for:
- SSE (128-bit vectors for older x86 processors)
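Whether these instruction sets are available on a given machine can be checked at runtime with the standard library's feature-detection macros. The sketch below is illustrative only and independent of simdly's own detection logic:
// Illustrative runtime checks using std; simdly selects the instruction set for you.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn has_avx2() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}
#[cfg(target_arch = "aarch64")]
fn has_neon() -> bool {
    // NEON is mandatory on AArch64, so this is effectively always true.
    std::arch::is_aarch64_feature_detected!("neon")
}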
§Usage
The library provides traits for SIMD operations that automatically detect and use the best available instruction set on the target CPU.
§High-Level SIMD Usage
use simdly::simd::SimdMath;
// Vectorized mathematical operations - works on both AVX2 and NEON
let angles = vec![0.0, std::f32::consts::PI / 4.0, std::f32::consts::PI / 2.0];
let cosines = angles.cos(); // SIMD accelerated
// 2D distance calculations
let x_coords = vec![3.0, 5.0, 8.0, 7.0];
let y_coords = vec![4.0, 12.0, 15.0, 24.0];
let distances = x_coords.hypot(y_coords); // [5.0, 13.0, 17.0, 25.0]
// Power calculations
let bases = vec![2.0, 3.0, 4.0, 5.0];
let exponents = vec![2.0, 2.0, 2.0, 2.0];
let powers = bases.pow(exponents); // [4.0, 9.0, 16.0, 25.0]
§Parallel SIMD Operations
For maximum performance on large datasets, use the parallel SIMD methods that automatically select between single-threaded and multi-threaded implementations based on array size:
use simdly::simd::SimdMath;
// Large dataset - automatically uses parallel SIMD
let large_data: Vec<f32> = (0..1_000_000).map(|i| i as f32 * 0.001).collect();
let results = large_data.par_cos(); // Multi-threaded SIMD
// Small dataset - automatically uses regular SIMD
let small_data = vec![1.0, 2.0, 3.0, 4.0];
let results = small_data.par_sin(); // Single-threaded SIMD
// Works with all math functions
let sqrt_results = large_data.par_sqrt();
let exp_results = large_data.par_exp();
let abs_results = large_data.par_abs();
§Performance Considerations
- Memory Alignment: Use aligned memory when possible for optimal performance
- Batch Processing: Process data in chunks that match SIMD vector sizes
- CPU Features: Enable appropriate target features during compilation
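For the last point, target features are typically enabled through rustc flags in Cargo's build configuration. A minimal sketch using standard Cargo/rustc settings, not specific to simdly:
# .cargo/config.toml
# target-cpu=native enables every SIMD feature of the build machine (e.g. AVX2).
# Alternatively, enable a single feature with: rustflags = ["-C", "target-feature=+avx2"]
[build]
rustflags = ["-C", "target-cpu=native"]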
Modules§
- simd
- SIMD (Single Instruction, Multiple Data) operations and platform-specific implementations.
Constants§
- PARALLEL_SIMD_THRESHOLD - Minimum array size where parallel SIMD operations become beneficial.
- SIMD_THRESHOLD - Threshold below which scalar operations outperform SIMD.
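These thresholds back the automatic strategy selection described above. As a rough sketch of how they could be consulted manually (assuming both constants are usize values exported at the crate root; the par_* methods already apply such checks internally):
use simdly::simd::SimdMath;
use simdly::PARALLEL_SIMD_THRESHOLD; // assumed crate-root export, per the Constants list above
// Hypothetical manual dispatch; normally par_cos() chooses the strategy automatically.
let data: Vec<f32> = (0..100_000).map(|i| i as f32 * 0.01).collect();
let results = if data.len() >= PARALLEL_SIMD_THRESHOLD {
    data.par_cos() // multi-threaded SIMD for large inputs
} else {
    data.cos() // single-threaded SIMD for smaller inputs
};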