avila-parallel

A zero-dependency parallel computation library for Rust with true parallel execution.

📚 Documentation

✨ Features

  • 🚀 True Parallel Execution: Real multi-threaded processing using std::thread::scope
  • 📦 Zero Dependencies: Only uses the Rust standard library (std::thread, std::sync)
  • 🔒 Thread Safe: All operations use proper synchronization primitives
  • 📊 Order Preservation: Results maintain original element order
  • ⚡ Smart Optimization: Automatically falls back to sequential for small datasets
  • 🎯 Rich API: Familiar iterator-style methods

📋 Quick Start

Add to your Cargo.toml:

[dependencies]
avila-parallel = "0.2.0"

Basic Usage

use avila_parallel::prelude::*;

fn main() {
    // Parallel iteration
    let data = vec![1, 2, 3, 4, 5];
    let sum: i32 = data.par_iter()
        .map(|x| x * 2)
        .sum();
    println!("Sum: {}", sum); // Sum: 30

    // High-performance par_vec API
    let results: Vec<i32> = data.par_vec()
        .map(|&x| x * x)
        .collect();
    println!("{:?}", results); // [1, 4, 9, 16, 25]
}

🎯 Available Operations

Transformation

  • map - Transform each element
  • filter - Keep elements matching predicate
  • cloned - Clone elements (for reference iterators)

Aggregation

  • sum - Sum all elements
  • reduce - Reduce with custom operation
  • fold - Fold with identity and operation
  • count - Count elements matching predicate

Search

  • find_any - Find any element matching predicate
  • all - Check if all elements match
  • any - Check if any element matches

Other

  • partition - Split into two vectors based on predicate
  • for_each - Execute function on each element
  • collect - Collect results into a collection
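These combinators chain in the familiar iterator style. A small sketch combining a few of them (the exact closure signatures are assumptions modeled on the standard iterator conventions and the Quick Start example, not verified against the crate):

use avila_parallel::prelude::*;

let data = vec![1, 2, 3, 4, 5, 6, 7, 8];

// Transformation: keep even values and square them (order is preserved)
let even_squares: Vec<i32> = data.par_iter()
    .filter(|&&x| x % 2 == 0)
    .map(|&x| x * x)
    .collect();

// Search: does any element exceed 6? Are all elements positive?
let has_large = data.par_iter().any(|&x| x > 6);
let all_positive = data.par_iter().all(|&x| x > 0);

// Other: split into evens and odds based on a predicate
let (evens, odds): (Vec<i32>, Vec<i32>) = data.par_iter()
    .cloned()
    .partition(|&x| x % 2 == 0);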

📊 Performance

The library automatically:

  • Detects CPU core count
  • Distributes work efficiently across threads
  • Falls back to sequential execution when the input is smaller than the minimum chunk size (configurable, 1024 elements by default)
  • Maintains result order

Benchmark Results (10M elements)

Operation            Sequential   Parallel   Speedup
Simple Sum           7.1ms        14.2ms     0.50x*
Complex Computation  229.5ms      236.1ms    0.97x
Filter               67.5ms       92.4ms     0.73x

*Simple operations are dominated by thread overhead; reserve parallel execution for CPU-bound work (>100µs per element).
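Whether parallelism pays off depends on how expensive the per-element work is, so it is worth timing your own workload. A minimal harness sketch (expensive is a stand-in for your real per-element work; the par_vec call mirrors the Quick Start example):

use avila_parallel::prelude::*;
use std::time::Instant;

fn expensive(x: i32) -> i32 {
    // Stand-in for real CPU-bound work
    (0..200).fold(x, |acc, _| (acc.wrapping_mul(13) + 7) % 1_000_000)
}

fn main() {
    let data: Vec<i32> = (0..1_000_000).collect();

    let t = Instant::now();
    let seq: Vec<i32> = data.iter().map(|&x| expensive(x)).collect();
    println!("sequential: {:?}", t.elapsed());

    let t = Instant::now();
    let par: Vec<i32> = data.par_vec().map(|&x| expensive(x)).collect();
    println!("parallel:   {:?}", t.elapsed());

    // Order is preserved, so both paths should produce identical results
    assert_eq!(seq, par);
}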

🔧 Advanced Usage

Using Executor Functions Directly

use avila_parallel::executor::*;

let data = vec![1, 2, 3, 4, 5];

// Parallel map
let results = parallel_map(&data, |x| x * 2);

// Parallel filter
let evens = parallel_filter(&data, |x| *x % 2 == 0);

// Parallel reduce
let sum = parallel_reduce(&data, |a, b| a + b);

// Parallel partition
let (evens, odds) = parallel_partition(&data, |x| *x % 2 == 0);

// Find first matching
let found = parallel_find(&data, |x| *x > 3);

// Count matching
let count = parallel_count(&data, |x| *x % 2 == 0);

Mutable Iteration

use avila_parallel::prelude::*;

let mut data = vec![1, 2, 3, 4, 5];
data.par_iter_mut()
    .for_each(|x| *x *= 2);
println!("{:?}", data); // [2, 4, 6, 8, 10]

🏗️ Architecture

Thread Management

  • Uses std::thread::scope for lifetime-safe thread spawning
  • Automatic CPU detection via std::thread::available_parallelism()
  • Chunk-based work distribution with adaptive sizing

Synchronization

  • Arc<Mutex<T>> for safe result collection
  • No unsafe code in public API
  • Order preservation through indexed chunks
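The pattern above can be sketched in a few lines. This is a simplified illustration of scoped threads with ordered chunk joining, not the crate's actual implementation (parallel_map_sketch is a hypothetical name):

use std::thread;

fn parallel_map_sketch<T, U, F>(data: &[T], f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    let threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk_size = data.len().div_ceil(threads).max(1);
    let f = &f; // shared reference so every scoped thread can call the closure

    thread::scope(|s| {
        // One scoped thread per chunk; spawn order doubles as the chunk index
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<U>>()))
            .collect();

        // Joining in spawn order stitches results back together in the original order
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}

Joining the handles in spawn order is what keeps results in the original element order without extra bookkeeping.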

Performance Tuning

Default Configuration:

const MIN_CHUNK_SIZE: usize = 1024;  // Optimized based on benchmarks
const MAX_CHUNKS_PER_THREAD: usize = 8;

Environment Variables:

# Customize minimum chunk size (useful for tuning specific workloads)
export AVILA_MIN_CHUNK_SIZE=2048

# Run your program
cargo run --release
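Internally, such an override is presumably read from the environment once; conceptually it amounts to something like the following (a sketch of the mechanism, with min_chunk_size as a hypothetical helper, not the crate's actual code):

use std::env;

// Hypothetical helper: parse AVILA_MIN_CHUNK_SIZE, falling back to the
// compiled-in default when the variable is unset or not a valid number.
fn min_chunk_size() -> usize {
    const DEFAULT_MIN_CHUNK_SIZE: usize = 1024;
    env::var("AVILA_MIN_CHUNK_SIZE")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_MIN_CHUNK_SIZE)
}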

When to Adjust:

  • Increase (2048+): Very expensive operations (>1ms per element)
  • Decrease (512): Light operations but large datasets
  • Keep default (1024): Most use cases

🧪 Examples

CPU-Intensive Computation

use avila_parallel::prelude::*;

let data: Vec<i32> = (0..10_000_000).collect();

// Perform expensive computation in parallel
let results: Vec<i32> = data.par_vec()
    .map(|&x| {
        // Simulate expensive operation
        let mut result = x;
        for _ in 0..100 {
            result = (result * 13 + 7) % 1_000_000;
        }
        result
    })
    .collect();

Data Analysis

use avila_parallel::prelude::*;

let data: Vec<f64> = vec![1.0, 2.0, 3.0, 4.0, 5.0];

// Calculate statistics in parallel
let sum: f64 = data.par_iter().sum();
let count = data.len();
let mean = sum / count as f64;

let variance = data.par_vec()
    .map(|&x| (x - mean).powi(2))
    .into_iter()
    .sum::<f64>() / count as f64;

🔍 When to Use

✅ Good Use Cases

  • CPU-bound operations (image processing, calculations, etc.)
  • Large datasets (>10,000 elements)
  • Independent computations per element
  • Expensive operations (>100µs per element)

❌ Not Ideal For

  • I/O-bound operations (use async instead)
  • Very small datasets (<1,000 elements)
  • Simple operations (<10µs per element)
  • Operations requiring shared mutable state

🛠️ Building from Source

git clone https://github.com/your-org/avila-parallel
cd avila-parallel
cargo build --release
cargo test

📝 License

MIT License - see LICENSE file for details

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📚 Documentation

Full API documentation is available at docs.rs/avila-parallel

🔗 Related Projects

  • Rayon - Full-featured data parallelism library
  • crossbeam - Concurrent programming tools

⭐ Star History

If you find this project useful, consider giving it a star!