easy-async-opencl3 0.2.4

A declarative, multi-device asynchronous executor for OpenCL based on cl3.
Documentation
# easy-async-cl3


A high-level, async-first Rust wrapper for OpenCL with intelligent multi-device management and declarative task execution.

## Overview


`easy-async-cl3` provides a modern, ergonomic interface to OpenCL that embraces Rust's async/await paradigm. The library automatically manages resources, distributes work across multiple devices, and provides compile-time safety guarantees.

### Key Features


- **Async/Await Integration**: All GPU operations return futures for seamless async workflows
- **Automatic Multi-Device Support**: Intelligent work distribution across multiple GPUs based on device capabilities
- **Type-Safe API**: Compile-time guarantees prevent common errors (e.g., using unbuilt programs)
- **Declarative Task Building**: Fluent builder pattern for constructing GPU tasks
- **Zero-Cost Abstractions**: RAII-based resource management with no runtime overhead
- **Comprehensive OpenCL Support**: Full support for OpenCL 1.1 through 3.0 features including Pipes, SVM, and Images
- **Built-in Profiling**: Optional performance measurement with negligible overhead

## Installation


Add this to your `Cargo.toml`:

```toml
[dependencies]
easy-async-cl3 = "0.1"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```

## Quick Start


```rust
use easy_async_cl3::{
    async_executor::AsyncExecutor,
    cl_types::memory_flags::MemoryFlags,
    error::ClError,
};

#[tokio::main]

async fn main() -> Result<(), ClError> {
    // Initialize executor with best available platform
    let executor = AsyncExecutor::new_best_platform()?;
    
    // Define and build kernel
    let source = r#"
        kernel void vector_add(global float* a, global const float* b) {
            size_t i = get_global_id(0);
            a[i] += b[i];
        }
    "#;
    let program = executor.build_program(source.to_string(), None)?;
    let kernel = executor.create_kernel(&program, "vector_add")?;
    
    // Prepare data
    let size = 1_000_000;
    let mut a = vec![1.0f32; size];
    let b = vec![2.0f32; size];
    
    // Create GPU buffers
    let buf_a = executor.create_buffer(
        &[MemoryFlags::ReadWrite, MemoryFlags::CopyHostPtr],
        size * std::mem::size_of::<f32>(),
        a.as_mut_ptr() as *mut _
    )?;
    let buf_b = executor.create_buffer(
        &[MemoryFlags::ReadOnly, MemoryFlags::CopyHostPtr],
        size * std::mem::size_of::<f32>(),
        b.as_ptr() as *mut _
    )?;
    
    // Execute task
    executor.create_task(kernel)
        .arg_buffer(0, &buf_a)
        .arg_buffer(1, &buf_b)
        .global_work_dims(size, 1, 1)
        .read_buffer(&buf_a, &mut a)
        .run()
        .await?;
    
    assert_eq!(a[0], 3.0);
    Ok(())
}
```

## Advanced Features


### Multi-Device Execution


The library automatically detects and utilizes all available compute devices:

```rust
let executor = AsyncExecutor::new_best_platform_with_options(true)?; // Enable profiling

let report = executor.create_task(kernel)
    .arg_buffer(0, &buffer)
    .global_work_dims(10_000_000, 1, 1)
    .run()
    .await?;

println!("Execution time: {} μs", report.total_kernel_duration_ns() / 1000);
```

### Shared Virtual Memory (OpenCL 2.0+)


Zero-copy memory sharing between CPU and GPU:

```rust
let mut svm_buffer = executor.create_svm_buffer::<f32>(
    &[MemoryFlags::ReadWrite], 
    1024
)?;

executor.create_task(kernel)
    .arg_svm(0, &svm_buffer)
    .global_work_dims(1024, 1, 1)
    .run()
    .await?;

// Direct CPU access without explicit copy
let queue = &executor.get_queues()[0];
let mapped = svm_buffer.map_mut(queue, &vec![MemoryFlags::ReadWrite])?;
println!("Result: {}", mapped[0]);
```

### Image Processing


Native support for OpenCL images with hardware-accelerated filtering:

```rust
use easy_async_cl3::cl_types::cl_image::{
    ClImageFormats, ClImageDesc, image_type::ClImageType
};

let format = ClImageFormats::rgba_unorm_int8();
let desc = ClImageDesc {
    image_type: ClImageType::Image2D,
    image_width: Some(1920),
    image_height: Some(1080),
    ..Default::default()
};

let image = executor.create_image(
    &[MemoryFlags::ReadWrite],
    &format,
    &desc,
    std::ptr::null_mut()
)?;

executor.create_task(kernel)
    .arg_image(0, &image)
    .global_work_dims(1920, 1080, 1)
    .run()
    .await?;
```

### Pipes for Inter-Kernel Communication (OpenCL 2.0+)


Stream data between kernels without CPU involvement:

```rust
use easy_async_cl3::cl_types::cl_pipe::ClPipe;

let pipe = ClPipe::new(
    executor.get_context().as_ref(),
    &[MemoryFlags::ReadWrite],
    4,    // packet size (bytes)
    1024  // max packets
)?;

// Producer writes to pipe
executor.create_task(producer_kernel)
    .arg_pipe(0, &pipe)
    .global_work_dims(1024, 1, 1)
    .run()
    .await?;

// Consumer reads from pipe
executor.create_task(consumer_kernel)
    .arg_pipe(0, &pipe)
    .global_work_dims(1024, 1, 1)
    .run()
    .await?;
```

## Architecture


The library is structured in three main layers:

1. **AsyncExecutor**: High-level interface managing platforms, devices, and command queues
2. **TaskBuilder**: Declarative API for constructing and executing GPU tasks
3. **CL Types**: Type-safe wrappers around OpenCL objects (buffers, images, kernels, etc.)

Work is automatically distributed across available devices based on their compute capabilities and memory capacity.

## Documentation


- [API Documentation]https://docs.rs/easy-async-cl3 - Complete API reference
- [Examples]./tests/api_guide.rs - Comprehensive usage examples
- [OpenCL Specification]https://www.khronos.org/opencl/ - Kernel programming reference

## Requirements


- Rust 1.70 or later
- OpenCL runtime (provided by GPU vendor drivers)
- Tokio async runtime

## Use Cases


- High-performance scientific computing
- Real-time image and video processing
- Machine learning inference and training
- Cryptographic operations
- Financial modeling and simulations
- Parallel data analytics

## Contributing


Contributions are welcome. Please ensure all tests pass and follow the existing code style.

## License


Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE]LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT]LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

## Acknowledgments


Built on the [cl3](https://github.com/kenba/cl3) library.