A minimal OpenCL, WGPU, CUDA and host CPU array manipulation engine / framework written in Rust.
This crate provides the tools for executing custom array operations with the CPU, as well as with CUDA, WGPU and OpenCL devices.
This guide demonstrates how operations can be implemented for the compute devices: implement_operations.md. To see the crate used at a larger scale, have a look at custos-math or sliced.
§Examples
custos itself only implements four Buffer operations: write, read, copy_slice and clear. Beyond these, there are also unary (device-only) operations. custos-math, on the other hand, implements many more operations, including matrix operations for a custom Matrix struct.
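A minimal sketch of the write, read and clear operations on a CPU device. The constructor and method names follow the crate's own examples (`CPU::new`, `Buffer::from((&device, ..))`, `buf.read()`, `buf.clear()`) and may differ between custos versions:

```rust
use custos::prelude::*;

fn main() {
    let device = CPU::new();

    // create a Buffer from host data
    let mut buf = Buffer::from((&device, [1.0_f32, 2.0, 3.0, 4.0]));

    // write: overwrite the Buffer with new host data
    buf.write(&[4.0, 3.0, 2.0, 1.0]);

    // read: copy the data back to the host
    assert_eq!(buf.read(), vec![4.0, 3.0, 2.0, 1.0]);

    // clear: set every element to zero
    buf.clear();
    assert_eq!(buf.read(), vec![0.0; 4]);
}
```

On other devices (OpenCL, CUDA, WGPU) the same calls would move data between host and device memory instead of copying within RAM.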
Implementing an operation for CPU:
If you want to implement your own operations for all compute devices, consider looking here: implement_operations.md
use std::ops::Mul;
use custos::prelude::*;

pub trait MulBuf<T, S: Shape = (), D: Device = Self>: Sized + Device {
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, Self, S>;
}

impl<T, S, D> MulBuf<T, S, D> for CPU
where
    T: Mul<Output = T> + Copy,
    S: Shape,
    D: MainMemory,
{
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, CPU, S> {
        let mut out = self.retrieve(lhs.len(), (lhs, rhs));

        for ((lhs, rhs), out) in lhs.iter().zip(&*rhs).zip(&mut out) {
            *out = *lhs * *rhs;
        }

        out
    }
}
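With the MulBuf implementation above in scope, a call could look like this (a sketch; the Buffer construction and `read()` follow the crate's examples and may vary between versions):

```rust
use custos::prelude::*;

fn main() {
    let device = CPU::new();

    let lhs = Buffer::from((&device, [1.0_f32, 2.0, 3.0]));
    let rhs = Buffer::from((&device, [4.0_f32, 5.0, 6.0]));

    // element-wise multiplication via the MulBuf trait defined above
    let out = device.mul(&lhs, &rhs);
    assert_eq!(out.read(), vec![4.0, 10.0, 18.0]);
}
```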
A lot more usage examples can be found in the tests and examples folder.
Re-exports§
pub use devices::cpu::CPU;
pub use devices::opencl::OpenCL;
pub use devices::stack::Stack;
pub use devices::*;
pub use autograd::*;
Modules§
- autograd: Provides tools for automatic differentiation.
- devices: This module defines all available compute devices.
- exec_on_cpu: This module includes macros and functions for executing operations on the CPU. They move the supplied (CUDA, OpenCL, WGPU, …) Buffers to the CPU and execute the operation there. Most of the time, you should actually implement the operation for the device natively, as that is typically faster.
- flag: Describes the type of allocation.
- number: Contains traits for generic math.
- prelude: Typical imports for using custos.
- static_api: Exposes an API for static devices. The usage is similar to PyTorch, as Buffers are moved to the GPU or another compute device via .to_gpu, .to_cl, …
Macros§
- buf: A macro that creates a CPU Buffer using the static CPU device.
- cl_cpu_exec_unified: If the current device supports unified memory, data is not deep-copied. This is way faster than cpu_exec, as new memory is not allocated.
- cl_cpu_exec_unified_mut: If the current device supports unified memory, data is not deep-copied. This is way faster than cpu_exec_mut, as new memory is not allocated.
- cpu_exec: Moves n Buffers stored on another device to n CPU Buffers and executes an operation on the CPU.
- cpu_exec_mut: Moves n Buffers stored on another device to n CPU Buffers and executes an operation on the CPU. The results are written back to the original Buffers.
- to_cpu: Shadows all supplied Buffers with CPU Buffers.
- to_cpu_mut: Moves Buffers to CPU Buffers. The names of the new CPU Buffers are provided by the user. The new Buffers are declared as mutable.
- to_raw_host: Takes Buffers having a host pointer and wraps them into CPU Buffers. The old Buffers are shadowed.
- to_raw_host_mut: Takes Buffers having a host pointer and wraps them into mutable CPU Buffers. New names for the CPU Buffers are provided by the user.
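The pattern behind cpu_exec / cpu_exec_mut can be illustrated without the crate: data living on a non-CPU device is copied to host slices, the operation runs on the CPU, and (for the _mut variant) the results are written back. A simplified, custos-free sketch — all names here are illustrative, not the crate's API:

```rust
// A stand-in for a buffer on a non-CPU device; in custos this would be
// e.g. an OpenCL or CUDA Buffer. Everything here is illustrative only.
struct DeviceBuffer {
    data: Vec<f32>, // pretend this lives in device memory
}

impl DeviceBuffer {
    // "move" device data to a host Vec (in custos: a CPU Buffer)
    fn to_cpu(&self) -> Vec<f32> {
        self.data.clone()
    }

    // write host results back to the device (the `_mut` part of cpu_exec_mut)
    fn write(&mut self, host: &[f32]) {
        self.data.copy_from_slice(host);
    }
}

// run `op` on CPU copies of the inputs, then write the result back into `lhs`
fn cpu_exec_mut(lhs: &mut DeviceBuffer, rhs: &DeviceBuffer, op: impl Fn(&mut [f32], &[f32])) {
    let mut lhs_host = lhs.to_cpu();
    let rhs_host = rhs.to_cpu();
    op(&mut lhs_host, &rhs_host);
    lhs.write(&lhs_host);
}

fn main() {
    let mut a = DeviceBuffer { data: vec![1.0, 2.0, 3.0] };
    let b = DeviceBuffer { data: vec![4.0, 5.0, 6.0] };

    cpu_exec_mut(&mut a, &b, |l, r| {
        for (l, r) in l.iter_mut().zip(r) {
            *l += r;
        }
    });

    assert_eq!(a.data, vec![5.0, 7.0, 9.0]);
}
```

The unified-memory variants skip the two copies entirely when host and device share the same memory, which is why they avoid the extra allocation.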
Structs§
- Buffer: The underlying non-growable array structure of custos. A Buffer may be encapsulated in other data structures. By default, the Buffer is a f32 CPU Buffer with no statically known shape.
- CacheTrace: A CacheTrace is a list of nodes that shows which Buffers could use the same cache.
- Count: Used to reset the cache count.
- CountIntoIter: The iterator used for setting the cache count.
- Dim1: A 1D shape.
- Dim2: A 2D shape.
- Dim3: A 3D shape.
- GlobalCount: Uses the global count as the next index for a Node.
- Graph: A graph of Nodes. It is typically built up during the forward process (calling device.retrieve(.., (lhs, rhs))).
- Node: A node in the Graph.
- NodeCount: Uses the number of nodes in the graph as the next index for a Node.
- Num: Makes it possible to use a single number in a Buffer.
- Resolve: Resolves to either a mathematical expression as a string or a computed value. This is used to create generic kernels / operations over OpenCL, CUDA and CPU.
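The idea behind Resolve (together with the Combiner, Eval and ToCLSource traits listed under Traits§) can be sketched without the crate: one generic expression either computes a value on the host or renders kernel source text. A simplified, custos-free illustration — the names here are hypothetical, not the crate's API:

```rust
// A toy Resolve: either a concrete number or a kernel-source fragment.
enum Expr {
    Val(f32),       // computed on the host (the Eval side)
    Marker(String), // a variable name inside a generated kernel
}

impl Expr {
    // combine two expressions with `*`, mirroring a Combiner-style method
    fn mul(self, rhs: Expr) -> Expr {
        match (self, rhs) {
            // both sides known: evaluate directly on the host
            (Expr::Val(a), Expr::Val(b)) => Expr::Val(a * b),
            // otherwise: emit source text (the ToCLSource side)
            (a, b) => Expr::Marker(format!("({} * {})", a.source(), b.source())),
        }
    }

    // render the expression as (OpenCL-C-like) source text
    fn source(&self) -> String {
        match self {
            Expr::Val(v) => format!("{v}"),
            Expr::Marker(s) => s.clone(),
        }
    }
}

fn main() {
    // host evaluation: both operands are values
    let v = Expr::Val(3.0).mul(Expr::Val(4.0));
    assert_eq!(v.source(), "12");

    // kernel-source generation from the same combinator
    let s = Expr::Marker("x".into()).mul(Expr::Val(4.0));
    assert_eq!(s.source(), "(x * 4)");
}
```

This is how a single generic operation definition can serve the CPU (compute the value) and OpenCL/CUDA (generate the kernel string) at once.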
Enums§
- DeviceError: ‘generic’ device errors that can occur on any device.
Constants§
- UNIFIED_CL_MEM: If the OpenCL device selected by the environment variable CUSTOS_CL_DEVICE_IDX supports unified memory, then this will be true. In your case, this is false.
Traits§
- AddGraph: Trait for adding a node to a graph.
- Alloc: This trait is for allocating memory on the implemented device.
- ApplyFunction: Applies a function to a buffer and returns a new buffer.
- AsRangeArg: Converts ranges into a start and end index.
- ClearBuf: Trait for implementing the clear() operation for the compute devices.
- CloneBuf: This trait is used to clone a buffer based on a specific device type.
- Combiner: A trait that allows combining math operations. (Similar to an Iterator.)
- CommonPtrs: custos v5 compatibility for “common pointers”. The common pointers contain the following pointers: host, opencl and cuda.
- CopySlice: Trait for copying a slice of a buffer, to implement the slice() operation.
- Device: This trait is the base trait for every device.
- DevicelessAble: All types of devices that can create Buffers.
- ErrorKind: A trait for downcasting errors.
- Eval: Evaluates a combined (via Combiner) math operations chain to a value.
- GraphReturn: Returns a mutable reference to the graph.
- IsConstDim: If the Shape provides a fixed size, then this trait should be implemented. Forgot how this is useful.
- IsShapeIndep: If the Shape does not matter for a specific device Buffer, then this trait should be implemented.
- MainMemory: Devices that can access the main memory / RAM of the host.
- MayDim2: The shape may be 2D or ().
- MayTapeReturn: If the autograd feature is enabled, this is implemented for all types that implement TapeReturn. If the autograd feature is disabled, no Tape will be returnable.
- MayToCLSource: If the no-std feature is disabled, this trait is implemented for all types that implement ToCLSource. In this case, no-std is disabled.
- NodeIdx: Returns the next index for a Node.
- PtrType: This trait is implemented for every pointer type.
- Read: Trait for reading buffers. Synchronization point for CUDA.
- ShallowCopy: Used to shallow-copy a pointer. Use is discouraged.
- Shape: Determines the shape of a Buffer. Shape is used to get the size and ND-array for a stack-allocated Buffer.
- ToCLSource: Evaluates a combined (via Combiner) math operations chain to a valid OpenCL C (and possibly CUDA) source string.
- ToDim: Converts a pointer to a different Shape.
- ToMarker: Converts a &'static str to a Resolve.
- ToVal: Converts a value to a Resolve.
- UnaryElementWiseMayGrad: Applies the forward function to a new/cached Buffer and returns it. If the autograd feature is enabled, the gradient function is also calculated via the grad function.
- UnaryGrad: Writes the unary gradient (with chain rule) to the lhs_grad buffer.
- WithShape: Trait for creating Buffers with a Shape. The Shape is inferred from the array.
- WriteBuf: Trait for writing data to buffers.
Functions§
- range: range resets the cache count in every iteration. The cache count is used to retrieve the same allocation in each iteration. Not using range results in allocating new memory in each iteration, which is only freed when the device is dropped. To disable this caching behaviour, the realloc feature can be enabled.
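A sketch of the intended use of range, assuming the MulBuf implementation from the example above is in scope (the device and operation details are illustrative and may differ between custos versions):

```rust
use custos::{prelude::*, range};

fn main() {
    let device = CPU::new();
    let lhs = Buffer::from((&device, [1.0_f32; 100]));
    let rhs = Buffer::from((&device, [2.0_f32; 100]));

    for _ in range(1000) {
        // an operation that allocates its output via `device.retrieve(..)`
        // (such as the MulBuf example above) reuses the same cached
        // allocation in every iteration of this loop; without `range`,
        // each iteration would allocate fresh memory
        let _out = device.mul(&lhs, &rhs);
    }
}
```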
Type Aliases§
- Error: A type alias for Box<dyn std::error::Error + Send + Sync>.
- Result: A type alias for Result<T, Error>.
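According to the descriptions above, the two aliases amount to the following std definitions, reproduced here as a self-contained sketch (DemoError and may_fail are hypothetical names for illustration):

```rust
use std::error::Error as StdError;
use std::fmt;

// As documented: Box<dyn std::error::Error + Send + Sync>
type Error = Box<dyn StdError + Send + Sync>;
// As documented: Result<T, Error>
type Result<T> = std::result::Result<T, Error>;

// a tiny error type to demonstrate boxing into `Error`
#[derive(Debug)]
struct DemoError;

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "demo error")
    }
}

impl StdError for DemoError {}

fn may_fail(fail: bool) -> Result<i32> {
    if fail {
        // any Send + Sync error type boxes into the `Error` alias
        Err(Box::new(DemoError))
    } else {
        Ok(42)
    }
}

fn main() {
    assert_eq!(may_fail(false).unwrap(), 42);
    assert!(may_fail(true).is_err());
}
```

The Send + Sync bounds let the boxed error cross thread boundaries, which matters when device operations run on worker threads.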
Attribute Macros§
- impl_stack: Expands a CPU implementation to a Stack and CPU implementation.