pub struct ComputeClient<R: Runtime> { /* private fields */ }
The ComputeClient is the entry point for requesting tasks from the ComputeServer.
It should be obtained for a specific device via the Compute struct.
Implementations
impl<R: Runtime> ComputeClient<R>
pub fn info(&self) -> &<R::Server as ComputeServer>::Info
Get the info of the current backend.
pub fn init<D: Device>(device: &D, server: R::Server) -> Self
Create a new client with a new server.
pub unsafe fn set_stream(&mut self, stream_id: StreamId)
Set the stream on which the current client operates.
Safety
This is highly unsafe and should probably only be used by the CubeCL/Burn projects for now.
pub fn read_async(&self, handles: Vec<Handle>) -> impl Future<Output = Result<Vec<Bytes>, ServerError>> + Send
Given handles, returns owned resources as bytes.
pub fn read_one(&self, handle: Handle) -> Result<Bytes, ServerError>
Given a handle, returns the owned resource as bytes.
pub fn read_one_unchecked(&self, handle: Handle) -> Bytes
Given a handle, returns the owned resource as bytes.
Remarks
Panics if the read operation fails. Useful for tests.
pub fn read_tensor_async(&self, descriptors: Vec<CopyDescriptor>) -> impl Future<Output = Result<Vec<Bytes>, ServerError>> + Send
Given copy descriptors, returns owned resources as bytes.
pub fn read_tensor(&self, descriptors: Vec<CopyDescriptor>) -> Vec<Bytes>
Given copy descriptors, returns owned resources as bytes.
Remarks
Panics if the read operation fails.
The tensor must be in the same layout as created by the runtime, or stricter. Contiguous tensors are always fine; strided tensors are only ok if the stride matches the one created by the runtime (i.e. padded only on the last dimension). A way to check stride compatibility on the runtime will be added in the future.
Also see ComputeClient::create_tensor.
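The layout rule above can be sketched as a standalone check. This is an illustration of the stated rule, not CubeCL's actual validation logic; `is_readable_layout` and `row_major_strides` are hypothetical helpers:

```rust
/// Row-major (fully contiguous) strides, in elements, for a shape.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1usize; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// True if the layout is contiguous, or strided only by padding on the
/// last dimension (the "pitched" case the remarks above allow).
fn is_readable_layout(shape: &[usize], strides: &[usize]) -> bool {
    let n = shape.len();
    if strides.len() != n || n == 0 {
        return false;
    }
    if strides == row_major_strides(shape).as_slice() {
        return true; // contiguous tensors are always fine
    }
    // Pitched: innermost stride is 1, the row stride may exceed the row
    // length, and every outer stride is built exactly from the inner ones.
    if strides[n - 1] != 1 {
        return false;
    }
    if n < 2 || strides[n - 2] < shape[n - 1] {
        return false;
    }
    (0..n - 2).all(|i| strides[i] == strides[i + 1] * shape[i + 1])
}
```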
pub fn read_one_tensor_async(&self, descriptor: CopyDescriptor) -> impl Future<Output = Result<Bytes, ServerError>> + Send
Given a copy descriptor, returns the owned resource as bytes.
See ComputeClient::read_tensor
pub fn read_one_unchecked_tensor(&self, descriptor: CopyDescriptor) -> Bytes
Given a copy descriptor, returns the owned resource as bytes.
Remarks
Panics if the read operation fails.
See ComputeClient::read_tensor
pub fn get_resource(&self, handle: Handle) -> Result<ManagedResource<<<R::Server as ComputeServer>::Storage as ComputeStorage>::Resource>, ServerError>
Given a resource handle, returns the storage resource.
pub fn create_from_slice(&self, slice: &[u8]) -> Handle
Returns a resource handle containing the given data.
Notes
Prefer using the more efficient Self::create function.
pub fn exclusive<Re: Send + 'static, F: FnOnce() -> Re + Send + 'static>(&self, task: F) -> Result<Re, ServerError>
Executes a task that has exclusive access to the current device.
pub fn scoped<'a, Re: Send, F: FnOnce() -> Re + Send + 'a>(&'a self, task: F) -> Result<Re, ServerError>
todo: docs
pub fn memory_persistent_allocation<'a, Re: Send, Input: Send, F: FnOnce(Input) -> Re + Send + 'a>(&'a self, input: Input, task: F) -> Result<Re, ServerError>
todo: docs
pub fn create(&self, data: Bytes) -> Handle
Returns a resource handle containing the given Bytes.
pub fn create_tensor_from_slice(&self, slice: &[u8], shape: Shape, elem_size: usize) -> MemoryLayout
Given a resource and shape, stores it and returns the tensor handle and strides. This may or may not return contiguous strides. The layout is up to the runtime, and care should be taken when indexing.
Currently the tensor may either be contiguous (most runtimes), or “pitched”, to use the CUDA terminology. This means the last (contiguous) dimension is padded to fit a certain alignment, and the strides are adjusted accordingly. This can make memory accesses significantly faster since all rows are aligned to at least 16 bytes (the maximum load width), meaning the GPU can load as much data as possible in a single instruction. It may be aligned even more to also take cache lines into account.
However, the stride must be taken into account when indexing and reading the tensor
(also see ComputeClient::read_tensor).
Notes
Prefer using Self::create_tensor for better performance.
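The pitched layout described above amounts to rounding the row length up to the alignment before deriving the outer strides. A minimal sketch, assuming the 16-byte alignment cited in the text (`pitched_strides` is a hypothetical helper; real runtimes may align further for cache lines):

```rust
/// Element strides for a row-major tensor whose rows are each padded to
/// start on an `ALIGN`-byte boundary ("pitched", in CUDA terms).
fn pitched_strides(shape: &[usize], elem_size: usize) -> Vec<usize> {
    const ALIGN: usize = 16; // the maximum load width cited above
    let n = shape.len();
    let mut strides = vec![1usize; n];
    if n >= 2 {
        // Round the row length in bytes up to the alignment, then convert
        // back to a stride in elements (assumes elem_size divides ALIGN).
        let row_bytes = shape[n - 1] * elem_size;
        let padded_bytes = row_bytes.div_ceil(ALIGN) * ALIGN;
        strides[n - 2] = padded_bytes / elem_size;
        for i in (0..n - 2).rev() {
            strides[i] = strides[i + 1] * shape[i + 1];
        }
    }
    strides
}
```

For example, a `[2, 3]` tensor of 4-byte elements has 12-byte rows, padded to 16 bytes, so the row stride becomes 4 elements rather than 3.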
pub fn create_tensor(&self, bytes: Bytes, shape: Shape, elem_size: usize) -> MemoryLayout
Given a resource and shape, stores it and returns the tensor handle and strides. This may or may not return contiguous strides. The layout is up to the runtime, and care should be taken when indexing.
Currently the tensor may either be contiguous (most runtimes), or “pitched”, to use the CUDA terminology. This means the last (contiguous) dimension is padded to fit a certain alignment, and the strides are adjusted accordingly. This can make memory accesses significantly faster since all rows are aligned to at least 16 bytes (the maximum load width), meaning the GPU can load as much data as possible in a single instruction. It may be aligned even more to also take cache lines into account.
However, the stride must be taken into account when indexing and reading the tensor
(also see ComputeClient::read_tensor).
pub fn create_tensors_from_slices(&self, descriptors: Vec<(MemoryLayoutDescriptor, &[u8])>) -> Vec<MemoryLayout>
Reserves all shapes in a single storage buffer, copies the corresponding data into each
handle, and returns the handles for them.
See ComputeClient::create_tensor
Notes
Prefer using Self::create_tensors for better performance.
pub fn create_tensors(&self, descriptors: Vec<(MemoryLayoutDescriptor, Bytes)>) -> Vec<MemoryLayout>
Reserves all shapes in a single storage buffer, copies the corresponding data into each
handle, and returns the handles for them.
See ComputeClient::create_tensor
pub fn empty(&self, size: usize) -> Handle
Reserves size bytes in the storage, and returns a handle over them.
pub fn empty_tensor(&self, shape: Shape, elem_size: usize) -> MemoryLayout
Reserves shape in the storage, and returns a tensor handle for it.
See ComputeClient::create_tensor
pub fn empty_tensors(&self, descriptors: Vec<MemoryLayoutDescriptor>) -> Vec<MemoryLayout>
Reserves all shapes in a single storage buffer, and returns the handles for them.
See ComputeClient::create_tensor
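Packing several tensors into a single storage buffer, as the reservation above does, boils down to placing them at aligned offsets. A hedged sketch of that bookkeeping (`packed_offsets` is a hypothetical helper; the actual strategy is up to the runtime's memory allocator):

```rust
/// Byte offsets for packing several allocations into one buffer, each
/// aligned to `align` bytes; returns the offsets and the total size.
fn packed_offsets(sizes: &[usize], align: usize) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(sizes.len());
    let mut cursor = 0usize;
    for &size in sizes {
        // Round the cursor up to the next aligned boundary.
        cursor = cursor.div_ceil(align) * align;
        offsets.push(cursor);
        cursor += size;
    }
    (offsets, cursor)
}
```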
pub fn staging<'a, I>(&self, bytes: I, file_only: bool)
Marks the given Bytes as a staging buffer, possibly transferring it to pinned memory for faster data transfer with the compute device.
TODO: This blocks the compute queue, so it will reduce compute utilization.
pub fn to_client(&self, src: Handle, dst_server: &Self) -> Handle
Transfers data from one client to another.
pub fn sync_collective(&self)
Wait on the communication stream.
pub fn all_reduce(&self, src: Handle, dst: Handle, dtype: ElemType, device_ids: Vec<DeviceId>, op: ReduceOperation)
Perform an all_reduce operation on the given devices.
pub fn to_client_tensor(&self, src_descriptor: CopyDescriptor, dst_server: &Self) -> Handle
Transfers data from one client to another.
Make sure the source descriptor can be read in a contiguous manner.
pub fn launch(&self, kernel: <R::Server as ComputeServer>::Kernel, count: CubeCount, bindings: KernelArguments)
Launches the kernel with the given bindings.
pub unsafe fn launch_unchecked(&self, kernel: <R::Server as ComputeServer>::Kernel, count: CubeCount, bindings: KernelArguments)
Launches the kernel with the given bindings without performing any bounds checks.
Safety
To ensure this is safe, you must verify that your kernel:
- Performs no out-of-bounds reads or writes.
- Contains no infinite loops.
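An unchecked launch is typically paired with a dispatch sized so that no unit can index out of bounds. That sizing arithmetic can be sketched standalone (`cube_count_for` and `exact_cover` are hypothetical helpers, not part of this API):

```rust
/// Number of cubes needed so that `count * cube_dim` covers `len` items
/// (ceiling division; `cube_dim` must be nonzero).
fn cube_count_for(len: usize, cube_dim: usize) -> usize {
    len.div_ceil(cube_dim)
}

/// An unchecked launch without an in-kernel guard is only sound when the
/// dispatch exactly covers the data: any trailing units would otherwise
/// read or write out of bounds.
fn exact_cover(len: usize, cube_dim: usize) -> bool {
    len % cube_dim == 0
}
```

When `exact_cover` is false, either pad the data to a multiple of the cube dimension or fall back to the checked launch, which keeps the bounds guard.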
pub fn flush(&self) -> Result<(), ServerError>
Flush all outstanding commands.
pub fn sync(&self) -> DynFut<Result<(), ServerError>>
Wait for the completion of every task in the server.
pub fn properties(&self) -> &DeviceProperties
Get the features supported by the compute server.
pub fn properties_mut(&mut self) -> Option<&mut DeviceProperties>
Warning
For private use only.
pub fn memory_usage(&self) -> Result<MemoryUsage, ServerError>
Get the current memory usage of this client.
pub fn enumerate_devices(&self, type_id: u16) -> Vec<DeviceId>
Get all devices of a specific type available to this runtime
pub fn enumerate_all_devices(&self) -> Vec<DeviceId>
Get all devices available to this runtime
pub fn device_count(&self, type_id: u16) -> usize
Get the number of devices of a specific type available to this runtime
pub fn device_count_total(&self) -> usize
Get the total number of devices available to this runtime
pub unsafe fn allocation_mode(&self, mode: MemoryAllocationMode)
Change the memory allocation mode.
Safety
This function isn’t thread safe and might create memory leaks.
pub fn memory_cleanup(&self)
Ask the client to release memory that it can release.
Note: results depend on what the memory allocator deems beneficial, so it is not guaranteed that any memory is freed.
pub fn profile<O: Send + 'static>(&self, func: impl FnOnce() -> O + Send, func_name: &str) -> Result<(O, ProfileDuration), ProfileError>
Measure the execution time of some inner operations.
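As a rough CPU-side analogue of this API, a profiling wrapper runs the closure and returns its output together with the elapsed time. A simplified sketch using wall-clock timing (the real method measures device execution and returns a ProfileDuration, which this does not model):

```rust
use std::time::{Duration, Instant};

/// Run the closure and return its output alongside the elapsed
/// wall-clock time (CPU-side only; no device synchronization).
fn profile_cpu<O>(func: impl FnOnce() -> O) -> (O, Duration) {
    let start = Instant::now();
    let out = func();
    (out, start.elapsed())
}
```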
pub fn io_optimized_vector_sizes(&self, size: usize) -> impl Iterator<Item = VectorSize> + Clone
Returns all vector sizes that are useful for performing optimal IO operations on the given element size.
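One plausible selection strategy, purely as an illustration: power-of-two widths that evenly divide the buffer, widest first (`candidate_vector_sizes` is a hypothetical stand-in; the real choice is runtime- and hardware-specific):

```rust
/// Candidate vector widths (in elements) for a buffer of `size` elements:
/// power-of-two line sizes that evenly divide the buffer, widest first.
fn candidate_vector_sizes(size: usize) -> impl Iterator<Item = usize> + Clone {
    [8usize, 4, 2, 1]
        .into_iter()
        .filter(move |w| size % *w == 0)
}
```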