pub struct Context { /* private fields */ }
A shared CUDA driver context.
Unlike cuBLAS, cuDNN, cuFFT, and similar library handles, a CUDA context is the underlying execution environment for a device. It is intended to be shared by streams, modules, libraries, events, allocations, and higher-level library wrappers.
This type is therefore reference-counted: the constructors return Arc<Self>,
and Context is Send + Sync. Methods that take a shared reference do not mutate
Rust-visible state on the Context object itself; methods such as bind instead
update the calling thread’s current CUDA context in the driver.
Prefer one long-lived context per device and share it across dependent CUDA objects instead of creating many short-lived contexts.
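The sharing pattern above can be modeled without a GPU. In this minimal sketch, FakeContext and FakeStream are hypothetical stand-ins (not the crate's types) for Context and a dependent object such as a Stream. Each dependent object holds an Arc clone, so the single per-device context lives exactly as long as its last user.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for `Context`; the real type wraps a CUcontext.
struct FakeContext {
    device_ordinal: i32,
}

// Hypothetical stand-in for a dependent object (stream, module, event)
// that keeps its context alive by holding an `Arc`.
struct FakeStream {
    ctx: Arc<FakeContext>,
}

impl FakeStream {
    fn device(&self) -> i32 {
        self.ctx.device_ordinal
    }
}

// Compiles only if `T` is `Send + Sync`, mirroring the documented guarantee.
fn assert_send_sync<T: Send + Sync>() {}

// Creates `n` dependent objects that all share one context.
fn make_streams(ctx: &Arc<FakeContext>, n: usize) -> Vec<FakeStream> {
    (0..n).map(|_| FakeStream { ctx: Arc::clone(ctx) }).collect()
}

fn main() {
    assert_send_sync::<FakeContext>();

    // One long-lived context per device...
    let ctx = Arc::new(FakeContext { device_ordinal: 0 });

    // ...shared with worker threads instead of creating a context per thread.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let ctx = Arc::clone(&ctx);
            thread::spawn(move || ctx.device_ordinal)
        })
        .collect();
    for w in workers {
        assert_eq!(w.join().unwrap(), 0);
    }

    let streams = make_streams(&ctx, 3);
    assert_eq!(streams[0].device(), 0);
    assert_eq!(Arc::strong_count(&ctx), 4); // `ctx` plus three streams
    drop(streams);
    assert_eq!(Arc::strong_count(&ctx), 1);
}
```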
Implementations
impl Context
pub fn create() -> Result<Arc<Self>>
pub fn create_with_flags(flags: ContextFlags) -> Result<Arc<Self>>
pub fn create_for_device(device: Device) -> Result<Arc<Self>>
pub fn create_for_device_with_flags(device: Device, flags: ContextFlags) -> Result<Arc<Self>>
pub fn retain_primary_for_device(device: Device) -> Result<Arc<Self>>
pub fn bind(&self) -> Result<()>
Binds this CUDA context to the calling CPU thread.
The “current context” is thread-local driver state. Calling this method
does not mutate the Rust Context value itself; it makes this context
current for subsequent CUDA driver and interoperating runtime calls on
the current host thread.
Errors
Returns an error if the CUDA driver cannot query or set the current context.
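The thread-local nature of the current context can be illustrated with a small model. This sketch is an assumption-labeled stand-in: the real state lives inside the CUDA driver (cuCtxSetCurrent / cuCtxGetCurrent), while here a hypothetical thread-local slot holding a context id plays that role. It shows that bind takes &self and that binding on one thread has no effect on another.

```rust
use std::cell::Cell;
use std::thread;

// Hypothetical model of the driver's per-thread "current context" slot.
thread_local! {
    static CURRENT: Cell<Option<u32>> = Cell::new(None);
}

// Hypothetical stand-in for `Context`, identified by a plain id.
struct ModelContext {
    id: u32,
}

impl ModelContext {
    // Mirrors `bind(&self)`: takes a shared reference and changes only
    // the calling thread's state, never the context value itself.
    fn bind(&self) {
        CURRENT.with(|c| c.set(Some(self.id)));
    }
}

// Reads the calling thread's current context id, if any.
fn current() -> Option<u32> {
    CURRENT.with(|c| c.get())
}

fn main() {
    let ctx = ModelContext { id: 7 };
    ctx.bind();
    assert_eq!(current(), Some(7));

    // A freshly spawned thread has no current context until it binds one.
    let other = thread::spawn(current).join().unwrap();
    assert_eq!(other, None);
}
```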
pub fn load_module(self: &Arc<Self>, image: &ModuleImage<'_>) -> Result<Module>
Loads a module from the given image into this context. The image may be a cubin or fatbin emitted by nvcc, a NUL-terminated PTX string (emitted by nvcc or written by hand), or Tile IR data.
Errors
Returns an error if the context cannot be bound, CUDA cannot load the module, or a previous asynchronous launch reported an error.
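PTX text must reach the driver NUL-terminated. The sketch below shows one way to prepare such a buffer with the standard library: CString::new appends the trailing NUL and rejects interior NUL bytes, which would otherwise truncate the PTX at the FFI boundary. The helper name ptx_bytes is hypothetical, and how the buffer is then wrapped in a ModuleImage is not shown, since that constructor is not documented here.

```rust
use std::ffi::{CString, NulError};

// Hypothetical helper: converts PTX source into a NUL-terminated buffer.
// Fails if the text contains an interior NUL byte.
fn ptx_bytes(src: &str) -> Result<CString, NulError> {
    CString::new(src)
}

fn main() {
    // A PTX header fragment, used only to illustrate the byte layout.
    let ptx = ".version 7.0\n.target sm_70\n.address_size 64\n";
    let c = ptx_bytes(ptx).expect("PTX must not contain NUL bytes");

    // `as_bytes_with_nul` exposes the terminator the driver relies on.
    assert_eq!(c.as_bytes_with_nul().last(), Some(&0));

    // Embedded NULs are rejected instead of silently truncating the PTX.
    assert!(ptx_bytes("bad\0ptx").is_err());
}
```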
pub fn create_graph(self: &Arc<Self>) -> Result<Graph>
Creates an empty CUDA graph associated with this context.
Prefer this over RawGraph::create
for ordinary Singe code. The returned graph carries its context
association into instantiated executable graphs, allowing launches and
uploads to reject streams from another context before calling CUDA.
Errors
Returns an error if the context cannot be bound or CUDA cannot create the graph.
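The context check described for executable graphs can be sketched as an identity comparison performed before any CUDA call. All types here (CtxModel, StreamModel, ExecGraphModel, LaunchError) are hypothetical stand-ins, not the crate's Graph or Stream; the sketch assumes context identity is tracked through the shared Arc, so two distinct contexts on the same device still count as different.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; not the crate's Graph/Stream types.
struct CtxModel {
    device: i32,
}

struct StreamModel {
    ctx: Arc<CtxModel>,
}

// An "executable graph" that remembers which context it was created in.
struct ExecGraphModel {
    ctx: Arc<CtxModel>,
}

#[derive(Debug, PartialEq)]
enum LaunchError {
    ContextMismatch,
}

impl ExecGraphModel {
    fn launch(&self, stream: &StreamModel) -> Result<(), LaunchError> {
        // Identity check, not a field comparison: two contexts on the
        // same device are still different contexts.
        if !Arc::ptr_eq(&self.ctx, &stream.ctx) {
            return Err(LaunchError::ContextMismatch);
        }
        // A real implementation would call into CUDA here (e.g. cuGraphLaunch).
        Ok(())
    }
}

fn main() {
    let a = Arc::new(CtxModel { device: 0 });
    let b = Arc::new(CtxModel { device: 0 });
    assert_eq!(a.device, b.device); // same device, different contexts

    let graph = ExecGraphModel { ctx: Arc::clone(&a) };
    assert_eq!(graph.launch(&StreamModel { ctx: Arc::clone(&a) }), Ok(()));
    assert_eq!(
        graph.launch(&StreamModel { ctx: b }),
        Err(LaunchError::ContextMismatch)
    );
}
```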
pub fn unload_module(self: &Arc<Self>, module: Module) -> Result<()>
pub fn load_module_with_options(
    self: &Arc<Self>,
    image: &ModuleImage<'_>,
    jit_options: JitOptions<'_>,
) -> Result<Module>
Loads a module from the given image into this context, passing jit_options to the driver. The image may be a cubin or fatbin emitted by nvcc, a NUL-terminated PTX string (emitted by nvcc or written by hand), or Tile IR data.
Errors
Returns an error if the context cannot be bound, CUDA cannot load the module, JIT options are rejected, or a previous asynchronous launch reported an error.
pub fn load_nvrtc_module(self: &Arc<Self>, program: &Program, output: OutputKind) -> Result<Module>
pub fn load_nvrtc_module_with_options(
    self: &Arc<Self>,
    program: &Program,
    output: OutputKind,
    jit_options: JitOptions<'_>,
) -> Result<Module>
pub fn load_library(self: &Arc<Self>, image: &ModuleImage<'_>) -> Result<Library>
pub fn load_library_with_options(
    self: &Arc<Self>,
    image: &ModuleImage<'_>,
    jit_options: JitOptions<'_>,
) -> Result<Library>
Loads a library from the given image according to the application-defined module loading mode:
- If module loading is set to EAGER by the environment variables described in “Module loading”, the library is loaded eagerly into all existing contexts at the time of the call, and into future contexts when they are created, until the library is unloaded with sys::cuLibraryUnload.
- If the environment variables are set to LAZY, the library is not immediately loaded into existing contexts; it is loaded into a context only when a function is needed there, such as for a kernel launch.
These environment variables are described in the CUDA programming guide under the “CUDA environment variables” section.
The image may be a cubin or fatbin emitted by nvcc, a NUL-terminated PTX string emitted by nvcc or written by hand, or Tile IR data. When using separate compilation, a fatbin must also contain relocatable code.
If the library contains managed variables and no device in the system supports them, this call returns crate::error::Status::NotSupported.
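In the CUDA toolkit, the mode above is selected through the CUDA_MODULE_LOADING environment variable, with the values EAGER or LAZY; it generally has to be set before CUDA is initialized in the process (for example, in the launching shell). The sketch below only reads and classifies the variable, so a program can log which mode it expects. It deliberately does not guess the driver's default when the variable is unset, and it recognizes only the two uppercase spellings; the ModuleLoading enum is hypothetical.

```rust
use std::env;

// Hypothetical enum for the two documented modes.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ModuleLoading {
    Eager,
    Lazy,
}

// Classifies a CUDA_MODULE_LOADING value. `None` means unset or not one of
// the spellings this sketch recognizes; the driver's own default then applies.
fn parse_module_loading(value: Option<&str>) -> Option<ModuleLoading> {
    match value? {
        "EAGER" => Some(ModuleLoading::Eager),
        "LAZY" => Some(ModuleLoading::Lazy),
        _ => None,
    }
}

fn main() {
    let raw = env::var("CUDA_MODULE_LOADING").ok();
    match parse_module_loading(raw.as_deref()) {
        Some(mode) => println!("module loading requested: {mode:?}"),
        None => println!("CUDA_MODULE_LOADING unset or unrecognized; driver default applies"),
    }
}
```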
pub fn load_nvrtc_library(self: &Arc<Self>, program: &Program, output: OutputKind) -> Result<Library>
pub fn load_nvrtc_library_with_options(
    self: &Arc<Self>,
    program: &Program,
    output: OutputKind,
    jit_options: JitOptions<'_>,
) -> Result<Library>
pub fn load_library_from_file(self: &Arc<Self>, path: &str) -> Result<Library>
Loads a library from the given file according to the application-defined module loading mode:
- If module loading is set to EAGER by the environment variables described in “Module loading”, the library is loaded eagerly into all existing contexts at the time of the call, and into future contexts when they are created, until the library is unloaded with sys::cuLibraryUnload.
- If the environment variables are set to LAZY, the library is not immediately loaded into existing contexts; it is loaded into a context only when a function is needed there, such as for a kernel launch.
These environment variables are described in the CUDA programming guide under the “CUDA environment variables” section.
The file must be a cubin emitted by nvcc, a PTX file emitted by nvcc or written by hand, a fatbin emitted by nvcc or written by hand, or a Tile IR file. A fatbin must also contain relocatable code when doing separate compilation.
If the library contains managed variables and no device in the system supports them, this call returns crate::error::Status::NotSupported.
Errors
Returns an error if this context cannot be bound, if path contains an
interior NUL byte, or if the CUDA driver cannot load the library.
pub fn synchronize(&self) -> Result<()>
Blocks until this context has completed all preceding requested tasks.
If the current context is the primary context, child contexts that have been created are also synchronized.
Context::synchronize returns an error if one of the preceding tasks failed.
If the context was created with ContextFlags::SCHEDULE_BLOCKING_SYNC, the CPU thread blocks until the GPU context has finished its work.
Errors
Returns an error if the context cannot be bound, a preceding task failed, or a previous asynchronous launch reported an error.
pub fn flags(&self) -> Result<ContextFlags>
Returns the flags of this context.
See ContextFlags for flag values.
Errors
Returns an error if the context cannot be bound, CUDA cannot query the flags, or a previous asynchronous launch reported an error.
pub fn limit(&self, limit: Limit) -> Result<usize>
Returns the current value of limit for this context.
The supported Limit values are:
- Limit::StackSize: stack size in bytes of each GPU thread.
- Limit::PrintfFifoSize: size in bytes of the FIFO used by the printf() device system call.
- Limit::MallocHeapSize: size in bytes of the heap used by the malloc() and free() device system calls.
- Limit::DevRuntimeSyncDepth: maximum grid depth at which a thread can issue the device runtime call Device::synchronize to wait on child grid launches to complete.
- Limit::DevRuntimePendingLaunchCount: maximum number of outstanding device runtime launches that can be made from this context.
- Limit::MaxL2FetchGranularity: L2 cache fetch granularity.
- Limit::PersistingL2CacheSize: persisting L2 cache size in bytes.
Errors
Returns an error if the context cannot be bound, limit is unsupported, CUDA cannot query
the limit, or a previous asynchronous launch reported an error.
pub fn set_limit(&self, limit: Limit, value: usize) -> Result<()>
Setting limit to value is a request by the application to update the current limit maintained by the context.
The driver may modify the requested value to meet hardware requirements, such as clamping to minimum or maximum values or rounding up to the nearest element size.
Use Context::limit to query the effective value.
Each Limit has its own restrictions:
- Limit::StackSize controls the stack size in bytes of each GPU thread. The driver automatically increases the per-thread stack size for each kernel launch as needed, and this size is not reset to the original value after the launch. Setting this value takes effect immediately; if necessary, the device blocks until all preceding requested tasks are complete.
- Limit::PrintfFifoSize controls the size in bytes of the FIFO used by the printf() device system call. Configure Limit::PrintfFifoSize before launching any kernel that uses the printf() device system call; otherwise crate::error::Status::InvalidValue is returned.
- Limit::MallocHeapSize controls the size in bytes of the heap used by the malloc() and free() device system calls. Configure Limit::MallocHeapSize before launching any kernel that uses the malloc() or free() device system calls; otherwise crate::error::Status::InvalidValue is returned.
- Limit::DevRuntimeSyncDepth controls the maximum nesting depth of a grid at which a thread can safely call Device::synchronize. This limit must be set before launching any kernel that uses the device runtime and calls Device::synchronize above the default sync depth of two grid levels. Calls to Device::synchronize fail if this limit is violated. The limit can be set lower than the default or as high as the maximum launch depth of 24. Additional sync-depth levels require the driver to reserve large amounts of device memory that can no longer be used for application allocations. If these reservations fail, Context::set_limit returns crate::error::Status::OutOfMemory, and the limit can be reset to a lower value. This limit applies only to devices of compute capability below 9.0; setting it on devices of other compute capabilities returns crate::error::Status::UnsupportedLimit.
- Limit::DevRuntimePendingLaunchCount controls the maximum number of outstanding device runtime launches that can be made from the current context. A grid is outstanding from launch until it is known to have completed. Device runtime launches that violate this limit fail. If a module using the device runtime needs more than the default 2048 pending launches, this limit can be increased. Sustaining additional pending launches requires the driver to reserve larger amounts of device memory up front, which can no longer be used for allocations. If these reservations fail, Context::set_limit returns crate::error::Status::OutOfMemory, and the limit can be reset to a lower value. This limit applies only to devices of compute capability 3.5 and higher; setting it on devices of compute capability below 3.5 returns crate::error::Status::UnsupportedLimit.
- Limit::MaxL2FetchGranularity controls the L2 cache fetch granularity. Values range from 0 to 128 bytes. This is a performance hint that may be ignored or clamped depending on the platform.
- Limit::PersistingL2CacheSize controls the size in bytes available for persisting L2 cache. This is a performance hint that may be ignored or clamped depending on the platform.
Errors
Returns an error if the context cannot be bound, limit is unsupported, CUDA rejects the
requested value, or a previous asynchronous launch reported an error.
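Several of the restrictions above can be checked locally before calling Context::set_limit. The sketch below encodes only the documented rules (DevRuntimeSyncDepth at most 24 and only below compute capability 9.0, DevRuntimePendingLaunchCount only at 3.5 and above, MaxL2FetchGranularity within 0 to 128 bytes). LimitKind and precheck are hypothetical names, compute capability is assumed to be available as a (major, minor) pair, and the driver remains the authority: it may still clamp or reject a value, so reading back with Context::limit is still advisable.

```rust
// Hypothetical local mirror of the `Limit` variants; not the crate's enum.
#[derive(Debug, Clone, Copy)]
enum LimitKind {
    StackSize,
    PrintfFifoSize,
    MallocHeapSize,
    DevRuntimeSyncDepth,
    DevRuntimePendingLaunchCount,
    MaxL2FetchGranularity,
    PersistingL2CacheSize,
}

#[derive(Debug, PartialEq)]
enum PreCheck {
    Ok,
    Unsupported,
    OutOfRange,
}

// Applies only the rules stated in the documentation above.
// `cc` is the compute capability as (major, minor); tuples compare
// lexicographically, so (8, 9) < (9, 0).
fn precheck(limit: LimitKind, value: usize, cc: (u32, u32)) -> PreCheck {
    match limit {
        LimitKind::DevRuntimeSyncDepth => {
            if cc >= (9, 0) {
                PreCheck::Unsupported
            } else if value > 24 {
                PreCheck::OutOfRange
            } else {
                PreCheck::Ok
            }
        }
        LimitKind::DevRuntimePendingLaunchCount => {
            if cc < (3, 5) { PreCheck::Unsupported } else { PreCheck::Ok }
        }
        LimitKind::MaxL2FetchGranularity => {
            if value > 128 { PreCheck::OutOfRange } else { PreCheck::Ok }
        }
        // No local rule documented; defer entirely to the driver.
        _ => PreCheck::Ok,
    }
}

fn main() {
    let cases = [
        (LimitKind::DevRuntimeSyncDepth, 4, (8, 6)),
        (LimitKind::DevRuntimeSyncDepth, 4, (9, 0)),
        (LimitKind::DevRuntimePendingLaunchCount, 4096, (3, 0)),
        (LimitKind::MaxL2FetchGranularity, 64, (8, 0)),
        (LimitKind::StackSize, 1 << 16, (8, 0)),
    ];
    for (limit, value, cc) in cases {
        println!("{limit:?} = {value} on {cc:?}: {:?}", precheck(limit, value, cc));
    }
}
```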
pub const fn device(&self) -> Device
pub const fn as_raw(&self) -> CUcontext
pub unsafe fn from_raw(
    handle: CUcontext,
    device: Device,
    ownership: RawContextOwnership,
) -> Result<Arc<Self>>
Takes ownership of a raw CUDA context.
Safety
handle must be a valid CUDA context for device, and no other Rust
wrapper may own the same release responsibility. ownership must match
how the context should be released: created contexts are destroyed with
cuCtxDestroy, while primary contexts are released with
cuDevicePrimaryCtxRelease.
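The safety contract above ties each ownership mode to exactly one release call. The sketch below records that mapping; Ownership is a hypothetical mirror of RawContextOwnership (the crate's real variant names may differ), and RawParts is a hypothetical stand-in for the tuple returned by into_raw_parts. Only the driver function names come from the text above.

```rust
// Hypothetical mirror of `RawContextOwnership`; real variant names may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ownership {
    // A context created by the application (cuCtxCreate family).
    Created,
    // A retained device primary context (cuDevicePrimaryCtxRetain).
    Primary,
}

// The driver call that must eventually release a context with this ownership.
fn release_fn(ownership: Ownership) -> &'static str {
    match ownership {
        Ownership::Created => "cuCtxDestroy",
        Ownership::Primary => "cuDevicePrimaryCtxRelease",
    }
}

// Hypothetical raw parts, as a caller might hold them after `into_raw_parts`.
struct RawParts {
    handle: usize, // stands in for a CUcontext pointer
    device: i32,
    ownership: Ownership,
}

fn main() {
    let parts = RawParts { handle: 0x1000, device: 0, ownership: Ownership::Primary };
    // The holder must either pass these parts back unchanged to a
    // `from_raw`-style constructor or call the matching release exactly once.
    println!(
        "context {:#x} on device {} must be released with {}",
        parts.handle,
        parts.device,
        release_fn(parts.ownership)
    );
}
```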
pub fn into_raw_parts(self) -> (CUcontext, Device, RawContextOwnership)
Transfers ownership of the raw CUDA context to the caller.
The caller becomes responsible for releasing the returned context according to the returned ownership mode.
impl Context
pub fn create_event(self: &Arc<Self>) -> Result<Event>
pub fn create_event_with_flags(
    self: &Arc<Self>,
    flags: EventFlags,
) -> Result<Event>
Creates an event in this context with the specified flags. Valid flags include:
- EventFlags::DEFAULT: default event creation flag.
- EventFlags::BLOCKING_SYNC: the event uses blocking synchronization. A host thread that uses Event::synchronize to wait on an event created with this flag blocks until the event actually completes.
- EventFlags::DISABLE_TIMING: the created event does not record timing data. Events created with this flag specified and EventFlags::BLOCKING_SYNC not specified provide the best performance when used with Stream::wait_event and Event::query.
- EventFlags::INTERPROCESS: the created event may be used as an interprocess event by sys::cudaIpcGetEventHandle. EventFlags::INTERPROCESS must be specified together with EventFlags::DISABLE_TIMING.
Errors
Returns an error if the context cannot be bound, the flag combination is
invalid, CUDA cannot create the event, or CUDA returns a null event
handle. CUDA may also report errors from previous asynchronous launches,
internal runtime initialization errors such as
crate::error::Status::NotInitialized, crate::error::Status::CallRequiresNewerDriver,
or crate::error::Status::NoDevice, and callback diagnostics such as
crate::error::Status::NotPermitted.
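The one hard combination rule above (INTERPROCESS requires DISABLE_TIMING) can be validated before calling CUDA. This sketch uses the CUDA driver's CU_EVENT_* bit values (DEFAULT 0x0, BLOCKING_SYNC 0x1, DISABLE_TIMING 0x2, INTERPROCESS 0x4); whether the crate's EventFlags uses the same bits is an assumption, and check_event_flags is a hypothetical helper.

```rust
// Bit values of the CUDA driver's CU_EVENT_* flags. Whether the crate's
// `EventFlags` uses identical bits is an assumption of this sketch.
const DEFAULT: u32 = 0x0;
const BLOCKING_SYNC: u32 = 0x1;
const DISABLE_TIMING: u32 = 0x2;
const INTERPROCESS: u32 = 0x4;

// Rejects unknown bits and the documented invalid combination.
fn check_event_flags(flags: u32) -> Result<u32, &'static str> {
    let known = BLOCKING_SYNC | DISABLE_TIMING | INTERPROCESS;
    if flags & !known != 0 {
        return Err("unknown event flag bits");
    }
    if flags & INTERPROCESS != 0 && flags & DISABLE_TIMING == 0 {
        return Err("INTERPROCESS requires DISABLE_TIMING");
    }
    Ok(flags)
}

fn main() {
    for flags in [DEFAULT, DISABLE_TIMING, INTERPROCESS, INTERPROCESS | DISABLE_TIMING] {
        println!("{flags:#x}: {:?}", check_event_flags(flags));
    }
}
```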
impl Context
pub fn create_stream(self: &Arc<Self>) -> Result<Stream>
pub fn create_stream_with_flags(
    self: &Arc<Self>,
    flags: StreamFlags,
) -> Result<Stream>
Creates a new asynchronous stream on the context that is current to the calling host thread.
If no context is current to the calling host thread, then the primary context for a device is selected, made current to the calling thread, and initialized before creating a stream on it.
The flags argument determines the behaviors of the stream.
Valid values are provided by StreamFlags:
- StreamFlags::DEFAULT: default stream creation behavior.
- StreamFlags::NON_BLOCKING: allows the created stream to run concurrently with the legacy default stream without implicit synchronization.
Errors
Returns an error if CUDA cannot create the stream, if it does not return a valid stream
handle, or if a previous asynchronous launch reported an error. CUDA may also return
initialization-related errors such as crate::error::Status::NotInitialized,
crate::error::Status::CallRequiresNewerDriver, or crate::error::Status::NoDevice if this call initializes
internal runtime state. Callbacks must not call CUDA functions; see
Stream::add_callback.
pub fn create_stream_with_priority(
    self: &Arc<Self>,
    flags: StreamFlags,
    priority: i32,
) -> Result<Stream>
Creates a stream with the specified priority. The stream is created on this context. This affects the scheduling priority of work in the stream. Priorities provide a hint to preferentially run work with higher priority when possible, but do not preempt already-running work or provide any other functional guarantee on execution order.
priority follows a convention where lower numbers represent higher priorities.
0 represents default priority.
The range of meaningful numerical priorities can be queried using Device::stream_priority_range.
If the specified priority is outside the numerical range returned by Device::stream_priority_range, it will automatically be clamped to the lowest or the highest number in the range.
- Stream priorities are supported only on GPUs with compute capability 3.5 or higher.
- In the current implementation, only compute kernels launched in priority streams are affected by the stream’s priority. Stream priorities have no effect on host-to-device and device-to-host memory operations.
Errors
Returns an error if the context cannot be bound, CUDA cannot create the stream, CUDA returns a null stream handle, a previous asynchronous launch reports an error, or CUDA reports runtime initialization diagnostics.
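The clamping rule for priorities can be made concrete with a small pure function. This sketch assumes the range is expressed as a (least, greatest) pair the way the driver's cuCtxGetStreamPriorityRange reports it, where least is the numerically largest value (lowest priority) and greatest the numerically smallest (highest priority); the exact return shape of Device::stream_priority_range is not specified here. The range (0, -5) in main is illustrative only, since real values vary by device.

```rust
// Clamps a requested priority into [greatest, least]. Because lower numbers
// mean higher priority, `greatest <= least` under CUDA's convention.
fn effective_priority(requested: i32, least: i32, greatest: i32) -> i32 {
    assert!(greatest <= least, "expected greatest <= least");
    requested.clamp(greatest, least)
}

// True if priority `a` outranks priority `b` (lower number wins).
fn outranks(a: i32, b: i32) -> bool {
    a < b
}

fn main() {
    // Illustrative range; query the device for real values.
    let (least, greatest) = (0, -5);
    assert_eq!(effective_priority(-10, least, greatest), -5); // clamped to highest
    assert_eq!(effective_priority(3, least, greatest), 0); // clamped to lowest
    assert_eq!(effective_priority(-2, least, greatest), -2); // already in range
    assert!(outranks(-5, 0));
}
```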