// cudarc/driver/mod.rs
//! Wrappers around the [CUDA driver API](https://docs.nvidia.com/cuda/cuda-driver-api/index.html),
//! in three levels. See crate documentation for description of each.
//!
//! # safe api usage
//!
//! 1. Instantiate a [CudaContext]:
//!
//! ```rust
//! # use cudarc::driver::*;
//! let ctx = CudaContext::new(0).unwrap();
//! ```
//!
//! 2. Create a [CudaStream] to schedule work on using [CudaContext::default_stream()] or [CudaContext::new_stream()]:
//!
//! ```rust
//! # use cudarc::driver::*;
//! # let ctx = CudaContext::new(0).unwrap();
//! let stream = ctx.default_stream();
//! ```
//!
//! 3. Allocate device memory with [CudaStream::memcpy_stod()], [CudaStream::memcpy_htod()], or [CudaStream::alloc_zeros()].
//!
//! ```rust
//! # use cudarc::driver::*;
//! # let ctx = CudaContext::new(0).unwrap();
//! # let stream = ctx.default_stream();
//! let a_dev: CudaSlice<f32> = stream.alloc_zeros(10).unwrap();
//! let b_dev: CudaSlice<f32> = stream.memcpy_stod(&[0.0; 10]).unwrap();
//! ```
//!
//! 4. Transfer to host memory with [CudaStream::memcpy_dtov()] or [CudaStream::memcpy_dtoh()]:
//!
//! ```rust
//! # use cudarc::driver::*;
//! # let ctx = CudaContext::new(0).unwrap();
//! # let stream = ctx.default_stream();
//! let a_dev: CudaSlice<f32> = stream.alloc_zeros(10).unwrap();
//! let mut a_host: [f32; 10] = [1.0; 10];
//! stream.memcpy_dtoh(&a_dev, &mut a_host).unwrap();
//! assert_eq!(a_host, [0.0; 10]);
//! let a_host: Vec<f32> = stream.memcpy_dtov(&a_dev).unwrap();
//! assert_eq!(&a_host, &[0.0; 10]);
//! ```
//!
//! ## Mutating device memory - [CudaModule]/[CudaFunction]
//!
//! See [CudaStream::launch_builder()]/[LaunchArgs::launch()] and [CudaFunction].
//!
//! In order to mutate device data, you need to use CUDA kernels.
//!
//! Loading kernels is done with [CudaContext::load_module()]:
//! ```rust
//! # use cudarc::{driver::*, nvrtc::*};
//! # use std::sync::Arc;
//! let ptx = compile_ptx("extern \"C\" __global__ void my_function(float *out) { }").unwrap();
//! let ctx = CudaContext::new(0).unwrap();
//! let module: Arc<CudaModule> = ctx.load_module(ptx).unwrap();
//! ```
//!
//! Retrieve functions using [CudaModule::load_function()]:
//! ```rust
//! # use cudarc::{driver::*, nvrtc::*};
//! # let ptx = compile_ptx("extern \"C\" __global__ void my_function(float *out) { }").unwrap();
//! # let ctx = CudaContext::new(0).unwrap();
//! # let module = ctx.load_module(ptx).unwrap();
//! let f: CudaFunction = module.load_function("my_function").unwrap();
//! ```
//!
//! Asynchronously execute the kernel:
//! ```rust
//! # use cudarc::{driver::*, nvrtc::*};
//! # let ptx = compile_ptx("extern \"C\" __global__ void my_function(float *out) { }").unwrap();
//! # let ctx = CudaContext::new(0).unwrap();
//! # let module = ctx.load_module(ptx).unwrap();
//! # let f: CudaFunction = module.load_function("my_function").unwrap();
//! let stream = ctx.default_stream();
//! let mut a = stream.alloc_zeros::<f32>(10).unwrap();
//! let cfg = LaunchConfig::for_num_elems(10);
//! unsafe { stream.launch_builder(&f).arg(&mut a).launch(cfg) }.unwrap();
//! ```
//!
//! Note: Launching kernels is **extremely unsafe**. See [LaunchArgs::launch()] for more info.
//!
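//! For intuition, the grid sizing behind a helper like [LaunchConfig::for_num_elems()] boils down
//! to a ceiling division: launch enough blocks that `grid_dim * block_dim >= n`. A minimal sketch
//! in plain Rust (the block size of 256 and the `grid_for` name are illustrative assumptions here,
//! not necessarily what cudarc uses internally):
//! ```rust
//! /// Hypothetical helper: pick (grid_dim, block_dim) covering `n` elements.
//! fn grid_for(n: u32, block_dim: u32) -> (u32, u32) {
//!     // ceiling division so every element gets at least one thread
//!     ((n + block_dim - 1) / block_dim, block_dim)
//! }
//! assert_eq!(grid_for(10, 256), (1, 256));
//! assert_eq!(grid_for(1000, 256), (4, 256));
//! ```
//! Kernels launched this way must still bounds-check (`if (i < n)`), since the last block
//! generally contains more threads than remaining elements.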
//! ## Sub slices of [CudaSlice] - [CudaView] & [CudaViewMut]
//!
//! For some operations, it is necessary to only operate on a small part of a single [CudaSlice].
//! For example, the slice may represent a batch of items, and you want to run separate kernels
//! on each of the items in the batch.
//!
//! Use [CudaSlice::try_slice()] and [CudaSlice::try_slice_mut()] for this. The returned
//! views ([CudaView] and [CudaViewMut]) hold references to the owning [CudaSlice],
//! so rust's ownership system handles safety here.
//!
//! These view structs can be used with [CudaFunction], and any [CudaStream] methods.
//!
//! ```rust
//! # use cudarc::{driver::*, nvrtc::*};
//! # let ptx = compile_ptx("extern \"C\" __global__ void my_function(float *out) { }").unwrap();
//! # let ctx = CudaContext::new(0).unwrap();
//! # let stream = ctx.default_stream();
//! # let module = ctx.load_module(ptx).unwrap();
//! # let f = module.load_function("my_function").unwrap();
//! let cfg = LaunchConfig::for_num_elems(10);
//! let mut a: CudaSlice<f32> = stream.alloc_zeros::<f32>(3 * 10).unwrap();
//! for i_batch in 0..3 {
//!     let mut a_sub_view: CudaViewMut<f32> = a.try_slice_mut(i_batch * 10..).unwrap();
//!     unsafe { stream.launch_builder(&f).arg(&mut a_sub_view).launch(cfg) }.unwrap();
//! }
//! ```
//!
//! #### A note on implementation
//!
//! It would be possible to re-use [CudaSlice] itself for sub-slices; however, that would involve
//! adding another structure under the hood, wrapped in an [std::sync::Arc] to minimize data cloning.
//! Overall it seemed more complex than the current implementation.
//!
//! # Multi threading
//!
//! We implement [Send]/[Sync] on all types where it is safe to do so. [CudaContext] will auto bind to whatever
//! thread is currently using it.
//!
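//! For example, since [CudaContext] is handed out behind an [std::sync::Arc], a clone of the
//! handle can be moved into another thread. A sketch, assuming the API shown above and a visible
//! CUDA device (and that [CudaSlice] exposes a `len()`; hedge accordingly):
//! ```rust
//! # use cudarc::driver::*;
//! # use std::thread;
//! let ctx = CudaContext::new(0).unwrap();
//! let ctx2 = ctx.clone(); // cheap Arc clone; both handles share one context
//! let handle = thread::spawn(move || {
//!     // the context binds to this thread when used here
//!     let stream = ctx2.default_stream();
//!     stream.alloc_zeros::<f32>(10).unwrap().len()
//! });
//! assert_eq!(handle.join().unwrap(), 10);
//! ```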
//! # Safety
//!
//! There are a number of aspects to this, but at a high level this API utilizes [std::sync::Arc] to
//! control when [CudaContext] can be dropped.
//!
//! ### Context/Stream lifetimes
//!
//! The first part of safety is ensuring that [crate::driver::sys::CUcontext],
//! [crate::driver::sys::CUdevice], and [crate::driver::sys::CUstream] all
//! live the required amount of time (i.e. device outlives context, which outlives stream).
//!
//! This is accomplished by putting all of them inside one struct, the [CudaContext]. There are other ways,
//! such as adding newtypes that carry lifetimes with them, but this approach was chosen to make working
//! with device pointers easier.
//!
//! Additionally, [CudaContext] implements [Drop] to release all the data from the device in
//! the expected way.
//!
//! ### Device Data lifetimes
//!
//! The next part of safety is ensuring that a [CudaSlice] does not outlive
//! its [CudaContext]. For usability, each [CudaSlice] owns an `Arc<CudaContext>`
//! to ensure the device stays alive.
//!
//! Additionally we don't want to double free any device pointers, so free is only
//! called when the device pointer is dropped. Thanks rust!
//!
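//! The ownership pattern can be illustrated without any CUDA at all (a hypothetical sketch; the
//! `Ctx`/`Slice` names are stand-ins for illustration, not real cudarc types):
//! ```rust
//! use std::sync::Arc;
//! use std::sync::atomic::{AtomicUsize, Ordering};
//!
//! /// Stand-in for a context that counts how many "frees" were issued.
//! struct Ctx { frees: AtomicUsize }
//!
//! /// Stand-in for [CudaSlice]: holds an Arc to the context so the context
//! /// cannot be dropped first, and "frees" its pointer exactly once on Drop.
//! struct Slice { ctx: Arc<Ctx> }
//!
//! impl Drop for Slice {
//!     fn drop(&mut self) {
//!         self.ctx.frees.fetch_add(1, Ordering::SeqCst);
//!     }
//! }
//!
//! let ctx = Arc::new(Ctx { frees: AtomicUsize::new(0) });
//! let a = Slice { ctx: ctx.clone() };
//! let b = Slice { ctx: ctx.clone() };
//! drop(a);
//! drop(b);
//! // each slice freed exactly once, and the context outlived both
//! assert_eq!(ctx.frees.load(Ordering::SeqCst), 2);
//! ```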
//! ### Host and Device Data lifetimes
//!
//! When copying data between host & device, we ensure proper use of [CudaEvent::synchronize()]
//! and [CudaStream::synchronize()] to make sure no data is freed during use.

pub mod result;
pub mod safe;
#[allow(warnings)]
pub mod sys;

pub use safe::*;