Crate caffe2_common
Macros
- Build-configuration flags exported from CMake as `#cmakedefine` entries: CAFFE2_BUILD_SHARED_LIBS, CAFFE2_FORCE_FALLBACK_CUDA_MPI, CAFFE2_HAS_MKL_DNN, CAFFE2_HAS_MKL_SGEMM_PACK, CAFFE2_PERF_WITH_AVX, CAFFE2_PERF_WITH_AVX2, CAFFE2_PERF_WITH_AVX512, CAFFE2_THREADPOOL_MAIN_IMBALANCE, CAFFE2_THREADPOOL_STATS, CAFFE2_USE_EXCEPTION_PTR, CAFFE2_USE_ACCELERATE, CAFFE2_USE_CUDNN, CAFFE2_USE_EIGEN_FOR_BLAS, CAFFE2_USE_FBCODE, CAFFE2_USE_GOOGLE_GLOG, CAFFE2_USE_LITE_PROTO, CAFFE2_USE_MKL, CAFFE2_USE_MKLDNN, CAFFE2_USE_NVTX, and CAFFE2_USE_TRT, plus EIGEN_MPL2_ONLY (guarded by `#ifndef EIGEN_MPL2_ONLY`).
- Format of each probe argument as an operand. The size of the argument is tagged with CAFFE_SDT_Sn (with the "n" constraint), and the value of the argument is tagged with CAFFE_SDT_An (with the configured constraint).
- Default constraint for the probe arguments as operands.
- Templates to reference the arguments from operands in the note section.
- Instruction to emit for the probe.
- Note section properties.
- CUDA: various checks for different function calls.
- A macro that wraps a cudnn statement so we can check whether the cudnn call completed successfully; a sketch of the pattern follows this list.
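A minimal sketch of the check-and-abort pattern such a macro implements, with a local status enum standing in for cudnnStatus_t; `CudnnStatus`, `cudnn_enforce!`, and `fake_cudnn_call` are illustrative stand-ins, not this crate's actual API:

```rust
// Stand-in for cudnnStatus_t; the real macro checks the status code
// returned by the wrapped cudnn call.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum CudnnStatus {
    Success,
    Other(i32),
}

// Hypothetical macro: evaluate the call, then abort with location info
// if the status is anything other than success.
macro_rules! cudnn_enforce {
    ($call:expr) => {{
        let status = $call;
        assert!(
            status == CudnnStatus::Success,
            "cudnn call failed at {}:{}: {:?}",
            file!(),
            line!(),
            status
        );
    }};
}

fn main() {
    let fake_cudnn_call = || CudnnStatus::Success;
    cudnn_enforce!(fake_cudnn_call());
}
```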
Structs
- Turns the g_caffe2_has_cuda_linked flag to true for the HasCudaRuntime() function.
- cudnnTensorDescWrapper is a placeholder that wraps a cudnnTensorDescriptor_t, allowing the descriptor to be changed as needed at runtime.
- Concise utility class to mutate a net in a chaining fashion.
- Concise utility class to mutate a workspace in a chaining fashion; a sketch of the chaining pattern follows this list.
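A minimal sketch of the chaining style these mutators enable, where each method consumes and returns the builder; `NetMutator` here is a toy stand-in, not the crate's real type:

```rust
// Toy stand-in for a chaining net mutator.
struct NetMutator {
    ops: Vec<String>,
}

impl NetMutator {
    fn new() -> Self {
        NetMutator { ops: Vec::new() }
    }

    // Consume and return self so calls can be chained.
    fn add_op(mut self, name: &str) -> Self {
        self.ops.push(name.to_string());
        self
    }
}

fn main() {
    let net = NetMutator::new().add_op("Relu").add_op("Softmax");
    println!("ops: {:?}", net.ops);
}
```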
Enums
- Storage orders that are often used in image applications.
Constants
- The maximum number of peers each GPU can have when doing p2p setup. Currently, according to NVIDIA documentation, each device can support a system-wide maximum of eight peer connections. When Caffe2 sets up peer access resources and there are more than 8 GPUs, peer access is enabled in groups of 8.
- The number of CUDA threads to use. Since work is assigned to SMs at the granularity of a block, 128 is chosen to allow utilizing more SMs for smaller input sizes (1D grid).
- The number of CUDA threads to use for a 2D grid.
- The maximum number of blocks to use in the default kernel call (1D grid). We set it to 4096, which works for compute capability 2.x (where 65536 is the limit). This number is chosen rather carelessly; ideally, one would inspect the hardware at runtime and pick the number of blocks that makes the most sense for the specific runtime environment. This is a TODO item. See the sketch after this list for how the grid size is derived from this cap.
- The maximum number of blocks to use in the default kernel call (2D grid).
- TODO: obtain via cudnn_sys::cudnnGetVersion().
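A worked sketch of the 1D grid-size rule these constants imply: ceil(N / 128) blocks, clamped to 4096 and floored at one block. The function name is illustrative, not necessarily the crate's:

```rust
const CAFFE_CUDA_NUM_THREADS: usize = 128;
const CAFFE_MAXIMUM_NUM_BLOCKS: usize = 4096;

// ceil(n / threads), clamped to the block cap and floored at one block.
fn caffe_get_blocks(n: usize) -> usize {
    ((n + CAFFE_CUDA_NUM_THREADS - 1) / CAFFE_CUDA_NUM_THREADS)
        .min(CAFFE_MAXIMUM_NUM_BLOCKS)
        .max(1)
}

fn main() {
    assert_eq!(caffe_get_blocks(1), 1);             // tiny input: one block
    assert_eq!(caffe_get_blocks(1_000), 8);         // ceil(1000 / 128)
    assert_eq!(caffe_get_blocks(10_000_000), 4096); // clamped at the cap
}
```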
Traits
- cudnnTypeWrapper is a wrapper class that allows us to refer to the cudnn type in a template function. The class is specialized explicitly for different data types below; a sketch of the idea follows this list.
- at::Half is defined in c10/util/Half.h. Currently, half-float operators run mainly on CUDA GPUs. The reason we do not directly use the CUDA __half data type is that doing so requires compilation with nvcc. The float16 data type should be compatible with the CUDA __half data type, but lets us refer to the data type without needing CUDA.
- SkipIndices is used in operator_fallback_gpu.h and operator_fallback_mkl.h as a utility that marks input/output indices to skip when we use a CPU operator as the fallback for a GPU/MKL operator. Note: this is supposed to be a variadic template.
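An illustrative sketch of the cudnnTypeWrapper idea in Rust terms: a trait that maps a scalar type to its cudnn data-type tag at compile time, with one impl per supported type playing the role of the explicit template specializations the doc describes. The enum mirrors a subset of cudnnDataType_t; none of these names are the crate's actual API:

```rust
// Subset of cudnn data-type tags, for illustration only.
#[derive(Debug, Clone, Copy)]
enum CudnnDataType {
    Float,
    Double,
}

// Map a scalar type to its cudnn tag at compile time.
trait CudnnTypeWrapper {
    const TYPE: CudnnDataType;
}

impl CudnnTypeWrapper for f32 {
    const TYPE: CudnnDataType = CudnnDataType::Float;
}

impl CudnnTypeWrapper for f64 {
    const TYPE: CudnnDataType = CudnnDataType::Double;
}

// A generic function can now refer to the cudnn type of T.
fn data_type_of<T: CudnnTypeWrapper>() -> CudnnDataType {
    T::TYPE
}

fn main() {
    println!("{:?}", data_type_of::<f32>()); // Float
    println!("{:?}", data_type_of::<f64>()); // Double
}
```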
Functions
- Asserts that two float values are close within epsilon.
- Assertion for tensor sizes and values.
- Asserts that the values of two tensors are the same.
- Asserts that the numeric values of a tensor are equal to a data vector.
- Asserts that a list of tensors present in two workspaces are equal.
- Gets the current GPU id. This is a simple wrapper around cudaGetDevice().
- Gets the current GPU id. This is a simple wrapper around cudaGetDevice().
- Compute the number of blocks needed to run N threads; see the grid-size sketch under Constants.
- Compute the number of blocks needed to run N threads for a 2D grid.
- Check compatibility of the compiled and runtime cuDNN versions.
- Fill a tensor with a constant.
- Create a new tensor in the workspace.
- Create a tensor and fill it with a constant.
- Create a tensor and fill it with data.
- Return a human-readable cublas error string.
- A runtime function to report the cuda version that Caffe2 is built with.
- Report the version of cuDNN that Caffe2 was compiled with.
- A helper function to obtain cudnn error strings.
- Report the runtime version of cuDNN.
- Return a human-readable curand error string.
- From caffe2::DataType protobuffer enum to TypeMeta.
- Runs a device query function and prints out the results to LOG(INFO).
- Dynamic cast reroute: if RTTI is disabled, go to reinterpret_cast.
- Fill a tensor with data from a vector.
- Returns the settings Caffe2 was configured and built with (exported from CMake).
- Return a peer access pattern as a matrix (a nested vector) of boolean values specifying whether peer access is possible. This function returns false if anything goes wrong during the query of the GPU access pattern. A shape sketch follows this list.
- A wrapper function to convert the Caffe storage order to the cudnn storage order enum values.
- Gets the device property for the given device. This function is thread safe. The initial run of this function takes about 1 ms per device; however, the results are cached, so subsequent runs should be much faster.
- Read a tensor from the workspace.
- Check if the current running session has a cuda gpu present. Note: this is different from having Caffe2 built with cuda. Building Caffe2 with cuda only guarantees that this function exists. If there are no cuda gpus present in the machine, or there are hardware configuration problems such as an insufficient driver, this function will still return false, meaning that there is no usable GPU present. In the open-source build, it is possible that Caffe2's GPU code is dynamically loaded, so a library could be linked only against the CPU code but still want to test whether cuda becomes available later. In that case, one should use HasCudaRuntime() from common.h.
- HasCudaRuntime() tells the program whether the binary has the Cuda runtime linked. This function should not be used in static initialization functions, as the underlying boolean variable is switched on when libtorch_gpu.so is loaded.
- Returns the number of devices.
- Fill a buffer with randomly generated numbers in the range [min, max). T can only be float, double, or long double; the default RealType is float.
- Sets the Cuda Runtime flag that is used by HasCudaRuntime(). You should never use this function; it is only used by the Caffe2 gpu code to notify Caffe2 core that the cuda runtime has been loaded.
- Return the availability of TensorCores for math.
- Are `T` and `U` the same type?
- From TypeMeta to caffe2::DataType protobuffer enum.
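For the peer-access query above, a shape sketch of the returned matrix: entry [i][j] says whether device i can reach device j. The closure stands in for the underlying cudaDeviceCanAccessPeer check; all names here are hypothetical:

```rust
// Build an n x n boolean matrix of pairwise peer-access answers.
fn peer_access_pattern(
    num_gpus: usize,
    can_access: impl Fn(usize, usize) -> bool,
) -> Vec<Vec<bool>> {
    (0..num_gpus)
        .map(|i| (0..num_gpus).map(|j| can_access(i, j)).collect())
        .collect()
}

fn main() {
    // Pretend peer access works only inside groups of 8, mirroring the
    // grouping described under Constants.
    let pattern = peer_access_pattern(4, |i, j| i / 8 == j / 8);
    for row in &pattern {
        println!("{:?}", row);
    }
}
```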