Crate caffe2_common
Macros
- Build-configuration flags exported from CMake as `#cmakedefine` entries: CAFFE2_BUILD_SHARED_LIBS, CAFFE2_FORCE_FALLBACK_CUDA_MPI, CAFFE2_HAS_MKL_DNN, CAFFE2_HAS_MKL_SGEMM_PACK, CAFFE2_PERF_WITH_AVX, CAFFE2_PERF_WITH_AVX2, CAFFE2_PERF_WITH_AVX512, CAFFE2_THREADPOOL_MAIN_IMBALANCE, CAFFE2_THREADPOOL_STATS, CAFFE2_USE_EXCEPTION_PTR, CAFFE2_USE_ACCELERATE, CAFFE2_USE_CUDNN, CAFFE2_USE_EIGEN_FOR_BLAS, CAFFE2_USE_FBCODE, CAFFE2_USE_GOOGLE_GLOG, CAFFE2_USE_LITE_PROTO, CAFFE2_USE_MKL, CAFFE2_USE_MKLDNN, CAFFE2_USE_NVTX, and CAFFE2_USE_TRT, plus EIGEN_MPL2_ONLY (guarded by `#ifndef EIGEN_MPL2_ONLY`).
- Format of each probe argument as an operand. The size of the argument is tagged with CAFFE_SDT_Sn (with the "n" constraint), and the value of the argument is tagged with CAFFE_SDT_An (with the configured constraint).
- Default constraint for the probe arguments as operands.
- Templates to reference the arguments from operands in the note section.
- Instruction to emit for the probe.
- Note section properties.
- CUDA: various checks for different function calls.
- A macro that wraps a cudnn statement so we can check whether the cudnn call completed successfully; a sketch of the pattern follows this list.
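A minimal sketch of the check-and-abort pattern such a macro implements, with a local status enum standing in for cudnnStatus_t; `CudnnStatus`, `cudnn_enforce!`, and `fake_cudnn_call` are illustrative stand-ins, not this crate's actual API:

```rust
// Stand-in for cudnnStatus_t; the real macro checks the status code
// returned by the wrapped cudnn call.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum CudnnStatus {
    Success,
    Other(i32),
}

// Hypothetical macro: evaluate the call, then abort with location info
// if the status is anything other than success.
macro_rules! cudnn_enforce {
    ($call:expr) => {{
        let status = $call;
        assert!(
            status == CudnnStatus::Success,
            "cudnn call failed at {}:{}: {:?}",
            file!(),
            line!(),
            status
        );
    }};
}

fn main() {
    let fake_cudnn_call = || CudnnStatus::Success;
    cudnn_enforce!(fake_cudnn_call());
}
```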
Structs
- Turns the g_caffe2_has_cuda_linked flag to true for the HasCudaRuntime() function.
- cudnnTensorDescWrapper is a placeholder that wraps a cudnnTensorDescriptor_t, allowing the descriptor to be changed as needed at runtime.
- Concise utility class to mutate a net in a chaining fashion.
- Concise utility class to mutate a workspace in a chaining fashion; a sketch of the chaining pattern follows this list.
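A minimal sketch of the chaining style these mutators enable, where each method consumes and returns the builder; `NetMutator` here is a toy stand-in, not the crate's real type:

```rust
// Toy stand-in for a chaining net mutator.
struct NetMutator {
    ops: Vec<String>,
}

impl NetMutator {
    fn new() -> Self {
        NetMutator { ops: Vec::new() }
    }

    // Consume and return self so calls can be chained.
    fn add_op(mut self, name: &str) -> Self {
        self.ops.push(name.to_string());
        self
    }
}

fn main() {
    let net = NetMutator::new().add_op("Relu").add_op("Softmax");
    println!("ops: {:?}", net.ops);
}
```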
Enums
- Storage orders that are often used in image applications.
Constants
- The maximum number of peers each GPU can have when doing p2p setup. Currently, according to NVIDIA documentation, each device can support a system-wide maximum of eight peer connections. When Caffe2 sets up peer access resources and there are more than 8 GPUs, peer access is enabled in groups of 8.
- The number of CUDA threads to use. Since work is assigned to SMs at the granularity of a block, 128 is chosen to allow utilizing more SMs for smaller input sizes (1D grid).
- The number of CUDA threads to use for a 2D grid.
- The maximum number of blocks to use in the default kernel call (1D grid). We set it to 4096, which works for compute capability 2.x (where 65536 is the limit). This number is chosen rather carelessly; ideally, one would inspect the hardware at runtime and pick the number of blocks that makes the most sense for the specific runtime environment. This is a TODO item. See the sketch after this list for how the grid size is derived from this cap.
- The maximum number of blocks to use in the default kernel call (2D grid).
- TODO: obtain via cudnn_sys::cudnnGetVersion().
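A worked sketch of the 1D grid-size rule these constants imply: ceil(N / 128) blocks, clamped to 4096 and floored at one block. The function name is illustrative, not necessarily the crate's:

```rust
const CAFFE_CUDA_NUM_THREADS: usize = 128;
const CAFFE_MAXIMUM_NUM_BLOCKS: usize = 4096;

// ceil(n / threads), clamped to the block cap and floored at one block.
fn caffe_get_blocks(n: usize) -> usize {
    ((n + CAFFE_CUDA_NUM_THREADS - 1) / CAFFE_CUDA_NUM_THREADS)
        .min(CAFFE_MAXIMUM_NUM_BLOCKS)
        .max(1)
}

fn main() {
    assert_eq!(caffe_get_blocks(1), 1);             // tiny input: one block
    assert_eq!(caffe_get_blocks(1_000), 8);         // ceil(1000 / 128)
    assert_eq!(caffe_get_blocks(10_000_000), 4096); // clamped at the cap
}
```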
Traits
- cudnnTypeWrapper is a wrapper class that allows us to refer to the cudnn type in a template function. The class is specialized explicitly for different data types below; a sketch of the idea follows this list.
- at::Half is defined in c10/util/Half.h. Currently, half-float operators run mainly on CUDA GPUs. The reason we do not directly use the CUDA __half data type is that doing so requires compilation with nvcc. The float16 data type should be compatible with the CUDA __half data type, but lets us refer to the data type without needing CUDA.
- SkipIndices is used in operator_fallback_gpu.h and operator_fallback_mkl.h as a utility that marks input/output indices to skip when we use a CPU operator as the fallback for a GPU/MKL operator. Note: this is supposed to be a variadic template.
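An illustrative sketch of the cudnnTypeWrapper idea in Rust terms: a trait that maps a scalar type to its cudnn data-type tag at compile time, with one impl per supported type playing the role of the explicit template specializations the doc describes. The enum mirrors a subset of cudnnDataType_t; none of these names are the crate's actual API:

```rust
// Subset of cudnn data-type tags, for illustration only.
#[derive(Debug, Clone, Copy)]
enum CudnnDataType {
    Float,
    Double,
}

// Map a scalar type to its cudnn tag at compile time.
trait CudnnTypeWrapper {
    const TYPE: CudnnDataType;
}

impl CudnnTypeWrapper for f32 {
    const TYPE: CudnnDataType = CudnnDataType::Float;
}

impl CudnnTypeWrapper for f64 {
    const TYPE: CudnnDataType = CudnnDataType::Double;
}

// A generic function can now refer to the cudnn type of T.
fn data_type_of<T: CudnnTypeWrapper>() -> CudnnDataType {
    T::TYPE
}

fn main() {
    println!("{:?}", data_type_of::<f32>()); // Float
    println!("{:?}", data_type_of::<f64>()); // Double
}
```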
Functions
- Asserts that two float values are close within epsilon.
- Assertion for tensor sizes and values.
- Asserts that the values of two tensors are the same.
- Asserts that the numeric values of a tensor are equal to a data vector.
- Asserts that a list of tensors present in two workspaces are equal.
- Gets the current GPU id. This is a simple wrapper around cudaGetDevice().
- Gets the current GPU id. This is a simple wrapper around cudaGetDevice().
- Compute the number of blocks needed to run N threads; see the grid-size sketch under Constants.
- Compute the number of blocks needed to run N threads for a 2D grid.
- Check compatibility of the compiled and runtime cuDNN versions.
- Fill a tensor with a constant.
- Create a new tensor in the workspace.
- Create a tensor and fill it with a constant.
- Create a tensor and fill it with data.
- Return a human-readable cublas error string.
- A runtime function to report the cuda version that Caffe2 is built with.
- Report the version of cuDNN that Caffe2 was compiled with.
- A helper function to obtain cudnn error strings.
- Report the runtime version of cuDNN.
- Return a human-readable curand error string.
- From caffe2::DataType protobuffer enum to TypeMeta.
- Runs a device query function and prints out the results to LOG(INFO).
- Dynamic cast reroute: if RTTI is disabled, go to reinterpret_cast.
- Fill a tensor with data from a vector.
- Returns the settings Caffe2 was configured and built with (exported from CMake).
- Return a peer access pattern as a matrix (a nested vector) of boolean values specifying whether peer access is possible. This function returns false if anything goes wrong during the query of the GPU access pattern. A shape sketch follows this list.
- A wrapper function to convert the Caffe storage order to the cudnn storage order enum values.
- Gets the device property for the given device. This function is thread safe. The initial run of this function takes about 1 ms per device; however, the results are cached, so subsequent runs should be much faster.
- Read a tensor from the workspace.
- Check if the current running session has a cuda gpu present. Note: this is different from having Caffe2 built with cuda. Building Caffe2 with cuda only guarantees that this function exists. If there are no cuda gpus present in the machine, or there are hardware configuration problems such as an insufficient driver, this function will still return false, meaning that there is no usable GPU present. In the open-source build, it is possible that Caffe2's GPU code is dynamically loaded, so a library could be linked only against the CPU code but still want to test whether cuda becomes available later. In that case, one should use HasCudaRuntime() from common.h.
- HasCudaRuntime() tells the program whether the binary has the Cuda runtime linked. This function should not be used in static initialization functions, as the underlying boolean variable is switched on when libtorch_gpu.so is loaded.
- Returns the number of devices.
- Fill a buffer with randomly generated numbers in the range [min, max). T can only be float, double, or long double; the default RealType is float.
- Sets the Cuda Runtime flag that is used by HasCudaRuntime(). You should never use this function; it is only used by the Caffe2 gpu code to notify Caffe2 core that the cuda runtime has been loaded.
- Return the availability of TensorCores for math.
- Are `T` and `U` the same type?
- From TypeMeta to caffe2::DataType protobuffer enum.
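For the peer-access query above, a shape sketch of the returned matrix: entry [i][j] says whether device i can reach device j. The closure stands in for the underlying cudaDeviceCanAccessPeer check; all names here are hypothetical:

```rust
// Build an n x n boolean matrix of pairwise peer-access answers.
fn peer_access_pattern(
    num_gpus: usize,
    can_access: impl Fn(usize, usize) -> bool,
) -> Vec<Vec<bool>> {
    (0..num_gpus)
        .map(|i| (0..num_gpus).map(|j| can_access(i, j)).collect())
        .collect()
}

fn main() {
    // Pretend peer access works only inside groups of 8, mirroring the
    // grouping described under Constants.
    let pattern = peer_access_pattern(4, |i, j| i / 8 == j / 8);
    for row in &pattern {
        println!("{:?}", row);
    }
}
```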