| MIOPENWrapper is a class that wraps the MIOpen
| handles and MIOpen workspaces.
|
| The wrapper ensures that, for each thread and
| each GPU, there is exactly one MIOpen handle,
| which is associated with that thread's
| per-device HIP stream. The wrapper also hosts
| the device-specific MIOpen workspace (scratch
| space required by some MIOpen functions).
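|
| A minimal sketch of the one-handle-per-thread-per-device
| rule, assuming a thread_local map keyed by device
| ordinal. FakeHandle and GetThreadLocalHandle are
| hypothetical names; a real handle would be created with
| miopenCreate() and bound to the thread's HIP stream with
| miopenSetStream(), which is omitted here so the sketch
| compiles without ROCm.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for miopenHandle_t. It only records the device it was
// created for, so the sketch runs without the MIOpen library installed.
struct FakeHandle {
  int device;
};

// Returns the calling thread's handle for `device`, creating it on first use.
// The map is thread_local, so each thread owns its own handles; keying the map
// by device ordinal gives exactly one handle per device within a thread.
FakeHandle* GetThreadLocalHandle(int device) {
  assert(device >= 0 && "device ordinals are non-negative");
  thread_local std::map<int, std::unique_ptr<FakeHandle>> handles;
  std::unique_ptr<FakeHandle>& slot = handles[device];
  if (!slot) {
    slot.reset(new FakeHandle{device});
  }
  return slot.get();
}
```

| Calling GetThreadLocalHandle(0) twice on one thread
| returns the same pointer, while a different thread
| lazily creates its own handle for device 0.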
|
| MIOpenState owns the MIOpenWorkspace and
| serializes all executions of operations that
| use the state onto its own stream, so multiple
| Net workers can reuse the same workspace from
| different threads and HIP streams.
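|
| The serialization guarantee can be illustrated on the
| host with a mutex, as a sketch under simplifying
| assumptions: SharedWorkspaceState and execute() are
| hypothetical names, and the real class's own HIP stream
| (and its synchronization with the caller's stream) is
| replaced by a std::mutex held for the whole call.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: one state object owns a scratch buffer, and every
// operation that needs the buffer runs through execute(), which holds a lock
// so that callers on different threads never use the buffer concurrently.
class SharedWorkspaceState {
 public:
  explicit SharedWorkspaceState(std::size_t bytes) : workspace_(bytes) {}

  // Runs f(workspace) while holding the lock; calls are serialized.
  template <typename F>
  void execute(F&& f) {
    std::lock_guard<std::mutex> guard(mutex_);
    f(workspace_);
  }

 private:
  std::mutex mutex_;
  std::vector<unsigned char> workspace_;
};
```

| A mutex trades concurrency for the ability to share a
| single scratch buffer safely, which is the same
| trade-off described for MIOpenState.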
|
| MIOpenWorkspace is a wrapper around a raw
| device pointer that holds the MIOpen scratch
| space. This struct is meant to be used only in
| MIOPENWrapper to provide a program-wide scratch
| space for MIOpen. The reasoning is that MIOpen
| function calls are usually efficient enough to
| keep the device busy on their own, so there is
| little benefit in running several MIOpen calls
| at the same time. As a result, one should not
| need more than one MIOpen workspace per device.
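|
| One way such a workspace could hand out scratch memory
| is a grow-only buffer, sketched here with
| std::malloc/std::free standing in for hipMalloc/hipFree
| so it runs on the host. ScratchWorkspace and get() are
| hypothetical names, and the grow-only policy is an
| assumption rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical grow-only scratch buffer. A device version would call
// hipMalloc/hipFree instead of std::malloc/std::free.
class ScratchWorkspace {
 public:
  ScratchWorkspace() = default;
  ScratchWorkspace(const ScratchWorkspace&) = delete;
  ScratchWorkspace& operator=(const ScratchWorkspace&) = delete;
  ~ScratchWorkspace() { std::free(data_); }

  // Returns a buffer of at least nbytes, reallocating only when the request
  // exceeds the current size; smaller requests reuse the existing buffer.
  void* get(std::size_t nbytes) {
    if (nbytes > nbytes_) {
      // Old contents are scratch data, so they are discarded, not copied.
      std::free(data_);
      data_ = std::malloc(nbytes);
      if (data_ == nullptr) {
        nbytes_ = 0;
        throw std::bad_alloc();
      }
      nbytes_ = nbytes;
    }
    return data_;
  }

  std::size_t size() const { return nbytes_; }

 private:
  void* data_ = nullptr;
  std::size_t nbytes_ = 0;
};
```

| Because the buffer never shrinks, repeated calls with
| varying sizes settle on the largest request seen.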
|
| miopenTensorDescWrapper is a placeholder that
| wraps a miopenTensorDescriptor_t, allowing the
| descriptor to be changed as needed at runtime.
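|
| The "change as needed" idea can be sketched as a cache
| that reconfigures only when the element type or shape
| differs from the previous call. TensorDescCache and
| FakeDataType are hypothetical; a real wrapper would hold
| a miopenTensorDescriptor_t and call a setter such as
| miopenSet4dTensorDescriptor() inside Set().

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for miopenDataType_t.
enum class FakeDataType { kFloat, kHalf };

// Caches the last (type, dims) pair. Set() reports whether a reconfiguration
// happened, so callers can refresh state that depends on the descriptor.
class TensorDescCache {
 public:
  bool Set(FakeDataType type, const std::vector<int>& dims) {
    assert(!dims.empty() && "a tensor descriptor needs at least one dim");
    if (initialized_ && type == type_ && dims == dims_) {
      return false;  // unchanged: reuse the existing descriptor
    }
    // A real wrapper would update the MIOpen descriptor here.
    type_ = type;
    dims_ = dims;
    initialized_ = true;
    ++reconfigure_count_;
    return true;
  }

  int reconfigure_count() const { return reconfigure_count_; }

 private:
  bool initialized_ = false;
  FakeDataType type_ = FakeDataType::kFloat;
  std::vector<int> dims_;
  int reconfigure_count_ = 0;
};
```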
|
| miopenTypeWrapper is a wrapper class that lets
| template functions refer to the MIOpen data
| type corresponding to a C++ type.
|
| The class is explicitly specialized for each
| supported data type below.
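|
| A sketch of the explicit-specialization pattern,
| assuming a local enum in place of the real
| miopenDataType_t from the MIOpen headers. TypeWrapper,
| FakeMiopenDataType, FakeHalf and DataTypeOf are
| hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for miopenDataType_t.
enum class FakeMiopenDataType { Float, Half, Int8 };

// Hypothetical 16-bit storage type standing in for a half-precision type.
struct FakeHalf {
  std::uint16_t bits;
};

// Primary template is declared but not defined, so an unsupported element
// type is a compile-time error rather than a silently wrong type.
template <typename T>
struct TypeWrapper;

template <>
struct TypeWrapper<float> {
  static constexpr FakeMiopenDataType type = FakeMiopenDataType::Float;
};

template <>
struct TypeWrapper<FakeHalf> {
  static constexpr FakeMiopenDataType type = FakeMiopenDataType::Half;
};

template <>
struct TypeWrapper<std::int8_t> {
  static constexpr FakeMiopenDataType type = FakeMiopenDataType::Int8;
};

// A template function can now look up the library type for its element type.
template <typename T>
FakeMiopenDataType DataTypeOf() {
  return TypeWrapper<T>::type;
}
```

| Leaving the primary template undefined means
| DataTypeOf<double>() fails to compile until a
| specialization for double is added.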
|