| @brief A templated class that wraps a CPU
| operator as a CUDA operator.
|
| Use this class when the CUDA implementation of
| an operator is not ready yet. The op
| automatically handles the data copies for you.
| These copies add considerable overhead and are
| not optimal, so use this operator mainly for
| quick prototyping.
|
| All inputs and outputs of the original
| operator should be TensorCPU.
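|
| The copy-in, run, copy-out pattern described
| above can be sketched without any GPU at all.
| This is a simulation only, not the actual
| implementation: Tensor, Device, CopyTo,
| RunWithFallback and DoubleOnCpu are all
| hypothetical names, and plain vectors stand in
| for device memory.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins: a tag records where each tensor "lives".
// No real GPU memory is involved in this sketch.
enum class Device { CPU, GPU };

struct Tensor {
  Device device;
  std::vector<float> data;
};

// Counts simulated transfers so the overhead of the fallback is visible.
static int g_copy_count = 0;

// Simulated transfer: returns a copy of src tagged with the target device.
Tensor CopyTo(Device target, const Tensor& src) {
  ++g_copy_count;
  return Tensor{target, src.data};
}

using CpuOp =
    std::function<std::vector<Tensor>(const std::vector<Tensor>&)>;

// Runs a CPU-only op on GPU tensors: copy inputs to the CPU, run the op,
// then copy every output back to the GPU.
std::vector<Tensor> RunWithFallback(const CpuOp& cpu_op,
                                    const std::vector<Tensor>& gpu_inputs) {
  std::vector<Tensor> cpu_inputs;
  for (const Tensor& t : gpu_inputs) {
    assert(t.device == Device::GPU);
    cpu_inputs.push_back(CopyTo(Device::CPU, t));
  }
  std::vector<Tensor> cpu_outputs = cpu_op(cpu_inputs);
  std::vector<Tensor> gpu_outputs;
  for (const Tensor& t : cpu_outputs) {
    gpu_outputs.push_back(CopyTo(Device::GPU, t));
  }
  return gpu_outputs;
}

// Example CPU operator: doubles every element of its single input.
std::vector<Tensor> DoubleOnCpu(const std::vector<Tensor>& inputs) {
  Tensor out{Device::CPU, inputs[0].data};
  for (float& x : out.data) {
    x *= 2.0f;
  }
  return {out};
}
```

| Calling RunWithFallback(DoubleOnCpu, inputs)
| with one GPU-tagged input performs two
| simulated transfers, one in each direction,
| which is where the overhead comes from.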
|
| Example usage: if you have a CPU-based class
| MyMagicOp and register the CPU side with
|
| REGISTER_CPU_OPERATOR(MyMagic, MyMagicOp);
|
| you can create the corresponding GPU operator
| (at a performance cost, of course) via
|
| REGISTER_CUDA_OPERATOR(MyMagic, GPUFallbackOp);
|
| Note that the CPU and CUDA operators must be
| registered under the same name.
|
| Advanced usage: if certain outputs should
| never be copied, use the SkipOutputCopy
| template argument.
|
| For example, if MyMagic produces two outputs
| and the first output always lives on the CPU,
| you can write
|
| REGISTER_CUDA_OPERATOR(MyMagic, GPUFallbackOpEx<SkipIndices<0>>);
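|
| To illustrate how a compile-time index list
| such as SkipIndices<0> could decide which
| outputs are copied, here is a minimal
| standalone sketch. The SkipIndices below is a
| hypothetical stand-in written for this
| example, and OutputsToCopy is an invented
| helper; the real definitions may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the SkipIndices helper named above.
// It holds a compile-time list of output indices to skip.
template <int... Values>
struct SkipIndices {
  // Returns true if output index i is in the skip list.
  static bool Contains(int i) {
    // The trailing 0 only keeps the array non-empty for an empty list;
    // the loop never reads it.
    const int values[] = {Values..., 0};
    const int count = static_cast<int>(sizeof...(Values));
    for (int k = 0; k < count; ++k) {
      if (values[k] == i) {
        return true;
      }
    }
    return false;
  }
};

// Invented helper: for each of num_outputs outputs, reports whether a
// fallback wrapper would copy it back to the GPU (true) or leave it on
// the CPU (false).
template <typename SkipOutputCopy>
std::vector<bool> OutputsToCopy(int num_outputs) {
  assert(num_outputs >= 0);
  std::vector<bool> copy(num_outputs);
  for (int i = 0; i < num_outputs; ++i) {
    copy[i] = !SkipOutputCopy::Contains(i);
  }
  return copy;
}
```

| With two outputs, OutputsToCopy<SkipIndices<0>>(2)
| marks only the second output for copying, while
| an empty list, SkipIndices<>, marks both.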