In-place pair definition. It can be queried from a compiled partition and
indicates that an input and an output of the partition can share the same
memory buffer for computation. In-place computation helps reduce the
memory footprint and improves cache locality. However, because the library
may not have a global view of the user’s application, the tensor with
input_id may be used elsewhere in the user’s computation graph. In that
case, the user should treat the in-place pair as a hint and pass a different
memory buffer for the output tensor to avoid overwriting the input memory
buffer, which would likely produce incorrect results.
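The decision described above can be sketched as a small check on the caller's side. This is a minimal illustration, not library code: `InplacePair`, `can_share_buffer`, and the `remaining_uses` map are hypothetical names, and the sketch assumes the application tracks how many other consumers each tensor ID has.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for an in-place pair: the IDs of an input and an
// output tensor that the compiled partition says may share one buffer.
struct InplacePair {
    std::size_t input_id;
    std::size_t output_id;
};

// Returns true only when the caller knows that the input tensor has no other
// consumer in its own computation graph, so overwriting it is safe.
// `remaining_uses` maps a tensor ID to the number of consumers outside the
// partition (an assumption about what the application tracks).
bool can_share_buffer(const InplacePair &pair,
        const std::unordered_map<std::size_t, int> &remaining_uses) {
    auto it = remaining_uses.find(pair.input_id);
    // Tensors the application knows nothing about are treated
    // conservatively: do not share the buffer.
    if (it == remaining_uses.end()) return false;
    return it->second == 0;
}
```

When the check returns false, the caller would allocate a separate output buffer and ignore the hint.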
Gets the ISA-specific hints that the library can follow. See
#dnnl_cpu_isa_hints_t and #dnnl::cpu_isa_hints for the list of the values
returned by the C and C++ API functions respectively.
Gets the maximal ISA the library can dispatch to on the CPU. See
#dnnl_cpu_isa_t and #dnnl::cpu_isa for the list of the values returned by
the C and C++ API functions respectively.
Adds an operation into a graph. The API will return failure if the operation
has already been added to the graph or the operation cannot pass the schema
check in the library (e.g., input and output numbers and data types, the
attributes of the operation, etc.).
Returns the hint of in-place pairs from a compiled partition. It indicates
that an input and an output of the partition can share the same memory
buffer for computation. In-place computation helps reduce the memory
footprint and improves cache locality. However, because the library may not
have a global view of the user’s application, the tensor with input_id may
be used elsewhere in the user’s computation graph. In that case, the user
should treat the in-place pair as a hint and pass a different memory buffer
for the output tensor to avoid overwriting the input memory buffer, which
would likely produce incorrect results.
Queries an input or output logical tensor according to tensor ID. If the
tensor ID doesn’t belong to any input or output of the compiled partition,
an error status #dnnl_invalid_arguments will be returned by the API.
Creates a new empty graph. A graph is associated with a specific engine kind.
The partitions returned from the graph will inherit the engine kind of the
graph.
Creates a new empty graph with an engine kind and a floating-point math
mode. All partitions returned from the graph will inherit the engine kind
and floating-point math mode.
Finalizes a graph. It means users have finished adding operations into the
graph and the graph is ready for partitioning. Adding a new operation into a
finalized graph will return a failure. Similarly, partitioning an
unfinalized graph will also return a failure.
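The ordering rules for adding operations, finalizing, and partitioning can be modeled with a toy state machine. The sketch below is not the library's graph object: `ToyGraph`, its methods, and the `Status` values are hypothetical and only mirror the failure conditions described in the text.

```cpp
#include <cstddef>
#include <set>

// Hypothetical status codes for this sketch only.
enum class Status { success, invalid_arguments, invalid_graph };

// Toy model of the graph lifecycle: add operations, finalize, then partition.
class ToyGraph {
public:
    // Fails if the graph is already finalized or the operation ID was
    // already added.
    Status add_op(std::size_t op_id) {
        if (finalized_) return Status::invalid_graph;
        if (!ops_.insert(op_id).second) return Status::invalid_arguments;
        return Status::success;
    }

    // Marks the graph as ready for partitioning.
    Status finalize() {
        finalized_ = true;
        return Status::success;
    }

    // Partitioning is only allowed after finalize().
    Status get_partitions() const {
        return finalized_ ? Status::success : Status::invalid_graph;
    }

private:
    std::set<std::size_t> ops_;
    bool finalized_ = false;
};
```

In this model, calling `get_partitions()` before `finalize()` fails, and so does `add_op()` after it.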
Returns the partitions from a filtered graph. Output partition instances
will be written into the parameter partitions. Users need to make sure
partitions is valid and has enough space to accept the partition
instances. Each output partition instance should be destroyed via
#dnnl_graph_partition_destroy explicitly after use.
Returns the memory size described by the logical tensor. If it’s a strided
layout, the size will be calculated by dims and strides. If it’s an
opaque layout, the size will be decided by layout_id.
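For the strided case, one conservative way to bound the buffer size from dims and strides is sketched below. It is an illustration under stated assumptions (strides are counted in elements, all dims are known and positive); the library's exact formula may differ, so real allocations should rely on the value returned by the query itself. The function name `strided_size_bytes` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Conservative byte size for a strided tensor: the largest
// dims[d] * strides[d] extent, multiplied by the element size.
// Assumption: strides are expressed in elements, not bytes.
std::size_t strided_size_bytes(const std::vector<std::int64_t> &dims,
        const std::vector<std::int64_t> &strides, std::size_t elem_size) {
    if (dims.size() != strides.size()) return 0; // malformed description
    // A zero-dimensional tensor (scalar) holds exactly one element.
    std::size_t max_extent = dims.empty() ? 1 : 0;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        // Unknown or empty dimensions cannot be sized in this sketch.
        if (dims[d] <= 0 || strides[d] <= 0) return 0;
        // A dimension of size 1 contributes one element whatever its stride.
        const std::int64_t stride = dims[d] == 1 ? 1 : strides[d];
        max_extent = std::max<std::size_t>(
                max_extent, static_cast<std::size_t>(dims[d] * stride));
    }
    return max_extent * elem_size;
}
```

For a dense row-major tensor this reduces to the element count times the element size; for a layout with padded rows it covers the padding as well.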
Initializes a logical tensor with id, data type, number of dimensions,
layout type, and property. The logical tensor’s dims are unknown with this
interface.
Initializes a logical tensor with basic information and dims. The logical
tensor’s dimensions and layout will be initialized according to the input
arguments.
Compares whether two logical tensors are equal. Users can decide accordingly
whether layout reordering is needed for two logical tensors. The method will
return true in the following two circumstances:
Compiles a partition with given input and output logical tensors. The output
logical tensors can contain unknown dimensions. In this case, the
compilation will deduce the output shapes according to the input shapes. The
output logical tensors can also have layout type any, in which case the
compilation will choose the optimal layout for the output tensors. The
optimal layout will be represented as an opaque layout ID saved in the output
logical tensor.
Creates a new partition with a given operator and engine kind. The API is
used to create a partition from an operation directly without creating the
graph and calling get_partitions(). The output partition contains only one
operation specified by the parameter. The output partition instance should
be destroyed via #dnnl_graph_partition_destroy after use.
Returns the supporting status of a partition. Some operations may not be
supported by the library under certain circumstances. During the partitioning
stage, unsupported partitions will be returned to users, each containing
an unsupported operation. Users should check the supporting status of a
partition before transforming the computation graph or compiling the
partition.
Sets the number of compiled partitions that can be held in the compiled
partition cache at the same time. The default capacity of the compiled
partition cache is 1024.
Enables or disables the constant tensor cache. This API must be called once
before the compilation stage. By default, the constant tensor cache is
disabled in the library.
Controls the capacity of the constant tensor cache that is used for a specific
engine kind. This API is thread-safe and can be called multiple times at
runtime. The capacity is set to zero by default, which means the cache is
disabled. When this API is called, the corresponding cache is flushed.
Setting the capacity to 0 clears all cached tensors and disables the cache.
Once the capacity limit is reached, no new tensors will be cached. If there
are multiple devices for an engine kind, the capacity set here applies to
each device.
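The capacity rules can be pictured with a toy cache. This is only a model of the behavior described above (flush on every capacity change, zero capacity disables caching, no insertion once the limit is reached); `ToyConstantCache` and its methods are hypothetical, and the capacity unit is left abstract.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Toy model of a capacity-limited cache with no eviction.
class ToyConstantCache {
public:
    // Changing the capacity flushes everything; zero disables caching.
    void set_capacity(std::size_t capacity) {
        entries_.clear();
        used_ = 0;
        capacity_ = capacity;
    }

    // Returns true if the tensor is cached after the call. Once the limit
    // would be exceeded, the new tensor is simply not cached.
    bool try_insert(const std::string &key, std::size_t size) {
        if (entries_.count(key) != 0) return true;
        if (used_ + size > capacity_) return false;
        entries_[key] = size;
        used_ += size;
        return true;
    }

    bool contains(const std::string &key) const {
        return entries_.count(key) != 0;
    }

private:
    std::unordered_map<std::string, std::size_t> entries_;
    std::size_t capacity_ = 0; // disabled by default
    std::size_t used_ = 0;
};
```

A per-device cache, as mentioned for engine kinds with multiple devices, would correspond to one such object per device.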
Creates a primitive descriptor for a layer normalization backward
propagation primitive with a user-provided data type for the
scale and shift memory objects.
Creates a primitive descriptor for a layer normalization forward propagation
primitive with a user-provided data type for the scale and shift
memory objects.
@param memory_desc Output memory descriptor.
@param parent_memory_desc An existing memory descriptor.
@param dims Sizes of the region.
@param offsets Offsets to the region from the encompassing
memory object in each dimension.
@returns #dnnl_success on success and a status describing the error
otherwise.
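How dims and offsets select a region of the parent can be sketched as follows. This is an illustration, not the library's implementation: it assumes a plain strided parent whose strides are counted in elements, and the names `Dims`, `region_fits`, and `parent_element_offset` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Dims = std::vector<std::int64_t>;

// True when the region [offsets, offsets + dims) lies inside parent_dims
// in every dimension.
bool region_fits(const Dims &parent_dims, const Dims &dims,
        const Dims &offsets) {
    if (dims.size() != parent_dims.size()
            || offsets.size() != parent_dims.size())
        return false;
    for (std::size_t d = 0; d < parent_dims.size(); ++d) {
        if (offsets[d] < 0 || dims[d] < 0) return false;
        if (offsets[d] + dims[d] > parent_dims[d]) return false;
    }
    return true;
}

// Element offset, within the parent buffer, of coordinate `coord` of the
// region. Assumes parent_strides, offsets, and coord have the same length.
std::int64_t parent_element_offset(const Dims &parent_strides,
        const Dims &offsets, const Dims &coord) {
    std::int64_t off = 0;
    for (std::size_t d = 0; d < parent_strides.size(); ++d)
        off += (offsets[d] + coord[d]) * parent_strides[d];
    return off;
}
```

For a 4x6 row-major parent, a 2x3 region at offsets (1, 2) starts at element 8 of the parent buffer.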
Creates a memory descriptor by reshaping an existing one. The new
memory descriptor inherits the data type. This operation is valid only for
memory descriptors that have format_kind #dnnl_blocked or
#dnnl_format_kind_any.
Unmaps a memory object and writes back any changes made to the previously
mapped memory buffer. The pointer to the mapped buffer must be obtained
via the dnnl_memory_map_data() call.
Appends an accumulation v3 (sum) to post-ops. Prior to accumulating the
result, the zero point is subtracted from the previous value and the
difference is multiplied by the scale.
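The arithmetic of this post-op for a single element can be written out as below. This is a minimal sketch of the formula stated above, not library code; `apply_sum_post_op` is a hypothetical helper, and data-type conversion and saturation are left out.

```cpp
#include <cstdint>

// One element of the sum post-op: the previous destination value has the
// zero point subtracted, is scaled, and is then added to the new result.
float apply_sum_post_op(float op_result, float prev_dst, float scale,
        std::int32_t zero_point) {
    return op_result + scale * (prev_dst - static_cast<float>(zero_point));
}
```

With a scale of 1 and a zero point of 0 this reduces to a plain accumulation into the destination.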
Sets quantization scaling factors for RNN projection weights tensors. The
low-precision configuration of the RNN primitives expects input weights to
use the signed 8-bit integer data type. The scaling factors are used to
quantize floating-point data to signed integer and must be passed to RNN
primitives using attributes.
Sets quantization scaling factors for RNN weights tensors. The
low-precision configuration of the RNN primitives expects input weights to
use the signed 8-bit integer data type. The scaling factors are used to
quantize floating-point data to signed integer and must be passed to RNN
primitives using attributes.
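What a scaling factor does to one weight value can be sketched as follows. The sketch assumes the common convention that quantization multiplies the floating-point value by the scale, rounds, and saturates to the signed 8-bit range; `quantize_s8` is a hypothetical helper, and the library's reorder may differ in rounding details.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantizes one floating-point value to signed 8-bit using a scale.
// Assumption: quantized = saturate(round(value * scale)).
std::int8_t quantize_s8(float value, float scale) {
    // Round to the nearest integer in the current rounding mode.
    float scaled = std::nearbyint(value * scale);
    // Saturate to the representable signed 8-bit range.
    scaled = std::min(127.0f, std::max(-128.0f, scaled));
    return static_cast<std::int8_t>(scaled);
}
```

Values whose scaled magnitude exceeds the signed 8-bit range are clamped rather than wrapped.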
Sets primitive attributes scaling factors for primitive operations for a
given memory argument. The scaling factors must be passed at execution time
as an argument with index #DNNL_ARG_ATTR_SCALES | arg.
Sets primitive attributes scaling factors for primitive operations for a
given memory argument. The scaling factors must be passed at execution time
as an argument with index #DNNL_ARG_ATTR_SCALES | arg.
Sets primitive attributes zero points for primitive operations for a given
memory argument. The zero points must be passed at execution time
as an argument with index #DNNL_ARG_ATTR_ZERO_POINTS | arg.
Sets primitive attributes zero points for primitive operations for a given
memory argument. The zero points must be passed at execution time
as an argument with index #DNNL_ARG_ATTR_ZERO_POINTS | arg.
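The execution-time index for scales or zero points is formed by OR-ing the attribute flag with the index of the memory argument it applies to. The sketch below uses local placeholder constants; their numeric values are assumptions made for illustration, and real code should use the macros from the library header.

```cpp
// Placeholder constants standing in for the library macros. The numeric
// values are assumptions for this sketch only.
constexpr int kArgDst = 17;               // stands in for DNNL_ARG_DST
constexpr int kArgAttrScales = 4096;      // stands in for DNNL_ARG_ATTR_SCALES
constexpr int kArgAttrZeroPoints = 8192;  // stands in for DNNL_ARG_ATTR_ZERO_POINTS

// Index under which the scales for memory argument `arg` are passed.
constexpr int scales_arg_index(int arg) {
    return kArgAttrScales | arg;
}

// Index under which the zero points for memory argument `arg` are passed.
constexpr int zero_points_arg_index(int arg) {
    return kArgAttrZeroPoints | arg;
}
```

Because the flags occupy bits that argument indices do not use, scales and zero points for the same argument get distinct indices.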
@note If any argument in @p args is padded (padded_dims >
dims), the primitive execution will assume properly zero-padded
input arguments, and produce zero-padded output arguments.
Sets the hints flag for the CPU ISA. See #dnnl_cpu_isa_hints_t and
#dnnl::cpu_isa_hints for the list of the values accepted by the C and C++
API functions respectively.
Sets the maximal ISA the library can dispatch to on the CPU. See
#dnnl_cpu_isa_t and #dnnl::cpu_isa for the list of the values accepted by
the C and C++ API functions respectively.