Module dfdx::tensor_ops
Operations on tensors like relu(), matmul(), softmax(), and more.
Generic function and struct methods
All functionality is provided in two ways:
- The generic standalone function that takes a generic parameter, e.g. relu().
- The struct method for tensor structs, e.g. crate::tensor::Tensor::relu().
The functions are all just pass-throughs to the tensor methods.
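For example, these two call styles produce the same result (a minimal sketch; the Cpu device setup here is just for illustration):
use dfdx::prelude::*;
let dev: Cpu = Default::default();
let t: Tensor<Rank1<4>, f32, _> = dev.zeros();
// struct method form
let a = t.clone().relu();
// standalone function form
let b = relu(t);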
Fallibility
All tensor methods also have a try_* variant, like crate::tensor::Tensor::relu() and crate::tensor::Tensor::try_relu(). These methods return a Result, where the error in most cases indicates an allocation error.
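For example, inside a function returning a Result, the try_* form lets you propagate the error instead of panicking (a sketch; dev is a device as in the other examples):
let t: Tensor<Rank1<4>, f32, _> = dev.zeros();
// panics if the operation fails
let a = t.clone().relu();
// surfaces the error to the caller instead
let b = t.try_relu()?;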
Axes/Dimensions for broadcasting/reductions/selecting
For the following sections, some traits/functions utilize a const isize to determine the axis to apply the transformation to.
Here are the valid axes for each tensor:
- 0d tensor: Axis<0>
- 1d tensor: Axis<0>
- 2d tensor: Axis<0>, Axis<1>
- 3d tensor: Axis<0>, Axis<1>, Axis<2>
- 4d tensor: Axis<0>, Axis<1>, Axis<2>, Axis<3>
- etc.
To specify multiple axes you can use Axes2, Axes3, and Axes4.
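For example, Axis<1> names the second axis of a 2d tensor (a quick sketch; dev is a device as in the examples below):
let t: Tensor<Rank2<2, 3>, f32, _> = dev.zeros();
// apply softmax across the last axis
let _ = t.softmax::<Axis<1>>();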
Reductions
There are a number of methods that reduce 1 or more axes. Anything that can be reduced can also be broadcast back to the original shape using BroadcastTo.
Each axis reducing function has two generic parameters:
- The target shape
- The axes to reduce along
You only need to specify one of these! Generally it is better practice to specify the target shape, unless it is ambiguous, in which case you should specify the axes.
For example:
let t: Tensor<Rank3<2, 4, 6>, f32, _> = dev.zeros();
// shape version
let _ = t.clone().sum::<Rank1<4>, _>();
// axes version
let _ = t.clone().sum::<_, Axes2<0, 2>>();
// typed version
let _: Tensor<Rank1<4>, _, _> = t.clone().sum();
Complete list of reductions: logsumexp, max, mean, min, stddev, sum, and var (see the reduction traits under Traits below).
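Reducing to a scalar or to a single axis follows the same two-parameter pattern (a short sketch):
let t: Tensor<Rank2<2, 3>, f32, _> = dev.zeros();
// reduce the last axis, keeping the first
let _ = t.clone().mean::<Rank1<2>, _>();
// reduce all axes down to a scalar
let _ = t.max::<Rank0, _>();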
Broadcasts
Broadcasting tensors is provided through the BroadcastTo trait. Similar to reductions there are two generic parameters to broadcast:
- (Required) The target shape
- (usually optional) The axes of the result type to broadcast
You’ll only need to specify axes if the shape makes the broadcasts ambiguous.
For example:
let t: Tensor<Rank1<4>, f32, _> = dev.zeros();
// shape version
let _ = t.clone().broadcast::<Rank3<2, 4, 6>, _>();
// typed version
let _: Tensor<Rank3<2, 4, 6>, _, _> = t.clone().broadcast();
Rust can also infer the output type if you use it in another operation:
let big: Tensor<Rank2<2, 5>, f32, _> = dev.zeros();
let small: Tensor<Rank1<5>, f32, _> = dev.zeros();
let _ = big + small.broadcast();
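A common pattern combines a reduction with BroadcastTo, e.g. subtracting a per-row mean (a sketch; the shapes are illustrative):
let t: Tensor<Rank2<2, 3>, f32, _> = dev.sample_normal();
// mean over the last axis, broadcast back, then subtract
let centered = t.clone() - t.mean::<Rank1<2>, _>().broadcast();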
Permutes
Permuting has an identical interface to broadcasts/reductions:
let t: Tensor<Rank3<2, 3, 4>, f32, _> = dev.zeros();
// shape version
let _ = t.clone().permute::<Rank3<3, 4, 2>, _>();
// axes version
let _ = t.clone().permute::<_, Axes3<1, 2, 0>>();
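Like broadcasts, the output type can also be inferred from an annotation (a sketch mirroring the typed broadcast example above):
// typed version
let _: Tensor<Rank3<3, 4, 2>, _, _> = t.clone().permute();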
Indexing using select and gather
Two traits provide indexing capability: SelectTo and GatherTo. The difference is:
- SelectTo::select allows you to select a single value
- GatherTo::gather allows you to select multiple values from the same axis.
For example you can select from the 0th axis like so:
let t = dev.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
let r: Tensor<Rank1<3>, f32, _> = t.select(dev.tensor(1));
assert_eq!(r.array(), [4.0, 5.0, 6.0]);
Or you can gather from the 0th axis to select multiple entries:
let t = dev.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
let r: Tensor<Rank2<3, 3>, f32, _> = t.gather(dev.tensor([1, 1, 0]));
assert_eq!(r.array(), [
[4.0, 5.0, 6.0],
[4.0, 5.0, 6.0],
[1.0, 2.0, 3.0],
]);
To select from anything after the 0th axis, you need a multi-dimensional index. See GatherTo and SelectTo docstrings for examples of this.
But you can use BroadcastTo to make this easy! In this example we select the same index from the 1st axis of a tensor:
let t = dev.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
let r = t.select::<Rank1<2>, _>(dev.tensor(1).broadcast());
assert_eq!(r.array(), [2.0, 5.0]);
Structs
- Configuration of hyperparameters for crate::optim::Adam.
- Upscales images using bilinear interpolation between a pixel’s neighbors.
- Upscales images using a pixel’s nearest neighbor.
- Configuration of hyperparameters for crate::optim::RMSprop.
- Configuration of hyperparameters for crate::optim::Sgd.
Enums
- Momentum used for crate::optim::Sgd and others
- L2 and decoupled regularization methods
Traits
- Runs backprop algorithm with all operations contained in the tape that t has.
- Broadcast self into a new shape.
- Choose values from two tensors using a boolean mask. Equivalent to torch.where from pytorch.
- A Storage that requires all the tensor ops implementations.
- Select multiple values from a single axis, replacing that dimension with a different one. Equivalent to torch.gather from pytorch.
- Reduction along multiple axes using LogSumExp.
- Reduction along multiple axes using max.
- Reduction along multiple axes using mean.
- Reduction along multiple axes using min.
- Changes order of dimensions/axes in a tensor.
- Realizes the concrete shape of the tensor as another compatible shape, or returns the original tensor if the new shape’s dimensions are incompatible.
- Changes the shape of a tensor without re-ordering axes. If the tensor is contiguous already, then no data movement will occur. If the tensor is not contiguous, the result of this will be contiguous.
- Shifts data along an axis by a specified amount.
- Select a single value from a single dimension, removing that dimension from the shape. Equivalent to torch.select from pytorch.
- Reduction along multiple axes using standard deviation.
- Reduction along multiple axes using sum.
- Fallible version of std::ops::Add. See add.
- Reshapes qkv + past_key + past_value into (q, k, v) as used in attention layers.
- TryConcat (deprecated): Concatenate two tensors along the first dimension.
- Concatenate two tensors along a given axis.
- Apply the 2d convolution to a tensor.
- Fallible version of std::ops::Div. See div.
- Fallible matrix multiplication. See matmul for examples.
- Fallible version of std::ops::Mul. See mul.
- Parametric Rectified Linear Unit (PReLU). max(0, lhs) + rhs*min(0, lhs)
- Stack an array or vec of tensors together along a new dimension.
- Fallible version of std::ops::Sub. See sub.
- Upscales an image to a new shape. Valid methods of upscaling are the bilinear and nearest-neighbor structs listed above.
- Reduction along multiple axes using variance.
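For instance, the boolean-mask trait works like torch.where: calling choose on a boolean tensor picks elements from two other tensors (a sketch based on the trait’s documented behavior):
let mask = dev.tensor([true, false, true]);
let a = dev.tensor([1.0, 2.0, 3.0]);
let b = dev.tensor([-1.0, -2.0, -3.0]);
// take from a where the mask is true, otherwise from b
let c = mask.choose(a, b);
assert_eq!(c.array(), [1.0, -2.0, 3.0]);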
Functions
- Absolute value (abs). |t|
- Accurate Gaussian Linear Unit (GeLU). This is defined as x * Phi(x) where Phi(x) is the cumulative distribution function of a standard normal distribution. This can be calculated via the Error Function erf(x) using 0.5 * x * (1.0 + erf(x / sqrt(2.0))).
- Element wise and scalar addition.
- Elementwise a * alpha + b * beta.
- Binary Cross Entropy With Logits in a numerically stable way.
- Element wise and scalar boolean ‘and’.
- Inverts each value in a boolean tensor.
- Element wise and scalar boolean ‘or’.
- Element wise and scalar boolean ‘xor’.
- Clamp all elements between the provided min and max values.
- Element wise and scalar division.
- Zeros elements with probability p and scales all elements by 1 / (1 - p).
- Element-wise equality comparison. ==
- Fast Gaussian Linear Unit (GeLU). A fast version of the gaussian linear unit, calculated with the tanh approximation 0.5 * x * (1.0 + tanh(sqrt(2.0 / PI) * (x + 0.044715 * x^3))).
- Element-wise greater than or equals comparison. >=
- gelu (deprecated): Use fast_gelu instead.
- Element-wise strictly greater than comparison. >
- Huber Loss uses absolute error when the error is higher than beta, and squared error when the error is lower than beta.
- Element-wise less than or equals comparison. <=
- Computes prelu, but with a scalar value. max(0, t) + a*min(0, t)
- Natural Logarithm (ln). log_e(t).
- log(softmax(t)) in a numerically stable way across Ax. Does t - logsumexp(t) under the hood.
- Applies a 2D lower triangular mask by setting values above the diagonal to E::default().
- Element-wise strictly less than comparison. <
- Matrix * Matrix, Vector * Matrix, Vector * Vector, and broadcasted/batched versions.
- Element wise maximum.
- Element wise minimum.
- Element wise and scalar multiplication.
- Replaces any std::f32::NAN with value.
- Element-wise inequality comparison. !=
- Negates all elements.
- Normalizes t to have mean 0.0 and stddev 1.0 along Ax. epsilon is used during stddev. Computes (t - t.mean(Ax)) / t.std(Ax, epsilon).
- Raises to a float power; t^i.
- Raises to an integer power; t^i.
- Parametric Rectified Linear Unit (PReLU). max(0, lhs) + rhs*min(0, lhs)
- Reciprocal. 1 / x
- Rectified Linear Unit (ReLU). max(0, t)
- Sigmoid. 1 / (1 + exp(-t)).
- Slices all dimensions of a tensor, with the starting and ending indices of each dimension determined by a tuple of ranges.
- Computes the softmax function across Ax.
- Square root. √t or t^0.5
- Square. t^2
- Element wise and scalar subtraction.
- Copies the elements of a tensor, converting its data to a different dtype.
- Applies a 2D upper triangular mask by setting values below the diagonal to E::default().
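As a quick illustration of a few of these functions (a sketch in the same style as the examples above; the try_ variant follows the Fallibility section):
let t: Tensor<Rank2<2, 3>, f32, _> = dev.sample_normal();
// probabilities across the last axis
let p = t.clone().softmax::<Axis<1>>();
// clamp every element into [-1.0, 1.0]
let c = t.clone().clamp(-1.0, 1.0);
// fallible variant of the same op
let c2 = t.try_clamp(-1.0, 1.0)?;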