pub struct Tensor { /* private fields */ }
A tensor wrapping a libtorch C++ tensor.
Owns the underlying C++ handle. When dropped, the C++ tensor is freed immediately — including any GPU memory. This is the entire VRAM management story.
Operations are chainable and return Result<Tensor>:
let y = x.matmul(&w)?.add(&b)?.relu()?;
Implementations
impl Tensor
pub fn zeros(shape: &[i64], opts: TensorOptions) -> Result<Self>
Create a tensor filled with zeros.
let t = Tensor::zeros(&[2, 3], TensorOptions::default())?;
assert_eq!(t.shape(), vec![2, 3]);
pub fn ones(shape: &[i64], opts: TensorOptions) -> Result<Self>
Create a tensor filled with ones. Like torch.ones().
let t = Tensor::ones(&[2, 3], TensorOptions::default())?;
pub fn from_f32(data: &[f32], shape: &[i64], device: Device) -> Result<Self>
Create a tensor from f32 data.
let t = Tensor::from_f32(&[1.0, 2.0, 3.0, 4.0], &[2, 2], Device::CPU)?;
assert_eq!(t.shape(), vec![2, 2]);
pub fn from_f64(data: &[f64], shape: &[i64], device: Device) -> Result<Self>
Create a Float64 tensor from f64 data. Use when full double precision is needed (e.g. loss accumulation, high-precision metrics).
pub fn from_i64(data: &[i64], shape: &[i64], device: Device) -> Result<Self>
Create an Int64 tensor from i64 data. Commonly used for class labels,
token indices, and any integer indexing (e.g. cross_entropy_loss targets).
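For example, class labels for a batch of three samples (values illustrative):
let labels = Tensor::from_i64(&[2, 0, 1], &[3], Device::CPU)?;
assert_eq!(labels.shape(), vec![3]);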
pub fn shape(&self) -> Vec<i64>
Shape of each dimension as a Vec. Like tensor.shape in PyTorch.
pub fn numel(&self) -> i64
Total number of elements (product of all dimensions). Like tensor.numel().
pub fn device(&self) -> Device
Device where this tensor’s data resides (CPU or CUDA). Like tensor.device.
pub fn to_f32_vec(&self) -> Result<Vec<f32>>
Copy tensor data to a Vec<f32>. Transparently moves to CPU first
if the tensor lives on CUDA. Non-f32 dtypes are cast via libtorch.
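A sketch of a round trip (assumes the usual row-major contiguous layout):
let t = Tensor::from_f32(&[1.0, 2.0, 3.0, 4.0], &[2, 2], Device::CPU)?;
assert_eq!(t.to_f32_vec()?, vec![1.0, 2.0, 3.0, 4.0]);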
pub fn to_f64_vec(&self) -> Result<Vec<f64>>
Copy tensor data to a Vec<f64>. Moves to CPU if needed.
Float64 tensors are copied at full precision. All other dtypes are
converted via f32 (lossless for f16/bf16; values wider than f32, such as large integers, may lose precision).
pub fn to_i64_vec(&self) -> Result<Vec<i64>>
Copy tensor data to a Vec<i64>. Moves to CPU if needed.
Intended for Int64 tensors (indices, labels).
pub fn item(&self) -> Result<f64>
Extract a scalar value as f64. Like PyTorch’s .item().
The tensor must contain exactly one element (any shape is fine,
e.g. [1], [1, 1], or []). Returns an error otherwise.
Preserves full precision for Float64 tensors.
let loss_val = loss_tensor.item()?;
println!("loss: {:.4}", loss_val);
pub fn add(&self, other: &Tensor) -> Result<Tensor>
Element-wise addition. Shapes must be broadcastable.
let c = a.add(&b)?; // [2, 3] + [2, 3] → [2, 3]
pub fn sub(&self, other: &Tensor) -> Result<Tensor>
Element-wise subtraction. Shapes must be broadcastable.
pub fn mul(&self, other: &Tensor) -> Result<Tensor>
Element-wise (Hadamard) multiplication. Shapes must be broadcastable.
For matrix multiplication, use matmul.
pub fn matmul(&self, other: &Tensor) -> Result<Tensor>
Matrix multiplication.
// [batch, M, K] @ [batch, K, N] → [batch, M, N]
let c = a.matmul(&b)?;
pub fn mul_scalar(&self, scalar: f64) -> Result<Tensor>
Multiply every element by a scalar. Like tensor * 0.5 in PyTorch.
pub fn flatten(&self, start_dim: i32, end_dim: i32) -> Result<Tensor>
Flatten dimensions [start_dim..=end_dim] into one.
pub fn add_scalar(&self, scalar: f64) -> Result<Tensor>
Add a scalar to every element.
pub fn div_scalar(&self, scalar: f64) -> Result<Tensor>
Divide every element by a scalar.
pub fn triu(&self, diagonal: i64) -> Result<Tensor>
Upper triangle of a matrix (or batch of matrices).
Elements below the diagonal-th diagonal are zeroed.
diagonal=0 keeps the main diagonal; diagonal=1 excludes it.
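A common use is building a causal attention mask; a sketch (size illustrative):
let mask = Tensor::ones(&[4, 4], TensorOptions::default())?.triu(1)?;
// 1.0 strictly above the main diagonal, 0.0 elsewhere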
pub fn pow_scalar(&self, exponent: f64) -> Result<Tensor>
Raise every element to a scalar exponent.
pub fn gt_scalar(&self, scalar: f64) -> Result<Tensor>
Element-wise greater-than comparison against a scalar.
pub fn reshape(&self, shape: &[i64]) -> Result<Tensor>
Reshape to a new shape (must have same total elements). Use -1 for one inferred dimension.
let flat = t.reshape(&[-1])?; // [2, 3] → [6]
pub fn transpose(&self, dim0: i32, dim1: i32) -> Result<Tensor>
Swap two dimensions.
let t = x.transpose(0, 1)?; // [M, N] → [N, M]
pub fn narrow(&self, dim: i32, start: i64, length: i64) -> Result<Tensor>
Narrow (slice) along a dimension: returns a view.
pub fn narrow_scatter(
    &self,
    src: &Tensor,
    dim: i32,
    start: i64,
) -> Result<Tensor>
Scatter a narrow slice back into a tensor (for narrow backward).
pub fn cat(&self, other: &Tensor, dim: i32) -> Result<Tensor>
Concatenate two tensors along a dimension.
pub fn cat_many(tensors: &[&Tensor], dim: i32) -> Result<Tensor>
Concatenate multiple tensors along an existing dimension.
All tensors must have the same shape except in the concatenation dimension. Uses a single kernel launch regardless of the number of tensors.
pub fn stack(tensors: &[&Tensor], dim: i32) -> Result<Tensor>
Stack tensors along a new dimension.
All tensors must have the same shape. A new dimension is inserted at dim.
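For example, stacking three equally shaped tensors (a, b, c assumed to be [2, 3]):
let s = Tensor::stack(&[&a, &b, &c], 0)?; // → [3, 2, 3]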
pub fn log_softmax(&self, dim: i32) -> Result<Tensor>
Log-softmax along a dimension (numerically stable).
pub fn native_layer_norm(
    &self,
    weight: &Tensor,
    bias: &Tensor,
    normalized_size: i64,
    eps: f64,
) -> Result<(Tensor, Tensor, Tensor)>
Native layer normalization. Returns (output, mean, rstd).
pub fn select(&self, dim: i32, index: i64) -> Result<Tensor>
Select a single index along a dimension (reduces that dim).
pub fn index_select(&self, dim: i32, index: &Tensor) -> Result<Tensor>
Select rows/elements along a dimension using an index tensor.
pub fn index_add(
    &self,
    dim: i32,
    index: &Tensor,
    src: &Tensor,
) -> Result<Tensor>
Scatter-add src into self along dim at positions given by index.
pub fn zeros_like(t: &Tensor) -> Result<Tensor>
Create a tensor of zeros with the same shape, dtype, and device as t.
pub fn ones_like(t: &Tensor) -> Result<Tensor>
Create a tensor of ones with the same shape, dtype, and device as t.
pub fn rand(shape: &[i64], opts: TensorOptions) -> Result<Self>
Create a tensor with uniform random values in [0, 1).
pub fn randn(shape: &[i64], opts: TensorOptions) -> Result<Self>
Create a tensor with standard normal random values (mean=0, std=1).
pub fn conv2d(
    &self,
    weight: &Tensor,
    bias: Option<&Tensor>,
    stride: [i64; 2],
    padding: [i64; 2],
    dilation: [i64; 2],
    groups: i64,
) -> Result<Tensor>
2D convolution. Pass None for bias to run without a bias term.
pub fn conv_transpose2d(
    &self,
    weight: &Tensor,
    bias: Option<&Tensor>,
    stride: [i64; 2],
    padding: [i64; 2],
    output_padding: [i64; 2],
    dilation: [i64; 2],
    groups: i64,
) -> Result<Tensor>
Transposed 2D convolution.
pub fn linear(&self, weight: &Tensor, bias: Option<&Tensor>) -> Result<Tensor>
Fused linear: y = input @ weight^T + bias (single ATen kernel).
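A sketch with illustrative shapes:
let y = x.linear(&w, Some(&b))?; // x: [N, in], w: [out, in], b: [out] → [N, out]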
pub fn gru_cell(
    &self,
    hx: &Tensor,
    w_ih: &Tensor,
    w_hh: &Tensor,
    b_ih: &Tensor,
    b_hh: &Tensor,
) -> Result<Tensor>
Fused GRU cell: single ATen gru_cell kernel.
Returns the new hidden state h'.
pub fn lstm_cell(
    &self,
    hx: &Tensor,
    cx: &Tensor,
    w_ih: &Tensor,
    w_hh: &Tensor,
    b_ih: &Tensor,
    b_hh: &Tensor,
) -> Result<(Tensor, Tensor)>
Fused LSTM cell: single ATen lstm_cell kernel.
Returns (h', c').
pub fn mse_loss(&self, target: &Tensor, reduction: i64) -> Result<Tensor>
Fused MSE loss: single libtorch kernel. reduction: 0=None, 1=Mean, 2=Sum.
pub fn cross_entropy_loss(
    &self,
    target: &Tensor,
    reduction: i64,
    ignore_index: i64,
    label_smoothing: f64,
) -> Result<Tensor>
Fused cross-entropy loss: single libtorch kernel. self: [N, C] logits. target: [N] Int64 class indices or [N, C] Float probabilities. reduction: 0=None, 1=Mean, 2=Sum.
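A sketch (logits and targets are illustrative names; an ignore_index of -100 mirrors PyTorch's conventional default):
// logits: [N, C] Float, targets: [N] Int64
let loss = logits.cross_entropy_loss(&targets, 1, -100, 0.0)?; // Mean reduction, no smoothing
let loss_val = loss.item()?;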
pub fn bce_with_logits_loss(
    &self,
    target: &Tensor,
    reduction: i64,
) -> Result<Tensor>
Fused BCE with logits loss: single libtorch kernel. Numerically stable binary cross-entropy from raw logits. reduction: 0=None, 1=Mean, 2=Sum.
pub fn l1_loss(&self, target: &Tensor, reduction: i64) -> Result<Tensor>
Fused L1 loss: single libtorch kernel. reduction: 0=None, 1=Mean, 2=Sum.
pub fn smooth_l1_loss(
    &self,
    target: &Tensor,
    reduction: i64,
    beta: f64,
) -> Result<Tensor>
Fused Smooth L1 (Huber) loss: single libtorch kernel. reduction: 0=None, 1=Mean, 2=Sum. beta: transition point.
pub fn kl_div_loss(
    &self,
    target: &Tensor,
    reduction: i64,
    log_target: bool,
) -> Result<Tensor>
Fused KL divergence loss: single libtorch kernel. input: log-probabilities. target: probabilities. reduction: 0=None, 1=Mean, 2=Sum, 5=BatchMean.
pub fn batch_norm(
    &self,
    weight: Option<&Tensor>,
    bias: Option<&Tensor>,
    running_mean: Option<&Tensor>,
    running_var: Option<&Tensor>,
    training: bool,
    momentum: f64,
    eps: f64,
) -> Result<Tensor>
Fused batch normalization: single libtorch kernel. When training=true, updates running_mean/running_var in-place.
pub fn dropout(&self, p: f64, training: bool) -> Result<Tensor>
Fused dropout: single libtorch kernel with inverted scaling.
pub fn feature_dropout(&self, p: f64, training: bool) -> Result<Tensor>
Fused 2D feature dropout: drops entire channels.
pub fn linspace(
    start: f64,
    end: f64,
    steps: i64,
    opts: TensorOptions,
) -> Result<Self>
Create evenly spaced values.
pub fn arange(
    start: f64,
    end: f64,
    step: f64,
    opts: TensorOptions,
) -> Result<Self>
Create a range of values [start, end) with given step.
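For example:
let t = Tensor::arange(0.0, 5.0, 1.0, TensorOptions::default())?;
assert_eq!(t.to_f32_vec()?, vec![0.0, 1.0, 2.0, 3.0, 4.0]);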
pub fn min_dim(&self, dim: i32, keepdim: bool) -> Result<Tensor>
Minimum along a dimension (values only).
pub fn max_dim(&self, dim: i32, keepdim: bool) -> Result<Tensor>
Maximum along a dimension (values only).
pub fn ge_scalar(&self, scalar: f64) -> Result<Tensor>
Element-wise greater-than-or-equal comparison against a scalar.
pub fn le_scalar(&self, scalar: f64) -> Result<Tensor>
Element-wise less-than-or-equal comparison against a scalar.
pub fn lt_scalar(&self, scalar: f64) -> Result<Tensor>
Element-wise less-than comparison against a scalar.
pub fn select_scatter(
    &self,
    src: &Tensor,
    dim: i32,
    index: i64,
) -> Result<Tensor>
Scatter a selected index back into a tensor.
pub fn where_cond(condition: &Tensor, x: &Tensor, y: &Tensor) -> Result<Tensor>
Element-wise conditional select: returns x where condition is true and y elsewhere. Like torch.where(condition, x, y).
pub fn adaptive_avg_pool2d(&self, output_size: [i64; 2]) -> Result<Tensor>
Adaptive average pooling to target spatial size.
pub fn grid_sample(
    &self,
    grid: &Tensor,
    mode: i32,
    padding_mode: i32,
    align_corners: bool,
) -> Result<Tensor>
Grid sampling (bilinear/nearest interpolation).
pub fn all_finite(&self) -> Result<bool>
Check if all elements are finite (no inf/nan).
pub fn gt(&self, other: &Tensor) -> Result<Tensor>
Element-wise greater-than (returns float mask: 0.0 or 1.0).
pub fn lt(&self, other: &Tensor) -> Result<Tensor>
Element-wise less-than (returns float mask: 0.0 or 1.0).
pub fn ge(&self, other: &Tensor) -> Result<Tensor>
Element-wise greater-than-or-equal (returns float mask: 0.0 or 1.0).
pub fn le(&self, other: &Tensor) -> Result<Tensor>
Element-wise less-than-or-equal (returns float mask: 0.0 or 1.0).
pub fn eq_tensor(&self, other: &Tensor) -> Result<Tensor>
Element-wise equality. Returns a mask (0.0 or 1.0) in the input’s dtype for float inputs, or Float32 for integer/bool inputs.
pub fn ne_tensor(&self, other: &Tensor) -> Result<Tensor>
Element-wise not-equal. Returns a mask (0.0 or 1.0) in the input’s dtype for float inputs, or Float32 for integer/bool inputs.
pub fn var_dim(&self, dim: i32, keepdim: bool) -> Result<Tensor>
Variance along a dimension (Bessel-corrected).
pub fn std_dim(&self, dim: i32, keepdim: bool) -> Result<Tensor>
Standard deviation along a dimension (Bessel-corrected).
pub fn reciprocal(&self) -> Result<Tensor>
Element-wise reciprocal (1/x).
pub fn gather(&self, dim: i32, index: &Tensor) -> Result<Tensor>
Gather values along a dimension using an index tensor.
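A typical use is a per-row lookup, e.g. the value at each target index (names illustrative):
// logits: [N, C], idx: [N, 1] Int64 → [N, 1]
let picked = logits.gather(1, &idx)?;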
pub fn scatter_add(
    &self,
    dim: i32,
    index: &Tensor,
    src: &Tensor,
) -> Result<Tensor>
Scatter-add: accumulate src into self at index positions along dim.
pub fn topk(
    &self,
    k: i64,
    dim: i32,
    largest: bool,
    sorted: bool,
) -> Result<(Tensor, Tensor)>
Top-k values and indices along a dimension. Returns (values, indices).
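A sketch (shapes illustrative):
// logits: [N, C] → vals: [N, 5], idxs: [N, 5] Int64
let (vals, idxs) = logits.topk(5, 1, true, true)?;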
pub fn sort(&self, dim: i32, descending: bool) -> Result<(Tensor, Tensor)>
Sort along a dimension. Returns (sorted_values, indices).
pub fn eye(n: i64, opts: TensorOptions) -> Result<Self>
Create an identity matrix of size n x n.
pub fn full(shape: &[i64], value: f64, opts: TensorOptions) -> Result<Self>
Create a tensor filled with a scalar value.
pub fn batches(&self, batch_size: i64) -> Result<Vec<Tensor>>
Split tensor into batches of batch_size along dimension 0.
The last batch may be smaller if the tensor size isn’t evenly divisible.
let data = Tensor::randn(&[100, 4], opts)?;
for batch in data.batches(32)? {
let x = Variable::new(batch, false);
// ...
}
pub fn chunk(&self, chunks: i32, dim: i32) -> Result<Vec<Tensor>>
Split tensor into chunks along a dimension.
pub fn repeat(&self, repeats: &[i64]) -> Result<Tensor>
Repeat the tensor along each dimension.
pub fn pad(&self, padding: &[i64], value: f64) -> Result<Tensor>
Constant-value padding. Padding format matches PyTorch: [left, right, top, bottom, …].
pub fn unsqueeze_many(&self, dims: &[i32]) -> Result<Tensor>
Insert multiple dimensions of size 1. Dims are sorted ascending and applied sequentially.
pub fn meshgrid(tensors: &[&Tensor]) -> Result<Vec<Tensor>>
Compute meshgrid from a slice of 1-D tensors (always “ij” indexing).
pub fn cdist(&self, other: &Tensor) -> Result<Tensor>
Pairwise L2 distance between rows of two batched matrices.
Input shapes: [B, P, D] and [B, R, D] -> output [B, P, R].
pub fn cdist_p(&self, other: &Tensor, p: f64) -> Result<Tensor>
Pairwise distance with custom p-norm.
pub fn to_device(&self, device: Device) -> Result<Tensor>
Move this tensor to a different device (CPU or CUDA). Returns a new tensor; the original is unchanged.
let gpu = t.to_device(Device::CUDA(0))?;
pub fn to_device_of(&self, other: &Tensor) -> Result<Tensor>
Move this tensor to the same device as other.
No-op (returns a clone) if both are already on the same device.
let x = x.to_device_of(&weights)?; // ensure same device
pub fn set_requires_grad(&self, requires_grad: bool) -> Result<Tensor>
Set requires_grad on this tensor. Returns a new tensor that shares storage but has the grad flag set. This enables libtorch’s native autograd tracking for all subsequent operations.
pub fn requires_grad(&self) -> bool
Check whether this tensor requires gradient computation.
pub fn backward(&self) -> Result<()>
Run backward pass from this scalar tensor. Populates .grad() on all leaf tensors in the computation graph.
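A minimal end-to-end sketch using the autograd methods on this type (shapes illustrative):
let w = Tensor::randn(&[3, 1], TensorOptions::default())?.set_requires_grad(true)?;
let x = Tensor::randn(&[8, 3], TensorOptions::default())?;
let target = Tensor::zeros(&[8, 1], TensorOptions::default())?;
let loss = x.matmul(&w)?.mse_loss(&target, 1)?; // Mean reduction → scalar
loss.backward()?;
let g = w.grad().expect("leaf with requires_grad has a gradient after backward");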
pub fn grad(&self) -> Option<Tensor>
Get the accumulated gradient for this tensor, if any. Returns None if no gradient has been computed.
pub fn set_grad(&self, grad: &Tensor) -> Result<()>
Replace the gradient tensor (for gradient clipping / unscaling).
pub fn zero_grad_set_to_none(&self)
Null out the gradient pointer instead of zeroing the data. No CUDA kernel — just resets the grad tensor to undefined. This is what PyTorch does by default since 1.7.
pub fn clip_grad_norm_fused(params: &[Tensor], max_norm: f64) -> Result<f64>
Fused clip_grad_norm: compute global L2 norm across all param grads and scale in-place if it exceeds max_norm. Single C++ call. Returns the original total norm before clipping.
pub fn is_leaf(&self) -> bool
Whether this tensor is a leaf in the autograd graph. A tensor is a leaf if it was created by the user (not by an op) or if it doesn’t require grad.
pub fn autograd_node_count(&self) -> i64
Count unique autograd nodes reachable from this tensor’s grad_fn. Returns 0 for leaf tensors or tensors without gradient tracking. This is the number of backward operations libtorch will execute.
pub fn detach(&self) -> Result<Tensor>
Detach from the computation graph. Returns a new tensor that shares storage but has no autograd history.
pub fn detach_(&self) -> Result<()>
In-place detach: sever the grad_fn chain on this tensor without allocating a new handle. After this call the tensor’s autograd_meta no longer references any C++ Node objects, allowing the autograd graph to be freed immediately rather than when the tensor is dropped.
pub fn mul_scalar_(&self, scalar: f64) -> Result<()>
In-place scalar multiply: self *= scalar
pub fn add_scalar_(&self, scalar: f64) -> Result<()>
In-place scalar add: self += scalar
pub fn adam_step(
    &self,
    grad: &Tensor,
    m: &Tensor,
    v: &Tensor,
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
    weight_decay: f64,
    step: i64,
) -> Result<()>
Fused Adam/AdamW step: updates param, m, and v tensors in-place.
Performs the full Adam update in a single FFI call (~5 kernel launches instead of ~16), eliminating temporary tensor allocations.
self - parameter tensor (updated in-place)
grad - gradient (read-only)
m, v - moment buffers (updated in-place)
weight_decay - 0.0 for Adam, >0 for AdamW (decoupled)
step - timestep for bias correction
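A sketch of one fused step (hyperparameters illustrative; in a real loop m and v persist across steps and step increments):
let m = Tensor::zeros_like(&w)?;
let v = Tensor::zeros_like(&w)?;
let g = w.grad().expect("backward has run");
w.adam_step(&g, &m, &v, 1e-3, 0.9, 0.999, 1e-8, 0.01, 1)?; // weight_decay > 0 → AdamW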
pub fn adam_step_batched(
    params: &[Tensor],
    grads: &[Tensor],
    ms: &[Tensor],
    vs: &[Tensor],
    lrs: &mut [f64],
    beta1: f64,
    beta2: f64,
    eps: f64,
    weight_decay: f64,
    step: i64,
) -> Result<()>
Perform Adam/AdamW update on all params in one C++ loop.
Eliminates per-param FFI overhead. lrs[i] supports per-group LR.
pub fn pin_memory(&self) -> Result<Tensor>
Copy this CPU tensor into page-locked (pinned) memory.
Pinned memory enables async CPU→GPU transfers via cudaMemcpyAsync.
Only valid for CPU tensors. Returns a new tensor in pinned memory.
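A sketch of a staged host-to-device copy (whether the transfer actually overlaps compute depends on the stream setup):
let staged = cpu_batch.pin_memory()?; // cpu_batch: a CPU tensor
let on_gpu = staged.to_device(Device::CUDA(0))?;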