# torsh-tensor

PyTorch-compatible tensor implementation for ToRSh, built on top of scirs2.
## Overview
This crate provides the core Tensor type with a familiar PyTorch-like API, wrapping scirs2's powerful autograd functionality.
## Features
- PyTorch-compatible tensor operations
- Automatic differentiation support
- Broadcasting and shape manipulation
- Comprehensive indexing and slicing
- Integration with scirs2 for optimized computation
- **`simd_ops_f32` module (v0.1.2)**: zero-allocation SIMD f32 arithmetic (`add_into_f32`, `add_assign_f32`, etc.) and activation functions with PyTorch NaN semantics
- **Real SIMD dispatch (v0.1.2)**: `Tensor::add`/`sub`/`mul`/`div` automatically use AVX2/NEON acceleration via scirs2_core for f32 tensors with ≥ 1024 elements
- **Zero-allocation in-place arithmetic (v0.1.2)**: `add_`/`sub_`/`mul_`/`div_` dispatch through `simd_*_inplace` with no temporary buffers
- **In-place activation SIMD (v0.1.2)**: `relu_`/`leaky_relu_`/`clamp_` route to SIMD helpers for maximum throughput
- **True buffer pool reuse (v0.1.2)**: `GlobalMemoryPool::acquire_uninit::<T>()` returns `ReusedBuffer<T>` with zero copy on a pool hit
- **Default features (v0.1.2)**: `simd` and `parallel` are enabled by default; no `--features` flag required
- **Allocation tracking benchmark (v0.1.2)**: `benches/alloc_tracking.rs` (`harness = false`, dhat) shows `GlobalMemoryPool` achieving a 100% allocation reduction: 10,000 heap blocks on the naive path vs. 0 on the pooled path
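The "PyTorch NaN semantics" mentioned above mean that NaN inputs propagate through activation functions instead of being silently dropped, which is what plain `f32::max` would do. A minimal plain-Rust sketch of that behaviour (`relu_pytorch` is an illustrative name, not the crate's API):

```rust
/// PyTorch-style ReLU: a NaN input stays NaN. Note that `f32::max`
/// alone would NOT do this, because it returns the non-NaN operand.
fn relu_pytorch(x: f32) -> f32 {
    if x.is_nan() { x } else { x.max(0.0) }
}

fn main() {
    assert_eq!(relu_pytorch(-2.0), 0.0);
    assert_eq!(relu_pytorch(3.5), 3.5);
    // NaN propagates, matching torch.relu(float('nan')):
    assert!(relu_pytorch(f32::NAN).is_nan());
    // Plain f32::max drops the NaN instead:
    assert_eq!(f32::NAN.max(0.0), 0.0);
}
```

A SIMD kernel therefore has to handle NaN lanes explicitly to match PyTorch, rather than relying on a plain max instruction with "ignore NaN" semantics.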
## Usage

### Basic Tensor Creation

```rust
use torsh_tensor::*;

// Create tensors using the tensor! macro
let a = tensor!([1.0, 2.0, 3.0]);
let b = tensor!([4.0, 5.0, 6.0]);

// Create tensors with specific shapes
let zeros = Tensor::zeros(&[2, 3]);
let ones = Tensor::ones(&[2, 3]);
let eye = Tensor::eye(3);

// Random tensors
let uniform = Tensor::rand(&[2, 3]);
let normal = Tensor::randn(&[2, 3]);
```
### Tensor Operations

```rust
// Element-wise operations
let c = a.add(&b)?;
let d = a.mul(&b)?;

// Matrix multiplication
let e = a.matmul(&b)?;

// Reductions
let sum = a.sum();
let mean = a.mean();
let max = a.max();

// Activation functions
let relu = a.relu();
let sigmoid = a.sigmoid();
```
### Shape Manipulation

```rust
// Reshape
let reshaped = a.view(&[3, 1])?;

// Transpose
let transposed = a.t()?;

// Squeeze and unsqueeze
let squeezed = a.squeeze();
let unsqueezed = a.unsqueeze(0)?;
```
### Automatic Differentiation

```rust
// Enable gradient computation
let x = tensor!([2.0, 3.0]).requires_grad_(true);

// Forward pass
let y = x.pow(2.0)?.add(&x)?;

// Backward pass
y.backward()?;

// Access gradient
let grad = x.grad().unwrap();
```
### Indexing and Slicing

```rust
// Basic indexing
let element = tensor.get(0)?;
let element_2d = tensor.get_2d(0, 1)?;

// Slicing with macros
let slice = tensor.index(s![0..2])?;

// Boolean masking
let mask = tensor.gt(0.5)?;
let selected = tensor.masked_select(&mask)?;
```
## Performance
torsh-tensor routes hot arithmetic paths through SIMD automatically when the simd feature is active (default since v0.1.2).
- Element-wise arithmetic (`add`, `sub`, `mul`, `div`) on f32 tensors with ≥ 1024 elements dispatches through scirs2_core's AVX2 (x86-64) or NEON (AArch64) kernels.
- In-place variants (`add_`, `sub_`, `mul_`, `div_`) use `simd_*_inplace`; no intermediate allocation occurs at any tensor size.
- Activation functions (`relu_`, `leaky_relu_`, `clamp_`) take the same in-place SIMD path.
- The global memory pool (`GlobalMemoryPool`) returns slabs without copying when the requested size matches an existing free buffer (`acquire_uninit::<T>()`).
No special build flags are needed on supported targets; feature detection is performed at runtime by scirs2_core.
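The size-threshold dispatch described above can be sketched in plain Rust. `SIMD_THRESHOLD`, `add_dispatch`, and the kernel stubs below are illustrative stand-ins, not the crate's internals; the real code hands large tensors to scirs2_core's vectorised kernels:

```rust
// Illustrative threshold mirroring the documented ≥ 1024-element cutoff.
const SIMD_THRESHOLD: usize = 1024;

/// Scalar fallback used for small tensors.
fn add_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// Stand-in for the AVX2/NEON kernel; on a real target this would be
/// the scirs2_core vectorised path.
fn add_simd(a: &[f32], b: &[f32], out: &mut [f32]) {
    add_scalar(a, b, out); // placeholder body
}

/// Picks a kernel by element count and reports which path ran.
fn add_dispatch(a: &[f32], b: &[f32], out: &mut [f32]) -> &'static str {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), out.len());
    if a.len() >= SIMD_THRESHOLD {
        add_simd(a, b, out);
        "simd"
    } else {
        add_scalar(a, b, out);
        "scalar"
    }
}

fn main() {
    let small = vec![1.0f32; 8];
    let mut out = vec![0.0f32; 8];
    assert_eq!(add_dispatch(&small, &small, &mut out), "scalar");
    assert_eq!(out[0], 2.0);

    let big = vec![1.0f32; 2048];
    let mut out = vec![0.0f32; 2048];
    assert_eq!(add_dispatch(&big, &big, &mut out), "simd");
    assert_eq!(out[2047], 2.0);
}
```

Dispatching by size keeps small tensors on the scalar path, where SIMD setup overhead would outweigh any gain.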
## Recent Changes

### v0.1.2 — 2026-04-26
- Added `simd_ops_f32` module with zero-allocation SIMD f32 arithmetic and activations (PyTorch NaN semantics).
- Wired real SIMD dispatch into `Tensor::add`/`sub`/`mul`/`div` for f32 tensors with ≥ 1024 elements (AVX2/NEON via scirs2_core).
- `add_`/`sub_`/`mul_`/`div_` now call `simd_*_inplace`; zero extra allocations.
- `relu_`/`leaky_relu_`/`clamp_` dispatch to SIMD helpers.
- `GlobalMemoryPool::acquire_uninit::<T>()` returns `ReusedBuffer<T>` with no copy on a pool hit.
- `simd` and `parallel` promoted to default features.
## License
Licensed under the Apache License, Version 2.0. See LICENSE for details.