torsh 0.1.0-alpha.1

A blazingly fast, production-ready deep learning framework written in pure Rust
# ToRSh API Reference

## Overview

ToRSh (Tensor Operations in Rust with Sharding) is a deep learning framework written in pure Rust. It provides a PyTorch-compatible API with an emphasis on performance, memory safety, and deployment flexibility.
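
For a quick orientation, here is a minimal end-to-end sketch that uses only the tensor creation and arithmetic APIs documented below (the crate-level `Result` alias is described in the Error Handling section):

```rust
use torsh::prelude::*;

fn main() -> Result<()> {
    // Build two 1-D tensors and add them element-wise
    let a = tensor![1.0, 2.0, 3.0];
    let b = tensor![4.0, 5.0, 6.0];
    let c = a.add(&b)?;
    println!("{:?}", c); // expected: [5.0, 7.0, 9.0]
    Ok(())
}
```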

## Core API

### Tensor Operations

#### Tensor Creation Functions

```rust
use torsh::prelude::*;

// Create tensors from data
let t1 = tensor![1.0, 2.0, 3.0];
let t2 = tensor_2d![[1.0, 2.0], [3.0, 4.0]];

// Convenience creation functions
let zeros = zeros(&[2, 3]);
let ones = ones(&[2, 3]);
let randn = randn(&[2, 3]);
let eye = eye(5);
let arange = arange(0.0, 10.0, 1.0);
```

#### Basic Operations

```rust
// Arithmetic operations
let result = tensor1.add(&tensor2)?;
let result = tensor1.mul(&tensor2)?;
let result = tensor1.div(&tensor2)?;
let result = tensor1.sub(&tensor2)?;

// Matrix operations
let result = tensor1.matmul(&tensor2)?;
let result = tensor1.transpose(0, 1)?;

// Reduction operations
let sum = tensor1.sum(None)?;
let mean = tensor1.mean(None)?;
let max = tensor1.max(None)?;
let min = tensor1.min(None)?;

// Shape operations
let reshaped = tensor1.reshape(&[2, 3])?;
let squeezed = tensor1.squeeze()?;
let unsqueezed = tensor1.unsqueeze(0)?;
```

### Automatic Differentiation

```rust
use torsh::prelude::*;

// Enable gradients
let x = tensor![2.0].requires_grad_(true);
let y = x.pow(2.0)?;

// Compute gradients
y.backward()?;
println!("Gradient: {:?}", x.grad());

// Gradient contexts
let result = no_grad(|| {
    // Operations here don't track gradients
    x.mul(&x)
})?;
```

### Neural Network Modules

#### Basic Layers

```rust
use torsh::nn::*;

// Linear layer: 784 input features, 128 output features
let linear = Linear::new(784, 128);
let output = linear.forward(&input)?;

// Convolutional layer: 3 input channels, 64 output channels, 3x3 kernel
let conv2d = Conv2d::new(3, 64, 3, ConvOptions::default());
let conv_out = conv2d.forward(&input)?;

// Normalization layers
let batch_norm = BatchNorm2d::new(64);
let norm_out = batch_norm.forward(&conv_out)?;
```

#### Activation Functions

```rust
use torsh::nn::*;

// Built-in activation modules
let relu = ReLU::new();
let sigmoid = Sigmoid::new();
let tanh = Tanh::new();
let gelu = GELU::new();

// Functional activations
let activated = F::relu(&input)?;
let activated = F::sigmoid(&input)?;
```

#### Loss Functions

```rust
use torsh::nn::*;

// Loss functions
let mse = MSELoss::new();
let cross_entropy = CrossEntropyLoss::new();
let bce = BCELoss::new();

// Compute loss
let loss = mse.forward(&predictions, &targets)?;
```

### Optimizers

```rust
use torsh::optim::*;

// Create optimizers (the second argument is the learning rate)
let mut sgd = SGD::new(model.parameters(), 0.01);
let mut adam = Adam::new(model.parameters(), 0.001);

// Optimization step
sgd.zero_grad();
loss.backward()?;
sgd.step()?;
```

### Data Loading

```rust
use torsh::data::*;

// Dataset creation
let dataset = TensorDataset::new(inputs, targets);
let dataloader = DataLoader::new(dataset, /* batch_size */ 32, /* shuffle */ true);

// Iterate over batches
for batch in dataloader {
    let (inputs, targets) = batch?;
    // Training logic here
}
```
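
Putting these pieces together, a single training epoch might look like the following sketch. It assumes only the `Linear`, `MSELoss`, `SGD`, and `DataLoader` APIs shown in this document; the model dimensions and hyperparameters are illustrative.

```rust
use torsh::prelude::*;
use torsh::nn::*;
use torsh::optim::*;
use torsh::data::*;

fn train_one_epoch(inputs: Tensor, targets: Tensor) -> Result<()> {
    let model = Linear::new(784, 10);
    let loss_fn = MSELoss::new();
    let mut optimizer = SGD::new(model.parameters(), 0.01);

    let dataset = TensorDataset::new(inputs, targets);
    let dataloader = DataLoader::new(dataset, 32, true);

    for batch in dataloader {
        let (x, y) = batch?;
        let predictions = model.forward(&x)?;
        let loss = loss_fn.forward(&predictions, &y)?;

        // Standard optimization step: clear gradients, backprop, update
        optimizer.zero_grad();
        loss.backward()?;
        optimizer.step()?;
    }
    Ok(())
}
```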

## Advanced API

### Sparse Tensors

```rust
use torsh::sparse::*;

// Create sparse tensor in COO format: row indices, column indices, values
let indices = tensor_2d![[0, 1, 1], [2, 0, 2]];
let values = tensor![3.0, 4.0, 5.0];
let sparse = SparseTensor::new(indices, values, &[2, 3]);

// Sparse operations
let result = sparse.to_dense()?;
let spmm = sparse.sparse_mm(&dense)?;
```

### Quantization

```rust
use torsh::quantization::*;

// Dynamic quantization
let quantized = quantize_dynamic(&model, &[QConfigDynamic::default()])?;

// Static quantization
let qconfig = QConfig::default();
let quantized = quantize_static(&model, &qconfig)?;
```

### Special Functions

```rust
use torsh::special::*;

// Mathematical special functions
let gamma_result = gamma(&input)?;
let bessel_result = bessel_j0(&input)?;
let erf_result = erf(&input)?;
```

### Linear Algebra

```rust
use torsh::linalg::*;

// Matrix decompositions
let (q, r) = qr(&matrix)?;
let (u, s, v) = svd(&matrix)?;
let eigenvals = eigvals(&matrix)?;

// Solve linear systems
let solution = solve(&a, &b)?;
```

### Distributed Training

```rust
use torsh::distributed::*;

// Initialize distributed training
let world_size = 4;
let rank = 0;
init_process_group(Backend::Nccl, world_size, rank)?;

// Distributed data parallel
let ddp_model = DistributedDataParallel::new(model)?;
```

### JIT Compilation

```rust
use torsh::jit::*;

// JIT compile function
let jit_fn = jit_compile(|x: &Tensor| {
    x.mul(&x)?.add(&tensor![1.0])
})?;

// Use compiled function
let result = jit_fn(&input)?;
```

### Graph Transformations

```rust
use torsh::fx::*;

// Create graph tracer
let tracer = GraphTracer::new();
let graph = tracer.trace(&model, &sample_input)?;

// Apply transformations
let optimized = optimize_graph(&graph)?;
```

## Device Management

```rust
use torsh::prelude::*;

// Device creation
let cpu = Device::cpu();
let cuda = Device::cuda(0);

// Move tensors to device
let tensor_gpu = tensor_cpu.to_device(&cuda)?;

// Check device properties
let properties = cuda.properties()?;
println!("Device: {}", properties.name);
```

## Memory Management

```rust
use torsh::prelude::*;

// Memory info
let mem_info = Device::cuda(0).memory_info()?;
println!("Free: {}, Total: {}", mem_info.free, mem_info.total);

// Manual memory management
torsh::cuda::empty_cache()?;
torsh::cuda::synchronize()?;
```

## Serialization

```rust
use torsh::prelude::*;

// Save/load tensors
tensor.save("tensor.pt")?;
let loaded = Tensor::load("tensor.pt")?;

// Model state dict
let state_dict = model.state_dict();
model.load_state_dict(&state_dict)?;
```

## Error Handling

```rust
use torsh::prelude::*;

// All operations return Result<T, TorshError>
match tensor1.add(&tensor2) {
    Ok(result) => println!("Success: {:?}", result),
    Err(e) => println!("Error: {}", e),
}

// Use ? operator for error propagation
fn example() -> Result<Tensor> {
    let a = tensor![1.0, 2.0];
    let b = tensor![3.0, 4.0];
    let result = a.add(&b)?;
    Ok(result)
}
```

## Performance Optimization

### SIMD Operations

```rust
// SIMD optimizations are applied automatically; no explicit opt-in is needed
use torsh::prelude::*;

// Operations automatically use SIMD when available
let result = tensor1.add(&tensor2)?; // Uses AVX/NEON when available
```

### Memory Layout

```rust
// Contiguous memory layout
let tensor = tensor.contiguous()?;

// Memory format specification
let nhwc = tensor.to_memory_format(MemoryFormat::ChannelsLast)?;
```

## Integration Examples

### PyTorch Compatibility

```rust
use torsh::prelude::*;

// PyTorch-like API
let x = torch::randn(&[2, 3]);
let y = torch::mm(&x, &x.t());
let z = torch::relu(&y);

// Functional API
let output = F::conv2d(&input, &weight, Some(&bias), /* stride */ 1, /* padding */ 1, /* dilation */ 1, /* groups */ 1)?;
```

### NumPy Compatibility

```rust
use torsh::prelude::*;

// NumPy-like operations
let arr = np::array([[1.0, 2.0], [3.0, 4.0]]);
let result = np::dot(&arr, &arr.T());
```

## Testing Utilities

```rust
use torsh::testing::*;

// Assertion helpers
assert_tensor_eq!(tensor1, tensor2);
assert_tensor_close!(tensor1, tensor2, 1e-5);

// Testing with gradients
assert_grad_close!(tensor1, tensor2, 1e-5);
```

## Best Practices

### Memory Management

```rust
// Prefer in-place operations when possible
tensor.add_(&other)?; // In-place addition
tensor.mul_(&scalar)?; // In-place multiplication

// Use proper scoping for temporary tensors
{
    let temp = tensor.clone();
    // Use temp here
} // temp is dropped here
```

### Performance

```rust
// Use appropriate data types
let float_tensor = tensor.to_dtype(DType::F32)?;
let half_tensor = tensor.to_dtype(DType::F16)?; // For memory efficiency

// Batch operations
let batched = torch::stack(&tensors, 0)?; // Better than individual operations
```

### Error Handling

```rust
// Handle errors appropriately
fn safe_operation(tensor: &Tensor) -> Result<Tensor> {
    if tensor.numel() == 0 {
        return Err(TorshError::InvalidArgument("Empty tensor".to_string()));
    }
    tensor.square()
}
```

## Version Information

```rust
use torsh::prelude::*;

// Check version
println!("ToRSh version: {}", VERSION);

// Version compatibility
check_version(0, 1)?;

// Feature information
print_feature_info();
```

## Migration from PyTorch

### Common Patterns

```rust
// PyTorch -> ToRSh
// torch.tensor([1, 2, 3])
let tensor = tensor![1, 2, 3];

// torch.randn(2, 3)
let tensor = randn(&[2, 3]);

// torch.nn.Linear(784, 128)
let linear = Linear::new(784, 128);

// torch.optim.Adam(model.parameters(), lr=0.001)
let optimizer = Adam::new(model.parameters(), 0.001);
```

### Key Differences

1. **Error Handling**: ToRSh uses `Result<T, TorshError>` for all operations (see the sketch after this list)
2. **Memory Safety**: No need for manual memory management
3. **Performance**: Automatic SIMD optimizations
4. **Type Safety**: Compile-time shape checking when possible
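
For instance, where PyTorch raises a runtime exception on a shape mismatch, ToRSh surfaces it as an `Err` value that the caller must handle. A minimal sketch, using only the `randn` and `add` APIs documented above (the error message is illustrative):

```rust
use torsh::prelude::*;

// [2, 3] and [4, 5] are not broadcast-compatible, so add returns Err
let a = randn(&[2, 3]);
let b = randn(&[4, 5]);
if let Err(e) = a.add(&b) {
    eprintln!("shape mismatch: {}", e);
}
```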

## Contributing

For API extensions and improvements, please follow the ToRSh contribution guidelines and ensure all new APIs maintain PyTorch compatibility where appropriate.