tensorlogic-infer 0.1.0-beta.1

Execution and autodiff traits for TensorLogic inference engines
# Trait Implementation Guide

This guide explains how to implement the TensorLogic execution traits in your backend.

## Table of Contents

- [Overview](#overview)
- [Core Traits](#core-traits)
- [Implementation Checklist](#implementation-checklist)
- [Step-by-Step Guide](#step-by-step-guide)
- [Best Practices](#best-practices)
- [Testing Your Implementation](#testing-your-implementation)
- [Common Pitfalls](#common-pitfalls)
- [Examples](#examples)

## Overview

TensorLogic defines several traits that a backend can implement. Only `TlExecutor` is required; the rest are optional or recommended:

1. **TlExecutor** - Core tensor operations (required)
2. **TlAutodiff** - Automatic differentiation (optional, for training)
3. **TlBatchExecutor** - Batch processing (optional, for efficiency)
4. **TlStreamingExecutor** - Streaming execution (optional, for large datasets)
5. **TlCapabilities** - Backend capability queries (recommended)
6. **TlProfiledExecutor** - Execution profiling (optional, for debugging)
7. **TlRecoverableExecutor** - Error recovery (optional, for fault tolerance)

## Core Traits

### TlExecutor (Required)

The fundamental trait that all backends must implement.

```rust
pub trait TlExecutor {
    type Tensor;
    type Error;

    fn einsum(&mut self, spec: &str, inputs: &[Self::Tensor])
        -> Result<Self::Tensor, Self::Error>;

    fn elem_op(&mut self, op: ElemOp, x: &Self::Tensor)
        -> Result<Self::Tensor, Self::Error>;

    fn elem_op_binary(&mut self, op: ElemOp, x: &Self::Tensor, y: &Self::Tensor)
        -> Result<Self::Tensor, Self::Error>;

    fn reduce(&mut self, op: ReduceOp, x: &Self::Tensor, axes: &[usize])
        -> Result<Self::Tensor, Self::Error>;
}
```

**Key Points:**
- `Tensor` type should represent your tensor data structure
- `Error` type should capture all possible errors
- All methods take `&mut self` to allow state tracking
- Operations should be pure (no side effects beyond state tracking)

### TlAutodiff (Optional, for Training)

Extends `TlExecutor` with automatic differentiation.

```rust
pub trait TlAutodiff: TlExecutor {
    type Tape;

    fn forward(&mut self, graph: &EinsumGraph)
        -> Result<Self::Tensor, Self::Error>;

    fn backward(&mut self, graph: &EinsumGraph, loss: &Self::Tensor)
        -> Result<Self::Tape, Self::Error>;
}
```

**Key Points:**
- `Tape` represents recorded computation for backpropagation
- `forward` executes the graph and records operations
- `backward` computes gradients using the tape
- Must track intermediate values for gradient computation
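
To make the tape idea concrete, here is a minimal sketch of reverse-mode differentiation over scalars using only the standard library. `ScalarTape` and `TapeEntry` are hypothetical names invented for this illustration and are not part of `tensorlogic-infer`; a real backend would record tensors and einsum operations instead of `f64` values.

```rust
/// One recorded operation: the slots it read and the partial derivatives
/// of its output with respect to each of them.
#[derive(Clone, Copy, Debug)]
struct TapeEntry {
    inputs: [usize; 2],
    partials: [f64; 2],
}

/// Hypothetical scalar tape (illustration only, not a TensorLogic type).
#[derive(Default, Debug)]
pub struct ScalarTape {
    values: Vec<f64>,
    entries: Vec<TapeEntry>,
}

impl ScalarTape {
    /// Registers an input value and returns its slot.
    /// Leaves carry zero partials, so they contribute nothing in `backward`.
    pub fn leaf(&mut self, value: f64) -> usize {
        self.push(value, [0, 0], [0.0, 0.0])
    }

    /// Records `a + b`; both partial derivatives are 1.
    pub fn add(&mut self, a: usize, b: usize) -> usize {
        let value = self.values[a] + self.values[b];
        self.push(value, [a, b], [1.0, 1.0])
    }

    /// Records `a * b`; the partials are the opposite operand's value.
    pub fn mul(&mut self, a: usize, b: usize) -> usize {
        let (va, vb) = (self.values[a], self.values[b]);
        self.push(va * vb, [a, b], [vb, va])
    }

    pub fn value(&self, slot: usize) -> f64 {
        self.values[slot]
    }

    fn push(&mut self, value: f64, inputs: [usize; 2], partials: [f64; 2]) -> usize {
        self.values.push(value);
        self.entries.push(TapeEntry { inputs, partials });
        self.values.len() - 1
    }

    /// Walks the tape in reverse and accumulates d(output)/d(slot) for every slot.
    pub fn backward(&self, output: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.values.len()];
        grads[output] = 1.0; // seed: d(output)/d(output) = 1
        for slot in (0..=output).rev() {
            let entry = self.entries[slot];
            let upstream = grads[slot];
            for k in 0..2 {
                grads[entry.inputs[k]] += upstream * entry.partials[k];
            }
        }
        grads
    }
}
```

For `z = x * y + x` with `x = 3` and `y = 4`, calling `backward` on the slot of `z` yields a gradient of 5 for `x` (it is used twice, so its contributions add up) and 3 for `y`.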

## Implementation Checklist

### Minimum Viable Implementation (TlExecutor only)

- [ ] Define `Tensor` type (e.g., wrapper around ndarray)
- [ ] Define `Error` type (use thiserror for clean errors)
- [ ] Implement `einsum` (at least basic cases)
- [ ] Implement `elem_op` (Relu, OneMinus, at minimum)
- [ ] Implement `elem_op_binary` (Add, Multiply, at minimum)
- [ ] Implement `reduce` (Sum, Max, at minimum)
- [ ] Write unit tests for each operation
- [ ] Handle edge cases (empty tensors, mismatched shapes)

### Production-Ready Implementation

- [ ] All minimum viable items ✓
- [ ] Implement `TlAutodiff` for training support
- [ ] Implement `TlBatchExecutor` for batch processing
- [ ] Implement `TlCapabilities` for feature detection
- [ ] Comprehensive error handling
- [ ] Performance optimization (SIMD, parallelization)
- [ ] Memory efficiency (pooling, caching)
- [ ] Integration tests with real graphs
- [ ] Benchmarks against reference implementation

## Step-by-Step Guide

### Step 1: Set Up Your Crate

```bash
cargo new --lib my-tensorlogic-backend
cd my-tensorlogic-backend
```

Add dependencies to `Cargo.toml`:

```toml
[dependencies]
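# Cargo does not select pre-releases for a plain "0.1" requirement; if only a
# pre-release such as 0.1.0-beta.1 is published, name that version explicitly.
tensorlogic-ir = "0.1"
tensorlogic-infer = "0.1"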
thiserror = "1.0"
# Your tensor library (e.g., ndarray)
ndarray = "0.15"
# Only needed for the parallel batch executor in Step 5
rayon = "1"
num_cpus = "1"
```

### Step 2: Define Core Types

```rust
use ndarray::ArrayD;
use thiserror::Error;

/// Your tensor type
#[derive(Clone, Debug)]
pub struct MyTensor {
    data: ArrayD<f64>,
    id: String,
}

/// Your error type
#[derive(Error, Debug)]
pub enum MyError {
    #[error("Shape mismatch: expected {expected:?}, got {actual:?}")]
    ShapeMismatch { expected: Vec<usize>, actual: Vec<usize> },

    #[error("Invalid einsum specification: {0}")]
    InvalidEinsum(String),

    #[error("Operation failed: {0}")]
    OperationFailed(String),
}

/// Your executor
pub struct MyExecutor {
    // State tracking, caching, etc.
}
```

### Step 3: Implement TlExecutor

```rust
use tensorlogic_infer::{TlExecutor, ElemOp, ReduceOp};

impl TlExecutor for MyExecutor {
    type Tensor = MyTensor;
    type Error = MyError;

    fn einsum(&mut self, spec: &str, inputs: &[Self::Tensor])
        -> Result<Self::Tensor, Self::Error>
    {
        // Parse einsum specification
        let (input_specs, output_spec) = parse_einsum_spec(spec)?;

        // Validate inputs
        if inputs.len() != input_specs.len() {
            return Err(MyError::InvalidEinsum(
                format!("Expected {} inputs, got {}", input_specs.len(), inputs.len())
            ));
        }

        // Execute einsum operation
        let result = execute_einsum_op(input_specs, output_spec, inputs)?;

        Ok(result)
    }

    fn elem_op(&mut self, op: ElemOp, x: &Self::Tensor)
        -> Result<Self::Tensor, Self::Error>
    {
        let result_data = match op {
            ElemOp::Relu => x.data.mapv(|v| v.max(0.0)),
            ElemOp::OneMinus => x.data.mapv(|v| 1.0 - v),
            ElemOp::Sigmoid => x.data.mapv(|v| 1.0 / (1.0 + (-v).exp())),
            // Add other operations...
            _ => return Err(MyError::OperationFailed(
                format!("Unsupported operation: {:?}", op)
            )),
        };

        Ok(MyTensor {
            data: result_data,
            id: format!("{}_op", x.id)
        })
    }

    fn elem_op_binary(&mut self, op: ElemOp, x: &Self::Tensor, y: &Self::Tensor)
        -> Result<Self::Tensor, Self::Error>
    {
        // Validate shapes are compatible
        if x.data.shape() != y.data.shape() {
            return Err(MyError::ShapeMismatch {
                expected: x.data.shape().to_vec(),
                actual: y.data.shape().to_vec(),
            });
        }

        let result_data = match op {
            ElemOp::Add => &x.data + &y.data,
            ElemOp::Multiply => &x.data * &y.data,
            ElemOp::Max => {
                let mut result = x.data.clone();
                ndarray::Zip::from(&mut result)
                    .and(&y.data)
                    .for_each(|a, &b| *a = a.max(b));
                result
            },
            // Add other operations...
            _ => return Err(MyError::OperationFailed(
                format!("Unsupported binary operation: {:?}", op)
            )),
        };

        Ok(MyTensor {
            data: result_data,
            id: format!("{}_{}_op", x.id, y.id)
        })
    }

    fn reduce(&mut self, op: ReduceOp, x: &Self::Tensor, axes: &[usize])
        -> Result<Self::Tensor, Self::Error>
    {
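        // Reduce the highest axis first so the remaining axis numbers stay valid.
        // Sorting here keeps that true even when the caller passes unsorted axes.
        let mut sorted_axes = axes.to_vec();
        sorted_axes.sort_unstable_by(|a, b| b.cmp(a));
        sorted_axes.dedup();

        let result_data = match op {
            ReduceOp::Sum => {
                let mut result = x.data.clone();
                for &axis in &sorted_axes {
                    result = result.sum_axis(ndarray::Axis(axis));
                }
                result
            },
            ReduceOp::Max => {
                let mut result = x.data.clone();
                for &axis in &sorted_axes {
                    result = result.map_axis(ndarray::Axis(axis), |view| {
                        view.iter().fold(f64::NEG_INFINITY, |a, &b| a.max(b))
                    });
                }
                result
            },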
            // Add other operations...
            _ => return Err(MyError::OperationFailed(
                format!("Unsupported reduce operation: {:?}", op)
            )),
        };

        Ok(MyTensor {
            data: result_data,
            id: format!("{}_reduce", x.id)
        })
    }
}
```
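
The `einsum` method above calls `parse_einsum_spec`, which this guide leaves undefined. One possible shape for it is sketched below using only the standard library. The signature is an assumption: it returns a `String` error, which you would map to `MyError::InvalidEinsum`, and it only accepts explicit-output specs with lowercase ASCII labels.

```rust
/// Splits an explicit-output einsum spec such as "ij,jk->ik" into the
/// per-input index strings and the output index string.
/// Hypothetical helper: the signature is an assumption of this sketch.
pub fn parse_einsum_spec(spec: &str) -> Result<(Vec<String>, String), String> {
    // Ignore whitespace so "ij, jk -> ik" parses as well.
    let compact: String = spec.chars().filter(|c| !c.is_whitespace()).collect();

    let (lhs, rhs) = compact
        .split_once("->")
        .ok_or_else(|| format!("missing '->' in einsum spec {:?}", spec))?;

    let inputs: Vec<String> = lhs.split(',').map(str::to_string).collect();
    let output = rhs.to_string();

    // Every index label must be a lowercase ASCII letter.
    for operand in inputs.iter().chain(std::iter::once(&output)) {
        if let Some(bad) = operand.chars().find(|c| !c.is_ascii_lowercase()) {
            return Err(format!(
                "invalid index label {:?} in einsum spec {:?}",
                bad, spec
            ));
        }
    }

    // Output labels must be unique and must appear in at least one input.
    for (pos, label) in output.chars().enumerate() {
        if output[..pos].contains(label) {
            return Err(format!("output index {:?} is repeated", label));
        }
        if !inputs.iter().any(|operand| operand.contains(label)) {
            return Err(format!(
                "output index {:?} does not appear in any input",
                label
            ));
        }
    }

    Ok((inputs, output))
}
```

Rejecting malformed specs at parse time means `execute_einsum_op` can assume well-formed index strings.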

### Step 4: Implement TlAutodiff (Optional)

```rust
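use std::collections::HashMap;
use tensorlogic_infer::TlAutodiff;
use tensorlogic_ir::EinsumGraph;

// `TapeEntry` and the `self.*` helper methods and fields used below are not
// defined in this guide; they are backend-specific and left for you to write.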

pub struct ComputationTape {
    // Store intermediate values and operations
    operations: Vec<TapeEntry>,
}

impl TlAutodiff for MyExecutor {
    type Tape = ComputationTape;

    fn forward(&mut self, graph: &EinsumGraph)
        -> Result<Self::Tensor, Self::Error>
    {
        // Enable gradient tracking
        self.gradient_mode = true;

        // Execute graph and record operations
        let mut tape = ComputationTape::new();

        for node in &graph.nodes {
            let result = match &node.op {
                OpType::Einsum { spec } => {
                    let inputs = self.get_inputs(&node.inputs)?;
                    let output = self.einsum(spec, &inputs)?;
                    tape.record(TapeEntry::Einsum {
                        spec: spec.clone(),
                        inputs: inputs.clone(),
                        output: output.clone()
                    });
                    output
                },
                // Handle other operation types...
                _ => return Err(MyError::OperationFailed(
                    "Unsupported node type in forward pass".into()
                )),
            };

            self.store_result(node.id, result)?;
        }

        self.tape = Some(tape);
        self.get_output()
    }

    fn backward(&mut self, graph: &EinsumGraph, loss: &Self::Tensor)
        -> Result<Self::Tape, Self::Error>
    {
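        let tape = self.tape.take()
            .ok_or_else(|| MyError::OperationFailed("No forward pass recorded".into()))?;

        // Seed the backward pass. This uses `loss` itself as the upstream gradient;
        // if `loss` holds the loss value, seed with ones of the same shape instead,
        // since d(loss)/d(loss) = 1.
        let mut gradients = HashMap::new();
        gradients.insert(loss.id.clone(), loss.clone());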

        // Backpropagate through tape in reverse order
        for entry in tape.operations.iter().rev() {
            match entry {
                TapeEntry::Einsum { spec, inputs, output } => {
                    let grad_output = gradients.get(&output.id)
                        .ok_or(MyError::OperationFailed("Missing gradient".into()))?;

                    // Compute gradients for inputs using einsum derivatives
                    let input_grads = self.einsum_backward(spec, inputs, grad_output)?;

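                    // Accumulate instead of overwriting: a tensor consumed by several
                    // operations receives one gradient contribution from each of them.
                    for (input, grad) in inputs.iter().zip(input_grads) {
                        let total = match gradients.remove(&input.id) {
                            Some(prev) => self.elem_op_binary(ElemOp::Add, &prev, &grad)?,
                            None => grad,
                        };
                        gradients.insert(input.id.clone(), total);
                    }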
                },
                // Handle other operation types...
                _ => return Err(MyError::OperationFailed(
                    "Unsupported tape entry in backward pass".into()
                )),
            }
        }

        self.gradients = Some(gradients);
        Ok(tape)
    }
}
```
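
The `einsum_backward` helper above is left undefined. For a restricted class of specs, the gradient of each input is itself an einsum: contract the upstream gradient with the other inputs and use the target input's indices as the output. The sketch below derives those gradient specs as strings. The name `einsum_backward_specs` is hypothetical, and the sketch assumes that no input repeats an index and that every index of an input also appears in the output or in another input; specs outside that class (for example a pure sum such as `ij->i`, which needs broadcasting) are rejected.

```rust
/// For each input of `spec`, returns the einsum spec that computes its
/// gradient from (upstream gradient, all other inputs in original order).
/// Hypothetical helper covering only the restricted case described above.
pub fn einsum_backward_specs(spec: &str) -> Result<Vec<String>, String> {
    let (lhs, output) = spec
        .split_once("->")
        .ok_or_else(|| format!("missing '->' in einsum spec {:?}", spec))?;
    let inputs: Vec<&str> = lhs.split(',').collect();

    let mut grad_specs = Vec::with_capacity(inputs.len());
    for (i, target) in inputs.iter().enumerate() {
        // Operands of the gradient einsum: upstream gradient first,
        // then every input except the one being differentiated.
        let mut operands = vec![output];
        operands.extend(
            inputs
                .iter()
                .enumerate()
                .filter(|(j, _)| *j != i)
                .map(|(_, s)| *s),
        );

        for (pos, label) in target.chars().enumerate() {
            let repeated = target.chars().take(pos).any(|c| c == label);
            let covered = operands.iter().any(|op| op.contains(label));
            if repeated || !covered {
                return Err(format!(
                    "index {:?} of input {} is outside what this sketch supports",
                    label, i
                ));
            }
        }

        grad_specs.push(format!("{}->{}", operands.join(","), target));
    }
    Ok(grad_specs)
}
```

For matrix multiplication `ij,jk->ik` this gives `ik,jk->ij` for the first input and `ik,ij->jk` for the second, which are the familiar `dC · Bᵀ` and `Aᵀ · dC`.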

### Step 5: Implement TlBatchExecutor (Optional)

```rust
use tensorlogic_infer::{TlBatchExecutor, BatchResult};
use std::collections::HashMap;

impl TlBatchExecutor for MyExecutor {
    fn execute_batch(
        &mut self,
        graph: &EinsumGraph,
        batch_inputs: Vec<HashMap<String, Self::Tensor>>,
    ) -> Result<BatchResult<Self::Tensor>, Self::Error> {
        let start = std::time::Instant::now();
        let mut results = Vec::new();

        for inputs in batch_inputs {
            // Execute graph for each input
            let output = self.forward_with_inputs(graph, &inputs)?;
            results.push(output);
        }

        let total_time_ms = start.elapsed().as_secs_f64() * 1000.0;

        Ok(BatchResult {
            outputs: results,
            total_time_ms,
            metadata: HashMap::new(),
        })
    }

    fn execute_batch_parallel(
        &mut self,
        graph: &EinsumGraph,
        batch_inputs: Vec<HashMap<String, Self::Tensor>>,
        num_threads: Option<usize>,
    ) -> Result<BatchResult<Self::Tensor>, Self::Error> {
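        use rayon::prelude::*;

        // Build a dedicated pool and run the batch inside it. A pool that is
        // built but never used has no effect: `par_iter` would run on rayon's
        // global pool instead.
        let num_threads = num_threads.unwrap_or_else(num_cpus::get);
        let pool = rayon::ThreadPoolBuilder::new()
            .num_threads(num_threads)
            .build()
            .map_err(|e| MyError::OperationFailed(e.to_string()))?;

        let start = std::time::Instant::now();

        // Parallel execution (requires `MyExecutor: Clone + Sync`)
        let results: Result<Vec<_>, _> = pool.install(|| {
            batch_inputs
                .par_iter()
                .map(|inputs| {
                    // Each task works on its own copy of the executor
                    let mut exec = self.clone();
                    exec.forward_with_inputs(graph, inputs)
                })
                .collect()
        });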

        let total_time_ms = start.elapsed().as_secs_f64() * 1000.0;

        Ok(BatchResult {
            outputs: results?,
            total_time_ms,
            metadata: HashMap::new(),
        })
    }

    fn optimal_batch_size(&self, _graph: &EinsumGraph) -> usize {
        // Heuristic: balance memory and parallelism
        let available_threads = num_cpus::get();
        let memory_per_sample_mb = 10.0; // Estimate
        let available_memory_mb = 1000.0; // Estimate

        let max_by_memory = (available_memory_mb / memory_per_sample_mb) as usize;
        let max_by_threads = available_threads * 2; // 2x for pipelining

        max_by_memory.min(max_by_threads).max(1)
    }
}
```
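
If you would rather not depend on rayon, the same fan-out can be sketched with scoped threads from the standard library (Rust 1.63 or later). `run_batch_parallel` is a hypothetical helper, not part of `tensorlogic-infer`: it splits the batch into contiguous chunks, runs a caller-supplied closure on each sample, and returns outputs in input order or the first error in batch order.

```rust
use std::thread;

/// Runs `run_one` over `inputs` on up to `num_threads` scoped threads.
/// Hypothetical helper for illustration; output order matches input order.
pub fn run_batch_parallel<I, O, E, F>(
    inputs: &[I],
    num_threads: usize,
    run_one: F,
) -> Result<Vec<O>, E>
where
    I: Sync,
    O: Send,
    E: Send,
    F: Fn(&I) -> Result<O, E> + Sync,
{
    if inputs.is_empty() {
        return Ok(Vec::new());
    }

    // Never spawn more workers than samples; round the chunk length up.
    let workers = num_threads.max(1).min(inputs.len());
    let chunk_len = (inputs.len() + workers - 1) / workers;
    let run_one = &run_one;

    let per_chunk: Vec<Result<Vec<O>, E>> = thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = inputs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(run_one).collect::<Result<Vec<O>, E>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("batch worker panicked"))
            .collect()
    });

    // Stitch the chunks back together, stopping at the first failed chunk.
    let mut outputs = Vec::with_capacity(inputs.len());
    for chunk_result in per_chunk {
        outputs.extend(chunk_result?);
    }
    Ok(outputs)
}
```

Inside `execute_batch_parallel`, the closure would clone the executor and call your per-sample forward helper, mirroring the rayon version above.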

## Best Practices

### Error Handling

1. **Use thiserror for clean error types**:
```rust
#[derive(Error, Debug)]
pub enum MyError {
    #[error("Shape mismatch: expected {expected:?}, got {actual:?}")]
    ShapeMismatch { expected: Vec<usize>, actual: Vec<usize> },
}
```

2. **Provide detailed error messages**:
```rust
if inputs.is_empty() {
    return Err(MyError::InvalidEinsum(
        "Expected at least one input tensor".into()
    ));
}
```

3. **Handle all edge cases**:
- Empty tensors
- Mismatched shapes
- Invalid specifications
- Out of memory
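
Edge-case checks like these are cheapest at the top of each trait method, before any allocation happens. As one example, the sketch below validates the `axes` argument of `reduce` against a tensor shape. `validate_reduce_axes` is a hypothetical helper, and the `String` error stands in for your own error type.

```rust
/// Checks reduction axes against a tensor shape and returns them sorted in
/// descending order, so that removing one axis never renumbers the others.
/// Hypothetical helper for illustration.
pub fn validate_reduce_axes(shape: &[usize], axes: &[usize]) -> Result<Vec<usize>, String> {
    let mut sorted = axes.to_vec();
    sorted.sort_unstable();

    // After sorting, duplicates sit next to each other.
    for pair in sorted.windows(2) {
        if pair[0] == pair[1] {
            return Err(format!("axis {} is listed more than once", pair[0]));
        }
    }

    if let Some(&axis) = sorted.iter().find(|&&a| a >= shape.len()) {
        return Err(format!(
            "axis {} is out of range for a rank-{} tensor",
            axis,
            shape.len()
        ));
    }

    sorted.reverse();
    Ok(sorted)
}
```

An empty `axes` slice is accepted and returns an empty list, which a backend can treat as a no-op reduction.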

### Performance Optimization

1. **Use SIMD when possible**:
```rust
#[cfg(target_feature = "avx2")]
use std::arch::x86_64::*;
```

2. **Implement memory pooling**:
```rust
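use ndarray::ArrayD;
use std::collections::HashMap;

struct TensorPool {
    // Free buffers keyed by shape, so a reused buffer always matches the request
    pool: HashMap<Vec<usize>, Vec<ArrayD<f64>>>,
}

impl TensorPool {
    fn allocate(&mut self, shape: &[usize]) -> ArrayD<f64> {
        match self.pool.get_mut(shape).and_then(|free| free.pop()) {
            Some(mut tensor) => {
                tensor.fill(0.0); // Clear stale values from the previous user
                tensor
            }
            None => ArrayD::zeros(shape),
        }
    }

    fn deallocate(&mut self, tensor: ArrayD<f64>) {
        self.pool.entry(tensor.shape().to_vec()).or_default().push(tensor);
    }
}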
```

3. **Cache einsum parsing**:
```rust
use std::collections::HashMap;

struct EinsumCache {
    cache: HashMap<String, ParsedEinsum>,
}
```
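
A sketch of how such a cache might be used follows. Everything here is an assumption for illustration: the fields of `ParsedEinsum`, the `get_or_parse` method, the `misses` counter, and the tiny `split_spec` parser are all hypothetical. The point is that each distinct spec string is parsed once and failed parses are not cached.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of an einsum spec.
#[derive(Clone, Debug, PartialEq)]
pub struct ParsedEinsum {
    pub inputs: Vec<String>,
    pub output: String,
}

/// Hypothetical cache keyed by the raw spec string.
#[derive(Default)]
pub struct EinsumCache {
    cache: HashMap<String, ParsedEinsum>,
    misses: usize,
}

impl EinsumCache {
    /// Returns the cached parse for `spec`, calling `parse` only on a miss.
    pub fn get_or_parse<F>(&mut self, spec: &str, parse: F) -> Result<&ParsedEinsum, String>
    where
        F: FnOnce(&str) -> Result<ParsedEinsum, String>,
    {
        if !self.cache.contains_key(spec) {
            let parsed = parse(spec)?; // errors propagate and are not cached
            self.misses += 1;
            self.cache.insert(spec.to_string(), parsed);
        }
        Ok(&self.cache[spec])
    }

    /// Number of successful parses performed so far.
    pub fn misses(&self) -> usize {
        self.misses
    }
}

/// Minimal stand-in parser so the sketch is self-contained.
pub fn split_spec(spec: &str) -> Result<ParsedEinsum, String> {
    let (lhs, rhs) = spec
        .split_once("->")
        .ok_or_else(|| format!("missing '->' in einsum spec {:?}", spec))?;
    Ok(ParsedEinsum {
        inputs: lhs.split(',').map(str::to_string).collect(),
        output: rhs.to_string(),
    })
}
```

Because graphs tend to reuse a small set of specs, a cache like this can remove parsing from the hot path; measure before relying on it.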

### Memory Management

1. **Use Cow for zero-copy when possible**:
```rust
use std::borrow::Cow;

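fn process<'a>(&self, data: Cow<'a, ArrayD<f64>>, needs_modification: bool) -> Cow<'a, ArrayD<f64>> {
    if needs_modification {
        // `into_owned` copies only when the data is still borrowed
        let mut owned = data.into_owned();
        owned.mapv_inplace(|x| x * 2.0);
        Cow::Owned(owned)
    } else {
        data // Zero-copy
    }
}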
```

2. **Implement Drop for resource cleanup**:
```rust
impl Drop for MyExecutor {
    fn drop(&mut self) {
        // Clean up GPU resources, file handles, etc.
        self.cleanup_resources();
    }
}
```

## Testing Your Implementation

### Unit Tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_execution() {
        let mut exec = MyExecutor::new();
        let t1 = MyTensor::ones(vec![2, 3]);
        let t2 = MyTensor::ones(vec![2, 3]);

        let result = exec.elem_op_binary(ElemOp::Add, &t1, &t2).unwrap();

        assert_eq!(result.data.shape(), &[2, 3]);
        assert!(result.data.iter().all(|&x| (x - 2.0).abs() < 1e-6));
    }

    #[test]
    fn test_shape_mismatch_error() {
        let mut exec = MyExecutor::new();
        let t1 = MyTensor::ones(vec![2, 3]);
        let t2 = MyTensor::ones(vec![3, 2]);

        let result = exec.elem_op_binary(ElemOp::Add, &t1, &t2);

        assert!(result.is_err());
        assert!(matches!(result.unwrap_err(), MyError::ShapeMismatch { .. }));
    }
}
```

### Integration Tests

Create `tests/integration_test.rs`:

```rust
use my_tensorlogic_backend::MyExecutor; // Crate name from Step 1, hyphens become underscores
use tensorlogic_infer::TlExecutor;
use tensorlogic_compiler::compile;

#[test]
fn test_full_graph_execution() {
    // Compile a TLExpr to EinsumGraph
    let expr = /* ... */;
    let graph = compile(&expr, &context).unwrap();

    // Execute with your backend
    let mut executor = MyExecutor::new();
    let inputs = /* ... */;
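    // `TlAutodiff::forward` takes only the graph, so pass inputs through a
    // backend-specific helper such as the `forward_with_inputs` used in Step 5.
    let outputs = executor.forward_with_inputs(&graph, &inputs).unwrap();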

    // Verify results
    assert_eq!(outputs.shape(), expected_shape);
}
```

## Common Pitfalls

### 1. **Forgetting to clone tensors**

❌ **Wrong**:
```rust
fn elem_op(&mut self, op: ElemOp, x: &Self::Tensor) -> Result<Self::Tensor, Self::Error> {
    x.data.mapv_inplace(|v| v.max(0.0)); // Tries to mutate the input; rejected because `x` is a shared reference
    Ok(x.clone())
}
```

✅ **Correct**:
```rust
fn elem_op(&mut self, op: ElemOp, x: &Self::Tensor) -> Result<Self::Tensor, Self::Error> {
    let result_data = x.data.mapv(|v| v.max(0.0)); // Creates new array
    Ok(MyTensor { data: result_data, id: format!("{}_relu", x.id) })
}
```

### 2. **Not handling broadcast correctly**

❌ **Wrong**:
```rust
fn elem_op_binary(&mut self, op: ElemOp, x: &Self::Tensor, y: &Self::Tensor)
    -> Result<Self::Tensor, Self::Error>
{
    // Assumes shapes are identical
    Ok(MyTensor { data: &x.data + &y.data, id: "result".into() })
}
```

✅ **Correct**:
```rust
fn elem_op_binary(&mut self, op: ElemOp, x: &Self::Tensor, y: &Self::Tensor)
    -> Result<Self::Tensor, Self::Error>
{
    // Check if broadcast is needed
    if !are_shapes_compatible(&x.data.shape(), &y.data.shape()) {
        return Err(MyError::ShapeMismatch { /* ... */ });
    }

    let result_data = broadcast_and_apply(&x.data, &y.data, |a, b| a + b)?;
    Ok(MyTensor { data: result_data, id: "result".into() })
}
```
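
The correct version above relies on `are_shapes_compatible` and `broadcast_and_apply`, which this guide does not define. The shape logic could look like the sketch below, which assumes NumPy-style broadcasting: shapes are aligned from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. `broadcast_shape` is a hypothetical helper name.

```rust
/// Returns the broadcast result shape of `a` and `b`, or `None` if the
/// shapes are incompatible. Assumes NumPy-style trailing alignment.
pub fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = Vec::with_capacity(rank);

    for i in 0..rank {
        // Walk from the last dimension; missing leading dimensions count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };

        let dim = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
        out.push(dim);
    }

    out.reverse(); // dimensions were collected last-to-first
    Some(out)
}

/// Two shapes are compatible when a broadcast result shape exists.
pub fn are_shapes_compatible(a: &[usize], b: &[usize]) -> bool {
    broadcast_shape(a, b).is_some()
}
```

With this rule, `[2, 3]` and `[3]` broadcast to `[2, 3]`, while `[2, 3]` and `[3, 2]` are rejected.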

### 3. **Memory leaks in gradient computation**

✅ **Always clean up**:
```rust
impl Drop for MyExecutor {
    fn drop(&mut self) {
        self.clear_tape();
        self.clear_gradients();
    }
}
```

### 4. **Not validating einsum specifications**

✅ **Validate early**:
```rust
fn einsum(&mut self, spec: &str, inputs: &[Self::Tensor])
    -> Result<Self::Tensor, Self::Error>
{
    // Validate specification format
    if !is_valid_einsum_spec(spec) {
        return Err(MyError::InvalidEinsum(format!("Invalid spec: {}", spec)));
    }

    // Validate input count
    let expected_inputs = count_einsum_inputs(spec);
    if inputs.len() != expected_inputs {
        return Err(MyError::InvalidEinsum(
            format!("Expected {} inputs, got {}", expected_inputs, inputs.len())
        ));
    }

    // ... rest of implementation
}
```
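
Beyond format and input count, it also helps to check that each index label refers to one consistent dimension size across all inputs. The sketch below does that with only the standard library. `infer_index_sizes` is a hypothetical helper; it takes the per-input index strings and the input shapes, and the `String` error stands in for `MyError::InvalidEinsum`.

```rust
use std::collections::HashMap;

/// Maps every index label to its dimension size, failing if an input's rank
/// does not match its index string or if a label has conflicting sizes.
/// Hypothetical helper for illustration.
pub fn infer_index_sizes(
    input_specs: &[&str],
    shapes: &[Vec<usize>],
) -> Result<HashMap<char, usize>, String> {
    if input_specs.len() != shapes.len() {
        return Err(format!(
            "Expected {} inputs, got {}",
            input_specs.len(),
            shapes.len()
        ));
    }

    let mut sizes = HashMap::new();
    for (pos, (labels, shape)) in input_specs.iter().zip(shapes).enumerate() {
        if labels.chars().count() != shape.len() {
            return Err(format!(
                "input {} has rank {} but its spec {:?} names {} indices",
                pos,
                shape.len(),
                labels,
                labels.chars().count()
            ));
        }

        for (label, &dim) in labels.chars().zip(shape) {
            // The first occurrence fixes the size; later ones must agree.
            let known = *sizes.entry(label).or_insert(dim);
            if known != dim {
                return Err(format!(
                    "index {:?} has conflicting sizes {} and {}",
                    label, known, dim
                ));
            }
        }
    }
    Ok(sizes)
}
```

The resulting map also gives you the output shape directly: look up each output label in order.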

## Examples

### Complete Minimal Backend

See `examples/minimal_backend.rs` in the repository for a complete minimal implementation.

### Production Backend

See the `tensorlogic-scirs-backend` crate for a full production implementation using SciRS2.

## Next Steps

1. Implement the basic `TlExecutor` trait
2. Write comprehensive tests
3. Benchmark against reference implementation
4. Add optional traits (TlAutodiff, TlBatchExecutor, etc.)
5. Optimize for your target platform
6. Submit your backend to the TensorLogic ecosystem!

## Getting Help

- **Documentation**: https://docs.rs/tensorlogic-infer
- **Examples**: https://github.com/cool-japan/tensorlogic/tree/main/examples
- **Issues**: https://github.com/cool-japan/tensorlogic/issues
- **Discussions**: https://github.com/cool-japan/tensorlogic/discussions

---

**Version**: 1.0
**Last Updated**: 2025-12-16
**Part of**: [TensorLogic Ecosystem](https://github.com/cool-japan/tensorlogic)