# ToRSh Sparse Best Practices Guide
## Overview
This guide provides best practices, design patterns, and optimization techniques for using ToRSh-Sparse effectively in production applications. Following these practices will help you build robust, efficient, and maintainable sparse tensor applications.
## Table of Contents
1. [Format Selection Guidelines](#format-selection-guidelines)
2. [Performance Best Practices](#performance-best-practices)
3. [Memory Management](#memory-management)
4. [Error Handling and Robustness](#error-handling-and-robustness)
5. [Code Organization](#code-organization)
6. [Testing Strategies](#testing-strategies)
7. [Deployment Considerations](#deployment-considerations)
8. [Common Pitfalls](#common-pitfalls)
9. [Design Patterns](#design-patterns)
10. [Optimization Checklist](#optimization-checklist)
## Format Selection Guidelines
### Choose the Right Format for Your Use Case
#### CSR (Compressed Sparse Row)
**Use when:**
- Performing frequent row-wise operations
- Matrix-vector multiplication is primary operation
- Need efficient sequential row access
- Building general-purpose sparse matrix libraries
**Avoid when:**
- Primarily doing column operations
- Frequent random element insertion/deletion
- Memory is extremely constrained
```rust
// Good: Row-wise operations with CSR
let csr_matrix = CSRTensor::from_triplets(triplets, shape)?;
for row_idx in 0..csr_matrix.nrows() {
let row = csr_matrix.get_row(row_idx)?;
// Process row efficiently
}
// Bad: Column operations with CSR (inefficient)
for col_idx in 0..csr_matrix.ncols() {
let col = csr_matrix.get_col(col_idx)?; // Slow!
}
```
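The asymmetry between row and column access follows directly from the CSR layout. The sketch below is a minimal, std-only illustration; the `Csr` struct and its methods are hypothetical stand-ins and are not the ToRSh `CSRTensor` API.

```rust
/// Illustrative CSR container (an assumption for this sketch, not the ToRSh `CSRTensor`).
pub struct Csr {
    pub nrows: usize,
    pub ncols: usize,
    /// `row_ptr[i]..row_ptr[i + 1]` is the range of row `i` in `col_idx` and `values`.
    pub row_ptr: Vec<usize>,
    pub col_idx: Vec<usize>,
    pub values: Vec<f64>,
}

impl Csr {
    /// Row access is a pair of contiguous slices: O(1) to locate, O(nnz in row) to read.
    pub fn row(&self, i: usize) -> (&[usize], &[f64]) {
        let (start, end) = (self.row_ptr[i], self.row_ptr[i + 1]);
        (&self.col_idx[start..end], &self.values[start..end])
    }

    /// Column access has to scan every stored entry: O(nnz) per column.
    pub fn col(&self, j: usize) -> Vec<(usize, f64)> {
        let mut out = Vec::new();
        for i in 0..self.nrows {
            let (cols, vals) = self.row(i);
            for (&c, &v) in cols.iter().zip(vals) {
                if c == j {
                    out.push((i, v));
                }
            }
        }
        out
    }

    /// y = A * x, walking each row exactly once.
    pub fn matvec(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.ncols, "vector length must equal ncols");
        (0..self.nrows)
            .map(|i| {
                let (cols, vals) = self.row(i);
                cols.iter().zip(vals).map(|(&c, &v)| v * x[c]).sum::<f64>()
            })
            .collect()
    }
}
```

Reading all columns this way costs O(ncols × nnz), which is why converting to CSC once is usually cheaper than repeated column extraction from CSR.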
#### CSC (Compressed Sparse Column)
**Use when:**
- Performing frequent column-wise operations
- Transpose matrix-vector multiplication
- Linear algebra algorithms requiring column access
- Interfacing with column-major libraries
**Example:**
```rust
// Good: Column-wise operations with CSC
let csc_matrix = CSCTensor::from_csr(&csr_matrix)?;
for col_idx in 0..csc_matrix.ncols() {
let col = csc_matrix.get_col(col_idx)?;
// Process column efficiently
}
```
#### COO (Coordinate)
**Use when:**
- Building sparse matrices incrementally
- Converting between formats
- One-time operations on unsorted data
- Parallel construction from multiple threads
**Avoid when:**
- Performing arithmetic operations repeatedly
- Need optimized matrix-vector multiplication
- Memory usage is critical
```rust
// Good: Incremental construction with COO
let mut triplets = Vec::new();
for data_point in data_stream {
triplets.push((data_point.row, data_point.col, data_point.value));
}
let coo_matrix = COOTensor::from_triplets(triplets, shape)?;
// Convert to optimized format for operations
let csr_matrix = CSRTensor::from_coo(&coo_matrix)?;
```
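For intuition about what a COO-to-CSR conversion involves, here is a std-only sketch using a two-pass counting approach. The function `coo_to_csr` is hypothetical and written for illustration; it is not claimed to be how `CSRTensor::from_coo` is implemented.

```rust
/// Convert unsorted COO triplets into CSR arrays `(row_ptr, col_idx, values)`.
/// Entries keep their input order within each row; duplicates are not merged.
/// Panics if any row index is `>= nrows`.
pub fn coo_to_csr(
    nrows: usize,
    triplets: &[(usize, usize, f64)],
) -> (Vec<usize>, Vec<usize>, Vec<f64>) {
    // Pass 1: count entries per row (stored one slot to the right).
    let mut row_ptr = vec![0usize; nrows + 1];
    for &(r, _, _) in triplets {
        row_ptr[r + 1] += 1;
    }
    // Prefix sum turns the counts into row start offsets.
    for i in 0..nrows {
        row_ptr[i + 1] += row_ptr[i];
    }
    // Pass 2: scatter each entry into the next free slot of its row.
    let mut next = row_ptr.clone();
    let mut col_idx = vec![0usize; triplets.len()];
    let mut values = vec![0.0f64; triplets.len()];
    for &(r, c, v) in triplets {
        let slot = next[r];
        col_idx[slot] = c;
        values[slot] = v;
        next[r] += 1;
    }
    (row_ptr, col_idx, values)
}
```

The conversion is O(nnz + nrows) but touches every entry twice, which is one reason to convert once and reuse the result.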
#### Specialized Formats (BSR, DIA, ELL)
**Use when:**
- Matrix has specific structural patterns
- Performance is critical
- Memory bandwidth is the bottleneck
```rust
// BSR for block-structured matrices
if has_block_structure(&matrix) {
let block_size = detect_optimal_block_size(&matrix)?;
let bsr_matrix = BSRTensor::from_csr(&csr_matrix, block_size)?;
}
// DIA for banded matrices
if is_banded(&matrix) {
let dia_matrix = DIATensor::from_csr(&csr_matrix)?;
}
```
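A helper such as `is_banded` has to decide whether diagonal storage would be mostly padding. One possible heuristic, sketched here with std only, is to measure how full the occupied diagonals are. The names `occupied_diagonals` and `dia_fill_ratio` are hypothetical, and the model assumes a square matrix whose diagonals are each padded to length `n`.

```rust
use std::collections::BTreeSet;

/// Offsets (`col - row`) of the diagonals that hold at least one nonzero.
pub fn occupied_diagonals(triplets: &[(usize, usize, f64)]) -> BTreeSet<isize> {
    triplets
        .iter()
        .filter(|&&(_, _, v)| v != 0.0)
        .map(|&(r, c, _)| c as isize - r as isize)
        .collect()
}

/// Fraction of diagonal storage that would hold real nonzeros for an `n x n`
/// matrix, assuming every occupied diagonal is padded to length `n`.
pub fn dia_fill_ratio(n: usize, triplets: &[(usize, usize, f64)]) -> f64 {
    let diagonals = occupied_diagonals(triplets).len();
    if n == 0 || diagonals == 0 {
        return 0.0;
    }
    let nnz = triplets.iter().filter(|t| t.2 != 0.0).count();
    nnz as f64 / (diagonals * n) as f64
}
```

A tridiagonal matrix scores close to 1.0, while a matrix with a few scattered entries scores low, which suggests staying with CSR. The cut-off to use is application-specific.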
### Dynamic Format Selection
```rust
use torsh_sparse::{auto_select_format, OperationType};
fn select_optimal_format(matrix: &CSRTensor, operations: &[OperationType]) -> Result<SparseFormat, TorshError> {
// Analyze matrix characteristics
let analysis = analyze_sparsity_pattern(matrix)?;
// Consider operation types
let format = if analysis.is_diagonal {
SparseFormat::DIA
} else if analysis.has_block_structure && analysis.density > 0.1 {
SparseFormat::BSR
} else if operations.contains(&OperationType::MatVec) {
SparseFormat::CSR
} else if operations.contains(&OperationType::VecMat) {
SparseFormat::CSC
} else {
auto_select_format(matrix, operations)?
};
Ok(format)
}
```
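Memory footprint is another input to format selection. The estimates below are a rough sketch that assumes `usize` indices and `f64` values; the index and value widths ToRSh actually uses may differ, and the function names are hypothetical.

```rust
use std::mem::size_of;

/// Approximate bytes for COO: a row index, a column index, and a value per entry.
pub fn coo_bytes(nnz: usize) -> usize {
    nnz * (2 * size_of::<usize>() + size_of::<f64>())
}

/// Approximate bytes for CSR: a column index and a value per entry,
/// plus `nrows + 1` row offsets.
pub fn csr_bytes(nrows: usize, nnz: usize) -> usize {
    nnz * (size_of::<usize>() + size_of::<f64>()) + (nrows + 1) * size_of::<usize>()
}
```

Under these assumptions CSR is smaller whenever `nnz` exceeds roughly `nrows`, while COO can win for hypersparse matrices in which most rows are empty.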
## Performance Best Practices
### 1. Minimize Format Conversions
```rust
// Bad: Multiple conversions in loop
for iteration in 0..num_iterations {
let csc_matrix = CSCTensor::from_csr(&csr_matrix)?; // Expensive!
let result = csc_matrix.vecmat(&vector)?;
}
// Good: Convert once, reuse
let csc_matrix = CSCTensor::from_csr(&csr_matrix)?;
for iteration in 0..num_iterations {
let result = csc_matrix.vecmat(&vector)?;
}
```
### 2. Use Unified Interface for Complex Workflows
```rust
use torsh_sparse::UnifiedSparseTensor;
// Good: Automatic optimization
let unified = UnifiedSparseTensor::from_csr(csr_matrix)?;
let optimized = unified.optimize_for_operations(&[
OperationType::MatVec,
OperationType::Transpose,
OperationType::Addition,
])?;
// Performs operations with optimal formats
let result1 = optimized.matvec(&vector)?; // Uses CSR
let result2 = optimized.transpose()?; // Uses CSC
let result3 = optimized.add(&other_matrix)?; // Uses optimal format
```
### 3. Leverage Memory Pools
```rust
use torsh_sparse::memory_management::MemoryPool;
fn efficient_batch_processing(matrices: &[CSRTensor]) -> Result<Vec<f64>, TorshError> {
// Create memory pool
let pool = MemoryPool::new(1_000_000_000)?; // 1GB pool
let mut results = Vec::new();
for matrix in matrices {
// Allocate temporary matrix from pool
let temp_matrix = pool.allocate_like(matrix)?;
// Perform operations using pool memory
let result = pool.multiply_matrices(matrix, &temp_matrix)?;
results.push(result.sum()?);
// Memory automatically returned to pool
}
Ok(results)
}
```
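The benefit of a pool comes from reusing allocations instead of requesting new ones on every iteration. The following std-only sketch shows the idea with a hypothetical `BufferPool`; it is far simpler than a real `MemoryPool` and makes no claim about ToRSh internals.

```rust
/// Illustrative pool of reusable `f64` buffers (not the ToRSh `MemoryPool`).
#[derive(Default)]
pub struct BufferPool {
    free: Vec<Vec<f64>>,
}

impl BufferPool {
    /// Hand out a zeroed buffer of `len` elements, reusing a returned one if available.
    pub fn acquire(&mut self, len: usize) -> Vec<f64> {
        let mut buf = self.free.pop().unwrap_or_default();
        buf.clear();
        // `resize` only reallocates when `len` exceeds the existing capacity.
        buf.resize(len, 0.0);
        buf
    }

    /// Return a buffer so a later `acquire` can reuse its allocation.
    pub fn release(&mut self, buf: Vec<f64>) {
        self.free.push(buf);
    }
}
```

A buffer released after a large request serves later, smaller requests without touching the allocator.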
### 4. Use SIMD and Parallel Operations
```rust
use torsh_sparse::parallel::*;
use torsh_sparse::kernels::simd::*;
// Enable SIMD optimizations
if supports_avx2() {
enable_avx2_kernels();
}
// Use parallel operations for large matrices
let config = ParallelConfig::new()
.num_threads(8)
.chunk_size(1000)
.load_balancing(LoadBalancing::Dynamic);
let result = parallel_spmv(&large_matrix, &vector, &config)?;
```
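Row-parallel SpMV is straightforward for CSR because each output element depends on a single row. The sketch below uses `std::thread::scope` with static partitioning; `parallel_csr_matvec` is a hypothetical function on raw CSR arrays and does not implement the dynamic load balancing configured above.

```rust
use std::thread;

/// Row-parallel CSR SpMV using scoped threads. Each thread writes to a
/// disjoint slice of the output, so no locking is needed.
pub fn parallel_csr_matvec(
    row_ptr: &[usize],
    col_idx: &[usize],
    values: &[f64],
    x: &[f64],
    num_threads: usize,
) -> Vec<f64> {
    let nrows = row_ptr.len().saturating_sub(1);
    let mut y = vec![0.0; nrows];
    // Static partitioning: equal row counts per chunk, at least one row each.
    let threads = num_threads.max(1);
    let rows_per_chunk = ((nrows + threads - 1) / threads).max(1);

    thread::scope(|scope| {
        for (chunk_idx, y_chunk) in y.chunks_mut(rows_per_chunk).enumerate() {
            let first_row = chunk_idx * rows_per_chunk;
            scope.spawn(move || {
                for (offset, out) in y_chunk.iter_mut().enumerate() {
                    let row = first_row + offset;
                    let (start, end) = (row_ptr[row], row_ptr[row + 1]);
                    *out = (start..end).map(|k| values[k] * x[col_idx[k]]).sum::<f64>();
                }
            });
        }
    });
    y
}
```

Equal row counts balance poorly when a few rows hold most of the nonzeros; partitioning by nonzero count is a common refinement.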
### 5. Profile and Optimize Hot Paths
```rust
use torsh_sparse::profiling::*;
fn optimized_computation() -> Result<(), TorshError> {
let profiler = Profiler::new();
profiler.enable();
// Your computation here
heavy_sparse_computation()?;
let report = profiler.get_report();
// Identify bottlenecks
for (operation, timing) in &report.operation_times {
if timing.total_time > 100.0 { // > 100ms
println!("Bottleneck: {} took {:.2}ms", operation, timing.total_time);
}
}
Ok(())
}
```
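When a dedicated profiler is not available, accumulating wall-clock time per named operation is often enough to find hot paths. This std-only sketch uses a hypothetical `OpTimer` type; it is not the ToRSh `Profiler`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Accumulates call counts and total wall-clock time per operation name.
#[derive(Default)]
pub struct OpTimer {
    totals: HashMap<String, (u64, Duration)>,
}

impl OpTimer {
    /// Run `f`, adding its elapsed time to the running total for `name`.
    pub fn time<T>(&mut self, name: &str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        let elapsed = start.elapsed();
        let entry = self
            .totals
            .entry(name.to_string())
            .or_insert((0, Duration::ZERO));
        entry.0 += 1;
        entry.1 += elapsed;
        out
    }

    /// Number of recorded calls for `name`.
    pub fn calls(&self, name: &str) -> u64 {
        self.totals.get(name).map_or(0, |stats| stats.0)
    }

    /// Operations whose accumulated time exceeds `threshold`, slowest first.
    pub fn bottlenecks(&self, threshold: Duration) -> Vec<(&str, Duration)> {
        let mut hot: Vec<(&str, Duration)> = self
            .totals
            .iter()
            .filter(|(_, stats)| stats.1 > threshold)
            .map(|(name, stats)| (name.as_str(), stats.1))
            .collect();
        hot.sort_by(|a, b| b.1.cmp(&a.1));
        hot
    }
}
```

Wrapping a call as `timer.time("spmv", || matrix_op())` keeps timing code out of the operation itself.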
## Memory Management
### 1. Monitor Memory Usage
```rust
use torsh_sparse::memory_management::*;
struct MemoryAwareApplication {
memory_budget: usize,
current_usage: usize,
matrices: Vec<Box<dyn SparseTensor>>,
}
impl MemoryAwareApplication {
fn add_matrix(&mut self, matrix: Box<dyn SparseTensor>) -> Result<(), TorshError> {
let matrix_size = matrix.memory_usage();
if self.current_usage + matrix_size > self.memory_budget {
self.cleanup_unused_matrices()?;
}
if self.current_usage + matrix_size <= self.memory_budget {
self.current_usage += matrix_size;
self.matrices.push(matrix);
Ok(())
} else {
Err(TorshError::OutOfMemory {
requested: matrix_size,
available: self.memory_budget - self.current_usage,
})
}
}
fn cleanup_unused_matrices(&mut self) -> Result<(), TorshError> {
// Implement LRU or other cleanup strategy
Ok(())
}
}
```
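The cleanup step is left open above. One option is least-recently-used eviction against a byte budget, sketched here with std only. `LruBudget` is a hypothetical helper that tracks matrix ids and sizes; dropping the evicted matrices themselves is left to the caller.

```rust
use std::collections::VecDeque;

/// Tracks sizes by id and evicts least-recently-used entries to stay within a byte budget.
pub struct LruBudget {
    budget: usize,
    used: usize,
    /// Front = least recently used, back = most recently used.
    order: VecDeque<(u64, usize)>,
}

impl LruBudget {
    pub fn new(budget: usize) -> Self {
        Self { budget, used: 0, order: VecDeque::new() }
    }

    /// Mark `id` as recently used by moving it to the back of the queue.
    pub fn touch(&mut self, id: u64) {
        if let Some(pos) = self.order.iter().position(|&(i, _)| i == id) {
            if let Some(entry) = self.order.remove(pos) {
                self.order.push_back(entry);
            }
        }
    }

    /// Register `id` with `size` bytes. Returns the ids evicted to make room,
    /// or an error if `size` alone exceeds the budget.
    pub fn insert(&mut self, id: u64, size: usize) -> Result<Vec<u64>, String> {
        if size > self.budget {
            return Err(format!("{} bytes exceeds budget of {} bytes", size, self.budget));
        }
        let mut evicted = Vec::new();
        while self.used + size > self.budget {
            // `used > 0` here, so at least one entry is still tracked.
            match self.order.pop_front() {
                Some((old_id, old_size)) => {
                    self.used -= old_size;
                    evicted.push(old_id);
                }
                None => break,
            }
        }
        self.order.push_back((id, size));
        self.used += size;
        Ok(evicted)
    }

    pub fn used(&self) -> usize {
        self.used
    }
}
```

The linear scan in `touch` is acceptable for a handful of matrices; a hash-indexed list would be the usual choice for large caches.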
### 2. Use Streaming for Large Datasets
```rust
fn process_large_dataset(file_path: &str) -> Result<(), TorshError> {
let chunk_size = 10000; // Process 10k rows at a time
let mut row_offset = 0;
loop {
// Load chunk
let chunk = load_sparse_chunk(file_path, row_offset, chunk_size)?;
if chunk.nrows() == 0 {
break; // End of file (a chunk of all-zero rows has nnz() == 0 but is not the end)
}
// Process chunk
let result = process_sparse_chunk(&chunk)?;
save_result(&result, row_offset)?;
row_offset += chunk_size;
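// Periodically release cached buffers. Rust has no garbage collector, so
// `force_gc` is assumed to be a helper that trims pool and cache memory.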
if row_offset % (chunk_size * 10) == 0 {
force_gc()?;
}
}
Ok(())
}
```
### 3. Implement Custom Memory Allocators
```rust
struct SparseMatrixAllocator {
pool: MemoryPool,
allocation_strategy: AllocationStrategy,
}
impl SparseMatrixAllocator {
fn allocate_optimized<T: SparseTensor>(&self,
shape: (usize, usize),
nnz: usize) -> Result<T, TorshError> {
match self.allocation_strategy {
AllocationStrategy::MemoryFirst => {
// Optimize for memory usage
self.pool.allocate_compressed(shape, nnz)
},
AllocationStrategy::SpeedFirst => {
// Optimize for speed
self.pool.allocate_cache_aligned(shape, nnz)
},
AllocationStrategy::Balanced => {
// Balance memory and speed
self.pool.allocate_balanced(shape, nnz)
},
}
}
}
```
## Error Handling and Robustness
### 1. Comprehensive Error Handling
```rust
use torsh_sparse::{TorshError, Result};
fn robust_sparse_operation(matrix: &CSRTensor, vector: &[f64]) -> Result<Vec<f64>> {
// Validate inputs
if matrix.ncols() != vector.len() {
return Err(TorshError::DimensionMismatch {
expected: matrix.ncols(),
actual: vector.len(),
});
}
// Check for numerical issues
if matrix.has_inf_or_nan() {
return Err(TorshError::NumericalError {
message: "Matrix contains infinite or NaN values".to_string(),
});
}
// Perform operation with error handling
match matrix.matvec(vector) {
Ok(result) => {
// Validate result
if result.iter().any(|&x| !x.is_finite()) {
Err(TorshError::NumericalError {
message: "Result contains non-finite values".to_string(),
})
} else {
Ok(result)
}
},
Err(e) => {
// Log error and provide context
log::error!("Matrix-vector multiplication failed: {:?}", e);
Err(e)
}
}
}
```
### 2. Input Validation
```rust
fn validate_sparse_matrix(matrix: &dyn SparseTensor) -> Result<()> {
// Check dimensions
let (nrows, ncols) = matrix.shape();
if nrows == 0 || ncols == 0 {
return Err(TorshError::InvalidInput {
message: "Matrix dimensions must be positive".to_string(),
});
}
// Check for valid indices
for (row, col, _) in matrix.triplets() {
if row >= nrows || col >= ncols {
return Err(TorshError::IndexOutOfBounds {
index: (row, col),
shape: (nrows, ncols),
});
}
}
// Check for numerical validity
for (_, _, value) in matrix.triplets() {
if !value.is_finite() {
return Err(TorshError::NumericalError {
message: format!("Non-finite value found: {}", value),
});
}
}
Ok(())
}
```
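Bounds and finiteness checks cover the values, but compressed formats also carry structural invariants. As a sketch of what validating raw CSR arrays could look like, the std-only function below checks the row-pointer array as well. `CsrError` and `validate_csr` are hypothetical names, not part of ToRSh.

```rust
#[derive(Debug, PartialEq)]
pub enum CsrError {
    RowPtrLength { expected: usize, actual: usize },
    ArrayLengthMismatch,
    RowPtrNotMonotonic { row: usize },
    ColumnOutOfBounds { row: usize, col: usize },
    NonFiniteValue { row: usize, col: usize },
}

/// Check the structural invariants of raw CSR arrays before using them.
pub fn validate_csr(
    nrows: usize,
    ncols: usize,
    row_ptr: &[usize],
    col_idx: &[usize],
    values: &[f64],
) -> Result<(), CsrError> {
    if row_ptr.len() != nrows + 1 {
        return Err(CsrError::RowPtrLength { expected: nrows + 1, actual: row_ptr.len() });
    }
    // Offsets must start at 0 and end at the number of stored entries.
    if col_idx.len() != values.len() || row_ptr[0] != 0 || row_ptr[nrows] != values.len() {
        return Err(CsrError::ArrayLengthMismatch);
    }
    for row in 0..nrows {
        let (start, end) = (row_ptr[row], row_ptr[row + 1]);
        // An offset past the end implies a later decrease, so report it here.
        if start > end || end > values.len() {
            return Err(CsrError::RowPtrNotMonotonic { row });
        }
        for k in start..end {
            if col_idx[k] >= ncols {
                return Err(CsrError::ColumnOutOfBounds { row, col: col_idx[k] });
            }
            if !values[k].is_finite() {
                return Err(CsrError::NonFiniteValue { row, col: col_idx[k] });
            }
        }
    }
    Ok(())
}
```

Running a check like this once at the boundary where external data enters the application lets inner loops skip per-element validation.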
### 3. Graceful Degradation
```rust
fn adaptive_sparse_computation(matrix: &CSRTensor,
vector: &[f64]) -> Result<Vec<f64>> {
// Try optimized algorithm first
match optimized_spmv(matrix, vector) {
Ok(result) => Ok(result),
Err(TorshError::OutOfMemory { .. }) => {
log::warn!("Falling back to memory-efficient algorithm");
memory_efficient_spmv(matrix, vector)
},
Err(TorshError::UnsupportedOperation { .. }) => {
log::warn!("Falling back to general algorithm");
general_spmv(matrix, vector)
},
Err(e) => Err(e),
}
}
```
## Code Organization
### 1. Modular Design
```rust
// sparse_ops.rs - Core operations
pub struct SparseOperations {
memory_pool: MemoryPool,
profiler: Option<Profiler>,
}
impl SparseOperations {
pub fn new(memory_budget: usize) -> Result<Self> {
Ok(Self {
memory_pool: MemoryPool::new(memory_budget)?,
profiler: None,
})
}
pub fn enable_profiling(&mut self) {
self.profiler = Some(Profiler::new());
}
pub fn multiply(&self, a: &CSRTensor, b: &CSRTensor) -> Result<CSRTensor> {
if let Some(ref profiler) = self.profiler {
profiler.start_operation("sparse_multiply")?;
}
// Defer `?` so the profiler span is closed even when the multiply fails
let result = self.memory_pool.multiply_matrices(a, b);
if let Some(ref profiler) = self.profiler {
profiler.end_operation("sparse_multiply")?;
}
result
}
}
// format_manager.rs - Format selection and conversion
pub struct FormatManager {
cache: HashMap<String, Box<dyn SparseTensor>>,
auto_optimize: bool,
}
impl FormatManager {
pub fn get_optimal_format(&self,
matrix_id: &str,
operations: &[OperationType]) -> Result<&dyn SparseTensor> {
if self.auto_optimize {
self.get_optimized_for_operations(matrix_id, operations)
} else {
self.get_cached(matrix_id)
}
}
}
// application.rs - High-level application logic
pub struct SparseApplication {
operations: SparseOperations,
format_manager: FormatManager,
config: ApplicationConfig,
}
```
### 2. Configuration Management
```rust
#[derive(Debug, Clone)]
pub struct SparseConfig {
pub memory_budget: usize,
pub enable_profiling: bool,
pub auto_format_selection: bool,
pub parallel_threshold: usize,
pub simd_enabled: bool,
pub cache_size: usize,
}
impl Default for SparseConfig {
fn default() -> Self {
Self {
memory_budget: 1_000_000_000, // 1GB
enable_profiling: false,
auto_format_selection: true,
parallel_threshold: 10000,
simd_enabled: true,
cache_size: 100,
}
}
}
impl SparseConfig {
pub fn from_env() -> Result<Self> {
let mut config = Self::default();
if let Ok(budget) = std::env::var("TORSH_MEMORY_BUDGET") {
config.memory_budget = budget.parse()?;
}
if let Ok(profiling) = std::env::var("TORSH_ENABLE_PROFILING") {
config.enable_profiling = profiling.parse()?;
}
Ok(config)
}
}
```
### 3. Trait-Based Design
```rust
pub trait SparseComputation {
type Input;
type Output;
type Error;
fn compute(&self, input: Self::Input) -> Result<Self::Output, Self::Error>;
fn validate_input(&self, input: &Self::Input) -> Result<(), Self::Error>;
fn estimated_memory_usage(&self, input: &Self::Input) -> usize;
}
pub struct MatrixVectorMultiplication {
config: ComputationConfig,
}
impl SparseComputation for MatrixVectorMultiplication {
type Input = (CSRTensor, Vec<f64>);
type Output = Vec<f64>;
type Error = TorshError;
fn compute(&self, input: Self::Input) -> Result<Self::Output, Self::Error> {
// Borrow the input for the checks so it is not moved before it is used
self.validate_input(&input)?;
if self.estimated_memory_usage(&input) > self.config.memory_limit {
self.compute_streaming(input)
} else {
let (matrix, vector) = input;
matrix.matvec(&vector)
}
}
// ... implement other methods
}
```
## Testing Strategies
### 1. Property-Based Testing
```rust
#[cfg(test)]
mod property_tests {
use super::*;
use proptest::prelude::*;
// Generate random sparse matrices for testing
fn sparse_matrix_strategy() -> impl Strategy<Value = CSRTensor> {
(1usize..100, 1usize..100, 0.01f64..0.5f64)
.prop_flat_map(|(rows, cols, density)| {
let nnz = ((rows * cols) as f64 * density) as usize;
prop::collection::vec(
(0..rows, 0..cols, -10.0..10.0f64),
nnz..nnz+1
).prop_map(move |triplets| {
CSRTensor::from_triplets(triplets, (rows, cols)).unwrap()
})
})
}
proptest! {
#[test]
fn matrix_vector_multiplication_properties(
matrix in sparse_matrix_strategy(),
vector in prop::collection::vec(-10.0..10.0f64, 100..101)
) {
// The strategy caps columns at 99, so truncating a 100-element vector
// always yields a matching length and no generated case is skipped
let vector = &vector[..matrix.ncols()];
let result = matrix.matvec(vector);
prop_assert!(result.is_ok());
let result = result.unwrap();
prop_assert_eq!(result.len(), matrix.nrows());
prop_assert!(result.iter().all(|&x| x.is_finite()));
}
#[test]
fn format_conversion_preserves_data(matrix in sparse_matrix_strategy()) {
let coo = COOTensor::from_csr(&matrix).unwrap();
let csr_back = CSRTensor::from_coo(&coo).unwrap();
prop_assert_eq!(matrix.shape(), csr_back.shape());
prop_assert_eq!(matrix.nnz(), csr_back.nnz());
// Check that all elements are preserved
for i in 0..matrix.nrows() {
for j in 0..matrix.ncols() {
prop_assert!((matrix.get(i, j).unwrap() - csr_back.get(i, j).unwrap()).abs() < 1e-10);
}
}
}
}
}
```
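Property tests are most useful when the sparse result is compared against an independent reference. A dense oracle is one simple choice; the std-only sketch below uses hypothetical helpers (`dense_matvec`, `sparse_matvec`, `approx_eq`) that operate on triplets rather than ToRSh types.

```rust
/// Dense reference: materialize the matrix (summing duplicates), then multiply.
pub fn dense_matvec(
    nrows: usize,
    ncols: usize,
    triplets: &[(usize, usize, f64)],
    x: &[f64],
) -> Vec<f64> {
    let mut dense = vec![0.0; nrows * ncols];
    for &(r, c, v) in triplets {
        dense[r * ncols + c] += v;
    }
    (0..nrows)
        .map(|r| (0..ncols).map(|c| dense[r * ncols + c] * x[c]).sum::<f64>())
        .collect()
}

/// Sparse path under test: accumulate directly from the triplets.
pub fn sparse_matvec(nrows: usize, triplets: &[(usize, usize, f64)], x: &[f64]) -> Vec<f64> {
    let mut y = vec![0.0; nrows];
    for &(r, c, v) in triplets {
        y[r] += v * x[c];
    }
    y
}

/// Element-wise comparison with an absolute tolerance, since the two paths
/// may sum in different orders.
pub fn approx_eq(a: &[f64], b: &[f64], tol: f64) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| (x - y).abs() <= tol)
}
```

The dense oracle is only practical for small shapes, which matches the sizes a property-test strategy typically generates.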
### 2. Performance Regression Testing
```rust
#[cfg(test)]
mod performance_tests {
use super::*;
use std::time::Instant;
#[test]
fn benchmark_matrix_vector_multiplication() {
let matrix = create_test_matrix(10000, 0.01);
let vector = vec![1.0; 10000];
let start = Instant::now();
let _result = matrix.matvec(&vector).unwrap();
let duration = start.elapsed();
// Assert performance doesn't regress
assert!(duration.as_millis() < 100,
"Matrix-vector multiplication took {}ms, expected <100ms",
duration.as_millis());
}
#[test]
fn memory_usage_test() {
let initial_memory = get_memory_usage();
{
let large_matrix = create_test_matrix(50000, 0.001);
let current_memory = get_memory_usage();
// Check memory usage is reasonable
// Saturate so a lower reading after allocation cannot underflow
let memory_increase = current_memory.saturating_sub(initial_memory);
let expected_memory = large_matrix.nnz() * 16; // Approximate
assert!(memory_increase <= expected_memory * 2,
"Memory usage {} exceeds expected {}",
memory_increase, expected_memory);
}
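// large_matrix was dropped at the end of the block above. Rust has no garbage
// collector, so `force_gc` is assumed to flush pool or allocator caches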
force_gc();
let final_memory = get_memory_usage();
assert!((final_memory as i64 - initial_memory as i64).abs() < 1_000_000,
"Potential memory leak detected");
}
}
```
### 3. Integration Testing
```rust
#[cfg(test)]
mod integration_tests {
use super::*;
#[test]
fn full_workflow_test() {
// Test complete workflow from creation to computation
let triplets = vec![(0, 0, 1.0), (1, 1, 2.0), (2, 2, 3.0)];
let coo = COOTensor::from_triplets(triplets, (3, 3)).unwrap();
// Test format conversions
let csr = CSRTensor::from_coo(&coo).unwrap();
let csc = CSCTensor::from_csr(&csr).unwrap();
let bsr = BSRTensor::from_csr(&csr, (1, 1)).unwrap();
// Test operations on all formats
let vector = vec![1.0, 2.0, 3.0];
let result_csr = csr.matvec(&vector).unwrap();
let result_csc = csc.matvec(&vector).unwrap();
let result_bsr = bsr.matvec(&vector).unwrap();
// All results should be identical
for i in 0..3 {
assert!((result_csr[i] - result_csc[i]).abs() < 1e-10);
assert!((result_csr[i] - result_bsr[i]).abs() < 1e-10);
}
}
}
```
## Deployment Considerations
### 1. Environment-Specific Optimizations
```rust
pub fn configure_for_environment() -> SparseConfig {
let mut config = SparseConfig::default();
// Detect hardware capabilities; disable SIMD when AVX2 is unavailable
config.simd_enabled = has_avx2_support();
if config.simd_enabled {
log::info!("AVX2 support detected, enabling SIMD optimizations");
}
// Adjust for available memory
let available_memory = get_available_memory();
config.memory_budget = (available_memory as f64 * 0.8) as usize; // Use 80% of available
// Adjust for CPU count
let cpu_count = num_cpus::get();
config.parallel_threshold = 10000 / cpu_count; // Scale with CPU count
config
}
```
### 2. Monitoring and Observability
```rust
use std::collections::HashMap;
use std::time::Duration;
use tracing::{info, warn};
pub struct SparseMetrics {
operation_counts: HashMap<String, u64>,
operation_times: HashMap<String, Duration>,
memory_usage: u64,
cache_hit_rate: f64,
}
impl SparseMetrics {
pub fn record_operation(&mut self, operation: &str, duration: Duration) {
*self.operation_counts.entry(operation.to_string()).or_insert(0) += 1;
*self.operation_times.entry(operation.to_string()).or_insert(Duration::ZERO) += duration;
info!("Operation {} completed in {:?}", operation, duration);
if duration > Duration::from_millis(1000) {
warn!("Slow operation detected: {} took {:?}", operation, duration);
}
}
pub fn export_metrics(&self) -> serde_json::Value {
serde_json::json!({
"operation_counts": self.operation_counts,
"average_operation_times": self.operation_times
.iter()
.map(|(op, total_time)| {
let count = self.operation_counts.get(op).unwrap_or(&1);
(op, total_time.as_millis() / *count as u128)
})
.collect::<HashMap<_, _>>(),
"memory_usage_mb": self.memory_usage / 1_000_000,
"cache_hit_rate": self.cache_hit_rate,
})
}
}
```
### 3. Configuration Management
```rust
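use serde::{Deserialize, Serialize};
use std::path::Path;
use std::time::Duration;
// Note: `SparseConfig` must also derive `Serialize` and `Deserialize` for this to compile
#[derive(Serialize, Deserialize)]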
pub struct DeploymentConfig {
pub sparse_config: SparseConfig,
pub logging_level: String,
pub metrics_endpoint: Option<String>,
pub health_check_interval: Duration,
}
impl DeploymentConfig {
pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self> {
let content = std::fs::read_to_string(path)?;
let config: Self = toml::from_str(&content)?;
Ok(config)
}
pub fn from_env() -> Result<Self> {
let sparse_config = SparseConfig::from_env()?;
Ok(Self {
sparse_config,
logging_level: std::env::var("LOG_LEVEL").unwrap_or_else(|_| "info".to_string()),
metrics_endpoint: std::env::var("METRICS_ENDPOINT").ok(),
health_check_interval: Duration::from_secs(
std::env::var("HEALTH_CHECK_INTERVAL")
.unwrap_or_else(|_| "30".to_string())
.parse()
.unwrap_or(30)
),
})
}
}
```
## Common Pitfalls
### 1. Inefficient Format Usage
```rust
// BAD: Using COO for repeated operations
let coo_matrix = COOTensor::from_triplets(triplets, shape)?;
for _ in 0..1000 {
let result = coo_matrix.matvec(&vector)?; // Inefficient!
}
// GOOD: Convert to efficient format once
let coo_matrix = COOTensor::from_triplets(triplets, shape)?;
let csr_matrix = CSRTensor::from_coo(&coo_matrix)?;
for _ in 0..1000 {
let result = csr_matrix.matvec(&vector)?; // Efficient!
}
```
### 2. Memory Leaks in Long-Running Applications
```rust
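// BAD: Allocating a fresh temporary matrix on every iteration
let mut results = Vec::new();
for data in large_dataset {
let temp_matrix = process_data(data)?;
let result = expensive_operation(&temp_matrix)?;
results.push(result);
// temp_matrix is dropped at the end of each iteration, but every pass
// allocates new buffers, which fragments the heap and inflates peak
// memory in long-running processes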
}
// GOOD: Explicit memory management
let memory_pool = MemoryPool::new(memory_budget)?;
let mut results = Vec::new();
for data in large_dataset {
let temp_matrix = memory_pool.process_data(data)?;
let result = expensive_operation(&temp_matrix)?;
results.push(result);
memory_pool.deallocate(temp_matrix)?; // Explicit cleanup
if results.len() % 1000 == 0 {
memory_pool.garbage_collect()?; // Periodic cleanup
}
}
```
### 3. Ignoring Numerical Stability
```rust
// BAD: No numerical validation
fn risky_computation(matrix: &CSRTensor) -> Result<f64> {
let result = matrix.norm(2.0)?;
Ok(result * 1e20) // Could overflow
}
// GOOD: Proper numerical handling
fn safe_computation(matrix: &CSRTensor) -> Result<f64> {
let norm = matrix.norm(2.0)?;
// Check for potential overflow
if norm > f64::MAX / 1e20 {
return Err(TorshError::NumericalError {
message: "Computation would overflow".to_string(),
});
}
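// Scaling up by 1e20 cannot underflow, but a subnormal norm has already
// lost precision; a norm of exactly zero is valid and passes through
if norm != 0.0 && norm < f64::MIN_POSITIVE {
return Err(TorshError::NumericalError {
message: "Norm is subnormal; scaled result would be imprecise".to_string(),
});
}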
Ok(norm * 1e20)
}
```
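Overflow can also occur inside the norm computation itself, because squaring a large entry exceeds `f64::MAX` long before the norm does. A common remedy is to scale by the largest magnitude first. The std-only sketch below shows that technique on a plain slice of stored values; `stable_norm2` is a hypothetical helper, and nothing here describes how `CSRTensor::norm` is implemented.

```rust
/// Euclidean norm of the stored values, scaled by the largest magnitude so
/// that squaring individual entries cannot overflow.
pub fn stable_norm2(values: &[f64]) -> f64 {
    if values.iter().any(|v| v.is_nan()) {
        return f64::NAN;
    }
    let scale = values.iter().fold(0.0f64, |m, v| m.max(v.abs()));
    if scale == 0.0 || scale.is_infinite() {
        return scale; // all zeros (or empty), or at least one infinite entry
    }
    let sum_sq: f64 = values
        .iter()
        .map(|v| {
            let s = v / scale; // |s| <= 1, so s * s cannot overflow
            s * s
        })
        .sum();
    scale * sum_sq.sqrt()
}

/// Naive version for comparison: overflows to infinity for large entries.
pub fn naive_norm2(values: &[f64]) -> f64 {
    values.iter().map(|v| v * v).sum::<f64>().sqrt()
}
```

For entries around `1e200` the naive version returns infinity, while the scaled version returns a finite value close to `1.414e200`.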
## Design Patterns
### 1. Builder Pattern for Complex Matrices
```rust
pub struct SparseMatrixBuilder {
triplets: Vec<(usize, usize, f64)>,
shape: Option<(usize, usize)>,
format: SparseFormat,
sorted: bool,
deduplicated: bool,
}
impl SparseMatrixBuilder {
pub fn new() -> Self {
Self {
triplets: Vec::new(),
shape: None,
format: SparseFormat::CSR,
sorted: false,
deduplicated: false,
}
}
pub fn add_triplet(mut self, row: usize, col: usize, value: f64) -> Self {
self.triplets.push((row, col, value));
self
}
pub fn shape(mut self, shape: (usize, usize)) -> Self {
self.shape = Some(shape);
self
}
pub fn format(mut self, format: SparseFormat) -> Self {
self.format = format;
self
}
pub fn sorted(mut self) -> Self {
self.sorted = true;
self
}
pub fn deduplicated(mut self) -> Self {
self.deduplicated = true;
self
}
pub fn build(self) -> Result<Box<dyn SparseTensor>> {
let shape = self.shape.ok_or_else(|| TorshError::InvalidInput {
message: "Shape must be specified".to_string(),
})?;
let mut triplets = self.triplets;
// Deduplication only merges adjacent entries, so it also needs sorted input
if self.sorted || self.deduplicated {
triplets.sort_by_key(|&(r, c, _)| (r, c));
}
if self.deduplicated {
// Deduplicate triplets
let mut dedup_triplets = Vec::new();
let mut last_key = None;
let mut sum = 0.0;
for (r, c, v) in triplets {
let key = (r, c);
if last_key == Some(key) {
sum += v;
} else {
if let Some((lr, lc)) = last_key {
if sum != 0.0 {
dedup_triplets.push((lr, lc, sum));
}
}
last_key = Some(key);
sum = v;
}
}
if let Some((lr, lc)) = last_key {
if sum != 0.0 {
dedup_triplets.push((lr, lc, sum));
}
}
triplets = dedup_triplets;
}
match self.format {
SparseFormat::COO => {
let coo = COOTensor::from_triplets(triplets, shape)?;
Ok(Box::new(coo))
},
SparseFormat::CSR => {
let coo = COOTensor::from_triplets(triplets, shape)?;
let csr = CSRTensor::from_coo(&coo)?;
Ok(Box::new(csr))
},
SparseFormat::CSC => {
let coo = COOTensor::from_triplets(triplets, shape)?;
let csc = CSCTensor::from_coo(&coo)?;
Ok(Box::new(csc))
},
_ => Err(TorshError::UnsupportedOperation {
op: format!("Building matrix in {:?} format", self.format),
}),
}
}
}
```
### 2. Strategy Pattern for Algorithm Selection
```rust
pub trait SpMVStrategy {
fn multiply(&self, matrix: &CSRTensor, vector: &[f64]) -> Result<Vec<f64>>;
fn estimated_memory(&self, matrix: &CSRTensor) -> usize;
fn estimated_time(&self, matrix: &CSRTensor) -> Duration;
}
pub struct StandardSpMV;
pub struct ParallelSpMV { num_threads: usize }
pub struct MemoryEfficientSpMV { chunk_size: usize }
pub struct CacheOptimizedSpMV;
impl SpMVStrategy for StandardSpMV {
fn multiply(&self, matrix: &CSRTensor, vector: &[f64]) -> Result<Vec<f64>> {
matrix.matvec(vector)
}
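fn estimated_memory(&self, matrix: &CSRTensor) -> usize {
// Matrix storage plus the input vector (one f64 per column)
matrix.memory_usage() + matrix.ncols() * 8
}
fn estimated_time(&self, matrix: &CSRTensor) -> Duration {
// Rough model: 1M nonzeros per millisecond, i.e. 1000 per microsecond
Duration::from_micros((matrix.nnz() / 1_000) as u64)
}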
}
pub struct SpMVContext {
strategy: Box<dyn SpMVStrategy>,
}
impl SpMVContext {
pub fn new(strategy: Box<dyn SpMVStrategy>) -> Self {
Self { strategy }
}
pub fn auto_select_strategy(matrix: &CSRTensor,
constraints: &PerformanceConstraints) -> Self {
let strategies: Vec<Box<dyn SpMVStrategy>> = vec![
Box::new(StandardSpMV),
Box::new(ParallelSpMV { num_threads: 8 }),
Box::new(MemoryEfficientSpMV { chunk_size: 1000 }),
Box::new(CacheOptimizedSpMV),
];
// Discard strategies that exceed the memory budget, then pick the fastest
let best_strategy = strategies
.into_iter()
.filter(|strategy| strategy.estimated_memory(matrix) <= constraints.max_memory)
.min_by_key(|strategy| strategy.estimated_time(matrix))
// Fall back to the standard kernel if nothing fits the budget
.unwrap_or_else(|| Box::new(StandardSpMV) as Box<dyn SpMVStrategy>);
Self { strategy: best_strategy }
}
pub fn execute(&self, matrix: &CSRTensor, vector: &[f64]) -> Result<Vec<f64>> {
self.strategy.multiply(matrix, vector)
}
}
```
### 3. Observer Pattern for Progress Tracking
```rust
use std::io::Write; // Brings `flush` into scope for stdout
use std::time::Duration;
pub trait ProgressObserver {
fn on_operation_start(&self, operation: &str, total_work: usize);
fn on_progress(&self, completed_work: usize);
fn on_operation_complete(&self, operation: &str, duration: Duration);
}
pub struct ConsoleProgressObserver;
impl ProgressObserver for ConsoleProgressObserver {
fn on_operation_start(&self, operation: &str, total_work: usize) {
println!("Starting {}: {} items to process", operation, total_work);
}
fn on_progress(&self, completed_work: usize) {
print!("\rProgress: {}", completed_work);
std::io::stdout().flush().unwrap();
}
fn on_operation_complete(&self, operation: &str, duration: Duration) {
println!("\n{} completed in {:?}", operation, duration);
}
}
pub struct ProgressTracker {
observers: Vec<Box<dyn ProgressObserver>>,
}
impl ProgressTracker {
pub fn new() -> Self {
Self { observers: Vec::new() }
}
pub fn add_observer(&mut self, observer: Box<dyn ProgressObserver>) {
self.observers.push(observer);
}
pub fn start_operation(&self, operation: &str, total_work: usize) {
for observer in &self.observers {
observer.on_operation_start(operation, total_work);
}
}
pub fn report_progress(&self, completed_work: usize) {
for observer in &self.observers {
observer.on_progress(completed_work);
}
}
pub fn complete_operation(&self, operation: &str, duration: Duration) {
for observer in &self.observers {
observer.on_operation_complete(operation, duration);
}
}
}
```
## Optimization Checklist
### Pre-Deployment Checklist
- [ ] **Format Selection**: Verified optimal sparse format for each use case
- [ ] **Memory Management**: Implemented memory pools and garbage collection
- [ ] **Error Handling**: Comprehensive error handling with graceful degradation
- [ ] **Performance Testing**: Benchmarked critical paths and identified bottlenecks
- [ ] **Memory Testing**: Verified no memory leaks in long-running scenarios
- [ ] **Numerical Stability**: Validated numerical algorithms for edge cases
- [ ] **Configuration**: Environment-specific configuration management
- [ ] **Monitoring**: Implemented metrics collection and alerting
- [ ] **Documentation**: API documentation and usage examples
- [ ] **Testing**: Unit tests, integration tests, and property-based tests
### Performance Optimization Checklist
- [ ] **SIMD**: Enabled SIMD optimizations where available
- [ ] **Parallelization**: Used parallel algorithms for large matrices
- [ ] **Cache Optimization**: Optimized memory access patterns
- [ ] **Algorithm Selection**: Used optimal algorithms for each operation
- [ ] **Memory Allocation**: Minimized dynamic memory allocation
- [ ] **Format Conversion**: Minimized unnecessary format conversions
- [ ] **Batch Processing**: Batched operations to reduce overhead
- [ ] **Streaming**: Implemented streaming for large datasets
- [ ] **Profiling**: Regular performance profiling and optimization
- [ ] **Hardware Utilization**: Optimized for target hardware architecture
## Conclusion
Following these best practices will help you build robust, efficient, and maintainable sparse tensor applications with ToRSh-Sparse. Remember to:
1. **Profile before optimizing** - Identify actual bottlenecks
2. **Choose the right format** - Match format to access patterns
3. **Manage memory carefully** - Use pools and monitor usage
4. **Handle errors gracefully** - Plan for failure scenarios
5. **Test comprehensively** - Include performance and stress tests
6. **Monitor in production** - Track metrics and performance
7. **Document thoroughly** - Make code maintainable for others
The sparse tensor domain has many subtleties, and these practices will help you navigate them successfully while building high-performance applications.
For more detailed information, see the [Sparse Guide](SPARSE_GUIDE.md), [Format Reference](FORMAT_REFERENCE.md), and [Performance Guide](PERFORMANCE_GUIDE.md).