torsh-profiler 0.1.1

Performance profiling and monitoring for ToRSh
# ToRSh Profiler User Guide

Welcome to the ToRSh Profiler, a performance profiling library for the ToRSh deep learning framework. This guide covers setup, basic and advanced profiling, dashboards and reports, CI/CD integration, and troubleshooting.

## Table of Contents

1. [Getting Started](#getting-started)
2. [Basic Profiling](#basic-profiling)
3. [Advanced Features](#advanced-features)
4. [Dashboard and Visualization](#dashboard-and-visualization)
5. [Integration with CI/CD](#integration-with-cicd)
6. [Performance Optimization](#performance-optimization)
7. [Troubleshooting](#troubleshooting)

## Getting Started

### Installation

Add the ToRSh profiler to your `Cargo.toml`:

```toml
[dependencies]
torsh-profiler = "0.1.0"
```

### Basic Setup

```rust
use torsh_profiler::*;

fn main() {
    // Start global profiling
    start_profiling();
    
    // Your application code here
    your_application_code();
    
    // Stop profiling
    stop_profiling();
    
    // Collect the profilers whose data will be exported
    let profiler = global_profiler();
    let memory_profiler = MemoryProfiler::new();
    
    // Export an HTML dashboard
    export_performance_dashboard(&profiler.lock().unwrap(), &memory_profiler, "dashboard.html").unwrap();
}
```

## Basic Profiling

### Function Profiling

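Create a `ProfileScope` at the top of a function to profile it. The scope is an RAII guard, so bind it to a named variable such as `_scope` to keep it alive until the function returns:

```rust
use torsh_profiler::*;
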
fn my_function() {
    let _scope = ProfileScope::simple("my_function".to_string(), "computation".to_string());
    
    // Your function code here
    expensive_computation();
}
```
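
To illustrate the RAII pattern behind scoped profiling, here is a minimal standard-library sketch. `ScopeTimer`, `Timings`, and `timed_sum` are hypothetical names invented for this example; they are not part of torsh-profiler and say nothing about how `ProfileScope` is implemented internally.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Shared sink that collects (name, elapsed) pairs. Hypothetical, for illustration only.
type Timings = Arc<Mutex<Vec<(String, Duration)>>>;

/// Hypothetical RAII guard: starts timing when created, records when dropped.
struct ScopeTimer {
    name: String,
    start: Instant,
    sink: Timings,
}

impl ScopeTimer {
    fn new(name: &str, sink: Timings) -> Self {
        ScopeTimer { name: name.to_string(), start: Instant::now(), sink }
    }
}

impl Drop for ScopeTimer {
    fn drop(&mut self) {
        // Runs when the guard goes out of scope, including on early returns.
        let elapsed = self.start.elapsed();
        if let Ok(mut timings) = self.sink.lock() {
            timings.push((self.name.clone(), elapsed));
        }
    }
}

/// Example function whose whole body is timed by the guard.
fn timed_sum(sink: Timings) -> u64 {
    let _timer = ScopeTimer::new("timed_sum", sink);
    (1..=1_000u64).sum()
}
```

Note that `let _ = ScopeTimer::new(...)` would drop the guard immediately and record a near-zero duration, which is why the guard is bound to a named variable.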

### Using Profiling Macros

The profiler provides convenient macros for different profiling scenarios:

```rust
use torsh_profiler::*;

// Profile a block of code
profile_block!("matrix_multiply", {
    let result = matrix_a * matrix_b;
    result
});

// Profile a function call
profile_function!("data_loading", load_data_from_disk());

// Profile with custom metadata
profile_with_metadata!("gpu_operation", {
    operation_count: 1000,
    bytes_transferred: 1024 * 1024,
    flops: 2_000_000
}, {
    gpu_kernel_launch();
});

// Profile asynchronously
profile_async!("async_download", {
    async_download_data().await
});
```

### Memory Profiling

Track memory allocations and detect leaks:

```rust
let mut memory_profiler = MemoryProfiler::new();
memory_profiler.enable();

// Enable leak detection
memory_profiler.set_leak_detection_enabled(true);

// Your code that allocates memory
let data = vec![0u8; 1024 * 1024]; // 1 MiB allocation

// Check for leaks
let leak_results = memory_profiler.detect_leaks().unwrap();
if !leak_results.potential_leaks.is_empty() {
    println!("Found {} potential memory leaks", leak_results.leak_count);
}
```
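
Conceptually, leak detection boils down to matching allocations with frees and reporting what is still live. The sketch below shows that bookkeeping with the standard library only. `AllocationTracker` and its methods are hypothetical and are not the `MemoryProfiler` API; a real profiler would typically hook the allocator rather than rely on manual calls.

```rust
use std::collections::HashMap;

/// Hypothetical tracker that illustrates the bookkeeping only.
#[derive(Default)]
struct AllocationTracker {
    live: HashMap<u64, usize>, // allocation id -> size in bytes
    next_id: u64,
}

impl AllocationTracker {
    /// Records an allocation and returns its id.
    fn record_alloc(&mut self, size: usize) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.live.insert(id, size);
        id
    }

    /// Records a free; returns false for an unknown or already freed id.
    fn record_free(&mut self, id: u64) -> bool {
        self.live.remove(&id).is_some()
    }

    /// Allocations that were never freed, sorted by id for stable output.
    fn potential_leaks(&self) -> Vec<(u64, usize)> {
        let mut leaks: Vec<(u64, usize)> =
            self.live.iter().map(|(id, size)| (*id, *size)).collect();
        leaks.sort();
        leaks
    }

    /// Total size of all live allocations.
    fn leaked_bytes(&self) -> usize {
        self.live.values().sum()
    }
}
```

An allocation that is still live is only a *potential* leak: it may simply not have been released yet at the time of the check.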

### GPU Profiling

For CUDA operations:

```rust
let mut cuda_profiler = CudaProfiler::new();
cuda_profiler.enable();

// Profile GPU operations
{
    let _event = cuda_profiler.start_timing("kernel_launch").unwrap();
    // Launch CUDA kernel
    cuda_kernel_launch();
}

// Get GPU statistics
let stats = cuda_profiler.get_stats();
println!("Total kernel launches: {}", stats.kernel_launches);
```

## Advanced Features

### Machine Learning-based Analysis

Use ML algorithms to analyze performance patterns:

```rust
let config = MLAnalysisConfig::default();
let mut analyzer = MLAnalyzer::new(config);

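// Create performance data from profiling events
// (`profiler` is the handle returned by `global_profiler()`)
let events = profiler.lock().unwrap().get_events();
let features = analyzer.extract_features(&events).unwrap();
// Clone so `features` can be reused below (assumes the feature type implements `Clone`)
analyzer.add_features(features.clone()).unwrap();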

// Perform clustering to identify performance patterns
analyzer.train_clustering().unwrap();
let clusters = analyzer.get_clusters();

// Detect anomalies
let anomaly_result = analyzer.detect_anomaly(&features).unwrap();
if anomaly_result.is_anomaly {
    println!("Performance anomaly detected! Score: {}", anomaly_result.anomaly_score);
}

// Get optimization suggestions
let suggestions = analyzer.get_optimization_suggestions(&features);
for suggestion in suggestions {
    println!("💡 {}", suggestion);
}
```
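
This guide does not say which algorithm `MLAnalyzer` uses for anomaly detection. As one simple possibility, the sketch below scores a new measurement by its z-score against a history of earlier measurements. All function names here are hypothetical, and the technique is a plain statistical baseline, not necessarily what the library does.

```rust
/// Mean and population standard deviation, or None for an empty slice.
fn mean_and_std(values: &[f64]) -> Option<(f64, f64)> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    Some((mean, variance.sqrt()))
}

/// Absolute z-score of `sample`; None when the history is empty or has zero spread.
fn anomaly_score(history: &[f64], sample: f64) -> Option<f64> {
    let (mean, std) = mean_and_std(history)?;
    if std == 0.0 {
        return None;
    }
    Some((sample - mean).abs() / std)
}

/// Flags the sample when its z-score exceeds `threshold` (3.0 is a common choice).
fn is_anomaly(history: &[f64], sample: f64, threshold: f64) -> bool {
    anomaly_score(history, sample).map_or(false, |score| score > threshold)
}
```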

### Regression Detection

Monitor performance over time and detect regressions:

```rust
let mut detector = RegressionDetector::new();

// Add baseline measurements
for i in 0..100 {
    detector.add_measurement("matrix_multiply", 100.0 + (i as f64)).unwrap();
}

// Update baselines
detector.update_baselines().unwrap();

// Check for regressions
let result = detector.check_regression("matrix_multiply", 150.0).unwrap();
if let Some(regression) = result {
    println!("Regression detected: {:?} ({}% slower)", 
             regression.severity, regression.percentage_change);
    
    for recommendation in regression.recommendations {
        println!("📋 {}", recommendation);
    }
}
```
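
The arithmetic behind a percentage-based regression check can be sketched as follows, assuming the baseline is the mean of the recorded measurements. That assumption, the 10% threshold used in the usage note, and all function names are illustrative; `RegressionDetector` may compute its baseline differently.

```rust
/// Mean of the baseline samples, or None if there are none.
fn baseline_mean(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<f64>() / samples.len() as f64)
}

/// Percentage change of `current` relative to `baseline`; positive means slower.
fn percentage_change(baseline: f64, current: f64) -> f64 {
    (current - baseline) / baseline * 100.0
}

/// Returns Some(change) when the slowdown exceeds `threshold_percent`.
fn check_regression(samples: &[f64], current: f64, threshold_percent: f64) -> Option<f64> {
    let baseline = baseline_mean(samples)?;
    if baseline <= 0.0 {
        return None; // a percentage change is undefined for a non-positive baseline
    }
    let change = percentage_change(baseline, current);
    if change > threshold_percent {
        Some(change)
    } else {
        None
    }
}
```

Under this mean-baseline assumption, the 100 measurements in the example above (100.0 through 199.0) average 149.5, so a new value of 150.0 is only about 0.3% slower and would not trip a 10% threshold, while a value of 200.0 (about 33.8% slower) would.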

### Distributed Profiling

For multi-node distributed applications:

```rust
// Initialize distributed profiling
init_distributed_profiling().unwrap();

// Add cluster nodes
add_cluster_node("worker1", "192.168.1.100:8080").unwrap();
add_cluster_node("worker2", "192.168.1.101:8080").unwrap();

// Record distributed events
record_distributed_event("data_transfer", "worker1", "worker2", 1024).unwrap();

// Analyze distributed performance
let analysis = analyze_distributed_performance().unwrap();
println!("Network efficiency: {:.2}%", analysis.network_analysis.efficiency);
```

### Custom Tool Integration

Integrate with external profiling tools:

```rust
// NVIDIA Nsight integration
let nsight_config = NsightConfig::default();
let mut nsight_profiler = create_nsight_profiler_with_config(nsight_config);

nsight_profiler.start_profiling().unwrap();
// Your GPU code here
nsight_profiler.stop_profiling().unwrap();

// Intel VTune integration
let vtune_config = VTuneConfig::default();
let mut vtune_profiler = create_vtune_profiler_with_config(vtune_config);

vtune_profiler.start_hotspot_analysis().unwrap();
// Your CPU-intensive code here
vtune_profiler.stop_analysis().unwrap();
```

## Dashboard and Visualization

### Real-time Dashboard

Create a web-based dashboard for real-time monitoring:

```rust
let dashboard_config = DashboardConfig {
    port: 8080,
    refresh_interval: 5,
    real_time_updates: true,
    max_data_points: 1000,
    enable_stack_traces: true,
    custom_css: Some("/* Your custom CSS */".to_string()),
};

let dashboard = create_dashboard_with_config(dashboard_config);

// Start the dashboard server
dashboard.start(profiler.clone(), memory_profiler.clone()).unwrap();

println!("Dashboard available at: http://localhost:8080");

// Add custom alerts
let alert = DashboardAlert {
    id: "high_memory".to_string(),
    severity: DashboardAlertSeverity::Warning,
    title: "High Memory Usage".to_string(),
    message: "Memory usage exceeded 80% threshold".to_string(),
    timestamp: std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_secs(),
    resolved: false,
};

dashboard.add_alert(alert).unwrap();
```

### Static Reports

Generate comprehensive reports in multiple formats:

```rust
let reporting_config = ReportingConfig {
    include_performance_analysis: true,
    include_memory_analysis: true,
    include_recommendations: true,
    chart_generation: true,
    export_raw_data: true,
};

let mut reporter = create_reporting_engine_with_config(reporting_config);

// Generate different types of reports
reporter.generate_performance_report(&profiler.lock().unwrap(), "performance_report.html").unwrap();
reporter.generate_memory_report(&memory_profiler, "memory_report.html").unwrap();

// Export to different formats
reporter.export_to_pdf("performance_report.html", "report.pdf").unwrap();
reporter.export_to_json(&profiler.lock().unwrap(), "data.json").unwrap();
```

### Visualization

Create interactive charts and visualizations:

```rust
// Generate performance trend charts
export_performance_trend_chart(&profiler.lock().unwrap(), "trend_chart.html").unwrap();

// Create memory usage scatter plots
export_memory_scatter_plot(&memory_profiler, "memory_plot.html").unwrap();

// Generate operation frequency charts
export_operation_frequency_chart(&profiler.lock().unwrap(), "frequency_chart.html").unwrap();

// Create duration histograms
export_duration_histogram(&profiler.lock().unwrap(), "duration_histogram.html").unwrap();
```

## Integration with CI/CD

### Automated Performance Testing

Integrate profiling into your CI/CD pipeline:

```rust
let ci_config = CICDConfig {
    platform: CICDPlatform::GitHubActions,
    regression_threshold: 10.0, // 10% regression threshold
    baseline_branch: "main".to_string(),
    report_format: ReportFormat::Html,
    fail_on_regression: true,
    generate_pr_comments: true,
};

let mut ci_integration = create_cicd_integration_with_config(ci_config);

// Run performance tests
ci_integration.run_performance_tests().unwrap();

// Check for regressions
let regression_results = ci_integration.check_regressions().unwrap();
if !regression_results.is_empty() {
    for regression in regression_results {
        println!("❌ Regression in {}: {}% slower", 
                 regression.operation, regression.percentage_change);
    }
    
    // Generate PR comment
    ci_integration.generate_pr_comment().unwrap();
}
```

### Automated Alerts

Set up automated alerts for performance issues:

```rust
use std::time::Duration;

let alert_config = AlertConfig {
    duration_threshold: Duration::from_millis(100),
    memory_threshold: 1024 * 1024 * 1024, // 1GB
    throughput_threshold: 1000.0, // operations per second
    enable_anomaly_detection: true,
    notification_channels: vec![
        NotificationChannel::Slack { webhook_url: "https://hooks.slack.com/...".to_string() },
        NotificationChannel::Email { recipients: vec!["team@company.com".to_string()] },
    ],
};

let mut alert_manager = create_alert_manager_with_config(alert_config);
alert_manager.start_monitoring(&profiler, &memory_profiler).unwrap();
```
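
The threshold part of such a configuration amounts to comparing each measurement against fixed limits. The sketch below shows that comparison with hypothetical `Thresholds` and `Sample` types. It assumes that a duration or memory value *above* its limit and a throughput *below* its limit should raise an alert; the guide does not spell out how `AlertConfig` interprets its fields.

```rust
use std::time::Duration;

/// Hypothetical limits mirroring the three threshold fields shown above.
struct Thresholds {
    max_duration: Duration,
    max_memory_bytes: u64,
    min_throughput: f64, // operations per second
}

/// Hypothetical single measurement to evaluate.
struct Sample {
    duration: Duration,
    memory_bytes: u64,
    throughput: f64,
}

/// Returns the names of all limits that the sample violates.
fn violations(limits: &Thresholds, sample: &Sample) -> Vec<&'static str> {
    let mut out = Vec::new();
    if sample.duration > limits.max_duration {
        out.push("duration");
    }
    if sample.memory_bytes > limits.max_memory_bytes {
        out.push("memory");
    }
    if sample.throughput < limits.min_throughput {
        out.push("throughput");
    }
    out
}
```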

## Performance Optimization

### Bottleneck Detection

Automatically identify performance bottlenecks:

```rust
let bottleneck_results = detect_bottlenecks(&profiler.lock().unwrap(), &memory_profiler).unwrap();

for bottleneck in bottleneck_results.bottlenecks {
    println!("🚨 Bottleneck detected in {}: {}", bottleneck.operation, bottleneck.description);
    println!("   Severity: {:?}", bottleneck.severity);
    println!("   Impact: {:.1}% of total time", bottleneck.impact_percentage);
    
    for recommendation in bottleneck.recommendations {
        println!("   💡 {}", recommendation);
    }
}
```
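
An "impact" figure like the one printed above can be understood as each operation's share of the total recorded time. The sketch below computes that share from a list of `(operation, duration)` pairs. The function name and input shape are assumptions made for illustration; `detect_bottlenecks` may weigh other signals as well.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Sums durations per operation and returns (operation, percent of total time),
/// sorted from highest to lowest impact.
fn impact_percentages(events: &[(&str, Duration)]) -> Vec<(String, f64)> {
    let mut totals: HashMap<&str, Duration> = HashMap::new();
    for (name, duration) in events {
        *totals.entry(*name).or_default() += *duration;
    }

    let total_secs = totals.values().sum::<Duration>().as_secs_f64();
    if total_secs == 0.0 {
        return Vec::new(); // nothing recorded, so no shares to report
    }

    let mut impacts: Vec<(String, f64)> = totals
        .into_iter()
        .map(|(name, d)| (name.to_string(), d.as_secs_f64() / total_secs * 100.0))
        .collect();
    impacts.sort_by(|a, b| b.1.total_cmp(&a.1));
    impacts
}
```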

### Efficiency Analysis

Analyze overall system efficiency:

```rust
let efficiency_results = analyze_efficiency(&profiler.lock().unwrap(), &memory_profiler).unwrap();

println!("System Efficiency Report:");
println!("  CPU Efficiency: {:.1}%", efficiency_results.cpu_efficiency);
println!("  Memory Efficiency: {:.1}%", efficiency_results.memory_efficiency);
println!("  Cache Performance: {:.1}%", efficiency_results.cache_performance);
println!("  Overall Score: {:.1}/100", efficiency_results.overall_score);

for suggestion in efficiency_results.optimization_suggestions {
    println!("  📈 {}", suggestion);
}
```

### Workload Characterization

Understand your application's workload characteristics:

```rust
let workload_analysis = analyze_workload(&profiler.lock().unwrap()).unwrap();

println!("Workload Analysis:");
println!("  Type: {:?}", workload_analysis.workload_type);
println!("  Compute Intensity: {:.2}", workload_analysis.compute_characteristics.arithmetic_intensity);
println!("  Memory Bandwidth Utilization: {:.1}%", workload_analysis.memory_patterns.bandwidth_utilization);
println!("  Parallelization Efficiency: {:.1}%", workload_analysis.parallelism_analysis.efficiency);

for recommendation in workload_analysis.optimization_recommendations {
    println!("  🎯 {} (Priority: {:?})", recommendation.description, recommendation.priority);
}
```
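
Arithmetic intensity is conventionally defined as floating-point operations per byte of memory traffic. The sketch below computes it and applies a roofline-style comparison against the machine balance (peak FLOP/s divided by peak bytes/s). The function names, the `Bound` enum, and the peak figures in the usage note are hypothetical; whether `analyze_workload` classifies workloads this way is not stated in this guide.

```rust
/// FLOPs per byte moved, or None when no bytes were moved.
fn arithmetic_intensity(flops: f64, bytes_moved: f64) -> Option<f64> {
    if bytes_moved <= 0.0 {
        None
    } else {
        Some(flops / bytes_moved)
    }
}

/// Which hardware limit a workload hits first under the roofline model.
#[derive(Debug, PartialEq)]
enum Bound {
    Memory,
    Compute,
}

/// Compares the workload's intensity with the machine balance.
fn classify(intensity: f64, peak_flops: f64, peak_bandwidth: f64) -> Bound {
    if intensity < peak_flops / peak_bandwidth {
        Bound::Memory
    } else {
        Bound::Compute
    }
}
```

For instance, an operation with 2,000,000 FLOPs over 1 MiB of traffic has an intensity of about 1.9 FLOPs per byte. On a hypothetical device with 1 TFLOP/s peak compute and 100 GB/s peak bandwidth (balance of 10), it would be classified as memory-bound.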

## Troubleshooting

### Common Issues

#### High Profiling Overhead

```rust
use std::time::Duration;

// Use sampling to reduce overhead
let sampling_config = SamplingConfig {
    sampling_rate: 0.1, // Sample 10% of operations
    adaptive_sampling: true,
    min_duration_threshold: Duration::from_micros(100),
};

let optimized_profiler = create_optimized_profiler_with_config(sampling_config);
```
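
To make the effect of a sampling rate and a minimum duration threshold concrete, here is a small deterministic sketch. It keeps one event out of every `keep_one_in` (10 corresponds to a rate of 0.1) and ignores events shorter than the threshold. `Sampler` is a hypothetical type; the library's sampler, especially with `adaptive_sampling` enabled, may behave differently.

```rust
use std::time::Duration;

/// Hypothetical fixed-rate sampler with a minimum duration filter.
struct Sampler {
    keep_one_in: u64,
    min_duration: Duration,
    seen: u64,
}

impl Sampler {
    fn new(keep_one_in: u64, min_duration: Duration) -> Self {
        // Guard against a zero divisor: keep_one_in = 1 records everything.
        Sampler { keep_one_in: keep_one_in.max(1), min_duration, seen: 0 }
    }

    /// Decides whether an event of the given duration should be recorded.
    fn should_record(&mut self, duration: Duration) -> bool {
        if duration < self.min_duration {
            return false; // too short to be worth the recording overhead
        }
        let keep = self.seen % self.keep_one_in == 0;
        self.seen += 1;
        keep
    }
}
```

A counter-based sampler is cheap and reproducible, but it can alias with periodic workloads; randomized sampling avoids that at the cost of a random number per event.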

#### Memory Issues

```rust
// Enable memory pooling for reduced allocations
let mut memory_profiler = MemoryProfiler::new();
memory_profiler.set_memory_pool_size(1024 * 1024 * 1024); // 1GB pool

// Use compact events to reduce memory usage
init_optimized_profiling().unwrap();
```

#### Large Data Exports

```rust
// Use streaming export for large datasets
let export_config = CustomExportFormat {
    format: ExportFormat::Csv,
    compression: Some(CompressionType::Gzip),
    streaming: true,
    batch_size: 10000,
};

export_with_config(&profiler.lock().unwrap(), "large_data.csv.gz", export_config).unwrap();
```
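
The idea behind streaming export is to write rows as they are produced and flush in batches, instead of building the whole file in memory. The standard-library sketch below shows this for CSV. The function name and the two-column layout are assumptions for illustration, compression is left out because it needs an external crate, and operation names are assumed to contain no commas or quotes.

```rust
use std::io::{self, BufWriter, Write};

/// Writes (name, duration in microseconds) rows as CSV, flushing every
/// `batch_size` rows. Returns the number of rows written.
fn write_csv_batched<W, I>(out: W, events: I, batch_size: usize) -> io::Result<usize>
where
    W: Write,
    I: IntoIterator<Item = (String, u64)>,
{
    let batch_size = batch_size.max(1);
    let mut writer = BufWriter::new(out);
    writeln!(writer, "name,duration_us")?;

    let mut rows = 0usize;
    for (name, duration_us) in events {
        writeln!(writer, "{},{}", name, duration_us)?;
        rows += 1;
        if rows % batch_size == 0 {
            writer.flush()?; // push a full batch to the underlying writer
        }
    }
    writer.flush()?; // flush the final, possibly partial batch
    Ok(rows)
}
```

Because the function accepts any iterator, events can be produced lazily, so memory use stays bounded by the batch rather than by the dataset.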

### Performance Tips

1. **Use Scoped Profiling**: Always use RAII-based profiling (`ProfileScope`) to ensure proper cleanup.

2. **Enable Profiling Conditionally**: Use feature flags or environment variables to enable profiling only when needed.

3. **Monitor Overhead**: Use the built-in overhead tracking to ensure profiling doesn't impact performance significantly.

4. **Batch Operations**: When possible, batch multiple small operations to reduce profiling overhead.

5. **Use Appropriate Sampling**: For high-frequency operations, use sampling to reduce overhead.
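
Tip 2 can be sketched with an environment-variable gate. The variable name `TORSH_PROFILE` and both functions are hypothetical and chosen only for this example; the crate may expose its own switch.

```rust
/// Interprets the value of a hypothetical `TORSH_PROFILE` variable.
/// Accepts "1", "true", and "on" (case-insensitive); anything else disables profiling.
fn profiling_enabled(value: Option<&str>) -> bool {
    match value {
        Some(v) => matches!(v.trim().to_ascii_lowercase().as_str(), "1" | "true" | "on"),
        None => false,
    }
}

/// Reads the hypothetical variable from the process environment.
fn profiling_enabled_from_env() -> bool {
    profiling_enabled(std::env::var("TORSH_PROFILE").ok().as_deref())
}
```

An application could call `profiling_enabled_from_env()` once at startup and only invoke `start_profiling()` when it returns `true`. Keeping the parsing in a separate pure function makes the rule easy to unit-test.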

### Debugging

Enable debug logging to troubleshoot issues:

```rust
// Set environment variable
std::env::set_var("TORSH_PROFILER_LOG", "debug");

// Or use tracing directly
use tracing::debug;

let operation_name = "matrix_multiply"; // example operation name
debug!("Profiling operation: {}", operation_name);
```

## Best Practices

1. **Start Simple**: Begin with basic function profiling before moving to advanced features.

2. **Measure First**: Always establish baseline performance before optimization.

3. **Profile in Production**: Use low-overhead profiling in production environments.

4. **Automate Analysis**: Set up automated regression detection and alerts.

5. **Document Results**: Keep track of optimizations and their impact.

6. **Team Collaboration**: Share profiling results and dashboards with your team.

## Examples

See the `examples/` directory for complete examples:

- `profiler_demo.rs` - Basic profiling example
- `analytics_demo.rs` - Advanced analytics and ML analysis
- `dashboard_demo.rs` - Real-time dashboard example
- `advanced_profiling_demo.rs` - Comprehensive profiling features

## Support

For issues and questions:

- Check the [troubleshooting section](#troubleshooting)
- Review the examples in the repository
- File issues on the GitHub repository
- Refer to the API documentation

## Next Steps

1. Start with basic profiling in your application
2. Set up a dashboard for real-time monitoring
3. Integrate with your CI/CD pipeline
4. Use ML-based analysis for optimization insights
5. Share results with your team using reports and visualizations

Happy profiling! 🚀