calltrace-rs 1.1.4

High-performance function call tracing library for C/C++ applications using GCC instrumentation with Rust safety guarantees
# CallTrace Performance Tuning Guide

This guide provides comprehensive strategies for optimizing CallTrace performance across different use cases, from development debugging to production monitoring.

## Table of Contents

- [Performance Overview](#performance-overview)
- [Configuration Strategies](#configuration-strategies)
- [Compilation Optimizations](#compilation-optimizations)
- [Runtime Tuning](#runtime-tuning)
- [Memory Management](#memory-management)
- [Profiling and Measurement](#profiling-and-measurement)
- [Production Deployment](#production-deployment)
- [Use Case Scenarios](#use-case-scenarios)

## Performance Overview

### CallTrace Overhead Characteristics

| Configuration | Overhead per Call | Memory per 10K Calls | Use Case |
|---------------|-------------------|---------------------|----------|
| **Disabled** | < 5ns | 0 | Production baseline |
| **Basic Tracing** | 50-100ns | ~2MB | Production monitoring |
| **Argument Capture** | 500ns-2μs | ~5-10MB | Development debugging |
| **Debug Mode** | 1-5μs | ~10-20MB | Deep investigation |

### Performance Factors

1. **Function Call Frequency**
   - High-frequency functions (>1M calls/sec) accumulate proportionally more total overhead
   - Recursive functions multiply overhead by recursion depth

2. **Argument Complexity**
   - Simple types (int, float): ~100ns overhead
   - Strings: ~200-500ns overhead  
   - Complex structs: ~1-2μs overhead
   - Arrays/pointers: ~500ns-1μs overhead

3. **Call Stack Depth**
   - Memory usage grows linearly with depth
   - Deep recursion (>1000 levels) can impact performance significantly

4. **Thread Count**
   - Per-thread overhead is minimal
   - Memory usage scales with number of active threads

## Configuration Strategies

### Development Configuration (Maximum Detail)

For debugging and development, prioritize completeness over performance:

```bash
# Full featured development setup
export CALLTRACE_OUTPUT=dev_trace.json
export CALLTRACE_CAPTURE_ARGS=1
export CALLTRACE_MAX_DEPTH=1000
export CALLTRACE_DEBUG=1
export CALLTRACE_PRETTY_JSON=1

# Use debug build for additional error checking
LD_PRELOAD=./target/debug/libcalltrace.so ./your_program
```

**Expected overhead:** 10-50x slower than normal execution

### Testing Configuration (Balanced)

For integration testing and QA environments:

```bash
# Balanced testing setup
export CALLTRACE_OUTPUT=test_trace.json
export CALLTRACE_CAPTURE_ARGS=0      # Disable expensive argument capture
export CALLTRACE_MAX_DEPTH=500
export CALLTRACE_DEBUG=0
export CALLTRACE_PRETTY_JSON=1

# Use release build for better performance
LD_PRELOAD=./target/release/libcalltrace.so ./your_program
```

**Expected overhead:** 2-5x slower than normal execution

### Production Configuration (Minimal Overhead)

For production monitoring with minimal impact:

```bash
# Production-optimized setup
export CALLTRACE_OUTPUT=prod_trace.json
export CALLTRACE_CAPTURE_ARGS=0      # Never enable in production
export CALLTRACE_MAX_DEPTH=100       # Limit memory usage
export CALLTRACE_DEBUG=0             # No debug output
export CALLTRACE_PRETTY_JSON=0       # Smaller file size

# Always use release build in production
LD_PRELOAD=./target/release/libcalltrace.so ./your_program
```

**Expected overhead:** 10-20% slower than normal execution

### Minimal Configuration (Ultra-Low Overhead)

For scenarios where every nanosecond counts:

```bash
# Ultra-minimal setup - basic call counting only
export CALLTRACE_OUTPUT=/dev/null    # Or specific file when needed
export CALLTRACE_CAPTURE_ARGS=0
export CALLTRACE_MAX_DEPTH=50
export CALLTRACE_DEBUG=0
export CALLTRACE_PRETTY_JSON=0

# Release build only
LD_PRELOAD=./target/release/libcalltrace.so ./your_program
```

**Expected overhead:** < 5% impact on execution time

## Compilation Optimizations

### Target Program Compilation

Choose compilation flags based on your performance requirements:

#### Maximum Performance (Production)
```bash
# Optimized for speed, limited tracing capability
gcc -rdynamic -finstrument-functions -O3 -DNDEBUG \
    -fno-omit-frame-pointer -fno-inline-small-functions \
    your_program.c -o your_program_optimized
```

**Trade-offs:**
- ✅ Fastest execution
- ❌ Some functions may be inlined (not traced)
- ❌ Harder to debug issues

#### Balanced Performance (Testing)
```bash
# Good performance with complete tracing
gcc -rdynamic -finstrument-functions -O2 -g \
    -fno-omit-frame-pointer \
    your_program.c -o your_program_balanced
```

**Trade-offs:**
- ✅ Good performance
- ✅ Most functions traced
- ✅ Debug info available

#### Maximum Tracing (Development)
```bash
# Complete tracing capability, slower execution
gcc -rdynamic -finstrument-functions -O0 -g \
    -fno-inline -fno-omit-frame-pointer \
    your_program.c -o your_program_debug
```

**Trade-offs:**
- ✅ All functions traced
- ✅ Complete debug information
- ❌ Significantly slower execution

### CallTrace Library Compilation

Use appropriate CallTrace build for your use case:

```bash
# Development: debug build with error checking
cargo build

# Production: optimized build
cargo build --release

# Ultra-optimized: build with a custom profile
cargo build --profile production
```

```toml
# Add to Cargo.toml to define the custom profile
[profile.production]
inherits = "release"
lto = "fat"          # Full link-time optimization
codegen-units = 1    # Single codegen unit for better optimization
panic = "abort"      # Smaller binary, faster execution
```

## Runtime Tuning

### Environment Variable Optimization

Fine-tune CallTrace behavior with environment variables:

```bash
# Memory-constrained environments
export CALLTRACE_MAX_DEPTH=50          # Limit call stack depth
export CALLTRACE_PRETTY_JSON=0         # Reduce file size

# High-frequency call environments  
export CALLTRACE_MAX_DEPTH=20          # Very shallow tracing
export CALLTRACE_OUTPUT=/tmp/trace.json # Fast storage

# Multi-threaded applications
export CALLTRACE_MAX_DEPTH=200         # Higher limit for concurrent threads
```

### Dynamic Performance Control

Control tracing overhead at runtime:

```c
// In your C program - disable instrumentation for hot functions
__attribute__((no_instrument_function))
void high_frequency_function() {
    // This function won't be traced, reducing overhead
}

// Use GCC pragmas to disable instrumentation for code blocks
// (GCC also accepts -finstrument-functions-exclude-function-list=sym,sym
//  to exclude specific functions at compile time)
#pragma GCC push_options
#pragma GCC optimize ("no-instrument-functions")
void performance_critical_section() {
    // Code here won't be traced
}
#pragma GCC pop_options
```

### Signal-Based Control (Future Feature)

```c
// Example of potential runtime control (not yet implemented)
#include <signal.h>

void toggle_tracing(int sig) {
    // Toggle CallTrace on/off during execution
    // Useful for focusing on specific time periods
}

int main() {
    signal(SIGUSR1, toggle_tracing);
    // Run program, send SIGUSR1 to toggle tracing
}
```

## Memory Management

### Understanding Memory Usage

CallTrace memory consumption breakdown:

1. **Call Tree Storage:** ~200 bytes per function call
2. **DWARF Cache:** ~1KB per unique function
3. **String Interning:** ~50 bytes per unique string
4. **Thread Metadata:** ~1KB per thread
5. **JSON Buffer:** ~2x final output size during generation

### Memory Optimization Strategies

#### Limit Call Depth
```bash
# Prevent memory explosion in recursive scenarios
export CALLTRACE_MAX_DEPTH=100

# Inspect call fan-out (children per call) in traces
jq '.call_trees[].root_calls[] | .. | .children? | length' trace.json | sort -n | tail -10
```

#### Periodic Output Flushing (Future Feature)
```bash
# Hypothetical future feature - periodic output to limit memory
export CALLTRACE_FLUSH_INTERVAL=10000   # Flush every 10K calls
export CALLTRACE_OUTPUT_ROLLING=1       # Rolling output files
```

#### Memory Monitoring
```bash
# Monitor CallTrace memory usage
valgrind --tool=massif env LD_PRELOAD=./target/release/libcalltrace.so ./your_program

# Check peak memory usage
/usr/bin/time -v env LD_PRELOAD=./target/release/libcalltrace.so ./your_program

# Monitor real-time memory usage
while true; do 
    ps aux | grep your_program | grep -v grep
    sleep 1
done
```

## Profiling and Measurement

### Baseline Measurements

Always establish baseline performance before adding CallTrace:

```bash
# 1. Measure without CallTrace
time ./your_program
perf stat -e cycles,instructions,cache-misses ./your_program

# 2. Measure with CallTrace (minimal config)
time env CALLTRACE_OUTPUT=/dev/null LD_PRELOAD=./target/release/libcalltrace.so ./your_program

# 3. Measure with CallTrace (full config)
time env CALLTRACE_CAPTURE_ARGS=1 CALLTRACE_OUTPUT=trace.json LD_PRELOAD=./target/release/libcalltrace.so ./your_program
```

### Performance Profiling

Use profiling tools to understand overhead distribution:

```bash
# Profile CallTrace overhead with perf
perf record -g env LD_PRELOAD=./target/release/libcalltrace.so ./your_program
perf report

# Focus on CallTrace-specific overhead (after adding probes with
# `perf probe -x ./target/release/libcalltrace.so <function>`)
perf record -g -e 'probe_libcalltrace:*' env LD_PRELOAD=./target/release/libcalltrace.so ./your_program

# Analyze hot functions
perf top -p $(pgrep your_program)
```

### CallTrace-Specific Metrics

Analyze CallTrace's own performance from trace output:

```bash
# Extract timing statistics from trace
jq -r '.call_trees[].root_calls[] | .. | select(.duration_ns?) | .duration_ns' trace.json | \
    awk '{sum+=$1; count++} END {print "Average:", sum/count "ns", "Total calls:", count}'

# Find slowest function calls
jq -r '.call_trees[].root_calls[] | .. | select(.duration_ns?) | "\(.duration_ns) \(.function)"' trace.json | \
    sort -nr | head -20

# Analyze call frequency
jq -r '.call_trees[].root_calls[] | .. | select(.function?) | .function' trace.json | \
    sort | uniq -c | sort -nr | head -20
```

## Production Deployment

### Pre-Production Testing

Before deploying CallTrace in production:

1. **Load Testing:**
   ```bash
   # Test with production-like load
   ab -n 10000 -c 100 http://localhost:8080/api/endpoint
   
   # Compare with and without CallTrace
   siege -t 60s -c 50 http://localhost:8080/
   ```

2. **Memory Leak Testing:**
   ```bash
   # Long-running memory test
   valgrind --tool=memcheck --leak-check=full \
       env LD_PRELOAD=./target/release/libcalltrace.so ./your_program
   ```

3. **Stress Testing:**
   ```bash
   # High concurrency test
   for i in {1..100}; do
       LD_PRELOAD=./target/release/libcalltrace.so ./your_program &
   done
   wait
   ```

### Production Monitoring

Monitor CallTrace impact in production:

```bash
# Monitor system metrics
iostat -x 1    # Disk I/O impact from JSON writing
free -m        # Memory usage
top -p $(pgrep your_program)  # CPU usage

# Application-specific metrics
curl http://localhost:8080/metrics | grep response_time
```

### Rollback Strategy

Prepare for quick rollback if issues arise:

```bash
# Create wrapper script for easy enable/disable
cat > run_with_trace.sh << 'EOF'
#!/bin/bash
if [[ "$ENABLE_TRACING" == "1" ]]; then
    exec env LD_PRELOAD=./target/release/libcalltrace.so "$@"
else
    exec "$@"
fi
EOF

# Use in production
ENABLE_TRACING=1 ./run_with_trace.sh ./your_program  # With tracing
ENABLE_TRACING=0 ./run_with_trace.sh ./your_program  # Without tracing
```

## Use Case Scenarios

### Scenario 1: Debugging Performance Regression

**Goal:** Find which functions are slower than expected

**Configuration:**
```bash
# Capture detailed timing with argument information
export CALLTRACE_CAPTURE_ARGS=1
export CALLTRACE_OUTPUT=regression_analysis.json
export CALLTRACE_PRETTY_JSON=1
LD_PRELOAD=./target/debug/libcalltrace.so ./your_program
```

**Analysis:**
```bash
# Find functions taking longer than threshold
jq '.call_trees[].root_calls[] | .. | select(.duration_ns? > 1000000) | {function, duration_ns}' regression_analysis.json

# Compare with baseline trace
diff <(jq '.call_trees[].root_calls[] | .. | select(.function?) | .function' baseline.json | sort) \
     <(jq '.call_trees[].root_calls[] | .. | select(.function?) | .function' regression_analysis.json | sort)
```

### Scenario 2: Production Health Monitoring

**Goal:** Monitor application behavior with minimal overhead

**Configuration:**
```bash
# Ultra-lightweight monitoring
export CALLTRACE_OUTPUT=/var/log/app_trace.json
export CALLTRACE_CAPTURE_ARGS=0
export CALLTRACE_MAX_DEPTH=50
export CALLTRACE_PRETTY_JSON=0
LD_PRELOAD=./target/release/libcalltrace.so ./your_program
```

**Automation:**
```bash
# Rotate trace files hourly (crontab entry; `%` must be escaped in cron)
0 * * * * mv /var/log/app_trace.json /var/log/app_trace_$(date +\%Y\%m\%d_\%H).json

# Analyze for anomalies
jq '.metadata.total_calls' /var/log/app_trace_*.json | \
    awk '{if(NR==1){min=max=$1} if($1<min){min=$1} if($1>max){max=$1}} END {print "Range:", min "-" max}'
```

### Scenario 3: API Performance Analysis

**Goal:** Understand request processing flow and timing

**Configuration:**
```bash
# Capture API request flow
export CALLTRACE_OUTPUT=api_flow.json
export CALLTRACE_CAPTURE_ARGS=1  # Capture request parameters
export CALLTRACE_MAX_DEPTH=200   # Deep web framework stacks
LD_PRELOAD=./target/release/libcalltrace.so ./api_server
```

**Analysis:**
```bash
# Extract request handler timing
jq '.call_trees[] | .root_calls[] | .. | select(.function? | test("handle_request|process_api")) | {function, duration_ns}' api_flow.json

# Identify bottleneck functions
jq -r '.call_trees[] | .root_calls[] | .. | select(.duration_ns? > 100000) | "\(.duration_ns) \(.function)"' api_flow.json | sort -nr
```

### Scenario 4: Memory Allocation Profiling

**Goal:** Track memory allocation patterns

**Configuration:**
```bash
# Trace malloc/free patterns (if instrumented)
export CALLTRACE_OUTPUT=memory_trace.json
export CALLTRACE_CAPTURE_ARGS=1
# Ensure memory functions are instrumented in your build
LD_PRELOAD=./target/debug/libcalltrace.so ./your_program
```

**Note:** CallTrace doesn't automatically trace system memory functions. You'll need to instrument your own memory management functions.

## Performance Best Practices Summary

### ✅ Do's

1. **Always use release builds in production**
2. **Disable argument capture unless specifically needed**
3. **Set appropriate call depth limits**
4. **Monitor memory usage in long-running applications**
5. **Establish baseline performance measurements**
6. **Use minimal configuration for production monitoring**
7. **Test performance impact thoroughly before deployment**

### ❌ Don'ts

1. **Don't enable debug mode in production**
2. **Don't use unlimited call depth with recursive functions**
3. **Don't enable argument capture for high-frequency functions**
4. **Don't ignore memory usage growth over time**
5. **Don't deploy without performance testing**
6. **Don't use debug builds for performance measurements**
7. **Don't forget to monitor disk space for JSON output**

### 🎯 Golden Rules

1. **Measure First:** Always establish baseline performance
2. **Start Minimal:** Begin with lowest overhead configuration
3. **Iterate Gradually:** Add features as needed, measuring impact
4. **Monitor Continuously:** Watch for performance degradation over time
5. **Plan for Rollback:** Always have a way to quickly disable tracing

By following these guidelines, you can effectively use CallTrace across the entire development lifecycle, from detailed debugging to production monitoring, while maintaining acceptable performance characteristics.