// calltrace/lib.rs

//! # CallTrace - High-Performance Function Call Tracing Library
//!
//! CallTrace is a Rust library that provides comprehensive function call tracing capabilities
//! using GCC's `-finstrument-functions` feature. It captures function calls, arguments, return values,
//! and call relationships across multiple threads with minimal performance overhead.
//!
//! ## Features
//!
//! - **Near-zero overhead when disabled**: Atomic fast-path checking
//! - **Thread-safe**: Full support for multi-threaded applications
//! - **Argument capture**: DWARF-based type-aware argument extraction
//! - **Return value tracing**: RAX/XMM0 register capture
//! - **Crash analysis**: Comprehensive crash reporting with stack traces
//! - **JSON output**: Structured, hierarchical call tree export
//! - **Symbol resolution**: Function name resolution with C++ demangling
//! - **Memory safe**: Pure-Rust implementation with no manual memory management
//!
//! ## Quick Start
//!
//! ### 1. Compile your C/C++ program with instrumentation
//!
//! ```bash
//! gcc -rdynamic -finstrument-functions -g your_program.c -o your_program
//! ```
//!
//! Required flags:
//! - `-rdynamic`: Export symbols for `dladdr()` resolution
//! - `-finstrument-functions`: Enable GCC instrumentation hooks
//! - `-g`: Provide DWARF debug information for argument capture
//!
//! ### 2. Run with CallTrace
//!
//! ```bash
//! # Basic tracing
//! CALLTRACE_OUTPUT=trace.json LD_PRELOAD=./libcalltrace.so ./your_program
//!
//! # With argument capture (higher overhead)
//! CALLTRACE_CAPTURE_ARGS=1 CALLTRACE_OUTPUT=trace.json LD_PRELOAD=./libcalltrace.so ./your_program
//!
//! # With debug output
//! CALLTRACE_DEBUG=1 CALLTRACE_OUTPUT=trace.json LD_PRELOAD=./libcalltrace.so ./your_program
//! ```
//!
//! ### 3. Analyze the results
//!
//! The generated JSON file contains a hierarchical view of all function calls:
//!
//! ```json
//! {
//!   "metadata": {
//!     "version": "1.0.0",
//!     "timestamp": "2024-01-15T10:30:00Z",
//!     "total_calls": 1250,
//!     "threads": 4
//!   },
//!   "call_trees": {
//!     "12345": {
//!       "thread_id": 12345,
//!       "calls": [
//!         {
//!           "function": "main",
//!           "address": "0x401020",
//!           "start_time": "10:30:00.123456",
//!           "end_time": "10:30:00.987654",
//!           "arguments": [...],
//!           "children": [...]
//!         }
//!       ]
//!     }
//!   }
//! }
//! ```
//!
//! ## Environment Variables
//!
//! | Variable | Default | Description |
//! |----------|---------|-------------|
//! | `CALLTRACE_ENABLED` | `true` | Master enable/disable switch |
//! | `CALLTRACE_OUTPUT` | `{executable}.json` | Output file path |
//! | `CALLTRACE_CAPTURE_ARGS` | `false` | Enable argument capture |
//! | `CALLTRACE_MAX_DEPTH` | `100` | Maximum call depth |
//! | `CALLTRACE_DEBUG` | `false` | Enable debug output |
//! | `CALLTRACE_PRETTY_JSON` | `true` | Pretty-print JSON |
//!
//! ## Performance
//!
//! CallTrace is designed for production use with minimal overhead:
//!
//! - **Function calls**: ~50-100ns overhead per call (argument capture disabled)
//! - **Memory usage**: ~2MB for 10,000 function calls
//! - **Thread safety**: Zero data races, lock-free where possible
//! - **Atomic fast-path**: <5ns when tracing is disabled
//!
//! ## Architecture
//!
//! CallTrace uses a modular architecture:
//!
//! - [`cyg_profile`]: Entry points for GCC's instrumentation hooks
//! - [`call_tree`]: Thread-safe call tree management
//! - [`dwarf_analyzer`]: DWARF debug information parsing
//! - [`register_reader`]: x86_64 register and argument capture
//! - [`json_output`]: Structured JSON serialization
//! - [`crash_handler`]: Signal handling and crash analysis
//! - [`build_validator`]: Compilation requirement validation
//!
//! ## Safety and Reliability
//!
//! - **Memory safe**: No manual memory management, RAII throughout
//! - **Thread safe**: DashMap and `Arc<RwLock<T>>` for concurrent access
//! - **Signal safe**: Proper signal handler chaining and restoration
//! - **Crash resilient**: Comprehensive crash analysis and recovery
//! - **Production tested**: Extensive integration and stress testing
//!
//! ## Platform Support
//!
//! - **Architecture**: x86_64
//! - **Operating System**: Linux (with glibc)
//! - **Compiler**: GCC with `-finstrument-functions` support
//! - **Rust**: 2021 edition, stable channel
//!
//! ## Example Integration
//!
//! ```c
//! // your_program.c
//! #include <stdio.h>
//!
//! int fibonacci(int n) {
//!     if (n <= 1) return n;
//!     return fibonacci(n-1) + fibonacci(n-2);
//! }
//!
//! int main() {
//!     printf("fib(8) = %d\n", fibonacci(8));
//!     return 0;
//! }
//! ```
//!
//! Compile and trace:
//! ```bash
//! gcc -rdynamic -finstrument-functions -g your_program.c -o your_program
//! CALLTRACE_OUTPUT=fib_trace.json LD_PRELOAD=./libcalltrace.so ./your_program
//! ```
//!
//! This will generate a complete call tree showing the recursive fibonacci calls,
//! timing information, and call relationships.
//!
//! ## C API Functions
//!
//! The library exports C-compatible functions for manual control:
//!
//! - [`calltrace_init()`]: Initialize tracing (called automatically)
//! - [`calltrace_cleanup()`]: Cleanup and write output (called automatically)
//!
//! ## Internal Implementation Notes
//!
//! For developers contributing to CallTrace:
//!
//! - Uses `ctor`/`dtor` attributes for automatic initialization
//! - Thread-local storage for performance counters and buffers
//! - Size-capped caching of DWARF function information
//! - String interning for common type/function names
//! - Atomic operations for statistics and fast-path checks

#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]

use once_cell::sync::Lazy;
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
use std::ffi::c_void;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Mutex, Once};

// Core modules
pub mod build_validator;
pub mod call_tree;
pub mod crash_handler;
pub mod cyg_profile;
pub mod dwarf_analyzer;
pub mod error;
pub mod json_output;
pub mod register_reader;

// Re-exports for C compatibility
pub use cyg_profile::{__cyg_profile_func_enter, __cyg_profile_func_exit};

use call_tree::CallTreeManager;
use dwarf_analyzer::DwarfAnalyzer;
use json_output::JsonOutputGenerator;

/// Global call tree manager
static CALL_TREE_MANAGER: Lazy<Arc<CallTreeManager>> =
    Lazy::new(|| Arc::new(CallTreeManager::new()));

/// Global DWARF analyzer
static DWARF_ANALYZER: Lazy<Arc<Mutex<Option<DwarfAnalyzer>>>> =
    Lazy::new(|| Arc::new(Mutex::new(None)));

/// Global configuration
static CONFIG: Lazy<Arc<Mutex<CallTraceConfig>>> =
    Lazy::new(|| Arc::new(Mutex::new(CallTraceConfig::default())));

/// Global initialization flag
static INIT: Once = Once::new();

/// Performance optimization: atomic flags for fast path checking
static TRACING_ENABLED: AtomicBool = AtomicBool::new(true);
static ARGUMENT_CAPTURE_ENABLED: AtomicBool = AtomicBool::new(false);
static FUNCTION_CALL_COUNT: AtomicU64 = AtomicU64::new(0);
static ARGUMENT_CAPTURE_COUNT: AtomicU64 = AtomicU64::new(0);

// Thread-local counters for batched atomic updates
thread_local! {
    static LOCAL_COUNTERS: RefCell<LocalCounters> = RefCell::new(LocalCounters::new());
}

/// Local per-thread counters to reduce atomic operations
#[derive(Debug)]
struct LocalCounters {
    function_calls: u64,
    argument_captures: u64,
    batch_size: u64,
}

impl LocalCounters {
    fn new() -> Self {
        Self {
            function_calls: 0,
            argument_captures: 0,
            batch_size: 100, // Flush every 100 operations
        }
    }

    /// Increment the local call counter and return an approximate global count.
    #[inline]
    fn increment_function_calls(&mut self) -> u64 {
        self.function_calls += 1;

        // Batch update to the global counter every `batch_size` calls
        if self.function_calls % self.batch_size == 0 {
            let batch = self.function_calls;
            self.function_calls = 0;
            FUNCTION_CALL_COUNT.fetch_add(batch, Ordering::Relaxed) + batch
        } else {
            // Between flushes, approximate the total from the last published value
            FUNCTION_CALL_COUNT.load(Ordering::Relaxed) + self.function_calls
        }
    }

    #[inline]
    fn increment_argument_captures(&mut self) {
        self.argument_captures += 1;

        // Batch update to global counter
        if self.argument_captures % self.batch_size == 0 {
            let batch = self.argument_captures;
            self.argument_captures = 0;
            ARGUMENT_CAPTURE_COUNT.fetch_add(batch, Ordering::Relaxed);
        }
    }

    // Flush remaining counts (called during cleanup)
    fn flush(&mut self) {
        if self.function_calls > 0 {
            FUNCTION_CALL_COUNT.fetch_add(self.function_calls, Ordering::Relaxed);
            self.function_calls = 0;
        }
        if self.argument_captures > 0 {
            ARGUMENT_CAPTURE_COUNT.fetch_add(self.argument_captures, Ordering::Relaxed);
            self.argument_captures = 0;
        }
    }
}

/// Type alias for the function info cache to reduce complexity
type FunctionInfoCache = Lazy<Arc<Mutex<HashMap<u64, Option<dwarf_analyzer::FunctionInfo>>>>>;

/// DWARF function information cache for performance optimization
static FUNCTION_INFO_CACHE: FunctionInfoCache = Lazy::new(|| Arc::new(Mutex::new(HashMap::new())));

/// String intern pool for reducing memory allocations
/// Stores commonly used strings like type names, function names
static STRING_INTERN_POOL: Lazy<Arc<Mutex<HashSet<String>>>> =
    Lazy::new(|| Arc::new(Mutex::new(HashSet::new())));

// Pre-allocated argument buffer pool to reduce allocations
thread_local! {
    static ARGUMENT_BUFFER_POOL: RefCell<Vec<Vec<register_reader::CapturedArgument>>> =
        const { RefCell::new(Vec::new()) };

    /// Pre-allocated string formatting buffers to reduce allocations
    static FORMAT_BUFFER: RefCell<String> = RefCell::new(String::with_capacity(64));
}

/// CallTrace runtime configuration
///
/// This structure holds all configuration options that control CallTrace behavior.
/// Configuration is typically loaded from environment variables during initialization,
/// but can be modified programmatically if needed.
///
/// # Configuration Sources
///
/// 1. **Environment Variables** (primary source, read at startup)
/// 2. **Default Values** (fallback when environment variables are not set)
/// 3. **Programmatic** (internal code can lock and modify the `CONFIG` static)
///
/// # Fields
///
/// ## `enabled: bool`
/// Master enable/disable switch for all tracing functionality.
/// - **Environment**: `CALLTRACE_ENABLED` (1/true to enable)
/// - **Default**: `true`
/// - **Performance Impact**: When disabled, overhead is <5ns per function call
///
/// ## `capture_arguments: bool`
/// Enable expensive argument capture using DWARF debugging information.
/// - **Environment**: `CALLTRACE_CAPTURE_ARGS` (1/true to enable)
/// - **Default**: `false`
/// - **Performance Impact**: 10-50x overhead when enabled, depending on argument complexity
/// - **Requirements**: Target program must be compiled with `-g` flag
///
/// ## `output_file: Option<String>`
/// Path where JSON trace output will be written.
/// - **Environment**: `CALLTRACE_OUTPUT`
/// - **Default**: `{executable_name}.json` in the same directory as the executable
/// - **Special Values**:
///   - `None`: No output generated
///   - `"stderr"`: Output to stderr (not implemented)
///   - `"stdout"`: Output to stdout (not implemented)
///
/// ## `max_call_depth: usize`
/// Maximum depth of function call nesting to trace.
/// - **Environment**: `CALLTRACE_MAX_DEPTH`
/// - **Default**: `100`
/// - **Purpose**: Prevents infinite recursion and limits memory usage
/// - **Behavior**: Calls deeper than this limit are ignored
///
/// ## `pretty_json: bool`
/// Whether to format JSON output with indentation and newlines.
/// - **Environment**: `CALLTRACE_PRETTY_JSON` (1/true to enable)
/// - **Default**: `true`
/// - **Trade-off**: Readable output vs. smaller file size
///
/// # Examples
///
/// ## Reading Current Configuration
/// ```rust
/// use calltrace::CallTraceConfig;
///
/// // Defaults are resolved from environment variables at construction time
/// let config = CallTraceConfig::default();
/// assert!(config.max_call_depth > 0);
/// ```
///
/// ## Environment Variable Configuration
/// ```bash
/// # Minimal configuration - basic tracing only
/// CALLTRACE_OUTPUT=trace.json ./your_program
///
/// # Full featured configuration
/// CALLTRACE_OUTPUT=full_trace.json \
/// CALLTRACE_CAPTURE_ARGS=1 \
/// CALLTRACE_MAX_DEPTH=50 \
/// CALLTRACE_DEBUG=1 \
/// CALLTRACE_PRETTY_JSON=1 \
/// ./your_program
///
/// # Performance-optimized configuration
/// CALLTRACE_OUTPUT=perf_trace.json \
/// CALLTRACE_CAPTURE_ARGS=0 \
/// CALLTRACE_MAX_DEPTH=200 \
/// CALLTRACE_PRETTY_JSON=0 \
/// ./your_program
/// ```
///
/// # Performance Guidelines
///
/// - **Production**: Set `capture_arguments = false` for minimal overhead
/// - **Development**: Enable `capture_arguments = true` for detailed debugging
/// - **Deep Recursion**: Increase `max_call_depth` if needed, but monitor memory usage
/// - **File Size**: Set `pretty_json = false` for smaller output files
///
/// # Thread Safety
///
/// Configuration is read once during initialization and then considered immutable.
/// All fields can be safely accessed concurrently from multiple threads.
#[derive(Debug, Clone)]
pub struct CallTraceConfig {
    pub enabled: bool,
    pub capture_arguments: bool,
    pub output_file: Option<String>,
    pub max_call_depth: usize,
    pub pretty_json: bool,
}

impl Default for CallTraceConfig {
    fn default() -> Self {
        Self {
            // Documented as `CALLTRACE_ENABLED`; anything other than 0/false enables tracing
            enabled: std::env::var("CALLTRACE_ENABLED")
                .map(|v| v != "0" && v.to_lowercase() != "false")
                .unwrap_or(true),
            capture_arguments: std::env::var("CALLTRACE_CAPTURE_ARGS")
                .map(|v| v == "1" || v.to_lowercase() == "true")
                .unwrap_or(false),
            output_file: std::env::var("CALLTRACE_OUTPUT")
                .ok()
                .or_else(generate_default_output_filename),
            max_call_depth: std::env::var("CALLTRACE_MAX_DEPTH")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(100),
            pretty_json: std::env::var("CALLTRACE_PRETTY_JSON")
                .map(|v| v == "1" || v.to_lowercase() == "true")
                .unwrap_or(true),
        }
    }
}

/// Initialize the CallTrace library
///
/// This function initializes the global CallTrace state, including:
/// - DWARF debug information analyzer
/// - Crash signal handlers
/// - Performance monitoring counters
/// - Configuration from environment variables
///
/// # Safety
///
/// This function is automatically called when the library is loaded via `ctor` attributes.
/// It can be called multiple times safely - subsequent calls are no-ops.
///
/// # Returns
///
/// Always returns `0` for C compatibility.
///
/// # Environment Variables
///
/// The following environment variables are read during initialization:
/// - `CALLTRACE_OUTPUT`: Output file path
/// - `CALLTRACE_CAPTURE_ARGS`: Enable argument capture (1/true)
/// - `CALLTRACE_MAX_DEPTH`: Maximum call depth (default: 100)
/// - `CALLTRACE_DEBUG`: Enable debug output (1/true)
/// - `CALLTRACE_PRETTY_JSON`: Pretty-print JSON (default: true)
///
/// # Examples
///
/// Usually called automatically, but can be invoked manually:
///
/// ```c
/// // In C code
/// extern int calltrace_init(void);
/// int result = calltrace_init(); // Returns 0 on success
/// ```
#[no_mangle]
pub extern "C" fn calltrace_init() -> i32 {
    INIT.call_once(|| {
        if let Err(e) = init_global_state() {
            eprintln!("CallTrace initialization failed: {:?}", e);
        }
    });
    0
}

/// Cleanup the CallTrace library and write final output
///
/// This function performs comprehensive cleanup of the CallTrace library:
/// - Flushes all pending thread-local performance counters
/// - Generates and writes the final JSON trace output
/// - Restores original signal handlers
/// - Releases all allocated resources
///
/// # Safety
///
/// This function is automatically called when the library is unloaded via `dtor` attributes.
/// It handles cleanup gracefully even during abnormal program termination.
///
/// # Thread Safety
///
/// This function is designed to be signal-safe and can be called during:
/// - Normal program exit
/// - Library unloading
/// - Signal handler execution
/// - Thread destruction
///
/// # Output Generation
///
/// If `CALLTRACE_OUTPUT` was specified, this function will:
/// 1. Collect all call trees from all threads
/// 2. Generate metadata (timing, counters, environment)
/// 3. Serialize to structured JSON format
/// 4. Write atomically to the specified file
///
/// # Error Handling
///
/// Cleanup continues even if individual steps fail. Errors are logged to stderr
/// but do not prevent other cleanup operations from completing.
///
/// # Examples
///
/// Usually called automatically, but can be invoked manually:
///
/// ```c
/// // In C code - force immediate cleanup and output
/// extern void calltrace_cleanup(void);
/// calltrace_cleanup();
/// ```
///
/// # Performance Notes
///
/// - Thread-local counter flushing: O(number of threads)
/// - JSON generation: O(total function calls)
/// - File I/O: O(output file size)
/// - Signal handler restoration: O(number of signals)
#[no_mangle]
pub extern "C" fn calltrace_cleanup() {
    // Flush thread-local counters; TLS may already be torn down during
    // shutdown, so swallow any panic and continue cleanup regardless
    let _ = std::panic::catch_unwind(|| {
        LOCAL_COUNTERS.with(|counters| counters.borrow_mut().flush());
    });

    // Generate final JSON output
    if let Err(e) = write_final_output() {
        eprintln!("CallTrace cleanup failed: {:?}", e);
    }

    // Cleanup crash handler (restore original signal handlers)
    crash_handler::cleanup_crash_handler();
}

/// Initialize global state
fn init_global_state() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize build validation first to ensure required compilation flags
    if let Err(e) = build_validator::init_build_validation() {
        eprintln!("CallTrace: Build validation initialization failed: {:?}", e);
    }

    // Load configuration (populated from environment variables on first access)
    let config = CONFIG.lock().unwrap();

    // Set atomic flags for fast path checking
    TRACING_ENABLED.store(config.enabled, Ordering::Relaxed);
    ARGUMENT_CAPTURE_ENABLED.store(config.capture_arguments, Ordering::Relaxed);

    // Always initialize DWARF analyzer for symbol resolution (independent of argument capture)
    let exe_path = std::env::current_exe()
        .or_else(|_| std::fs::read_link("/proc/self/exe"))
        .unwrap_or_else(|_| "/proc/self/exe".into());

    if let Some(exe_path_str) = exe_path.to_str() {
        match DwarfAnalyzer::new(exe_path_str) {
            Ok(analyzer) => {
                *DWARF_ANALYZER.lock().unwrap() = Some(analyzer);
                if std::env::var("CALLTRACE_DEBUG").is_ok() {
                    eprintln!("CallTrace: DWARF analyzer initialized for symbol resolution");
                }
            }
            Err(e) => {
                eprintln!("CallTrace: DWARF analyzer initialization failed: {:?}", e);
                eprintln!("CallTrace: Continuing with limited symbol resolution");
            }
        }
    }

    // Initialize crash handler if no existing handlers are present
    if let Err(e) = crash_handler::init_crash_handler() {
        eprintln!("CallTrace: Crash handler initialization failed: {:?}", e);
        eprintln!("CallTrace: Continuing without crash detection");
    }

    if std::env::var("CALLTRACE_DEBUG").is_ok() {
        eprintln!("CallTrace: Library initialized successfully");
    }
    Ok(())
}

/// Get or intern a string to reduce memory allocations
/// For commonly used strings like type names and function names
#[inline]
pub(crate) fn intern_string(s: &str) -> String {
    // For short, common strings, use the intern pool
    if s.len() <= 64
        && (s.starts_with("int")
            || s.starts_with("char")
            || s.starts_with("float")
            || s.starts_with("double")
            || s.starts_with("void")
            || s.starts_with("struct")
            || s.starts_with("0x"))
    {
        if let Ok(mut pool) = STRING_INTERN_POOL.lock() {
            if let Some(interned) = pool.get(s) {
                return interned.clone();
            } else {
                let owned = s.to_string();
                pool.insert(owned.clone());
                return owned;
            }
        }
    }
    // For longer or uncommon strings, just allocate normally
    s.to_string()
}

/// Get a pre-allocated argument buffer from the pool
/// This reduces Vec allocations in the hot path
#[inline]
fn get_argument_buffer() -> Vec<register_reader::CapturedArgument> {
    ARGUMENT_BUFFER_POOL.with(|pool| {
        let mut pool_ref = pool.borrow_mut();
        if let Some(mut buffer) = pool_ref.pop() {
            buffer.clear();
            buffer
        } else {
            Vec::with_capacity(16) // Pre-allocate for typical function argument count
        }
    })
}

/// Format an address in the common "0x{:x}" pattern
#[inline]
pub(crate) fn format_address(addr: u64) -> String {
    // During cleanup, TLS may be unavailable, so a plain allocation is used
    // rather than a thread-local buffer; this avoids catching panics in TLS access
    format!("0x{:x}", addr)
}

/// Format an address with a prefix (e.g., "func_0x1234")
#[inline]
pub(crate) fn format_address_with_prefix(prefix: &str, addr: u64) -> String {
    // Same rationale as `format_address`: avoid TLS during cleanup
    format!("{}_0x{:x}", prefix, addr)
}

/// Common string constants to avoid allocations
mod string_constants {
    pub const NULL_ADDRESS: &str = "0x0";
    pub const NULL_STRING: &str = "NULL";
    pub const CAPTURE_FAILED: &str = "Capture failed";
    pub const PTHREAD_CREATE: &str = "pthread_create";
    pub const X86_64: &str = "x86_64";
    pub const UNKNOWN_ERROR: &str = "Unknown error";
}

/// Get function information with caching for performance optimization
#[inline]
fn get_cached_function_info(func_addr: u64) -> Option<dwarf_analyzer::FunctionInfo> {
    // Fast path: check cache first
    if let Ok(cache) = FUNCTION_INFO_CACHE.lock() {
        if let Some(cached_result) = cache.get(&func_addr) {
            return cached_result.clone();
        }
    }

    // Cache miss: query DWARF analyzer
    let function_info = if let Some(ref mut analyzer) = DWARF_ANALYZER.lock().unwrap().as_mut() {
        analyzer.get_function_info(func_addr).ok()
    } else {
        None
    };

    // Store the result in the cache (including negative lookups)
    if let Ok(mut cache) = FUNCTION_INFO_CACHE.lock() {
        cache.insert(func_addr, function_info.clone());

        // Limit cache size to prevent memory bloat
        const MAX_CACHE_SIZE: usize = 1024;
        if cache.len() > MAX_CACHE_SIZE {
            // Evict an arbitrary quarter of the entries; HashMap iteration
            // order is unspecified, so this approximates rather than implements LRU
            let keys_to_remove: Vec<u64> = cache.keys().take(MAX_CACHE_SIZE / 4).copied().collect();
            for key in keys_to_remove {
                cache.remove(&key);
            }
        }
    }

    function_info
}

/// Write final JSON output
fn write_final_output() -> Result<(), Box<dyn std::error::Error>> {
    let config = CONFIG.lock().unwrap().clone();

    if let Some(output_file) = &config.output_file {
        let generator = JsonOutputGenerator::new();
        let trace_session = generator.generate_output(&CALL_TREE_MANAGER)?;

        generator.write_to_file(&trace_session, output_file)?;
        eprintln!("CallTrace: Output written to {}", output_file);
    }

    Ok(())
}

/// Generate default output filename based on executable name
fn generate_default_output_filename() -> Option<String> {
    // Try to get the executable path
    let exe_path = std::env::current_exe()
        .or_else(|_| std::fs::read_link("/proc/self/exe"))
        .ok()?;

    // Get the file stem (filename without extension)
    let file_stem = exe_path.file_stem()?.to_str()?;

    // Get the directory where the executable is located
    let exe_dir = exe_path.parent()?;

    // Create the output filename in the same directory as the executable
    let output_path = exe_dir.join(format!("{}.json", file_stem));

    output_path.to_str().map(str::to_string)
}

/// Get the base output filename (without extension) for crash reports
pub(crate) fn get_base_output_filename() -> Option<String> {
    // First try to get from configuration
    if let Ok(config) = CONFIG.lock() {
        if let Some(ref output_file) = config.output_file {
            // Remove the .json extension if present
            return Some(
                output_file
                    .strip_suffix(".json")
                    .unwrap_or(output_file.as_str())
                    .to_string(),
            );
        }
    }

    // Fallback: generate from executable name (without extension)
    let exe_path = std::env::current_exe()
        .or_else(|_| std::fs::read_link("/proc/self/exe"))
        .ok()?;

    let file_stem = exe_path.file_stem()?.to_str()?;
    let exe_dir = exe_path.parent()?;
    let base_path = exe_dir.join(file_stem);

    base_path.to_str().map(str::to_string)
}

/// Handle function entry (called from cyg_profile)
#[inline]
pub(crate) fn handle_function_enter_internal(
    func_address: *mut c_void,
    call_site: *mut c_void,
) -> Result<(), error::CallTraceError> {
    // Record function hook call for build validation
    build_validator::record_function_hook_call();

    // Fast path: check if tracing is enabled without locking
    if !TRACING_ENABLED.load(Ordering::Relaxed) {
        return Ok(());
    }

    let func_addr = func_address as u64;
    let call_site_addr = call_site as u64;

    // Increment call counter using thread-local batching
    let call_count =
        LOCAL_COUNTERS.with(|counters| counters.borrow_mut().increment_function_calls());

    // Ultra-fast path: if argument capture is disabled, skip expensive operations
    let arg_capture_enabled = ARGUMENT_CAPTURE_ENABLED.load(Ordering::Relaxed);
    if !arg_capture_enabled {
        // Minimal processing for maximum performance
        let _node_id = CALL_TREE_MANAGER.function_enter_fast_path(func_addr, call_site_addr)?;

        // Only reinforce crash handlers occasionally to minimize overhead
        if (call_count < 100 && call_count % 20 == 0) || call_count % 2000 == 0 {
            let _ = crash_handler::reinforce_crash_handlers();
        }

        return Ok(());
    }

    // Full path: argument capture enabled - do expensive operations
    // Re-install crash handlers only during early execution
    let should_reinforce = if call_count < 100 {
        call_count % 10 == 0 // Every 10 calls during startup
    } else {
        call_count % 1000 == 0 // Every 1000 calls after startup
    };

    if should_reinforce {
        let _ = crash_handler::reinforce_crash_handlers();
    }

    // Get function information for argument capture
    let function_info = get_cached_function_info(func_addr);

    LOCAL_COUNTERS.with(|counters| counters.borrow_mut().increment_argument_captures());

    let arguments = if let Some(ref func_info) = function_info {
        // Capture arguments using register reader
        let register_context = unsafe { register_reader::RegisterContext::capture().ok() };

        if let Some(ref context) = register_context {
            capture_function_arguments(func_info, context)
        } else {
            Vec::new()
        }
    } else {
        Vec::new()
    };

    // Add to call tree with full context
    let _node_id = CALL_TREE_MANAGER.function_enter(
        func_addr,
        call_site_addr,
        function_info,
        arguments,
        None, // register_context is consumed in argument capture
    )?;

    Ok(())
}

/// Handle function exit (called from cyg_profile)
#[inline]
pub(crate) fn handle_function_exit_internal(
    func_address: *mut c_void,
    _call_site: *mut c_void,
) -> Result<(), error::CallTraceError> {
    // Fast path: check if tracing is enabled without locking
    if !TRACING_ENABLED.load(Ordering::Relaxed) {
        return Ok(());
    }

    let func_addr = func_address as u64;

    // Ultra-fast path: if argument capture is disabled, skip expensive operations
    if !ARGUMENT_CAPTURE_ENABLED.load(Ordering::Relaxed) {
        // Minimal processing for maximum performance
        CALL_TREE_MANAGER.function_exit_fast_path(func_addr)?;
        return Ok(());
    }

    // Full path: capture return values when argument capture is enabled
    let return_value = {
        // Capture return value registers
        let return_context = unsafe { register_reader::capture_return_values().ok() };

        if let Some(ref context) = return_context {
            // Get function information for return type
            if let Some(func_info) = get_cached_function_info(func_addr) {
                // Extract return value based on function return type
                register_reader::extract_return_value(context, func_info.return_type.as_ref())
            } else {
                // No function info available, try to extract raw value
                if context.return_valid {
                    // Default to integer interpretation of RAX
                    Some(register_reader::ArgumentValue::Integer(context.return_rax))
                } else {
                    None
                }
            }
        } else {
            None
        }
    };

    // Handle function exit in call tree with return value
    CALL_TREE_MANAGER.function_exit_with_return_value(func_addr, return_value)?;

    Ok(())
}

/// Capture function arguments using register context and function info
/// Optimized version with minimal allocations and object pooling
#[inline]
fn capture_function_arguments(
    function_info: &dwarf_analyzer::FunctionInfo,
    register_context: &register_reader::RegisterContext,
) -> Vec<register_reader::CapturedArgument> {
    let param_count = function_info.parameters.len();

    // Early return for functions with no parameters
    if param_count == 0 {
        return Vec::new();
    }

    // Get a pre-allocated buffer from the pool
    let mut arguments = get_argument_buffer();

    // Ensure sufficient capacity; `Vec::reserve` is relative to the current
    // length (0 here after `clear`), so reserve the full parameter count
    arguments.reserve(param_count);

    // Limit argument capture to reasonable number to prevent excessive overhead
    const MAX_ARGS: usize = 16;
    let max_args = std::cmp::min(param_count, MAX_ARGS);

    for (i, param) in function_info.parameters.iter().enumerate().take(max_args) {
        let location = register_reader::classify_argument(
            &param.type_info.name,
            param.type_info.size.unwrap_or(8) as usize,
            i,
        );

        // Try enhanced extraction for complex types, fallback to basic for simple types
        let value = if param.type_info.is_struct
            || param.type_info.is_array
            || (param.type_info.is_pointer && param.type_info.base_type.is_some())
        {
            register_reader::extract_argument_with_type_info(
                register_context,
                &location,
                &param.type_info,
                param,
            )
        } else {
            // Fast path for basic types
            register_reader::extract_argument(
                register_context,
                &location,
                &param.type_info.name,
                param.type_info.is_pointer,
            )
        };

        let captured_arg = register_reader::CapturedArgument {
            name: intern_string(&param.name),
            type_name: intern_string(&param.type_info.name),
            location,
            value: match &value {
                Ok(v) => v.clone(),
                Err(_) => register_reader::ArgumentValue::Unknown {
                    type_name: intern_string(&param.type_info.name),
                    raw_data: Vec::new(),
                    error: Some(string_constants::CAPTURE_FAILED.to_string()),
                },
            },
            valid: value.is_ok(),
            error: value.err().map(|e| format!("{:?}", e)),
        };

        arguments.push(captured_arg);
    }

    arguments
}

/// Library constructor - automatically called when loaded
#[ctor::ctor]
fn library_init() {
    calltrace_init();
}

/// Library destructor - automatically called when unloaded
#[ctor::dtor]
fn library_cleanup() {
    calltrace_cleanup();
}
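
The thread-local counter batching that `LocalCounters` implements can be illustrated in isolation. The sketch below is a simplified, self-contained version of the same pattern and is not part of the library's API; the names (`GLOBAL_CALLS`, `record_call`, `flush_calls`) and the batch size of 100 merely mirror the code above:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

// Global counter, touched only once per batch to limit cache-line contention.
static GLOBAL_CALLS: AtomicU64 = AtomicU64::new(0);

const BATCH_SIZE: u64 = 100;

thread_local! {
    // Per-thread pending count; no synchronization needed.
    static PENDING: Cell<u64> = const { Cell::new(0) };
}

/// Record one call; publish to the global counter every BATCH_SIZE calls.
fn record_call() {
    PENDING.with(|p| {
        let n = p.get() + 1;
        if n == BATCH_SIZE {
            GLOBAL_CALLS.fetch_add(BATCH_SIZE, Ordering::Relaxed);
            p.set(0);
        } else {
            p.set(n);
        }
    });
}

/// Flush any remainder (mirrors `LocalCounters::flush` during cleanup).
fn flush_calls() {
    PENDING.with(|p| {
        GLOBAL_CALLS.fetch_add(p.get(), Ordering::Relaxed);
        p.set(0);
    });
}

fn main() {
    for _ in 0..250 {
        record_call();
    }
    // Two full batches published; 50 calls still pending in thread-local state.
    assert_eq!(GLOBAL_CALLS.load(Ordering::Relaxed), 200);
    flush_calls();
    assert_eq!(GLOBAL_CALLS.load(Ordering::Relaxed), 250);
}
```

Batching trades a little accuracy between flushes for roughly a 100x reduction in contended atomic writes, which is why the final `flush` during cleanup is essential for an exact total.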