# Python Runtime Profiling with py-spy
This profiler integrates py-spy to collect runtime execution data and convert it to a common format.
## How It Works
1. **Execution**: The profiler runs your Python script using py-spy in record mode
2. **Data Collection**: py-spy samples the call stack at regular intervals (default: 100 Hz, i.e. 100 samples per second)
3. **Format Conversion**: Speedscope JSON output is parsed and converted to our common format
4. **Analysis**: Per-function sample counts are tallied and the hottest functions are identified
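
As a rough illustration of the execution step, the sketch below builds a `py-spy record` invocation that writes speedscope JSON. The function names and the default output filename are hypothetical and not taken from this profiler's source; the flags used (`--format`, `--output`, `--rate`) are standard `py-spy record` options.

```python
import subprocess
import sys
from pathlib import Path


def build_pyspy_command(script: str, output: str, rate_hz: int = 100) -> list[str]:
    """Build a `py-spy record` command that writes speedscope JSON.

    Hypothetical helper for illustration; the real profiler may build
    its command differently.
    """
    return [
        "py-spy", "record",
        "--format", "speedscope",   # JSON format consumed by the conversion step
        "--output", output,
        "--rate", str(rate_hz),     # samples per second
        "--",                       # everything after this is the profiled command
        sys.executable, script,
    ]


def run_profile(script: str, output: str = "profile.speedscope.json") -> Path:
    """Run the script under py-spy and return the path of the recorded profile."""
    subprocess.run(build_pyspy_command(script, output), check=True)
    return Path(output)
```

Calling `run_profile("my_script.py")` requires py-spy to be installed and on `PATH`; `build_pyspy_command` can be inspected on its own without running anything.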
## Data Collected
### Common Format Fields
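- `total_samples`: Total number of stack samples collected
- `execution_count`: HashMap of function names to the number of samples in which each function appeared (sample counts, not call counts)
- `hot_functions`: Top 10 functions by sample count, sorted in descending order
- Per-function percentages: each function's share of `total_samples`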
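
To make the relationship between these fields concrete, here is a minimal Python sketch of the common format. The class name `RuntimeProfile` and its methods are hypothetical; only the field names `total_samples`, `execution_count`, and the idea of a top-10 `hot_functions` list come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeProfile:
    """Illustrative stand-in for the common format (not the real type)."""

    total_samples: int = 0
    execution_count: dict[str, int] = field(default_factory=dict)

    def percentage(self, count: int) -> float:
        """Share of all samples, guarding against an empty profile."""
        if self.total_samples == 0:
            return 0.0
        return 100.0 * count / self.total_samples

    def hot_functions(self, limit: int = 10) -> list[tuple[str, int, float]]:
        """Return (name, samples, percent) tuples, most-sampled first."""
        ranked = sorted(
            self.execution_count.items(), key=lambda kv: kv[1], reverse=True
        )
        return [
            (name, count, self.percentage(count)) for name, count in ranked[:limit]
        ]
```

For example, with `total_samples=1245` and a count of 892 for `compute_heavy`, `percentage(892)` rounds to 71.65.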
### py-spy Speedscope Format
The tool uses py-spy's speedscope output format (`--format speedscope`), which provides:
- Stack frame information
- Sample weights
- Function names and line numbers
- Time-series execution data
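
The sketch below shows one way the speedscope JSON could be reduced to per-function sample counts. It assumes the usual speedscope layout: a `shared.frames` array of frame objects with a `name`, and `sampled` profiles whose `samples` are lists of frame indices ordered from the outermost call to the innermost. Counting only the innermost (leaf) frame of each sample is an illustrative choice, not necessarily how this profiler attributes samples, and sample weights are ignored here.

```python
from collections import Counter


def count_leaf_samples(speedscope: dict) -> tuple[int, Counter]:
    """Return (total_samples, per-function counts) from parsed speedscope JSON.

    Assumption: each stack lists frame indices root-first, so the last
    index is the function that was running when the sample was taken.
    """
    frames = speedscope["shared"]["frames"]
    counts: Counter = Counter()
    total = 0
    for profile in speedscope.get("profiles", []):
        if profile.get("type") != "sampled":
            continue  # skip other profile types (e.g. "evented")
        for stack in profile["samples"]:
            total += 1
            if stack:  # an empty stack contributes to the total only
                counts[frames[stack[-1]]["name"]] += 1
    return total, counts
```

Given a file written by py-spy, usage would look like `total, counts = count_leaf_samples(json.load(open(path)))` after `import json`.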
## Example Output
```
=== Profiling Results for Python ===
File Size: 829 bytes
Lines of Code: 43
Functions: 5
Classes: 1
Imports: 2
Complexity Score: 15
Details:
- Detected 5 function definitions
- Detected 1 class definitions
- Detected 2 import statements
=== Runtime Profile (py-spy) ===
Total samples collected: 1245
Unique functions executed: 23
Top 10 Hot Functions:
1. compute_heavy - 892 samples (71.65%)
2. fibonacci - 245 samples (19.68%)
3. main - 75 samples (6.02%)
4. Calculator.multiply - 18 samples (1.45%)
5. Calculator.add - 15 samples (1.20%)
...
```
## Coverage-like Metrics
While py-spy is primarily a profiler (not a coverage tool), the execution counts provide coverage-like insights:
- **Execution frequency**: How many times each function appeared in samples
- **Hot paths**: Which code paths consume the most CPU time
- **Function usage**: Which functions appeared in at least one sample, as opposed to being only defined
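
As a sketch of the "function usage" idea, the snippet below lists the functions defined in a source file with Python's `ast` module and subtracts the names seen in samples. The helper names are hypothetical, and it assumes sampled method names are reported as `Class.method` (as in the example output above); nested functions are deliberately ignored to keep the sketch short.

```python
import ast


def defined_functions(source: str) -> set[str]:
    """Collect top-level function names and methods as `Class.method`."""
    names: set[str] = set()

    def visit(node: ast.AST, prefix: str = "") -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                names.add(prefix + child.name)  # do not descend into the body
            elif isinstance(child, ast.ClassDef):
                visit(child, prefix + child.name + ".")
            else:
                visit(child, prefix)  # e.g. functions defined under `if` blocks

    visit(ast.parse(source))
    return names


def never_sampled(source: str, sampled_names: set[str]) -> set[str]:
    """Functions defined in the source that appear in no sample."""
    return defined_functions(source) - set(sampled_names)
```

A function in the `never_sampled` result was either not executed or ran too briefly to be caught by a sample, so the result is a hint rather than proof of dead code.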
## Limitations
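1. **Not true code coverage**: py-spy samples running code, so a function missing from the samples may be unexecuted or may simply have finished between two samples
2. **CPU-bound focus**: Best for CPU-intensive workloads; by default py-spy skips idle threads, so time spent waiting on I/O is under-represented unless idle threads are included (`--idle`)
3. **Requires execution**: Script must actually run (unlike static analysis)
4. **Root privileges**: py-spy reads the memory of another process, so some systems require elevated privileges (for example, macOS requires running py-spy as root)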
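
On Linux, whether one process may inspect another is governed in part by the Yama `ptrace_scope` setting. The sketch below reads that setting and describes it; the helper names are hypothetical, and the descriptions paraphrase the kernel's documented values. It is meant only as a diagnostic aid when py-spy reports a permissions error.

```python
from pathlib import Path
from typing import Optional

PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Paraphrased from the Linux kernel's Yama documentation.
_MEANINGS = {
    0: "classic: a process may attach to other processes of the same user",
    1: "restricted: a process may attach only to its own descendants",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled entirely",
}


def describe_ptrace_scope(raw: str) -> str:
    """Translate the raw file contents (e.g. '1\\n') into a description."""
    value = int(raw.strip())
    return _MEANINGS.get(value, f"unknown value {value}")


def read_ptrace_scope(path: Path = PTRACE_SCOPE_PATH) -> Optional[str]:
    """Return the raw setting, or None if the file is absent (e.g. non-Linux)."""
    try:
        return path.read_text()
    except OSError:
        return None
```

With the "restricted" setting, letting py-spy launch the script itself (as this profiler does) generally works without root, while attaching to an already running process by PID usually does not.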
## Alternative: True Code Coverage
For true code coverage (line-by-line execution tracking), consider integrating:
- **coverage.py**: Python's standard coverage tool
```python
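import coverage

cov = coverage.Coverage()
cov.start()  # begin recording executed lines
# ... run code ...
cov.stop()
cov.save()  # persist the collected data to the .coverage file
cov.json_report(outfile='coverage.json')  # write per-file line data as JSON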
```
- **pytest-cov**: pytest integration for coverage
```bash
pytest --cov=mymodule --cov-report=json
```
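
Either approach can produce a JSON report. The sketch below shows how such a report might be summarized per file. It assumes the layout written by coverage.py's JSON reporter (a top-level `files` mapping whose entries carry a `summary` with `percent_covered` and a `missing_lines` list); the function names are hypothetical, and the keys should be checked against the coverage.py version in use.

```python
def summarize_coverage(report: dict) -> dict[str, float]:
    """Map each measured file to its percentage of covered statements."""
    return {
        path: info["summary"]["percent_covered"]
        for path, info in report["files"].items()
    }


def uncovered_lines(report: dict, path: str) -> list[int]:
    """Line numbers in `path` that were never executed."""
    return list(report["files"][path]["missing_lines"])
```

A report loaded with `json.load(open("coverage.json"))` can be passed to both functions; unlike the sampled data from py-spy, the missing lines here are exact.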
## Future Enhancements
Potential improvements to the Python profiler:
1. Add coverage.py integration for true line-by-line coverage
2. Support multiple output formats (JSON, HTML, etc.)
3. Add memory profiling with memory_profiler
4. Report line-level execution counts
5. Add branch coverage analysis
6. Integrate with pytest for test coverage