# Scripts Directory
This directory contains utility scripts for building, testing, and benchmarking the delaunay library.
**Note**: Tests for the Python utilities are located in `scripts/tests/` and can be run with `uv run pytest`.
## Prerequisites
Before running these scripts, ensure you have the following dependencies installed:
### Python 3.11+ (Required)
```bash
# Install Python 3.11+ and uv package manager
brew install python@3.11 uv # macOS with Homebrew
# or follow installation instructions for your platform
```
### Additional Dependencies
```bash
# macOS (using Homebrew)
brew install jq findutils coreutils
# Ubuntu/Debian
sudo apt-get install -y jq
```
## Scripts Overview
### Python Utilities (Primary)
All Python utilities require Python 3.11+ and support `--help` for detailed usage. They cover benchmarking, changelog management, and hardware detection.
**Available Commands**:
- `uv run benchmark-utils` - Performance baseline generation and comparison
- `uv run changelog-utils` - Enhanced changelog generation with AI commit processing and git tagging
- `uv run hardware-utils` - Cross-platform hardware information detection
- `uv run enhance-commits` - AI-powered commit message enhancement (internal utility)
#### `benchmark_utils.py` 🐍
**Purpose**: Complete benchmark parsing, baseline generation, and performance comparison utilities.
**Features**:
- **Criterion JSON Parsing**: Direct parsing of Criterion's estimates.json for accuracy
- **Baseline Generation**: `generate-baseline` command with git metadata
- **Performance Comparison**: `compare` command with regression detection (>7.5% default threshold)
- **Flexible Baseline Formats**: Handles standard and tag-specific baseline file naming patterns
- **Automatic File Conversion**: Converts tag-specific baselines to standard format for compatibility
- **Hardware Integration**: Automatic hardware info inclusion and comparison
- **Development Mode**: `--dev` flag for faster benchmarks (10x speedup)
- **Timezone-Aware Dating**: Proper timezone handling for timestamps
- **Modern Python**: Python 3.11+ with type hints and union syntax
**Commands**:
```bash
# Generate performance baseline
uv run benchmark-utils generate-baseline [--dev] [--output FILE]
# Compare against baseline
uv run benchmark-utils compare --baseline FILE [--dev] [--output FILE]
```
**Output Format**:
```bash
# Baseline file format:
=== 10 Points (2D) ===
Time: [354.30, 356.10, 357.91] μs
Throughput: [28.135, 28.257, 28.381] Kelem/s
# Comparison file format:
Current Time: [338.45, 340.12, 341.78] μs
Baseline Time: [336.95, 338.61, 340.26] μs
Time Change: [+0.45%, +0.45%, +0.45%]
✅ OK: Time change within acceptable range
```
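As a rough illustration of what direct Criterion parsing involves, the sketch below reads a benchmark's mean estimate from `estimates.json` and applies the >7.5% threshold described above. The field names follow Criterion's on-disk format; the benchmark path and reporting are illustrative, and the utility's actual logic may differ.

```python
import json
from pathlib import Path

REGRESSION_THRESHOLD_PCT = 7.5  # default threshold noted above


def read_mean_ns(estimates_path: Path) -> tuple[float, float, float]:
    """Return (lower, point, upper) mean estimates in nanoseconds."""
    data = json.loads(estimates_path.read_text())
    mean = data["mean"]
    ci = mean["confidence_interval"]
    return ci["lower_bound"], mean["point_estimate"], ci["upper_bound"]


def is_regression(current_ns: float, baseline_ns: float) -> bool:
    """Flag a regression when time grows beyond the threshold."""
    change_pct = (current_ns - baseline_ns) / baseline_ns * 100.0
    status = "REGRESSION" if change_pct > REGRESSION_THRESHOLD_PCT else "OK"
    print(f"Time Change: {change_pct:+.2f}% -> {status}")
    return change_pct > REGRESSION_THRESHOLD_PCT


# Hypothetical benchmark directory; real names come from the suite:
# _, point, _ = read_mean_ns(Path("target/criterion/10 Points (2D)/new/estimates.json"))
```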
**Dependencies**: Python 3.11+, `hardware_utils.py`
**Regression Testing Workflow Commands**:
```bash
# Prepare downloaded baseline artifact (handles tag-specific files)
uv run benchmark-utils prepare-baseline [--baseline-dir DIR]
# Extract commit SHA from baseline artifact
uv run benchmark-utils extract-baseline-commit [--baseline-dir DIR]
# Determine if benchmarks should be skipped based on changes
uv run benchmark-utils determine-skip --baseline-commit SHA --current-commit SHA
# Run performance regression test
uv run benchmark-utils run-regression-test --baseline FILE
# Display regression test results
uv run benchmark-utils display-results [--results FILE]
# Generate regression testing summary
uv run benchmark-utils regression-summary
```
**Baseline File Compatibility**:
- **Standard format**: `baseline_results.txt` (always supported)
- **Tag-specific format**: `baseline-vX.Y.Z.txt` (automatically converted to standard format)
- **Generic format**: `baseline*.txt` (fallback for any baseline file)
- **Metadata support**: Uses `metadata.json` when baseline files lack commit info
---
#### `changelog_utils.py` 🐍
**Purpose**: Comprehensive changelog management tool with AI commit processing and Keep a Changelog categorization.
**Features**:
- **Enhanced Changelog Generation**: Creates changelogs with commit dates instead of tag creation dates
- **AI Commit Processing**: Uses `enhance_commits.py` for intelligent commit categorization
- **Git Tag Management**: Creates git tags with changelog content as tag messages
- **Squashed PR Expansion**: Advanced parsing of squashed PR commits to extract detailed commit message bodies
- **Multi-format Support**: Handles various commit message formats and bullet styles
- **Cross-platform Compatibility**: Works consistently across Windows, macOS, and Linux
- **Comprehensive Error Handling**: Clear error messages and usage instructions
**Commands**:
```bash
# Generate enhanced changelog (default command)
uv run changelog-utils
uv run changelog-utils generate
# Generate changelog with debug output (keeps intermediate files)
uv run changelog-utils generate --debug
# Create git tag with changelog content as message
uv run changelog-utils tag vX.Y.Z
# Force recreate existing tag
uv run changelog-utils tag vX.Y.Z --force
```
**Enhanced Features**:
- **Accurate Dating**: Shows when development work was actually completed
- **Squashed PR Expansion**: Extracts bullet points and descriptions from squashed commits
- **AI Categorization**: Uses Keep a Changelog format (Added/Changed/Fixed/Removed/Deprecated/Security)
- **GitHub Integration**: Tag messages work with `gh release create --notes-from-tag`
**Dependencies**: Python 3.11+, `enhance_commits.py`, `subprocess_utils.py`, `git-cliff`
---
#### `hardware_utils.py` 🐍
**Purpose**: Cross-platform hardware information detection and comparison.
**Features**:
- **Cross-platform**: macOS, Linux, Windows detection
- **Hardware Detection**: CPU (model, cores, threads), memory, Rust toolchain
- **Output Formats**: Formatted display (`info`), key=value pairs (`kv`), or JSON (`info --json`)
- **Baseline Comparison**: Hardware compatibility warnings
- **Modern Architecture Support**: Enhanced ARM/heterogeneous core detection
**Commands**:
```bash
# Display formatted hardware information
uv run hardware-utils info
# Display as key=value pairs
uv run hardware-utils kv
# Display as JSON
uv run hardware-utils info --json
# Compare with baseline file
uv run hardware-utils compare --baseline-file FILE
```
**Output Format**:
```bash
# Formatted output:
Hardware Information:
OS: macOS
CPU: Apple M2 Pro
CPU Cores: 10
CPU Threads: 10
Memory: 16.0 GB
Rust: rustc 1.89.0
Target: aarch64-apple-darwin
```
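The cross-platform detection pattern can be sketched in a few lines: pick the platform-appropriate tool, resolve its full path first (mirroring the secure-execution approach of `subprocess_utils.py`), then parse its output. The commands are the ones listed under Dependencies; the parsing is deliberately simplified.

```python
import platform
import shutil
import subprocess


def cpu_model() -> str:
    """Best-effort CPU model lookup; simplified compared to hardware_utils."""
    system = platform.system()
    if system == "Darwin":
        sysctl = shutil.which("sysctl")  # resolve full path before running
        if sysctl:
            return subprocess.run(
                [sysctl, "-n", "machdep.cpu.brand_string"],
                capture_output=True, text=True, check=True,
            ).stdout.strip()
    elif system == "Linux":
        lscpu = shutil.which("lscpu")
        if lscpu:
            out = subprocess.run(
                [lscpu], capture_output=True, text=True, check=True,
            ).stdout
            for line in out.splitlines():
                if line.startswith("Model name:"):
                    return line.split(":", 1)[1].strip()
    return platform.processor() or "unknown"


print(f"CPU: {cpu_model()}")
```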
**Dependencies**: Python 3.11+, `subprocess_utils.py`, system tools (`sysctl`, `lscpu`, PowerShell)
---
#### `enhance_commits.py` 🐍
**Purpose**: AI-powered commit message enhancement with Keep a Changelog categorization.
**Features**:
- **Keep a Changelog Format**: Categorizes commits as Added/Changed/Fixed/Removed/Deprecated/Security
- **Pattern Matching**: Advanced regex patterns for accurate categorization
- **Markdown Processing**: Handles markdown formatting and line wrapping
- **Internal Utility**: Used by `changelog_utils.py` for AI-enhanced changelog generation
**Usage**: This is an internal utility called by `changelog-utils`. Not typically used directly.
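For intuition, the categorization can be pictured as first-match regex rules over commit subjects, checked in Keep a Changelog order. The patterns below are illustrative only and are not the script's actual rule set:

```python
import re

# Illustrative rules only -- not the script's actual pattern set.
CATEGORY_PATTERNS: list[tuple[str, re.Pattern[str]]] = [
    ("Security", re.compile(r"\b(security|CVE|vulnerab)", re.I)),
    ("Removed", re.compile(r"\b(remove|delete|drop)\b", re.I)),
    ("Deprecated", re.compile(r"\bdeprecat", re.I)),
    ("Fixed", re.compile(r"\b(fix|bug|patch)\b", re.I)),
    ("Added", re.compile(r"\b(add|introduce|implement)\b", re.I)),
]


def categorize(subject: str) -> str:
    """Return the first matching Keep a Changelog category, else 'Changed'."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(subject):
            return category
    return "Changed"


assert categorize("fix: off-by-one in boundary facet iteration") == "Fixed"
```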
**Dependencies**: Python 3.11+
---
#### `subprocess_utils.py` 🐍
**Purpose**: Security-hardened subprocess execution shared by all Python utilities.
**Features**:
- **Secure Execution**: Uses full executable paths instead of command names
- **Executable Validation**: Validates executables exist before running
- **Consistent Error Handling**: Standardized error handling across all utilities
- **Security Mitigation**: Addresses security vulnerabilities flagged by static analysis
- **Git Integration**: Convenient wrappers for common git operations
**Key Functions**:
- `get_safe_executable(command)` - Get validated full path to executable
- `run_safe_command(command, args, **kwargs)` - Secure subprocess execution
- `run_git_command(args, **kwargs)` - Git-specific secure execution
- `run_cargo_command(args, **kwargs)` - Cargo-specific secure execution
- `check_git_repo()` - Validate git repository
- `check_git_history()` - Validate git history exists
**Usage**: This is a shared library used by all other Python utilities. Not typically used directly.
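For orientation, a hypothetical caller might look like the sketch below. Function names are from the list above; the return values are assumed to behave like `subprocess.CompletedProcess`, so check the module for the actual contracts.

```python
# Hypothetical caller; function names are from the list above, return-value
# details are assumptions.
from subprocess_utils import check_git_repo, run_cargo_command, run_git_command

check_git_repo()  # fail fast if we are not inside a git repository

# Secure wrappers resolve full executable paths before running.
head = run_git_command(["rev-parse", "HEAD"], capture_output=True, text=True)
print(f"Benchmarking commit {head.stdout.strip()}")

run_cargo_command(["bench", "--bench", "ci_performance_suite"])
```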
**Dependencies**: Python 3.11+ standard library
---
#### `compare_storage_backends.py` 🐍
**Purpose**: Compare SlotMap vs DenseSlotMap storage backend performance for Phase 4 evaluation.
**Features**:
- **Automated Comparison**: Runs benchmarks with both backends and generates detailed reports
- **Criterion Integration**: Parses Criterion output (JSON and text) for robust comparison
- **Performance Metrics**: Analyzes construction time, iteration speed, query performance, and validation overhead
- **Memory Tracking**: Reports RSS memory usage during benchmark runs
- **Summary Statistics**: Calculates average, best-case, and worst-case performance differences
- **Development Mode**: Fast iteration with reduced scale (`--dev` flag)
- **Markdown Reports**: Professional comparison reports with tables and recommendations
**Commands**:
```bash
# Run comparison with default settings
uv run compare-storage-backends
# Quick comparison (development mode)
uv run compare-storage-backends --dev
# Custom output location
uv run compare-storage-backends --output artifacts/storage_comparison.md
# Specify benchmark to run
uv run compare-storage-backends --bench large_scale_performance
# Filter specific benchmarks
uv run compare-storage-backends --filter "construction/3D"
```
**Report Contents**:
- Performance comparison table with percentage differences
- Summary statistics (average, best/worst case)
- Recommendations based on results
- Reproduction instructions
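The percentage-difference arithmetic behind the comparison table is straightforward; here is a sketch with made-up benchmark names and timings:

```python
# Hypothetical timings (seconds); real values come from Criterion output.
slotmap = {"construction/3D/1000": 1.82, "iteration/3D/1000": 0.41}
dense = {"construction/3D/1000": 1.67, "iteration/3D/1000": 0.44}

# Positive = DenseSlotMap slower, negative = DenseSlotMap faster.
diffs = {
    name: (dense[name] - slotmap[name]) / slotmap[name] * 100.0
    for name in slotmap
}
for name, pct in diffs.items():
    print(f"{name}: {pct:+.1f}% (DenseSlotMap vs SlotMap)")

values = list(diffs.values())
print(f"average {sum(values) / len(values):+.1f}%, "
      f"best {min(values):+.1f}%, worst {max(values):+.1f}%")
```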
**Dependencies**: Python 3.11+, `subprocess_utils.py`
---
### Shell Scripts (Specialized)
#### `slurm_storage_comparison.sh`
**Purpose**: Slurm HPC script for comprehensive SlotMap vs DenseSlotMap storage backend comparison.
This script benchmarks the library's two storage backend options (SlotMap and DenseSlotMap) on high-performance computing clusters using the Slurm
workload manager. It runs the `large_scale_performance` benchmark suite with each backend and generates detailed comparison reports.
**Features**:
- **Automated 3-phase execution**: SlotMap benchmarks → DenseSlotMap benchmarks → Analysis
- **Dual submission modes**: Self-submitting with `sbatch` or direct execution within Slurm job
- **Baseline saving**: Uses `--save-baseline` for precise Criterion comparisons with `critcmp`
- **Smart timeout management**: Automatically calculates per-phase timeouts from Slurm time limit
- **Build isolation**: Uses node-local scratch (`$SLURM_TMPDIR`) for fast compilation
- **Baseline preservation**: Backs up SlotMap results before `cargo clean` to enable comparison
- **Progress tracking**: Detailed timing and status for each phase
- **Artifact archiving**: Packages all results in timestamped tarball with merged baselines
- **critcmp integration**: Automatic detailed comparison if available
- **Error handling**: Per-phase timeout protection with status tracking
**Usage**:
```bash
# Standard comparison (4D with 1K, 3K points)
./scripts/slurm_storage_comparison.sh
sbatch scripts/slurm_storage_comparison.sh
# Large-scale comparison (4D with 1K, 5K, 10K points)
./scripts/slurm_storage_comparison.sh --large
# Custom time limit (default: 3 days)
./scripts/slurm_storage_comparison.sh --time=7-00:00:00
# Use specific partition (default: med2)
./scripts/slurm_storage_comparison.sh --partition=high2
# Use specific account (default: adamgrp)
./scripts/slurm_storage_comparison.sh --account=myaccount
# Large-scale with extended time (recommended for completion)
./scripts/slurm_storage_comparison.sh --large --time=14-00:00:00
# Combine multiple options
./scripts/slurm_storage_comparison.sh --partition=high2 --account=myaccount --large --time=7-00:00:00
# Help information
./scripts/slurm_storage_comparison.sh --help
```
**Submission Modes**:
1. **Self-submission** (no `SLURM_JOB_ID`): Submits itself to Slurm with specified options
2. **Direct execution** (inside Slurm job): Runs the benchmark workflow
The script automatically detects which mode to use, making it easy to submit jobs without writing separate submission scripts.
**Command-line Options**:
- `--large`: Enable large-scale benchmarks (4D with 1K, 5K, 10K points)
- `--time=DURATION`: Custom Slurm time limit (format: D-HH:MM:SS, default: 3-00:00:00)
- `--partition=NAME`: Slurm partition/queue to use (default: med2)
- `--account=NAME`: Slurm account/allocation to use (default: adamgrp)
- `--help, -h`: Show detailed help information
**Benchmark Scale**:
- **Standard** (default): 4D triangulations use [1K, 3K] points (~2-3h per backend, ~6h total)
- **Large** (`--large`): 4D triangulations use [1K, 5K, 10K] points (~4-6h per backend, ~12h total)
The `--large` flag sets `BENCH_LARGE_SCALE=1`, which is read by the benchmark suite to enable larger point counts for more comprehensive performance testing.
**Time Management**:
```bash
# Automatic timeout calculation from Slurm time limit:
# - Reserves 2 hours for cleanup/buffer
# - Splits remaining time equally between Phase 1 (SlotMap) and Phase 2 (DenseSlotMap)
# - Example: 3-day limit → ~34h per phase
# View calculated timeouts in job output:
squeue -j <job-id>
tail -f slurm-<job-id>-storage-comparison.out
```
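The timeout arithmetic is easy to reproduce. The sketch below uses the 2-hour buffer stated above; the script's exact constants and rounding may differ slightly (it reports ~34h per phase for a 3-day limit):

```python
def phase_timeout_hours(slurm_limit: str, buffer_hours: float = 2.0) -> float:
    """Split a D-HH:MM:SS Slurm time limit into two equal phase timeouts."""
    days, hms = slurm_limit.split("-") if "-" in slurm_limit else ("0", slurm_limit)
    h, m, s = (int(x) for x in hms.split(":"))
    total_hours = int(days) * 24 + h + m / 60 + s / 3600
    return (total_hours - buffer_hours) / 2


print(phase_timeout_hours("3-00:00:00"))  # 35.0 hours per phase with a 2h buffer
```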
**Prerequisites**:
- Slurm workload manager
- Rust toolchain (rustup) - loads `rust/1.93.1` module if available
- uv package manager
- GNU coreutils (`timeout` command)
- critcmp (optional but recommended): `cargo install critcmp`
**Cluster Configuration**:
You can configure Slurm options in two ways:
1. **Command-line arguments** (recommended):
```bash
./scripts/slurm_storage_comparison.sh --account=your_account --partition=your_partition --time=7-00:00:00
```
2. **Edit script header** (for permanent defaults):
```bash
#SBATCH --account=your_account # Billing account (default: adamgrp)
#SBATCH --partition=your_partition # Compute partition (default: med2)
#SBATCH --time=3-00:00:00 # Job time limit (3 days default)
#SBATCH --cpus-per-task=8 # CPU cores
#SBATCH --mem=32G # Memory allocation
```
Command-line arguments override the script header defaults.
**Output Files**:
```text
artifacts/
├── storage_comparison_<job-id>_<timestamp>.md # Main comparison report
├── storage-comparison-<job-id>/ # Full archive directory
│ ├── criterion/ # Merged Criterion reports
│ │ ├── <benchmark>/ # Per-benchmark results
│ │ │ ├── slotmap/ # SlotMap baseline
│ │ │ ├── denseslotmap/ # DenseSlotMap baseline
│ │ │ └── report/ # HTML reports
│ └── report.md # Report copy
└── storage-comparison-<job-id>.tar.gz # Compressed archive
slurm-<job-id>-storage-comparison.out # Job stdout (progress log)
slurm-<job-id>-storage-comparison.err # Job stderr (errors/warnings)
```
**Analysis Workflow**:
```bash
# 1. Monitor job progress
squeue -j <job-id> # Check job status
tail -f slurm-<job-id>-storage-comparison.out # Live progress
# 2. After completion, view report on cluster
cat artifacts/storage_comparison_<job-id>_<timestamp>.md
# 3. Use critcmp for detailed comparison (if installed)
critcmp slotmap denseslotmap
# 4. Download results for local analysis
scp cluster:/path/to/artifacts/storage-comparison-<job-id>.tar.gz .
tar -xzf storage-comparison-<job-id>.tar.gz
cd storage-comparison-<job-id>
# 5. View HTML reports in browser
open criterion/*/report/index.html # macOS
xdg-open criterion/*/report/index.html # Linux
# 6. Compare with critcmp locally (requires criterion directory)
critcmp slotmap denseslotmap
```
**Understanding Results**:
The comparison report includes:
- **Job metadata**: Job ID, node, mode, duration for each phase
- **Baseline locations**: Paths to saved Criterion baselines
- **critcmp output**: Detailed performance comparison (if available)
- **Status tracking**: Success/timeout/failure for each phase
Criterion baselines are saved as:
- `denseslotmap`: DenseSlotMap backend (the default)
- `slotmap`: SlotMap backend (run with `--no-default-features`)
These can be compared using `critcmp slotmap denseslotmap` or Criterion's CLI tools.
**Common Issues**:
1. **Module loading failures**: Script continues with PATH-based Rust/Python if modules unavailable
2. **NFS .nfs* files**: `cargo clean` warnings are expected on shared filesystems, script continues
3. **Timeout before completion**: Increase time limit with `--time=` or use standard mode instead of `--large`
4. **Missing critcmp**: Install with `cargo install critcmp` for detailed comparison output
**Environment Variables**:
- `BENCH_LARGE_SCALE=1`: Automatically set when using `--large` flag
- `CARGO_TARGET_DIR`: Set to node-local scratch for faster builds
- `CARGO_UPDATE_IN_JOB=1`: Optional, runs `cargo update` before benchmarks (default: skip)
- `PROJECT_DIR`: Project root directory (default: current directory)
**Related**: See [Issue #74](https://github.com/acgetchell/delaunay/issues/74) for Phase 4 storage backend evaluation.
**Dependencies**: Slurm, Rust toolchain, cargo, uv, GNU coreutils, optional critcmp
---
#### `run_all_examples.sh`
**Purpose**: Executes all example programs in the project to verify functionality.
**Features**:
- Automatically discovers all examples in the `examples/` directory
- Runs examples in release mode for representative performance
- Creates results directory structure
**Usage**:
```bash
./scripts/run_all_examples.sh
```
**Dependencies**: Requires `cargo`, `find`, `sort` (GNU sort preferred but not required)
---
#### Git Tagging from Changelog (Python-based)
**Purpose**: Creates git tags with changelog content as tag messages for seamless GitHub release integration.
**Modern Implementation**: Uses Python utilities instead of shell scripts for better cross-platform compatibility and maintainability.
**Usage**:
```bash
# Create new tag with changelog content
uv run changelog-utils tag vX.Y.Z
# Force recreate existing tag
uv run changelog-utils tag vX.Y.Z --force
# Show help information
uv run changelog-utils tag --help
```
**Features**:
- **Automatic changelog extraction**: Parses CHANGELOG.md to find version-specific content
- **Multiple version formats**: Supports `## [X.Y.Z]`, `## vX.Y.Z`, and `## X.Y.Z` headers
- **GitHub release integration**: Tag messages work with `gh release create --notes-from-tag`
- **Safety checks**: Validates git repository, changelog existence, and version format
- **Force recreation**: Option to recreate existing tags with `--force` flag
- **Smart content extraction**: Removes headers and cleans whitespace automatically
- **Preview functionality**: Shows tag message preview before creation
- **Comprehensive error handling**: Clear error messages and usage instructions
- **Cross-platform compatibility**: Works consistently across Windows, macOS, and Linux
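A simplified sketch of the version-section extraction, handling the three header variants listed above (the utility's real parser covers more edge cases):

```python
import re


def extract_section(changelog: str, version: str) -> str | None:
    """Pull one version's section body out of CHANGELOG.md text (simplified)."""
    # Accept '## [X.Y.Z]', '## vX.Y.Z', or '## X.Y.Z' headers, e.g. version="1.2.3".
    header = re.compile(rf"^##\s+\[?v?{re.escape(version)}\]?.*$", re.MULTILINE)
    match = header.search(changelog)
    if match is None:
        return None
    start = match.end()
    # Section runs until the next '## ' header or end of file.
    nxt = re.search(r"^##\s", changelog[start:], re.MULTILINE)
    end = start + nxt.start() if nxt else len(changelog)
    return changelog[start:end].strip()


# e.g. extract_section(Path("CHANGELOG.md").read_text(), "1.2.3")
```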
**Integration with GitHub Releases**:
```bash
# Workflow for GitHub releases:
# 1. Create tag with changelog content:
uv run changelog-utils tag vX.Y.Z
# 2. Push tag to remote:
git push origin vX.Y.Z
# 3. Create GitHub release using tag message:
gh release create vX.Y.Z --notes-from-tag
```
**Advanced Usage**:
```bash
# The changelog-utils tool also supports changelog generation:
uv run changelog-utils generate # Generate enhanced changelog
uv run changelog-utils generate --debug # Keep intermediate files for debugging
```
**Dependencies**: Requires Python 3.11+, `uv`, `git`, and access to CHANGELOG.md
---
## Workflow Examples
### Tag Baselines (CI)
Baselines used by CI are generated and stored as GitHub Actions artifacts (not committed to the repo).
```bash
# Create and push a version tag
git tag vX.Y.Z
git push origin vX.Y.Z
# This triggers `.github/workflows/generate-baseline.yml` and uploads an artifact named:
# performance-baseline-vX_Y_Z (for tag vX.Y.Z; dots replaced with underscores)
# containing:
# baseline_results.txt
# metadata.json
```
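The artifact-naming rule (dots replaced with underscores) in code form, as a one-line illustration:

```python
def baseline_artifact_name(tag: str) -> str:
    """Map a git tag like 'v1.2.3' to its CI baseline artifact name."""
    return f"performance-baseline-{tag.replace('.', '_')}"


assert baseline_artifact_name("v1.2.3") == "performance-baseline-v1_2_3"
```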
You can fetch (and optionally regenerate missing/expired) baselines locally:
```bash
# Download the baseline artifact for a tag into baseline-artifacts/<tag>/
uv run benchmark-utils fetch-baseline --tag vX.Y.Z
# If the artifact is missing/expired, dispatch generate-baseline.yml and wait for it
uv run benchmark-utils fetch-baseline --tag vX.Y.Z --regenerate-missing
```
Compare any two tags locally without re-running benchmarks:
```bash
uv run benchmark-utils compare-tags --old-tag vX.Y.Z --new-tag vA.B.C
```
### Performance Regression Testing (Development)
```bash
# 1. Make code changes
# ... your modifications ...
# 2. Test for performance regressions
uv run benchmark-utils compare --baseline baseline-artifact/baseline_results.txt
# 3. Review results in benches/compare_results.txt
# 4. If regressions are acceptable, update the local baseline:
uv run benchmark-utils generate-baseline
```
### Fast Development Workflow (Development Mode)
```bash
# Quick iteration during development using --dev flag
# (Reduces benchmark time from ~10 minutes to ~30 seconds)
# 1. Make code changes
# ... your modifications ...
# 2. Quick performance check
uv run benchmark-utils compare --baseline baseline-artifact/baseline_results.txt --dev
# 3. If major changes needed, generate new dev baseline:
uv run benchmark-utils generate-baseline --dev
# 4. Final validation with full benchmarks before commit:
uv run benchmark-utils generate-baseline # Full baseline
uv run benchmark-utils compare --baseline baseline-artifact/baseline_results.txt # Full comparison
```
**Development Mode Benefits**:
- **10x faster**: Reduces sample size and measurement time
- **Quick feedback**: Ideal for iterative development
- **Useful signal**: Still detects significant performance changes despite the reduced sampling
- **Settings**: `sample_size=10, measurement_time=2s, warmup_time=1s`
### Changelog Generation Workflow
```bash
# 1. Make commits and create git tags
git tag vX.Y.Z
git push origin vX.Y.Z
# 2. Generate updated changelog with accurate commit dates and AI enhancement
uv run changelog-utils generate
# 3. Review and commit the updated changelog
git add CHANGELOG.md
git commit -m "Update changelog with AI enhancement for vX.Y.Z"
git push origin main
```
### Git Tagging from Changelog
```bash
# Create new tag with changelog content for GitHub releases
uv run changelog-utils tag vX.Y.Z
# Force recreate existing tag
uv run changelog-utils tag vX.Y.Z --force
# Push tag and create GitHub release
git push origin vX.Y.Z
gh release create vX.Y.Z --notes-from-tag
```
**Benefits of Using changelog-utils**:
- **Accurate Dating**: Shows when development work was actually completed
- **AI Enhancement**: Categorizes commits using Keep a Changelog format
- **Squashed PR Expansion**: Extracts detailed information from squashed commits
- **Professional Presentation**: Avoids all releases showing the same tag creation date
- **GitHub Integration**: Seamless integration with GitHub releases
### Manual Benchmark Analysis
```bash
# 1. Run benchmarks directly (CI performance suite)
cargo bench --bench ci_performance_suite
# 2. Generate new baseline
uv run benchmark-utils generate-baseline
# 3. Compare against previous baseline
uv run benchmark-utils compare --baseline baseline-artifact/baseline_results.txt
```
**CI Performance Suite**: The benchmark utilities now use `benches/ci_performance_suite.rs` for CI/CD-optimized performance testing:
- **Dimensions**: 2D, 3D, 4D, and 5D triangulations.
- **Point counts**: [10, 25, 50].
- **Runtime**: ~5–10 minutes.
- **Coverage**: Core triangulation performance across all supported dimensions.
**Migration Notes**:
- The CI performance suite now includes 2D triangulations for comprehensive coverage
- Existing baselines remain compatible as the CI suite maintains the same benchmark format
- Development workflow unchanged - use `--dev` flag for fast iteration
### Continuous Integration
The repository includes automated performance regression testing via GitHub Actions:
#### Automated baseline generation
- **Workflow file**: `.github/workflows/generate-baseline.yml`
- **Trigger**: Automatic on git tag creation
- **Artifacts**: Creates performance baseline artifacts for download
- **Integration**: Baselines are automatically available for benchmark comparisons
#### Separate Benchmark Workflow
- **Workflow file**: `.github/workflows/benchmarks.yml`
- **Trigger conditions**:
- Manual trigger (`workflow_dispatch`)
- Pushes to `main` branch affecting performance-critical files
- Changes to `src/`, `benches/`, `Cargo.toml`, `Cargo.lock`
#### CI Behavior
```bash
# If baseline exists:
# 1. Finds the latest semver tag baseline artifact (performance-baseline-vX_Y_Z) from generate-baseline.yml runs
# 2. Downloads and normalizes it to baseline-artifact/baseline_results.txt
# 3. Runs uv run benchmark-utils compare --baseline baseline-artifact/baseline_results.txt
# 4. Uploads comparison results (benches/compare_results.txt) as artifacts
# If no baseline exists:
# 1. Logs instructions for creating a baseline
# 2. Skips regression testing (does not fail CI)
# 3. Suggests creating a git tag to generate a baseline automatically
```
#### CI Integration Benefits
- **Automated baseline management**: No manual baseline commits needed
- **Stable artifact format**: Baseline artifacts contain `baseline_results.txt` (plus `metadata.json`)
- **Automatic normalization**: Ensures CI always uses `baseline-artifact/baseline_results.txt`
- **Separate from main CI**: Avoids slowing down regular development workflow
- **Environment consistency**: Uses macOS runners (Apple Silicon) for reproducible benchmark comparisons
- **Smart triggering**: Only runs on changes that could affect performance
- **Graceful degradation**: Skips if baseline missing, with clear setup instructions
- **Artifact collection**: Stores benchmark results for historical analysis
## Error Handling and Troubleshooting
### Common Issues
1. **Missing Dependencies**: Install required packages using your system's package manager
2. **Permission Errors**: Ensure scripts are executable with `chmod +x scripts/*.sh`
3. **Path Issues**: Run scripts from the project root directory
4. **Missing Baseline**: Create a git tag to automatically generate baseline via CI, or run `uv run benchmark-utils generate-baseline` locally
5. **Python Version**: Ensure Python 3.11+ is installed and available
6. **Baseline Format Issues**: CI expects a baseline artifact that contains `baseline_results.txt` (plus optional metadata).
### Exit Codes
- `0` - Success
- `1` - General error
- `2` - Missing dependency
- `3` - File/directory not found
### Debug Mode
```bash
# For Python scripts, use built-in help and verbose options
uv run benchmark-utils --help
uv run changelog-utils --help
uv run hardware-utils --help
# For changelog generation with debug output
uv run changelog-utils generate --debug
# For shell scripts
bash -x ./scripts/run_all_examples.sh
```
## Development Integration
### CI/CD Integration
The scripts are fully integrated with GitHub Actions workflows:
- **`generate-baseline.yml`**: Automatically generates performance baselines on git tag creation
- **`benchmarks.yml`**: Runs performance regression testing on relevant changes
- **`ci.yml`**: Includes Python code quality checks for all utilities
### Code Quality
All Python scripts are automatically checked in CI:
```bash
# Format Python code
uvx ruff format scripts/
# Lint and auto-fix Python code
uvx ruff check --fix scripts/
# Run tests
uv run pytest
```
### Module Organization
- **`subprocess_utils.py`**: Shared security-hardened subprocess utilities
- **`benchmark_utils.py`**: Standalone benchmarking functionality
- **`changelog_utils.py`**: Shared changelog operations and utilities
- **`hardware_utils.py`**: Standalone hardware detection functionality
- **`enhance_commits.py`**: AI-powered commit categorization (used by changelog_utils)
This modular design ensures code reuse and maintainability across all utilities.
## Script Maintenance
All scripts follow consistent patterns:
### Python Scripts
- **Modern Python**: Python 3.11+ with type hints and union syntax
- **Security**: Uses `subprocess_utils.py` for secure subprocess execution
- **Error Handling**: Custom exception classes with clear error messages
- **Configuration**: Uses `pyproject.toml` for dependencies and tool configuration
- **Code Quality**: Comprehensive linting with ruff and formatting standards
### Shell Scripts
- **Error Handling**: Strict mode with `set -euo pipefail`
- **Dependency Checking**: Validation of required commands
- **Usage Information**: Help text with `--help` flag
- **Project Root Detection**: Automatic detection of project directory
- **Error Messages**: Descriptive error output to stderr
When modifying scripts, maintain these patterns for consistency and reliability.