# benchkit
[docs.rs](https://docs.rs/benchkit)
[Discord](https://discord.gg/m3YfbXpUUY)
**Practical, Documentation-First Benchmarking for Rust.**
`benchkit` is a lightweight toolkit for performance analysis, born from the hard-learned lessons of optimizing high-performance libraries. It rejects rigid, all-or-nothing frameworks in favor of flexible, composable tools that integrate seamlessly into your existing workflow.
> 🎯 **NEW TO benchkit?** Start with [`usage.md`](usage.md) - Mandatory standards and requirements from production systems.
## The Benchmarking Dilemma
In Rust, developers often face a frustrating choice:
1. **The Heavy Framework (`criterion`):** Statistically powerful, but forces a rigid structure (`benches/`), complex setup, and produces reports that are difficult to integrate into your project's documentation. You must adapt your project to the framework.
2. **The Manual Approach (`std::time`):** Simple to start, but statistically naive. It leads to boilerplate, inconsistent measurements, and conclusions that are easily skewed by system noise.
`benchkit` offers a third way.
> **📋 Important**: For production use and development contributions, see [`usage.md`](usage.md) for mandatory standards and proven patterns drawn from production systems.
## A Toolkit, Not a Framework
This is the core philosophy of `benchkit`. It doesn't impose a workflow; it provides a set of professional, composable tools that you can use however you see fit.
* ✅ **Integrate Anywhere:** Write benchmarks in your test files, examples, or binaries. No required directory structure.
* ✅ **Documentation-First:** Treat performance reports as a first-class part of your documentation, with tools to automatically keep them in sync with your code.
* ✅ **Practical Focus:** Surface the key metrics needed for optimization decisions, hiding deep statistical complexity until you ask for it.
* ✅ **Zero Setup:** Start measuring performance in minutes with a simple, intuitive API.
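To show how little setup that means in practice, here is a minimal sketch of a complete measurement; the statistics accessors are the same ones used in the Measure section below.

```rust
use benchkit::prelude::*;

fn main()
{
  // One call is the entire setup: measure a closure, then read the stats.
  let result = bench_function( "sum_1000", ||
  {
    std::hint::black_box( ( 0..1000u64 ).sum::< u64 >() );
  });
  println!( "Avg time: {:.2?}", result.mean_time() );
  println!( "Throughput: {:.0} ops/sec", result.operations_per_second() );
}
```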
---
## 🚀 Quick Start: Compare, Analyze, and Document
This example demonstrates the core `benchkit` workflow: comparing two algorithms and automatically updating a performance section in your `readme.md`.
**1. Add to your `Cargo.toml`:**
```toml
[dev-dependencies]
benchkit = { version = "0.8.0", features = [ "full" ] }

# The Quick Start benchmark provides its own `main`, so opt out of
# the default libtest harness for this target.
[[bench]]
name = "performance_demo"
harness = false
```
**2. Create a benchmark in the `benches/` directory:**
```rust
// In benches/performance_demo.rs
use benchkit::prelude::*;

fn generate_data( size : usize ) -> Vec< u32 >
{
  // Reverse-sorted input gives both sorting algorithms real work to do.
  ( 0..size as u32 ).rev().collect()
}

fn main()
{
  let mut comparison = ComparativeAnalysis::new( "Sorting Algorithms" );
  let data = generate_data( 1000 );

  // Benchmark the first algorithm
  comparison = comparison.algorithm
  (
    "std_stable_sort",
    {
      let mut d = data.clone();
      move ||
      {
        d.sort();
      }
    }
  );

  // Benchmark the second algorithm
  comparison = comparison.algorithm
  (
    "std_unstable_sort",
    {
      let mut d = data.clone();
      move ||
      {
        d.sort_unstable();
      }
    }
  );

  // Run the comparison and update readme.md
  let report = comparison.run();
  let markdown = report.to_markdown();
  let updater = MarkdownUpdater::new( "readme.md", "Benchmark Results" ).unwrap();
  updater.update_section( &markdown ).unwrap();
}
```
**3. Run your benchmark and watch readme.md update automatically:**
```bash
cargo bench --bench performance_demo
```
---
## 🧰 What's in the Toolkit?
`benchkit` provides a suite of composable tools. Use only what you need.
### 🆕 Enhanced Features
<details>
<summary><strong>🔥 NEW: Comprehensive Regression Analysis System</strong></summary>
Advanced performance regression detection with statistical analysis and trend identification.
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
use std::time::{ Duration, SystemTime };
fn regression_analysis_example() -> Result< (), Box< dyn std::error::Error > >
{
// Current benchmark results
let mut current_results = HashMap::new();
let current_times = vec![ Duration::from_micros( 85 ), Duration::from_micros( 88 ), Duration::from_micros( 82 ) ];
current_results.insert( "fast_sort".to_string(), BenchmarkResult::new( "fast_sort", current_times ) );
// Historical baseline data
let mut baseline_data = HashMap::new();
let baseline_times = vec![ Duration::from_micros( 110 ), Duration::from_micros( 115 ), Duration::from_micros( 108 ) ];
baseline_data.insert( "fast_sort".to_string(), BenchmarkResult::new( "fast_sort", baseline_times ) );
let historical = HistoricalResults::new().with_baseline( baseline_data );
// Configure regression analyzer
let analyzer = RegressionAnalyzer::new()
.with_baseline_strategy( BaselineStrategy::FixedBaseline )
.with_significance_threshold( 0.05 ) // 5% significance level
.with_trend_window( 5 );
// Perform regression analysis
let regression_report = analyzer.analyze( &current_results, &historical );
// Check results
if regression_report.has_significant_changes() {
println!( "📊 Significant performance changes detected!" );
if let Some( trend ) = regression_report.get_trend_for( "fast_sort" ) {
match trend {
PerformanceTrend::Improving => println!( "🟢 Performance improved!" ),
PerformanceTrend::Degrading => println!( "🔴 Performance regression detected!" ),
PerformanceTrend::Stable => println!( "🟡 Performance remains stable" ),
}
}
// Generate professional markdown report
let markdown_report = regression_report.format_markdown();
println!( "{}", markdown_report );
}
Ok(())
}
```
**Key Features:**
- **Three Baseline Strategies**: Fixed baseline, rolling average, and previous run comparison (see the configuration sketch at the end of this section)
- **Statistical Significance**: Configurable thresholds with proper statistical testing
- **Trend Detection**: Automatic identification of improving, degrading, or stable performance
- **Professional Reports**: Publication-quality markdown with statistical analysis
- **CI/CD Integration**: Automated regression detection for deployment pipelines
- **Historical Data Management**: Long-term performance tracking with quality validation
**Use Cases:**
- Automated performance regression detection in CI/CD pipelines
- Long-term performance monitoring and trend analysis
- Code optimization validation with statistical confidence
- Production deployment gates with zero-regression tolerance
- Performance documentation with automated updates
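The three baseline strategies differ only in how the reference point is chosen. A minimal configuration sketch follows; only `FixedBaseline` appears verbatim above, so the other two variant names are assumptions based on the feature list:

```rust
use benchkit::prelude::*;

fn configure_analyzers()
{
  // Fixed baseline: compare every run against one pinned reference measurement.
  let fixed = RegressionAnalyzer::new()
    .with_baseline_strategy( BaselineStrategy::FixedBaseline );

  // Rolling average: compare against the mean of recent runs to smooth noise.
  // `RollingAverage` is an assumed variant name for the strategy described above.
  let rolling = RegressionAnalyzer::new()
    .with_baseline_strategy( BaselineStrategy::RollingAverage )
    .with_trend_window( 5 ); // average over the five most recent runs

  // Previous run: compare strictly against the immediately preceding run.
  // `PreviousRun` is likewise an assumed variant name.
  let previous = RegressionAnalyzer::new()
    .with_baseline_strategy( BaselineStrategy::PreviousRun );
}
```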
</details>
<details>
<summary><strong>Safe Update Chain Pattern - Atomic Documentation Updates</strong></summary>
Coordinate multiple markdown section updates atomically - either all succeed or none are modified.
```rust
use benchkit::prelude::*;
fn update_markdown_atomically() -> Result< (), Box< dyn std::error::Error > >
{
let performance_markdown = "## Performance Results\n\nFast!";
let memory_markdown = "## Memory Usage\n\nLow!";
let cpu_markdown = "## CPU Usage\n\nOptimal!";
// Update multiple sections atomically
let chain = MarkdownUpdateChain::new("readme.md")?
.add_section("Performance Benchmarks", performance_markdown)
.add_section("Memory Analysis", memory_markdown)
.add_section("CPU Profiling", cpu_markdown);
// Validate all sections before any updates
let conflicts = chain.check_all_conflicts()?;
if !conflicts.is_empty() {
return Err(format!("Section conflicts detected: {:?}", conflicts).into());
}
// Atomic update - either all succeed or all fail
chain.execute()?;
Ok(())
}
```
**Key Features:**
- **Atomic Operations**: Either all sections update successfully or none are modified
- **Conflict Detection**: Validates all sections exist and are unambiguous before any changes
- **Automatic Rollback**: Failed operations restore original file state
- **Reduced I/O**: Single read and write operation instead of multiple file accesses
- **Error Recovery**: Comprehensive error handling with detailed diagnostics
**Use Cases:**
- Multi-section benchmark reports that must stay synchronized
- CI/CD pipelines requiring consistent documentation updates
- Coordinated updates across large documentation projects
- Production deployments where partial updates would be problematic
**Advanced Example:**
```rust
use benchkit::prelude::*;
fn complex_update_example() -> Result< (), Box< dyn std::error::Error > >
{
let performance_report = "Performance analysis results";
let memory_report = "Memory usage analysis";
let comparison_report = "Algorithm comparison data";
let validation_report = "Quality assessment report";
// Complex coordinated update across multiple report types
let chain = MarkdownUpdateChain::new("PROJECT_BENCHMARKS.md")?
.add_section("Performance Analysis", performance_report)
.add_section("Memory Usage Analysis", memory_report)
.add_section("Algorithm Comparison", comparison_report)
.add_section("Quality Assessment", validation_report);
// Validate everything before committing any changes
match chain.check_all_conflicts() {
Ok(conflicts) if conflicts.is_empty() => {
println!("✅ All {} sections validated", chain.len());
chain.execute()?;
},
Ok(conflicts) => {
eprintln!("⚠️ Conflicts: {:?}", conflicts);
// Handle conflicts or use more specific section names
},
Err(e) => eprintln!("❌ Validation failed: {}", e),
}
Ok(())
}
```
</details>
<details>
<summary><strong>Professional Report Templates - Research-Grade Documentation</strong></summary>
Generate standardized, publication-quality reports with full statistical analysis and customizable sections.
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
fn generate_reports() -> Result< (), Box< dyn std::error::Error > >
{
let results = HashMap::new();
let comparison_results = HashMap::new();
// Comprehensive performance analysis
let performance_template = PerformanceReport::new()
.title("Algorithm Performance Analysis")
.add_context("Comparing sequential vs parallel processing approaches")
.include_statistical_analysis(true)
.include_regression_analysis(true)
.add_custom_section(CustomSection::new(
"Implementation Notes",
"Detailed implementation considerations and optimizations applied"
));
let performance_report = performance_template.generate(&results)?;
// A/B testing comparison with statistical significance
let comparison_template = ComparisonReport::new()
.title("Sequential vs Parallel Processing Comparison")
.baseline("Sequential Processing")
.candidate("Parallel Processing")
.significance_threshold(0.01) // 1% statistical significance
.practical_significance_threshold(0.05); // 5% practical significance
let comparison_report = comparison_template.generate(&comparison_results)?;
Ok(())
}
```
**Performance Report Features:**
- **Executive Summary**: Key metrics and performance indicators
- **Statistical Analysis**: Confidence intervals, coefficient of variation, reliability assessment
- **Performance Tables**: Sorted results with throughput, latency, and quality indicators
- **Custom Sections**: Domain-specific analysis and recommendations
- **Professional Formatting**: Publication-ready markdown with proper statistical notation
**Comparison Report Features:**
- **Significance Testing**: Both statistical and practical significance analysis
- **Confidence Intervals**: 95% CI analysis with overlap detection
- **Performance Ratios**: Clear improvement/regression percentages
- **Reliability Assessment**: Quality validation for both baseline and candidate
- **Decision Support**: Clear recommendations based on statistical analysis
**Advanced Template Composition:**
```rust
use benchkit::prelude::*;
fn create_enterprise_template() -> PerformanceReport
{
// Create domain-specific template with multiple custom sections
let enterprise_template = PerformanceReport::new()
.title("Enterprise Algorithm Performance Audit")
.add_context("Monthly performance review for production trading systems")
.include_statistical_analysis(true)
.add_custom_section(CustomSection::new(
"Risk Assessment",
r#"### Performance Risk Analysis
| Algorithm | Latency Risk | Throughput Risk | Stability | Overall |
|-----------|-------------|-----------------|-----------|----------|
| Current | 🟢 Low | 🟡 Medium | 🟢 Low | 🟡 Medium |
| Proposed | 🟢 Low | 🟢 Low | 🟢 Low | 🟢 Low |"#
))
.add_custom_section(CustomSection::new(
"Business Impact",
r#"### Projected Business Impact
- **Latency Improvement**: 15% faster response times
- **Throughput Increase**: +2,000 req/sec capacity
- **Cost Reduction**: -$50K/month in infrastructure
- **SLA Compliance**: 99.9% → 99.99% uptime"#
));
enterprise_template
}
```
</details>
<details>
<summary><strong>Benchmark Validation Framework - Quality Assurance</strong></summary>
Comprehensive quality assessment system with configurable criteria and automatic reliability analysis.
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
use std::time::Duration;
fn validate_benchmark_results()
{
let results = HashMap::new();
// Configure validator for your specific requirements
let validator = BenchmarkValidator::new()
.min_samples(20) // Require 20+ measurements
.max_coefficient_variation(0.10) // 10% maximum variability
.require_warmup(true) // Detect warm-up periods
.max_time_ratio(3.0) // 3x max/min ratio
.min_measurement_time(Duration::from_micros(50)); // 50μs minimum duration
// Validate all results with detailed analysis
let validated_results = ValidatedResults::new(results, validator);
println!("Reliability: {:.1}%", validated_results.reliability_rate());
// Get detailed quality warnings
if let Some(warnings) = validated_results.reliability_warnings() {
println!("⚠️ Quality Issues Detected:");
for warning in warnings {
println!(" - {}", warning);
}
}
// Work with only statistically reliable results
let reliable_only = validated_results.reliable_results();
println!("Using {}/{} reliable benchmarks for analysis",
reliable_only.len(), validated_results.results.len());
}
```
**Validation Criteria:**
- **Sample Size**: Ensure sufficient measurements for statistical power
- **Variability**: Detect high coefficient of variation indicating noise
- **Measurement Duration**: Flag measurements that may be timing-resolution limited
- **Performance Range**: Identify outliers and wide performance distributions
- **Warm-up Detection**: Verify proper system warm-up for consistent results
**Warning Types:**
- `InsufficientSamples`: Too few measurements for reliable statistics
- `HighVariability`: Coefficient of variation exceeds threshold
- `ShortMeasurementTime`: Measurements may be affected by timer resolution
- `WidePerformanceRange`: Large ratio between fastest/slowest measurements
- `NoWarmup`: Missing warm-up period may indicate measurement issues
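Acting on these warnings programmatically is a matter of inspecting the output of `reliability_warnings()`. A sketch, assuming the rendered warning text contains the type names listed above:

```rust
use benchkit::prelude::*;
use std::collections::HashMap;

fn gate_on_quality( results : HashMap< String, BenchmarkResult > ) -> bool
{
  let validator = BenchmarkValidator::new().min_samples( 20 );
  let validated = ValidatedResults::new( results, validator );

  if let Some( warnings ) = validated.reliability_warnings()
  {
    for warning in &warnings
    {
      // Treat sample-size problems as fatal, everything else as advisory.
      let text = format!( "{}", warning );
      if text.contains( "InsufficientSamples" )
      {
        eprintln!( "fatal: {}", text );
        return false;
      }
      eprintln!( "advisory: {}", text );
    }
  }
  true
}
```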
**Domain-Specific Validation:**
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
fn domain_specific_validation()
{
let results = HashMap::new();
// Real-time systems validation (very strict)
let realtime_validator = BenchmarkValidator::new()
.min_samples(50)
.max_coefficient_variation(0.02) // 2% maximum
.max_time_ratio(1.5); // Very tight timing
// Interactive systems validation (balanced)
let interactive_validator = BenchmarkValidator::new()
.min_samples(15)
.max_coefficient_variation(0.15) // 15% acceptable
.require_warmup(false); // Interactive may not show warmup
// Batch processing validation (lenient)
let batch_validator = BenchmarkValidator::new()
.min_samples(10)
.max_coefficient_variation(0.25) // 25% acceptable
.max_time_ratio(5.0); // Allow more variation
// Apply appropriate validator for your domain
let domain_results = ValidatedResults::new(results, realtime_validator);
}
```
**Quality Reporting:**
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
fn generate_validation_report()
{
let results = HashMap::new();
let validator = BenchmarkValidator::new();
// Generate comprehensive validation report
let validation_report = validator.generate_validation_report(&results);
// Validation report includes:
// - Summary statistics and reliability rates
// - Detailed warnings with improvement recommendations
// - Validation criteria documentation
// - Quality assessment for each benchmark
// - Actionable steps to improve measurement quality
println!("{}", validation_report);
}
```
</details>
<details>
<summary><strong>Complete Integration Examples</strong></summary>
Comprehensive examples demonstrating real-world usage patterns and advanced integration scenarios.
**Development Workflow Integration:**
```rust
use benchkit::prelude::*;
// Complete development cycle: benchmark → validate → document → commit
fn development_workflow() -> Result< (), Box< dyn std::error::Error > >
{
// Mock implementations for doc test
fn quicksort_implementation() {}
fn mergesort_implementation() {}
// 1. Run benchmarks
let mut suite = BenchmarkSuite::new("Algorithm Performance");
suite.benchmark("quicksort", || quicksort_implementation());
suite.benchmark("mergesort", || mergesort_implementation());
let results = suite.run_all();
// 2. Validate quality
let validator = BenchmarkValidator::new()
.min_samples(15)
.max_coefficient_variation(0.15);
let validated_results = ValidatedResults::new(results.results, validator);
if validated_results.reliability_rate() < 80.0 {
return Err("Benchmark quality insufficient for analysis".into());
}
// 3. Generate professional report
let template = PerformanceReport::new()
.title("Algorithm Performance Analysis")
.include_statistical_analysis(true)
.add_custom_section(CustomSection::new(
"Development Notes",
"Analysis conducted during algorithm optimization phase"
));
let report = template.generate(&validated_results.results)?;
// 4. Update documentation atomically
let chain = MarkdownUpdateChain::new("README.md")?
.add_section("Performance Analysis", report)
.add_section("Quality Assessment", validated_results.validation_report());
chain.execute()?;
println!("✅ Development documentation updated successfully");
Ok(())
}
```
**CI/CD Pipeline Integration:**
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
// Automated performance regression detection
fn cicd_performance_check(
  baseline_results: HashMap<String, BenchmarkResult>,
  pr_results: HashMap<String, BenchmarkResult>
) -> Result< bool, Box< dyn std::error::Error > >
{
// Validate both result sets
let validator = BenchmarkValidator::new().require_warmup(false);
let baseline_validated = ValidatedResults::new(baseline_results.clone(), validator.clone());
let pr_validated = ValidatedResults::new(pr_results.clone(), validator);
// Require high quality for regression analysis
if baseline_validated.reliability_rate() < 90.0 || pr_validated.reliability_rate() < 90.0 {
println!("❌ BLOCK: Insufficient benchmark quality for regression analysis");
return Ok(false);
}
// Compare performance for regression detection
let comparison = ComparisonReport::new()
.title("Performance Regression Analysis")
.baseline("baseline_version")
.candidate("pr_version")
.practical_significance_threshold(0.05); // 5% regression threshold
// Create combined results for comparison
let mut combined = HashMap::new();
combined.insert("baseline_version".to_string(),
baseline_results.values().next().unwrap().clone());
combined.insert("pr_version".to_string(),
pr_results.values().next().unwrap().clone());
let regression_report = comparison.generate(&combined)?;
// Check for regressions
let has_regression = regression_report.contains("slower");
if has_regression {
println!("❌ BLOCK: Performance regression detected");
// Save detailed report for review
std::fs::write("regression_analysis.md", regression_report)?;
Ok(false)
} else {
println!("✅ ALLOW: No performance regressions detected");
Ok(true)
}
}
```
**Multi-Project Coordination:**
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
// Coordinate benchmark updates across multiple related projects
fn coordinate_multi_project_benchmarks() -> Result< (), Box< dyn std::error::Error > >
{
let projects = vec!["web-api", "batch-processor", "realtime-analyzer"];
let mut all_results = HashMap::new();
// Collect results from all projects
for project in &projects {
let project_results = run_project_benchmarks(project)?;
all_results.extend(project_results);
}
// Cross-project validation with lenient criteria
let validator = BenchmarkValidator::new()
.max_coefficient_variation(0.25) // Different environments have more noise
.require_warmup(false);
let cross_project_validated = ValidatedResults::new(all_results.clone(), validator);
// Generate consolidated impact analysis
let impact_template = PerformanceReport::new()
.title("Cross-Project Performance Impact Analysis")
.add_context("Shared library upgrade impact across all dependent projects")
.include_statistical_analysis(true)
.add_custom_section(CustomSection::new(
"Project Impact Summary",
format_project_impact_analysis(&projects, &all_results)
));
let impact_report = impact_template.generate(&all_results)?;
// Update shared documentation
let shared_chain = MarkdownUpdateChain::new("SHARED_LIBRARY_IMPACT.md")?
.add_section("Current Impact Analysis", &impact_report)
.add_section("Quality Assessment", &cross_project_validated.validation_report());
shared_chain.execute()?;
// Notify project maintainers
notify_project_teams(&projects, &impact_report)?;
Ok(())
}
// Helper functions for the example
fn run_project_benchmarks(_project: &str) -> Result< HashMap< String, BenchmarkResult >, Box< dyn std::error::Error > >
{
// Mock implementation for doc test
Ok(HashMap::new())
}
fn format_project_impact_analysis(_projects: &[&str], _results: &HashMap< String, BenchmarkResult >) -> String
{
// Mock implementation for doc test
"Impact analysis summary".to_string()
}
fn notify_project_teams(_projects: &[&str], _report: &str) -> Result< (), Box< dyn std::error::Error > >
{
// Mock implementation for doc test
Ok(())
}
```
</details>
<details>
<summary><strong>Measure: Core Timing and Profiling</strong></summary>
At its heart, `benchkit` provides simple and accurate measurement primitives.
```rust
use benchkit::prelude::*;
// A robust measurement with multiple iterations and statistical cleanup.
let result = bench_function
(
  "summation_1000",
  ||
  {
    std::hint::black_box( ( 0..1000 ).fold( 0, | acc, x | acc + x ) );
  }
);
println!( "Avg time: {:.2?}", result.mean_time() );
println!( "Throughput: {:.0} ops/sec", result.operations_per_second() );
// Track memory usage patterns alongside timing.
let memory_benchmark = MemoryBenchmark::new( "allocation_test" );
let ( _timing, memory_stats ) = memory_benchmark.run_with_tracking
(
  10,
  ||
  {
    let data = vec![ 0u8; 1024 ];
    memory_benchmark.tracker.record_allocation( 1024 );
    std::hint::black_box( data );
  }
);
println!( "Peak memory usage: {} bytes", memory_stats.peak_usage );
```
</details>
<details>
<summary><strong>Analyze: Find Insights and Regressions</strong></summary>
Turn raw numbers into actionable insights.
```rust
use benchkit::prelude::*;
// Compare multiple implementations to find the best one.
let report = ComparativeAnalysis::new( "Hashing" )
{
println!( "Fastest algorithm: {}", fastest_name );
}
// Example benchmark results
let result_a = bench_function( "test_a", || { /* ... */ } );
let result_b = bench_function( "test_b", || { /* ... */ } );
// Compare two benchmark results
let comparison = result_a.compare( &result_b );
if comparison.is_improvement()
{
println!( "Performance improved!" );
}
```
</details>
<details>
<summary><strong>Generate: Create Realistic Test Data</strong></summary>
Stop writing boilerplate to create test data. `benchkit` provides generators for common scenarios.
```rust
use benchkit::prelude::*;
// Generate a comma-separated list of 100 items.
let list_data = generate_list_data( DataSize::Medium );
// Generate realistic unilang command strings for parser benchmarking.
let command_generator = DataGenerator::new()
.complexity( DataComplexity::Complex );
let commands = command_generator.generate_unilang_commands( 10 );
// Create reproducible data with a specific seed.
let mut seeded_gen = SeededGenerator::new( 42 );
let random_data = seeded_gen.random_string( 1024 );
```
</details>
<details>
<summary><strong>Document: Automate Your Reports</strong></summary>
The "documentation-first" philosophy is enabled by powerful report generation and file updating tools.
```rust,no_run
use benchkit::prelude::*;
fn main() -> Result< (), Box< dyn std::error::Error > >
{
let mut suite = BenchmarkSuite::new( "api_performance" );
suite.benchmark( "get_user", || { /* ... */ } );
suite.benchmark( "create_user", || { /* ... */ } );
let results = suite.run_analysis();
// Generate a markdown report from the results.
let markdown_report = results.generate_markdown_report().generate();
// Automatically update the "## Performance" section of a file.
let updater = MarkdownUpdater::new( "readme.md", "Performance" )?;
updater.update_section( &markdown_report )?;
Ok( () )
}
```
</details>
## The `benchkit` Workflow
`benchkit` is designed to make performance analysis a natural part of your development cycle.
```text
[ 1. Write Code ] -> [ 2. Add Benchmark in `benches/` ] -> [ 3. Run `cargo bench` ]
          ^                                                            |
          |                                                            v
[ 5. Commit Code + Perf Docs ] <- [ 4. Auto-Update `benchmark_results.md` ] <- [ Analyze Results ]
```
## 📁 MANDATORY `benches/` Directory - NO ALTERNATIVES
**ABSOLUTE REQUIREMENT**: ALL benchmark-related files MUST be in the `benches/` directory. This is NON-NEGOTIABLE for proper benchkit functionality:
- 🚫 **NEVER in `tests/`**: Benchmarks are NOT tests and MUST NOT be mixed with unit tests
- 🚫 **NEVER in `examples/`**: Examples are demonstrations, NOT performance measurements
- 🚫 **NEVER in `src/bin/`**: Source binaries are NOT benchmarks
- ✅ **ONLY in `benches/`**: This is the EXCLUSIVE location for ALL benchmark content
**Why This Requirement Exists:**
- ⚡ **Cargo Requirement**: `cargo bench` ONLY works with `benches/` directory
- 🏗️ **Ecosystem Standard**: ALL professional Rust projects use `benches/` EXCLUSIVELY
- 🔧 **Tool Compatibility**: IDEs, CI systems, linters expect benchmarks ONLY in `benches/`
- 📊 **Performance Isolation**: Benchmarks require different compilation and execution than tests
### Why This Matters
**Ecosystem Integration**: The `benches/` directory is the official Rust standard, ensuring compatibility with the entire Rust toolchain.
**Zero Configuration**: `cargo bench` automatically discovers and runs benchmarks in the `benches/` directory without additional setup.
**Community Expectations**: Developers expect to find benchmarks in `benches/` - this follows the principle of least surprise.
**Tool Compatibility**: All Rust tooling (IDEs, CI/CD, linters) is designed around the standard `benches/` structure.
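In practice, a conforming project looks like the sketch below; `cargo bench` discovers every target under `benches/` automatically, and a `[[bench]]` entry is only needed for `harness = false` targets such as the Quick Start example:

```text
my_crate/
├── Cargo.toml
├── src/
│   └── lib.rs
└── benches/
    ├── performance_demo.rs
    └── comprehensive_benchmark.rs
```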
### Automatic Documentation Updates
`benchkit` excels at maintaining comprehensive, automatically updated documentation in your project files:
```markdown
# Benchmark Results
## Algorithm Comparison
| Algorithm | Mean Time | Throughput | Relative |
|-----------|-----------|------------|----------|
| quicksort | 1.23ms | 815 ops/s | baseline |
| mergesort | 1.45ms | 689 ops/s | 1.18x |
| heapsort | 1.67ms | 599 ops/s | 1.36x |
*Last updated: 2024-01-15 14:32:18 UTC*
*Generated by benchkit v0.8.0*
## Performance Trends
- quicksort maintains consistent performance across data sizes
- mergesort shows better cache behavior on large datasets
- heapsort provides predictable O(n log n) guarantees
## Test Configuration
- Hardware: 16-core AMD Ryzen, 32GB RAM
- Rust version: 1.75.0
- Optimization: --release
- Iterations: 1000 per benchmark
```
This documentation is automatically generated and updated every time you run benchmarks.
### Integration Examples
```rust,no_run
// ✅ In the `benches/` directory, as the standard above requires
// benches/performance_comparison.rs
use benchkit::prelude::*;
fn main()
{
let mut suite = BenchmarkSuite::new( "Algorithm Comparison" );
suite.benchmark( "quick_sort", ||
{
// Your quicksort implementation
});
suite.benchmark( "merge_sort", ||
{
// Your mergesort implementation
});
let results = suite.run_all();
// Automatically update readme.md with results
let updater = MarkdownUpdater::new( "readme.md", "Performance" ).unwrap();
updater.update_section( &results.generate_markdown_report().generate() ).unwrap();
}
```
```rust,no_run
// ✅ A second `benches/` target for broader analysis
// benches/comprehensive_analysis.rs
use benchkit::prelude::*;
fn main()
{
let mut comprehensive = BenchmarkSuite::new( "Comprehensive Performance Analysis" );
// Add multiple benchmarks
comprehensive.benchmark( "data_processing", || { /* code */ } );
comprehensive.benchmark( "memory_operations", || { /* code */ } );
comprehensive.benchmark( "io_operations", || { /* code */ } );
let results = comprehensive.run_all();
// Update readme.md with comprehensive report
let report = results.generate_markdown_report();
let updater = MarkdownUpdater::new( "readme.md", "Performance Analysis" ).unwrap();
updater.update_section( &report.generate() ).unwrap();
println!( "Updated readme.md with latest performance results" );
}
```
### 🔧 Feature Flag Recommendations
For optimal build performance and clean separation, put your benchmark code behind feature flags:
```rust,no_run
// ✅ A feature-gated benchmark target
// benches/comprehensive_benchmark.rs
#[ cfg( feature = "enabled" ) ]
use benchkit::prelude::*;
#[ cfg( feature = "enabled" ) ]
fn main()
{
  let mut suite = BenchmarkSuite::new( "Comprehensive Performance Suite" );
  suite.benchmark( "algorithm_a", || { /* implementation */ } );
  suite.benchmark( "algorithm_b", || { /* implementation */ } );
  let results = suite.run_all();
  // Automatically update readme.md
let updater = MarkdownUpdater::new( "readme.md", "Latest Results" ).unwrap();
updater.update_section( &results.generate_markdown_report().generate() ).unwrap();
println!( "Benchmarks completed - readme.md updated" );
}
#[ cfg( not( feature = "enabled" ) ) ]
fn main()
{
println!( "Run with: cargo run --bin comprehensive_benchmark --features enabled" );
println!( "Results will be automatically saved to readme.md" );
}
```
Add to your `Cargo.toml`:
```toml
[features]
enabled = [ "benchkit" ]

[dependencies]
# Cargo does not allow optional dev-dependencies, so gate benchkit
# behind the `enabled` feature as an optional regular dependency.
benchkit = { version = "0.8.0", features = [ "full" ], optional = true }
```
Run benchmarks selectively:
```bash
# Run only unit tests (fast)
cargo test

# Run a specific benchmark target (updates readme.md)
cargo bench --bench comprehensive_benchmark --features enabled

# Run every benchmark target under benches/
cargo bench --features enabled
```
This approach keeps your regular builds fast while making comprehensive performance testing available when needed.
## 📚 Comprehensive Examples
`benchkit` includes extensive examples demonstrating every feature and usage pattern:
### 🎯 Feature-Specific Examples
- **[Update Chain Comprehensive](examples/update_chain_comprehensive.rs)**: Complete demonstration of atomic documentation updates
- Single and multi-section updates with conflict detection
- Error handling and recovery patterns
- Advanced conflict resolution strategies
- Performance optimization for bulk updates
- Full integration with validation and templates
- **[Templates Comprehensive](examples/templates_comprehensive.rs)**: Professional report generation in all scenarios
- Basic and fully customized Performance Report templates
- A/B testing with Comparison Report templates
- Custom sections with advanced markdown formatting
- Multiple comparison scenarios and batch processing
- Business impact analysis and risk assessment templates
- Comprehensive error handling for edge cases
- **[Validation Comprehensive](examples/validation_comprehensive.rs)**: Quality assurance for reliable benchmarking
- Default and custom validator configurations
- Individual warning types with detailed analysis
- Validation report generation and interpretation
- Reliable results filtering for analysis
- Domain-specific validation scenarios (research, development, production, micro)
- Full integration with templates and update chains
- **[Regression Analysis Comprehensive](examples/regression_analysis_comprehensive.rs)**: Complete regression analysis system demonstration
- All baseline strategies (Fixed, Rolling Average, Previous Run)
- Performance trend detection (Improving, Degrading, Stable)
- Statistical significance testing with configurable thresholds
- Professional markdown report generation with regression insights
- Real-world optimization scenarios and configuration guidance
- Full integration with PerformanceReport templates
- **[Historical Data Management](examples/historical_data_management.rs)**: Managing long-term performance data
- Incremental historical data building and TimestampedResults creation
- Data quality validation and cleanup procedures
- Performance trend analysis across multiple time windows
- Storage and serialization strategy recommendations
- Data retention and archival best practices
- Integration with RegressionAnalyzer for trend detection
### 🔧 Integration Examples
- **[Integration Workflows](examples/integration_workflows.rs)**: Real-world workflow automation
- Development cycle: benchmark → validate → document → commit
- CI/CD pipeline: regression detection → merge decision → automated reporting
- Multi-project coordination: impact analysis → consolidated reporting → team alignment
- Production monitoring: continuous tracking → alerting → dashboard updates
- **[Error Handling Patterns](examples/error_handling_patterns.rs)**: Robust operation under adverse conditions
- Update Chain file system errors (permissions, conflicts, recovery)
- Template generation errors (missing data, invalid parameters)
- Validation framework edge cases (malformed data, extreme variance)
- System errors (resource limits, concurrent access)
- Graceful degradation strategies with automatic fallbacks
- **[Advanced Usage Patterns](examples/advanced_usage_patterns.rs)**: Enterprise-scale benchmarking
- Domain-specific validation criteria (real-time, interactive, batch processing)
- Template composition and inheritance patterns
- Coordinated multi-document updates with consistency guarantees
- Memory-efficient large-scale processing (1000+ algorithms)
- Performance optimization techniques (caching, concurrency, incremental processing)
- **[CI/CD Regression Detection](examples/cicd_regression_detection.rs)**: Automated performance validation in CI/CD pipelines
- Multi-environment validation (development, staging, production)
- Configurable regression thresholds and statistical significance levels
- Automated performance gate decisions with proper exit codes
- GitHub Actions compatible reporting and documentation updates
- Progressive validation pipeline with halt-on-failure
- Real-world CI/CD integration patterns and best practices
- **🚨 [Cargo Bench Integration](examples/cargo_bench_integration.rs)**: CRITICAL - Standard `cargo bench` integration patterns
- Seamless integration with Rust's standard `cargo bench` command
- Automatic documentation updates during benchmark execution
- Standard `benches/` directory structure support
- Criterion compatibility layer for zero-migration adoption
- CI/CD integration with standard workflows and conventions
- Real-world project structure and configuration examples
- **This is the foundation requirement for benchkit adoption**
### 🚀 Running the Examples
```bash
# Feature-specific examples
cargo run --example update_chain_comprehensive --all-features
cargo run --example templates_comprehensive --all-features
cargo run --example validation_comprehensive --all-features
# NEW: Regression Analysis Examples
cargo run --example regression_analysis_comprehensive --all-features
cargo run --example historical_data_management --all-features
# Integration examples
cargo run --example integration_workflows --all-features
cargo run --example error_handling_patterns --all-features
cargo run --example advanced_usage_patterns --all-features
# NEW: CI/CD Integration Example
cargo run --example cicd_regression_detection --all-features
# 🚨 CRITICAL: Cargo Bench Integration Example
cargo run --example cargo_bench_integration --all-features
# Original enhanced features demo
cargo run --example enhanced_features_demo --all-features
```
Each example is fully documented with detailed explanations and demonstrates production-ready patterns you can adapt to your specific needs.
## Installation
Add `benchkit` to your `[dev-dependencies]` in `Cargo.toml`.
```toml
[dev-dependencies]
# For core functionality (pick one of the two lines)
benchkit = "0.8.0"

# Or enable all features for the full toolkit
benchkit = { version = "0.8.0", features = [ "full" ] }
```
## 📋 Development Guidelines & Best Practices
**⚠️ IMPORTANT**: Before using benchkit in production or contributing to development, **strongly review** the comprehensive [`usage.md`](usage.md) file. This document contains essential requirements, best practices, and lessons learned from real-world performance analysis work.
The recommendations cover:
- ✅ **Core philosophy** and toolkit vs framework principles
- ✅ **Technical architecture** requirements and feature organization
- ✅ **Performance analysis** best practices with standardized data patterns
- ✅ **Documentation integration** requirements for automated reporting
- ✅ **Statistical analysis** requirements for reliable measurements
**📖 Read [`usage.md`](usage.md) first** - it will save you time and ensure you're following proven patterns.
## Contributing
Contributions are welcome! `benchkit` aims to be a community-driven toolkit that solves real-world benchmarking problems.
**Before contributing:**
1. **📖 Read [`usage.md`](usage.md)** - Contains all development requirements and design principles
2. Review open tasks in the [`task/`](task/) directory
3. Check our contribution guidelines
All contributions must align with the principles and requirements outlined in [`usage.md`](usage.md).
## License
This project is licensed under the **MIT License**.
## Performance
*This section is automatically updated by benchkit when you run benchmarks.*