Crate benchkit

Lightweight benchmarking toolkit focused on practical performance analysis and report generation.

§benchkit

Practical, Documentation-First Benchmarking for Rust.

benchkit is a lightweight toolkit for performance analysis, born from the hard-learned lessons of optimizing high-performance libraries. It rejects rigid, all-or-nothing frameworks in favor of flexible, composable tools that integrate seamlessly into your existing workflow.

🎯 NEW TO benchkit? Start with recommendations.md - Essential guidelines from real-world performance optimization experience.

§The Benchmarking Dilemma

In Rust, developers often face a frustrating choice:

  1. The Heavy Framework (criterion): Statistically powerful, but forces a rigid structure (benches/), complex setup, and produces reports that are difficult to integrate into your project’s documentation. You must adapt your project to the framework.
  2. The Manual Approach (std::time): Simple to start, but statistically naive. It leads to boilerplate, inconsistent measurements, and conclusions that are easily skewed by system noise.

benchkit offers a third way.

📋 Important: For production use and development contributions, see recommendations.md - a comprehensive guide with proven patterns, requirements, and best practices from real-world benchmarking experience.

§A Toolkit, Not a Framework

This is the core philosophy of benchkit. It doesn’t impose a workflow; it provides a set of professional, composable tools that you can use however you see fit.

  • βœ… Integrate Anywhere: Write benchmarks in your test files, examples, or binaries. No required directory structure.
  • βœ… Documentation-First: Treat performance reports as a first-class part of your documentation, with tools to automatically keep them in sync with your code.
  • βœ… Practical Focus: Surface the key metrics needed for optimization decisions, hiding deep statistical complexity until you ask for it.
  • βœ… Zero Setup: Start measuring performance in minutes with a simple, intuitive API.

§🚀 Quick Start: Compare, Analyze, and Document

📖 First time? Review recommendations.md for comprehensive best practices and development guidelines.

This example demonstrates the core benchkit workflow: comparing two algorithms and automatically updating a performance section in your readme.md.

1. Add to dev-dependencies in Cargo.toml:

[dev-dependencies]
benchkit = { version = "0.1", features = [ "full" ] }

2. Create a benchmark test in your tests/ directory:

// In tests/performance_demo.rs
use benchkit::prelude::*;

fn generate_data( size : usize ) -> Vec< u32 >
{
  ( 0..size ).map( | x | x as u32 ).collect()
}

#[ test ]
fn update_readme_performance_docs()
{
  let mut comparison = ComparativeAnalysis::new( "Sorting Algorithms" );
  let data = generate_data( 1000 );

  // Benchmark the first algorithm
  comparison = comparison.algorithm
  (
    "std_stable_sort",
    {
      let data = data.clone();
      move ||
      {
        // Clone inside the closure so each iteration sorts unsorted input.
        let mut d = data.clone();
        d.sort();
      }
    }
  );

  // Benchmark the second algorithm
  comparison = comparison.algorithm
  (
    "std_unstable_sort",
    {
      let data = data.clone();
      move ||
      {
        let mut d = data.clone();
        d.sort_unstable();
      }
    }
  );

  // Run the comparison and update readme.md
  let report = comparison.run();
  let markdown = report.to_markdown();

  let updater = MarkdownUpdater::new( "readme.md", "Benchmark Results" ).unwrap();
  updater.update_section( &markdown ).unwrap();
}

3. Run your benchmark and watch readme.md update automatically:

cargo test --release --test performance_demo

§🧰 What’s in the Toolkit?

benchkit provides a suite of composable tools. Use only what you need.

§🆕 Enhanced Features

🔥 NEW: Comprehensive Regression Analysis System

Advanced performance regression detection with statistical analysis and trend identification.

use benchkit::prelude::*;
use std::collections::HashMap;
use std::time::Duration;

fn regression_analysis_example() -> Result< (), Box< dyn std::error::Error > > {
    // Current benchmark results
    let mut current_results = HashMap::new();
    let current_times = vec![ Duration::from_micros( 85 ), Duration::from_micros( 88 ), Duration::from_micros( 82 ) ];
    current_results.insert( "fast_sort".to_string(), BenchmarkResult::new( "fast_sort", current_times ) );
    
    // Historical baseline data
    let mut baseline_data = HashMap::new();
    let baseline_times = vec![ Duration::from_micros( 110 ), Duration::from_micros( 115 ), Duration::from_micros( 108 ) ];
    baseline_data.insert( "fast_sort".to_string(), BenchmarkResult::new( "fast_sort", baseline_times ) );
    
    let historical = HistoricalResults::new().with_baseline( baseline_data );
    
    // Configure regression analyzer
    let analyzer = RegressionAnalyzer::new()
        .with_baseline_strategy( BaselineStrategy::FixedBaseline )
        .with_significance_threshold( 0.05 )  // 5% significance level
        .with_trend_window( 5 );
    
    // Perform regression analysis
    let regression_report = analyzer.analyze( &current_results, &historical );
    
    // Check results
    if regression_report.has_significant_changes() {
        println!( "πŸ“Š Significant performance changes detected!" );
        
        if let Some( trend ) = regression_report.get_trend_for( "fast_sort" ) {
            match trend {
                PerformanceTrend::Improving => println!( "🟒 Performance improved!" ),
                PerformanceTrend::Degrading => println!( "πŸ”΄ Performance regression detected!" ),
                PerformanceTrend::Stable => println!( "🟑 Performance remains stable" ),
            }
        }
        
        // Generate professional markdown report
        let markdown_report = regression_report.format_markdown();
        println!( "{}", markdown_report );
    }
    
    Ok(())
}

Key Features:

  • Three Baseline Strategies: Fixed baseline, rolling average, and previous run comparison
  • Statistical Significance: Configurable thresholds with proper statistical testing
  • Trend Detection: Automatic identification of improving, degrading, or stable performance
  • Professional Reports: Publication-quality markdown with statistical analysis
  • CI/CD Integration: Automated regression detection for deployment pipelines
  • Historical Data Management: Long-term performance tracking with quality validation

Use Cases:

  • Automated performance regression detection in CI/CD pipelines
  • Long-term performance monitoring and trend analysis
  • Code optimization validation with statistical confidence
  • Production deployment gates with zero-regression tolerance
  • Performance documentation with automated updates

Safe Update Chain Pattern - Atomic Documentation Updates

Coordinate multiple markdown section updates atomically - either all succeed or none are modified.

use benchkit::prelude::*;

fn update_markdown_atomically() -> Result< (), Box< dyn std::error::Error > > {
    let performance_markdown = "## Performance Results\n\nFast!";
    let memory_markdown = "## Memory Usage\n\nLow!";
    let cpu_markdown = "## CPU Usage\n\nOptimal!";
    
    // Update multiple sections atomically
    let chain = MarkdownUpdateChain::new("readme.md")?
        .add_section("Performance Benchmarks", performance_markdown)
        .add_section("Memory Analysis", memory_markdown)
        .add_section("CPU Profiling", cpu_markdown);

    // Validate all sections before any updates
    let conflicts = chain.check_all_conflicts()?;
    if !conflicts.is_empty() {
        return Err(format!("Section conflicts detected: {:?}", conflicts).into());
    }

    // Atomic update - either all succeed or all fail
    chain.execute()?;
    Ok(())
}

Key Features:

  • Atomic Operations: Either all sections update successfully or none are modified
  • Conflict Detection: Validates all sections exist and are unambiguous before any changes
  • Automatic Rollback: Failed operations restore original file state
  • Reduced I/O: Single read and write operation instead of multiple file accesses
  • Error Recovery: Comprehensive error handling with detailed diagnostics

Use Cases:

  • Multi-section benchmark reports that must stay synchronized
  • CI/CD pipelines requiring consistent documentation updates
  • Coordinated updates across large documentation projects
  • Production deployments where partial updates would be problematic

Advanced Example:

use benchkit::prelude::*;

fn complex_update_example() -> Result< (), Box< dyn std::error::Error > > {
    let performance_report = "Performance analysis results";
    let memory_report = "Memory usage analysis";
    let comparison_report = "Algorithm comparison data";
    let validation_report = "Quality assessment report";
    
    // Complex coordinated update across multiple report types
    let chain = MarkdownUpdateChain::new("PROJECT_BENCHMARKS.md")?
        .add_section("Performance Analysis", performance_report)
        .add_section("Memory Usage Analysis", memory_report)
        .add_section("Algorithm Comparison", comparison_report)
        .add_section("Quality Assessment", validation_report);

    // Validate everything before committing any changes
    match chain.check_all_conflicts() {
        Ok(conflicts) if conflicts.is_empty() => {
            println!("βœ… All {} sections validated", chain.len());
            chain.execute()?;
        },
        Ok(conflicts) => {
            eprintln!("⚠️ Conflicts: {:?}", conflicts);
            // Handle conflicts or use more specific section names
        },
        Err(e) => eprintln!("❌ Validation failed: {}", e),
    }
    Ok(())
}

Professional Report Templates - Research-Grade Documentation

Generate standardized, publication-quality reports with full statistical analysis and customizable sections.

use benchkit::prelude::*;
use std::collections::HashMap;

fn generate_reports() -> Result< (), Box< dyn std::error::Error > > {
    let results = HashMap::new();
    let comparison_results = HashMap::new();
    
    // Comprehensive performance analysis
    let performance_template = PerformanceReport::new()
        .title("Algorithm Performance Analysis")
        .add_context("Comparing sequential vs parallel processing approaches")
        .include_statistical_analysis(true)
        .include_regression_analysis(true)
        .add_custom_section(CustomSection::new(
            "Implementation Notes", 
            "Detailed implementation considerations and optimizations applied"
        ));

    let performance_report = performance_template.generate(&results)?;

    // A/B testing comparison with statistical significance
    let comparison_template = ComparisonReport::new()
        .title("Sequential vs Parallel Processing Comparison")
        .baseline("Sequential Processing")
        .candidate("Parallel Processing") 
        .significance_threshold(0.01)     // 1% statistical significance
        .practical_significance_threshold(0.05);  // 5% practical significance

    let comparison_report = comparison_template.generate(&comparison_results)?;
    Ok(())
}

Performance Report Features:

  • Executive Summary: Key metrics and performance indicators
  • Statistical Analysis: Confidence intervals, coefficient of variation, reliability assessment
  • Performance Tables: Sorted results with throughput, latency, and quality indicators
  • Custom Sections: Domain-specific analysis and recommendations
  • Professional Formatting: Publication-ready markdown with proper statistical notation

Comparison Report Features:

  • Significance Testing: Both statistical and practical significance analysis
  • Confidence Intervals: 95% CI analysis with overlap detection
  • Performance Ratios: Clear improvement/regression percentages
  • Reliability Assessment: Quality validation for both baseline and candidate
  • Decision Support: Clear recommendations based on statistical analysis

Advanced Template Composition:

use benchkit::prelude::*;

fn create_enterprise_template() -> PerformanceReport {
    // Create domain-specific template with multiple custom sections
    let enterprise_template = PerformanceReport::new()
        .title("Enterprise Algorithm Performance Audit")
        .add_context("Monthly performance review for production trading systems")
        .include_statistical_analysis(true)
        .add_custom_section(CustomSection::new(
            "Risk Assessment",
            r#"### Performance Risk Analysis
            
    | Algorithm | Latency Risk | Throughput Risk | Stability | Overall |
    |-----------|-------------|-----------------|-----------|----------|
    | Current   | 🟢 Low     | 🟡 Medium      | 🟢 Low   | 🟡 Medium |
    | Proposed  | 🟢 Low     | 🟢 Low        | 🟢 Low   | 🟢 Low    |"#
        ))
        .add_custom_section(CustomSection::new(
            "Business Impact",
            r#"### Projected Business Impact

    - **Latency Improvement**: 15% faster response times
    - **Throughput Increase**: +2,000 req/sec capacity
    - **Cost Reduction**: -$50K/month in infrastructure
    - **SLA Compliance**: 99.9% → 99.99% uptime"#
        ));
    enterprise_template
}

Benchmark Validation Framework - Quality Assurance

Comprehensive quality assessment system with configurable criteria and automatic reliability analysis.

use benchkit::prelude::*;
use std::collections::HashMap;
use std::time::Duration;

fn validate_benchmark_results() {
    let results = HashMap::new();
    
    // Configure validator for your specific requirements
    let validator = BenchmarkValidator::new()
        .min_samples(20)                           // Require 20+ measurements
        .max_coefficient_variation(0.10)           // 10% maximum variability
        .require_warmup(true)                      // Detect warm-up periods
        .max_time_ratio(3.0)                       // 3x max/min ratio
        .min_measurement_time(Duration::from_micros(50)); // 50μs minimum duration

    // Validate all results with detailed analysis
    let validated_results = ValidatedResults::new(results, validator);

    println!("Reliability: {:.1}%", validated_results.reliability_rate());

    // Get detailed quality warnings
    if let Some(warnings) = validated_results.reliability_warnings() {
        println!("⚠️ Quality Issues Detected:");
        for warning in warnings {
            println!("  - {}", warning);
        }
    }

    // Work with only statistically reliable results
    let reliable_only = validated_results.reliable_results();
    println!("Using {}/{} reliable benchmarks for analysis", 
             reliable_only.len(), validated_results.results.len());
}

Validation Criteria:

  • Sample Size: Ensure sufficient measurements for statistical power
  • Variability: Detect high coefficient of variation indicating noise
  • Measurement Duration: Flag measurements that may be timing-resolution limited
  • Performance Range: Identify outliers and wide performance distributions
  • Warm-up Detection: Verify proper system warm-up for consistent results

Warning Types:

  • InsufficientSamples: Too few measurements for reliable statistics
  • HighVariability: Coefficient of variation exceeds threshold
  • ShortMeasurementTime: Measurements may be affected by timer resolution
  • WidePerformanceRange: Large ratio between fastest/slowest measurements
  • NoWarmup: Missing warm-up period may indicate measurement issues

Domain-Specific Validation:

use benchkit::prelude::*;
use std::collections::HashMap;

fn domain_specific_validation() {
    let results = HashMap::new();
    
    // Real-time systems validation (very strict)
    let realtime_validator = BenchmarkValidator::new()
        .min_samples(50)
        .max_coefficient_variation(0.02)  // 2% maximum
        .max_time_ratio(1.5);             // Very tight timing

    // Interactive systems validation (balanced)  
    let interactive_validator = BenchmarkValidator::new()
        .min_samples(15)
        .max_coefficient_variation(0.15)  // 15% acceptable
        .require_warmup(false);           // Interactive may not show warmup

    // Batch processing validation (lenient)
    let batch_validator = BenchmarkValidator::new()
        .min_samples(10)
        .max_coefficient_variation(0.25)  // 25% acceptable  
        .max_time_ratio(5.0);             // Allow more variation

    // Apply appropriate validator for your domain
    let domain_results = ValidatedResults::new(results, realtime_validator);
}

Quality Reporting:

use benchkit::prelude::*;
use std::collections::HashMap;

fn generate_validation_report() {
    let results = HashMap::new();
    let validator = BenchmarkValidator::new();
    
    // Generate comprehensive validation report
    let validation_report = validator.generate_validation_report(&results);

    // Validation report includes:
    // - Summary statistics and reliability rates
    // - Detailed warnings with improvement recommendations  
    // - Validation criteria documentation
    // - Quality assessment for each benchmark
    // - Actionable steps to improve measurement quality

    println!("{}", validation_report);
}

Complete Integration Examples

Comprehensive examples demonstrating real-world usage patterns and advanced integration scenarios.

Development Workflow Integration:

use benchkit::prelude::*;

// Complete development cycle: benchmark → validate → document → commit
fn development_workflow() -> Result< (), Box< dyn std::error::Error > > {
    // Mock implementations for doc test
    fn quicksort_implementation() {}
    fn mergesort_implementation() {}
    
    // 1. Run benchmarks
    let mut suite = BenchmarkSuite::new("Algorithm Performance");
    suite.benchmark("quicksort", || quicksort_implementation());
    suite.benchmark("mergesort", || mergesort_implementation());
    let results = suite.run_all();
    
    // 2. Validate quality
    let validator = BenchmarkValidator::new()
        .min_samples(15)
        .max_coefficient_variation(0.15);
    let validated_results = ValidatedResults::new(results.results, validator);
    
    if validated_results.reliability_rate() < 80.0 {
        return Err("Benchmark quality insufficient for analysis".into());
    }
    
    // 3. Generate professional report
    let template = PerformanceReport::new()
        .title("Algorithm Performance Analysis")
        .include_statistical_analysis(true)
        .add_custom_section(CustomSection::new(
            "Development Notes",
            "Analysis conducted during algorithm optimization phase"
        ));
    
    let report = template.generate(&validated_results.results)?;
    
    // 4. Update documentation atomically
    let chain = MarkdownUpdateChain::new("README.md")?
        .add_section("Performance Analysis", report)
        .add_section("Quality Assessment", validated_results.validation_report());
    
    chain.execute()?;
    println!("βœ… Development documentation updated successfully");
    
    Ok(())
}

CI/CD Pipeline Integration:

use benchkit::prelude::*;
use std::collections::HashMap;

// Automated performance regression detection
fn cicd_performance_check(baseline_results: HashMap<String, BenchmarkResult>, 
                          pr_results: HashMap<String, BenchmarkResult>) -> Result< bool, Box< dyn std::error::Error > > {
    // Validate both result sets
    let validator = BenchmarkValidator::new().require_warmup(false);
    let baseline_validated = ValidatedResults::new(baseline_results.clone(), validator.clone());
    let pr_validated = ValidatedResults::new(pr_results.clone(), validator);
    
    // Require high quality for regression analysis
    if baseline_validated.reliability_rate() < 90.0 || pr_validated.reliability_rate() < 90.0 {
        println!("❌ BLOCK: Insufficient benchmark quality for regression analysis");
        return Ok(false);
    }
    
    // Compare performance for regression detection
    let comparison = ComparisonReport::new()
        .title("Performance Regression Analysis")
        .baseline("baseline_version")
        .candidate("pr_version")
        .practical_significance_threshold(0.05);  // 5% regression threshold
    
    // Create combined results for comparison
    let mut combined = HashMap::new();
    combined.insert("baseline_version".to_string(), 
                   baseline_results.values().next().unwrap().clone());
    combined.insert("pr_version".to_string(), 
                   pr_results.values().next().unwrap().clone());
    
    let regression_report = comparison.generate(&combined)?;
    
    // Check for regressions
    let has_regression = regression_report.contains("slower");
    
    if has_regression {
        println!("❌ BLOCK: Performance regression detected");
        // Save detailed report for review
        std::fs::write("regression_analysis.md", regression_report)?;
        Ok(false)
    } else {
        println!("βœ… ALLOW: No performance regressions detected");
        Ok(true)
    }
}

Multi-Project Coordination:

use benchkit::prelude::*;
use std::collections::HashMap;

// Coordinate benchmark updates across multiple related projects
fn coordinate_multi_project_benchmarks() -> Result< (), Box< dyn std::error::Error > > {
    let projects = vec!["web-api", "batch-processor", "realtime-analyzer"];
    let mut all_results = HashMap::new();
    
    // Collect results from all projects  
    for project in &projects {
        let project_results = run_project_benchmarks(project)?;
        all_results.extend(project_results);
    }
    
    // Cross-project validation with lenient criteria
    let validator = BenchmarkValidator::new()
        .max_coefficient_variation(0.25)  // Different environments have more noise
        .require_warmup(false);
    
    let cross_project_validated = ValidatedResults::new(all_results.clone(), validator);
    
    // Generate consolidated impact analysis
    let impact_template = PerformanceReport::new()
        .title("Cross-Project Performance Impact Analysis")
        .add_context("Shared library upgrade impact across all dependent projects")
        .include_statistical_analysis(true)
        .add_custom_section(CustomSection::new(
            "Project Impact Summary",
            format_project_impact_analysis(&projects, &all_results)
        ));
    
    let impact_report = impact_template.generate(&all_results)?;
    
    // Update shared documentation
    let shared_chain = MarkdownUpdateChain::new("SHARED_LIBRARY_IMPACT.md")?
        .add_section("Current Impact Analysis", &impact_report)
        .add_section("Quality Assessment", &cross_project_validated.validation_report());
    
    shared_chain.execute()?;
    
    // Notify project maintainers
    notify_project_teams(&projects, &impact_report)?;
    
    Ok(())
}

// Helper functions for the example
fn run_project_benchmarks(_project: &str) -> Result< HashMap< String, BenchmarkResult >, Box< dyn std::error::Error > > {
    // Mock implementation for doc test
    Ok(HashMap::new())
}

fn format_project_impact_analysis(_projects: &[&str], _results: &HashMap< String, BenchmarkResult >) -> String {
    // Mock implementation for doc test  
    "Impact analysis summary".to_string()
}

fn notify_project_teams(_projects: &[&str], _report: &str) -> Result< (), Box< dyn std::error::Error > > {
    // Mock implementation for doc test
    Ok(())
}

Measure: Core Timing and Profiling

At its heart, benchkit provides simple and accurate measurement primitives.

use benchkit::prelude::*;

// A robust measurement with multiple iterations and statistical cleanup.
let result = bench_function
(
  "summation_1000",
  ||
  {
    ( 0..1000 ).fold( 0, | acc, x | acc + x )
  }
);
println!( "Avg time: {:.2?}", result.mean_time() );
println!( "Throughput: {:.0} ops/sec", result.operations_per_second() );

// Track memory usage patterns alongside timing.
let memory_benchmark = MemoryBenchmark::new( "allocation_test" );
let ( timing, memory_stats ) = memory_benchmark.run_with_tracking
(
  10,
  ||
  {
    let data = vec![ 0u8; 1024 ];
    memory_benchmark.tracker.record_allocation( 1024 );
    std::hint::black_box( data );
  }
);
println!( "Peak memory usage: {} bytes", memory_stats.peak_usage );

Analyze: Find Insights and Regressions

Turn raw numbers into actionable insights.

use benchkit::prelude::*;

// Compare multiple implementations to find the best one.
let report = ComparativeAnalysis::new( "Hashing" )
.algorithm( "fnv", || { /* ... */ } )
.algorithm( "siphash", || { /* ... */ } )
.run();

if let Some( ( fastest_name, _ ) ) = report.fastest()
{
  println!( "Fastest algorithm: {}", fastest_name );
}

// Example benchmark results
let result_a = bench_function( "test_a", || { /* ... */ } );
let result_b = bench_function( "test_b", || { /* ... */ } );

// Compare two benchmark results
let comparison = result_a.compare( &result_b );
if comparison.is_improvement()
{
  println!( "Performance improved!" );
}

Generate: Create Realistic Test Data

Stop writing boilerplate to create test data. benchkit provides generators for common scenarios.

use benchkit::prelude::*;

// Generate a comma-separated list of 100 items.
let list_data = generate_list_data( DataSize::Medium );

// Generate realistic unilang command strings for parser benchmarking.
let command_generator = DataGenerator::new()
.complexity( DataComplexity::Complex );
let commands = command_generator.generate_unilang_commands( 10 );

// Create reproducible data with a specific seed.
let mut seeded_gen = SeededGenerator::new( 42 );
let random_data = seeded_gen.random_string( 1024 );

Document: Automate Your Reports

The “documentation-first” philosophy is enabled by powerful report generation and file-updating tools.

use benchkit::prelude::*;

fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let mut suite = BenchmarkSuite::new( "api_performance" );
  suite.benchmark( "get_user", || { /* ... */ } );
  suite.benchmark( "create_user", || { /* ... */ } );
  let results = suite.run_analysis();

  // Generate a markdown report from the results.
  let markdown_report = results.generate_markdown_report().generate();

  // Automatically update the "## Performance" section of a file.
  let updater = MarkdownUpdater::new( "readme.md", "Performance" )?;
  updater.update_section( &markdown_report )?;
  
  Ok( () )
}

§The benchkit Workflow

benchkit is designed to make performance analysis a natural part of your development cycle.

[ 1. Write Code ] -> [ 2. Add Benchmark in `src/bin/` ] -> [ 3. Run `cargo run --bin` ]
       ^                                                                   |
       |                                                                   v
[ 5. Commit Code + Perf Docs ] <- [ 4. Auto-Update `benchmark_results.md` ] <- [ Analyze Results ]

Β§πŸ“ Why Not benches/? Standard Directory Integration

The traditional benches/ directory creates artificial separation between ALL your benchmark content and the standard Rust project structure. benchkit encourages you to use standard directories for ALL benchmark-related files:

  • βœ… Use tests/: Performance benchmarks alongside unit tests
  • βœ… Use examples/: Demonstration benchmarks and showcases
  • βœ… Use src/bin/: Dedicated benchmark executables
  • βœ… Standard integration: Keep ALL benchmark content in standard Rust directories
  • ❌ Avoid benches/: Don’t isolate ANY benchmark files in framework-specific directories

§Why This Matters

Workflow Integration: ALL benchmark content should be part of regular development, not isolated in framework-specific directories.

Documentation Proximity: ALL benchmark files are documentation - keep them integrated with your standard project structure for better maintainability.

Testing Philosophy: Performance is part of correctness validation - integrate benchmarks with your existing test suite.

Toolkit vs Framework: Frameworks enforce rigid benches/ isolation; toolkits integrate with your existing project structure.

§Automatic Documentation Updates

benchkit excels at maintaining comprehensive, automatically updated documentation in your project files:

# Benchmark Results

## Algorithm Comparison

| Algorithm | Mean Time | Throughput | Relative |
|-----------|-----------|------------|----------|
| quicksort | 1.23ms    | 815 ops/s  | baseline |
| mergesort | 1.45ms    | 689 ops/s  | 1.18x    |
| heapsort  | 1.67ms    | 599 ops/s  | 1.36x    |

*Last updated: 2024-01-15 14:32:18 UTC*
*Generated by benchkit v0.4.0*

## Performance Trends

- quicksort maintains consistent performance across data sizes
- mergesort shows better cache behavior on large datasets
- heapsort provides predictable O(n log n) guarantees

## Test Configuration

- Hardware: 16-core AMD Ryzen, 32GB RAM
- Rust version: 1.75.0
- Optimization: --release
- Iterations: 1000 per benchmark

This documentation is automatically generated and updated every time you run benchmarks.

§Integration Examples

// ✅ In standard tests/ directory alongside unit tests
// tests/performance_comparison.rs
use benchkit::prelude::*;

#[test]
fn benchmark_algorithms()
{
  let mut suite = BenchmarkSuite::new( "Algorithm Comparison" );
  
  suite.benchmark( "quick_sort", ||
  {
    // Your quicksort implementation
  });
  
  suite.benchmark( "merge_sort", ||  
  {
    // Your mergesort implementation
  });
  
  let results = suite.run_all();
  
  // Automatically update readme.md with results
  let updater = MarkdownUpdater::new( "readme.md", "Performance" ).unwrap();
  updater.update_section( &results.generate_markdown_report().generate() ).unwrap();
}

// ✅ In examples/ directory for demonstrations
// examples/comprehensive_benchmark.rs  
use benchkit::prelude::*;

fn main()
{
  let mut comprehensive = BenchmarkSuite::new( "Comprehensive Performance Analysis" );
  
  // Add multiple benchmarks
  comprehensive.benchmark( "data_processing", || { /* code */ } );
  comprehensive.benchmark( "memory_operations", || { /* code */ } );
  comprehensive.benchmark( "io_operations", || { /* code */ } );
  
  let results = comprehensive.run_all();
  
  // Update readme.md with comprehensive report
  let report = results.generate_markdown_report();
  let updater = MarkdownUpdater::new( "readme.md", "Performance Analysis" ).unwrap();
  updater.update_section( &report.generate() ).unwrap();
  
  println!( "Updated readme.md with latest performance results" );
}

§🔧 Feature Flag Recommendations

For optimal build performance and clean separation, put your benchmark code behind feature flags:

// ✅ In src/bin/ directory for dedicated benchmark executables
// src/bin/comprehensive_benchmark.rs
#[ cfg( feature = "enabled" ) ]
use benchkit::prelude::*;

#[ cfg( feature = "enabled" ) ]
fn main()
{
  let mut suite = BenchmarkSuite::new( "Comprehensive Performance Suite" );
  
  suite.benchmark( "algorithm_a", || { /* implementation */ } );
  suite.benchmark( "algorithm_b", || { /* implementation */ } );
  suite.benchmark( "data_structure_ops", || { /* implementation */ } );
  
  let results = suite.run_all();
  
  // Automatically update readme.md
  let updater = MarkdownUpdater::new( "readme.md", "Latest Results" ).unwrap();
  updater.update_section( &results.generate_markdown_report().generate() ).unwrap();
  
  println!( "Benchmarks completed - readme.md updated" );
}

#[ cfg( not( feature = "enabled" ) ) ]
fn main()
{
  println!( "Run with: cargo run --bin comprehensive_benchmark --features enabled" );
  println!( "Results will be automatically saved to readme.md" );
}

Add to your Cargo.toml (Cargo does not allow optional dev-dependencies, so the feature-gated dependency goes under [dependencies]):

[features]
enabled = [ "dep:benchkit" ]

[dependencies]
benchkit = { version = "0.1", features = [ "full" ], optional = true }

Run benchmarks selectively:

# Run only unit tests (fast)
cargo test

# Run specific benchmark binary (updates readme.md)
cargo run --bin comprehensive_benchmark --features enabled

# Run benchmarks from examples/
cargo run --example performance_demo --features enabled

# Run all binaries containing benchmarks
cargo run --bin performance_suite --features enabled

This approach keeps your regular builds fast while making comprehensive performance testing available when needed.

§📚 Comprehensive Examples

benchkit includes extensive examples demonstrating every feature and usage pattern:

§🎯 Feature-Specific Examples

  • Update Chain Comprehensive: Complete demonstration of atomic documentation updates

    • Single and multi-section updates with conflict detection
    • Error handling and recovery patterns
    • Advanced conflict resolution strategies
    • Performance optimization for bulk updates
    • Full integration with validation and templates
  • Templates Comprehensive: Professional report generation in all scenarios

    • Basic and fully customized Performance Report templates
    • A/B testing with Comparison Report templates
    • Custom sections with advanced markdown formatting
    • Multiple comparison scenarios and batch processing
    • Business impact analysis and risk assessment templates
    • Comprehensive error handling for edge cases
  • Validation Comprehensive: Quality assurance for reliable benchmarking

    • Default and custom validator configurations
    • Individual warning types with detailed analysis
    • Validation report generation and interpretation
    • Reliable results filtering for analysis
    • Domain-specific validation scenarios (research, development, production, micro)
    • Full integration with templates and update chains
  • Regression Analysis Comprehensive: Complete regression analysis system demonstration

    • All baseline strategies (Fixed, Rolling Average, Previous Run)
    • Performance trend detection (Improving, Degrading, Stable)
    • Statistical significance testing with configurable thresholds
    • Professional markdown report generation with regression insights
    • Real-world optimization scenarios and configuration guidance
    • Full integration with PerformanceReport templates
  • Historical Data Management: Managing long-term performance data

    • Incremental historical data building and TimestampedResults creation
    • Data quality validation and cleanup procedures
    • Performance trend analysis across multiple time windows
    • Storage and serialization strategy recommendations
    • Data retention and archival best practices
    • Integration with RegressionAnalyzer for trend detection

§🔧 Integration Examples

  • Integration Workflows: Real-world workflow automation

    • Development cycle: benchmark β†’ validate β†’ document β†’ commit
    • CI/CD pipeline: regression detection β†’ merge decision β†’ automated reporting
    • Multi-project coordination: impact analysis β†’ consolidated reporting β†’ team alignment
    • Production monitoring: continuous tracking β†’ alerting β†’ dashboard updates
  • Error Handling Patterns: Robust operation under adverse conditions

    • Update Chain file system errors (permissions, conflicts, recovery)
    • Template generation errors (missing data, invalid parameters)
    • Validation framework edge cases (malformed data, extreme variance)
    • System errors (resource limits, concurrent access)
    • Graceful degradation strategies with automatic fallbacks
  • Advanced Usage Patterns: Enterprise-scale benchmarking

    • Domain-specific validation criteria (real-time, interactive, batch processing)
    • Template composition and inheritance patterns
    • Coordinated multi-document updates with consistency guarantees
    • Memory-efficient large-scale processing (1000+ algorithms)
    • Performance optimization techniques (caching, concurrency, incremental processing)
  • CI/CD Regression Detection: Automated performance validation in CI/CD pipelines

    • Multi-environment validation (development, staging, production)
    • Configurable regression thresholds and statistical significance levels
    • Automated performance gate decisions with proper exit codes
    • GitHub Actions compatible reporting and documentation updates
    • Progressive validation pipeline with halt-on-failure
    • Real-world CI/CD integration patterns and best practices
  • 🚨 Cargo Bench Integration: CRITICAL - Standard cargo bench integration patterns

    • Seamless integration with Rust’s standard cargo bench command
    • Automatic documentation updates during benchmark execution
    • Standard benches/ directory structure support
    • Criterion compatibility layer for zero-migration adoption
    • CI/CD integration with standard workflows and conventions
    • Real-world project structure and configuration examples
    • This is the foundation requirement for benchkit adoption

§🚀 Running the Examples

# Feature-specific examples
cargo run --example update_chain_comprehensive --all-features
cargo run --example templates_comprehensive --all-features  
cargo run --example validation_comprehensive --all-features

# NEW: Regression Analysis Examples
cargo run --example regression_analysis_comprehensive --all-features
cargo run --example historical_data_management --all-features

# Integration examples
cargo run --example integration_workflows --all-features
cargo run --example error_handling_patterns --all-features
cargo run --example advanced_usage_patterns --all-features

# NEW: CI/CD Integration Example
cargo run --example cicd_regression_detection --all-features

# 🚨 CRITICAL: Cargo Bench Integration Example
cargo run --example cargo_bench_integration --all-features

# Original enhanced features demo
cargo run --example enhanced_features_demo --all-features

Each example is fully documented with detailed explanations and demonstrates production-ready patterns you can adapt to your specific needs.

§Installation

Add benchkit to your [dev-dependencies] in Cargo.toml.

[dev-dependencies]
# For core functionality
benchkit = "0.1"

# Or enable all features for the full toolkit
benchkit = { version = "0.1", features = [ "full" ] }

§📋 Development Guidelines & Best Practices

⚠️ IMPORTANT: Before using benchkit in production or contributing to development, carefully review the comprehensive recommendations.md file. This document contains essential requirements, best practices, and lessons learned from real-world performance analysis work.

The recommendations cover:

  • βœ… Core philosophy and toolkit vs framework principles
  • βœ… Technical architecture requirements and feature organization
  • βœ… Performance analysis best practices with standardized data patterns
  • βœ… Documentation integration requirements for automated reporting
  • βœ… Statistical analysis requirements for reliable measurements

📖 Read recommendations.md first - it will save you time and ensure you’re following proven patterns.

§Contributing

Contributions are welcome! benchkit aims to be a community-driven toolkit that solves real-world benchmarking problems.

Before contributing:

  1. 📖 Read recommendations.md - Contains all development requirements and design principles
  2. Review open tasks in the task/ directory
  3. Check our contribution guidelines

All contributions must align with the principles and requirements outlined in recommendations.md.

§License

This project is licensed under the MIT License.

§Performance

This section is automatically updated by benchkit when you run benchmarks.

Modules§

analysis
Analysis tools for benchmark results
comparison
Framework and algorithm comparison utilities
data_generation
Advanced data generation utilities for benchmarking
diff
Git-style diff functionality for benchmark results
documentation
Documentation integration and auto-update utilities
generators
Data generators for benchmarking
measurement
Core measurement and timing functionality
memory_tracking
Memory allocation tracking and analysis for benchmarks
parser_analysis
Parser-specific analysis utilities
parser_data_generation
Parser-specific data generation utilities
plotting
Visualization and plotting utilities for benchmark results
prelude
Prelude module for convenient imports
profiling
Memory allocation and performance profiling tools
reporting
Report generation and markdown integration
scaling
Scaling analysis tools for performance testing
statistical
Research-grade statistical analysis for benchmark results
suite
Benchmark suite management
templates
Template system for consistent documentation formatting
throughput
Throughput calculation and analysis utilities
update_chain
Safe Update Chain Pattern for coordinated markdown section updates
validation
Benchmark validation and quality assessment framework

Macros§

bench_block
Measure a block of code (convenience macro)
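
For quick one-off measurements, the bench_block macro wraps a block of code directly. The invocation below is an assumption based on the macro’s one-line description; check the macro documentation for the authoritative syntax.

use benchkit::prelude::*;

fn main()
{
  // Assumed usage: measure the enclosed block and get back a result with the
  // same accessors as `bench_function` (verify against the `bench_block` docs).
  let result = bench_block!
  ({
    ( 0..1000 ).fold( 0u64, | acc, x | acc + x )
  });

  println!( "Avg time: {:.2?}", result.mean_time() );
}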