# benchkit

**Practical, Documentation-First Benchmarking for Rust.**

`benchkit` is a lightweight toolkit for performance analysis, born from the hard-learned lessons of optimizing high-performance libraries. It rejects rigid, all-or-nothing frameworks in favor of flexible, composable tools that integrate seamlessly into your existing workflow.

🎯 **New to benchkit?** Start with `recommendations.md` for essential guidelines drawn from real-world performance optimization experience.
## The Benchmarking Dilemma

In Rust, developers often face a frustrating choice:

- **The Heavy Framework (`criterion`)**: Statistically powerful, but it forces a rigid structure (`benches/`), complex setup, and produces reports that are difficult to integrate into your project's documentation. You must adapt your project to the framework.
- **The Manual Approach (`std::time`)**: Simple to start, but statistically naive. It leads to boilerplate, inconsistent measurements, and conclusions that are easily skewed by system noise.

`benchkit` offers a third way.

📋 **Important**: For production use and development contributions, see `recommendations.md`, a comprehensive guide with proven patterns, requirements, and best practices from real-world benchmarking experience.
## A Toolkit, Not a Framework

This is the core philosophy of `benchkit`. It doesn't impose a workflow; it provides a set of professional, composable tools that you can use however you see fit.
- ✅ Integrate Anywhere: Write benchmarks in your test files, examples, or binaries. No required directory structure.
- ✅ Documentation-First: Treat performance reports as a first-class part of your documentation, with tools to automatically keep them in sync with your code.
- ✅ Practical Focus: Surface the key metrics needed for optimization decisions, hiding deep statistical complexity until you ask for it.
- ✅ Zero Setup: Start measuring performance in minutes with a simple, intuitive API.
## 🚀 Quick Start: Compare, Analyze, and Document

📖 **First time?** Review `recommendations.md` for comprehensive best practices and development guidelines.

This example demonstrates the core `benchkit` workflow: comparing two algorithms and automatically updating a performance section in your `readme.md`.

1. Add `benchkit` to the `dev-dependencies` in your `Cargo.toml`:

```toml
[dev-dependencies]
benchkit = { version = "0.1", features = [ "full" ] }
```
2. Create a benchmark in your `benches` directory:

```rust
// In benches/performance_demo.rs
use benchkit::prelude::*;
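
// NOTE: a minimal sketch of the workflow described above; the
// `ComparativeAnalysis` and `MarkdownUpdater` names are illustrative
// assumptions, not a confirmed API.
fn main() {
    let data: Vec<u64> = (0..1000).rev().collect();

    // Compare two sorting approaches on identical input.
    let report = ComparativeAnalysis::new("sorting")
        .algorithm("std_sort", || { let mut d = data.clone(); d.sort(); })
        .algorithm("unstable_sort", || { let mut d = data.clone(); d.sort_unstable(); })
        .run();

    // Rewrite only the "Performance" section of readme.md with fresh results.
    MarkdownUpdater::new("readme.md", "Performance")
        .update_section(&report.to_markdown())
        .unwrap();
}
```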
3. Run your benchmark and watch `readme.md` update automatically:
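For a file in `benches/`, the standard command is:

```sh
cargo bench
```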
## 🧰 What's in the Toolkit?

`benchkit` provides a suite of composable tools. Use only what you need.
### 🆕 Enhanced Features

#### Regression Analysis

Advanced performance regression detection with statistical analysis and trend identification.
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
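
// Sketch only: `RegressionAnalyzer` is named elsewhere in this readme, but the
// builder methods, `BaselineStrategy`, and `load_history` below are assumptions.
let history: HashMap<String, Vec<BenchmarkResult>> = load_history();

// Compare the current run against a rolling average of the last 5 runs.
let analyzer = RegressionAnalyzer::new()
    .with_baseline_strategy(BaselineStrategy::RollingAverage { window: 5 })
    .with_significance_threshold(0.05);

let current = bench_function("parse_command", || parse_command(".cmd arg1 arg2"));
let report = analyzer.analyze(&current, &history);

if report.is_regression() {
    eprintln!("{}", report.to_markdown());
}
```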
Key Features:
- Three Baseline Strategies: Fixed baseline, rolling average, and previous run comparison
- Statistical Significance: Configurable thresholds with proper statistical testing
- Trend Detection: Automatic identification of improving, degrading, or stable performance
- Professional Reports: Publication-quality markdown with statistical analysis
- CI/CD Integration: Automated regression detection for deployment pipelines
- Historical Data Management: Long-term performance tracking with quality validation
Use Cases:
- Automated performance regression detection in CI/CD pipelines
- Long-term performance monitoring and trend analysis
- Code optimization validation with statistical confidence
- Production deployment gates with zero-regression tolerance
- Performance documentation with automated updates
#### Update Chain

Coordinate multiple markdown section updates atomically: either all succeed or none are modified.
```rust
use benchkit::prelude::*;
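
// Sketch only: the `UpdateChain` name and methods are assumptions based on
// the feature description above, not a confirmed API.
let chain = UpdateChain::new("readme.md")
    .add_section("Performance", &performance_markdown)
    .add_section("Memory Usage", &memory_markdown);

// Conflicts are detected up front; on success both sections are written in a
// single pass, otherwise the file is left untouched.
chain.execute().unwrap();
```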
Key Features:
- Atomic Operations: Either all sections update successfully or none are modified
- Conflict Detection: Validates all sections exist and are unambiguous before any changes
- Automatic Rollback: Failed operations restore original file state
- Reduced I/O: Single read and write operation instead of multiple file accesses
- Error Recovery: Comprehensive error handling with detailed diagnostics
Use Cases:
- Multi-section benchmark reports that must stay synchronized
- CI/CD pipelines requiring consistent documentation updates
- Coordinated updates across large documentation projects
- Production deployments where partial updates would be problematic
Advanced Example:

```rust
use benchkit::prelude::*;
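
// Sketch (illustrative names): coordinating several generated reports so a
// partially updated document can never be committed.
let chain = UpdateChain::new("benchmark_results.md")
    .add_section("Throughput", &throughput_report)
    .add_section("Latency", &latency_report)
    .add_section("Regressions", &regression_report);

match chain.execute() {
    Ok(()) => println!("all sections updated atomically"),
    Err(e) => eprintln!("no changes written: {e}"), // automatic rollback
}
```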
#### Report Templates

Generate standardized, publication-quality reports with full statistical analysis and customizable sections.
```rust
use benchkit::prelude::*;
use std::collections::HashMap;
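
// Sketch only: `PerformanceReport` is referenced elsewhere in this readme;
// the builder methods below are assumptions, not a confirmed API.
let mut results = HashMap::new();
results.insert("fast_path".to_string(), bench_function("fast_path", || fast_path()));
results.insert("slow_path".to_string(), bench_function("slow_path", || slow_path()));

let report = PerformanceReport::new("String Processing")
    .with_executive_summary(true)
    .add_custom_section("Recommendations", "Prefer the fast path for inputs under 1 KiB.");

println!("{}", report.generate(&results));
```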
Performance Report Features:
- Executive Summary: Key metrics and performance indicators
- Statistical Analysis: Confidence intervals, coefficient of variation, reliability assessment
- Performance Tables: Sorted results with throughput, latency, and quality indicators
- Custom Sections: Domain-specific analysis and recommendations
- Professional Formatting: Publication-ready markdown with proper statistical notation
Comparison Report Features:
- Significance Testing: Both statistical and practical significance analysis
- Confidence Intervals: 95% CI analysis with overlap detection
- Performance Ratios: Clear improvement/regression percentages
- Reliability Assessment: Quality validation for both baseline and candidate
- Decision Support: Clear recommendations based on statistical analysis
Advanced Template Composition:

```rust
use benchkit::prelude::*;
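
// Sketch (illustrative names): composing a comparison report between a
// baseline and a candidate result, with 95% confidence intervals.
let comparison = ComparisonReport::new("baseline", "candidate")
    .with_confidence_level(0.95);

println!("{}", comparison.generate(&baseline_result, &candidate_result));
```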
#### Validation

A comprehensive quality assessment system with configurable criteria and automatic reliability analysis.
```rust
use benchkit::prelude::*;
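
// Sketch only: the validator names are assumptions derived from the warning
// types documented below, not a confirmed API.
let validator = BenchmarkValidator::new()
    .min_samples(20)
    .max_coefficient_of_variation(0.10);

let result = bench_function("tokenize", || tokenize(input));
for warning in validator.validate(&result) {
    println!("warning: {warning:?}"); // e.g. InsufficientSamples, HighVariability
}
```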
Validation Criteria:
- Sample Size: Ensure sufficient measurements for statistical power
- Variability: Detect high coefficient of variation indicating noise
- Measurement Duration: Flag measurements that may be timing-resolution limited
- Performance Range: Identify outliers and wide performance distributions
- Warm-up Detection: Verify proper system warm-up for consistent results
Warning Types:

- `InsufficientSamples`: Too few measurements for reliable statistics
- `HighVariability`: Coefficient of variation exceeds threshold
- `ShortMeasurementTime`: Measurements may be affected by timer resolution
- `WidePerformanceRange`: Large ratio between fastest/slowest measurements
- `NoWarmup`: Missing warm-up period may indicate measurement issues
Domain-Specific Validation:

```rust
use benchkit::prelude::*;
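
// Sketch (illustrative names): stricter criteria for production gates,
// looser ones for quick exploratory runs.
let production = BenchmarkValidator::new()
    .min_samples(100)
    .max_coefficient_of_variation(0.05)
    .require_warmup(true);

let exploratory = BenchmarkValidator::new()
    .min_samples(10)
    .max_coefficient_of_variation(0.25);
```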
Quality Reporting:

```rust
use benchkit::prelude::*;
use std::collections::HashMap;
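
// Sketch (illustrative names): summarizing validation warnings for a whole
// result set as a markdown quality report.
let validator = BenchmarkValidator::new();
let mut results = HashMap::new();
results.insert("tokenize".to_string(), bench_function("tokenize", || tokenize(input)));

let report = ValidationReport::new(&validator, &results);
println!("{}", report.to_markdown()); // per-benchmark reliability summary
```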
#### Integration Workflows

Comprehensive examples demonstrating real-world usage patterns and advanced integration scenarios.

Development Workflow Integration:

```rust
use benchkit::prelude::*;

// Complete development cycle: benchmark → validate → document → commit
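// Sketch (illustrative names): benchmark, keep only reliable results, then
// refresh the docs that get committed alongside the code.
let result = bench_function("hot_path", || hot_path());

let validator = BenchmarkValidator::new();
if validator.validate(&result).is_empty() {
    MarkdownUpdater::new("readme.md", "Performance")
        .update_section(&result.to_markdown())
        .unwrap();
}
```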
CI/CD Pipeline Integration:

```rust
use benchkit::prelude::*;
use std::collections::HashMap;

// Automated performance regression detection
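// Sketch (illustrative names): fail the pipeline when a statistically
// significant regression is found against stored baselines.
let history: HashMap<String, Vec<BenchmarkResult>> = load_history();
let current = bench_function("hot_path", || hot_path());

let report = RegressionAnalyzer::new().analyze(&current, &history);
if report.is_regression() {
    eprintln!("{}", report.to_markdown());
    std::process::exit(1); // gate the merge or deployment
}
```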
Multi-Project Coordination:

```rust
use benchkit::prelude::*;

// Coordinate benchmark updates across multiple related projects
```
### Measurement Primitives

At its heart, `benchkit` provides simple and accurate measurement primitives.

```rust
use benchkit::prelude::*;

// A robust measurement with multiple iterations and statistical cleanup.
// (`sort_data` stands in for your own code; method names are illustrative.)
let result = bench_function("sort_algorithm", || sort_data(&mut data.clone()));
println!("Mean time: {:?}", result.mean_time());
println!("Operations per second: {:.0}", result.operations_per_second());

// Track memory usage patterns alongside timing.
let memory_benchmark = MemoryBenchmark::new("allocation_heavy");
let (timing, memory_stats) = memory_benchmark.run_with_tracking(10, || process_items(&items));
println!("Peak memory usage: {} bytes", memory_stats.peak_usage);
```
### Comparative Analysis

Turn raw numbers into actionable insights.

```rust
use benchkit::prelude::*;

// Compare multiple implementations to find the best one.
// (Type and method names are illustrative.)
let report = ComparativeAnalysis::new("string_processing")
    .algorithm("baseline", || baseline_version(&data))
    .algorithm("optimized", || optimized_version(&data))
    .run();

if let Some(fastest) = report.fastest() {
    println!("Fastest algorithm: {}", fastest.name());
}

// Example benchmark results
let result_a = bench_function("version_a", || version_a(&data));
let result_b = bench_function("version_b", || version_b(&data));

// Compare two benchmark results
let comparison = result_a.compare(&result_b);
if comparison.is_improvement() {
    println!("version_a is an improvement over version_b");
}
```
### Data Generators

Stop writing boilerplate to create test data. `benchkit` provides generators for common scenarios.

```rust
use benchkit::prelude::*;

// Generate a comma-separated list of 100 items.
let list_data = generate_list_data(100);

// Generate realistic unilang command strings for parser benchmarking.
// (Generator type names are illustrative.)
let command_generator = CommandGenerator::new()
    .complexity(Complexity::Medium);
let commands = command_generator.generate_unilang_commands(50);

// Create reproducible data with a specific seed.
let mut seeded_gen = SeededGenerator::new(42);
let random_data = seeded_gen.random_string(1024);
```
The "documentation-first" philosophy is enabled by powerful report generation and file updating tools.
use *;
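
// Sketch (illustrative names): regenerate one readme section from fresh results.
let result = bench_function("parse", || parse(input));

MarkdownUpdater::new("readme.md", "Performance")
    .update_section(&result.to_markdown())
    .unwrap(); // only the targeted section is rewritten
```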
## The benchkit Workflow

`benchkit` is designed to make performance analysis a natural part of your development cycle.

```text
[ 1. Write Code ] -> [ 2. Add Benchmark in `benches/` ] -> [ 3. Run `cargo run --bin` ]
        ^                                                            |
        |                                                            v
[ 5. Commit Code + Perf Docs ] <- [ 4. Auto-Update `benchmark_results.md` ] <- [ Analyze Results ]
```
## 📁 Why Not benches/? Standard Directory Integration

The traditional `benches/` directory creates artificial separation between ALL your benchmark content and the standard Rust project structure. `benchkit` encourages you to use standard directories for ALL benchmark-related files:

- ✅ Use `tests/`: Performance benchmarks alongside unit tests
- ✅ Use `examples/`: Demonstration benchmarks and showcases
- ✅ Use `src/bin/`: Dedicated benchmark executables
- ✅ Standard integration: Keep ALL benchmark content in standard Rust directories
- ❌ Avoid `benches/`: Don't isolate ANY benchmark files in framework-specific directories

### Why This Matters

Workflow Integration: ALL benchmark content should be part of regular development, not isolated in framework-specific directories.

Documentation Proximity: ALL benchmark files are documentation; keep them integrated with your standard project structure for better maintainability.

Testing Philosophy: Performance is part of correctness validation; integrate benchmarks with your existing test suite.

Toolkit vs Framework: Frameworks enforce rigid `benches/` isolation; toolkits integrate with your existing project structure.
## Automatic Documentation Updates

`benchkit` excels at maintaining comprehensive, automatically updated documentation in your project files. Generated sections end with a footer like:

```markdown
---
*Last updated: 2024-01-15 14:32:18 UTC*
*Generated by benchkit v0.4.0*
```

This documentation is automatically generated and updated every time you run benchmarks.
## Integration Examples

```rust
// ✅ In standard tests/ directory alongside unit tests
// tests/performance_comparison.rs
use benchkit::prelude::*;
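
// Sketch (illustrative names): a performance check that runs with `cargo test`.
#[test]
fn optimized_beats_baseline() {
    let report = ComparativeAnalysis::new("sorting")
        .algorithm("baseline", || baseline_sort())
        .algorithm("optimized", || optimized_sort())
        .run();

    let fastest = report.fastest().expect("both algorithms were measured");
    assert_eq!(fastest.name(), "optimized");
}
```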
```rust
// ✅ In examples/ directory for demonstrations
// examples/comprehensive_benchmark.rs
use benchkit::prelude::*;
```
## 🔧 Feature Flag Recommendations

For optimal build performance and clean separation, put your benchmark code behind feature flags:

```rust
// ✅ In src/bin/ directory for dedicated benchmark executables
// src/bin/comprehensive_benchmark.rs
#[cfg(feature = "benchmarks")]
use benchkit::prelude::*;
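
// Sketch: compile the benchmark body only when the `benchmarks` feature
// (defined below) is enabled, so regular builds stay fast.
#[cfg(feature = "benchmarks")]
fn main() {
    let result = bench_function("hot_path", || hot_path());
    println!("{}", result.to_markdown());
}

#[cfg(not(feature = "benchmarks"))]
fn main() {
    eprintln!("rebuild with --features benchmarks to run this benchmark");
}
```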
Add to your `Cargo.toml`:

```toml
[features]
benchmarks = ["benchkit"]

[dependencies]
# Optional dependency, compiled only when the `benchmarks` feature is enabled.
benchkit = { version = "0.1", features = ["full"], optional = true }
```
Run benchmarks selectively (binary and example names follow the files shown above):

```sh
# Run only unit tests (fast)
cargo test
# Run specific benchmark binary (updates readme.md)
cargo run --bin comprehensive_benchmark --features benchmarks
# Run benchmarks from examples/
cargo run --example comprehensive_benchmark --features benchmarks
# Run all binaries containing benchmarks
for b in src/bin/*.rs; do cargo run --bin "$(basename "$b" .rs)" --features benchmarks; done
```
This approach keeps your regular builds fast while making comprehensive performance testing available when needed.
## 📚 Comprehensive Examples

`benchkit` includes extensive examples demonstrating every feature and usage pattern:
### 🎯 Feature-Specific Examples

- **Update Chain Comprehensive**: Complete demonstration of atomic documentation updates
  - Single and multi-section updates with conflict detection
  - Error handling and recovery patterns
  - Advanced conflict resolution strategies
  - Performance optimization for bulk updates
  - Full integration with validation and templates
- **Templates Comprehensive**: Professional report generation in all scenarios
  - Basic and fully customized Performance Report templates
  - A/B testing with Comparison Report templates
  - Custom sections with advanced markdown formatting
  - Multiple comparison scenarios and batch processing
  - Business impact analysis and risk assessment templates
  - Comprehensive error handling for edge cases
- **Validation Comprehensive**: Quality assurance for reliable benchmarking
  - Default and custom validator configurations
  - Individual warning types with detailed analysis
  - Validation report generation and interpretation
  - Reliable results filtering for analysis
  - Domain-specific validation scenarios (research, development, production, micro)
  - Full integration with templates and update chains
- **Regression Analysis Comprehensive**: Complete regression analysis system demonstration
  - All baseline strategies (Fixed, Rolling Average, Previous Run)
  - Performance trend detection (Improving, Degrading, Stable)
  - Statistical significance testing with configurable thresholds
  - Professional markdown report generation with regression insights
  - Real-world optimization scenarios and configuration guidance
  - Full integration with PerformanceReport templates
- **Historical Data Management**: Managing long-term performance data
  - Incremental historical data building and TimestampedResults creation
  - Data quality validation and cleanup procedures
  - Performance trend analysis across multiple time windows
  - Storage and serialization strategy recommendations
  - Data retention and archival best practices
  - Integration with RegressionAnalyzer for trend detection
### 🔧 Integration Examples

- **Integration Workflows**: Real-world workflow automation
  - Development cycle: benchmark → validate → document → commit
  - CI/CD pipeline: regression detection → merge decision → automated reporting
  - Multi-project coordination: impact analysis → consolidated reporting → team alignment
  - Production monitoring: continuous tracking → alerting → dashboard updates
- **Error Handling Patterns**: Robust operation under adverse conditions
  - Update Chain file system errors (permissions, conflicts, recovery)
  - Template generation errors (missing data, invalid parameters)
  - Validation framework edge cases (malformed data, extreme variance)
  - System errors (resource limits, concurrent access)
  - Graceful degradation strategies with automatic fallbacks
- **Advanced Usage Patterns**: Enterprise-scale benchmarking
  - Domain-specific validation criteria (real-time, interactive, batch processing)
  - Template composition and inheritance patterns
  - Coordinated multi-document updates with consistency guarantees
  - Memory-efficient large-scale processing (1000+ algorithms)
  - Performance optimization techniques (caching, concurrency, incremental processing)
- **CI/CD Regression Detection**: Automated performance validation in CI/CD pipelines
  - Multi-environment validation (development, staging, production)
  - Configurable regression thresholds and statistical significance levels
  - Automated performance gate decisions with proper exit codes
  - GitHub Actions compatible reporting and documentation updates
  - Progressive validation pipeline with halt-on-failure
  - Real-world CI/CD integration patterns and best practices
- **🚨 Cargo Bench Integration**: CRITICAL - standard `cargo bench` integration patterns
  - Seamless integration with Rust's standard `cargo bench` command
  - Automatic documentation updates during benchmark execution
  - Standard `benches/` directory structure support
  - Criterion compatibility layer for zero-migration adoption
  - CI/CD integration with standard workflows and conventions
  - Real-world project structure and configuration examples
  - This is the foundation requirement for benchkit adoption
## 🚀 Running the Examples

The example names below are derived from the example titles above; adjust them to the actual file names in your `examples/` directory:

```sh
# Feature-specific examples
cargo run --example update_chain_comprehensive --all-features
# NEW: Regression analysis examples
cargo run --example regression_analysis_comprehensive --all-features
# Integration examples
cargo run --example integration_workflows --all-features
# NEW: CI/CD integration example
cargo run --example cicd_regression_detection --all-features
# 🚨 CRITICAL: Cargo bench integration example
cargo run --example cargo_bench_integration --all-features
# Original enhanced features demo
cargo run --example enhanced_features_demo --all-features
```
Each example is fully documented with detailed explanations and demonstrates production-ready patterns you can adapt to your specific needs.
## Installation

Add `benchkit` to the `[dev-dependencies]` section of your `Cargo.toml`:

```toml
[dev-dependencies]
# For core functionality
benchkit = "0.1"

# Or enable all features for the full toolkit
benchkit = { version = "0.1", features = [ "full" ] }
```
## 📋 Development Guidelines & Best Practices

⚠️ **IMPORTANT**: Before using benchkit in production or contributing to development, review the comprehensive `recommendations.md` file. This document contains essential requirements, best practices, and lessons learned from real-world performance analysis work.

The recommendations cover:

- ✅ Core philosophy and toolkit-vs-framework principles
- ✅ Technical architecture requirements and feature organization
- ✅ Performance analysis best practices with standardized data patterns
- ✅ Documentation integration requirements for automated reporting
- ✅ Statistical analysis requirements for reliable measurements

📖 Read `recommendations.md` first; it will save you time and ensure you're following proven patterns.
## Contributing

Contributions are welcome! `benchkit` aims to be a community-driven toolkit that solves real-world benchmarking problems.

Before contributing:

- 📖 Read `recommendations.md`: Contains all development requirements and design principles
- Review open tasks in the `task/` directory
- Check our contribution guidelines

All contributions must align with the principles and requirements outlined in `recommendations.md`.
## License

This project is licensed under the MIT License.

## Performance

This section is automatically updated by benchkit when you run benchmarks.