benchkit
Practical, Documentation-First Benchmarking for Rust.
benchkit is a lightweight toolkit for performance analysis, born from the hard-learned lessons of optimizing high-performance libraries. It rejects rigid, all-or-nothing frameworks in favor of flexible, composable tools that integrate seamlessly into your existing workflow.
🎯 NEW TO benchkit? Start with usage.md - mandatory standards and requirements from production systems.
The Benchmarking Dilemma
In Rust, developers often face a frustrating choice:
- The Heavy Framework (criterion): Statistically powerful, but forces a rigid structure (benches/), complex setup, and produces reports that are difficult to integrate into your project's documentation. You must adapt your project to the framework.
- The Manual Approach (std::time): Simple to start, but statistically naive. It leads to boilerplate, inconsistent measurements, and conclusions that are easily skewed by system noise.
benchkit offers a third way.
📋 Important: For production use and development contributions, see usage.md - mandatory standards with proven patterns, requirements, and compliance guidance from production systems.
A Toolkit, Not a Framework
This is the core philosophy of benchkit. It doesn't impose a workflow; it provides a set of professional, composable tools that you can use however you see fit.
- ✅ Integrate Anywhere: Write benchmarks in your test files, examples, or binaries. No required directory structure.
- ✅ Documentation-First: Treat performance reports as a first-class part of your documentation, with tools to automatically keep them in sync with your code.
- ✅ Practical Focus: Surface the key metrics needed for optimization decisions, hiding deep statistical complexity until you ask for it.
- ✅ Zero Setup: Start measuring performance in minutes with a simple, intuitive API.
🚀 Quick Start: Compare, Analyze, and Document
📖 First time? Review usage.md for mandatory compliance standards and development requirements.
This example demonstrates the core benchkit workflow: comparing two algorithms and automatically updating a performance section in your readme.md.
1. Add benchkit to dev-dependencies in Cargo.toml:

```toml
[dev-dependencies]
benchkit = { version = "0.8.0", features = [ "full" ] }
```
2. Create a benchmark in your benches/ directory.
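A minimal sketch of what benches/performance_demo.rs might contain. The `ComparativeAnalysis` and `MarkdownUpdater` names and their methods are assumptions inferred from the toolkit descriptions below, not verified benchkit signatures; consult the crate documentation for the exact API.

```rust
// In benches/performance_demo.rs
// NOTE: illustrative sketch only - type and method names are assumptions.
use benchkit::prelude::*;

fn main()
{
  // Compare two candidate implementations of the same operation.
  let report = ComparativeAnalysis::new( "string_concatenation" )
    .algorithm( "push_str", || { let mut s = String::new(); for _ in 0..1000 { s.push_str( "x" ); } } )
    .algorithm( "format_macro", || { let mut s = String::new(); for _ in 0..1000 { s = format!( "{s}x" ); } } )
    .run();

  // Write the rendered results into the `## Performance` section of readme.md.
  MarkdownUpdater::new( "readme.md", "Performance" )
    .update_section( &report.to_markdown() )
    .expect( "failed to update readme.md" );
}
```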
3. Run your benchmark (e.g. with cargo bench) and watch readme.md update automatically.
🧰 What's in the Toolkit?
benchkit provides a suite of composable tools. Use only what you need.
🆕 Enhanced Features
Regression Analysis
Advanced performance regression detection with statistical analysis and trend identification.
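A sketch of driving the regression analysis, using hypothetical names (`RegressionAnalyzer`, `BaselineStrategy`, `BenchmarkResult`) that mirror the feature list below rather than verified signatures:

```rust
use std::collections::HashMap;
use benchkit::prelude::*;

fn main()
{
  // Historical results keyed by benchmark name (illustrative shape).
  let mut history : HashMap< String, Vec< BenchmarkResult > > = HashMap::new();
  // ... populate from previous runs stored on disk or in CI artifacts ...

  // Hypothetical analyzer: rolling-average baseline over the last 5 runs,
  // flagging changes larger than 10% as significant.
  let analyzer = RegressionAnalyzer::new()
    .with_baseline_strategy( BaselineStrategy::RollingAverage { window : 5 } )
    .with_significance_threshold( 0.10 );

  let current = bench_function( "parse_large_file", || { /* workload under test */ } );
  let analysis = analyzer.analyze( &current, &history );
  println!( "{}", analysis.to_markdown() );
}
```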
Key Features:
- Three Baseline Strategies: Fixed baseline, rolling average, and previous run comparison
- Statistical Significance: Configurable thresholds with proper statistical testing
- Trend Detection: Automatic identification of improving, degrading, or stable performance
- Professional Reports: Publication-quality markdown with statistical analysis
- CI/CD Integration: Automated regression detection for deployment pipelines
- Historical Data Management: Long-term performance tracking with quality validation
Use Cases:
- Automated performance regression detection in CI/CD pipelines
- Long-term performance monitoring and trend analysis
- Code optimization validation with statistical confidence
- Production deployment gates with zero-regression tolerance
- Performance documentation with automated updates
Update Chain
Coordinate multiple markdown section updates atomically: either all sections succeed or none are modified.
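A sketch of an atomic multi-section update; the `MarkdownUpdateChain` name and its methods are assumptions based on the description above rather than a verified API:

```rust
use benchkit::prelude::*;

fn main() -> Result< (), Box< dyn std::error::Error > >
{
  // Hypothetical chained update: every section is validated first,
  // then all sections are written in one atomic pass (or none at all).
  MarkdownUpdateChain::new( "benchmark_results.md" )?
    .add_section( "Parsing Performance", "| benchmark | mean time |\n|---|---|\n| parse | 1.2 ms |" )
    .add_section( "Serialization Performance", "| benchmark | mean time |\n|---|---|\n| serialize | 0.8 ms |" )
    .execute()?;
  Ok( () )
}
```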
Key Features:
- Atomic Operations: Either all sections update successfully or none are modified
- Conflict Detection: Validates all sections exist and are unambiguous before any changes
- Automatic Rollback: Failed operations restore original file state
- Reduced I/O: Single read and write operation instead of multiple file accesses
- Error Recovery: Comprehensive error handling with detailed diagnostics
Use Cases:
- Multi-section benchmark reports that must stay synchronized
- CI/CD pipelines requiring consistent documentation updates
- Coordinated updates across large documentation projects
- Production deployments where partial updates would be problematic
For a more advanced example, see the Update Chain Comprehensive example under Comprehensive Examples below.
Report Templates
Generate standardized, publication-quality reports with full statistical analysis and customizable sections.
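A sketch of template-driven report generation. `PerformanceReport` is referenced elsewhere in this readme; the constructor and methods shown here are assumptions:

```rust
use std::collections::HashMap;
use benchkit::prelude::*;

fn main()
{
  // Results keyed by implementation name (illustrative shape).
  let mut results : HashMap< String, BenchmarkResult > = HashMap::new();
  results.insert( "hash_join".into(), bench_function( "hash_join", || { /* workload */ } ) );
  results.insert( "merge_join".into(), bench_function( "merge_join", || { /* workload */ } ) );

  // Hypothetical template API: title, custom section, then render to markdown.
  let markdown = PerformanceReport::new()
    .title( "Join Strategy Comparison" )
    .add_custom_section( "Recommendations", "Prefer hash_join for unsorted inputs." )
    .generate( &results );
  println!( "{markdown}" );
}
```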
Performance Report Features:
- Executive Summary: Key metrics and performance indicators
- Statistical Analysis: Confidence intervals, coefficient of variation, reliability assessment
- Performance Tables: Sorted results with throughput, latency, and quality indicators
- Custom Sections: Domain-specific analysis and recommendations
- Professional Formatting: Publication-ready markdown with proper statistical notation
Comparison Report Features:
- Significance Testing: Both statistical and practical significance analysis
- Confidence Intervals: 95% CI analysis with overlap detection
- Performance Ratios: Clear improvement/regression percentages
- Reliability Assessment: Quality validation for both baseline and candidate
- Decision Support: Clear recommendations based on statistical analysis
For advanced template composition, see the Templates Comprehensive example under Comprehensive Examples below.
Validation Framework
Comprehensive quality assessment system with configurable criteria and automatic reliability analysis.
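A sketch of reliability validation, assuming a hypothetical `BenchmarkValidator` whose criteria mirror the list below:

```rust
use benchkit::prelude::*;

fn main()
{
  // Hypothetical validator: require at least 20 samples and flag noisy runs
  // whose coefficient of variation exceeds 10%.
  let validator = BenchmarkValidator::new()
    .min_samples( 20 )
    .max_coefficient_of_variation( 0.10 );

  let result = bench_function( "serialize_record", || { /* workload */ } );
  for warning in validator.validate( &result )
  {
    eprintln!( "reliability warning: {warning:?}" );
  }
}
```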
Validation Criteria:
- Sample Size: Ensure sufficient measurements for statistical power
- Variability: Detect high coefficient of variation indicating noise
- Measurement Duration: Flag measurements that may be timing-resolution limited
- Performance Range: Identify outliers and wide performance distributions
- Warm-up Detection: Verify proper system warm-up for consistent results
Warning Types:
- InsufficientSamples: Too few measurements for reliable statistics
- HighVariability: Coefficient of variation exceeds threshold
- ShortMeasurementTime: Measurements may be affected by timer resolution
- WidePerformanceRange: Large ratio between fastest and slowest measurements
- NoWarmup: Missing warm-up period may indicate measurement issues
Domain-specific validation scenarios (research, development, production, micro-benchmarks) and quality reporting are demonstrated in the Validation Comprehensive example under Comprehensive Examples below.
Workflow Integration
Comprehensive examples demonstrate real-world usage patterns and advanced integration scenarios:
- Development workflow integration: the complete cycle of benchmark → validate → document → commit
- CI/CD pipeline integration: automated performance regression detection
- Multi-project coordination: coordinated benchmark updates across multiple related projects
Each of these is covered in detail under Integration Examples below.
Measurement Primitives
At its heart, benchkit provides simple and accurate measurement primitives.
```rust
use benchkit::prelude::*;

// A robust measurement with multiple iterations and statistical cleanup.
// NOTE: method names and arguments below are illustrative reconstructions, not verified signatures.
let result = bench_function( "vector_sort", || { let mut v = vec![ 5, 3, 1, 4, 2 ]; v.sort(); } );
println!( "mean time: {:?}", result.mean_time() );
println!( "operations per second: {:.0}", result.operations_per_second() );

// Track memory usage patterns alongside timing.
let memory_benchmark = MemoryBenchmark::new( "allocation_pattern" );
let ( _timing, memory_stats ) = memory_benchmark.run_with_tracking( || { let _buffer : Vec< u8 > = Vec::with_capacity( 1024 ); } );
println!( "peak memory usage: {} bytes", memory_stats.peak_usage );
```
Comparative Analysis
Turn raw numbers into actionable insights.
```rust
use benchkit::prelude::*;

// Compare multiple implementations to find the best one.
// NOTE: type and method names below are illustrative reconstructions, not verified signatures.
let report = ComparativeAnalysis::new( "string_building" )
  .algorithm( "push_str", || { let mut s = String::new(); for _ in 0..100 { s.push_str( "x" ); } } )
  .algorithm( "format_macro", || { let mut s = String::new(); for _ in 0..100 { s = format!( "{s}x" ); } } )
  .run();
if let Some( ( name, _result ) ) = report.fastest()
{
  println!( "fastest implementation: {name}" );
}

// Example benchmark results
let result_a = bench_function( "algorithm_a", || { /* workload a */ } );
let result_b = bench_function( "algorithm_b", || { /* workload b */ } );
// Compare two benchmark results
let comparison = result_a.compare( &result_b );
if comparison.is_improvement()
{
  println!( "algorithm_a improves on algorithm_b" );
}
```
Data Generation
Stop writing boilerplate to create test data. benchkit provides generators for common scenarios.
```rust
use benchkit::prelude::*;

// NOTE: constructors and parameters below are illustrative reconstructions, not verified signatures.

// Generate a comma-separated list of 100 items.
let list_data = generate_list_data( 100 );

// Generate realistic unilang command strings for parser benchmarking.
let command_generator = DataGenerator::new()
  .complexity( Complexity::Medium );
let commands = command_generator.generate_unilang_commands( 50 );

// Create reproducible data with a specific seed.
let mut seeded_gen = SeededGenerator::new( 42 );
let random_data = seeded_gen.random_string( 256 );
```
The "documentation-first" philosophy is enabled by powerful report generation and file updating tools.
use *;
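A sketch of the documentation flow under the same assumptions as the quick-start example (hypothetical `MarkdownUpdater` section updater):

```rust
use benchkit::prelude::*;

fn main() -> Result< (), Box< dyn std::error::Error > >
{
  // Measure, render a small markdown fragment, and push it into the
  // `## Performance` section of readme.md.
  let result = bench_function( "csv_parse", || { /* workload */ } );
  let section = format!( "- csv_parse mean time: {:?}\n", result.mean_time() );
  MarkdownUpdater::new( "readme.md", "Performance" )
    .update_section( &section )?;
  Ok( () )
}
```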
The benchkit Workflow
benchkit is designed to make performance analysis a natural part of your development cycle.
```text
[ 1. Write Code ] -> [ 2. Add Benchmark in `benches/` ] -> [ 3. Run `cargo run --bin` ]
        ^                                                               |
        |                                                               v
[ 5. Commit Code + Perf Docs ] <- [ 4. Auto-Update `benchmark_results.md` ] <- [ Analyze Results ]
```
📁 MANDATORY benches/ Directory - NO ALTERNATIVES
ABSOLUTE REQUIREMENT: ALL benchmark-related files MUST be in the benches/ directory. This is NON-NEGOTIABLE for proper benchkit functionality:
- 🚫 NEVER in tests/: Benchmarks are NOT tests and MUST NOT be mixed with unit tests
- 🚫 NEVER in examples/: Examples are demonstrations, NOT performance measurements
- 🚫 NEVER in src/bin/: Source binaries are NOT benchmarks
- ✅ ONLY in benches/: This is the EXCLUSIVE location for ALL benchmark content
Why This Requirement Exists:
- ⚡ Cargo Requirement: cargo bench ONLY works with the benches/ directory
- 🏗️ Ecosystem Standard: ALL professional Rust projects use benches/ EXCLUSIVELY
- 🔧 Tool Compatibility: IDEs, CI systems, and linters expect benchmarks ONLY in benches/
- 📊 Performance Isolation: Benchmarks require different compilation and execution than tests
Why This Matters
Ecosystem Integration: The benches/ directory is the official Rust standard, ensuring compatibility with the entire Rust toolchain.
Zero Configuration: cargo bench automatically discovers and runs benchmarks in the benches/ directory without additional setup.
Community Expectations: Developers expect to find benchmarks in benches/ - this follows the principle of least surprise.
Tool Compatibility: All Rust tooling (IDEs, CI/CD, linters) is designed around the standard benches/ structure.
Automatic Documentation Updates
benchkit excels at maintaining comprehensive, automatically updated documentation in your project files:
*Last updated: 2024-01-15 14:32:18 UTC*
*Generated by benchkit v0.4.0*
This documentation is automatically generated and updated every time you run benchmarks.
Integration Examples
```rust
// ✅ In standard tests/ directory alongside unit tests
// tests/performance_comparison.rs
use benchkit::prelude::*;
```

```rust
// ✅ In examples/ directory for demonstrations
// examples/comprehensive_benchmark.rs
use benchkit::prelude::*;
```
🔧 Feature Flag Recommendations
For optimal build performance and clean separation, put your benchmark code behind feature flags:
```rust
// ✅ In src/bin/ directory for dedicated benchmark executables
// src/bin/comprehensive_benchmark.rs
use benchkit::prelude::*;
```
Add to your Cargo.toml:

```toml
[features]
# placeholder feature name gating benchmark-only code
benchmarks = [ "benchkit" ]

[dependencies]
benchkit = { version = "0.8.0", features = [ "full" ], optional = true }
```
Run benchmarks selectively:

```sh
# NOTE: binary and example names below are placeholders - substitute your own.

# Run only unit tests (fast)
cargo test

# Run specific benchmark binary (updates readme.md)
cargo run --bin comprehensive_benchmark --features benchmarks --release

# Run benchmarks from examples/
cargo run --example comprehensive_benchmark --features benchmarks --release

# Run all binaries containing benchmarks (repeat the command above for each benchmark binary)
```
This approach keeps your regular builds fast while making comprehensive performance testing available when needed.
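As a sketch of how that separation can look in code (assuming the hypothetical `benchmarks` feature name from the Cargo.toml snippet above), benchmark-only code can be compiled out of normal builds with cfg gates:

```rust
// src/bin/comprehensive_benchmark.rs (placeholder name)
// Only pull in benchkit when the `benchmarks` feature is enabled,
// so `cargo build` and `cargo test` stay fast by default.
#[ cfg( feature = "benchmarks" ) ]
use benchkit::prelude::*;

#[ cfg( feature = "benchmarks" ) ]
fn run_benchmarks()
{
  // ... measurement code using benchkit goes here ...
}

fn main()
{
  #[ cfg( feature = "benchmarks" ) ]
  run_benchmarks();

  #[ cfg( not( feature = "benchmarks" ) ) ]
  eprintln!( "rebuild with `--features benchmarks` to run this benchmark binary" );
}
```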
📚 Comprehensive Examples
benchkit includes extensive examples demonstrating every feature and usage pattern:
🎯 Feature-Specific Examples
- Update Chain Comprehensive: Complete demonstration of atomic documentation updates
- Single and multi-section updates with conflict detection
- Error handling and recovery patterns
- Advanced conflict resolution strategies
- Performance optimization for bulk updates
- Full integration with validation and templates
- Templates Comprehensive: Professional report generation in all scenarios
- Basic and fully customized Performance Report templates
- A/B testing with Comparison Report templates
- Custom sections with advanced markdown formatting
- Multiple comparison scenarios and batch processing
- Business impact analysis and risk assessment templates
- Comprehensive error handling for edge cases
- Validation Comprehensive: Quality assurance for reliable benchmarking
- Default and custom validator configurations
- Individual warning types with detailed analysis
- Validation report generation and interpretation
- Reliable results filtering for analysis
- Domain-specific validation scenarios (research, development, production, micro)
- Full integration with templates and update chains
- Regression Analysis Comprehensive: Complete regression analysis system demonstration
- All baseline strategies (Fixed, Rolling Average, Previous Run)
- Performance trend detection (Improving, Degrading, Stable)
- Statistical significance testing with configurable thresholds
- Professional markdown report generation with regression insights
- Real-world optimization scenarios and configuration guidance
- Full integration with PerformanceReport templates
- Historical Data Management: Managing long-term performance data
- Incremental historical data building and TimestampedResults creation
- Data quality validation and cleanup procedures
- Performance trend analysis across multiple time windows
- Storage and serialization strategy recommendations
- Data retention and archival best practices
- Integration with RegressionAnalyzer for trend detection
🔧 Integration Examples
- Integration Workflows: Real-world workflow automation
- Development cycle: benchmark → validate → document → commit
- CI/CD pipeline: regression detection → merge decision → automated reporting
- Multi-project coordination: impact analysis → consolidated reporting → team alignment
- Production monitoring: continuous tracking → alerting → dashboard updates
- Error Handling Patterns: Robust operation under adverse conditions
- Update Chain file system errors (permissions, conflicts, recovery)
- Template generation errors (missing data, invalid parameters)
- Validation framework edge cases (malformed data, extreme variance)
- System errors (resource limits, concurrent access)
- Graceful degradation strategies with automatic fallbacks
- Advanced Usage Patterns: Enterprise-scale benchmarking
- Domain-specific validation criteria (real-time, interactive, batch processing)
- Template composition and inheritance patterns
- Coordinated multi-document updates with consistency guarantees
- Memory-efficient large-scale processing (1000+ algorithms)
- Performance optimization techniques (caching, concurrency, incremental processing)
- CI/CD Regression Detection: Automated performance validation in CI/CD pipelines
- Multi-environment validation (development, staging, production)
- Configurable regression thresholds and statistical significance levels
- Automated performance gate decisions with proper exit codes
- GitHub Actions compatible reporting and documentation updates
- Progressive validation pipeline with halt-on-failure
- Real-world CI/CD integration patterns and best practices
- 🚨 Cargo Bench Integration: CRITICAL - standard cargo bench integration patterns
- Seamless integration with Rust's standard cargo bench command
- Automatic documentation updates during benchmark execution
- Standard benches/ directory structure support
- Criterion compatibility layer for zero-migration adoption
- CI/CD integration with standard workflows and conventions
- Real-world project structure and configuration examples
- This is the foundation requirement for benchkit adoption
🚀 Running the Examples
```sh
# NOTE: example names below are placeholders derived from the example titles above;
# see the examples/ directory for the actual file names.

# Feature-specific examples
cargo run --example update_chain_comprehensive

# NEW: Regression Analysis Examples
cargo run --example regression_analysis_comprehensive

# Integration examples
cargo run --example integration_workflows

# NEW: CI/CD Integration Example
cargo run --example cicd_regression_detection

# 🚨 CRITICAL: Cargo Bench Integration Example
cargo bench

# Original enhanced features demo
cargo run --example enhanced_features_demo
```
Each example is fully documented with detailed explanations and demonstrates production-ready patterns you can adapt to your specific needs.
Installation
Add benchkit to your [dev-dependencies] in Cargo.toml.
```toml
[dev-dependencies]
# For core functionality
benchkit = "0.1"

# Or enable all features for the full toolkit
benchkit = { version = "0.8.0", features = [ "full" ] }
```
📋 Development Guidelines & Best Practices
⚠️ IMPORTANT: Before using benchkit in production or contributing to development, carefully review the comprehensive usage.md file. This document contains essential requirements, best practices, and lessons learned from real-world performance analysis work.
The recommendations cover:
- ✅ Core philosophy and toolkit vs framework principles
- ✅ Technical architecture requirements and feature organization
- ✅ Performance analysis best practices with standardized data patterns
- ✅ Documentation integration requirements for automated reporting
- ✅ Statistical analysis requirements for reliable measurements
📖 Read usage.md first - it will save you time and ensure you're following proven patterns.
Contributing
Contributions are welcome! benchkit aims to be a community-driven toolkit that solves real-world benchmarking problems.
Before contributing:
- 📖 Read usage.md - contains all development requirements and design principles
- Review open tasks in the task/ directory
- Check our contribution guidelines
All contributions must align with the principles and requirements outlined in usage.md.
License
This project is licensed under the MIT License.
Performance
This section is automatically updated by benchkit when you run benchmarks.