Vault Audit Tools
High-performance command-line tools for analyzing HashiCorp Vault audit logs, written in Rust.
Features
- Fast: ~3x faster than equivalent implementations (~17s vs 60s on a 4M-line log)
- Parallel Processing: Automatically processes multiple files concurrently using all available CPU cores
- Memory Efficient: ~10x lower memory usage thanks to a streaming parser
- Compressed File Support: Direct analysis of `.gz` and `.zst` files without manual decompression
- Multi-File Support: Analyze weeks/months of logs without manual concatenation
- Comprehensive: 16 specialized analysis commands for different use cases
- Production Ready: Tested on 100GB+ multi-day production audit logs
- Shell Completion: Tab completion support for bash, zsh, fish, powershell, and elvish
Installation
From Source
cargo install --path .

This installs the `vault-audit` binary to `~/.cargo/bin/`.
Pre-built Binaries
Download from the Releases page.
Shell Completion
After installation, enable tab completion for your shell. The commands below write the generated completion script to a common default location for each shell; adjust the paths if your setup differs:
Linux/macOS
# Bash (Linux) - single command
mkdir -p ~/.local/share/bash-completion/completions && \
vault-audit generate-completion bash > ~/.local/share/bash-completion/completions/vault-audit && \
exec bash

# Bash (macOS with Homebrew) - single command
mkdir -p "$(brew --prefix)/etc/bash_completion.d" && \
vault-audit generate-completion bash > "$(brew --prefix)/etc/bash_completion.d/vault-audit" && \
exec bash

# Zsh - single command
mkdir -p ~/.zsh/completions && \
vault-audit generate-completion zsh > ~/.zsh/completions/_vault-audit && \
{ grep -q '.zsh/completions' ~/.zshrc || echo 'fpath=(~/.zsh/completions $fpath)' >> ~/.zshrc; } && \
{ grep -q 'compinit' ~/.zshrc || echo 'autoload -Uz compinit && compinit' >> ~/.zshrc; } && \
exec zsh

# Fish - single command
mkdir -p ~/.config/fish/completions && \
vault-audit generate-completion fish > ~/.config/fish/completions/vault-audit.fish && \
exec fish

# PowerShell (Windows/Cross-platform) - single command
vault-audit generate-completion powershell | Out-String | Invoke-Expression; Add-Content -Path $PROFILE -Value 'vault-audit generate-completion powershell | Out-String | Invoke-Expression'
Windows (Git Bash)
Git Bash users need special handling since `~` doesn't expand in output redirection:

# Single command installation for Git Bash
mkdir -p "$HOME/.bash_completions" && \
vault-audit generate-completion bash > "$HOME/.bash_completions/vault-audit" && \
{ grep -q 'bash_completions/vault-audit' ~/.bashrc || echo 'source "$HOME/.bash_completions/vault-audit"' >> ~/.bashrc; } && \
source ~/.bashrc
Troubleshooting:
- Use the `$HOME` variable instead of `~` for paths in Git Bash
- If completions don't work immediately, open a new terminal window
- Verify the completion file exists: `ls -la "$HOME/.bash_completions/vault-audit"`
- Check that your shell rc file sources it: `grep vault-audit ~/.bashrc`
Commands
System Analysis
- `system-overview` - High-level overview of all operations, entities, and auth methods (parallel processing)
- `entity-gaps` - Identify operations without entity IDs (no-entity operations) (parallel processing)
- `path-hotspots` - Find most accessed paths with optimization recommendations (parallel processing)
Authentication Analysis
- `k8s-auth` - Analyze Kubernetes/OpenShift authentication patterns and entity churn (parallel processing)
- `token-analysis` - Unified token operations analysis with abuse detection and CSV export (parallel processing)
  - Track token lifecycle operations (create, renew, revoke, lookup)
  - Detect excessive token lookup patterns
  - Export per-accessor detail to CSV
Entity Analysis
- `entity-analysis` - Unified entity lifecycle analysis (recommended)
  - `churn` - Multi-day entity lifecycle tracking with ephemeral detection
  - `creation` - Entity creation patterns by authentication path
  - `preprocess` - Extract entity mappings (auto-generated by default)
  - `gaps` - Detect activity gaps
  - `timeline` - Individual entity operation timeline
- Key improvement: Auto-preprocessing eliminates multi-step workflows!
Vault API Integration
- `client-activity` - Query Vault for client activity metrics by mount
- `entity-list` - Export complete entity list from Vault (for baseline analysis)
KV Secrets Analysis
- `kv-analysis` - Unified KV secrets analysis (recommended)
  - `analyze` - Analyze KV usage by path and entity (generates CSV) (parallel processing)
  - `compare` - Compare KV usage between two time periods (CSV comparison)
  - `summary` - Summarize KV secret usage from CSV exports (CSV analysis)
- `kv-analyzer` - ⚠️ DEPRECATED: Use `kv-analysis analyze` instead
- `kv-compare` - ⚠️ DEPRECATED: Use `kv-analysis compare` instead
- `kv-summary` - ⚠️ DEPRECATED: Use `kv-analysis summary` instead
Application-Specific
- `airflow-polling` - Analyze Airflow secret polling patterns with burst rate detection (parallel processing)
Utilities
- `generate-completion` - Generate shell completion scripts
Documentation
API Documentation
View the full API documentation with detailed module and function descriptions:
# Generate and open documentation in your browser
cargo doc --open
The documentation includes:
- Comprehensive crate overview and architecture
- Module-level documentation for all components
- Function-level documentation with examples
- Type definitions and their usage
Once published to crates.io, the documentation will be automatically available at docs.rs/vault-audit-tools.
Command Help
Get detailed help for any command:
# General help
vault-audit --help

# Unified command help
vault-audit entity-analysis --help

# Subcommand-specific help
vault-audit entity-analysis churn --help
Usage Examples
Compressed File Support
All commands automatically detect and decompress `.gz` (gzip) and `.zst` (zstandard) files:

# Analyze compressed files directly - no manual decompression needed
vault-audit system-overview audit.log.gz

# Mix compressed and uncompressed files
vault-audit system-overview day1.log day2.log.gz day3.log.zst

# Glob patterns work with compressed files
vault-audit token-analysis audit-*.log.zst

Performance: Streaming decompression maintains full processing speed (~57 MB/s) with no temp files, no extra disk space, and no memory overhead.
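Under the hood this amounts to picking a streaming decoder by file extension. A minimal sketch of the approach, assuming the `flate2` and `zstd` crates (illustrative, not the tool's actual code):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Read};

/// Wrap a file in the right streaming decoder based on its extension.
fn open_audit_log(path: &str) -> std::io::Result<Box<dyn BufRead>> {
    let file = File::open(path)?;
    let reader: Box<dyn Read> = if path.ends_with(".gz") {
        // MultiGzDecoder handles multi-member gzip files (e.g. rotated logs)
        Box::new(flate2::read::MultiGzDecoder::new(file))
    } else if path.ends_with(".zst") {
        Box::new(zstd::stream::read::Decoder::new(file)?)
    } else {
        Box::new(file)
    };
    // Buffer so callers can iterate line by line
    Ok(Box::new(BufReader::new(reader)))
}

fn main() -> std::io::Result<()> {
    for line in open_audit_log("audit.log.gz")?.lines() {
        let _entry = line?; // each line is one JSON audit event
    }
    Ok(())
}
```

Because decompression happens inside the reader, memory stays flat regardless of the uncompressed size.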
Understanding Entities vs Token Accessors
When analyzing token operations, it's important to understand the difference between entities and accessors:
Entity (User/Service Identity):
- A single identity like "fg-PIOP0SRVDEVOPS" or "approle"
- Can have multiple tokens (accessors) over time
- Summary view shows aggregated totals per entity
- Example: One service might have 233,668 total operations
Accessor (Individual Token):
- A unique token identifier for a single token
- Each accessor belongs to one entity
- Tokens get rotated/recreated, creating new accessors
- Example: That same service's 233k operations might be spread across 3 tokens:
- Token 1: 113,028 operations (10/06 07:26 - 10/07 07:41, 24.3h lifespan)
- Token 2: 79,280 operations (10/06 07:26 - 10/07 07:40, 24.2h lifespan)
- Token 3: 41,360 operations (10/06 07:28 - 10/07 07:40, 24.2h lifespan)
When to use each view:
- Summary mode (default): Shows per-entity totals for understanding overall usage patterns
- CSV export (`--export`): Shows per-accessor detail for token lifecycle analysis, rotation patterns, and identifying specific problematic tokens
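Conceptually this is a two-level map: summary mode folds the inner (accessor) level, while `--export` writes it out row by row. A minimal sketch (type and field names are illustrative, not the tool's actual code):

```rust
use std::collections::HashMap;

// entity name -> (token accessor -> operation count)
type TokenOps = HashMap<String, HashMap<String, u64>>;

fn record(ops: &mut TokenOps, entity: &str, accessor: &str) {
    *ops.entry(entity.to_string())
        .or_default()
        .entry(accessor.to_string())
        .or_default() += 1;
}

fn main() {
    let mut ops = TokenOps::new();
    record(&mut ops, "fg-PIOP0SRVDEVOPS", "accessor-1");
    record(&mut ops, "fg-PIOP0SRVDEVOPS", "accessor-2");
    record(&mut ops, "fg-PIOP0SRVDEVOPS", "accessor-2");

    // Summary view: aggregate per entity (what the default mode shows)
    for (entity, accessors) in &ops {
        let total: u64 = accessors.values().sum();
        println!("{entity}: {total} ops across {} tokens", accessors.len());
    }

    // Detail view: one row per accessor (what --export writes to CSV)
    for (entity, accessors) in &ops {
        for (accessor, count) in accessors {
            println!("{entity},{accessor},{count}");
        }
    }
}
```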
# See entity-level summary (6,091 entities with totals)
vault-audit token-analysis audit.log

# Export accessor-level detail (907 individual tokens with timestamps)
vault-audit token-analysis audit.log --export accessors.csv

# Filter to high-volume tokens only (flag name illustrative - check --help)
vault-audit token-analysis audit.log --export accessors.csv --min-ops 10000
Quick Analysis
# Get system overview (works with plain or compressed files)
vault-audit system-overview audit.log

# Analyze multiple days without concatenation
vault-audit system-overview day1.log day2.log day3.log

# Find authentication issues
vault-audit k8s-auth audit.log

# Detect token abuse across multiple compressed files
vault-audit token-analysis audit-*.log.gz
Multi-File Long-Term Analysis
All audit log commands support multiple files (compressed or uncompressed) for historical analysis:
# Week-long system overview with compressed files
vault-audit system-overview audit-2025-01-0*.log.gz

# Month-long entity churn tracking (auto-preprocesses entity mappings)
vault-audit entity-analysis churn audit-2025-01-*.log.gz

# Multi-day token operations analysis with mixed file types
vault-audit token-analysis day1.log day2.log.gz day3.log.zst

# Path hotspot analysis across 30 days of compressed logs
vault-audit path-hotspots audit-2025-01-*.log.zst
Parallel Processing
Commands automatically use parallel processing when analyzing multiple files:
# Single file - uses sequential processing
vault-audit system-overview audit.log

# Multiple files - automatically parallelizes across all CPU cores
vault-audit system-overview day1.log day2.log day3.log

# Glob expansion with many files - maximizes CPU utilization
vault-audit system-overview logs/audit-*.log.gz
Commands with Parallel Processing:
- `system-overview` - System-wide audit analysis
- `entity-analysis gaps` - Operations without entity IDs
- `entity-gaps` - Operations without entity IDs (deprecated, use `entity-analysis`)
- `path-hotspots` - Most accessed paths
- `k8s-auth` - Kubernetes authentication analysis
- `airflow-polling` - Airflow polling pattern detection
- `kv-analysis analyze` - KV secrets usage analysis
- `token-analysis` - Token operations analysis
How it works:
- Automatically detects when multiple files are provided
- Processes files concurrently using all available CPU cores
- Uses streaming approach to maintain low memory usage
- Combines results correctly with proper aggregation
- Provides accurate progress tracking across all files
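The underlying shape is a classic map-reduce over files, which Rayon makes straightforward. A minimal sketch of the pattern, assuming the `rayon` crate (illustrative, not the tool's actual code):

```rust
use rayon::prelude::*;
use std::collections::HashMap;

/// Per-file analysis: count operations per path (stand-in for real parsing).
fn analyze_file(path: &str) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    // ... stream the file line by line and aggregate ...
    *counts.entry(format!("parsed:{path}")).or_default() += 1;
    counts
}

fn main() {
    let files = vec!["day1.log", "day2.log.gz", "day3.log.zst"];

    // Each file is processed on its own core; partial results are merged.
    let totals: HashMap<String, u64> = files
        .par_iter()
        .map(|f| analyze_file(f))
        .reduce(HashMap::new, |mut acc, partial| {
            for (path, n) in partial {
                *acc.entry(path).or_default() += n;
            }
            acc
        });

    println!("{totals:?}");
}
```

The merge step is what keeps aggregation correct: each worker builds an independent partial result, so no locks are needed on the hot path.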
Performance benefits:
- Near-linear speedup with number of CPU cores
- 8-core system: ~7x faster on 8+ files
- Real-world improvements: 40% faster for KV analysis, 7x for system overview
- Memory efficient: only ~2x memory overhead in exchange for the speed gains
- No configuration needed - works automatically
- Falls back to sequential processing for single files
Deep Dive Analysis
# Analyze entity creation patterns by auth path (auto-preprocessing enabled)
vault-audit entity-analysis creation audit.log

# Track entity lifecycle across multiple days (auto-preprocessing enabled)
vault-audit entity-analysis churn day1.log day2.log day3.log

# Analyze specific entity behavior (entity ID argument is illustrative)
vault-audit entity-analysis timeline audit.log <entity-id>

# Detect activity gaps (potential security issues)
vault-audit entity-analysis gaps audit-*.log

# Token analysis with multiple output modes
vault-audit token-analysis audit.log --export tokens.csv

# Analyze Airflow polling with burst detection
vault-audit airflow-polling audit.log

# Query Vault API for client activity metrics (requires Vault connectivity, e.g. VAULT_ADDR/VAULT_TOKEN)
vault-audit client-activity
KV Usage Analysis
# Generate KV usage report (new unified command with parallel processing)
vault-audit kv-analysis analyze audit.log

# Multi-file analysis - 40% faster with parallel processing
vault-audit kv-analysis analyze audit-*.log.gz

# Compare two time periods (operates on the exported CSVs)
vault-audit kv-analysis compare week1.csv week2.csv

# Get summary statistics from a CSV export
vault-audit kv-analysis summary kv-usage.csv
Performance
Tested on production audit logs:
Single File:
- Log Size: 15.7 GB (3,986,972 lines)
- Processing Time: ~17 seconds
- Memory Usage: <100 MB
- Throughput: ~230,000 lines/second
Multi-File Sequential (7 days):
- Total Size: 105 GB (26,615,476 lines)
- Processing Time: ~2.5 minutes average per command
- Memory Usage: <100 MB (streaming approach)
- Throughput: ~175,000 lines/second sustained
Multi-File Parallel (multiple files, multi-core):
- Total Size: Varies by workload
- Processing Time: 40-85% faster than sequential (command-dependent)
- Memory Usage: 80-300 MB (2x overhead for parallel workers)
- Throughput: 2-7x sequential performance
- Speedup: Near-linear scaling with CPU cores
- Example: KV analysis 40% faster (141s → 85s, ~77 MB memory)
Compressed Files:
- File Size: 1.79 GB compressed → 13.8 GB uncompressed
- Processing Time: ~31 seconds (299,958 login operations)
- Throughput: ~57 MB/sec compressed, ~230,000 lines/second
- Memory Usage: <100 MB (streaming decompression, no temp files)
- Formats Supported: gzip (.gz), zstandard (.zst)
Parallel Processing Benchmarks (Real-World):
- KV Analysis (`kv-analysis analyze`)
  - Sequential: 2m 21.32s (140 MB/s, ~40 MB memory)
  - Parallel: 1m 24.60s (233 MB/s, ~77 MB memory)
  - Improvement: 40.1% faster (56.7 second reduction)
  - CPU utilization: 124.68s → 175.60s user time (multi-core usage)
  - Memory overhead: 2x (expected for parallel workers)
Output Formats
Most commands produce formatted text output with:
- Summary statistics
- Top N lists sorted by volume/importance
- Percentage breakdowns
- Optimization recommendations
CSV export commands generate standard CSV files for:
- Spreadsheet analysis
- Database imports
- Further processing with other tools
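As an example of downstream processing, here is a minimal sketch that filters an exported token CSV to high-volume rows, assuming the `csv` crate (the column name is illustrative, not the tool's documented schema):

```rust
use std::error::Error;

// Sketch: keep only rows whose operation count exceeds a threshold.
fn main() -> Result<(), Box<dyn Error>> {
    let mut reader = csv::Reader::from_path("tokens.csv")?;
    let headers = reader.headers()?.clone();
    let ops_col = headers
        .iter()
        .position(|h| h == "operations") // hypothetical column name
        .expect("no 'operations' column");

    for record in reader.records() {
        let record = record?;
        let ops: u64 = record[ops_col].parse()?;
        if ops > 10_000 {
            // Naive re-join; fine for a sketch, use a csv::Writer for real output
            println!("{}", record.iter().collect::<Vec<_>>().join(","));
        }
    }
    Ok(())
}
```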
Architecture
- Streaming Parser: Processes logs line-by-line without loading entire file into memory
- Parallel Processing: Multi-file workloads automatically use all CPU cores via Rayon
- Efficient Data Structures: Uses HashMaps and BTreeMaps for fast aggregation
- Smart Processing Mode: Auto-detects single vs multi-file operations for optimal performance
- Type Safety: Comprehensive error handling with anyhow
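To illustrate the streaming model, here is a minimal sketch that counts requests per path in constant memory, assuming `serde_json` and `anyhow` (field access follows Vault's audit log JSON, but this is not the tool's actual code):

```rust
use std::collections::HashMap;
use std::io::{BufRead, BufReader};

// One JSON event per line; only the aggregation state lives in memory.
fn main() -> anyhow::Result<()> {
    let file = std::fs::File::open("audit.log")?;
    let mut path_counts: HashMap<String, u64> = HashMap::new();

    for line in BufReader::new(file).lines() {
        let entry: serde_json::Value = serde_json::from_str(&line?)?;
        // Vault audit events carry the requested path under request.path
        if let Some(path) = entry.pointer("/request/path").and_then(|p| p.as_str()) {
            *path_counts.entry(path.to_string()).or_default() += 1;
        }
    }

    for (path, count) in &path_counts {
        println!("{count:>8}  {path}");
    }
    Ok(())
}
```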
Development
Build
cargo build --release

Test
cargo test
Benchmarking
To measure performance and memory usage on macOS/Linux:
# macOS - shows execution time and peak memory usage
/usr/bin/time -l vault-audit system-overview audit.log 2>&1 | grep -E 'real|maximum resident'

# Linux - shows execution time and peak memory usage
/usr/bin/time -v vault-audit system-overview audit.log 2>&1 | grep -E 'Elapsed|Maximum resident'

# Example: Benchmark KV analysis
/usr/bin/time -l vault-audit kv-analysis analyze audit.log
Key metrics:
- Real time: Wall-clock time (actual duration)
- User time: Total CPU time across cores (user time exceeding real time means parallel processing is working!)
- Maximum resident set size: Peak memory usage (macOS `time -l` reports bytes; Linux `time -v` reports kilobytes)
  - Divide bytes by 1,048,576 to convert to MB
  - Example: 80,461,824 bytes = ~77 MB
License
MIT
Contributing
Contributions welcome! Please open an issue or PR.
Requirements
- Rust 1.70+ (2021 edition)
- Works on Linux, macOS, and Windows
Support
For issues or questions, please open a GitHub issue.