vault-audit-tools 0.11.2

High-performance command-line tools for analyzing HashiCorp Vault audit logs with intelligent ephemeral entity detection
# Vault Audit Tools

[![CI](https://github.com/trenner1/hashicorp-vault-audit-analysis/actions/workflows/ci.yml/badge.svg)](https://github.com/trenner1/hashicorp-vault-audit-analysis/actions/workflows/ci.yml)
[![Security](https://github.com/trenner1/hashicorp-vault-audit-analysis/actions/workflows/security.yml/badge.svg)](https://github.com/trenner1/hashicorp-vault-audit-analysis/actions/workflows/security.yml)
[![codecov](https://codecov.io/github/trenner1/hashicorp-vault-audit-analysis/graph/badge.svg?token=QYMT1SKDQ6)](https://codecov.io/github/trenner1/hashicorp-vault-audit-analysis)
[![Docs](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://trenner1.github.io/hashicorp-vault-audit-analysis/latest/vault_audit_tools/index.html)
[Browse versions](https://trenner1.github.io/hashicorp-vault-audit-analysis/versions.html)


High-performance command-line tools for analyzing HashiCorp Vault audit logs, written in Rust.

## Features

- **Fast**: ~3.5x faster than equivalent implementations (~17s vs ~60s for a 4M-line log)
- **Parallel Processing**: Automatically processes multiple files concurrently using all available CPU cores
- **Memory Efficient**: ~10x lower memory usage thanks to a streaming parser
- **Compressed File Support**: Direct analysis of `.gz` and `.zst` files without manual decompression
- **Multi-File Support**: Analyze weeks/months of logs without manual concatenation
- **Comprehensive**: 16 specialized analysis commands for different use cases
- **Production Ready**: Tested on 100GB+ multi-day production audit logs
- **Shell Completion**: Tab completion support for bash, zsh, fish, powershell, and elvish

## Installation

### From Source

```bash
cd vault-audit-tools
cargo install --path .
```

This installs the `vault-audit` binary to `~/.cargo/bin/`.

### Pre-built Binaries

Download from the [Releases](https://github.com/trenner1/hashicorp-vault-audit-analysis/releases) page.

### Shell Completion

After installation, enable tab completion for your shell:

#### Linux/macOS

```bash
# Bash (Linux) - single command
sudo mkdir -p /usr/local/etc/bash_completion.d && \
vault-audit generate-completion bash | sudo tee /usr/local/etc/bash_completion.d/vault-audit > /dev/null && \
echo "Completion installed. Restart your shell or run: source /usr/local/etc/bash_completion.d/vault-audit"

# Bash (macOS with Homebrew) - single command
mkdir -p $(brew --prefix)/etc/bash_completion.d && \
vault-audit generate-completion bash > $(brew --prefix)/etc/bash_completion.d/vault-audit && \
echo "Completion installed. Restart your shell or run: source $(brew --prefix)/etc/bash_completion.d/vault-audit"

# Zsh - single command
mkdir -p ~/.zsh/completions && \
vault-audit generate-completion zsh > ~/.zsh/completions/_vault-audit && \
{ grep -qF 'fpath=(~/.zsh/completions $fpath)' ~/.zshrc || echo 'fpath=(~/.zsh/completions $fpath)' >> ~/.zshrc; } && \
{ grep -qF 'autoload -Uz compinit && compinit' ~/.zshrc || echo 'autoload -Uz compinit && compinit' >> ~/.zshrc; } && \
echo "Completion installed. Restart your shell or run: source ~/.zshrc"

# Fish - single command
mkdir -p ~/.config/fish/completions && \
vault-audit generate-completion fish > ~/.config/fish/completions/vault-audit.fish && \
echo "Completion installed. Restart your shell."

# Elvish - single command
mkdir -p ~/.config/elvish/lib && \
vault-audit generate-completion elvish > ~/.config/elvish/lib/vault-audit.elv && \
{ grep -qF 'use vault-audit' ~/.config/elvish/rc.elv || echo 'use vault-audit' >> ~/.config/elvish/rc.elv; } && \
echo "Completion installed. Restart your shell."
```

#### PowerShell

```powershell
# PowerShell (Windows/cross-platform) - single command
$profileDir = Split-Path $PROFILE; New-Item -ItemType Directory -Force -Path $profileDir | Out-Null; vault-audit generate-completion powershell | Out-File -Append -FilePath $PROFILE -Encoding utf8; Write-Host "Completion installed. Restart PowerShell or run: . `$PROFILE"
```

#### Windows (Git Bash)

Git Bash users need special handling since `~` doesn't expand in output redirection:

```bash
# Single command installation for Git Bash
mkdir -p "$HOME/.bash_completions" && \
vault-audit generate-completion bash > "$HOME/.bash_completions/vault-audit" && \
{ grep -qF 'source "$HOME/.bash_completions/vault-audit"' ~/.bashrc || echo 'source "$HOME/.bash_completions/vault-audit"' >> ~/.bashrc; } && \
echo "Completion installed. Restart Git Bash or run: source ~/.bashrc"
```

**Troubleshooting**:
- Use `$HOME` variable instead of `~` for paths in Git Bash
- If completions don't work immediately, open a new terminal window
- Verify the completion file exists: `ls -la "$HOME/.bash_completions/vault-audit"`
- Check your shell rc file sources it: `grep vault-audit ~/.bashrc`

## Commands

### System Analysis

- **`system-overview`** - High-level overview of all operations, entities, and auth methods (parallel processing)
- **`entity-gaps`** - Identify operations without entity IDs (no-entity operations) (parallel processing)
- **`path-hotspots`** - Find most accessed paths with optimization recommendations (parallel processing)

### Authentication Analysis

- **`k8s-auth`** - Analyze Kubernetes/OpenShift authentication patterns and entity churn (parallel processing)
- **`token-analysis`** - Unified token operations analysis with abuse detection and CSV export (parallel processing)
  - Track token lifecycle operations (create, renew, revoke, lookup)
  - Detect excessive token lookup patterns
  - Export per-accessor detail to CSV

### Entity Analysis

- **`entity-analysis`** - Unified entity lifecycle analysis (recommended)
  - `churn` - Multi-day entity lifecycle tracking with ephemeral detection
  - `creation` - Entity creation patterns by authentication path
  - `preprocess` - Extract entity mappings (auto-generated by default)
  - `gaps` - Detect activity gaps
  - `timeline` - Individual entity operation timeline
  - **Key improvement**: Entity mappings are preprocessed automatically, so no separate `preprocess` step is needed before running the other subcommands

### Vault API Integration

- **`client-activity`** - Query Vault for client activity metrics by mount
- **`entity-list`** - Export complete entity list from Vault (for baseline analysis)

### Mount Enumeration

- **`kv-mounts`** - Enumerate KV secret mounts with optional depth-based tree traversal
  - Automatically discovers all KV v1 and v2 mounts
  - Recursively lists secrets and folders within each mount
  - Supports unlimited depth (default) or limited traversal (`--depth N`)
  - Output formats: CSV (flattened), JSON (nested tree), or stdout (visual tree)
  - **Example**: `vault-audit kv-mounts --format stdout`
  - **Example**: `vault-audit kv-mounts --depth 2 --format csv --output kv-inventory.csv`

- **`auth-mounts`** - Enumerate authentication mounts with role/user discovery
  - Automatically discovers all auth methods
  - Lists roles, users, and groups within each mount (when `--depth > 0`)
  - Supports kubernetes, approle, userpass, jwt/oidc, and ldap auth types
  - Output formats: CSV (flattened), JSON (nested), or stdout (visual tree)
  - **Example**: `vault-audit auth-mounts --format stdout`
  - **Example**: `vault-audit auth-mounts --depth 0 --format json` (mounts only, no roles)

### KV Secrets Analysis

- **`kv-analysis`** - Unified KV secrets analysis (recommended)
  - `analyze` - Analyze KV usage by path and entity (generates CSV) (parallel processing)
  - `compare` - Compare KV usage between two time periods (CSV comparison)
  - `summary` - Summarize KV secret usage from CSV exports (CSV analysis)
- **`kv-analyzer`** - DEPRECATED: Use `kv-analysis analyze` instead
- **`kv-compare`** - DEPRECATED: Use `kv-analysis compare` instead
- **`kv-summary`** - DEPRECATED: Use `kv-analysis summary` instead

## Vault Token Requirements

Most commands analyze audit log files and **do not require any Vault API access**. The following commands interact with Vault's API and require a token with specific permissions.

### Commands That Don't Need Vault Access

These commands only read audit log files:
- `system-overview`, `path-hotspots`, `entity-gaps`
- `token-analysis`, `k8s-auth`, `airflow-polling`
- `entity-analysis` (all subcommands)
- `kv-analysis` (all subcommands)

### Commands That Need Vault API Access

#### `kv-mounts` Command

Enumerates all KV secret mounts and optionally lists their contents in a tree structure.

**Required ACL Policy:**
```hcl
# List and read secret mounts
path "sys/mounts" {
  capabilities = ["read"]
}

# List KV v2 secrets (for each mount discovered)
path "+/metadata/*" {
  capabilities = ["list"]
}

# List KV v1 secrets (for each mount discovered)
path "+/*" {
  capabilities = ["list"]
}
```

#### `auth-mounts` Command

Enumerates all authentication mounts and optionally lists roles, users, and groups within each mount.

**Required ACL Policy:**
```hcl
# List and read auth mounts
path "sys/auth" {
  capabilities = ["read"]
}

# List roles for kubernetes, approle, jwt/oidc auth methods
path "auth/+/role" {
  capabilities = ["list"]
}

# List users for userpass and ldap auth methods
path "auth/+/users" {
  capabilities = ["list"]
}

# List groups for ldap auth method
path "auth/+/groups" {
  capabilities = ["list"]
}
```

#### `entity-list` Command

Exports complete entity list from Vault for baseline analysis.

**Required ACL Policy:**
```hcl
# Read entity information
path "identity/entity/id" {
  capabilities = ["list"]
}

path "identity/entity/id/*" {
  capabilities = ["read"]
}

# Read auth mount configuration
path "sys/auth" {
  capabilities = ["read"]
}
```

#### `client-activity` Command

Queries Vault's activity log API for client usage metrics.

**Required ACL Policy:**
```hcl
# Export client activity data
path "sys/internal/counters/activity/export" {
  capabilities = ["read"]
}

# Read mount configuration (secret engines and auth methods)
path "sys/mounts" {
  capabilities = ["read"]
}

path "sys/auth" {
  capabilities = ["read"]
}
```

### Creating a Token with Required Permissions

**Option 1: Separate policies for each command**

```bash
# For entity-list command
vault policy write vault-audit-entity-list - <<EOF
path "identity/entity/id" {
  capabilities = ["list"]
}
path "identity/entity/id/*" {
  capabilities = ["read"]
}
path "sys/auth" {
  capabilities = ["read"]
}
EOF

vault token create -policy=vault-audit-entity-list

# For client-activity command
vault policy write vault-audit-client-activity - <<EOF
path "sys/internal/counters/activity/export" {
  capabilities = ["read"]
}
path "sys/mounts" {
  capabilities = ["read"]
}
path "sys/auth" {
  capabilities = ["read"]
}
EOF

vault token create -policy=vault-audit-client-activity
```

**Option 2: Combined policy for all API commands**

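```bash
vault policy write vault-audit-tools - <<EOF
# Entity list access (entity-list)
path "identity/entity/id" {
  capabilities = ["list"]
}
path "identity/entity/id/*" {
  capabilities = ["read"]
}

# Client activity access (client-activity)
path "sys/internal/counters/activity/export" {
  capabilities = ["read"]
}

# Mount information (client-activity, entity-list, kv-mounts, auth-mounts)
path "sys/mounts" {
  capabilities = ["read"]
}
path "sys/auth" {
  capabilities = ["read"]
}

# KV v2 and v1 secret listing (kv-mounts)
path "+/metadata/*" {
  capabilities = ["list"]
}
path "+/*" {
  capabilities = ["list"]
}

# Role, user, and group listing (auth-mounts)
path "auth/+/role" {
  capabilities = ["list"]
}
path "auth/+/users" {
  capabilities = ["list"]
}
path "auth/+/groups" {
  capabilities = ["list"]
}
EOF

vault token create -policy=vault-audit-tools
```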

**Option 3: Use existing token**

If you already have a Vault token with appropriate permissions (e.g., root token for testing, or admin token), you can use it:

```bash
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.your-token-here"

vault-audit entity-list --output entities.csv
vault-audit client-activity --start 2025-10-01T00:00:00Z --end 2025-10-31T23:59:59Z
```

### Environment Variables

Commands that interact with Vault API respect standard Vault environment variables:

- `VAULT_ADDR` - Vault server address (e.g., `https://vault.example.com:8200`)
- `VAULT_TOKEN` - Authentication token for API access
- `VAULT_NAMESPACE` - Vault namespace for API requests (e.g., `tenant1`, `admin/team-a`)
- `VAULT_SKIP_VERIFY` - Skip TLS certificate verification (set to `1`, `true`, or `yes`) - **USE ONLY FOR TESTING**
- `VAULT_CACERT` - Path to CA certificate for TLS verification

You can also provide these via command-line flags:
```bash
# Query entities from a specific namespace
vault-audit entity-list \
  --vault-addr https://vault.example.com:8200 \
  --vault-token hvs.xxxxx \
  --vault-namespace tenant1 \
  --output entities.csv

# Client activity for a namespace
vault-audit client-activity \
  --start 2025-10-01T00:00:00Z \
  --end 2025-10-31T23:59:59Z \
  --vault-namespace admin/security \
  --output activity.csv

# Skip TLS verification (dev/test only)
vault-audit entity-list --insecure --output entities.csv
```
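The precedence between flags and environment variables is not stated above. The sketch below assumes the common convention that an explicit flag wins over the environment variable, and shows one way to interpret `VAULT_SKIP_VERIFY` using the three documented truthy values (`1`, `true`, `yes`). Case-insensitive matching is an assumption, and `resolve_setting` and `skip_verify_enabled` are hypothetical names, not the tool's API.

```rust
use std::env;

/// Return the flag value if one was given, otherwise fall back to the
/// named environment variable (assumed precedence: flag > env var).
fn resolve_setting(flag: Option<&str>, env_var: &str) -> Option<String> {
    match flag {
        Some(value) => Some(value.to_string()),
        None => env::var(env_var).ok(),
    }
}

/// Interpret a VAULT_SKIP_VERIFY value. Only "1", "true", and "yes" are
/// documented; case-insensitive matching is an assumption of this sketch.
fn skip_verify_enabled(value: Option<&str>) -> bool {
    match value {
        Some(v) => {
            let v = v.trim().to_ascii_lowercase();
            v == "1" || v == "true" || v == "yes"
        }
        None => false,
    }
}
```

A caller would combine the two, for example `resolve_setting(cli_namespace.as_deref(), "VAULT_NAMESPACE")` and `skip_verify_enabled(env::var("VAULT_SKIP_VERIFY").ok().as_deref())`.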

## Namespace Support

Vault Enterprise supports [namespaces](https://developer.hashicorp.com/vault/docs/enterprise/namespaces) for multi-tenant isolation. This toolset supports namespaces in its API commands and offers namespace filtering for audit log analysis.

### API Commands with Namespaces

Commands that query Vault's API (`entity-list` and `client-activity`) support the `--vault-namespace` flag (or `VAULT_NAMESPACE` environment variable) to target a specific namespace:

```bash
# Set namespace via environment variable
export VAULT_NAMESPACE="tenant1"
vault-audit entity-list --output tenant1-entities.csv

# Or use command-line flag
vault-audit client-activity \
  --start 2025-10-01T00:00:00Z \
  --end 2025-10-31T23:59:59Z \
  --vault-namespace admin/security
```

### Audit Log Analysis with Namespace Filtering

Audit logs from namespaced Vault clusters include namespace information in each entry. Use the `--namespace-filter` flag to analyze logs from a specific namespace:

```bash
# Analyze only operations in the root namespace
vault-audit system-overview audit.log --namespace-filter root

# Show system overview for specific namespace
vault-audit system-overview logs/*.log.gz --namespace-filter tenant1

# Other audit log commands may accept the same flag; check each command's --help for current support
vault-audit token-analysis audit.log --namespace-filter admin
vault-audit kv-analysis analyze audit.log --namespace-filter myapp --output kv-usage.csv
```

**Note**: Current namespace filtering support for audit log commands:
- `system-overview` - Full support with `--namespace-filter`
- Other commands - Namespace ID is available in audit log entries via the `request.namespace.id` field
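Namespace filtering comes down to comparing each entry's `request.namespace.id` with the requested value. The sketch below assumes entries have already been parsed into a simplified struct; `AuditEntry` and `filter_by_namespace` are hypothetical, and real audit entries are JSON objects with many more fields that would normally be deserialized with a JSON library. Entries without a namespace ID are simply skipped here.

```rust
/// Simplified view of one audit log entry (hypothetical; real entries
/// are JSON objects with many more fields).
#[derive(Debug, Clone)]
struct AuditEntry {
    path: String,
    /// Value of `request.namespace.id`, if the entry has one.
    namespace_id: Option<String>,
}

/// Keep only entries whose namespace ID equals `filter`.
fn filter_by_namespace<'a>(entries: &'a [AuditEntry], filter: &str) -> Vec<&'a AuditEntry> {
    entries
        .iter()
        .filter(|e| e.namespace_id.as_deref() == Some(filter))
        .collect()
}
```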

### Namespace Best Practices

1. **API Access**: Tokens must have appropriate permissions within the target namespace
2. **Audit Logs**: Ensure audit logs include namespace information (enabled by default in Vault Enterprise)
3. **Cross-Namespace Analysis**: To analyze multiple namespaces, run separate commands for each namespace
4. **Root Namespace**: Use `--namespace-filter root` for operations in the root namespace

## Documentation

### API Documentation

View the full API documentation with detailed module and function descriptions:

```bash
# Generate and open documentation in your browser
cd vault-audit-tools
cargo doc --no-deps --open
```

The documentation includes:
- Comprehensive crate overview and architecture
- Module-level documentation for all components
- Function-level documentation with examples
- Type definitions and their usage

The published documentation is also available at [docs.rs/vault-audit-tools](https://docs.rs/vault-audit-tools).

### Command Help

Get detailed help for any command:

```bash
# General help
vault-audit --help

# Unified command help
vault-audit entity-analysis --help
vault-audit token-analysis --help
vault-audit kv-analysis --help

# Subcommand-specific help
vault-audit entity-analysis churn --help
vault-audit kv-analysis analyze --help
```

### Application-Specific

- **`airflow-polling`** - Analyze Airflow secret polling patterns with burst rate detection (parallel processing)

### Utilities

- **`generate-completion`** - Generate shell completion scripts

## Usage Examples

### Compressed File Support

All commands automatically detect and decompress `.gz` (gzip) and `.zst` (zstandard) files:

```bash
# Analyze compressed files directly - no manual decompression needed
vault-audit system-overview vault_audit.log.gz

# Mix compressed and uncompressed files
vault-audit entity-analysis churn day1.log.gz day2.log day3.log.zst

# Glob patterns work with compressed files
vault-audit path-hotspots logs/*.log.gz

# Streaming decompression - no temp files, no extra disk space needed
vault-audit token-analysis huge_file.log.gz  # processes 1.79GB compressed → 13.8GB uncompressed
```

**Performance**: Compressed file processing maintains full speed (~57 MB/s) with no memory overhead thanks to streaming decompression.
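Automatic detection can rely on the file extension, the file's leading magic bytes, or both. The sketch below shows both checks with only the standard library: gzip streams begin with `1f 8b` and zstd frames with `28 b5 2f fd`. How `vault-audit` actually decides is not documented here, and real decompression would wrap the reader in a decoder from crates such as `flate2` or `zstd`; `Compression` and `detect_compression` are hypothetical names.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,
    Zstd,
    Plain,
}

/// Decide the format from the first bytes when they are available,
/// otherwise fall back to the file extension.
fn detect_compression(path: &Path, header: Option<&[u8]>) -> Compression {
    if let Some(bytes) = header {
        // Magic numbers are authoritative when the header has been read.
        if bytes.starts_with(b"\x1f\x8b") {
            return Compression::Gzip;
        }
        if bytes.starts_with(b"\x28\xb5\x2f\xfd") {
            return Compression::Zstd;
        }
        return Compression::Plain;
    }
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("gz") => Compression::Gzip,
        Some("zst") => Compression::Zstd,
        _ => Compression::Plain,
    }
}
```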

### Understanding Entities vs Token Accessors

When analyzing token operations, it's important to understand the difference between **entities** and **accessors**:

**Entity** (User/Service Identity):
- A single identity like "fg-PIOP0SRVDEVOPS" or "approle"
- Can have multiple tokens (accessors) over time
- Summary view shows aggregated totals per entity
- Example: One service might have 233,668 total operations

**Accessor** (Individual Token):
- A unique token identifier for a single token
- Each accessor belongs to one entity
- Tokens get rotated/recreated, creating new accessors
- Example: That same service's 233k operations might be spread across 3 tokens:
  - Token 1: 113,028 operations (10/06 07:26 - 10/07 07:41, 24.3h lifespan)
  - Token 2: 79,280 operations (10/06 07:26 - 10/07 07:40, 24.2h lifespan)
  - Token 3: 41,360 operations (10/06 07:28 - 10/07 07:40, 24.2h lifespan)

**When to use each view**:
- **Summary mode** (default): Shows per-entity totals for understanding overall usage patterns
- **CSV export** (`--export`): Shows per-accessor detail for token lifecycle analysis, rotation patterns, and identifying specific problematic tokens

```bash
# See entity-level summary (6,091 entities with totals)
vault-audit token-analysis vault_audit.log

# Export accessor-level detail (907 individual tokens with timestamps)
vault-audit token-analysis vault_audit.log --export tokens.csv

# Filter to high-volume tokens only
vault-audit token-analysis vault_audit.log --export tokens.csv --min-operations 1000
```
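The two views relate as a roll-up: per-accessor counts sum into per-entity totals, while accessor-level rows can be filtered by a minimum operation count, similar in spirit to `--min-operations`. The sketch below reuses the three-token example above; `AccessorStats`, `totals_by_entity`, and `high_volume` are hypothetical names, and treating the threshold as inclusive is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical per-token record: one row per accessor.
struct AccessorStats {
    accessor: String,
    entity: String,
    operations: u64,
}

/// Roll accessor-level counts up into per-entity totals (summary view).
fn totals_by_entity(rows: &[AccessorStats]) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for row in rows {
        *totals.entry(row.entity.clone()).or_insert(0) += row.operations;
    }
    totals
}

/// Keep accessors at or above `min_operations` (inclusive threshold is an assumption).
fn high_volume(rows: &[AccessorStats], min_operations: u64) -> Vec<&AccessorStats> {
    rows.iter().filter(|r| r.operations >= min_operations).collect()
}
```

With the example figures, the three tokens (113,028 + 79,280 + 41,360) roll up to the entity total of 233,668 operations.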

### Quick Analysis

```bash
# Get system overview (works with plain or compressed files)
vault-audit system-overview vault_audit.log
vault-audit system-overview vault_audit.log.gz

# Analyze multiple days without concatenation
vault-audit system-overview logs/vault_audit.2025-10-*.log

# Find authentication issues
vault-audit k8s-auth vault_audit.log

# Detect token abuse across multiple compressed files
vault-audit token-analysis day1.log.gz day2.log.gz day3.log.gz --abuse-threshold 5000
```

### Multi-File Long-Term Analysis

All audit log commands support multiple files (compressed or uncompressed) for historical analysis:

```bash
# Week-long system overview with compressed files
vault-audit system-overview vault_audit.2025-10-{07,08,09,10,11,12,13}.log.gz

# Month-long entity churn tracking (auto-preprocesses entity mappings)
vault-audit entity-analysis churn october/*.log.gz

# Multi-day token operations analysis with CSV export
vault-audit token-analysis logs/vault_audit.*.log --export token_ops.csv

# Path hotspot analysis across a month of zstd-compressed logs
vault-audit path-hotspots logs/vault_audit.2025-10-*.log.zst
```

### Mount Enumeration and Discovery

Enumerate and discover all mounts, roles, and secrets without needing to know mount names in advance:

```bash
# Discover all KV mounts and their complete tree structure
vault-audit kv-mounts --format stdout

# List only KV mount points (no traversal into secrets)
vault-audit kv-mounts --depth 0 --format csv

# Traverse 2 levels deep and save to CSV
vault-audit kv-mounts --depth 2 --format csv --output kv-inventory.csv

# Get complete KV structure as JSON for further processing
vault-audit kv-mounts --format json --output kv-tree.json

# Discover all auth mounts with their roles and users
vault-audit auth-mounts --format stdout

# List only auth mount points (no role enumeration)
vault-audit auth-mounts --depth 0 --format json

# Export auth configuration with roles to CSV
vault-audit auth-mounts --format csv --output auth-config.csv
```

**Example Output - KV Mounts (stdout format):**
```
KV Mounts:
================================================================================
Path: kv/
  Mount Type: kv
  Version: 2
  Description: key/value secret storage
  Accessor: kv_f1c7d8b2
  Children (11 paths):
  kv/
  └── dev/
      └── apps/
          ├── backend-service/
          │   ├── config
          │   └── example
          ├── frontend-app/
          │   ├── config
          │   └── example
          └── mobile-app/
              ├── config
              └── example
```
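A depth-limited walk like the one behind `--depth` can be sketched recursively. Vault LIST responses mark sub-folders with a trailing `/`, so the walk only descends into keys ending in `/`. In this sketch an in-memory map replaces the HTTP LIST request (`list_keys` and `walk` are hypothetical), `None` means unlimited depth, and the exact depth numbering `vault-audit` uses is an assumption.

```rust
use std::collections::HashMap;

/// Stand-in for a Vault LIST call: maps a folder path to its keys.
fn list_keys<'a>(tree: &'a HashMap<String, Vec<String>>, path: &str) -> &'a [String] {
    tree.get(path).map(|keys| keys.as_slice()).unwrap_or(&[])
}

/// Collect every path under `path`, descending at most `max_depth` levels
/// (None = unlimited). Keys ending in '/' are folders and are recursed into.
fn walk(
    tree: &HashMap<String, Vec<String>>,
    path: &str,
    max_depth: Option<usize>,
    depth: usize,
    out: &mut Vec<String>,
) {
    if max_depth.map_or(false, |max| depth >= max) {
        return;
    }
    for key in list_keys(tree, path) {
        let child = format!("{path}{key}");
        out.push(child.clone());
        if key.ends_with('/') {
            walk(tree, &child, max_depth, depth + 1, out);
        }
    }
}
```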

**Example Output - Auth Mounts (stdout format):**
```
Auth Mounts:
================================================================================
Path: kubernetes/
  Type: kubernetes
  Description:
  Accessor: auth_kubernetes_e954d6e1
  Roles/Users (5):
    ├── backend-service
    ├── cache-service
    ├── database-operator
    ├── frontend-app
    └── monitoring

Path: approle/
  Type: approle
  Description:
  Accessor: auth_approle_6a0e0046
  Roles/Users (5):
    ├── ansible
    ├── automation
    ├── ci-pipeline
    ├── monitoring-agent
    └── terraform
```

### Parallel Processing

Commands automatically use parallel processing when analyzing multiple files:

```bash
# Single file - uses sequential processing
vault-audit system-overview vault_audit.log

# Multiple files - automatically parallelizes across all CPU cores
vault-audit system-overview day1.log day2.log day3.log day4.log

# Glob expansion with many files - maximizes CPU utilization
vault-audit path-hotspots logs/*.log.gz  # processes all files concurrently
```

**Commands with Parallel Processing:**
- `system-overview` - System-wide audit analysis
- `entity-analysis gaps` - Operations without entity IDs
- `entity-gaps` - Operations without entity IDs (deprecated, use entity-analysis)
- `path-hotspots` - Most accessed paths
- `k8s-auth` - Kubernetes authentication analysis
- `airflow-polling` - Airflow polling pattern detection
- `kv-analysis analyze` - KV secrets usage analysis
- `token-analysis` - Token operations analysis

**How it works:**
- Automatically detects when multiple files are provided
- Processes files concurrently using all available CPU cores
- Uses streaming approach to maintain low memory usage
- Combines results correctly with proper aggregation
- Provides accurate progress tracking across all files

**Performance benefits:**
- Near-linear speedup with number of CPU cores
- 8-core system: ~7x faster on 8+ files
- Real-world improvements: 40% faster for KV analysis, 7x for system overview
- Memory trade-off: roughly 2x memory overhead in exchange for the speed gains
- No configuration needed - works automatically
- Falls back to sequential processing for single files
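The per-file fan-out and merge described above can be sketched with scoped threads. The tool itself uses Rayon; `std::thread::scope` is swapped in here only to keep the example dependency-free. Each worker streams its own input into a local `HashMap`, and the partial maps are merged at the end, which is why memory grows with the number of concurrent workers. In-memory strings stand in for files, and `count_paths` is a hypothetical helper that treats each line as a bare request path.

```rust
use std::collections::HashMap;
use std::io::{BufRead, Cursor};
use std::thread;

/// Count occurrences of each line (a stand-in for a request path) in one input.
fn count_paths<R: BufRead>(reader: R) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for line in reader.lines() {
        let line = line.expect("read error");
        if !line.is_empty() {
            *counts.entry(line).or_insert(0) += 1;
        }
    }
    counts
}

/// Process each input on its own thread, then merge the partial results.
fn count_paths_parallel(inputs: &[&str]) -> HashMap<String, u64> {
    let partials: Vec<HashMap<String, u64>> = thread::scope(|s| {
        let handles: Vec<_> = inputs
            .iter()
            .map(|text| s.spawn(move || count_paths(Cursor::new(text.as_bytes()))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Aggregation step: sum per-file counts into one map.
    let mut merged = HashMap::new();
    for partial in partials {
        for (path, n) in partial {
            *merged.entry(path).or_insert(0) += n;
        }
    }
    merged
}
```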

### Deep Dive Analysis

```bash
# Analyze entity creation patterns by auth path (auto-preprocessing enabled)
vault-audit entity-analysis creation vault_audit.log

# Track entity lifecycle across multiple days (auto-preprocessing enabled)
vault-audit entity-analysis churn day1.log day2.log day3.log --baseline baseline_entities.json

# Analyze specific entity behavior
vault-audit entity-analysis timeline --entity-id <UUID> day1.log day2.log

# Detect activity gaps (potential security issues)
vault-audit entity-analysis gaps vault_audit.log --window-seconds 300

# Token analysis with multiple output modes
vault-audit token-analysis vault_audit.log                              # Summary view (per-entity)
vault-audit token-analysis vault_audit.log --abuse-threshold 10000      # Abuse detection
vault-audit token-analysis vault_audit.log --filter lookup,revoke       # Filter operation types
vault-audit token-analysis vault_audit.log --export tokens.csv          # Export per-accessor detail (907 tokens)
vault-audit token-analysis vault_audit.log --export tokens.csv --min-operations 1000  # High-volume tokens only

# Analyze Airflow polling with burst detection
vault-audit airflow-polling vault_audit.log

# Query Vault API for client activity metrics
vault-audit client-activity --start 2025-10-01T00:00:00Z --end 2025-11-01T00:00:00Z
```

### KV Usage Analysis

```bash
# Generate KV usage report (new unified command with parallel processing)
vault-audit kv-analysis analyze vault_audit.log --kv-prefix "appcodes/" --output kv_usage.csv

# Multi-file analysis - 40% faster with parallel processing
vault-audit kv-analysis analyze logs/*.log --output kv_usage.csv

# Compare two time periods
vault-audit kv-analysis compare old_usage.csv new_usage.csv

# Get summary statistics
vault-audit kv-analysis summary kv_usage.csv
```
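Comparing two periods amounts to joining the two usage tables on path and looking at how operation counts change. The sketch assumes both CSVs have already been reduced to `path → operations` maps; the actual CSV columns produced by `kv-analysis analyze` are not documented here, and `UsageChange` and `compare_usage` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Operation counts for one KV path in the old and new periods.
#[derive(Debug, PartialEq)]
struct UsageChange {
    path: String,
    old: u64,
    new: u64,
}

/// Join two path -> operations maps (sorted by path); a path missing
/// from one period counts as 0 operations in that period.
fn compare_usage(old: &BTreeMap<String, u64>, new: &BTreeMap<String, u64>) -> Vec<UsageChange> {
    let paths: BTreeSet<&String> = old.keys().chain(new.keys()).collect();
    paths
        .into_iter()
        .map(|path| UsageChange {
            path: path.clone(),
            old: old.get(path).copied().unwrap_or(0),
            new: new.get(path).copied().unwrap_or(0),
        })
        .collect()
}
```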

## Performance

Tested on production audit logs:

**Single File:**
- **Log Size**: 15.7 GB (3,986,972 lines)
- **Processing Time**: ~17 seconds
- **Memory Usage**: <100 MB
- **Throughput**: ~230,000 lines/second

**Multi-File Sequential (7 days):**
- **Total Size**: 105 GB (26,615,476 lines)
- **Processing Time**: ~2.5 minutes average per command
- **Memory Usage**: <100 MB (streaming approach)
- **Throughput**: ~175,000 lines/second sustained

**Multi-File Parallel (multiple files, multi-core):**
- **Total Size**: Varies by workload
- **Processing Time**: 40-85% less wall-clock time than sequential (command-dependent)
- **Memory Usage**: 80-300 MB (2x overhead for parallel workers)
- **Throughput**: ~1.7-7x sequential performance
- **Speedup**: Near-linear scaling with CPU cores
- **Example**: KV analysis 40% faster (141s → 85s, ~77 MB memory)

**Compressed Files:**
- **File Size**: 1.79 GB compressed → 13.8 GB uncompressed
- **Processing Time**: ~31 seconds (299,958 login operations)
- **Throughput**: ~57 MB/sec compressed, ~230,000 lines/second
- **Memory Usage**: <100 MB (streaming decompression, no temp files)
- **Formats Supported**: gzip (.gz), zstandard (.zst)

**Parallel Processing Benchmarks (Real-World):**
- **KV Analysis** (`kv-analysis analyze`)
  - Sequential: 2m 21.32s (140 MB/s, ~40 MB memory)
  - Parallel: 1m 24.60s (233 MB/s, ~77 MB memory)
  - **Improvement: 40.1% faster** (56.7 second reduction)
  - CPU utilization: 124.68s → 175.60s user time (multi-core usage)
  - Memory overhead: 2x (expected for parallel workers)
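"40.1% faster" here means a 40.1% reduction in wall-clock time, which corresponds to a speedup of about 1.67x. The two figures are easy to conflate, so the small helpers below (hypothetical, not part of the tool) reproduce both from the timings above, along with the single-file throughput of roughly 230,000 lines/second.

```rust
/// Percentage reduction in wall-clock time from `before` to `after` (seconds).
fn time_reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Speedup factor: how many times faster `after` is than `before`.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

/// Throughput in lines per second.
fn lines_per_second(lines: u64, seconds: f64) -> f64 {
    lines as f64 / seconds
}
```

For the KV analysis run, 141.32s sequential versus 84.60s parallel gives a ~40.1% reduction and a ~1.67x speedup.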

## Output Formats

Most commands produce formatted text output with:
- Summary statistics
- Top N lists sorted by volume/importance
- Percentage breakdowns
- Optimization recommendations

CSV export commands generate standard CSV files for:
- Spreadsheet analysis
- Database imports
- Further processing with other tools

## Architecture

- **Streaming Parser**: Processes logs line-by-line without loading entire file into memory
- **Parallel Processing**: Multi-file workloads automatically use all CPU cores via Rayon
- **Efficient Data Structures**: Uses HashMaps and BTreeMaps for fast aggregation
- **Smart Processing Mode**: Auto-detects single vs multi-file operations for optimal performance
- **Error Handling**: Context-rich error propagation with anyhow
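The streaming parser keeps memory flat because only one line is held at a time. A minimal sketch of that loop uses `BufRead::read_line` with a reused buffer. The per-line work here (counting lines that contain a substring) is a hypothetical placeholder for real JSON parsing, and `count_matching_lines` is not the tool's API.

```rust
use std::io::{self, BufRead};

/// Stream input line by line, reusing one buffer, and count lines that
/// contain `needle`. Memory use is bounded by the longest line, not file size.
fn count_matching_lines<R: BufRead>(mut reader: R, needle: &str) -> io::Result<(u64, u64)> {
    let mut buf = String::new();
    let mut total = 0;
    let mut matching = 0;
    loop {
        buf.clear();
        // read_line returns Ok(0) at end of input.
        if reader.read_line(&mut buf)? == 0 {
            break;
        }
        total += 1;
        if buf.contains(needle) {
            matching += 1;
        }
    }
    Ok((total, matching))
}
```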

## Development

### Build

```bash
cd vault-audit-tools
cargo build --release
```

### Test

```bash
cargo test
```

### Benchmarking

To measure performance and memory usage on macOS/Linux:

```bash
# macOS - shows execution time and peak memory usage
/usr/bin/time -l ./target/release/vault-audit <command> <args> 2>&1 | grep -E "(real|maximum resident)"

# Linux - shows execution time and peak memory usage
/usr/bin/time -v ./target/release/vault-audit <command> <args> 2>&1 | grep -E "(Elapsed|Maximum resident)"

# Example: Benchmark KV analysis
/usr/bin/time -l ./target/release/vault-audit kv-analysis analyze logs/*.log
```

**Key metrics:**
- **Real time**: Wall-clock time (actual duration)
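- **User time**: CPU time summed across all cores (can exceed real time when parallel processing is active)
- **Maximum resident set size**: Peak memory usage, reported in bytes by macOS `time -l` and in kilobytes by GNU `time -v` on Linux
  - macOS: divide by 1,048,576 to convert to MB
  - Linux: divide by 1,024 to convert to MB
  - Example: 80,461,824 bytes = ~77 MB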
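Because the two platforms report peak RSS in different units, a tiny helper (hypothetical, not part of the tool) makes the conversion explicit. It yields the same ~76.7 MiB whether it is given the macOS byte count or the equivalent Linux kilobyte count.

```rust
/// Convert a peak-RSS figure to mebibytes (1 MiB = 1,048,576 bytes).
/// macOS `/usr/bin/time -l` reports bytes; GNU `time -v` reports kilobytes.
fn rss_to_mib(value: u64, reported_in_kb: bool) -> f64 {
    let bytes = if reported_in_kb { value * 1024 } else { value };
    bytes as f64 / 1_048_576.0
}
```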

## License

MIT

## Contributing

Contributions welcome! Please open an issue or PR.

## Requirements

- Rust 1.70+ (2021 edition)
- Works on Linux, macOS, and Windows

## Support

For issues or questions, please open a GitHub issue.