# kazoe
Fast `wc` replacement. Counts words, lines, and bytes.
Command: `kz`
## Installation
### From crates.io
```bash
cargo install kazoe
```
### From source
```bash
git clone https://github.com/ConeDjordjic/kazoe
cd kazoe
cargo install --path .
```
### From GitHub
```bash
cargo install --git https://github.com/ConeDjordjic/kazoe
```
## Performance
Benchmarked on a 1GB text file:
```
Word counting: 21x faster than wc (48ms vs 1.0s)
All counts (lwc): 16x faster than wc (63ms vs 1.0s)
Pattern matching: 90x faster than grep (32ms vs 2.9s)
Line counting: 1.7x faster than wc (32ms vs 56ms)
Multiple files: 23x faster than wc (86ms vs 2.0s)
```
Performance scales with file size:
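- Files < 512KB: processed sequentially, at roughly the same speed as wc
- Larger files: processed in parallel; on the 1GB benchmark above, speedups range from 1.7x (line counting) to 90x (pattern matching, compared with grep) depending on the operation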
## Basic Usage
```bash
# Default (lines, words, bytes)
kz file.txt
# Count lines
kz -l file.txt
# Count words
kz -w file.txt
# Count bytes
kz -c file.txt
# Count characters (UTF-8 aware)
kz -m file.txt
# Show max line length
kz -L file.txt
# Combine flags
kz -lwc file.txt
# Multiple files with totals
kz -w file1.txt file2.txt file3.txt
# Read from stdin
cat file.txt | kz
# Count pattern occurrences
kz --pattern "search term" file.txt
```
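As a rough illustration of what the default counts mean, here is a minimal Rust sketch of wc-style counting. It assumes wc's usual definitions (a line is a `\n` byte, a word is a run of non-whitespace bytes) and measures `-L` in bytes; `Counts`, `count`, and `count_pattern` are hypothetical names, not kz's API, and kz's actual rules (for example, whether `--pattern` counts overlapping matches) may differ.
```rust
/// Counts reported by a wc-style tool (hypothetical type for illustration).
#[derive(Debug, Default, PartialEq)]
pub struct Counts {
    pub lines: usize,
    pub words: usize,
    pub bytes: usize,
    pub max_line_length: usize,
}

/// Counts lines, words, bytes, and the longest line in `data`.
/// Assumptions: a line is terminated by '\n' (as in wc), a word is a maximal
/// run of non-ASCII-whitespace bytes, and line length is measured in bytes.
pub fn count(data: &[u8]) -> Counts {
    let mut c = Counts { bytes: data.len(), ..Default::default() };
    let mut in_word = false;
    let mut current_len = 0;
    for &b in data {
        if b == b'\n' {
            c.lines += 1;
            c.max_line_length = c.max_line_length.max(current_len);
            current_len = 0;
        } else {
            current_len += 1;
        }
        if b.is_ascii_whitespace() {
            in_word = false;
        } else if !in_word {
            // First byte of a new word.
            in_word = true;
            c.words += 1;
        }
    }
    // A final line without a trailing newline still counts toward -L.
    c.max_line_length = c.max_line_length.max(current_len);
    c
}

/// Counts non-overlapping occurrences of the literal `needle`
/// (assumption: kz may define pattern matches differently).
pub fn count_pattern(haystack: &str, needle: &str) -> usize {
    if needle.is_empty() {
        return 0;
    }
    haystack.matches(needle).count()
}
```
With these definitions, a file whose last line lacks a trailing newline reports one fewer line than it visibly has, which is also how `wc -l` behaves.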
## Advanced Features
### JSON Output
Output as JSON (combine with `-l`, `-w`, `-c` etc. to select counts):
```bash
kz --json -lwc file.txt
```
Example output:
```json
[
  {
    "file": "file.txt",
    "counts": {
      "lines": 100,
      "words": 500,
      "bytes": 2048,
      "chars": 2000,
      "max_line_length": 80,
      "pattern": 0,
      "unique_words": 0
    }
  }
]
```
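Each JSON entry has a fixed shape. As a sketch, the same structure can be mirrored by a Rust type and rendered with only the standard library; `FileCounts`, `escape_json`, and `to_json` are hypothetical names, and in practice a crate such as serde would normally handle serialization and escaping.
```rust
use std::fmt::Write;

/// One entry of the JSON array (field names copied from the example output).
pub struct FileCounts {
    pub file: String,
    pub lines: u64,
    pub words: u64,
    pub bytes: u64,
    pub chars: u64,
    pub max_line_length: u64,
    pub pattern: u64,
    pub unique_words: u64,
}

/// Escapes a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out
}

/// Renders results as a compact JSON array.
pub fn to_json(results: &[FileCounts]) -> String {
    let mut out = String::from("[");
    for (i, r) in results.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        write!(
            out,
            "{{\"file\":\"{}\",\"counts\":{{\"lines\":{},\"words\":{},\"bytes\":{},\"chars\":{},\"max_line_length\":{},\"pattern\":{},\"unique_words\":{}}}}}",
            escape_json(&r.file), r.lines, r.words, r.bytes, r.chars, r.max_line_length, r.pattern, r.unique_words
        )
        .unwrap();
    }
    out.push(']');
    out
}
```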
### Statistics Mode
Show file statistics (combine with `-l`, `-w`, `-c` etc. to include counts):
```bash
kz --stats -lwc file.txt
```
Example output:
```
Statistics:
Lines: 100
Words: 500
Bytes: 2048
Mean line length: 20.48
Median line length: 18
Std deviation: 12.34
Min line length: 0
Max line length: 80
Empty lines: 5
```
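The statistics can be derived from the list of line lengths. This sketch assumes lengths in bytes, a population (not sample) standard deviation, and an averaged median for an even number of lines; the README does not say which definitions kz uses, and `LineStats`/`line_stats` are hypothetical names.
```rust
/// Summary statistics over line lengths (definitions are assumptions).
#[derive(Debug, PartialEq)]
pub struct LineStats {
    pub mean: f64,
    pub median: f64,
    pub std_dev: f64,
    pub min: usize,
    pub max: usize,
    pub empty: usize,
}

/// Computes line-length statistics; returns None for input with no lines.
/// Note: `str::lines` strips a trailing "\r\n" as well as "\n".
pub fn line_stats(text: &str) -> Option<LineStats> {
    let mut lens: Vec<usize> = text.lines().map(|l| l.len()).collect();
    if lens.is_empty() {
        return None;
    }
    lens.sort_unstable();
    let n = lens.len() as f64;
    let mean = lens.iter().sum::<usize>() as f64 / n;
    // Population variance: divide by n, not n - 1.
    let variance = lens.iter().map(|&l| (l as f64 - mean).powi(2)).sum::<f64>() / n;
    let mid = lens.len() / 2;
    let median = if lens.len() % 2 == 0 {
        (lens[mid - 1] + lens[mid]) as f64 / 2.0
    } else {
        lens[mid] as f64
    };
    Some(LineStats {
        mean,
        median,
        std_dev: variance.sqrt(),
        min: lens[0],
        max: lens[lens.len() - 1],
        empty: lens.iter().filter(|&&l| l == 0).count(),
    })
}
```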
### Histogram
Line length distribution:
```bash
kz --histogram file.txt
```
Example output:
```
Line Length Histogram:
0- 9: 21 ███████████████
10- 19: 5 ███
20- 29: 24 ████████████████
30- 39: 18 ████████████
```
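Each histogram row is a 10-wide bucket of line lengths. A minimal sketch of the bucketing and bar rendering follows; the bar-scaling rule (the largest bucket gets `max_bar` characters) is an assumption, `histogram` and `render` are hypothetical names, and `width` must be non-zero.
```rust
use std::collections::BTreeMap;

/// Buckets line lengths (in bytes) into bins of `width` and returns
/// (bin_start, count) pairs for non-empty bins, in ascending order.
pub fn histogram(text: &str, width: usize) -> Vec<(usize, usize)> {
    let mut counts: BTreeMap<usize, usize> = BTreeMap::new();
    for line in text.lines() {
        *counts.entry(line.len() / width * width).or_insert(0) += 1;
    }
    counts.into_iter().collect()
}

/// Renders one row per bin, scaling bars so the largest bin is `max_bar` wide.
pub fn render(bins: &[(usize, usize)], width: usize, max_bar: usize) -> String {
    let peak = bins.iter().map(|&(_, c)| c).max().unwrap_or(1);
    let mut out = String::new();
    for &(start, count) in bins {
        let bar = count * max_bar / peak;
        out.push_str(&format!(
            "{:>3}-{:>3}: {:>4} {}\n",
            start,
            start + width - 1,
            count,
            "█".repeat(bar)
        ));
    }
    out
}
```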
### Unique Word Count
Count unique words:
```bash
kz --unique file.txt
```
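Counting unique words amounts to inserting each word into a hash set and reporting its size. This sketch assumes case-sensitive, whitespace-delimited words with punctuation kept; kz's tokenization is not documented here, and `unique_words` is a hypothetical name.
```rust
use std::collections::HashSet;

/// Counts distinct whitespace-separated words ("Word," and "word" differ).
pub fn unique_words(text: &str) -> usize {
    text.split_whitespace().collect::<HashSet<&str>>().len()
}
```
With this approach memory grows with the size of the vocabulary rather than the size of the file.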
### Recursive Directory Processing
Process every file under a directory (here, counting words):
```bash
kz -r -w src/
```
Exclude paths matching glob patterns (`--exclude` can be repeated):
```bash
kz -r --exclude "*.min.js" --exclude "node_modules/*" -w src/
```
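Recursive processing with exclusions can be sketched as a directory walk that tests each path against the glob patterns. The matcher below supports only `*` and `?`, lets `*` cross `/`, and matches against the path relative to the starting directory; the walk also skips symlinks. These rules, and the names `glob_match` and `collect_files`, are assumptions rather than kz's documented behavior.
```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Minimal glob matcher: `*` matches any run of characters (including '/'),
/// `?` matches exactly one character, everything else matches literally.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Index of the last `*` seen, and the text position it currently covers up to.
    let (mut star, mut mark): (Option<usize>, usize) = (None, 0);
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    // Trailing `*`s can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Collects regular files under `root`, skipping entries whose path relative
/// to `root` matches any pattern in `excludes`. Symlinks are skipped.
pub fn collect_files(root: &Path, excludes: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // Use '/' separators so patterns behave the same on every OS.
            let rel = path
                .strip_prefix(root)
                .unwrap_or(path.as_path())
                .to_string_lossy()
                .replace('\\', "/");
            if excludes.iter().any(|pat| glob_match(pat, &rel)) {
                continue;
            }
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                stack.push(path);
            } else if file_type.is_file() {
                out.push(path);
            }
        }
    }
    out.sort();
    Ok(out)
}
```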
### Binary File Detection
Automatically detects and skips binary files:
```bash
kz binary.exe
# Output: kz: binary.exe: binary file detected, skipping
```
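The README does not say how binary files are recognized. A common heuristic, similar in spirit to the one git uses, is to look for a NUL byte near the start of the file; the sketch below assumes an 8KB probe, and `looks_binary`/`file_looks_binary` are hypothetical names.
```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// How many leading bytes to inspect (an assumed value).
const PROBE_LEN: usize = 8 * 1024;

/// Treats data as binary if a NUL byte appears within the first PROBE_LEN bytes.
pub fn looks_binary(data: &[u8]) -> bool {
    let probe = &data[..data.len().min(PROBE_LEN)];
    probe.contains(&0)
}

/// Reads only the probe window from disk instead of the whole file.
pub fn file_looks_binary(path: &Path) -> io::Result<bool> {
    let mut buf = Vec::with_capacity(PROBE_LEN);
    File::open(path)?.take(PROBE_LEN as u64).read_to_end(&mut buf)?;
    Ok(looks_binary(&buf))
}
```
A NUL-based check is cheap and catches most executables and images, but UTF-16 text also contains NUL bytes and would be classified as binary.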
### Format-Aware Counting
Count only code (skip comments and blank lines):
```bash
kz --code -l file.rs
```
Count markdown text (skip code blocks):
```bash
kz --markdown -l README.md
```
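Format-aware counting is essentially a line filter. The sketch below is deliberately simplified and assumes C-style `//` comments only (no block comments, no `//` inside strings) and backtick or tilde fences for Markdown; kz's supported languages and exact rules are not documented here, and both function names are hypothetical.
```rust
/// Counts lines that are neither blank nor `//` line comments.
pub fn count_code_lines(source: &str) -> usize {
    source
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with("//"))
        .count()
}

/// Counts Markdown lines outside fenced code blocks; fence lines are skipped too.
/// Simplification: any backtick or tilde fence line toggles the state.
pub fn count_markdown_lines(doc: &str) -> usize {
    let mut in_fence = false;
    let mut count = 0;
    for line in doc.lines() {
        let t = line.trim_start();
        if t.starts_with("```") || t.starts_with("~~~") {
            in_fence = !in_fence;
            continue;
        }
        if !in_fence {
            count += 1;
        }
    }
    count
}
```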
### Fast Mode
Skip UTF-8 validation for faster processing:
```bash
kz --fast -m huge.log
```
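UTF-8 aware character counting usually either decodes the input (which validates it) or counts only the bytes that start a character. This sketch shows both; whether `--fast` works exactly like `count_chars_fast` is an assumption, and both names are hypothetical. On valid UTF-8 the two functions agree, while on invalid input the fast version still returns a number instead of an error.
```rust
use std::str::{self, Utf8Error};

/// Strict count: validates the whole buffer, then counts Unicode scalar values.
pub fn count_chars_checked(data: &[u8]) -> Result<usize, Utf8Error> {
    Ok(str::from_utf8(data)?.chars().count())
}

/// Unvalidated count: each UTF-8 character has exactly one byte that is not a
/// continuation byte (0b10xxxxxx), so counting those bytes gives the character
/// count for valid input without checking well-formedness.
pub fn count_chars_fast(data: &[u8]) -> usize {
    data.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}
```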
### Null-Terminated File Lists
Read the files to process from a null-terminated list (`-` reads the list from stdin):
```bash
find . -name "*.txt" -print0 | kz -w --files0-from -
```
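`--files0-from` expects file names separated by NUL bytes, which is what `find -print0` emits. A minimal parser sketch follows; skipping empty entries and the lossy UTF-8 conversion are simplifications (on Unix, `std::os::unix::ffi::OsStrExt::from_bytes` would keep non-UTF-8 names intact), and the function names are hypothetical.
```rust
use std::io::{self, Read};

/// Splits a NUL-separated list into file names, ignoring empty entries
/// such as the one after a trailing NUL.
pub fn parse_files0(input: &[u8]) -> Vec<String> {
    input
        .split(|&b| b == 0)
        .filter(|name| !name.is_empty())
        .map(|name| String::from_utf8_lossy(name).into_owned())
        .collect()
}

/// Reads the whole list from stdin, as `--files0-from -` would.
pub fn read_files0_from_stdin() -> io::Result<Vec<String>> {
    let mut buf = Vec::new();
    io::stdin().read_to_end(&mut buf)?;
    Ok(parse_files0(&buf))
}
```
NUL separators matter because file names may legally contain spaces and newlines, which would break a newline-separated list.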
## Shell Completions
Generate shell completions:
```bash
# Bash
kz --generate-completion bash > ~/.local/share/bash-completion/completions/kz
# Zsh
kz --generate-completion zsh > ~/.zfunc/_kz
# Fish
kz --generate-completion fish > ~/.config/fish/completions/kz.fish
# PowerShell
kz --generate-completion powershell > kz.ps1
```
## Features Summary
### Core Counting
- `-l, --lines` - Print line counts
- `-w, --words` - Print word counts
- `-c, --bytes` - Print byte counts
- `-m, --chars` - Print character counts (UTF-8 aware)
- `-L, --max-line-length` - Print length of longest line
- `--unique` - Count unique words
### Advanced Analysis
- `--stats` - Show detailed statistics (mean, median, std dev)
- `--histogram` - Show line length distribution
- `--pattern <PATTERN>` - Count occurrences of a pattern
### Output Formats
- `--json` - Output results as JSON
### File Processing
- `-r, --recursive` - Process directories recursively
- `--exclude <PATTERN>` - Exclude files matching pattern
- `--files0-from <FILE>` - Read null-terminated file names
- Multiple files with automatic totals
### Performance
- `--fast` - Skip UTF-8 validation for speed
- Automatic binary file detection
- Memory mapped I/O
### Format-Aware
- `--code` - Count only code (skip comments and blank lines)
- `--markdown` - Count markdown (skip code blocks)
### Shell Integration
- `--generate-completion <SHELL>` - Generate completions
- Stdin support
- Compatible with `wc` output format
## Building
```bash
cargo build --release
```
## Testing
```bash
cargo test
# Create 1GB test file
# Benchmark
hyperfine './target/release/kz -w /tmp/largefile.txt' 'wc -w /tmp/largefile.txt'
```
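The testing steps above leave out how the 1GB file is produced. One possible approach, sketched in Rust, is to write repeated lines of words until a target size is reached. `write_test_file` is a hypothetical helper with an arbitrary word list, and the benchmark numbers in this README were not necessarily measured on such a file.
```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes lines of ASCII words until at least `target_bytes` have been written.
/// Returns the number of bytes actually written.
pub fn write_test_file(path: &Path, target_bytes: u64) -> io::Result<u64> {
    const WORDS: [&str; 8] = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"];
    let mut out = BufWriter::new(File::create(path)?);
    let mut written = 0u64;
    let mut i = 0usize;
    while written < target_bytes {
        // Vary line length (3 to 12 words) so line-length statistics are not uniform.
        let n = 3 + i % 10;
        let words: Vec<&str> = (0..n).map(|k| WORDS[(i + k) % WORDS.len()]).collect();
        let line = words.join(" ") + "\n";
        out.write_all(line.as_bytes())?;
        written += line.len() as u64;
        i += 1;
    }
    out.flush()?;
    Ok(written)
}
```
Calling `write_test_file(Path::new("/tmp/largefile.txt"), 1 << 30)` would produce a file of at least 1GiB at the path used by the `hyperfine` command.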
## Implementation
- Parallel processing with Rayon (1MB chunks)
- Memory mapped I/O with fallback for special files
- `memchr` for SIMD-accelerated pattern matching
- Files < 512KB processed sequentially to avoid thread overhead
- UTF-8 aware character counting
- Binary file detection
- Format-aware filtering for code and markdown
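Chunked parallel counting has one subtle point: a word can straddle a chunk boundary. This sketch uses `std::thread::scope` (Rust 1.63+) as a stand-in for Rayon and resolves the boundary by looking at the byte just before each chunk. The 512KB threshold and 1MB chunk size come from this README; the function names and the way chunks are grouped per thread are assumptions, not kz's implementation.
```rust
use std::thread;

const SEQUENTIAL_THRESHOLD: usize = 512 * 1024; // below this, no threads
const CHUNK_SIZE: usize = 1024 * 1024; // 1MB chunks

fn is_space(b: u8) -> bool {
    b.is_ascii_whitespace()
}

/// Counts word starts in `chunk`. `prev` is the byte just before the chunk
/// (None at the start of the data), so a word split across chunks counts once.
fn words_in_chunk(chunk: &[u8], prev: Option<u8>) -> usize {
    let mut in_word = prev.map_or(false, |b| !is_space(b));
    let mut words = 0;
    for &b in chunk {
        if is_space(b) {
            in_word = false;
        } else if !in_word {
            in_word = true;
            words += 1;
        }
    }
    words
}

/// Counts words, splitting large inputs into chunks processed on scoped threads.
pub fn count_words(data: &[u8]) -> usize {
    if data.len() < SEQUENTIAL_THRESHOLD {
        return words_in_chunk(data, None);
    }
    let workers = thread::available_parallelism().map_or(4, |n| n.get());
    // Remember each chunk's starting offset so its preceding byte can be found.
    let chunks: Vec<(usize, &[u8])> = data
        .chunks(CHUNK_SIZE)
        .enumerate()
        .map(|(i, c)| (i * CHUNK_SIZE, c))
        .collect();
    let per_thread = (chunks.len() + workers - 1) / workers;
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .chunks(per_thread)
            .map(|group| {
                s.spawn(move || {
                    group
                        .iter()
                        .map(|&(start, c)| {
                            let prev = if start == 0 { None } else { Some(data[start - 1]) };
                            words_in_chunk(c, prev)
                        })
                        .sum::<usize>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```
Line counting needs no such fix-up, since each `\n` byte belongs to exactly one chunk.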
## License
MIT