kazoe 0.1.4

# kazoe

Fast `wc` replacement. Counts words, lines, and bytes.

Command: `kz`

## Installation

### From crates.io

```bash
cargo install kazoe
```

### From source

```bash
git clone https://github.com/ConeDjordjic/kazoe
cd kazoe
cargo install --path .
```

### From GitHub

```bash
cargo install --git https://github.com/ConeDjordjic/kazoe
```

## Performance

Benchmarked on a 1GB text file:

```
Word counting:       21x faster than wc      (48ms vs 1.0s)
All counts (-lwc):   16x faster than wc      (63ms vs 1.0s)
Pattern matching:    90x faster than grep    (32ms vs 2.9s)
Line counting:       1.7x faster than wc     (32ms vs 56ms)
Multiple files:      23x faster than wc      (86ms vs 2.0s)
```

Performance scales with file size:
- Files < 512KB: processed sequentially, at roughly the same speed as `wc`
- Files of 1GB and up: roughly 16-90x faster for most operations; line counting, where `wc` is already fast, gains about 1.7x

## Basic Usage

```bash
# Default (lines, words, bytes)
kz file.txt

# Count lines
kz -l file.txt

# Count words
kz -w file.txt

# Count bytes
kz -c file.txt

# Count characters (UTF-8 aware)
kz -m file.txt

# Show max line length
kz -L file.txt

# Combine flags
kz -lwc file.txt

# Multiple files with totals
kz -w file1.txt file2.txt file3.txt

# Read from stdin
cat file.txt | kz -w

# Count pattern occurrences
kz --pattern "search term" file.txt
```

## Advanced Features

### JSON Output

Output as JSON (combine with `-l`, `-w`, `-c` etc. to select counts):

```bash
kz --json -lwc file.txt
```

Example output:
```json
[
  {
    "file": "file.txt",
    "counts": {
      "lines": 100,
      "words": 500,
      "bytes": 2048,
      "chars": 2000,
      "max_line_length": 80,
      "pattern": 0,
      "unique_words": 0
    }
  }
]
```

### Statistics Mode

Show file statistics (combine with `-l`, `-w`, `-c` etc. to include counts):

```bash
kz --stats -lwc file.txt
```

Example output:
```
Statistics:
  Lines: 100
  Words: 500
  Bytes: 2048
  Mean line length: 20.48
  Median line length: 18
  Std deviation: 12.34
  Min line length: 0
  Max line length: 80
  Empty lines: 5
```
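
The figures above can be reproduced in spirit with a few lines of Rust. This is a minimal sketch, not the code `kz` runs: `LineStats` and `line_stats` are hypothetical names, line length is taken as bytes without the trailing newline, and the standard deviation is the population form. None of those choices are confirmed by `kz` itself.

```rust
/// Hypothetical summary of line lengths; not kazoe's internal type.
#[derive(Debug)]
struct LineStats {
    mean: f64,
    median: f64,
    std_dev: f64,
    min: usize,
    max: usize,
    empty: usize,
}

/// Compute statistics over line lengths in bytes (newline excluded).
/// Returns `None` for input with no lines.
fn line_stats(text: &str) -> Option<LineStats> {
    let mut lens: Vec<usize> = text.lines().map(|l| l.len()).collect();
    if lens.is_empty() {
        return None;
    }
    lens.sort_unstable();
    let n = lens.len();
    let mean = lens.iter().sum::<usize>() as f64 / n as f64;
    // Median: middle element, or the average of the two middle elements.
    let median = if n % 2 == 1 {
        lens[n / 2] as f64
    } else {
        (lens[n / 2 - 1] + lens[n / 2]) as f64 / 2.0
    };
    // Population variance (divide by n); an assumption, kz may differ.
    let variance = lens
        .iter()
        .map(|&l| (l as f64 - mean).powi(2))
        .sum::<f64>()
        / n as f64;
    Some(LineStats {
        mean,
        median,
        std_dev: variance.sqrt(),
        min: lens[0],
        max: lens[n - 1],
        empty: lens.iter().filter(|&&l| l == 0).count(),
    })
}

fn main() {
    if let Some(s) = line_stats("ab\n\nabcd\n") {
        println!("mean {:.2}, median {}, std dev {:.2}", s.mean, s.median, s.std_dev);
        println!("min {}, max {}, empty {}", s.min, s.max, s.empty);
    }
}
```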

### Histogram

Line length distribution:

```bash
kz --histogram file.txt
```

Example output:
```
Line Length Histogram:
     0-   9:     21 ███████████████
    10-  19:      5 ███
    20-  29:     24 ████████████████
    30-  39:     18 ████████████
```
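
The bucketing behind a histogram like this can be sketched as follows. The function names, the bucket width parameter, and the bar scaling are illustrative assumptions; the exact bar lengths `kz` prints may follow a different rule.

```rust
use std::collections::BTreeMap;

/// Group line lengths (in bytes) into fixed-width buckets,
/// keyed by the lower bound of each bucket.
fn histogram(text: &str, width: usize) -> BTreeMap<usize, usize> {
    assert!(width > 0, "bucket width must be positive");
    let mut buckets: BTreeMap<usize, usize> = BTreeMap::new();
    for line in text.lines() {
        let start = (line.len() / width) * width;
        *buckets.entry(start).or_insert(0) += 1;
    }
    buckets
}

/// Render one row per bucket, scaling bars so the largest bucket
/// gets `max_bar` blocks (hypothetical scaling rule).
fn render(buckets: &BTreeMap<usize, usize>, width: usize, max_bar: usize) -> String {
    let peak = buckets.values().copied().max().unwrap_or(1);
    let mut out = String::new();
    for (&start, &count) in buckets {
        let bar = "█".repeat(count * max_bar / peak);
        out.push_str(&format!(
            "{:>6}-{:>4}: {:>6} {}\n",
            start,
            start + width - 1,
            count,
            bar
        ));
    }
    out
}

fn main() {
    let text = "a\nbb\n0123456789\n";
    let buckets = histogram(text, 10);
    print!("{}", render(&buckets, 10, 16));
}
```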

### Unique Word Count

Count unique words:

```bash
kz --unique file.txt
```
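
Conceptually, a unique word count is a set-size computation. The sketch below assumes words are whitespace-separated and compared case-sensitively; whether `kz --unique` normalizes case or strips punctuation is not stated here, so treat `unique_words` as a hypothetical helper.

```rust
use std::collections::HashSet;

/// Count distinct whitespace-separated words (case-sensitive).
fn unique_words(text: &str) -> usize {
    text.split_whitespace().collect::<HashSet<&str>>().len()
}

fn main() {
    println!("{}", unique_words("the cat and the hat"));
}
```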

### Recursive Directory Processing

Count words in every file under `src/`:

```bash
kz -r -w src/
```

Exclude specific patterns:

```bash
kz -r --exclude "*.min.js" --exclude "node_modules/*" -w src/
```

### Binary File Detection

Automatically detects and skips binary files:

```bash
kz binary.exe
# Output: kz: binary.exe: binary file detected, skipping
```
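
A common way to detect binary input is to look for a NUL byte near the start of the file. The sketch below shows that heuristic; it is an assumption about how such detection could work, not a description of the check `kz` performs, and `looks_binary` and the probe size are made up for illustration.

```rust
/// Heuristic: data is treated as binary if any of its first
/// `probe` bytes is NUL. Hypothetical; kz may use another rule.
fn looks_binary(data: &[u8], probe: usize) -> bool {
    let head = &data[..data.len().min(probe)];
    head.contains(&0)
}

fn main() {
    let samples: [&[u8]; 2] = [b"MZ\x00\x03", b"plain text\n"];
    for sample in samples {
        println!("{:?} -> binary: {}", sample, looks_binary(sample, 8192));
    }
}
```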

### Format-Aware Counting

Count only code (skip comments and blank lines):

```bash
kz --code -l file.rs
```

Count markdown text (skip code blocks):

```bash
kz --markdown -l README.md
```

### Fast Mode

Skip UTF-8 validation for faster processing:

```bash
kz --fast -m huge.log
```
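
To see why skipping validation can help, compare two ways of counting characters. On valid UTF-8, counting every byte that is not a continuation byte (`10xxxxxx`) gives the same result as decoding, without checking that sequences are well formed. This is a sketch of the general technique with hypothetical function names; the source does not confirm that `--fast` works this way.

```rust
/// Count code points without validation: every byte that is not a
/// UTF-8 continuation byte (0b10xxxxxx) starts a character.
fn count_chars_unchecked(data: &[u8]) -> usize {
    data.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}

/// Count code points with validation; `None` on invalid UTF-8.
fn count_chars_checked(data: &[u8]) -> Option<usize> {
    std::str::from_utf8(data).ok().map(|s| s.chars().count())
}

fn main() {
    let text = "héllo, 世界".as_bytes();
    println!("unchecked: {}", count_chars_unchecked(text));
    println!("checked:   {:?}", count_chars_checked(text));
}
```

On malformed input the two disagree: the checked version reports an error, while the unchecked one still returns a number.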

### Null-Terminated File Lists

Process the files named in a null-terminated list (here `-` reads the list from stdin):

```bash
find . -name "*.txt" -print0 | kz -w --files0-from -
```
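
Null termination matters because file names may contain spaces or newlines but never a NUL byte. A reader for such a list could look like the sketch below; `parse_files0` is a hypothetical helper, and the lossy UTF-8 conversion is a simplification (real paths need not be valid UTF-8).

```rust
/// Split a NUL-separated list of file names, skipping empty entries
/// such as the one after the trailing NUL.
fn parse_files0(input: &[u8]) -> Vec<String> {
    input
        .split(|&b| b == 0)
        .filter(|name| !name.is_empty())
        .map(|name| String::from_utf8_lossy(name).into_owned())
        .collect()
}

fn main() {
    for name in parse_files0(b"./a.txt\0./dir/b c.txt\0") {
        println!("{name}");
    }
}
```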

## Shell Completions

Generate shell completions:

```bash
# Bash
kz --generate-completion bash > ~/.local/share/bash-completion/completions/kz

# Zsh
kz --generate-completion zsh > ~/.zfunc/_kz

# Fish
kz --generate-completion fish > ~/.config/fish/completions/kz.fish

# PowerShell
kz --generate-completion powershell > kz.ps1
```

## Features Summary

### Core Counting
- `-l, --lines` - Print line counts
- `-w, --words` - Print word counts
- `-c, --bytes` - Print byte counts
- `-m, --chars` - Print character counts (UTF-8 aware)
- `-L, --max-line-length` - Print length of longest line
- `--unique` - Count unique words

### Advanced Analysis
- `--stats` - Show detailed statistics (mean, median, std dev)
- `--histogram` - Show line length distribution
- `--pattern <PATTERN>` - Count occurrences of a pattern

### Output Formats
- `--json` - Output results as JSON

### File Processing
- `-r, --recursive` - Process directories recursively
- `--exclude <PATTERN>` - Exclude files matching pattern
- `--files0-from <FILE>` - Read null-terminated file names
- Multiple files with automatic totals

### Performance
- `--fast` - Skip UTF-8 validation for speed
- Automatic binary file detection
- Memory mapped I/O

### Format-Aware
- `--code` - Count only code (skip comments)
- `--markdown` - Count markdown (skip code blocks)

### Shell Integration
- `--generate-completion <SHELL>` - Generate completions
- Stdin support
- Compatible with `wc` output format

## Building

```bash
cargo build --release
```

## Testing

```bash
cargo test

# Create a 1GB test file (the `1G` size suffix requires GNU head)
yes "the quick brown fox jumps over the lazy dog" | head -c 1G > /tmp/largefile.txt

# Benchmark
hyperfine './target/release/kz -w /tmp/largefile.txt' 'wc -w /tmp/largefile.txt'
```

## Implementation

- Parallel processing with Rayon (1MB chunks)
- Memory mapped I/O with fallback for special files
- `memchr` for SIMD-accelerated pattern matching
- Files < 512KB processed sequentially to avoid thread overhead
- UTF-8 aware character counting
- Binary file detection
- Format-aware filtering for code and markdown
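
The chunked approach can be illustrated with a word counter that splits the input into fixed-size chunks and counts each one on its own thread. This is a sketch under stated assumptions, not kazoe's code: it uses `std::thread::scope` instead of Rayon so that it runs without dependencies, spawns one thread per chunk where a real pool would cap the thread count, and all function names are hypothetical. The key detail is the chunk boundary: a word that straddles two chunks must be counted once, so each chunk checks the byte just before its start.

```rust
use std::thread;

/// Whitespace as `wc` traditionally sees it in the C locale.
fn is_space(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r' | 0x0B | 0x0C)
}

/// Count word starts in `data[start..end]`. A word start is a
/// non-space byte preceded by a space or by the start of the input,
/// so a word spanning a chunk boundary is counted only in the chunk
/// where it begins.
fn count_words_in(data: &[u8], start: usize, end: usize) -> usize {
    let mut prev_space = start == 0 || is_space(data[start - 1]);
    let mut words = 0;
    for &b in &data[start..end] {
        let space = is_space(b);
        if prev_space && !space {
            words += 1;
        }
        prev_space = space;
    }
    words
}

/// Split `data` into `chunk_size` pieces and count them in parallel.
/// One thread per chunk keeps the sketch short; a thread pool such
/// as Rayon's would bound the number of threads.
fn count_words_parallel(data: &[u8], chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk size must be positive");
    thread::scope(|s| {
        let handles: Vec<_> = (0..data.len())
            .step_by(chunk_size)
            .map(|start| {
                let end = (start + chunk_size).min(data.len());
                s.spawn(move || count_words_in(data, start, end))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}

fn main() {
    let text = b"the quick  brown\nfox jumps";
    println!("{}", count_words_parallel(text, 7));
}
```

The result does not depend on the chunk size, which is what makes the split safe to tune (for example, to the 1MB chunks listed above).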

## License

MIT