kazoe 0.1.4

Fast word, line, and byte counter

kazoe

Fast wc replacement. Counts words, lines, and bytes.

Command: kz

Installation

From crates.io

cargo install kazoe

From source

git clone https://github.com/ConeDjordjic/kazoe
cd kazoe
cargo install --path .

From GitHub

cargo install --git https://github.com/ConeDjordjic/kazoe

Performance

Benchmarked on a 1GB text file:

Word counting:       21x faster than wc      (48ms vs 1.0s)
All counts (lwc):    16x faster than wc      (63ms vs 1.0s)
Pattern matching:    90x faster than grep    (32ms vs 2.9s)
Line counting:       1.7x faster than wc     (32ms vs 56ms)
Multiple files:      23x faster than wc      (86ms vs 2.0s)

Performance scales with file size:

  • Files < 512KB: sequential processing, similar speed to wc
  • Large files (1GB in the benchmarks above): 1.7-90x faster depending on operation, with line counting gaining the least

Basic Usage

# Default (lines, words, bytes)
kz file.txt

# Count lines
kz -l file.txt

# Count words
kz -w file.txt

# Count bytes
kz -c file.txt

# Count characters (UTF-8 aware)
kz -m file.txt

# Show max line length
kz -L file.txt

# Combine flags
kz -lwc file.txt

# Multiple files with totals
kz -w file1.txt file2.txt file3.txt

# Read from stdin
cat file.txt | kz -w

# Count pattern occurrences
kz --pattern "search term" file.txt
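The counts above follow wc's conventions. The sketch below shows those semantics in one pass; it is an illustration, not kz's source. It assumes that a word is a run of non-whitespace characters, that -L measures characters without tab expansion (GNU wc expands tabs), and that --pattern counts non-overlapping matches. `Counts`, `count` and `count_pattern` are hypothetical names.

```rust
// Hypothetical sketch of wc-style counting; not kz's actual source.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    lines: usize,           // -l: number of '\n' characters
    words: usize,           // -w: runs of non-whitespace characters
    bytes: usize,           // -c: raw byte length
    chars: usize,           // -m: Unicode scalar values
    max_line_length: usize, // -L: longest line in chars (assumption: no tab expansion)
}

fn count(text: &str) -> Counts {
    let mut c = Counts { bytes: text.len(), ..Default::default() };
    let mut in_word = false;
    let mut line_len = 0;
    for ch in text.chars() {
        c.chars += 1;
        if ch == '\n' {
            c.lines += 1;
            c.max_line_length = c.max_line_length.max(line_len);
            line_len = 0;
        } else {
            line_len += 1;
        }
        // A word starts at each whitespace -> non-whitespace transition.
        if ch.is_whitespace() {
            in_word = false;
        } else if !in_word {
            in_word = true;
            c.words += 1;
        }
    }
    // A final line with no trailing '\n' still counts toward -L (but not -l).
    c.max_line_length = c.max_line_length.max(line_len);
    c
}

/// Non-overlapping occurrences of `pattern` (assumed semantics for --pattern).
fn count_pattern(text: &str, pattern: &str) -> usize {
    if pattern.is_empty() {
        return 0; // an empty pattern would otherwise match between every char
    }
    text.matches(pattern).count()
}

fn main() {
    let text = "hello world\nfoo\n";
    println!("{:?}", count(text));
    println!("pattern 'o': {}", count_pattern(text, "o"));
}
```

Like wc, this counts newline characters for -l, so a last line without a trailing newline adds to -w and -L but not to -l.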

Advanced Features

JSON Output

Output as JSON (combine with -l, -w, -c etc. to select counts):

kz --json -lwc file.txt

Example output:

[
  {
    "file": "file.txt",
    "counts": {
      "lines": 100,
      "words": 500,
      "bytes": 2048,
      "chars": 2000,
      "max_line_length": 80,
      "pattern": 0,
      "unique_words": 0
    }
  }
]
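The JSON is an array with one object per file. Each object has a `file` name and a nested `counts` object. The sketch below emits that shape. The field names are copied from the example, the output is compact rather than pretty-printed, and escaping is hand-rolled so the sketch runs without serde. `FileCounts`, `json_string` and `to_json` are hypothetical names, not part of kz.

```rust
// Hypothetical sketch: emit results in the same shape as the example above.
struct FileCounts<'a> {
    file: &'a str,
    lines: u64,
    words: u64,
    bytes: u64,
    chars: u64,
    max_line_length: u64,
    pattern: u64,
    unique_words: u64,
}

/// Quote a string as a JSON string literal, escaping quotes, backslashes
/// and control characters (U+0000..U+001F) as RFC 8259 requires.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn to_json(results: &[FileCounts<'_>]) -> String {
    let items: Vec<String> = results
        .iter()
        .map(|r| {
            format!(
                "{{\"file\":{},\"counts\":{{\"lines\":{},\"words\":{},\"bytes\":{},\"chars\":{},\"max_line_length\":{},\"pattern\":{},\"unique_words\":{}}}}}",
                json_string(r.file), r.lines, r.words, r.bytes, r.chars,
                r.max_line_length, r.pattern, r.unique_words
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}

fn main() {
    let r = FileCounts {
        file: "file.txt", lines: 100, words: 500, bytes: 2048, chars: 2000,
        max_line_length: 80, pattern: 0, unique_words: 0,
    };
    println!("{}", to_json(&[r]));
}
```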

Statistics Mode

Show file statistics (combine with -l, -w, -c etc. to include counts):

kz --stats -lwc file.txt

Example output:

Statistics:
  Lines: 100
  Words: 500
  Bytes: 2048
  Mean line length: 20.48
  Median line length: 18
  Std deviation: 12.34
  Min line length: 0
  Max line length: 80
  Empty lines: 5
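The statistics reduce to simple arithmetic over the list of line lengths. The sketch below is a minimal version under some assumptions: lengths are counted in characters without the line terminator, the standard deviation is the population form (divide by n), and an even number of lines gives a median that averages the two middle values. kz's exact definitions may differ, so this sketch will not necessarily reproduce the numbers above. `LineStats` and `line_stats` are hypothetical names.

```rust
/// Hypothetical summary of line lengths, mirroring the fields kz --stats prints.
#[derive(Debug)]
struct LineStats {
    mean: f64,
    median: f64,
    std_dev: f64, // population standard deviation (divides by n)
    min: usize,
    max: usize,
    empty: usize,
}

fn line_stats(text: &str) -> Option<LineStats> {
    // Length of each line in characters, excluding the line terminator.
    let mut lens: Vec<usize> = text.lines().map(|l| l.chars().count()).collect();
    if lens.is_empty() {
        return None;
    }
    let n = lens.len() as f64;
    let mean = lens.iter().sum::<usize>() as f64 / n;
    let variance = lens
        .iter()
        .map(|&l| {
            let d = l as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    let empty = lens.iter().filter(|&&l| l == 0).count();

    // Sort once to get the median, min and max.
    lens.sort_unstable();
    let mid = lens.len() / 2;
    let median = if lens.len() % 2 == 0 {
        (lens[mid - 1] + lens[mid]) as f64 / 2.0
    } else {
        lens[mid] as f64
    };
    Some(LineStats {
        mean,
        median,
        std_dev: variance.sqrt(),
        min: lens[0],
        max: lens[lens.len() - 1],
        empty,
    })
}

fn main() {
    if let Some(s) = line_stats("ab\n\nabcd\n") {
        println!("{:?}", s);
    }
}
```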

Histogram

Show the line length distribution:

kz --histogram file.txt

Example output:

Line Length Histogram:
     0-   9:     21 ███████████████
    10-  19:      5 ███
    20-  29:     24 ████████████████
    30-  39:     18 ████████████
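A histogram like this one puts each line into a fixed-width length bucket and draws a bar for each bucket. The sketch below assumes buckets 10 characters wide and bars scaled linearly so that the fullest bucket is `width` blocks long. The bar scaling kz uses is not documented, so bar lengths may not match the example. `histogram` is a hypothetical helper. Unlike the example, it also prints empty buckets.

```rust
/// Hypothetical sketch: bucket line lengths into fixed-width ranges and draw
/// one bar per bucket, scaled so the fullest bucket is `width` blocks wide.
fn histogram(lengths: &[usize], bucket: usize, width: usize) -> Vec<String> {
    assert!(bucket > 0, "bucket width must be positive");
    let max_len = match lengths.iter().max() {
        Some(&m) => m,
        None => return Vec::new(),
    };
    let mut counts = vec![0usize; max_len / bucket + 1];
    for &len in lengths {
        counts[len / bucket] += 1;
    }
    let peak = *counts.iter().max().unwrap(); // >= 1 because lengths is non-empty
    counts
        .iter()
        .enumerate()
        .map(|(i, &c)| {
            let lo = i * bucket;
            let bar = "█".repeat(c * width / peak);
            // Right-aligned columns similar to the example output.
            format!("{:>6}-{:>4}: {:>6} {}", lo, lo + bucket - 1, c, bar)
        })
        .collect()
}

fn main() {
    let text = "short\n\na somewhat longer line of text\nmid length line\n";
    let lengths: Vec<usize> = text.lines().map(|l| l.chars().count()).collect();
    for row in histogram(&lengths, 10, 20) {
        println!("{}", row);
    }
}
```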

Unique Word Count

Count unique words:

kz --unique file.txt
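Counting unique words comes down to inserting each word into a hash set and reading off the set's size. The README does not say how kz tokenizes words or whether it folds case. The minimal sketch below assumes whitespace-separated tokens and shows both the case-sensitive and the case-folded variant. Both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Distinct whitespace-separated tokens, compared exactly (case-sensitive).
fn unique_words(text: &str) -> usize {
    text.split_whitespace().collect::<HashSet<&str>>().len()
}

/// Variant that lowercases first, so "The" and "the" count once.
fn unique_words_ignore_case(text: &str) -> usize {
    text.split_whitespace()
        .map(str::to_lowercase)
        .collect::<HashSet<String>>()
        .len()
}

fn main() {
    let text = "the cat saw The cat";
    println!("{} {}", unique_words(text), unique_words_ignore_case(text));
}
```

Punctuation stays attached to tokens here ("cat." and "cat" differ), which is one more place where kz's own rules may differ.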

Recursive Directory Processing

Count words in every file under a directory:

kz -r -w src/

Exclude paths matching a pattern (--exclude can be repeated):

kz -r --exclude "*.min.js" --exclude "node_modules/*" -w src/
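The README does not say how --exclude patterns are matched. The minimal sketch below assumes glob-style `*` (any run of characters, including `/`) and `?` (any single character), matched against each path relative to the directory given on the command line. It walks the directory with std::fs and does not follow symlinked directories. `glob_match` and `collect_files` are hypothetical helpers, and the `/`-based patterns assume Unix path separators.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Wildcard match with backtracking: '*' matches any run of bytes, '?' one byte.
fn glob_match(pattern: &[u8], text: &[u8]) -> bool {
    let (mut p, mut t) = (0, 0);
    let mut star: Option<usize> = None; // position of the last '*' seen
    let mut mark = 0; // text position that '*' is currently matched up to
    while t < text.len() {
        if p < pattern.len() && (pattern[p] == b'?' || pattern[p] == text[t]) {
            p += 1;
            t += 1;
        } else if p < pattern.len() && pattern[p] == b'*' {
            star = Some(p);
            mark = t;
            p += 1;
        } else if let Some(s) = star {
            // Let the last '*' absorb one more byte and retry.
            p = s + 1;
            mark += 1;
            t = mark;
        } else {
            return false;
        }
    }
    while p < pattern.len() && pattern[p] == b'*' {
        p += 1;
    }
    p == pattern.len()
}

/// Recursively collect files under `dir`, skipping paths (relative to `root`)
/// that match any exclude pattern. Symlinked directories are not descended into.
fn collect_files(root: &Path, dir: &Path, excludes: &[&str], out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let rel = path.strip_prefix(root).unwrap_or(path.as_path()).to_string_lossy().into_owned();
        if excludes.iter().any(|p| glob_match(p.as_bytes(), rel.as_bytes())) {
            continue;
        }
        if entry.file_type()?.is_dir() {
            collect_files(root, &path, excludes, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    match std::env::args().nth(1) {
        Some(dir) => {
            let root = PathBuf::from(dir);
            let mut files = Vec::new();
            collect_files(&root, &root, &["*.min.js", "node_modules/*"], &mut files)?;
            for f in &files {
                println!("{}", f.display());
            }
        }
        None => eprintln!("usage: walk <dir>"),
    }
    Ok(())
}
```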

Binary File Detection

Automatically detects and skips binary files:

kz binary.exe
# Output: kz: binary.exe: binary file detected, skipping
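The README does not describe kz's detection rule. A common heuristic treats a file as binary if a NUL byte appears near its start; git, for example, checks the first 8000 bytes. The sketch below assumes that heuristic with an 8 KiB window. `SNIFF_LEN`, `looks_binary` and `is_binary_file` are hypothetical names. Note that UTF-16 text contains NUL bytes and would be flagged by this rule.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

const SNIFF_LEN: usize = 8192; // assumed window; only this prefix is inspected

/// Heuristic: any NUL byte in the first SNIFF_LEN bytes means "binary".
fn looks_binary(data: &[u8]) -> bool {
    data[..data.len().min(SNIFF_LEN)].contains(&0)
}

/// Reads at most SNIFF_LEN bytes from the file and applies the heuristic.
fn is_binary_file(path: &Path) -> io::Result<bool> {
    let mut head = Vec::with_capacity(SNIFF_LEN);
    File::open(path)?.take(SNIFF_LEN as u64).read_to_end(&mut head)?;
    Ok(looks_binary(&head))
}

fn main() -> io::Result<()> {
    for arg in std::env::args().skip(1) {
        if is_binary_file(Path::new(&arg))? {
            eprintln!("sketch: {}: binary file detected, skipping", arg);
        } else {
            println!("{}: text", arg);
        }
    }
    Ok(())
}
```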

Format-Aware Counting

Count only code (skip comments and blank lines):

kz --code -l file.rs

Count markdown text (skip code blocks):

kz --markdown -l README.md
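Both modes can be understood as line filters. The minimal sketch below makes several assumptions that are not kz's documented rules: --code understands only C-family comments (`//` lines and `/* ... */` blocks that begin a line), and --markdown recognizes only backtick fences (not `~~~`) and keeps blank lines. Any line that begins with a block comment is treated as entirely comment. `count_code_lines`, `count_markdown_lines` and `FENCE` are hypothetical names.

```rust
/// Three backtick characters (0x60), written as escapes.
const FENCE: &str = "\x60\x60\x60";

/// Counts lines that are neither blank nor comments (C-family syntax assumed).
fn count_code_lines(src: &str) -> usize {
    let mut in_block = false;
    let mut n = 0;
    for line in src.lines() {
        let t = line.trim();
        if in_block {
            // Everything up to the closing "*/" is still comment.
            if t.contains("*/") {
                in_block = false;
            }
            continue;
        }
        if t.is_empty() || t.starts_with("//") {
            continue;
        }
        if t.starts_with("/*") {
            // A block comment that doesn't close on this line spans the next ones.
            if !t[2..].contains("*/") {
                in_block = true;
            }
            continue;
        }
        n += 1;
    }
    n
}

/// Counts lines outside fenced code blocks; fence lines themselves are skipped.
fn count_markdown_lines(md: &str) -> usize {
    let mut in_fence = false;
    let mut n = 0;
    for line in md.lines() {
        if line.trim_start().starts_with(FENCE) {
            in_fence = !in_fence;
            continue;
        }
        if !in_fence {
            n += 1;
        }
    }
    n
}

fn main() {
    let src = "// header\nfn main() {\n\n    let x = 1; // trailing\n}\n";
    println!("code lines: {}", count_code_lines(src));
    let md = format!("# Title\n{f}\ncode\n{f}\nText\n", f = FENCE);
    println!("markdown lines: {}", count_markdown_lines(&md));
}
```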

Fast Mode

Skip UTF-8 validation for faster processing:

kz --fast -m huge.log
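Skipping validation speeds up -m because UTF-8 has a useful property: every encoded character has exactly one byte that is not a continuation byte (continuation bytes have the form 0b10xxxxxx). Counting the other bytes therefore gives the character count for valid input without decoding anything. On malformed input the result is just a byte-pattern count. The sketch compares the two approaches; it is an illustration of the idea, not kz's code, and both function names are hypothetical.

```rust
use std::str::Utf8Error;

/// Validating path: rejects malformed UTF-8, then counts Unicode scalar values.
fn count_chars_checked(bytes: &[u8]) -> Result<usize, Utf8Error> {
    Ok(std::str::from_utf8(bytes)?.chars().count())
}

/// Unvalidated path: count every byte that is not a continuation byte
/// (0b10xxxxxx). Exact for valid UTF-8; garbage in, garbage out otherwise.
fn count_chars_fast(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}

fn main() {
    let s = "héllo wörld";
    println!("{:?} {}", count_chars_checked(s.as_bytes()), count_chars_fast(s.as_bytes()));
}
```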

Null-Terminated File Lists

Read file names from a null-terminated list (here - means stdin):

find . -name "*.txt" -print0 | kz -w --files0-from -
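NUL separators matter because a file name can contain spaces and even newlines, but never a NUL byte. The sketch below reads such a list from stdin and splits it. It skips empty entries, such as the one after the final NUL, and it converts names lossily to UTF-8 for simplicity. A real tool on Unix would keep the raw bytes as an OsString. `parse_files0` is a hypothetical name.

```rust
use std::io::{self, Read};

/// Splits a NUL-separated list of file names, dropping empty entries.
/// Assumption: names are converted lossily to UTF-8 for display.
fn parse_files0(list: &[u8]) -> Vec<String> {
    list.split(|&b| b == 0)
        .filter(|name| !name.is_empty())
        .map(|name| String::from_utf8_lossy(name).into_owned())
        .collect()
}

fn main() -> io::Result<()> {
    let mut list = Vec::new();
    io::stdin().read_to_end(&mut list)?;
    for name in parse_files0(&list) {
        println!("{}", name);
    }
    Ok(())
}
```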

Shell Completions

Generate shell completions:

# Bash
kz --generate-completion bash > ~/.local/share/bash-completion/completions/kz

# Zsh
kz --generate-completion zsh > ~/.zfunc/_kz

# Fish
kz --generate-completion fish > ~/.config/fish/completions/kz.fish

# PowerShell
kz --generate-completion powershell > kz.ps1

Features Summary

Core Counting

  • -l, --lines - Print line counts
  • -w, --words - Print word counts
  • -c, --bytes - Print byte counts
  • -m, --chars - Print character counts (UTF-8 aware)
  • -L, --max-line-length - Print length of longest line
  • --unique - Count unique words

Advanced Analysis

  • --stats - Show detailed statistics (mean, median, std dev)
  • --histogram - Show line length distribution
  • --pattern <PATTERN> - Count occurrences of a pattern

Output Formats

  • --json - Output results as JSON

File Processing

  • -r, --recursive - Process directories recursively
  • --exclude <PATTERN> - Exclude files matching pattern
  • --files0-from <FILE> - Read null-terminated file names
  • Multiple files with automatic totals

Performance

  • --fast - Skip UTF-8 validation for speed
  • Automatic binary file detection
  • Memory-mapped I/O

Format-Aware

  • --code - Count only code (skip comments and blank lines)
  • --markdown - Count markdown (skip code blocks)

Shell Integration

  • --generate-completion <SHELL> - Generate completions
  • Stdin support
  • Compatible with wc output format

Building

cargo build --release

Testing

cargo test

Benchmarking

# Create a 1GB test file (the 1G size suffix needs GNU head)
yes "the quick brown fox jumps over the lazy dog" | head -c 1G > /tmp/largefile.txt

# Benchmark the release build against wc (requires hyperfine)
hyperfine './target/release/kz -w /tmp/largefile.txt' 'wc -w /tmp/largefile.txt'

Implementation

  • Parallel processing with Rayon (1MB chunks)
  • Memory-mapped I/O with fallback for special files
  • memchr for SIMD pattern matching
  • Files < 512KB processed sequentially to avoid thread overhead
  • UTF-8 aware character counting
  • Binary file detection
  • Format-aware filtering for code and markdown
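The tricky part of counting words in parallel chunks is that a word can straddle a chunk boundary and get counted once in each chunk. The sketch below fixes this by subtracting one for every boundary where the byte before it and the byte after it are both non-whitespace. It is a std-only stand-in, not kz's code: a fixed pool of scoped threads pulls chunk indices from an atomic counter where kz uses Rayon, the input is an in-memory slice rather than a memory map, and whitespace is assumed to be ASCII. The chunk size and the sequential threshold are the ones listed above. All names are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

const CHUNK_SIZE: usize = 1 << 20; // 1MB chunks, as listed above
const SEQUENTIAL_LIMIT: usize = 512 * 1024; // below this, skip threads entirely

/// Single-threaded word count over ASCII whitespace.
fn count_words_seq(data: &[u8]) -> usize {
    let mut words = 0;
    let mut in_word = false;
    for &b in data {
        if b.is_ascii_whitespace() {
            in_word = false;
        } else if !in_word {
            in_word = true;
            words += 1;
        }
    }
    words
}

fn count_words(data: &[u8]) -> usize {
    if data.len() < SEQUENTIAL_LIMIT {
        return count_words_seq(data);
    }
    let chunks: Vec<&[u8]> = data.chunks(CHUNK_SIZE).collect();
    let next = AtomicUsize::new(0); // index of the next unclaimed chunk
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    // Each worker claims chunks until none remain, summing their word counts.
    let per_chunk: usize = thread::scope(|s| {
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            handles.push(s.spawn(|| {
                let mut sum = 0;
                while let Some(chunk) = chunks.get(next.fetch_add(1, Ordering::Relaxed)) {
                    sum += count_words_seq(chunk);
                }
                sum
            }));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    });

    // A word that straddles a chunk boundary was counted once in each chunk.
    let straddling = chunks
        .windows(2)
        .filter(|pair| {
            let last = pair[0][pair[0].len() - 1];
            !last.is_ascii_whitespace() && !pair[1][0].is_ascii_whitespace()
        })
        .count();
    per_chunk - straddling
}

fn main() {
    let text = "the quick brown fox jumps ".repeat(100_000);
    println!("{}", count_words(text.as_bytes()));
}
```

Pulling indices from a shared counter keeps all threads busy even when some chunks are slower than others, which is the same load-balancing goal Rayon's work stealing serves.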

License

MIT