cw 0.1.0

Count Words, a wc clone

cw - Count Words

A wc clone in Rust.

Synopsis

cw 0.1.0
Thomas Hurst <tom@hur.st>
Count Words - word, line, character and byte count

USAGE:
    cw [FLAGS] [input]...

FLAGS:
    -c, --bytes              Count bytes
    -m, --chars              Count UTF-8 characters instead of bytes
    -h, --help               Prints help information
    -l, --lines              Count lines
    -L, --max-line-length    Count bytes (default) or characters (-m) of the longest line
    -V, --version            Prints version information
    -w, --words              Count words

ARGS:
    <input>...    Input files

Performance

It's quite fast. Line counts are optimized using the bytecount crate:

-% dd if=pwned-passwords-1.0.txt of=/dev/null bs=32k
392544+1 records in
392544+1 records out
12862899504 bytes transferred in 21.437070 secs (600030675 bytes/sec)
21.440 real, 0.173 user, 21.264 sys

-% wc -l pwned-passwords-1.0.txt
 306259512 pwned-passwords-1.0.txt
39.252 real, 18.679 user, 20.569 sys

-% cw -l pwned-passwords-1.0.txt
 306259512 pwned-passwords-1.0.txt
21.935 real, 1.070 user, 20.857 sys
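
To illustrate why a dedicated line counter can stay close to raw read speed, here is a minimal sketch that counts newline bytes chunk by chunk using only the standard library. It is not cw's actual code: `count_lines` and the 32 KiB buffer size are assumptions made for this example. With the bytecount crate, the inner filter could be replaced by `bytecount::count`.

```rust
use std::io::{self, Read};

/// Count b'\n' bytes in anything readable, one chunk at a time.
/// Hypothetical helper for illustration, not taken from cw's source.
fn count_lines<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut buf = [0u8; 32 * 1024]; // chunk size is an arbitrary assumption
    let mut lines = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // With the bytecount crate this could be:
        //   bytecount::count(&buf[..n], b'\n')
        lines += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
    }
    Ok(lines)
}

fn main() -> io::Result<()> {
    // In-memory sample input; a real tool would pass a File or stdin instead.
    let sample: &[u8] = b"first\nsecond\nthird\n";
    println!("{}", count_lines(sample)?);
    Ok(())
}
```

Only newline bytes are inspected, so no word splitting or character decoding is needed on this path.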

Other counts are also faster than wc, probably because cw does no multibyte handling by default:

-% wc pwned-passwords-1.0.txt
 306259512 306259512 12862899504 pwned-passwords-1.0.txt
1:57.72 real, 1:37.12 user, 20.592 sys

-% cw pwned-passwords-1.0.txt
 306259512 306259512 12862899504 pwned-passwords-1.0.txt
1:03.70 real, 42.798 user, 20.899 sys

Even with UTF-8 processing enabled (-m), it is still faster than wc:

-% wc -mLlw pwned-passwords-1.0.txt
 306259512 306259512 12862899504      41 pwned-passwords-1.0.txt
5:53.70 real, 5:32.75 user, 20.920 sys

-% cw -mLlw pwned-passwords-1.0.txt
 306259512 306259512 12862899504      41 pwned-passwords-1.0.txt
2:15.46 real, 1:54.45 user, 21.008 sys
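
The difference between byte and character counts can be sketched as follows. This is an illustrative example, not cw's implementation: `count_chars` and `max_line_length` are hypothetical names, and the code assumes the input is valid UTF-8. It counts characters by skipping UTF-8 continuation bytes, and reports the longest line in either bytes or characters, as described for -L above.

```rust
/// Count UTF-8 characters by counting bytes that are not continuation
/// bytes (continuation bytes have the bit pattern 10xxxxxx).
/// Assumes valid UTF-8; hypothetical helper for illustration.
fn count_chars(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}

/// Length of the longest line, in bytes (chars = false) or in
/// characters (chars = true). Lines are split on b'\n'.
fn max_line_length(bytes: &[u8], chars: bool) -> usize {
    bytes
        .split(|&b| b == b'\n')
        .map(|line| if chars { count_chars(line) } else { line.len() })
        .max()
        .unwrap_or(0)
}

fn main() {
    let text = "héllo\nwörld!\n".as_bytes();
    println!("bytes: {}", text.len());
    println!("chars: {}", count_chars(text));
    println!("max line (bytes): {}", max_line_length(text, false));
    println!("max line (chars): {}", max_line_length(text, true));
}
```

The byte-oriented path only needs a length, while the character path has to inspect every byte, which is consistent with the extra user time seen with -m.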

For best results build with:

cargo build --release --features runtime-dispatch-simd

This enables SIMD optimizations for line counting. It has no effect when counting anything other than lines.