tagalyzer 0.1.1

A CLI tool to gather statistics on collections of plaintext-adjacent files.

Tagalyzer

Tagalyzer is a CLI tool that counts the words in files, then prints the counts in a human-readable format. The counts should provide some guidance on relevant tags for blog posts.

This tool will eventually be a relative word-frequency analyzer. The goal is to point it at a directory or list of files and have it compute statistics over the combined words of all files, as well as break out how word frequency varies by file.
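As a rough illustration of the counting and relative-frequency ideas described above, here is a minimal sketch in Rust. It is not taken from the tagalyzer source: the function names (`count_words`, `sorted_counts`, `relative_frequencies`) are hypothetical, and splitting words on non-alphanumeric characters is an assumption made for the example.

```rust
use std::collections::HashMap;

/// Count how often each word occurs in `text`.
/// Assumption: a "word" is any run of alphanumeric characters.
/// When `case_sensitive` is false, words are lowercased before counting.
fn count_words(text: &str, case_sensitive: bool) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        let key = if case_sensitive {
            word.to_owned()
        } else {
            word.to_lowercase()
        };
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}

/// Return (word, count) pairs, highest count first; ties are ordered alphabetically.
fn sorted_counts(counts: &HashMap<String, usize>) -> Vec<(&str, usize)> {
    let mut pairs: Vec<(&str, usize)> = counts
        .iter()
        .map(|(word, &count)| (word.as_str(), count))
        .collect();
    pairs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    pairs
}

/// Return (word, share of all counted words) pairs, in the same order as `sorted_counts`.
fn relative_frequencies(counts: &HashMap<String, usize>) -> Vec<(&str, f64)> {
    let total: usize = counts.values().sum();
    sorted_counts(counts)
        .into_iter()
        .map(|(word, count)| (word, count as f64 / total as f64))
        .collect()
}
```

Per-file and combined statistics could then be built by calling `count_words` once per file and merging the resulting maps; how the real tool will do this is not specified here.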

Examples

$ tagalyzer --help
Usage: tagalyzer [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...  File or directory paths to count words from

Options:
--- [snip] ---
$ tagalyzer README.md # This file
Sorted wordcount for README.md
the                                           : 11
to                                            : 9
this                                          : 8
of                                            : 8
a                                             : 8
--- [snip] ---
$ tagalyzer -c README.md # Case sensitive
Sorted wordcount for README.md
to                                            : 10
the                                           : 10
of                                            : 9
a                                             : 9
or                                            : 7
and                                           : 7
words                                         : 6
This                                          : 6
in                                            : 6
I                                             : 5

Long-Term Plans

I plan on developing this tool into both a CLI binary and a parallel library, so that it works out of the box and is also highly customizable. It will fit into my workflow by reporting the frequency of words and phrases (e.g. strings of up to n words or characters) across the directory where I keep all my blog posts, which I can use to help me decide on a set of applicable tags.
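To make the "strings of up to n words" idea concrete, here is a small Rust sketch of phrase counting. It is only an illustration of one possible approach, not the planned implementation; the function name `count_phrases_up_to` and the choice to join words with a single space are assumptions.

```rust
use std::collections::HashMap;

/// Count every phrase made of 1 to `max_len` consecutive words.
/// Assumption: `words` is already tokenized, and phrases are keyed by
/// their words joined with a single space.
fn count_phrases_up_to(words: &[&str], max_len: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for len in 1..=max_len {
        // `windows` yields nothing when `len` exceeds the number of words.
        for window in words.windows(len) {
            *counts.entry(window.join(" ")).or_insert(0) += 1;
        }
    }
    counts
}
```

For example, with the words `to be or not to be` and a maximum length of 2, the phrase `to be` would be counted twice and `be or` once.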

License

This work is licensed under either the MIT or Apache 2.0 license at the choice of the user.

Contributions are assumed to be licensed under MIT unless otherwise stated.

Contributing

Contributions are always welcome! The project is hosted on GitLab. Bug reports, commits, or even just suggestions are appreciated.

If you do want to contribute code, I'm more familiar with merging branches than forks. I have gating tests and lints in CI, which should be equivalent to the code block below. If running this block locally ever gives different results from CI, please open an issue.

cargo fmt &&
cargo test &&
cargo clippy --no-deps -- \
-Dclippy::pedantic \
-Dclippy::nursery \
-Dclippy::style \
-Dclippy::unwrap_used \
-Dclippy::expect_used \
-Dclippy::missing_docs_in_private_items \
-Dclippy::single_char_lifetime_names \
-Dclippy::use_self \
-Dclippy::str_to_string \
-Ddead_code \
-Aclippy::needless_return \
-Aclippy::tabs_in_doc_comments \
-Dwarnings