# textstat
[crates.io](https://crates.io/crates/textstat) · [docs.rs](https://docs.rs/textstat) · [CI](https://github.com/trananhtung/textstat/actions/workflows/ci.yml) · [License](#license) · [no_std](#no_std)
**Readability metrics for English text.** Flesch Reading Ease, Flesch-Kincaid
Grade, Gunning Fog, SMOG, Automated Readability Index, Coleman-Liau — plus the
word / sentence / syllable counts they're built on. Inspired by Python's
[`textstat`](https://pypi.org/project/textstat/). Zero dependencies, `#![no_std]`.
```rust
let text = "The cat sat on the mat. The dog ran fast.";
assert_eq!(textstat::lexicon_count(text), 10); // words
assert_eq!(textstat::sentence_count(text), 2);
assert_eq!(textstat::syllable_count(text), 10);
let ease = textstat::flesch_reading_ease(text); // ~117 (very easy)
let grade = textstat::flesch_kincaid_grade(text); // ~ -1.8 (well below grade 1)
```
## Why textstat?
Rust's `readability` crates are *article extractors* (arc90/Mozilla Readability) —
they pull the main content out of a web page. **None of them compute readability
*scores*.** `textstat` fills that gap: drop-in functions for the standard formulas,
useful for content tooling, SEO, writing assistants, education, and accessibility
(WCAG) checks.
## Install
```toml
[dependencies]
textstat = "0.1"
```
## Metrics
| Function | Returns |
|---|---|
| `flesch_reading_ease` | 0–100+ score (higher = easier) |
| `flesch_kincaid_grade` | U.S. school grade level |
| `gunning_fog` | U.S. school grade level |
| `smog_index` | U.S. school grade level |
| `automated_readability_index` | U.S. school grade level |
| `coleman_liau_index` | U.S. school grade level |
| `reading_time(text, wpm)` | estimated seconds to read |
### Counts
`lexicon_count` (words), `sentence_count`, `syllable_count`, `syllables(word)`,
`polysyllabic_count`, `char_count` (non-whitespace), `letter_count`.
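
As a worked illustration of how these counts feed the scores, the sketch below applies the standard published Flesch Reading Ease and Flesch-Kincaid Grade formulas to raw counts. The `*_from_counts` names are hypothetical helpers written for this example and are not part of the `textstat` API; the zero guard is an assumption that mirrors the empty-input behavior described under Accuracy.

```rust
/// Flesch Reading Ease from raw counts (standard published formula).
/// Hypothetical helper for illustration, not part of the crate's API.
pub fn flesch_reading_ease_from_counts(words: usize, sentences: usize, syllables: usize) -> f64 {
    // Assumption: return 0 instead of dividing by zero on empty input.
    if words == 0 || sentences == 0 {
        return 0.0;
    }
    let words_per_sentence = words as f64 / sentences as f64;
    let syllables_per_word = syllables as f64 / words as f64;
    206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
}

/// Flesch-Kincaid Grade Level from raw counts (standard published formula).
/// Hypothetical helper for illustration, not part of the crate's API.
pub fn flesch_kincaid_grade_from_counts(words: usize, sentences: usize, syllables: usize) -> f64 {
    if words == 0 || sentences == 0 {
        return 0.0;
    }
    let words_per_sentence = words as f64 / sentences as f64;
    let syllables_per_word = syllables as f64 / words as f64;
    0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
}
```

With the counts from the opening example (10 words, 2 sentences, 10 syllables), these evaluate to about 117.16 and -1.84, which agrees with the approximate values shown there.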
## Accuracy
Syllables are estimated with a fast English heuristic (vowel groups + a silent-`e`
rule, including accented Latin vowels), not a pronunciation dictionary, so scores
are *close to* — not bit-identical with — dictionary-based tools. The metric
**formulas** are the standard published ones.
- `sentence_count` skips decimal points (`3.14`) and initialism dots (`U.S.A.`);
trailing abbreviation dots (`Dr.`) may still be counted as a sentence end.
- Best results are on English text. Words in non-Latin scripts (no Latin vowels) fall
back to one syllable per word, so scores stay bounded but are not accurate.
- Empty input yields `0` everywhere — no panics, no division by zero.
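
To make the "vowel groups + silent-`e`" idea concrete, here is a minimal sketch of that style of heuristic. It is not the crate's actual implementation: `estimate_syllables` is a hypothetical name, the `-le` exception is an assumption added for this example, and accented vowels are left out for brevity.

```rust
/// Returns true for the ASCII vowels this sketch recognizes ('y' included).
fn is_vowel(c: char) -> bool {
    matches!(c, 'a' | 'e' | 'i' | 'o' | 'u' | 'y')
}

/// Estimates syllables by counting vowel groups, then applying a silent-e rule.
/// Hypothetical helper for illustration; the crate's real rules may differ.
pub fn estimate_syllables(word: &str) -> usize {
    // Keep only letters, lowercased, so punctuation does not affect the count.
    let letters: Vec<char> = word
        .chars()
        .filter(|c| c.is_alphabetic())
        .flat_map(|c| c.to_lowercase())
        .collect();
    if letters.is_empty() {
        return 0; // empty input yields 0
    }

    // Each run of consecutive vowels counts as one syllable.
    let mut count = 0usize;
    let mut prev_vowel = false;
    for &c in &letters {
        let vowel = is_vowel(c);
        if vowel && !prev_vowel {
            count += 1;
        }
        prev_vowel = vowel;
    }

    // Silent-e rule: a final 'e' after a consonant is usually not voiced
    // ("make"), except in consonant + "le" endings ("table").
    let n = letters.len();
    if count > 1 && n >= 2 && letters[n - 1] == 'e' && !is_vowel(letters[n - 2]) {
        let consonant_le = n >= 3 && letters[n - 2] == 'l' && !is_vowel(letters[n - 3]);
        if !consonant_le {
            count -= 1;
        }
    }

    // Words with no recognized vowels fall back to one syllable.
    count.max(1)
}
```

A heuristic like this handles common words well but will miscount irregular ones, which is why scores land close to, rather than exactly on, dictionary-based results.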
## no_std
`textstat` is `#![no_std]` (it needs only `alloc`) and ships a dependency-free
Newton's-method `sqrt`, so it builds for bare-metal targets such as `thumbv7em-none-eabi`.
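
For readers curious what a dependency-free square root can look like, the following is a minimal sketch of Newton's method using only basic `f64` arithmetic. `newton_sqrt` is a hypothetical name and this is not the crate's internal code; returning `0.0` for negative or NaN input is an assumption chosen to keep the function total.

```rust
/// Square root via Newton's method, using only basic f64 arithmetic.
/// Hypothetical helper for illustration, not the crate's internal routine.
pub fn newton_sqrt(x: f64) -> f64 {
    // Assumption: treat NaN, zero, and negative input as 0 rather than panicking.
    if x.is_nan() || x <= 0.0 {
        return 0.0;
    }
    if x.is_infinite() {
        return x;
    }

    // Start at or above the true root so the iterates decrease monotonically.
    let mut guess = if x > 1.0 { x } else { 1.0 };
    loop {
        // Newton step for f(g) = g^2 - x.
        let next = 0.5 * (guess + x / guess);
        // Stop once the estimate no longer decreases (converged to float precision).
        if next >= guess {
            return guess;
        }
        guess = next;
    }
}
```

Starting from a guess at or above the root means each step can only move downward, so the loop is guaranteed to stop without needing a tolerance parameter.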
## License
Licensed under either of [Apache-2.0](LICENSE-APACHE) or [MIT](LICENSE-MIT) at
your option.