writing-analysis 0.1.0

Lightweight writing analysis and NLP tools for Rust
Documentation

writing-analysis

Crates.io docs.rs CI License: MIT

Lightweight writing analysis and NLP tools for Rust. Pure rule-based — no ML models required.

Features

English (default)

  • Readability scoring — Flesch-Kincaid, Flesch Reading Ease, SMOG, Coleman-Liau, ARI
  • Passive voice detection — be-verb + past participle pattern matching with 90+ irregular verbs
  • Cliche detection — 85 built-in English cliches, case-insensitive matching
  • Filter word detection — spot filler words (just, really, very, basically, ...)
  • Sentiment analysis — AFINN-111 lexicon (2400+ words), token-level scoring
  • Sentence variety — length stats, starter diversity, structure variety score

Chinese (feature flag: chinese)

  • Chinese readability — SCRI (Simple Chinese Readability Index), common character ratio (top 3000)
  • Passive voice detection — 被/受/遭/給 + 為...所 patterns with exclusion list
  • Cliche detection — ~100 entries: overused idioms, bureaucratic phrases, writing cliches (Traditional + Simplified)
  • Filter word detection — ~26 common filler words (其實、基本上、就是、然後...)
  • Sentiment analysis — 500+ word lexicon with negation/intensifier handling (不/沒/非常/很)
  • Sentence variety — segmentation-based analysis
  • Word segmentation — opencc-jieba-rs, supports both Traditional and Simplified Chinese

General

  • Zero unsafe code — memory safe by design
  • No runtime I/O — all data embedded at compile time

Quick Start

cargo add writing-analysis

# With Chinese support
cargo add writing-analysis --features chinese
use writing_analysis::{analyze_all, analyze_readability, detect_passive_voice};

fn main() -> writing_analysis::Result<()> {
    let text = "The ball was thrown by the boy. \
                She ran quickly through the park. \
                It was a beautiful day.";

    // Run all analyses at once
    let result = analyze_all(text)?;

    println!("Flesch Reading Ease: {:.1}", result.readability.flesch_reading_ease);
    println!("Passive voice: {:.0}%", result.passive_voice.percentage);
    println!("Cliches found: {}", result.cliches.count);
    println!("Filter words: {}", result.filter_words.count);
    println!("Sentiment: {:.2}", result.sentiment.score);
    println!("Sentence variety: {:.2}", result.sentence_variety.structure_variety);

    Ok(())
}

Usage

Readability

let scores = writing_analysis::analyze_readability(
    "The implementation of sophisticated algorithms necessitates \
     comprehensive understanding of computational complexity."
)?;

assert!(scores.flesch_kincaid_grade > 12.0);  // college level
assert!(scores.flesch_reading_ease < 30.0);   // very difficult

Passive Voice

let result = writing_analysis::detect_passive_voice(
    "The report was written by the team. She presented the findings."
)?;

assert_eq!(result.instances.len(), 1);
assert_eq!(result.instances[0].phrase, "was written");
assert_eq!(result.percentage, 50.0);  // 1 of 2 sentences

Cliches

let result = writing_analysis::detect_cliches(
    "At the end of the day, we need to think outside the box."
)?;

assert_eq!(result.count, 2);
for cliche in &result.instances {
    println!("'{}' at offset {}", cliche.canonical, cliche.offset);
}

Filter Words

let result = writing_analysis::detect_filter_words(
    "She just really wanted to basically understand."
)?;

assert_eq!(result.count, 3);  // just, really, basically
println!("{:.1}% filter words", result.percentage);

Sentiment

let result = writing_analysis::analyze_sentiment(
    "I love this wonderful movie."
)?;

assert!(result.score > 0.0);  // positive
for token in &result.tokens {
    if token.score != 0 {
        println!("{}: {}", token.word, token.score);
    }
}

Sentence Variety

let result = writing_analysis::analyze_sentence_variety(
    "The cat sat. The dog ran. The bird flew. A fish swam."
)?;

println!("Avg length: {:.1} words", result.avg_length);
println!("Variety: {:.2}", result.structure_variety);  // 0.5 — repetitive starters

Chinese Analysis

use writing_analysis::{analyze_all_zh, analyze_readability_zh, detect_passive_voice_zh};

fn main() -> writing_analysis::Result<()> {
    let text = "今天天氣很好。我們去公園散步。孩子們在草地上玩耍。";

    let result = analyze_all_zh(text)?;

    println!("SCRI: {:.1}", result.readability.scri);
    println!("Common char ratio: {:.1}%", result.readability.common_char_ratio);
    println!("Passive voice: {:.0}%", result.passive_voice.percentage);
    println!("Cliches: {}", result.cliches.count);
    println!("Filter words: {}", result.filter_words.count);
    println!("Sentiment: {:.2}", result.sentiment.score);

    Ok(())
}

API Overview

English Functions

Function Returns Description
analyze_all AnalysisResult Run all 6 analyzers
analyze_readability ReadabilityScores 5 readability formulas
detect_passive_voice PassiveVoiceResult Passive voice instances + percentage
detect_cliches ClicheResult Cliche instances + count
detect_filter_words FilterWordResult Filter word instances + percentage
analyze_sentiment SentimentResult Score + per-token breakdown
analyze_sentence_variety SentenceVarietyResult Length stats + starter diversity

Chinese Functions (feature: chinese)

Function Returns Description
analyze_all_zh AnalysisResultZh Run all 6 Chinese analyzers
analyze_readability_zh ChineseReadabilityScores SCRI + common char ratio
detect_passive_voice_zh PassiveVoiceResult 被/受/遭/給/為...所 patterns
detect_cliches_zh ClicheResult Chinese cliche detection
detect_filter_words_zh FilterWordResult Chinese filler word detection
analyze_sentiment_zh SentimentResult Sentiment with negation handling
analyze_sentence_variety_zh SentenceVarietyResult Segmentation-based variety analysis
segment Vec<String> Word segmentation (Traditional + Simplified)
split_sentences_zh Vec<&str> Chinese sentence splitting

Result Types

Type Key Fields
ReadabilityScores flesch_kincaid_grade, flesch_reading_ease, smog_index, coleman_liau_index, automated_readability_index
ChineseReadabilityScores scri, avg_sentence_length, avg_word_length, common_char_ratio, character_count, word_count, sentence_count
PassiveVoiceResult instances: Vec<PassiveInstance>, percentage: f64
ClicheResult instances: Vec<ClicheInstance>, count: usize
FilterWordResult instances: Vec<FilterWordInstance>, count: usize, percentage: f64
SentimentResult score: f64, comparative: f64, tokens: Vec<TokenSentiment>
SentenceVarietyResult avg_length: f64, length_variance: f64, starters: Vec<String>, structure_variety: f64

Part of the Scrivener Rust Ecosystem

Crate Description
scrivener-rtf RTF parser and generator
scrivener Scrivener .scriv project reader/writer
writing-analysis Writing analysis and NLP tools

License

MIT