Runmunch
A fast, efficient Rust implementation of hunspell's unmunch tool for expanding dictionary words using morphological affix rules. Generate all possible word forms from dictionaries or expand individual words interactively.
Features
- High Performance - Processes 23K+ dictionary words/second when unmunching the German dictionary
- Dual Interface - Use as both a library and command-line tool
- Word Expansion - Interactive word expansion using affix rules (`--expand` mode)
- Base Word Finding - Find base forms from inflected words (`--find-base` mode)
- Dictionary Unmunching - Batch processing of entire dictionary files
- Unicode Support - Full support for international languages (German, Croatian, etc.)
- Hunspell Compatible - Works with standard hunspell .aff and .dic files
- Flag Alias Support - Handles complex affix flag systems (AF directive)
- Memory Efficient - Optimized for large dictionaries and complex morphology
Installation
Or build from source:
Usage
Command Line Interface
Unmunch a dictionary (expand all words from a dictionary file):
Example:
Expand specific words using affix rules (-e/--expand mode):
Without dictionary (tries all possible rules):
|
# or
|
With dictionary (uses word-specific flags for better results):
|
# or
|
Find base words and expand them (-b/--find-base mode):
Find base forms from inflected words and expand them (requires dictionary):
|
# or
|
This mode:
- Analyzes inflected forms (e.g., "cats", "walked", "books")
- Finds their base words (e.g., "cat", "walk", "book")
- Expands the base words using their dictionary flags
- Returns all possible forms of the base words
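The steps above can be sketched in a few lines of standalone Rust. This is only an illustration of the idea, not runmunch's actual implementation: the `SuffixRule` type, the free functions, and the in-memory dictionary are hypothetical, only suffix rules are handled, and rule conditions are ignored.

```rust
use std::collections::HashMap;

/// A simplified suffix rule: remove `strip` from the end of a base word, then append `add`.
/// (Hypothetical type for illustration; not runmunch's own data model.)
struct SuffixRule {
    flag: &'static str,
    strip: &'static str,
    add: &'static str,
}

/// Apply one suffix rule to a base word, if the word ends with `strip`.
fn apply(rule: &SuffixRule, base: &str) -> Option<String> {
    base.strip_suffix(rule.strip)
        .map(|stem| format!("{}{}", stem, rule.add))
}

/// Undo a suffix rule on an inflected form, yielding a candidate base word.
fn undo(rule: &SuffixRule, inflected: &str) -> Option<String> {
    if rule.add.is_empty() {
        return None; // nothing to remove, so this rule cannot explain the form
    }
    inflected
        .strip_suffix(rule.add)
        .map(|stem| format!("{}{}", stem, rule.strip))
}

/// Find dictionary base words for `inflected`, then expand each with its own flags.
fn find_base_and_expand(
    inflected: &str,
    rules: &[SuffixRule],
    dictionary: &HashMap<&str, Vec<&str>>,
) -> Vec<String> {
    let mut forms = Vec::new();
    for rule in rules {
        if let Some(candidate) = undo(rule, inflected) {
            // Accept the candidate only if the dictionary lists it with this rule's flag.
            if let Some(flags) = dictionary.get(candidate.as_str()) {
                if flags.contains(&rule.flag) {
                    // Expand the base word with every rule its flags allow.
                    for r in rules.iter().filter(|r| flags.contains(&r.flag)) {
                        if let Some(form) = apply(r, &candidate) {
                            forms.push(form);
                        }
                    }
                    forms.push(candidate);
                }
            }
        }
    }
    forms.sort();
    forms.dedup();
    forms
}
```

With rules `S` (add "s") and `ED` (add "ed") and a dictionary entry `walk` carrying both flags, calling `find_base_and_expand("walked", ...)` yields `walk`, `walked`, and `walks`. A form whose candidate base is not in the dictionary yields nothing, which is why this mode requires a dictionary.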
Examples:
# German words - expand base forms
|
# Croatian words - expand base forms
|
# English inflected forms - find base and expand
|
Library Usage
use runmunch::Runmunch;
// Create a new Runmunch instance
let mut runmunch = Runmunch::new();
// Load affix and dictionary files (placeholder paths)
runmunch.load_affix_file("path/to/file.aff")?;
runmunch.load_dictionary("path/to/file.dic")?;
// Expand all words from the dictionary
let expanded_words = runmunch.unmunch()?;
for word in expanded_words {
    println!("{}", word);
}
// Or expand specific words (placeholder word)
let word_forms = runmunch.expand_word("walk")?;
for form in word_forms {
    println!("{}", form);
}
// Find base word and expand it (placeholder word)
let expanded_forms = runmunch.find_base_and_expand("walked")?;
for form in expanded_forms {
    println!("{}", form);
}
Using the WordExpander directly:
use runmunch::{AffixFile, WordExpander};
// Load affix file (placeholder path)
let affix_file = AffixFile::load("path/to/file.aff")?;
// Create expander and set affix file
let mut expander = WordExpander::new();
expander.set_affix_file(affix_file);
// Expand a word with specific flags (flag name and argument form are assumed)
let expanded = expander.expand_with_flags("work", &["ED"])?;
// Results might include: ["work", "worked"]
// Find base word from inflected form
let base_words = expander.find_base_word("worked")?;
// Results might include: ["work"]
// Find base and expand
let all_forms = expander.find_base_and_expand("worked")?;
// Results might include: ["work", "worked", ...]
File Formats
Affix Files (.aff)
Runmunch supports the hunspell affix file format, including:
- Prefix rules (`PFX`)
- Suffix rules (`SFX`)
- Cross-product flags for combining prefixes and suffixes
- Condition patterns using regular expressions
- Long flags (`FLAG long`)
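To make the cross-product idea concrete, here is a minimal standalone sketch of how a prefix and a suffix can combine on one word. The `Affix` type and `expand` function are hypothetical and are not runmunch's API. The sketch assumes every condition is `.` (always matches) and does not model regular-expression conditions.

```rust
/// One affix entry, reduced to the fields this sketch needs.
/// (Hypothetical type; conditions other than "." are not modelled.)
struct Affix {
    strip: &'static str, // characters removed from the word ("" for hunspell's "0")
    add: &'static str,   // characters added in their place
    cross_product: bool, // the Y/N field of the PFX/SFX header line
}

/// Apply a prefix: remove `strip` from the start, then prepend `add`.
fn apply_prefix(p: &Affix, word: &str) -> Option<String> {
    word.strip_prefix(p.strip)
        .map(|rest| format!("{}{}", p.add, rest))
}

/// Apply a suffix: remove `strip` from the end, then append `add`.
fn apply_suffix(s: &Affix, word: &str) -> Option<String> {
    word.strip_suffix(s.strip)
        .map(|stem| format!("{}{}", stem, s.add))
}

/// Expand `word` with each prefix, each suffix, and each prefix+suffix pair
/// in which both sides allow cross product. The result is sorted.
fn expand(word: &str, prefixes: &[Affix], suffixes: &[Affix]) -> Vec<String> {
    let mut forms = vec![word.to_string()];
    for p in prefixes {
        if let Some(prefixed) = apply_prefix(p, word) {
            forms.push(prefixed);
        }
    }
    for s in suffixes {
        if let Some(suffixed) = apply_suffix(s, word) {
            // Combine with prefixes only when both affixes are cross-product enabled.
            for p in prefixes.iter().filter(|p| p.cross_product && s.cross_product) {
                if let Some(both) = apply_prefix(p, &suffixed) {
                    forms.push(both);
                }
            }
            forms.push(suffixed);
        }
    }
    forms.sort();
    forms.dedup();
    forms
}
```

For the word "lock" with a cross-product prefix "un", a cross-product suffix "ed", and a non-cross-product suffix "s", this produces `lock`, `locked`, `locks`, `unlock`, and `unlocked`, but not `unlocks`.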
Example affix file:
FLAG long
PFX UN Y 1
PFX UN 0 un .
SFX ED Y 1
SFX ED 0 ed .
SFX S Y 1
SFX S 0 s .
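As a rough illustration of how such a file breaks down, the sketch below parses the rule lines of the example into plain structs. `AffixKind`, `AffixRule`, and `parse_rules` are hypothetical names, not runmunch's parser. The sketch skips header lines and directives such as `FLAG`, and keeps the condition as an uninterpreted string.

```rust
#[derive(Debug, PartialEq)]
enum AffixKind {
    Prefix,
    Suffix,
}

#[derive(Debug, PartialEq)]
struct AffixRule {
    kind: AffixKind,
    flag: String,
    strip: String, // empty when the file says "0"
    add: String,   // empty when the file says "0"
    condition: String,
}

/// Parse rule lines of the form `PFX|SFX flag strip add condition`.
/// Header lines (`PFX flag Y|N count`) and all other directives are skipped.
fn parse_rules(aff: &str) -> Vec<AffixRule> {
    let mut rules = Vec::new();
    for line in aff.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        let kind = match fields.first().copied() {
            Some("PFX") => AffixKind::Prefix,
            Some("SFX") => AffixKind::Suffix,
            _ => continue, // FLAG, comments, blank lines, ...
        };
        // Header lines have four fields; rule lines have at least five.
        if fields.len() < 5 {
            continue;
        }
        // hunspell writes "0" for "nothing to strip" or "nothing to add".
        let zero_to_empty = |s: &str| {
            if s == "0" {
                String::new()
            } else {
                s.to_string()
            }
        };
        rules.push(AffixRule {
            kind,
            flag: fields[1].to_string(),
            strip: zero_to_empty(fields[2]),
            add: zero_to_empty(fields[3]),
            condition: fields[4].to_string(),
        });
    }
    rules
}
```

Run on the example file above, this yields three rules: a prefix `UN` adding "un", a suffix `ED` adding "ed", and a suffix `S` adding "s", each with the condition `.`.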
Dictionary Files (.dic)
Standard hunspell dictionary format:
3
hello/ED
world
test/UN,S
- First line contains word count
- Each subsequent line contains a word, optionally followed by flags after a `/`
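A minimal sketch of reading this format into `(word, flags)` pairs is shown below. `parse_dictionary` is a hypothetical helper, not part of runmunch's API, and it assumes flags are comma-separated as in the example above. Real hunspell dictionaries split flags according to the `FLAG` setting of the affix file, which this sketch does not handle.

```rust
/// Parse hunspell-style dictionary text into (word, flags) pairs.
/// Assumes comma-separated flags after '/', as in the example above.
fn parse_dictionary(dic: &str) -> Vec<(String, Vec<String>)> {
    let mut lines = dic.lines().map(str::trim).filter(|l| !l.is_empty());

    // The first line holds the word count; it is read and then ignored here.
    let _declared_count: Option<usize> = lines.next().and_then(|l| l.parse().ok());

    lines
        .map(|line| match line.split_once('/') {
            // "test/UN,S" -> ("test", ["UN", "S"])
            Some((word, flags)) => (
                word.to_string(),
                flags.split(',').map(|f| f.to_string()).collect(),
            ),
            // "world" -> ("world", [])
            None => (line.to_string(), Vec::new()),
        })
        .collect()
}
```

For the example dictionary, this gives `hello` with flag `ED`, `world` with no flags, and `test` with flags `UN` and `S`.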
Examples
Basic Word Expansion
# Create a simple affix file
# Test expansion
|
# Output: happy, unhappy
Croatian Language Example
# Expand Croatian words
|
# Unmunch Croatian dictionary
|
# Shows total expanded words
🚀 Performance Benchmarks
Runmunch delivers excellent performance across different languages and use cases:
Dictionary Unmunching (Full Expansion)
| Language | Input Words | Output Words | Time | Speed | Expansion Ratio |
|---|---|---|---|---|---|
| German | 75,888 | 1,226,445 | 3.23s | 23,493 w/s | 16.16x |
| Croatian | 53,712 | 28,428,780 | 52.63s | 1,020 w/s | 529.28x |
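The derived columns can be reproduced from the raw ones. The sketch below assumes that Speed means input words per second and that Expansion Ratio means output words per input word, which matches the figures in the table. The function names are illustrative only.

```rust
/// Input words processed per second.
fn words_per_second(input_words: u64, seconds: f64) -> f64 {
    input_words as f64 / seconds
}

/// Generated forms per input word.
fn expansion_ratio(input_words: u64, output_words: u64) -> f64 {
    output_words as f64 / input_words as f64
}
```

For the German row, `words_per_second(75_888, 3.23)` comes out within a couple of words per second of the listed 23,493 w/s (the small gap is presumably due to the rounded time), and `expansion_ratio(75_888, 1_226_445)` is about 16.16.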
Word Expansion (--expand mode)
| Mode | Language | Input | Output | Time | Speed |
|---|---|---|---|---|---|
| No Dict | German | 10 words | 528 forms | 0.023s | 435 w/s |
| No Dict | Croatian | 10 words | 2,015 forms | 0.027s | 370 w/s |
| With Dict | German | 10 words | 236 forms | 0.073s | 137 w/s |
| With Dict | Croatian | 10 words | 1,515 forms | 0.081s | 123 w/s |
Key Performance Features
- Zero-cost abstractions - Leverages Rust's performance guarantees
- Memory efficient - Optimized data structures and algorithms
- Unicode aware - Proper handling of international characters
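- Scalable - Output throughput stays in the same range as morphology gets more complex: roughly 380K generated forms/second for German versus 540K for Croatian, derived from the unmunching table above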
Compatibility
- Hunspell format: Full compatibility with standard hunspell .aff and .dic files
- Languages: Extensively tested with German (de) and Croatian (hr_HR); designed to work with any hunspell language
- Flag systems: Supports single flags, long flags (`FLAG long`), and flag aliases (`AF` directive)
- Morphology: Handles simple (Germanic) to complex (Slavic) morphological systems
- Platforms: Cross-platform (Linux, macOS, Windows)
- Unicode: Full UTF-8 support for international characters
API Documentation
The main components of the library:
Runmunch
The main interface combining affix files and dictionaries.
WordExpander
Core word expansion logic using affix rules.
AffixFile
Parser and representation of hunspell affix files.
Dictionary
Parser and representation of hunspell dictionary files.
Error Handling
Runmunch uses comprehensive error handling with descriptive error messages:
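use runmunch::RunmunchError;
// Placeholder path; the arms below handle any error rather than specific RunmunchError variants
match runmunch.load_affix_file("path/to/file.aff") {
    Ok(_) => println!("Affix file loaded"),
    Err(e) => eprintln!("Failed to load affix file: {}", e),
}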
Contributing
Contributions are welcome! Please feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
License
Licensed under MIT OR Apache-2.0.
Acknowledgments
- Based on hunspell's unmunch tool by Németh László and contributors
- Inspired by Lingua::Spelling::Alternative Perl module by Dobrica Pavlinušić