markovify-rs
A Rust implementation of a Markov chain text generator, inspired by markovify.
markovify-rs is a simple, extensible Markov chain generator. Its primary use is building Markov models of large text corpora and generating random sentences from them.
Features
- 🚀 Fast - Native Rust performance for text generation
- 📦 Simple API - Easy to use with sensible defaults
- 🔧 Extensible - Override key methods for custom behavior
- 💾 JSON Serialization - Save and load models for later use
- 🎯 Configurable - Adjustable state size, overlap detection, and more
- 🔗 Model Combination - Combine multiple models with weights
Installation
Add this to your Cargo.toml:
[dependencies]
markovify-rs = "0.1.2"
Basic Usage
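use markovify_rs::Text;
// Get raw text as string
let text = r#"
Sherlock Holmes was a consulting detective. He solved crimes in London.
His friend Dr. Watson helped him. They lived on Baker Street.
Holmes was very clever and observant.
"#;
// Build the model. Arguments follow the order in the API Reference:
// input_text, state_size, retain_original, well_formed, reject_reg
let text_model = Text::new(text, 2, true, true, None).unwrap();
// Print five randomly-generated sentences
// (`...` stands for the remaining option arguments, as in the API Reference)
for _ in 0..5 {
    if let Some(sentence) = text_model.make_sentence(...) {
        println!("{}", sentence);
    }
}
// Print three randomly-generated sentences of no more than 100 characters
for _ in 0..3 {
    if let Some(sentence) = text_model.make_short_sentence(100, ...) {
        println!("{}", sentence);
    }
}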
Advanced Usage
Specifying the Model's State Size
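State size is the number of preceding words that the probability of the next word depends on.
// Default state size is 2 (state_size is the second argument)
let model = Text::new(text, 2, true, true, None).unwrap();
// Use a state size of 3
let model = Text::new(text, 3, true, true, None).unwrap();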
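To make state size concrete, here is a minimal, std-only sketch of a transition table keyed on the previous `state_size` words. It illustrates the idea only and is not markovify-rs's implementation; `Transitions` and `build_transitions` are hypothetical names introduced for this example.

```rust
use std::collections::HashMap;

/// Maps each state (the previous `state_size` words) to the counts of the
/// words observed directly after it. Hypothetical type for this sketch.
type Transitions = HashMap<Vec<String>, HashMap<String, u32>>;

/// Count transitions in a single word sequence for the given state size.
fn build_transitions(words: &[&str], state_size: usize) -> Transitions {
    let mut table = Transitions::new();
    // Each window holds `state_size` context words plus the word that follows.
    for window in words.windows(state_size + 1) {
        let state: Vec<String> = window[..state_size]
            .iter()
            .map(|w| w.to_string())
            .collect();
        let next = window[state_size].to_string();
        *table.entry(state).or_default().entry(next).or_insert(0) += 1;
    }
    table
}
```

For the words `the cat sat on the cat nap`, a state size of 1 records `cat` twice after the state `["the"]`, while a state size of 2 records two distinct followers (`sat` and `nap`) after the state `["the", "cat"]`. Larger state sizes tend to produce output closer to the source text and need more data to avoid dead ends.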
Combining Models
Combine two or more Markov chains with optional weights:
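// The import path and argument shapes for combine_texts are assumed here;
// check the crate documentation for the exact signature.
use markovify_rs::{combine_texts, Text};
let model_a = Text::new(text_a, 2, true, true, None).unwrap();
let model_b = Text::new(text_b, 2, true, true, None).unwrap();
// Combine with equal weights
let combined = combine_texts(vec![&model_a, &model_b], None).unwrap();
// Combine with custom weights (50% more weight on model_a)
let combined = combine_texts(vec![&model_a, &model_b], Some(vec![1.5, 1.0])).unwrap();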
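One way to picture weighted combination is as a merge of transition counts, with each model's counts scaled by its weight before summing. The sketch below assumes that interpretation; `Counts` and `combine_counts` are hypothetical names, and markovify-rs may represent its models differently.

```rust
use std::collections::HashMap;

/// Hypothetical model shape: state -> (next word -> weighted count).
type Counts = HashMap<String, HashMap<String, f64>>;

/// Merge several count tables, scaling each table by its weight.
fn combine_counts(models: &[&Counts], weights: &[f64]) -> Result<Counts, String> {
    if models.len() != weights.len() {
        return Err("models and weights must have the same length".to_string());
    }
    let mut combined = Counts::new();
    for (model, &weight) in models.iter().zip(weights) {
        for (state, followers) in model.iter() {
            let slot = combined.entry(state.clone()).or_default();
            for (word, &count) in followers {
                *slot.entry(word.clone()).or_insert(0.0) += count * weight;
            }
        }
    }
    Ok(combined)
}
```

With weights `[1.5, 1.0]`, a transition seen twice in the first model and once in the second ends up with a combined count of 4.0, so the first model's vocabulary is favored without discarding the second's.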
Compiling a Model
Compile a model for improved text generation speed:
let text_model = Text::new(text, 2, true, true, None).unwrap();
let compiled_model = text_model.compile();
// Or compile in place
let mut text_model = Text::new(text, 2, true, true, None).unwrap();
text_model.compile_inplace();
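In the Python markovify library, compiling precomputes, for each state, the list of next words and their cumulative counts, so that picking a word becomes a binary search instead of a fresh pass over the counts. The sketch below shows that technique under the assumption that markovify-rs does something similar; `CompiledNext`, `compile_next`, and `pick` are hypothetical names.

```rust
/// Precomputed followers of one state: words plus running totals of counts.
struct CompiledNext {
    words: Vec<String>,
    cumulative: Vec<u32>,
}

/// Build the cumulative-count table once, ahead of generation.
fn compile_next(followers: &[(String, u32)]) -> CompiledNext {
    let mut words = Vec::with_capacity(followers.len());
    let mut cumulative = Vec::with_capacity(followers.len());
    let mut total: u32 = 0;
    for (word, count) in followers {
        total += *count;
        words.push(word.clone());
        cumulative.push(total);
    }
    CompiledNext { words, cumulative }
}

/// Map a draw `r` in `0..total` to a word via binary search.
/// Returns `None` if the table is empty or `r` is out of range.
fn pick(compiled: &CompiledNext, r: u32) -> Option<&str> {
    let total = *compiled.cumulative.last()?;
    if r >= total {
        return None;
    }
    // Index of the first running total strictly greater than `r`.
    let idx = compiled.cumulative.partition_point(|&c| c <= r);
    compiled.words.get(idx).map(|w| w.as_str())
}
```

A caller would draw `r` uniformly from `0..total` with a random number generator; words with higher counts own a wider slice of that range and are therefore chosen proportionally more often.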
Working with Newline-Delimited Text
For text where sentences are separated by newlines instead of punctuation:
use markovify_rs::NewlineText;
let text = r#"
Line one here
Line two there
Line three everywhere
"#;
// Same constructor arguments as Text
let model = NewlineText::new(text, 2, true, true, None).unwrap();
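The difference from punctuation-based splitting can be sketched with the standard library alone: each non-blank line becomes one sentence, which is then split into words on whitespace. The helper names `split_into_lines` and `split_into_words` are hypothetical and not part of the markovify-rs API.

```rust
/// Split a corpus into sentences on line breaks, dropping blank lines.
fn split_into_lines(text: &str) -> Vec<&str> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}

/// Split one sentence into words on whitespace.
fn split_into_words(sentence: &str) -> Vec<&str> {
    sentence.split_whitespace().collect()
}
```

This suits corpora such as chat logs, headlines, or verse, where line breaks carry the sentence boundaries and terminal punctuation is often missing.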
Exporting and Importing Models
Save and load models using JSON:
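// Generate and save
let text_model = Text::new(text, 2, true, true, None).unwrap();
let model_json = text_model.to_json().unwrap();
// Save to file (optional); "model.json" is a placeholder path
std::fs::write("model.json", &model_json).unwrap();
// Load from JSON
let model_json = std::fs::read_to_string("model.json").unwrap();
let reconstituted_model = Text::from_json(&model_json).unwrap();
// Generate a sentence (`...` stands for max_chars and the remaining options)
if let Some(sentence) = reconstituted_model.make_short_sentence(...) {
    println!("{}", sentence);
}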
Custom Sentence Rejection
Override the default rejection pattern:
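// Use a custom regex to reject sentences containing specific patterns
// (pass your pattern as reject_reg, the fifth argument, in place of `...`)
let model = Text::new(text, 2, true, true, Some(...)).unwrap();
// Or disable well-formed checking entirely (well_formed = false)
let model = Text::new(text, 2, true, false, None).unwrap();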
Sentence Generation Options
let model = Text::new(text, 2, true, true, None).unwrap();
// Generate with custom parameters (`...` stands for the option arguments)
let sentence = model.make_sentence(...);
// Generate sentence starting with specific words
let sentence = model.make_sentence_with_start(beginning, ...).unwrap();
API Reference
Text
The main text model struct.
- new(input_text, state_size, retain_original, well_formed, reject_reg) - Create a new model
- make_sentence(...) - Generate a random sentence
- make_short_sentence(max_chars, ...) - Generate a sentence with character limit
- make_sentence_with_start(beginning, ...) - Generate sentence starting with specific words
- compile() - Compile for faster generation
- to_json() / from_json() - Serialize/deserialize
Chain
The underlying Markov chain (non-text-specific).
- new(corpus, state_size) - Create a chain from corpus
- walk(init_state) - Generate a sequence
- compile() - Compile for faster generation
- to_json() / from_json() - Serialize/deserialize
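To show what walking a chain involves, here is a rough, std-only sketch: start from a begin state, repeatedly pick a follower of the current state, and stop at an end marker. Everything in it is an assumption made for illustration, including the `Model` shape, the sentinel values, the `max_words` cap, and the `choose` callback; the crate's own `walk(init_state)` may differ.

```rust
use std::collections::HashMap;

// Sentinel tokens marking sentence boundaries (values assumed for this sketch).
const BEGIN: &str = "___BEGIN__";
const END: &str = "___END__";

/// Hypothetical model shape: each state maps to the words observed after it.
type Model = HashMap<Vec<String>, Vec<String>>;

/// Walk from the BEGIN state until END, a dead end, or `max_words` words.
/// `choose` receives the number of candidates and returns an index; a real
/// generator would draw that index at random, weighted by frequency.
fn walk<F>(model: &Model, state_size: usize, max_words: usize, mut choose: F) -> Vec<String>
where
    F: FnMut(usize) -> usize,
{
    let mut state = vec![BEGIN.to_string(); state_size];
    let mut output = Vec::new();
    while output.len() < max_words {
        let followers = match model.get(&state) {
            Some(f) if !f.is_empty() => f,
            _ => break,
        };
        let next = followers[choose(followers.len()) % followers.len()].clone();
        if next == END {
            break;
        }
        // Slide the state window forward by one word.
        if !state.is_empty() {
            state.remove(0);
            state.push(next.clone());
        }
        output.push(next);
    }
    output
}
```

Passing the selection strategy as a closure keeps the sketch deterministic and testable; swapping in a random index turns it into a generator.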
NewlineText
Text model that splits on newlines.
Same API as Text, but uses newline-based sentence splitting.
Performance
Rust provides significant performance improvements over the Python implementation:
| Operation | Python (markovify) | Rust (markovify-rs) | Speedup |
|---|---|---|---|
| Model Creation | 50-100 ms | 5-15 ms | 5-10x |
| Sentence Generation | 1-5 ms | 0.01-0.1 ms | 50-100x |
| Compiled Generation | 0.5-2 ms | 0.01-0.05 ms | 20-50x |
| Model Compilation | 10-30 ms | 1-5 ms | 5-10x |
| JSON Serialize | 5-15 ms | 1-3 ms | 3-5x |
Running Benchmarks
# Run Rust benchmarks
# Run Python benchmarks
# Run both and compare
See benchmarks/BENCHMARKS.md for detailed documentation.
Notes
- Markovify works best with large, well-punctuated texts
- By default, make_sentence tries 10 times to generate a valid sentence
- The default overlap check rejects sentences that overlap by 15 words or 70% of the sentence length
- Setting retain_original = false reduces memory usage for large corpora
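The overlap rule in the notes above can be sketched as follows. The logic is modeled on the check in the Python markovify library (take the smaller of the word limit and the ratio-based limit, then reject if any run one word longer than that appears verbatim in the source); whether markovify-rs implements it exactly this way is an assumption, and `passes_overlap_check` is a hypothetical name.

```rust
/// Returns true if `words` does not reproduce too long a run of `original`.
/// With the defaults from the notes, `max_overlap_ratio` is 0.7 and
/// `max_overlap_total` is 15.
fn passes_overlap_check(
    words: &[&str],
    original: &str,
    max_overlap_ratio: f64,
    max_overlap_total: usize,
) -> bool {
    if words.is_empty() {
        return false;
    }
    // Longest verbatim run that is still allowed.
    let by_ratio = (max_overlap_ratio * words.len() as f64).round() as usize;
    let overlap_max = by_ratio.min(max_overlap_total);
    // Any run one word longer than that counts as too much overlap.
    let gram_len = overlap_max + 1;
    let gram_count = words.len().saturating_sub(overlap_max).max(1);
    for i in 0..gram_count {
        let end = (i + gram_len).min(words.len());
        let gram = words[i..end].join(" ");
        if original.contains(&gram) {
            return false;
        }
    }
    true
}
```

For a 10-word candidate the ratio limit is 7 words, so a candidate that copies 8 consecutive words from the source is rejected, while one that copies only 7 passes. This check needs the original text, which is why it depends on the model retaining it.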
License
MIT License - see LICENSE for details.
Acknowledgments
This is a Rust port of the excellent markovify Python library by Jeremy Singer-Vine.