markovgen
A library for building Markov chain graphs from text datasets and efficiently generating text sequences by traversing them.
Features
- Simple API for building and traversing graphs
- Configurable minimum sequence length
- An example CLI application (markovcli) that supports building graphs from datasets and writing them to disk, as well as sampling such graphs with a customizable sequence length. Try it using:
  cargo run -r -F serde --bin markovcli
- Capable of generating more than 2.5 million names per second on my machine from the first_names benchmark dataset (see benches/), with default settings and the cli_no_print feature enabled to avoid IO overhead.
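For readers new to the technique, here is a minimal sketch of building a character-level Markov chain graph and traversing it with a minimum sequence length. It is not markovgen's implementation or API. `CharGraph`, `XorShift`, `build`, `generate` and the `max_attempts` retry bound are hypothetical. The sketch also assumes a first-order chain, where each character depends only on the previous one. Edge weights are encoded by storing repeated successors, so transitions seen more often in the dataset are picked more often.

```rust
use std::collections::HashMap;

// Sentinels framing each sequence (same values as the README example).
const START: char = '\x01';
const END: char = '\x02';

/// Tiny xorshift64 PRNG so the sketch needs no external crates (not for security use).
struct XorShift(u64);

impl XorShift {
    fn new(seed: u64) -> Self {
        XorShift(seed.max(1)) // xorshift state must be non-zero
    }

    /// Returns a pseudo-random index in 0..n (n must be > 0).
    fn next_below(&mut self, n: usize) -> usize {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x % n as u64) as usize
    }
}

/// Hypothetical first-order, character-level graph (not markovgen's type): each vertex
/// maps to every successor observed in the dataset, with repeats acting as edge weights.
struct CharGraph {
    edges: HashMap<char, Vec<char>>,
}

impl CharGraph {
    /// Builds the graph from newline-separated sequences.
    fn build(dataset: &str) -> Self {
        let mut edges: HashMap<char, Vec<char>> = HashMap::new();
        for line in dataset.lines().filter(|l| !l.is_empty()) {
            // Framing with sentinels turns first and last characters into edges too.
            let chars: Vec<char> = std::iter::once(START)
                .chain(line.chars())
                .chain(std::iter::once(END))
                .collect();
            for pair in chars.windows(2) {
                edges.entry(pair[0]).or_default().push(pair[1]);
            }
        }
        CharGraph { edges }
    }

    /// Random walk from START to END. Walks shorter than `min_len` chars are rejected and
    /// retried up to `max_attempts` times. Returns None on an empty graph or if no walk qualifies.
    fn generate(&self, rng: &mut XorShift, min_len: usize, max_attempts: usize) -> Option<String> {
        for _ in 0..max_attempts {
            let mut out = String::new();
            let mut current = START;
            loop {
                let successors = self.edges.get(&current)?;
                let next = successors[rng.next_below(successors.len())];
                if next == END {
                    break;
                }
                out.push(next);
                current = next;
            }
            if out.chars().count() >= min_len {
                return Some(out);
            }
        }
        None
    }
}

fn main() {
    let graph = CharGraph::build("Tim\nTom\nThomas\nNathan\nNina\nTiara\nTyra\nTyrone");
    let mut rng = XorShift::new(42);
    for _ in 0..5 {
        if let Some(name) = graph.generate(&mut rng, 4, 1_000) {
            println!("{name}");
        }
    }
}
```

Rejecting walks that end too early is the simplest way to enforce a minimum length. The retry bound stops a `min_len` that the graph can never reach from looping forever.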
Example
src/bin/example.rs:
use Arc;
use *;
const NAME_DATASET: &str = "Tim\nTom\nThomas\nNathan\nNina\nTiara\nTyra\nTyrone";
const SEQUENCE_START: char = '\x01';
const SEQUENCE_END: char = '\x02';
Running this should yield something like:
$ cargo run --bin example
Ninathom
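The README doesn't say which chain order markovgen uses. Assuming a first-order, character-level chain, every adjacent pair of characters in an output like Ninathom, including the framing SEQUENCE_START and SEQUENCE_END sentinels, must occur somewhere in NAME_DATASET. The sketch below checks this. The helper functions `framed_pairs`, `edge_set` and `is_reachable` are hypothetical and not part of markovgen. The constants are copied from the example above.

```rust
use std::collections::HashSet;

const NAME_DATASET: &str = "Tim\nTom\nThomas\nNathan\nNina\nTiara\nTyra\nTyrone";
const SEQUENCE_START: char = '\x01';
const SEQUENCE_END: char = '\x02';

/// Frames a sequence with the sentinels and returns its adjacent character pairs (edges).
fn framed_pairs(seq: &str) -> Vec<(char, char)> {
    let chars: Vec<char> = std::iter::once(SEQUENCE_START)
        .chain(seq.chars())
        .chain(std::iter::once(SEQUENCE_END))
        .collect();
    chars.windows(2).map(|w| (w[0], w[1])).collect()
}

/// All edges observed in a newline-separated dataset.
fn edge_set(dataset: &str) -> HashSet<(char, char)> {
    dataset.lines().flat_map(framed_pairs).collect()
}

/// True if a first-order walk over `edges` could have produced `seq`.
fn is_reachable(edges: &HashSet<(char, char)>, seq: &str) -> bool {
    framed_pairs(seq).iter().all(|pair| edges.contains(pair))
}

fn main() {
    let edges = edge_set(NAME_DATASET);
    println!("{}", is_reachable(&edges, "Ninathom")); // true
    println!("{}", is_reachable(&edges, "Nix")); // false: no name contains "ix"
}
```

Ninathom passes this check because it splices Nina, Nathan ("ath"), Thomas ("hom") and Tim/Tom (a final "m") at shared characters. That is exactly what the sentinels and single-character states allow.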
Plans for 1.0
- This was one of my first Rust projects, which I've only cleaned up a little. I'll probably change the API to make it more ergonomic before the 1.0.0 release.
- Proper multi-threading support (I think just cloning GraphSteppers and using them in different tasks should already work, but I haven't actually tried it)
- Generic implementation to support multiple kinds of components, allowing String as well as char vertices (currently only chars are supported, since this fit my original use case of generating names)
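The multi-threading plan relies on a common Rust pattern. The read-only graph lives behind an `Arc` (which the example above already imports), and each thread gets its own cloned stepper with independent random state. The sketch below assumes that pattern. `Stepper`, `Graph`, `build` and the inline xorshift generator are hypothetical stand-ins, not markovgen's GraphStepper or its API.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

// Sentinels framing each sequence, mirroring the README example.
const START: char = '\x01';
const END: char = '\x02';

/// Hypothetical read-only transition table: each char maps to its observed successors
/// (repeats act as weights). Not markovgen's graph type.
type Graph = HashMap<char, Vec<char>>;

fn build(dataset: &str) -> Graph {
    let mut graph = Graph::new();
    for line in dataset.lines().filter(|l| !l.is_empty()) {
        let chars: Vec<char> = std::iter::once(START)
            .chain(line.chars())
            .chain(std::iter::once(END))
            .collect();
        for pair in chars.windows(2) {
            graph.entry(pair[0]).or_default().push(pair[1]);
        }
    }
    graph
}

/// Hypothetical stepper: a cheap Arc handle to the shared graph plus private RNG state.
/// Cloning shares the graph but copies the RNG state, so each clone should be reseeded.
#[derive(Clone)]
struct Stepper {
    graph: Arc<Graph>,
    rng: u64, // xorshift64 state, must be non-zero
}

impl Stepper {
    fn next_index(&mut self, n: usize) -> usize {
        self.rng ^= self.rng << 13;
        self.rng ^= self.rng >> 7;
        self.rng ^= self.rng << 17;
        (self.rng % n as u64) as usize
    }

    /// Walks from START to END. Assumes the graph was built from a non-empty dataset.
    fn generate(&mut self) -> String {
        let graph = Arc::clone(&self.graph); // local handle avoids borrowing self twice
        let mut out = String::new();
        let mut current = START;
        loop {
            let successors = graph.get(&current).expect("dataset must be non-empty");
            let next = successors[self.next_index(successors.len())];
            if next == END {
                return out;
            }
            out.push(next);
            current = next;
        }
    }
}

fn main() {
    let dataset = "Tim\nTom\nThomas\nNathan\nNina\nTiara\nTyra\nTyrone";
    let base = Stepper { graph: Arc::new(build(dataset)), rng: 1 };

    let handles: Vec<_> = (1..=4u64)
        .map(|i| {
            let mut stepper = base.clone(); // shares the graph, copies the RNG state
            stepper.rng = i.wrapping_mul(0x9E37_79B9_7F4A_7C15); // distinct non-zero seed per thread
            thread::spawn(move || (0..3).map(|_| stepper.generate()).collect::<Vec<String>>())
        })
        .collect();

    for handle in handles {
        println!("{:?}", handle.join().expect("worker thread panicked"));
    }
}
```

Only the `Arc` pointer is cloned per thread, so the graph is built once and never copied. Because the graph is never mutated after construction, no locking is needed.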