CMUSphinx pronunciation dictionary
This is a fork of the original Rust library for getting pronunciations from the CMUSphinx pronunciation dictionary.
The major changes are:
- switched from the deprecated failure crate to thiserror-based errors
- switched to the Rust 2018 edition in the cargo config
- removed the utility function to download
- removed a lot of dependencies that were either no longer needed or deprecated
- fixed a bug where lookups could hang (a side effect of the new internals; I still don't know what caused it)
- changed the internals so file I/O isn't needed for every lookup (and simplified them as a result)
The last of those changes is where the name comes from: because the entire dictionary is loaded into memory, each individual lookup is much faster than in the original crate (O(k), where k is the length of the longest word in the dictionary). The downside is that creating the object takes longer and uses more memory, but for the application I initially created this for, lookup time was essential. To address the obvious: yes, this is slower than a hash map, since it uses a radix tree internally (like the original crate). Still, it's much faster than the range-based file lookup (or whatever you'd call it) that the old crate used, while also being less complicated.
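To make the trade-off concrete, here is a minimal sketch of the idea, not this crate's actual code. It uses a plain, uncompressed trie rather than a real radix tree, and the names `Node`, `InMemoryDict`, `from_text`, `insert` and `get` are all hypothetical. It assumes the usual CMUdict text layout (`WORD  PH1 PH2 ...`, `;;;` comment lines, `WORD(2)` for alternate pronunciations). All parsing happens once at load time; a lookup then walks one node per character and never touches the file.

```rust
use std::collections::BTreeMap;

/// One node of a plain (uncompressed) trie. A radix tree would merge
/// single-child chains, but a lookup is bounded by word length either way.
#[derive(Default)]
struct Node {
    children: BTreeMap<char, Node>,
    /// Pronunciations of the word that ends at this node (empty if none).
    pronunciations: Vec<Vec<String>>,
}

/// Hypothetical in-memory dictionary; NOT the crate's real `Cmudict` type.
#[derive(Default)]
struct InMemoryDict {
    root: Node,
}

impl InMemoryDict {
    /// Parses CMUdict-style text once, up front (assumed format:
    /// `WORD  PH1 PH2 ...` per line, `;;;` for comments).
    fn from_text(text: &str) -> Self {
        let mut dict = Self::default();
        for line in text.lines() {
            let line = line.trim();
            if line.is_empty() || line.starts_with(";;;") {
                continue;
            }
            let mut tokens = line.split_whitespace();
            let head = match tokens.next() {
                Some(h) => h,
                None => continue,
            };
            // Alternate entries look like `WORD(2)`; drop the variant suffix.
            let word = match head.rfind('(') {
                Some(i) if i > 0 && head.ends_with(')') => &head[..i],
                _ => head,
            };
            let phonemes: Vec<String> = tokens.map(str::to_string).collect();
            dict.insert(&word.to_lowercase(), phonemes);
        }
        dict
    }

    /// Adds one pronunciation for `word`, creating trie nodes as needed.
    fn insert(&mut self, word: &str, phonemes: Vec<String>) {
        let mut node = &mut self.root;
        for ch in word.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.pronunciations.push(phonemes);
    }

    /// Walks one node per character of `word`; no file I/O is involved.
    fn get(&self, word: &str) -> Option<&[Vec<String>]> {
        let mut node = &self.root;
        for ch in word.to_lowercase().chars() {
            node = node.children.get(&ch)?;
        }
        if node.pronunciations.is_empty() {
            None
        } else {
            Some(node.pronunciations.as_slice())
        }
    }
}
```

With this sketch, `InMemoryDict::from_text(...)` pays the full parsing and memory cost once, and every later `get` only depends on the length of the word being looked up, which is the trade-off described above.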
Installation
To use in your Rust project, add the following to your Cargo.toml:
[]
= "0.5"
Usage
To use the dictionary, you have to get an instance of the Cmudict struct:
use Cmudict;
You can retrieve the pronunciation for a word like this:
use Cmudict;