Crate wordfreq_model
wordfreq-model
This crate provides a loader for pre-compiled wordfreq models, allowing you to easily create WordFreq instances for various languages.
Instructions
The provided models are the same as those distributed in the original Python package. See the original documentation for the supported languages and their sources.
You need to specify the models you want to use with features. The feature names are in the form of large-xx or small-xx, where xx is the language code.
For example, if you want to use the large English and small Japanese models, specify large-en and small-ja as follows:
# Cargo.toml
[dependencies.wordfreq-model]
version = "0.2"
features = ["large-en", "small-ja"]
There is no default feature. Be sure to specify features you want to use.
Examples
load_wordfreq creates a WordFreq instance from a ModelKind enum value. ModelKind has one variant for each specified feature, named in CamelCase, such as LargeEn or SmallJa.
By default, only ModelKind::ExampleEn is available, for use in tests.
use approx::assert_relative_eq;
use wordfreq_model::load_wordfreq;
use wordfreq_model::ModelKind;
let wf = load_wordfreq(ModelKind::ExampleEn).unwrap();
assert_relative_eq!(wf.word_frequency("las"), 0.25);
assert_relative_eq!(wf.word_frequency("vegas"), 0.75);
assert_relative_eq!(wf.word_frequency("Las"), 0.25); // Standardized
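The naming rule described above, from a feature name to a ModelKind variant name, can be sketched as a small string conversion. The helper feature_to_variant below is hypothetical and not part of this crate; it only illustrates the rule for names of the form large-xx or small-xx.

```rust
/// Converts a Cargo feature name such as "large-en" into the CamelCase
/// form used for the matching variant name, such as "LargeEn".
/// Hypothetical helper for illustration; wordfreq-model does not export it.
fn feature_to_variant(feature: &str) -> String {
    feature
        .split('-')
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Uppercase the first character and keep the rest unchanged.
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                // An empty segment (e.g. from "a--b") contributes nothing.
                None => String::new(),
            }
        })
        .collect()
}
```

With this sketch, feature_to_variant("small-ja") yields "SmallJa", so enabling the small-ja feature corresponds to ModelKind::SmallJa.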
Standardization
As the above example shows, the model automatically standardizes words before looking them up (e.g., Las is handled as las).
This is done by a Standardizer instance set up in the WordFreq instance.
load_wordfreq automatically sets up an appropriate Standardizer instance for each language.
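The idea of standardizing a word before lookup can be illustrated with a toy table. This is a minimal sketch, not the crate's implementation: ToyWordFreq and standardize are hypothetical names, the standardization here is plain lowercasing only (the real Standardizer may apply further language-specific steps), and returning 0.0 for unknown words is a choice made for this sketch.

```rust
use std::collections::HashMap;

/// Simplified stand-in for standardization: lowercasing only.
/// (Assumption: the real Standardizer may do more than this.)
fn standardize(word: &str) -> String {
    word.to_lowercase()
}

/// A toy frequency table, for illustration only.
struct ToyWordFreq {
    freqs: HashMap<String, f64>,
}

impl ToyWordFreq {
    /// Builds the table, standardizing keys so lookups and entries agree.
    fn new(entries: &[(&str, f64)]) -> Self {
        let freqs = entries
            .iter()
            .map(|(word, freq)| (standardize(word), *freq))
            .collect();
        Self { freqs }
    }

    /// Standardizes the query, then looks it up.
    /// Unknown words yield 0.0 in this sketch.
    fn word_frequency(&self, word: &str) -> f64 {
        self.freqs.get(&standardize(word)).copied().unwrap_or(0.0)
    }
}
```

Built from the entries ("las", 0.25) and ("vegas", 0.75), this toy table returns 0.25 for both "las" and "Las", mirroring the behavior shown in the example above.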
Notes
This crate downloads the specified model files and embeds the models directly into the source code. Specify only the models you need, to avoid extra downloads and a bloated binary.
The actual model files to be used are placed here together with the credits. If you do not desire automatic model downloads and binary embedding, you can create instances from these files directly. See the instructions in wordfreq.
Enums
- ModelKind: Supported model kinds.
Functions
- load_wordfreq: Loads a pre-compiled WordFreq model, setting up an appropriate Standardizer instance.