1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
//! Rust implementation of Weighted Finite States Transducers.
//!
//! Rustfst is a library for constructing, combining, optimizing, and searching weighted
//! finite-state transducers (FSTs). Weighted finite-state transducers are automata where
//! each transition has an input label, an output label, and a weight.
//! The more familiar finite-state acceptor is represented as a transducer
//! with each transition's input and output label equal. Finite-state acceptors
//! are used to represent sets of strings (specifically, regular or rational sets);
//! finite-state transducers are used to represent binary relations between pairs of
//! strings (specifically, rational transductions). The weights can be used to represent
//! the cost of taking a particular transition.
//!
//! FSTs have key applications in speech recognition and synthesis, machine translation,
//! optical character recognition, pattern matching, string processing, machine learning,
//! information extraction and retrieval among others. Often a weighted transducer is used to
//! represent a probabilistic model (e.g., an n-gram model, pronunciation model). FSTs can be
//! optimized by determinization and minimization, models can be applied to hypothesis sets
//! (also represented as automata) or cascaded by finite-state composition, and the best
//! results can be selected by shortest-path algorithms.
//!
//! ![fst](https://raw.githubusercontent.com/Garvys/rustfst-images-doc/master/images/project_in.svg?sanitize=true)
//!
//! ## Overview
//!
//! For a basic [example](#example) see the section below.
//!
//! Some simple and commonly encountered types of FSTs can be easily
//! created with the macro [`fst`] or the functions
//! [`acceptor`](utils::acceptor) and
//! [`transducer`](utils::transducer).
//!
//! For more complex cases you will likely start with the
//! [`VectorFst`](fst_impls::VectorFst) type, which will be imported
//! in the [`prelude`] along with most everything else you need.
//! [`VectorFst<TropicalWeight>`](fst_impls::VectorFst) corresponds
//! directly to the OpenFST `StdVectorFst`, and can be used to load
//! its files using [`read`](fst_traits::SerializableFst::read) or
//! [`read_text`](fst_traits::SerializableFst::read_text).
//!
//! Because "iteration" over an FST can mean many different things,
//! there are a variety of different iterators.  To iterate over state
//! IDs you may use
//! [`states_iter`](fst_traits::StateIterator::states_iter), while to
//! iterate over transitions out of a state, you may use
//! [`get_trs`](fst_traits::CoreFst::get_trs).  Since it is common to
//! iterate over both, this can be done using
//! [`fst_iter`](fst_traits::FstIterator::fst_iter) or
//! [`fst_into_iter`](fst_traits::FstIntoIterator::fst_into_iter).  It
//! is also very common to iterate over paths accepted by an FST,
//! which can be done with
//! [`paths_iter`](fst_traits::Fst::paths_iter), and as a convenience
//! for generating text,
//! [`string_paths_iter`](fst_traits::Fst::string_paths_iter).
//! Alternately, in the case of a linear FST, you may retrieve the
//! only possible path with
//! [`decode_linear_fst`](utils::decode_linear_fst).
//!
//! Note that iterating over paths is not the same thing as finding
//! the *shortest* path or paths, which is done with
//! [`shortest_path`](algorithms::shortest_path) (for a single path)
//! or
//! [`shortest_path_with_config`](algorithms::shortest_path_with_config)
//! (for N-shortest paths).
//!
//! For the complete list of algorithms, see the [`algorithms`] module.
//!
//! You may now be wondering, especially if you have previously used
//! such linguist-friendly tools as
//! [pyfoma](https://github.com/mhulden/pyfoma), "what if I just want
//! to *transduce some text*???"  The unfriendly answer is that
//! rustfst is a somewhat lower-level library, designed for
//! implementing things like speech recognizers.  The somewhat more
//! helpful answer is that you would do this by constructing an
//! [`acceptor`](utils::acceptor) for your input, which you will
//! [`compose`](algorithms::compose) with a
//! [`transducer`](utils::transducer), then
//! [`project`](algorithms::project) the result [to its
//! output](algorithms::ProjectType::ProjectOutput), and finally
//! [iterate over the paths](fst_traits::Fst::string_paths_iter) in
//! the resulting FST.
//!
//! ## References
//!
//! Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Michael Riley's work :
//! - [Weighted automata algorithms](https://cs.nyu.edu/~mohri/pub/hwa.pdf)
//! - [The design principles of a weighted finite-state transducer library](https://core.ac.uk/download/pdf/82101846.pdf)
//! - [OpenFst: A general and efficient weighted finite-state transducer library](https://link.springer.com/chapter/10.1007%2F978-3-540-76336-9_3)
//! - [Weighted finite-state transducers in speech recognition](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1010&context=cis_papers)
//!
//! The API closely resembles that of OpenFST, with some
//! simplifications and changes to make it more idiomatic in Rust, notably
//! the use of `Tr` instead of `Arc`.  See [Differences from
//! OpenFST](#differences-from-openfst) for more information.
//!
//! ## Example
//!
//! ```rust
//! use anyhow::Result;
//! use rustfst::prelude::*;
//! use rustfst::algorithms::determinize::{DeterminizeType, determinize};
//! use rustfst::algorithms::rm_epsilon::rm_epsilon;
//!
//! fn main() -> Result<()> {
//!     // Creates a empty wFST
//!     let mut fst = VectorFst::<TropicalWeight>::new();
//!
//!     // Add some states
//!     let s0 = fst.add_state();
//!     let s1 = fst.add_state();
//!     let s2 = fst.add_state();
//!
//!     // Set s0 as the start state
//!     fst.set_start(s0)?;
//!
//!     // Add a transition from s0 to s1
//!     fst.add_tr(s0, Tr::new(3, 5, 10.0, s1))?;
//!
//!     // Add a transition from s0 to s2
//!     fst.add_tr(s0, Tr::new(5, 7, 18.0, s2))?;
//!
//!     // Set s1 and s2 as final states
//!     fst.set_final(s1, 31.0)?;
//!     fst.set_final(s2, 45.0)?;
//!
//!     // Iter over all the paths in the wFST
//!     for p in fst.paths_iter() {
//!          println!("{:?}", p);
//!     }
//!
//!     // A lot of operations are available to modify/optimize the FST.
//!     // Here are a few examples :
//!
//!     // - Remove useless states.
//!     connect(&mut fst)?;
//!
//!     // - Optimize the FST by merging states with the same behaviour.
//!     minimize(&mut fst)?;
//!
//!     // - Copy all the input labels in the output.
//!     project(&mut fst, ProjectType::ProjectInput);
//!
//!     // - Remove epsilon transitions.
//!     rm_epsilon(&mut fst)?;
//!
//!     // - Compute an equivalent FST but deterministic.
//!     fst = determinize(&fst)?;
//!
//!     Ok(())
//! }
//! ```
//!
//! ## Differences from OpenFST
//!
//! Here is a non-exhaustive list of ways in which Rustfst's API
//! differs from OpenFST:
//!
//! - The default epsilon symbol is `<eps>` and not `<epsilon>`.
//! - Functions and methods follow Rust naming conventions,
//!   e.g. `add_state` rather than `AddState`, but are otherwise mostly
//!   equivalent, except that:
//! - Transitions are called `Tr` and not `Arc`, because `Arc` has a
//!   rather different and well-established meaning in Rust, and rustfst
//!   uses it (`std::sync::Arc`, that is) to reference-count symbol
//!   tables.  All associated functions also use `tr`.
//! - Final states are not indicated by a final weight of `zero`.  You
//!   can test for finality using [`is_final`](fst_traits::CoreFst::is_final), and
//!   [`final_weight`](fst_traits::CoreFst::final_weight) returns an [`Option`].  This
//!   requires some care when converting OpenFST code.
//! - Transitions can be accessed directly as a slice rather than requiring
//!   an iterator.
//! - Semiring operations are expressed as plain old methods rather
//!   than strange C++ things.  So write `w1.plus(w2)` rather than
//!   `Plus(w1, w2)`, for instance.
//! - Weights have in-place operations for ⊕
//!   ([`plus_assign`](Semiring::plus_assign)) and ⊗
//!   ([`times_assign`](Semiring::times_assign)).
//! - Most of the type aliases (which would be trait aliases in Rust) such
//!   as `StdArc`, `StdFst`, and so forth, are missing, but type inference
//!   allows us to avoid explicit type arguments in most cases, such as
//!   when calling [`Tr::new`], for instance.
//! - State IDs are unsigned, with [`NO_STATE_ID`] used for a missing value.
//!   They are also 32 bits by default (presumably, 4 billion states
//!   is enough for most applications).  This means you must take care to
//!   cast them to [`usize`] when using them as indices, and vice-versa,
//!   preferably checking for overflows
//! - Symbol IDs are also unsigned and 32-bits, with [`NO_LABEL`] used
//!   for a missing value.
//! - Floating-point weights are not generic, so are always single-precision.

#[warn(missing_docs)]
#[cfg(test)]
extern crate counter;
#[macro_use]
extern crate anyhow;
#[cfg(test)]
extern crate serde;
#[cfg(test)]
extern crate serde_json;

pub use crate::drawing_config::DrawingConfig;
pub use crate::fst_path::{check_path_in_fst, FstPath};
pub use crate::string_path::StringPath;
pub use crate::symbol_table::SymbolTable;

pub use self::tr::Tr;
pub use self::trs::{Trs, TrsConst, TrsVec};

pub use crate::semirings::Semiring;
#[cfg(test)]
use doc_comment::doc_comment;

// When running `cargo test`, rustdoc will check this file as well.
#[cfg(test)]
doc_comment!(include_str!("../../README.md"));

#[cfg(test)]
mod tests_openfst;

mod symbol_table;

/// Type used for the input label and output label of a transition in a wFST -> usize
#[cfg(feature = "state-label-u32")]
pub type Label = u32;
#[cfg(not(feature = "state-label-u32"))]
pub type Label = usize;
/// Symbol to map in the Symbol Table -> String
pub type Symbol = String;

/// Type used to identify a state in a wFST -> usize
#[cfg(feature = "state-label-u32")]
pub type StateId = u32;
#[cfg(not(feature = "state-label-u32"))]
pub type StateId = usize;

/// Epsilon label representing the epsilon transition (empty transition) = `0`.
pub const EPS_LABEL: Label = 0;
/// Epsilon symbol representing the epsilon transition (empty transition) = `<eps>`.
pub const EPS_SYMBOL: &str = "<eps>";

/// A few utilities to manipulate wFSTs.
pub mod utils;

/// Provides algorithms that are generic to all Fst.
pub mod algorithms;

/// Provides the `FstProperties` struct and some utils functions around it.
/// Useful to assert some properties on a Fst.
pub mod fst_properties;
/// Implementation of the transitions inside a Fst.
mod tr;

#[macro_use]
/// Provides traits that must be implemented to be able to use generic algorithms.
pub mod fst_traits;
/// Implementation of the wFST traits with different data structures.
pub mod fst_impls;
/// Provides a trait that shall be implemented for all weights stored inside a wFST.
pub mod semirings;

mod drawing_config;
/// Implementation of a successful path inside a wFST.
mod fst_path;
mod parsers;
mod string_path;

pub use crate::parsers::nom_utils::NomCustomError;

/// A representable float near .001. (Used in Quantize)
pub const KDELTA: f32 = 1.0f32 / 1024.0f32;
/// Default tolerance value used in floating-point comparisons.
pub const KSHORTESTDELTA: f32 = 1e-6;

/// Module re-exporting most of the objects from this crate.
pub mod prelude {
    pub use crate::algorithms::tr_compares::*;
    pub use crate::algorithms::tr_mappers::*;
    pub use crate::algorithms::*;
    pub use crate::fst_impls::*;
    pub use crate::fst_traits::*;
    pub use crate::parsers::SerializeBinary;
    pub use crate::semirings::*;
    pub use crate::tr::Tr;
    pub use crate::trs::{Trs, TrsConst, TrsVec};
    pub use crate::*;
}

#[cfg(test)]
pub mod proptest_fst;

/// Used to indicate a transition with no label.
#[cfg(feature = "state-label-u32")]
pub static NO_LABEL: Label = std::u32::MAX;
#[cfg(not(feature = "state-label-u32"))]
pub static NO_LABEL: Label = std::usize::MAX;

/// Used to indicate a missing state ID.
#[cfg(feature = "state-label-u32")]
pub static NO_STATE_ID: StateId = std::u32::MAX;
#[cfg(not(feature = "state-label-u32"))]
pub static NO_STATE_ID: StateId = std::usize::MAX;

pub(crate) static UNASSIGNED: usize = std::usize::MAX;

/// Provides a trait used to access transitions from a state.
pub mod trs;
/// Provides a trait used to mutably access transitions from a state.
pub mod trs_iter_mut;