[][src]Crate rustfst

Rustfst

Rust implementation of Weighted Finite States Transducers.

Rustfst is a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). Weighted finite-state transducers are automata where each transition has an input label, an output label, and a weight. The more familiar finite-state acceptor is represented as a transducer with each transition's input and output label equal. Finite-state acceptors are used to represent sets of strings (specifically, regular or rational sets); finite-state transducers are used to represent binary relations between pairs of strings (specifically, rational transductions). The weights can be used to represent the cost of taking a particular transition.

FSTs have key applications in speech recognition and synthesis, machine translation, optical character recognition, pattern matching, string processing, machine learning, information extraction and retrieval among others. Often a weighted transducer is used to represent a probabilistic model (e.g., an n-gram model, pronunciation model). FSTs can be optimized by determinization and minimization, models can be applied to hypothesis sets (also represented as automata) or cascaded by finite-state composition, and the best results can be selected by shortest-path algorithms.

References

Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Michael Riley's work :

Example

use failure::Fallible;
use rustfst::prelude::*;

fn main() -> Fallible<()> {
    // Creates a empty wFST
    let mut fst = VectorFst::new();

    // Add some states
    let s0 = fst.add_state();
    let s1 = fst.add_state();
    let s2 = fst.add_state();

    // Set s0 as the start state
    fst.set_start(s0)?;

    // Add an arc from s0 to s1
    fst.add_arc(s0, Arc::new(3, 5, TropicalWeight::new(10.0), s1))?;

    // Add an arc from s0 to s2
    fst.add_arc(s0, Arc::new(5, 7, TropicalWeight::new(18.0), s2))?;

    // Set s1 and s2 as final states
    fst.set_final(s1, TropicalWeight::new(31.0))?;
    fst.set_final(s2, TropicalWeight::new(45.0))?;

    // Iter over all the paths in the wFST
    for p in fst.paths_iter() {
         println!("{:?}", p);
    }

    // A lot of operations are available to modify/optimize the FST.
    // Here are a few examples :

    // - Remove useless states.
    connect(&mut fst)?;

    // - Optimize the FST by merging states with the same behaviour.
    minimize(&mut fst, true)?;

    // - Copy all the input labels in the output.
    project(&mut fst, ProjectType::ProjectInput);

    // - Remove epsilon transitions.
    fst = rm_epsilon(&fst)?;

    // - Compute an equivalent FST but deterministic.
    fst = determinize(&fst, DeterminizeType::DeterminizeFunctional)?;

    Ok(())
}

Status

A big number of algorithms are already implemented. The main ones missing are:

  • Composition
  • Replacement

Also, all the algorithms are implemented in a static fashion, some work is necessary to add support for dynamic fsts.

Modules

algorithms

Provides algorithms that are generic to all wFST.

fst_impls

Implementation of the wFST traits with different data structures.

fst_properties

Provides the FstProperties struct and some utils functions around it. Useful to assert some properties on a Fst.

fst_traits

Provides traits that must be implemented to be able to use generic algorithms.

prelude
semirings

Provides a trait that shall be implemented for all weights stored inside a wFST.

utils

A few utilities to manipulate wFSTs.

Macros

fst

Creates a linear Fst containing the arguments.

fst_path

Creates a Path containing the arguments.

symt

Creates a SymbolTable containing the arguments.

Structs

Arc

Structure representing a transition from a state to another state in a FST.

DrawingConfig

Struct to configure how the FST should be drawn.

FstPath

Structure representing a path in a FST (list of input labels, list of output labels and total weight).

SymbolTable

A symbol table stores a bidirectional mapping between arc labels and "symbols" (strings).

Constants

EPS_LABEL

Epsilon label representing the epsilon transition (empty transition) = 0.

EPS_SYMBOL

Epsilon symbol representing the epsilon transition (empty transition) = <eps>.

Type Definitions

Label

Type used for the input label and output label of an arc in a wFST -> usize

StateId

Type used to identify a state in a wFST -> usize

Symbol

Symbol to map in the Symbol Table -> String