Expand description
§ArcWeight - Weighted Finite State Transducers for Rust
A high-performance, type-safe library for constructing, combining, optimizing, and searching weighted finite state transducers (WFSTs) and automata.
§Overview
ArcWeight provides a toolkit for working with finite state machines, from simple acceptors to complex weighted transducers. Built with performance and correctness in mind, it leverages Rust’s type system to ensure compile-time safety while maintaining the flexibility needed for FST operations.
The library implements the theoretical framework of weighted finite-state transducers as formalized by Mohri, providing efficient algorithms for composition, determinization, minimization, and shortest-path computation over arbitrary semirings.
§Key Features
§FST Support
- Multiple implementations:
fst::VectorFstfor construction,fst::ConstFstfor deployment - Lazy evaluation: On-demand computation with
fst::LazyFstImpl - Memory efficiency: Compact representations and caching strategies
§Extensible Semiring Library
- Wide variety of semirings, including:
- Extensible framework: Implement custom semirings via traits
§Extensive Algorithm Library
- Core operations:
algorithms::compose,algorithms::concat,algorithms::union,algorithms::closure - Optimizations:
algorithms::minimize,algorithms::determinize,algorithms::remove_epsilons - Path algorithms:
algorithms::shortest_path,algorithms::randgen - Additional transforms:
algorithms::synchronize,algorithms::push_weights
§Property Analysis
- Automatic property tracking: Detect determinism, cyclicity, connectivity
- Optimization guidance: Algorithm selection based on FST properties
- Efficient computation: $
O(|V| + |E|)$ property analysis
§Quick Start
use arcweight::prelude::*;
// Build a simple acceptor for the pattern "ab+"
let mut fst = VectorFst::<TropicalWeight>::new();
let s0 = fst.add_state();
let s1 = fst.add_state();
let s2 = fst.add_state();
fst.set_start(s0);
fst.set_final(s2, TropicalWeight::one());
// Add transitions: 'a' -> 'b' -> 'b'*
fst.add_arc(s0, Arc::new('a' as u32, 'a' as u32, TropicalWeight::new(1.0), s1));
fst.add_arc(s1, Arc::new('b' as u32, 'b' as u32, TropicalWeight::new(0.5), s2));
fst.add_arc(s2, Arc::new('b' as u32, 'b' as u32, TropicalWeight::new(0.5), s2));
// Find shortest accepting path
let shortest: VectorFst<TropicalWeight> = shortest_path(&fst, ShortestPathConfig::default())?;
// Minimize the FST
let minimal: VectorFst<TropicalWeight> = minimize(&fst)?;§Common Use Cases
§Building a Transducer
use arcweight::prelude::*;
// Create a transducer that uppercases ASCII letters
let mut fst = VectorFst::<TropicalWeight>::new();
let state = fst.add_state();
fst.set_start(state);
fst.set_final(state, TropicalWeight::one());
// Add lowercase->uppercase mappings
for c in b'a'..=b'z' {
let lower = c as u32;
let upper = (c - 32) as u32; // ASCII uppercase
fst.add_arc(state, Arc::new(lower, upper, TropicalWeight::one(), state));
}§Composing FSTs
use arcweight::prelude::*;
// Compose two transducers to create a pipeline
fn create_pipeline() -> Result<VectorFst<TropicalWeight>> {
let fst1 = VectorFst::<TropicalWeight>::new(); // First transducer
let fst2 = VectorFst::<TropicalWeight>::new(); // Second transducer
// Compose: output of fst1 feeds into input of fst2
compose_default(&fst1, &fst2)
}§Module Organization
The library is organized into focused modules, each handling a specific aspect of FST functionality:
§Core Modules
arc- Arc types representing FST transitionsfst- FST implementations and traitssemiring- Weight types and algebraic operations
§Algorithm Modules
algorithms- FST algorithms and transformationsproperties- Property computation and analysis
§Support Modules
io- Serialization and file format supportutils- Symbol tables and utility typesprelude- Convenient re-exports of common types
§Performance Guidelines
- Choose the right FST type: Use
fst::VectorFstfor construction,fst::ConstFstfor deployment - Leverage properties: Let the library detect and optimize based on FST properties
- Batch operations: Combine multiple operations when possible
- Use lazy evaluation: For large FSTs, consider lazy implementations
§Additional Topics
- Custom Semirings: Implement the
Semiringtrait for domain-specific weights - Lazy Computation: Use
fst::LazyFstImplfor on-demand arc generation - Memory Optimization: Employ
fst::CompactFstfor memory-constrained environments - Parallel Algorithms: Some algorithms support parallel execution (feature-gated)
§References
The algorithms and data structures in this library are based on foundational work in weighted finite-state transducer theory:
-
Mehryar Mohri. 1997. Finite-state transducers in language and speech processing. Computational Linguistics 23, 2 (June 1997), 269-311.
-
Mehryar Mohri, Fernando Pereira, and Michael Riley. 2002. Weighted finite-state transducers in speech recognition. Computer Speech & Language 16, 1 (2002), 69-88.
-
Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. 2007. OpenFst: A general and efficient weighted finite-state transducer library. In Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA’07). Springer-Verlag, Berlin, Heidelberg, 11-23.
Re-exports§
pub use arc::Arc;pub use arc::ArcIterator;pub use fst::ExpandedFst;pub use fst::Fst;pub use fst::MutableFst;pub use fst::VectorFst;pub use semiring::BooleanWeight;pub use semiring::ProbabilityWeight;pub use semiring::Semiring;pub use semiring::TropicalWeight;
Modules§
- algorithms
- Algorithms for weighted finite-state transducers.
- arc
- Arc
- fst
- Finite State Transducer (FST) implementations for weighted automata.
- gpu
- GPU acceleration for FST operations.
- io
- FST serialization and deserialization in multiple formats.
- macros
- Macros
- optimization
- Performance optimization utilities for ArcWeight.
- prelude
- Prelude
- properties
- FST properties computation and tracking.
- semiring
- Semiring implementations for weighted finite-state transducers.
- utils
- Utility types and data structures for FST operations.
Macros§
- fst
- Creates a
VectorFst<TropicalWeight>from a declarative specification. - fst_
path - Creates an
FstPathrepresenting a path through an FST. - symt
- Creates a
SymbolTablefrom a list of symbol strings.
Enums§
- Error
- Library-wide error type for FST operations.
Type Aliases§
- Result
- Library-wide result type for fallible FST operations.