Fast and portable HFST-compatible finite-state transducers.
An implementation of finite-state transducers mostly compatible with HFST. Provides the optional accelerated back-end for kfst. Able to load and execute Voikko and Omorfi: see kfst for transducers converted to a compatible format as well as Python bindings. Supports the ATT format and its own KFST format.
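For orientation, the ATT text format is commonly laid out as tab-separated rows: a transition row carries source state, target state, input symbol, output symbol and an optional weight, while a final-state row carries a state number and an optional weight. The following is a minimal standard-library sketch of reading such rows under that assumption. `AttRow` and `parse_att_line` are hypothetical names for illustration and are not part of kfst_rs, whose own ATT loader may treat details such as escaping differently.

```rust
/// One row of an ATT-style text file (hypothetical type, not part of kfst_rs).
#[derive(Debug, PartialEq)]
enum AttRow {
    Transition { from: u64, to: u64, input: String, output: String, weight: f64 },
    Final { state: u64, weight: f64 },
}

/// Parse a single tab-separated line; returns None for malformed rows.
/// Assumption: a missing weight is read as 0.0.
fn parse_att_line(line: &str) -> Option<AttRow> {
    let fields: Vec<&str> = line.split('\t').collect();
    match fields.as_slice() {
        // Final state without an explicit weight
        [state] => Some(AttRow::Final { state: state.parse().ok()?, weight: 0.0 }),
        // Final state with a weight
        [state, weight] => Some(AttRow::Final {
            state: state.parse().ok()?,
            weight: weight.parse().ok()?,
        }),
        // Transition without an explicit weight
        [from, to, input, output] => Some(AttRow::Transition {
            from: from.parse().ok()?,
            to: to.parse().ok()?,
            input: input.to_string(),
            output: output.to_string(),
            weight: 0.0,
        }),
        // Transition with a weight
        [from, to, input, output, weight] => Some(AttRow::Transition {
            from: from.parse().ok()?,
            to: to.parse().ok()?,
            input: input.to_string(),
            output: output.to_string(),
            weight: weight.parse().ok()?,
        }),
        _ => None,
    }
}

fn main() {
    // A two-row toy transducer: one transition and one final state.
    let text = "0\t1\ta\tb\t0.5\n1\t0.0";
    for line in text.lines() {
        println!("{:?}", parse_att_line(line));
    }
}
```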
To convert HFST (optimized lookup or otherwise) to ATT using HFST’s tools, do:
hfst-fst2txt transducer.hfst -o transducer.att
Given the Voikko transducer in KFST or ATT format, one could create a simple analyzer like this:
use kfst_rs::{FSTState, FST};
use std::io::{self, Write};
// Read in transducer
let fst = FST::from_kfst_file(pathtovoikko, true).unwrap();
// Alternatively, for ATT use FST::from_att_file
// Read in word to analyze
let mut buffer = String::new();
let stdin = io::stdin();
stdin.read_line(&mut buffer).unwrap();
buffer = buffer.trim().to_string();
// Do analysis proper
match fst.lookup(&buffer, FSTState::<()>::default(), true) {
    Ok(result) => {
        for (i, analysis) in result.into_iter().enumerate() {
            println!("Analysis {}: {} ({})", i+1, analysis.0, analysis.1)
        }
    },
    Err(err) => println!("No analysis: {:?}", err),
}
Given the input “lentokoneessa”, this gives the following analysis:
Analysis 1: [Lt][Xp]lentää[X]len[Ln][Xj]to[X]to[Sn][Ny][Bh][Bc][Ln][Xp]kone[X]konee[Sine][Ny]ssa (0)
Structs
- FST: A finite state transducer. Construct one from an in-memory representation with FST::from_kfst_bytes or FST::from_att_rows, or from the file system with FST::from_att_file or FST::from_kfst_file.
- FlagDiacriticSymbol: A Symbol representing a flag diacritic. Flag diacritics allow making state transitions depend on externally kept state, thus often making transducers smaller. The symbol consists of three parts:
- FlagMap: The flag state of an FSTState:
- InternalFSTState: A state in an FST. Not only does this contain the state number itself, but also the path weight so far, the output symbol sequence and the input and output flag state. InternalFSTState carries a type parameter for the input indices. There are cases where the input indices are useful, notably for FST::lookup_aligned. However, if you do not want to use that method, you can get away with passing the unit type. This causes the book-keeping relating to indices to be compiled away.
- RawSymbol: A Symbol type that has a signaling byte (the first one) and 14 other bytes to dispose of as the caller wishes. This odd size is chosen so that Symbol can be 16 bytes long: a 1-byte discriminant + 15 bytes. (The Symbol::Flag variant forces Symbol to be at least 16 bytes.)
- StringSymbol: A symbol that holds an interned string and the information of whether it should be seen as unknown (see is_unknown). A copy of the interned string is held until the end of the program.
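To illustrate how a flag diacritic can make a transition depend on externally kept state, here is a minimal standard-library sketch. It assumes the usual Xerox/HFST notation `@OP.FEATURE.VALUE@` (the value is optional for some operators) and covers only the P, R, D, C and U operators; the N operator is left out to keep the sketch short. `FlagOp`, `ParsedFlag`, `parse_flag` and `apply_flag` are hypothetical names and are not the kfst_rs types or functions, whose behavior may differ in detail.

```rust
use std::collections::HashMap;

/// Subset of flag operators handled by this sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FlagOp { Positive, Require, Disallow, Clear, Unify }

/// A flag diacritic split into operator, feature name and optional value.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ParsedFlag { op: FlagOp, feature: String, value: Option<String> }

/// Parse a symbol such as "@U.CASE.NOM@"; returns None if it is not a flag
/// in the assumed notation (or uses an operator this sketch skips).
fn parse_flag(symbol: &str) -> Option<ParsedFlag> {
    let inner = symbol.strip_prefix('@')?.strip_suffix('@')?;
    let mut parts = inner.splitn(3, '.');
    let op = match parts.next()? {
        "P" => FlagOp::Positive,
        "R" => FlagOp::Require,
        "D" => FlagOp::Disallow,
        "C" => FlagOp::Clear,
        "U" => FlagOp::Unify,
        _ => return None,
    };
    let feature = parts.next().filter(|f| !f.is_empty())?.to_string();
    let value = parts.next().map(str::to_string);
    Some(ParsedFlag { op, feature, value })
}

/// Returns true if a transition labeled with `flag` may be taken given the
/// current flag state, updating the state where the operator requires it.
fn apply_flag(flag: &ParsedFlag, state: &mut HashMap<String, String>) -> bool {
    let current = state.get(&flag.feature).cloned();
    match (flag.op, &flag.value) {
        (FlagOp::Positive, Some(v)) => {
            state.insert(flag.feature.clone(), v.clone());
            true
        }
        (FlagOp::Clear, _) => {
            state.remove(&flag.feature);
            true
        }
        (FlagOp::Require, Some(v)) => current.as_ref() == Some(v),
        (FlagOp::Require, None) => current.is_some(),
        (FlagOp::Disallow, Some(v)) => current.as_ref() != Some(v),
        (FlagOp::Disallow, None) => current.is_none(),
        (FlagOp::Unify, Some(v)) => match current {
            // Unset feature: unification succeeds and records the value.
            None => {
                state.insert(flag.feature.clone(), v.clone());
                true
            }
            // Set feature: unification succeeds only on an equal value.
            Some(c) => &c == v,
        },
        // P and U need a value; this sketch treats a missing one as blocking.
        (FlagOp::Positive, None) | (FlagOp::Unify, None) => false,
    }
}

fn main() {
    let mut state = HashMap::new();
    for sym in &["@P.CASE.NOM@", "@R.CASE.NOM@", "@D.CASE.NOM@"] {
        let flag = parse_flag(sym).expect("well-formed flag");
        println!("{}: allowed = {}", sym, apply_flag(&flag, &mut state));
    }
}
```

Because the flag state travels with the path rather than being encoded in the state graph, a single set of transitions can serve several feature combinations, which is where the size savings come from.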
Enums
- FSTLinkedList: The linked list used to represent the transduction outputs. A linked list is used here because the transduced sequences of states share prefixes. It is stored internally in reverse order. The IntoIterator implementation clones items into temporary storage.
- FlagDiacriticType: The different types of flag diacritic supported by kfst_rs.
- SpecialSymbol: The three possible HFST special symbols.
- Symbol: A wrapper enum for different concrete symbol types. It exists to provide a dense tagged union avoiding dynamic dispatch. It also deals with converting symbols between Rust and Python when using kfst_rs as a Python library (crate feature “python”).
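The rationale given for the output linked list (shared prefixes, reverse internal order) can be sketched with a reference-counted cons list. This is only an illustration of the general technique; `OutputList`, `push` and `to_vec` are hypothetical names, and the actual kfst_rs type may be laid out differently.

```rust
use std::rc::Rc;

/// A persistent singly linked list. The head is the most recently produced
/// item, so the list is stored in reverse order (hypothetical type).
#[derive(Debug)]
enum OutputList<T> {
    Nil,
    Cons(T, Rc<OutputList<T>>),
}

/// Extend a path by one item without copying the existing prefix.
fn push<T>(list: &Rc<OutputList<T>>, item: T) -> Rc<OutputList<T>> {
    Rc::new(OutputList::Cons(item, Rc::clone(list)))
}

/// Collect the items in forward order by cloning them into temporary
/// storage and reversing.
fn to_vec<T: Clone>(list: &Rc<OutputList<T>>) -> Vec<T> {
    let mut items = Vec::new();
    let mut node = list;
    while let OutputList::Cons(item, rest) = &**node {
        items.push(item.clone());
        node = rest;
    }
    items.reverse();
    items
}

fn main() {
    let empty = Rc::new(OutputList::Nil);
    // Two paths that diverge after a common prefix share its nodes.
    let prefix = push(&push(&empty, "a"), "b");
    let left = push(&prefix, "c");
    let right = push(&prefix, "d");
    println!("{:?}", to_vec(&left));
    println!("{:?}", to_vec(&right));
}
```

Extending a path is a constant-time allocation regardless of its length, which suits a search that branches often; the cost is paid once at the end, when a finished path is copied out in forward order.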
Traits
Functions
- from_symbol_string: Parse a string into a Symbol; see Symbol::parse for implementation details.