// lling-llang 0.1.0: WFST framework for text normalization and grammar correction

//! Optimization algorithms for WFSTs.
//!
//! This module provides specialized optimization techniques identified from
//! research on WFST-based systems, particularly for speech recognition.
//!
//! ## Overview
//!
//! | Optimization | Description | Reported benefit |
//! |--------------|-------------|------------------|
//! | Log-Semiring Pushing | Stochastic normalization for beam search | Up to 18× speedup |
//! | Token Grouping | Lazy evaluation for on-the-fly composition | 10-20× fewer ops |
//! | N-gram Back-off | Compact LM representation | Avoids O(\|V\|²) transitions |
//!
//! ## Log-Semiring Weight Pushing
//!
//! Weight pushing in the **log semiring** (not tropical!) has a significant impact
//! on beam search pruning efficacy. Tropical pushing uses the minimum path weight
//! from each state to a final state as its potential; log pushing uses the sum of
//! the probabilities of all such paths. The result is a stochastic automaton: at
//! each state, the outgoing arc probabilities plus the final probability sum to 1.
//!
//! This "synchronizes" acoustic likelihoods with transducer probabilities, providing
//! optimal likelihood ratio decisions for pruning.
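//!
//! The pushing step can be illustrated with a minimal, self-contained sketch. It is
//! not this module's `apply_log_push`; the `Edge` alias and every function name below
//! are hypothetical. It assumes an acyclic machine with topologically numbered
//! states, every state able to reach a final state, and weights stored as negative
//! log probabilities.
//!
//! ```rust
//! /// Log-semiring addition on -ln weights: -ln(e^-a + e^-b), computed stably.
//! fn log_plus(a: f64, b: f64) -> f64 {
//!     if a == f64::INFINITY { return b; }
//!     if b == f64::INFINITY { return a; }
//!     let m = a.min(b);
//!     m - ((-(a - m)).exp() + (-(b - m)).exp()).ln()
//! }
//!
//! /// Hypothetical arc: (source, destination, weight as -ln probability).
//! type Edge = (usize, usize, f64);
//!
//! /// Potential of each state: log-sum over all paths from it to a final state.
//! /// Assumes source < destination on every arc (topological numbering).
//! fn log_potentials(n: usize, arcs: &[Edge], finals: &[f64]) -> Vec<f64> {
//!     let mut d = finals.to_vec();
//!     for s in (0..n).rev() {
//!         for &(src, dst, w) in arcs {
//!             if src == s {
//!                 d[s] = log_plus(d[s], w + d[dst]);
//!             }
//!         }
//!     }
//!     d
//! }
//!
//! /// Reweight arcs as w' = w + d[dst] - d[src] and finals as f' = f - d[s].
//! fn push(arcs: &[Edge], finals: &[f64], d: &[f64]) -> (Vec<Edge>, Vec<f64>) {
//!     let pushed: Vec<Edge> = arcs.iter().map(|&(s, t, w)| (s, t, w + d[t] - d[s])).collect();
//!     let fin: Vec<f64> = finals.iter().zip(d).map(|(f, ds)| f - ds).collect();
//!     (pushed, fin)
//! }
//!
//! /// Probability mass leaving state `s`: outgoing arcs plus the final weight.
//! fn mass(s: usize, arcs: &[Edge], finals: &[f64]) -> f64 {
//!     let out: f64 = arcs.iter().filter(|a| a.0 == s).map(|a| (-a.2).exp()).sum();
//!     out + (-finals[s]).exp()
//! }
//!
//! /// Pushes a three-state toy machine and returns the mass at each state.
//! fn pushed_masses() -> Vec<f64> {
//!     let inf = f64::INFINITY;
//!     let arcs: [Edge; 3] = [
//!         (0, 1, -(0.2f64.ln())),
//!         (0, 2, -(0.1f64.ln())),
//!         (1, 2, -(0.5f64.ln())),
//!     ];
//!     let finals = [inf, inf, 0.0];
//!     let d = log_potentials(3, &arcs, &finals);
//!     let (arcs, finals) = push(&arcs, &finals, &d);
//!     (0..3).map(|s| mass(s, &arcs, &finals)).collect()
//! }
//! ```
//!
//! Before pushing, state 0 of the toy machine emits a total mass of 0.3; after
//! pushing, `pushed_masses()` yields a mass of 1 at every state, which is the
//! stochastic property described above.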
//!
//! ### Reference
//!
//! Mohri, Pereira, and Riley (2002) observe that "weight pushing in the log semiring
//! has a very large beneficial impact on the pruning efficacy of a standard Viterbi
//! beam search."
//!
//! ## Token Grouping (LET-Decoder)
//!
//! For on-the-fly composition scenarios, tokens with the same HCLG-state but different
//! grammar states can be grouped together. Expansion is deferred until word boundaries,
//! avoiding redundant operations for tokens that will be pruned anyway.
//!
//! ### Reference
//!
//! Lv et al. (2023): "LET-Decoder: Lazy-evaluation Token-group Decoder"
//!
//! ## N-gram Back-off Structure
//!
//! For large-vocabulary language models, representing every n-gram explicitly is
//! infeasible: a full bigram model alone needs O(|V|²) transitions. Instead, only
//! observed n-grams get explicit arcs, and back-off states with ε-transitions to
//! lower-order n-grams cover the rest. This keeps the graph compact while
//! preserving the language model distribution.
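//!
//! The back-off lookup can be sketched as below. This is a hedged illustration, not
//! the `BigramLm` or `NgramLmBuilder` API: `BackoffBigram` and its methods are
//! hypothetical, scores are negative log probabilities, and the ε-arc is treated as
//! a failure transition (taken only when no explicit bigram arc exists).
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical back-off bigram model; all scores are -ln probabilities.
//! struct BackoffBigram {
//!     bigram: HashMap<(u32, u32), f64>, // explicit (history, word) arcs
//!     backoff: HashMap<u32, f64>,       // ε-arc from a history state to the unigram state
//!     unigram: HashMap<u32, f64>,       // arcs leaving the shared unigram state
//! }
//!
//! impl BackoffBigram {
//!     /// Cost of `word` after `hist`: explicit arc if present, else ε-arc plus unigram arc.
//!     fn cost(&self, hist: u32, word: u32) -> Option<f64> {
//!         if let Some(&c) = self.bigram.get(&(hist, word)) {
//!             return Some(c);
//!         }
//!         // A history without a stored back-off weight backs off at zero cost.
//!         let bo = self.backoff.get(&hist).copied().unwrap_or(0.0);
//!         self.unigram.get(&word).map(|u| bo + u)
//!     }
//!
//!     /// Arc count of the back-off graph versus a fully expanded bigram graph.
//!     fn arc_counts(&self) -> (usize, usize) {
//!         let v = self.unigram.len();
//!         (self.bigram.len() + self.backoff.len() + v, v * v)
//!     }
//! }
//!
//! /// Toy model over three words with one observed bigram, (1, 2).
//! fn demo_model() -> BackoffBigram {
//!     let ln = |p: f64| -p.ln();
//!     BackoffBigram {
//!         bigram: HashMap::from([((1, 2), ln(0.6))]),
//!         // The remaining 0.4 is spread over unseen words, whose unigram mass is 0.75.
//!         backoff: HashMap::from([(1, ln(0.4 / 0.75))]),
//!         unigram: HashMap::from([(1, ln(0.5)), (2, ln(0.25)), (3, ln(0.25))]),
//!     }
//! }
//! ```
//!
//! For the toy model, `arc_counts()` compares 5 arcs in the back-off graph with the
//! 9 a full bigram expansion would need, and the conditional probabilities after
//! history 1 still sum to 1.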

pub mod log_push;
pub mod lookahead;
pub mod ngram_backoff;
pub mod token_group;

pub use log_push::{
    apply_log_push, compute_log_potentials, prepare_for_beam_search, BeamSearchPrepResult,
    LogPushConfig,
};
pub use lookahead::{build_lookahead_table, LookaheadConfig, LookaheadTable};
pub use ngram_backoff::{
    compute_size_reduction, BackoffWeight, BigramLm, BigramStats, NgramEntry, NgramLmBuilder,
    NgramLmConfig, NgramStats, PruningStrategy, SizeReduction, VocabId, BOS_ID, EOS_ID, UNK_ID,
};
pub use token_group::{
    ArcId, BucketQueue, GroupLink, GroupedFrame, Token, TokenGroup, TokenGroupConfig, TokenGroupId,
    TokenGroupManager, TokenGroupPool, TokenGroupStats, TokenId,
};