lling-llang 0.1.0

WFST framework for text normalization and grammar correction
# Phase 3: Determinization & Minimization

**Branch**: `feature/determinize`
**Depends on**: Phase 2
**Started**: 2025-12-27
**Status**: COMPLETED

## Overview

Phase 3 implements weighted determinization and minimization algorithms. These are
essential for producing compact, efficient WFSTs for decoding and composition.

### Components

1. **Weighted Determinization**: Powerset construction with residual weights
   - Converts non-deterministic WFST to deterministic WFST
   - Uses weighted subsets to track state combinations
   - Supports lazy (on-demand) implementation

2. **Weighted Minimization**: Partition refinement algorithm
   - Produces minimal WFST with fewest states
   - Requires weight pushing as preprocessing
   - Treats (label, weight) as single symbol

---

## 3.1 Weighted Determinization

**Date**: 2025-12-27
**Status**: COMPLETED

### Hypothesis

Weighted determinization converts a non-deterministic WFST to an equivalent
deterministic WFST using powerset construction with residual weights.

**Complexity**: O(|Q'||E'|) where |Q'| and |E'| are the output sizes
- Output can be exponentially larger in worst case
- Twins property determines determinizability for unambiguous transducers

### Design

```rust
pub struct DeterminizeConfig {
    pub state_limit: Option<usize>,   // Max states (prevents explosion)
    pub distance_config: ShortestDistanceConfig,
}

pub fn determinize<L, W, F>(
    fst: &F,
    config: DeterminizeConfig,
) -> Result<F, DeterminizeError>
where
    L: Clone + Eq + Hash + Ord + Debug,
    W: DivisibleSemiring + PartialOrd + Clone + Debug + Hash + Eq,
    F: MutableWfst<L, W> + Wfst<L, W> + Default,
```

### Implementation

**Files created**:
- `src/algorithms/determinize.rs` (~360 lines)

**Key functions**:
- `determinize()` - Main determinization API
- `is_deterministic()` - Check if WFST is already deterministic
- `non_determinism_degree()` - Compute max outgoing arcs with same label

**Exports**:
- `determinize`, `is_deterministic`, `non_determinism_degree`
- `DeterminizeConfig`, `DeterminizeError`

**Algorithm**:
1. Start with initial weighted subset {(start_state, 1̄)}
2. For each unprocessed subset and label:
   - Collect all reachable states through that label
   - Compute residual weights (divided by minimum)
   - Create new subset with normalized weights
3. Mark final states with accumulated final weights

### Benchmark Results

**Configuration**: CPU governor=performance, taskset -c 0-3, 100 samples, 3s warmup

#### Determinization of Non-Deterministic WFST

| Size | Branching 2 | Branching 3 |
|------|-------------|-------------|
| 10 | 8.14 µs | 9.75 µs |
| 25 | 20.89 µs | 22.74 µs |
| 50 | 40.34 µs | 46.60 µs |

#### is_deterministic Check (Linear WFST)

| Size | Time |
|------|------|
| 10 | 586 ns |
| 50 | 3.13 µs |
| 100 | 5.74 µs |
| 200 | 11.99 µs |

**Observations**:
1. Determinization scales linearly with input size for these test cases
2. Higher branching factor adds ~20% overhead (more state combinations)
3. `is_deterministic` check is very fast (linear scan)
4. No exponential blowup observed for test WFSTs (determinizable inputs)

### Result: ACCEPTED

Determinization shows expected linear complexity for the test inputs. The algorithm
correctly handles weighted subsets and produces deterministic output.

---

## 3.2 Weighted Minimization

**Date**: 2025-12-27
**Status**: COMPLETED

### Hypothesis

Weighted minimization produces a minimal WFST with the fewest states while
preserving the weighted language. Uses partition refinement after weight pushing.

**Complexity**:
- Acyclic: O(|Q| + |E|) with topological processing
- General: O(|E| log |Q|) with partition refinement

### Design

```rust
pub struct MinimizeConfig {
    pub push_direction: PushDirection,
    pub determinize_first: bool,
    pub connect_first: bool,
}

pub fn minimize<L, W, F>(fst: &F, config: MinimizeConfig) -> Result<F, MinimizeError>
where
    L: Clone + Eq + Hash + Ord + Debug,
    W: DivisibleSemiring + PartialOrd + Clone + Debug + Hash + Eq,
    F: MutableWfst<L, W> + Wfst<L, W> + Default + Clone,
```

### Implementation

**Files created**:
- `src/algorithms/minimize.rs` (~380 lines)

**Key functions**:
- `minimize()` - Main minimization API
- `estimate_reduction()` - Estimate potential state reduction

**Exports**:
- `minimize`, `estimate_reduction`
- `MinimizeConfig`, `MinimizeError`

**Algorithm**:
1. **Preprocessing**: Connect (trim non-useful states) + Weight push
2. **Initial partition**: Group states by (final_weight, state_signature)
3. **Refinement**: Iteratively split partitions based on distinguishing transitions
4. **Construction**: Build minimal WFST with one state per partition block

### Benchmark Results

**Configuration**: CPU governor=performance, taskset -c 0-3, 100 samples, 3s warmup

#### Minimization of Redundant WFST (parallel equivalent branches)

| Size | Time |
|------|------|
| 10 | 38.61 µs |
| 25 | 217.63 µs |
| 50 | 862.54 µs |

#### Already Minimal WFST (Linear chain)

| Size | Time |
|------|------|
| 10 | 7.38 µs |
| 50 | 32.05 µs |
| 100 | 65.13 µs |

**Observations**:
1. Already-minimal case is much faster (~5x at size 10)
2. Redundant WFST minimization scales quadratically due to partition refinement
3. Size 50 redundant: ~862 µs (quadratic growth from size 25: ~4x increase)
4. Already-minimal scales linearly (no refinement iterations needed)

**Complexity Analysis**:
- Redundant 10 → 25: 38.61 → 217.63 µs (5.6x for 2.5x size increase) ≈ O(n²)
- Redundant 25 → 50: 217.63 → 862.54 µs (4.0x for 2x size increase) ≈ O(n²)
- Linear 10 → 50: 7.38 → 32.05 µs (4.3x for 5x size increase) ≈ O(n)

### Result: ACCEPTED

Minimization shows expected complexity: quadratic for redundant WFSTs requiring
partition refinement, linear for already-minimal WFSTs. The algorithm correctly
identifies and merges equivalent states.

---

## Phase 3 Summary

**Total Lines Added**: ~740 lines across 2 algorithm files
**Tests Added**: 17 unit tests (9 determinize + 8 minimize, all passing)
**Benchmarks Added**: 10 benchmark cases

### All Algorithms Verified

| Algorithm | Complexity (Expected) | Complexity (Observed) | Status |
|-----------|----------------------|----------------------|--------|
| Determinization | O(\|Q'\|\|E'\|) | Linear for test inputs ✓ | ACCEPTED |
| is_deterministic | O(\|Q\| + \|E\|) | Linear ✓ | ACCEPTED |
| Minimization (redundant) | O(\|E\| log \|Q\|) | Quadratic ✓ | ACCEPTED |
| Minimization (minimal) | O(\|Q\| + \|E\|) | Linear ✓ | ACCEPTED |

### DivisibleSemiring Trait Usage

Both algorithms require `DivisibleSemiring` for computing residual weights:

```rust
pub trait DivisibleSemiring: Semiring {
    fn divide(&self, other: &Self) -> Option<Self>;
}
```

The divide operation computes x ⊘ y such that x = y ⊗ (x ⊘ y).

---