# Interpolation Strategies
This document describes the interpolation strategies available for combining n-gram and embedding scores in hybrid models.
## Overview
Hybrid models combine two probability sources:
- **N-gram model**: Statistical counts from training data
- **Embedding model**: Distributed semantic representations
Interpolation strategies determine how these are combined.
## Available Strategies
### Linear Interpolation
```rust
use libgrammstein::hybrid::{HybridConfig, InterpolationStrategy};
let config = HybridConfig {
    strategy: InterpolationStrategy::Linear { alpha: 0.7 },
    ..Default::default()
};
```
**Formula**:
```
P(w|c) = α × P_ngram(w|c) + (1 - α) × P_embed(w|c)
```
**Characteristics**:
- Simple and predictable
- Direct probability combination
- Alpha = 1.0 → pure n-gram
- Alpha = 0.0 → pure embedding
**Best for**:
- General-purpose models
- When both components are reliable
- Default starting point
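
Linear interpolation is a weighted sum of the two probabilities. The following is a minimal, library-independent sketch; the function name `linear_interpolate` is hypothetical and not part of `libgrammstein`.

```rust
/// Linearly interpolate two probabilities for the same word and context.
/// `alpha` is the weight on the n-gram model; `1.0 - alpha` goes to the
/// embedding model. Hypothetical helper for illustration only.
fn linear_interpolate(p_ngram: f64, p_embed: f64, alpha: f64) -> f64 {
    assert!((0.0..=1.0).contains(&alpha), "alpha must lie in [0, 1]");
    alpha * p_ngram + (1.0 - alpha) * p_embed
}
```

With `alpha = 0.7`, an n-gram probability of 0.2 and an embedding probability of 0.05 combine to 0.7 × 0.2 + 0.3 × 0.05 = 0.155.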
### Log-Linear Interpolation
```rust
let config = HybridConfig {
    strategy: InterpolationStrategy::LogLinear { alpha: 0.7 },
    ..Default::default()
};
```
**Formula**:
```
log P(w|c) = α × log P_ngram(w|c) + (1 - α) × log P_embed(w|c)
```
**Characteristics**:
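- Combines the two scores in log space
- Equivalent to a weighted geometric mean of the two probabilities
- More stable numerically, because scores never leave log space
- Sharper probability distinctions: a word must score well under both models to score well overall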
**Best for**:
- Models with different scales
- When one model is very confident
- Preventing probability extremes
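
A weighted geometric mean of two distributions does not, in general, sum to 1 over the vocabulary. Whether `libgrammstein` renormalizes is not stated here, so the sketch below only illustrates the effect on a toy vocabulary. Both function names are hypothetical, and all probabilities are assumed to be strictly positive.

```rust
/// Log-linear interpolation of two natural-log probabilities.
/// The result is an unnormalized log score.
fn log_linear(log_p_ngram: f64, log_p_embed: f64, alpha: f64) -> f64 {
    alpha * log_p_ngram + (1.0 - alpha) * log_p_embed
}

/// Apply `log_linear` to every word of a toy vocabulary, then renormalize
/// so the combined scores sum to 1. Assumes strictly positive inputs.
fn log_linear_distribution(p_ngram: &[f64], p_embed: &[f64], alpha: f64) -> Vec<f64> {
    let unnormalized: Vec<f64> = p_ngram
        .iter()
        .zip(p_embed)
        .map(|(&a, &b)| log_linear(a.ln(), b.ln(), alpha).exp())
        .collect();
    let total: f64 = unnormalized.iter().sum();
    unnormalized.iter().map(|u| u / total).collect()
}
```

For a two-word vocabulary with n-gram probabilities `[0.5, 0.5]`, embedding probabilities `[0.9, 0.1]`, and `alpha = 0.5`, the unnormalized scores sum to about 0.89; after renormalization the first word receives 0.75.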
### N-gram with Embedding Fallback
```rust
let config = HybridConfig {
    strategy: InterpolationStrategy::NgramWithEmbeddingFallback,
    ..Default::default()
};
```
**Logic**:
```rust
if ngram.in_vocabulary(word) {
    ngram.log_prob(word, context)      // log P_ngram(w|c)
} else {
    embedding_log_prob(word, context)  // log P_embed(w|c)
}
```
**Characteristics**:
- Uses n-gram when data exists
- Falls back to embeddings for OOV
- No interpolation overhead for known words
**Best for**:
- Large n-gram training corpora
- When n-gram quality is high
- OOV-heavy inference scenarios
### Dynamic Interpolation
```rust
let config = HybridConfig {
    strategy: InterpolationStrategy::Dynamic {
        base_alpha: 0.7,
        oov_alpha: 0.3,
    },
    ..Default::default()
};
```
**Logic**:
```rust
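let alpha = if ngram.in_vocabulary(word) {
    base_alpha
} else {
    oov_alpha // More weight to embeddings for OOV
};
// The chosen alpha is then used as in linear interpolation:
// P(w|c) = alpha * P_ngram(w|c) + (1 - alpha) * P_embed(w|c)
```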
**Characteristics**:
- Adapts based on word knowledge
- Higher embedding weight for OOV
- Both models still contribute in either mode, unlike the hard fallback
**Best for**:
- Mixed vocabulary scenarios
- Domain adaptation
- Robustness across text types
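
The difference between the two weights is easiest to see with numbers. The sketch below is a standalone illustration under stated assumptions: `dynamic_interpolate` is a hypothetical name, and a `HashSet` stands in for the n-gram model's vocabulary check.

```rust
use std::collections::HashSet;

/// Pick the interpolation weight from vocabulary membership, then combine
/// the two probabilities linearly. Hypothetical helper, not the library's API.
fn dynamic_interpolate(
    word: &str,
    ngram_vocab: &HashSet<String>,
    p_ngram: f64,
    p_embed: f64,
    base_alpha: f64,
    oov_alpha: f64,
) -> f64 {
    let alpha = if ngram_vocab.contains(word) {
        base_alpha
    } else {
        oov_alpha
    };
    alpha * p_ngram + (1.0 - alpha) * p_embed
}
```

With `base_alpha = 0.7` and `oov_alpha = 0.3`, a known word with probabilities 0.2 (n-gram) and 0.05 (embedding) scores 0.155, while an unknown word with probabilities 0.001 and 0.05 scores 0.3 × 0.001 + 0.7 × 0.05 = 0.0353, so the embedding estimate dominates.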
## Choosing a Strategy
```
                     ┌─────────────────────────┐
                     │     OOV rate high?      │
                     └───────────┬─────────────┘
                                 │
           ┌────────yes──────────┴──────────no─────────┐
           │                                           │
           ▼                                           ▼
┌─────────────────────┐                     ┌─────────────────────┐
│     Dynamic or      │                     │     N-gram data     │
│    NgramFallback    │                     │    quality high?    │
└─────────────────────┘                     └──────────┬──────────┘
                                                       │
                                      ┌─────yes────────┴──────no───────┐
                                      │                                │
                                      ▼                                ▼
                              ┌────────────────┐               ┌────────────────┐
                              │  Linear α=0.8  │               │  Linear α=0.5  │
                              │  or LogLinear  │               │   or Dynamic   │
                              └────────────────┘               └────────────────┘
```
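
The same decision tree can be written as a small function, which may be convenient when a strategy is chosen from measured corpus statistics. This is only a sketch: the `Recommendation` enum and the `recommend` function are hypothetical, and how "high" is measured for either question is left to the caller.

```rust
/// The three leaves of the decision tree. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum Recommendation {
    /// Dynamic or NgramWithEmbeddingFallback
    DynamicOrNgramFallback,
    /// Linear with alpha = 0.8, or LogLinear
    LinearHighAlphaOrLogLinear,
    /// Linear with alpha = 0.5, or Dynamic
    LinearEvenAlphaOrDynamic,
}

/// Walk the tree: OOV rate is checked first, n-gram data quality second.
fn recommend(oov_rate_high: bool, ngram_quality_high: bool) -> Recommendation {
    match (oov_rate_high, ngram_quality_high) {
        (true, _) => Recommendation::DynamicOrNgramFallback,
        (false, true) => Recommendation::LinearHighAlphaOrLogLinear,
        (false, false) => Recommendation::LinearEvenAlphaOrDynamic,
    }
}
```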
## Alpha Tuning
### Grid Search
```rust
/// Grid-search alpha over {0.0, 0.1, ..., 1.0} and return the value with
/// the lowest perplexity on the development set.
fn tune_alpha<D>(
    ngram: &NgramModel<D>,
    embedding: &SubwordEmbedding,
    dev_set: &[Vec<String>],
) -> f64
where
    NgramModel<D>: Clone, // each candidate model takes its own copy
{
    let mut best_alpha = 0.5;
    let mut best_perplexity = f64::INFINITY;

    for alpha in (0..=10).map(|i| i as f64 / 10.0) {
        let config = HybridConfig {
            strategy: InterpolationStrategy::Linear { alpha },
            ..Default::default()
        };
        let model = HybridLanguageModel::new(
            ngram.clone(),
            embedding.clone(),
            config,
        );

        // Lower perplexity on held-out data is better.
        let ppl = evaluate_perplexity(&model, dev_set);
        if ppl < best_perplexity {
            best_perplexity = ppl;
            best_alpha = alpha;
        }
    }

    best_alpha
}
```
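
`evaluate_perplexity` is not defined in this document. One common definition, sketched here under the assumption that the model yields natural-log probabilities per token, is the exponential of the negative mean log-probability. The function name `perplexity` is hypothetical.

```rust
/// Perplexity from per-token natural-log probabilities:
/// exp(-(1/N) * sum(log p_i)). Hypothetical helper for illustration.
fn perplexity(log_probs: &[f64]) -> f64 {
    assert!(!log_probs.is_empty(), "need at least one token");
    let mean_log_prob = log_probs.iter().sum::<f64>() / log_probs.len() as f64;
    (-mean_log_prob).exp()
}
```

A model that assigns probability 0.25 to every token has perplexity 4, matching the intuition of a uniform choice among four words.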
### Recommended Values
| Scenario | Strategy | Alpha |
|----------|----------|-------|
| Large n-gram corpus | Linear | 0.8 - 0.9 |
| Small n-gram corpus | Linear | 0.5 - 0.7 |
| High OOV rate | Dynamic | base=0.7, oov=0.3 |
| Production (stable) | NgramFallback | N/A |
| Domain adaptation | LogLinear | 0.6 - 0.8 |
## Implementation Details
### Score Computation
```rust
impl<D> HybridLanguageModel<D> {
    /// Returns the interpolated log-probability of `word` given `context`.
    pub fn score(&self, word: &str, context: &[&str]) -> f64 {
        match &self.config.strategy {
            InterpolationStrategy::Linear { alpha } => {
                let ngram_prob = self.ngram.log_prob(word, context).exp();
                // embedding_prob already returns a probability, so no exp()
                let embed_prob = self.embedding_prob(word, context);
                (alpha * ngram_prob + (1.0 - alpha) * embed_prob).ln()
            }
            InterpolationStrategy::LogLinear { alpha } => {
                let ngram_log = self.ngram.log_prob(word, context);
                let embed_log = self.embedding_log_prob(word, context);
                alpha * ngram_log + (1.0 - alpha) * embed_log
            }
            InterpolationStrategy::NgramWithEmbeddingFallback => {
                if self.ngram.in_vocabulary(word) {
                    self.ngram.log_prob(word, context)
                } else {
                    self.embedding_log_prob(word, context)
                }
            }
            InterpolationStrategy::Dynamic { base_alpha, oov_alpha } => {
                let alpha = if self.ngram.in_vocabulary(word) {
                    *base_alpha
                } else {
                    *oov_alpha
                };
                let ngram_prob = self.ngram.log_prob(word, context).exp();
                // embedding_prob already returns a probability, so no exp()
                let embed_prob = self.embedding_prob(word, context);
                (alpha * ngram_prob + (1.0 - alpha) * embed_prob).ln()
            }
        }
    }
}
```
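
The `Linear` and `Dynamic` arms leave log space to form the weighted sum, and `exp()` underflows to zero for very unlikely words. A possible alternative, sketched here and not taken from the library, keeps the whole computation in log space with the log-sum-exp identity. The function name `linear_interpolate_log` is hypothetical.

```rust
/// Computes ln(alpha * exp(log_a) + (1 - alpha) * exp(log_b)) without
/// exponentiating the raw log-probabilities. Hypothetical helper.
fn linear_interpolate_log(log_a: f64, log_b: f64, alpha: f64) -> f64 {
    // Fold the weights into log space; ln(0) = -inf covers alpha = 0 or 1.
    let x = alpha.ln() + log_a;
    let y = (1.0 - alpha).ln() + log_b;
    let (hi, lo) = if x >= y { (x, y) } else { (y, x) };
    if hi == f64::NEG_INFINITY {
        // Both weighted terms have zero probability.
        return f64::NEG_INFINITY;
    }
    // log-sum-exp: factor out the larger term so only exp(lo - hi) <= 1
    // is ever evaluated.
    hi + (lo - hi).exp().ln_1p()
}
```

For log-probabilities around -800 the direct formula returns negative infinity, because `exp(-800.0)` is zero in `f64`, while this version returns a finite score.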
### Embedding Probability
Converting embedding similarity to probability:
```rust
fn embedding_prob(&self, word: &str, context: &[&str]) -> f64 {
    if context.is_empty() {
        // Uniform over vocabulary
        return 1.0 / self.embedding.vocab_size() as f64;
    }

    // Compute context vector (average of context words)
    let context_vec = self.context_vector(context);

    // Similarity to word
    let word_vec = self.embedding.word_vector(word);
    let sim = cosine_similarity(&word_vec, &context_vec);

    // Convert to probability via softmax over vocabulary
    self.softmax_probability(word, &context_vec, sim)
}
```
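
The body of `softmax_probability` is not shown in this document. As a rough sketch of the idea, the following standalone code computes a softmax over cosine similarities between a context vector and every vocabulary vector. Everything here is an assumption for illustration: the function names are stand-ins, and the `temperature` parameter is a hypothetical knob that the library may not have.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Softmax over the similarity of each vocabulary vector to the context
/// vector. Returns one probability per vocabulary entry, in order.
fn softmax_over_vocab(vocab_vecs: &[Vec<f64>], context_vec: &[f64], temperature: f64) -> Vec<f64> {
    assert!(temperature > 0.0, "temperature must be positive");
    let sims: Vec<f64> = vocab_vecs
        .iter()
        .map(|v| cosine_similarity(v, context_vec) / temperature)
        .collect();
    // Subtract the maximum before exponentiating for numerical stability.
    let max = sims.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = sims.iter().map(|s| (s - max).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}
```

Because cosine similarity is bounded to [-1, 1], a temperature of 1.0 yields a fairly flat distribution; lowering it sharpens the distribution toward the most similar words.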
## Caching
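Interpolated scores are cached in an LRU cache guarded by a `Mutex`. Here `score` checks the cache first and delegates the per-strategy computation to `compute_score`: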
```rust
pub struct HybridLanguageModel<D> {
    // ...
    cache: Mutex<LruCache<CacheKey, f64>>,
}

impl<D> HybridLanguageModel<D> {
    pub fn score(&self, word: &str, context: &[&str]) -> f64 {
        let key = CacheKey::new(word, context);

        // Check cache
        if let Ok(mut cache) = self.cache.lock() {
            if let Some(&score) = cache.get(&key) {
                return score;
            }
        }

        // Compute score
        let score = self.compute_score(word, context);

        // Update cache
        if let Ok(mut cache) = self.cache.lock() {
            cache.put(key, score);
        }

        score
    }
}
```
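
The definition of `CacheKey` is not shown above. The sketch below shows one way such a key and the check-compute-store pattern could look using only the standard library. It is an assumption-laden illustration: `ScoreCache` and `get_or_compute` are hypothetical names, and a plain `HashMap` is used in place of `LruCache`, so this version never evicts entries.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hashable key built from the word and its context (hypothetical layout).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
    word: String,
    context: Vec<String>,
}

impl CacheKey {
    fn new(word: &str, context: &[&str]) -> Self {
        CacheKey {
            word: word.to_string(),
            context: context.iter().map(|s| s.to_string()).collect(),
        }
    }
}

/// Unbounded memoization table; a real LRU cache would also evict old entries.
struct ScoreCache {
    entries: Mutex<HashMap<CacheKey, f64>>,
}

impl ScoreCache {
    fn new() -> Self {
        ScoreCache { entries: Mutex::new(HashMap::new()) }
    }

    /// Return the cached score, or run `compute` and store its result.
    /// The lock is released while `compute` runs, so slow scoring does not
    /// block other threads; two threads may occasionally compute the same key.
    fn get_or_compute(&self, word: &str, context: &[&str], compute: impl FnOnce() -> f64) -> f64 {
        let key = CacheKey::new(word, context);
        if let Ok(cache) = self.entries.lock() {
            if let Some(&score) = cache.get(&key) {
                return score;
            }
        }
        let score = compute();
        if let Ok(mut cache) = self.entries.lock() {
            cache.insert(key, score);
        }
        score
    }
}
```

As in the code above, a poisoned lock is treated as a cache miss rather than an error, so scoring still succeeds if another thread panicked while holding the lock.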
## See Also
- [OOV Handling](oov-handling.md) - Out-of-vocabulary strategies
- [Hybrid API](../../api/hybrid.md) - Complete API reference
- [Hybrid Training](../../training/hybrid.md) - Training guide