§fynch
Fenchel-Young losses and differentiable sorting/ranking.
The name comes from Fenchel-Young losses (Blondel et al. 2020, JMLR), a unifying framework connecting prediction functions and loss functions through convex duality.
§The Fenchel-Young Framework
Given a regularizer Ω, the Fenchel-Young loss is:
L_Ω(θ; y) = Ω*(θ) - ⟨θ, y⟩ + Ω(y)

where Ω* is the Fenchel conjugate. The prediction function is:

ŷ_Ω(θ) = ∇Ω*(θ) = argmax_p { ⟨θ, p⟩ - Ω(p) }

Different regularizers give different behaviors:
| Regularizer Ω | Prediction ŷ_Ω | Loss L_Ω | Sparsity |
|---|---|---|---|
| Shannon negentropy | softmax | cross-entropy | Dense |
| ½‖·‖² (squared L2) | sparsemax | sparsemax loss | Sparse |
| Tsallis α-entropy | α-entmax | entmax loss | Tunable |
See the fenchel module for the generic framework.
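As a concrete instance of the framework, the sketch below (plain Rust, independent of the crate's fenchel API) shows that with the Shannon negentropy regularizer the conjugate Ω*(θ) is log-sum-exp, so L_Ω reduces to ordinary cross-entropy for a one-hot target.

// Minimal sketch, not the crate's implementation: FY loss with Shannon negentropy.
// Ω(p) = Σ p_i ln p_i, so Ω*(θ) = log Σ exp(θ_i) and
// L_Ω(θ; y) = logsumexp(θ) - ⟨θ, y⟩ + Σ y_i ln y_i (cross-entropy when y is one-hot).
fn fy_loss_shannon(theta: &[f64], y: &[f64]) -> f64 {
    let max = theta.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let lse = max + theta.iter().map(|t| (t - max).exp()).sum::<f64>().ln(); // stable logsumexp
    let dot: f64 = theta.iter().zip(y).map(|(t, yi)| t * yi).sum();
    let neg_entropy: f64 = y.iter().filter(|&&v| v > 0.0).map(|&v| v * v.ln()).sum(); // Ω(y)
    lse - dot + neg_entropy
}

By the Fenchel-Young inequality this loss is always non-negative, and it is zero exactly when the prediction ŷ_Ω(θ) equals y.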
§Differentiable Sorting/Ranking
Sorting and ranking are discontinuous—small input changes can cause large output changes (rank swaps). This breaks gradient-based optimization.
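One common relaxation (an illustrative sketch in plain Rust, not necessarily how this crate's soft_rank is implemented) replaces each hard pairwise comparison with a sigmoid, so ranks become smooth functions of the scores:

// Soft rank of x_i = 1 + Σ_{j≠i} sigmoid((x_i - x_j) / temperature).
// As temperature → 0 this approaches the ordinary 1-based ascending rank;
// larger temperatures trade exactness for useful gradients. Cost is O(n²).
fn soft_rank_sigmoid(x: &[f64], temperature: f64) -> Vec<f64> {
    let sigmoid = |z: f64| 1.0 / (1.0 + (-z).exp());
    (0..x.len())
        .map(|i| {
            1.0 + (0..x.len())
                .filter(|&j| j != i)
                .map(|j| sigmoid((x[i] - x[j]) / temperature))
                .sum::<f64>()
        })
        .collect()
}

The table below summarizes the approaches available in this crate: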
| Approach | Module | Regularization | Complexity |
|---|---|---|---|
| PAVA + Sigmoid | Root | L2 | O(n) / O(n²) |
| Sinkhorn OT | sinkhorn | Entropy (Shannon Ω) | O(n² × iter) |
Sinkhorn sorting is exactly FY with Shannon regularization applied to the permutation polytope (Birkhoff polytope).
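To make the Sinkhorn connection concrete, here is a self-contained sketch of the core iteration (illustration only; the crate's sinkhorn module exposes sinkhorn_sort, sinkhorn_rank, and SinkhornConfig rather than this function):

// Sketch: build a Gibbs kernel from squared distances between the inputs and their
// sorted values, then alternate row/column normalizations (Sinkhorn-Knopp). The
// result approaches a doubly stochastic matrix, i.e. a point in the Birkhoff polytope.
fn soft_permutation(x: &[f64], eps: f64, iters: usize) -> Vec<Vec<f64>> {
    let n = x.len();
    let mut anchors = x.to_vec();
    anchors.sort_by(|a, b| a.partial_cmp(b).unwrap()); // sorted target positions
    let mut p: Vec<Vec<f64>> = (0..n)
        .map(|i| (0..n).map(|j| (-(x[i] - anchors[j]).powi(2) / eps).exp()).collect())
        .collect();
    for _ in 0..iters {
        for row in p.iter_mut() {
            let s: f64 = row.iter().sum();
            row.iter_mut().for_each(|v| *v /= s); // normalize rows
        }
        for j in 0..n {
            let s: f64 = (0..n).map(|i| p[i][j]).sum();
            (0..n).for_each(|i| p[i][j] /= s); // normalize columns
        }
    }
    p
}

Row i of the result is a soft assignment of element i to sorted positions: multiplying by the sorted values gives a soft sort, multiplying by position indices gives soft ranks, and smaller eps sharpens the matrix toward a hard permutation.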
§Key Functions
| Function | Purpose | Module |
|---|---|---|
| pava | Isotonic regression | Root |
| soft_rank | Continuous ranks | Root |
| soft_sort | Continuous sorting | Root |
| fenchel::softmax | Dense prediction | fenchel |
| fenchel::sparsemax | Sparse prediction | fenchel |
| fenchel::entmax | Tunable sparsity | fenchel |
§Quick Start
§Fenchel-Young Predictions
use fynch::fenchel::{softmax, sparsemax, entmax};
let theta = [2.0, 1.0, 0.1];
// Dense (softmax)
let p_soft = softmax(&theta);
assert!(p_soft.iter().all(|&x| x > 0.0));
// Sparse (sparsemax)
let p_sparse = sparsemax(&theta);
assert!(p_sparse.iter().any(|&x| x == 0.0));
// Tunable (1.5-entmax)
let p_ent = entmax(&theta, 1.5);
§Fenchel-Young Losses
use fynch::fenchel::{Regularizer, Shannon, SquaredL2, Tsallis};
let theta = [2.0, 1.0, 0.1];
let y = [1.0, 0.0, 0.0]; // one-hot target
// Cross-entropy (Shannon)
let loss_ce = Shannon.loss(&theta, &y);
// Sparsemax-style FY loss via squared L2 regularizer
let loss_sp = SquaredL2.loss(&theta, &y);
// 1.5-entmax loss
let loss_ent = Tsallis::entmax15().loss(&theta, &y);
§Differentiable Sorting
use fynch::{pava, soft_rank, soft_sort};
// PAVA: isotonic regression
let y = [3.0, 1.0, 2.0, 5.0, 4.0];
let monotonic = pava(&y); // [2.0, 2.0, 2.0, 4.5, 4.5]
// Soft ranking
let scores = [0.5, 0.2, 0.8, 0.1];
let ranks = soft_rank(&scores, 0.1).unwrap();
§Learning to Rank
use fynch::loss::{spearman_loss, listnet_loss};
let pred = [0.9, 0.1, 0.5];
let target = [3.0, 1.0, 2.0];
let loss = spearman_loss(&pred, &target, 0.1);
§Modules
| Module | Contents |
|---|---|
| fenchel | Generic FY framework: regularizers, predictions, losses |
| sinkhorn | Entropic OT for soft permutations |
| loss | Learning-to-rank losses |
| metrics | IR evaluation: MRR, NDCG, Hits@k |
§Connections
- wass: Sinkhorn OT is the same algorithm
- logp: Shannon entropy connects to information theory
- information retrieval ecosystem: More comprehensive IR evaluation
§What Can Go Wrong
- Temperature too low: Numerical instability, vanishing gradients
- Temperature too high: Predictions become uniform, lose signal (both effects are illustrated in the sketch after this list)
- Sinkhorn not converging: Increase max_iter or epsilon
- Wrong regularizer: Use sparsemax for top-k, softmax for soft attention
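A quick way to see the two temperature failure modes in practice (using soft_rank as in the Quick Start; exact values depend on the implementation):

use fynch::soft_rank;

let scores = [0.5, 0.2, 0.8, 0.1];
for temp in [0.01, 0.1, 1.0, 10.0] {
    // Small temperatures approach hard ranks (sharp but with vanishing gradients);
    // large temperatures flatten the output toward uniform, losing the ranking signal.
    let ranks = soft_rank(&scores, temp).unwrap();
    println!("temperature = {temp}: {ranks:?}");
}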
§References
- Blondel, Martins, Niculae (2020). “Learning with Fenchel-Young Losses” (JMLR)
- Martins & Astudillo (2016). “From Softmax to Sparsemax”
- Blondel et al. (2020). “Fast Differentiable Sorting and Ranking”
- Cuturi et al. (2019). “Differentiable Ranking and Sorting using Optimal Transport”
Re-exports§
pub use fenchel::entmax;
pub use fenchel::entropy_bits;
pub use fenchel::entropy_nats;
pub use fenchel::softmax;
pub use fenchel::softmax_with_temperature;
pub use fenchel::sparsemax;
pub use fenchel::Regularizer;
pub use fenchel::Shannon;
pub use fenchel::SquaredL2;
pub use fenchel::Tsallis;
pub use metrics::compute_rank;
pub use metrics::hits_at_k;
pub use metrics::mean_rank;
pub use metrics::mrr;
pub use metrics::ndcg;
pub use metrics::ndcg_at_k;
pub use metrics::RankingMetrics;
pub use sigmoid::sigmoid;
pub use sigmoid::sigmoid_derivative;
pub use sinkhorn::sinkhorn_rank;
pub use sinkhorn::sinkhorn_sort;
pub use sinkhorn::SinkhornConfig;
pub use topk::differentiable_bottomk;
pub use topk::differentiable_topk;
Modules§
- fenchel - Fenchel-Young losses: a unified framework for prediction functions and losses.
- loss - Differentiable ranking losses for learning to rank.
- metrics - Ranking evaluation metrics.
- sigmoid - Sigmoid function for smooth approximations.
- sinkhorn - Sinkhorn algorithm for entropic optimal transport.
- topk - Differentiable Top-K selection.
Functions§
- fast_soft_sort - Fast Differentiable Sorting ($O(n \log n)$).
- isotonic_l2 - Isotonic regression with L2 loss.
- pava - Pool Adjacent Violators Algorithm (PAVA) for isotonic regression.
- pava_weighted - Weighted PAVA with custom weights.
- reciprocal_rank_fusion - Reciprocal Rank Fusion (RRF).
- soft_rank - Soft ranking with temperature parameter.
- soft_sort - Soft sorting with temperature parameter.
- soft_topk_indicator - Compute soft top-k indicator.