Expand description
Submodular context packing — choosing which items to include under a budget so the selected set maximises coverage of what the task cares about.
§Why submodular?
Picking a set of context items to maximise term coverage is the classic
maximum coverage problem. Coverage is monotone submodular: each added
item helps, but with diminishing returns (a second file covering the same
terms adds little). For such objectives the simple greedy algorithm —
repeatedly take the item with the largest marginal gain per unit cost — is
provably within a 1 − 1/e ≈ 0.63 factor of the optimum (Nemhauser,
Wolsey & Fisher 1978). That guarantee, plus O(n·k) cost, is exactly what
we want for online context assembly.
This module is deliberately generic and side-effect free: callers map their
domain objects (symbol bodies, ranked files, chunks) onto CoverageItems
and get back the indices to keep, in selection order.
Structs§
- Coverage
Item - A candidate for inclusion: the set of weighted terms it covers and the budget cost of including it (e.g. its token count).
Functions§
- greedy_
max_ coverage - Greedy submodular maximisation of weighted term coverage under
budget.