File-path penalties + rerank_topk with file-saturation decay.
Port of ~/src/semble/src/semble/ranking/penalties.py. Two surfaces:
- file_path_penalty: multiplicative penalty per file path. Combines penalties for test files, compat/legacy/examples directories, re-export barrels (__init__.py, package-info.java), and .d.ts declaration stubs.
- rerank_topk: greedy top-k selection that applies path penalties (when penalise_paths == true), then decays by 0.5 per extra chunk from the same file beyond a threshold of 1.
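To illustrate how a multiplicative path penalty of this kind can combine, here is a minimal sketch. It is not the crate's implementation: sketch_path_penalty is a hypothetical name, the three constant values are placeholders (this page does not state the real ones), and the matching rules (directory names, the test_ prefix) are assumptions chosen only to mirror the categories listed above.

```rust
// Placeholder values only; the real constants' values are not given on this page.
const STRONG_PENALTY: f32 = 0.3;
const MODERATE_PENALTY: f32 = 0.5;
const MILD_PENALTY: f32 = 0.7;

/// Hypothetical stand-in for `file_path_penalty`; the matching rules are assumptions.
/// Every rule that applies multiplies into the result, so penalties stack.
fn sketch_path_penalty(path: &str) -> f32 {
    let normalized = path.replace('\\', "/");
    let file_name = normalized.rsplit('/').next().unwrap_or("");
    let mut penalty = 1.0_f32;

    // Test files and compat/legacy/examples directories (assumed patterns).
    let in_penalised_dir = normalized
        .split('/')
        .any(|part| matches!(part, "tests" | "test" | "compat" | "legacy" | "examples"));
    if in_penalised_dir || file_name.starts_with("test_") {
        penalty *= STRONG_PENALTY;
    }
    // Re-export barrels / metadata files.
    if matches!(file_name, "__init__.py" | "package-info.java") {
        penalty *= MODERATE_PENALTY;
    }
    // TypeScript declaration stubs.
    if file_name.ends_with(".d.ts") {
        penalty *= MILD_PENALTY;
    }
    penalty
}
```

Under these assumed rules, a path that matches no pattern keeps a factor of 1.0, while a declaration stub inside a tests directory pays both the strong and the mild penalty.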
§Indexing convention
Where Python uses dict[Chunk, float] keyed by hashable Chunk,
Rust uses (chunk_index, score) pairs — the same convention ripvec
already uses in crate::hybrid. This avoids adding Hash/Eq
impls to CodeChunk just to satisfy a HashMap key.
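A small sketch of the index-pair convention, under stated assumptions: the CodeChunk struct below is a hypothetical stand-in with made-up fields, and resolve is an illustrative helper, not part of the crate. It shows that scores can travel as (chunk_index, score) pairs and be resolved against the chunk slice at the end, with no Hash or Eq impl needed on the chunk type.

```rust
/// Hypothetical stand-in for the crate's `CodeChunk`; the real fields may differ.
struct CodeChunk {
    file_path: String,
    text: String,
}

/// Illustrative helper: resolve `(chunk_index, score)` pairs back to chunks,
/// best score first. Out-of-range indices are skipped.
fn resolve<'a>(chunks: &'a [CodeChunk], scored: &[(usize, f32)]) -> Vec<(&'a CodeChunk, f32)> {
    let mut ordered = scored.to_vec();
    // `total_cmp` gives a total order on f32, so no unwrap on partial_cmp is needed.
    ordered.sort_by(|a, b| b.1.total_cmp(&a.1));
    ordered
        .into_iter()
        .filter_map(|(idx, score)| chunks.get(idx).map(|chunk| (chunk, score)))
        .collect()
}
```

The pairs are plain Copy data, so they can be sorted, filtered, and rescored freely while the chunks themselves stay borrowed in one slice.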
Constants§
- FILE_SATURATION_DECAY - Multiplicative penalty per extra chunk from the same file beyond FILE_SATURATION_THRESHOLD. Excess chunks pay decay^excess.
- FILE_SATURATION_THRESHOLD - Maximum chunks from the same file before saturation penalty applies.
- MILD_PENALTY - .d.ts declaration stubs (still carry useful type info).
- MODERATE_PENALTY - Re-export / metadata files (__init__.py, package-info.java).
- STRONG_PENALTY - Test files, compat shims, example/doc code.
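The decay^excess rule can be worked through with the values the module description gives (decay 0.5, threshold 1). The sketch below assumes one particular reading of "excess": the n-th chunk selected from a file has excess max(0, n - threshold). The helper name saturation_factor and that reading are assumptions, not confirmed by this page.

```rust
// Values taken from the module description: decay 0.5, threshold 1.
const FILE_SATURATION_DECAY: f32 = 0.5;
const FILE_SATURATION_THRESHOLD: usize = 1;

/// Hypothetical helper: factor applied to a candidate chunk when
/// `already_selected` chunks from the same file have been picked before it.
/// Assumes excess = max(0, (already_selected + 1) - threshold).
fn saturation_factor(already_selected: usize) -> f32 {
    let excess = (already_selected + 1).saturating_sub(FILE_SATURATION_THRESHOLD);
    FILE_SATURATION_DECAY.powi(excess as i32)
}
```

Under that reading, the first chunk from a file is unpenalised (factor 1.0), the second is halved (0.5), and the third is quartered (0.25).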
Functions§
- file_path_penalty - Combined multiplicative penalty for all applicable path patterns.
- rerank_topk - Select the top-k results with optional path penalties and file-saturation decay.
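As a rough sketch of how greedy top-k selection with file-saturation decay can work, the function below repeatedly picks the candidate with the highest effective score, where the effective score is the (optionally path-penalised) score times decay^excess for that candidate's file. Everything about the signature is an assumption: sketch_rerank_topk, the paths slice, and the path_penalty closure (standing in for file_path_penalty) are hypothetical, and the real rerank_topk may be structured differently.

```rust
use std::collections::HashMap;

const FILE_SATURATION_DECAY: f32 = 0.5;
const FILE_SATURATION_THRESHOLD: usize = 1;

/// Hypothetical signature; the real `rerank_topk` may take different arguments.
/// `paths[i]` is the file path of chunk `i`; `path_penalty` stands in for `file_path_penalty`.
fn sketch_rerank_topk(
    scored: &[(usize, f32)],
    paths: &[&str],
    top_k: usize,
    penalise_paths: bool,
    path_penalty: impl Fn(&str) -> f32,
) -> Vec<(usize, f32)> {
    // Apply the path penalty once, up front; drop out-of-range indices.
    let mut remaining: Vec<(usize, f32)> = scored
        .iter()
        .filter(|(idx, _)| *idx < paths.len())
        .map(|&(idx, score)| {
            let factor = if penalise_paths { path_penalty(paths[idx]) } else { 1.0 };
            (idx, score * factor)
        })
        .collect();

    let mut per_file: HashMap<&str, usize> = HashMap::new();
    let mut selected = Vec::with_capacity(top_k.min(remaining.len()));

    while selected.len() < top_k && !remaining.is_empty() {
        // Effective score = penalised score * decay^excess for the chunk's file.
        let effective = |&(idx, score): &(usize, f32)| {
            let already = per_file.get(paths[idx]).copied().unwrap_or(0);
            let excess = (already + 1).saturating_sub(FILE_SATURATION_THRESHOLD);
            score * FILE_SATURATION_DECAY.powi(excess as i32)
        };
        let (pos, best) = remaining
            .iter()
            .map(effective)
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .expect("remaining is non-empty");
        let (idx, _) = remaining.swap_remove(pos);
        *per_file.entry(paths[idx]).or_insert(0) += 1;
        selected.push((idx, best));
    }
    selected
}
```

This version rescans all remaining candidates on every pick, which costs O(n·k) but keeps the decay exact as per-file counts change. With scores 1.0 and 0.9 for two chunks of a.rs and 0.6 for one chunk of b.rs, a top-2 selection without path penalties picks the first a.rs chunk and then the b.rs chunk, because the second a.rs chunk decays to 0.45.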