Module penalties

File-path penalties + rerank_topk with file-saturation decay.

Port of ~/src/semble/src/semble/ranking/penalties.py. Two surfaces:

  1. file_path_penalty — multiplicative penalty per file path. Combines penalties for test files, compat/legacy/examples directories, re-export barrels (__init__.py, package-info.java), and .d.ts declaration stubs.
  2. rerank_topk — greedy top-k selection that applies path penalties (when penalise_paths == true), then multiplies the score by 0.5 for each extra chunk from the same file beyond a threshold of 1.
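
To illustrate the first surface, here is a minimal sketch of a multiplicative path penalty. The constant values, the f32 score type, the name path_penalty_sketch, and the matching rules are all assumptions made for this example; this page does not show the real values or patterns.

```rust
// Hypothetical values: the real constants are not shown on this page.
const STRONG: f32 = 0.3;
const MODERATE: f32 = 0.5;
const MILD: f32 = 0.7;

/// Start at 1.0 and multiply in every rule that matches the path.
/// The rules below are simplified guesses and only handle `/` separators.
fn path_penalty_sketch(path: &str) -> f32 {
    let lower = path.to_ascii_lowercase();
    let mut segments: Vec<&str> = lower.split('/').collect();
    // `split` always yields at least one item, so the last one is the file name.
    let file_name = segments.pop().unwrap_or("");
    let mut penalty = 1.0_f32;

    // Test files: a `test`/`tests` directory or a test-style file name.
    let in_test_dir = segments.iter().any(|s| matches!(*s, "test" | "tests"));
    if in_test_dir || file_name.starts_with("test_") || file_name.contains("_test.") {
        penalty *= STRONG;
    }
    // compat / legacy / examples directories.
    if segments.iter().any(|s| matches!(*s, "compat" | "legacy" | "examples")) {
        penalty *= STRONG;
    }
    // Re-export barrels and metadata files.
    if matches!(file_name, "__init__.py" | "package-info.java") {
        penalty *= MODERATE;
    }
    // TypeScript declaration stubs.
    if file_name.ends_with(".d.ts") {
        penalty *= MILD;
    }
    penalty
}
```

Because the factors multiply, a path that matches several rules (for example a test file inside an examples directory) is penalised more heavily than a path that matches only one.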

§Indexing convention

Where Python uses dict[Chunk, float] keyed by hashable Chunk, Rust uses (chunk_index, score) pairs — the same convention ripvec already uses in crate::hybrid. This avoids adding Hash/Eq impls to CodeChunk just to satisfy a HashMap key.
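
The convention above can be sketched as follows. The Chunk struct is a hypothetical stand-in for CodeChunk (its real fields are not shown here), and order_by_score and best_chunk are illustrative helpers, not functions from this module.

```rust
/// Stand-in for the crate's chunk type. Other fields are elided, and the
/// type deliberately derives neither `Hash` nor `Eq`.
#[derive(Debug)]
struct Chunk {
    file_path: String,
}

/// Return chunk indices ordered by descending score.
fn order_by_score(mut scored: Vec<(usize, f32)>) -> Vec<usize> {
    // `total_cmp` gives a total order over f32, so no NaN special-casing is needed.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(idx, _)| idx).collect()
}

/// Resolve the highest-scoring pair back to its chunk via the index.
fn best_chunk<'a>(chunks: &'a [Chunk], scored: &[(usize, f32)]) -> Option<&'a Chunk> {
    scored
        .iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .and_then(|&(idx, _)| chunks.get(idx))
}
```

The scores travel as plain (usize, f32) pairs, and the chunk slice is only consulted when a result has to be resolved, so the chunk type never needs to act as a map key.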

Constants§

FILE_SATURATION_DECAY
Multiplicative penalty per extra chunk from the same file beyond FILE_SATURATION_THRESHOLD. Excess chunks pay decay^excess.
FILE_SATURATION_THRESHOLD
Maximum chunks from the same file before saturation penalty applies.
MILD_PENALTY
Penalty for .d.ts declaration stubs, which still carry useful type info.
MODERATE_PENALTY
Penalty for re-export / metadata files (__init__.py, package-info.java).
STRONG_PENALTY
Penalty for test files, compat shims, and example/doc code.
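
As a worked example of the decay^excess rule, the sketch below uses the decay (0.5) and threshold (1) stated in the module description. The f32 and usize types, the saturation_factor helper, and the exact way excess is counted (chunks already selected from the file, minus the threshold, plus one) are assumptions about how the rule might be applied.

```rust
const FILE_SATURATION_DECAY: f32 = 0.5; // value from the module description
const FILE_SATURATION_THRESHOLD: usize = 1; // value from the module description

/// Multiplier for the next chunk from a file, given how many chunks from
/// that file have already been selected (hypothetical helper).
fn saturation_factor(already_selected: usize) -> f32 {
    if already_selected < FILE_SATURATION_THRESHOLD {
        return 1.0; // still within the per-file allowance
    }
    // Assumed counting: the first chunk past the threshold has excess 1.
    let excess = (already_selected - FILE_SATURATION_THRESHOLD + 1) as i32;
    FILE_SATURATION_DECAY.powi(excess)
}
```

Under this reading, the first chunk from a file keeps its full score, the second is multiplied by 0.5, the third by 0.25, and so on.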

Functions§

file_path_penalty
Combined multiplicative penalty for all applicable path patterns.
rerank_topk
Select the top-k results with optional path penalties and file-saturation decay.
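
The selection described for rerank_topk could look roughly like the sketch below. It is not the module's actual signature: the name rerank_topk_sketch, the parameter list, the path_penalty closure, and the choice to re-evaluate every remaining candidate on each round are assumptions. Only the decay of 0.5, the threshold of 1, and the penalise_paths switch come from the description above.

```rust
use std::collections::HashMap;

const DECAY: f32 = 0.5; // value from the module description
const THRESHOLD: usize = 1; // value from the module description

/// Greedy top-k over (chunk_index, score) pairs. `paths[idx]` is the file of
/// chunk `idx`; indexing panics if an index is out of range.
fn rerank_topk_sketch(
    scored: &[(usize, f32)],
    paths: &[&str],
    top_k: usize,
    penalise_paths: bool,
    path_penalty: impl Fn(&str) -> f32,
) -> Vec<(usize, f32)> {
    // Step 1: optionally fold the path penalty into each base score.
    let mut remaining: Vec<(usize, f32)> = scored
        .iter()
        .map(|&(idx, s)| {
            let p = if penalise_paths { path_penalty(paths[idx]) } else { 1.0 };
            (idx, s * p)
        })
        .collect();

    let mut per_file: HashMap<&str, usize> = HashMap::new();
    let mut selected = Vec::with_capacity(top_k.min(remaining.len()));

    // Step 2: repeatedly take the candidate with the best effective score.
    while selected.len() < top_k {
        let best = remaining
            .iter()
            .enumerate()
            .map(|(pos, &(idx, score))| {
                let seen = per_file.get(paths[idx]).copied().unwrap_or(0);
                let factor = if seen >= THRESHOLD {
                    DECAY.powi((seen - THRESHOLD + 1) as i32)
                } else {
                    1.0
                };
                (pos, score * factor)
            })
            .max_by(|a, b| a.1.total_cmp(&b.1));
        let Some((pos, effective)) = best else { break }; // no candidates left
        let (idx, _) = remaining.swap_remove(pos);
        *per_file.entry(paths[idx]).or_insert(0) += 1;
        selected.push((idx, effective));
    }
    selected
}
```

With two chunks from a.rs scored 1.0 and 0.9 and one chunk from b.rs scored 0.6, a top-2 selection in this sketch returns the first a.rs chunk and then the b.rs chunk, because the second a.rs chunk decays to 0.45.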