File-path relevance ranking, ported from wcgw’s FastPathAnalyzer.
wcgw ships a tiny unigram language model trained over repo paths: a
Hugging Face tokenizer (paths_tokens.model) plus a vocab file mapping each
token to its log-probability (paths_model.vocab). A path’s score is the sum
of the log-probabilities of its tokens — higher (less negative) means the
path looks more like a “real source file worth showing” and less like noise.
Both assets are embedded in the binary, so ranking works offline with zero setup, mirroring the
wcgw package, which bundles them alongside repo_context.py.
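The scoring rule above (a path's score is the sum of its tokens' log-probabilities) can be sketched without the real tokenizer. The sketch below makes several assumptions not stated in the original. It assumes the vocab is one `token<TAB>log_prob` pair per line, as in SentencePiece-style `.vocab` files. It also assumes tokenization is plain Viterbi segmentation over the vocab, which picks the highest-probability split, as a unigram model does, and it skips any normalization or pre-tokenization the Hugging Face tokenizer performs. `parse_vocab` and `score_path` are hypothetical names, not this module's API.

```rust
use std::collections::HashMap;

/// Hypothetical: parse an assumed `token<TAB>log_prob`-per-line vocab file.
/// Lines that do not split on a tab or whose score fails to parse are skipped.
fn parse_vocab(text: &str) -> HashMap<String, f64> {
    text.lines()
        .filter_map(|line| {
            let (tok, lp) = line.split_once('\t')?;
            Some((tok.to_string(), lp.trim().parse::<f64>().ok()?))
        })
        .collect()
}

/// Hypothetical: score `path` as the summed log-probability of its best
/// unigram segmentation (Viterbi). Returns None if no segmentation exists.
fn score_path(path: &str, vocab: &HashMap<String, f64>) -> Option<f64> {
    // Segment on char boundaries so multi-byte UTF-8 paths are sliced safely.
    let bounds: Vec<usize> = path
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(path.len()))
        .collect();
    let n = bounds.len();
    // best[j] = highest total log-prob of any segmentation of path[..bounds[j]].
    let mut best = vec![f64::NEG_INFINITY; n];
    best[0] = 0.0;
    for j in 1..n {
        for i in 0..j {
            if best[i] == f64::NEG_INFINITY {
                continue; // prefix path[..bounds[i]] cannot be segmented
            }
            if let Some(&lp) = vocab.get(&path[bounds[i]..bounds[j]]) {
                best[j] = best[j].max(best[i] + lp);
            }
        }
    }
    let total = best[n - 1];
    if total.is_finite() { Some(total) } else { None }
}
```

Because every log-probability is negative, each extra token lowers the sum. Paths that split into a few common tokens (such as `src`, `/`, `.rs`) therefore score higher than paths that need many rare pieces.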
Functions
- score_paths - Score each path by summed token log-probability (higher = more relevant).
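Since a higher (less negative) score means more relevant, a caller would typically sort descending and keep the top few. The sketch below is a minimal illustration under assumptions. `top_k_paths` is a hypothetical helper, and the `score` closure stands in for the module's scorer, whose real signature is not shown here.

```rust
/// Hypothetical helper: rank `paths` by descending score and keep the top `k`.
/// `score` is any per-path scorer, e.g. a summed token log-probability.
fn top_k_paths<'a, F>(paths: &[&'a str], k: usize, score: F) -> Vec<(&'a str, f64)>
where
    F: Fn(&str) -> f64,
{
    let mut scored: Vec<(&'a str, f64)> = paths.iter().map(|&p| (p, score(p))).collect();
    // Higher (less negative) = more relevant, so compare b against a.
    // total_cmp gives a total order even if a score is NaN; sort_by is stable,
    // so equal scores keep their input order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Taking the scorer as a closure keeps ranking separate from the language model, so the same helper works with a stub scorer in tests.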