Module path_prob

File-path relevance ranking, ported from wcgw’s FastPathAnalyzer.

wcgw ships a tiny unigram language model trained over repo paths: a Hugging Face tokenizer (paths_tokens.model) plus a vocab file mapping each token to its log-probability (paths_model.vocab). A path’s score is the sum of the log-probabilities of its tokens. A higher (less negative) score means the path looks more like a “real source file worth showing” and less like noise.
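The scoring rule described above can be illustrated with a minimal sketch. It is not this module's implementation: `toy_tokenize` is a hypothetical stand-in that splits on path punctuation instead of running the real tokenizer, and `score_path`, `rank_paths`, and the `unknown_logprob` fallback for out-of-vocabulary tokens are names and behavior assumed purely for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real tokenizer: splits a path on
/// `/`, `.`, `_`, and `-` and drops empty pieces.
fn toy_tokenize(path: &str) -> Vec<&str> {
    path.split(|c: char| matches!(c, '/' | '.' | '_' | '-'))
        .filter(|t| !t.is_empty())
        .collect()
}

/// Sums per-token log-probabilities. Tokens missing from `vocab`
/// contribute `unknown_logprob` (an assumed policy, not confirmed by the docs).
fn score_path(path: &str, vocab: &HashMap<String, f64>, unknown_logprob: f64) -> f64 {
    toy_tokenize(path)
        .iter()
        .map(|tok| vocab.get(*tok).copied().unwrap_or(unknown_logprob))
        .sum()
}

/// Orders paths from highest (least negative) score to lowest.
fn rank_paths<'a>(
    paths: &[&'a str],
    vocab: &HashMap<String, f64>,
    unknown_logprob: f64,
) -> Vec<&'a str> {
    let mut scored: Vec<(&'a str, f64)> = paths
        .iter()
        .map(|p| (*p, score_path(p, vocab, unknown_logprob)))
        .collect();
    // Descending order; `total_cmp` gives a total ordering over f64.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(p, _)| p).collect()
}
```

Because log-probabilities are negative, every extra token lowers the sum, so under this sketch long or unusual paths tend to rank below short, conventional ones.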

Both assets are embedded so ranking works offline with zero setup, matching the wcgw package that bundles them alongside repo_context.py.
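As a rough sketch of how an embedded vocab might be turned into a lookup table, the example below parses text held in a constant. The constant stands in for an asset compiled into the binary (for instance via `include_str!`), and the line format it assumes (a token, whitespace, then a log-probability) is an assumption rather than something this page confirms. `EMBEDDED_VOCAB` and `parse_vocab` are hypothetical names.

```rust
use std::collections::HashMap;

// Stand-in for an embedded asset. In a real crate this might come from
// `include_str!` with a path to the vocab file; both that path and the
// `token<whitespace>log_prob` line format used here are assumptions.
const EMBEDDED_VOCAB: &str = "src\t-1.0\nmain\t-1.5\nrs\t-0.5\n";

/// Parses `token log_prob` lines into a lookup table.
/// Lines without exactly two fields, or with a non-numeric second field,
/// are skipped.
fn parse_vocab(text: &str) -> HashMap<String, f64> {
    let mut vocab = HashMap::new();
    for line in text.lines() {
        let mut parts = line.split_whitespace();
        if let (Some(token), Some(raw), None) = (parts.next(), parts.next(), parts.next()) {
            if let Ok(log_prob) = raw.parse::<f64>() {
                vocab.insert(token.to_string(), log_prob);
            }
        }
    }
    vocab
}
```

Parsing from a compile-time constant means no file lookup happens at runtime, which is what lets ranking work offline with no setup.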

Functions

score_paths
Score each path by summed token log-probability (higher = more relevant).