
Module git_staleness


Git-based staleness computation for symbol embeddings.

Staleness measures how many commits have landed since a file was last updated. For symbol chunks (where doc and code are colocated), the approximation is:

  1. Find last_doc_commit — the most recent commit that touched the file.
  2. Count repo-wide commits since last_doc_commit (exclusive). Because last_doc_commit is the most recent commit to touch the file, every commit counted here left the file unchanged, so the count is the number of commits that have elapsed with no update to this file.
  3. staleness = min(1.0, commits_since_doc_update as f64 / 50.0)

This gives 0.0 for files updated in the most recent commit and rises linearly to 1.0, where it is capped once 50 or more commits have landed without touching the file.
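The scoring step can be sketched in isolation. This is a minimal illustration of the formula in step 3 only, not the module's actual code: the function name `staleness_from_commits` and the constant `STALENESS_HORIZON` are hypothetical names chosen here, and the commit count is assumed to have been obtained already.

```rust
/// Hypothetical name for the divisor in the documented formula:
/// the number of commits after which a file counts as fully stale.
const STALENESS_HORIZON: f64 = 50.0;

/// Map the number of commits since the file was last touched to a
/// score in [0.0, 1.0], following
/// `min(1.0, commits_since_doc_update as f64 / 50.0)`.
fn staleness_from_commits(commits_since_doc_update: u64) -> f64 {
    (commits_since_doc_update as f64 / STALENESS_HORIZON).min(1.0)
}
```

Under this sketch, a file touched in the most recent commit scores 0.0, a file untouched for 25 commits scores 0.5, and anything untouched for 50 or more commits scores 1.0.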

Performance

Use compute_staleness_batch to amortize the git walk across all files in a populate run. The function deduplicates paths and walks history once per unique file.
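The deduplication idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the real signature of compute_staleness_batch is not shown on this page, so the function below uses a hypothetical name (`staleness_batch_sketch`) and takes the history lookup as a caller-supplied closure rather than talking to git itself.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: score each unique path exactly once.
/// `commits_since_last_touch` stands in for the git history walk and
/// returns how many commits have landed since the file was last touched.
fn staleness_batch_sketch<F>(
    paths: &[&str],
    mut commits_since_last_touch: F,
) -> HashMap<String, f64>
where
    F: FnMut(&str) -> u64,
{
    let mut scores: HashMap<String, f64> = HashMap::new();
    for &path in paths {
        // Skip paths already scored so the expensive lookup runs
        // once per unique file, not once per input entry.
        if scores.contains_key(path) {
            continue;
        }
        let commits = commits_since_last_touch(path);
        scores.insert(path.to_string(), (commits as f64 / 50.0).min(1.0));
    }
    scores
}
```

When many symbols come from the same file, the input contains that path many times; keying the result by path means the lookup cost scales with the number of distinct files rather than the number of symbols.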

Functions

compute_staleness_batch
Compute staleness scores for a batch of file paths.