# Repo Trust — Public Methodology
> **Version:** 1.0 (May 2026). Versioned with the scoring model. See [`scoring-model.md`](scoring-model.md) for the change log.
>
> **License:** This document is dual-licensed under Apache-2.0 (the project license) and [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) so it can be cited and adapted in academic and industry work.
---
## Why publish a methodology?
A score that you can't audit is a score you can't trust. Repo Trust takes the opposite stance from closed-source SaaS competitors: every threshold, every weight, every heuristic is documented here, in plain English, with citations.
If you disagree with a number, that disagreement is welcome — open a [methodology question issue](https://github.com/Dmitrze/repo-trust/issues/new?template=methodology_question.yml) and we'll engage with the substance.
---
## Guiding principles
1. **Heuristic, not black-box.** No ML classifiers in v1 (see [ADR-0004](adr/0004-no-ml-in-v1.md)). Every score is a function of named features against named thresholds.
2. **Conservative on negative claims.** A false positive in fake-star detection harms a real maintainer. We bias toward saying "low confidence" rather than "suspicious."
3. **Federate, don't replicate.** When OpenSSF Scorecard, deps.dev, or OSV already answer a question well, we consume their output (see [ADR-0005](adr/0005-federate-not-replicate.md)).
4. **Confidence is reported separately from score** (see [ADR-0008](adr/0008-confidence-separate-from-score.md)).
5. **Probabilistic language only.** We say "X% of sampled stargazers match a low-activity profile" — never "this repo has fake stars."
---
## Module 1 — Star Authenticity
### The question
*Are the popularity signals organic, or do they show patterns consistent with paid star campaigns?*
### Inputs
For a sample of stargazers (size depends on mode — see §6):
- Account creation date
- Followers / following counts
- Public repo count
- Public gist count
- Bio, blog URL, email presence
- Star date (when available via GraphQL `starredAt` field)
- For the lockstep signal: full timestamp series of star events
For the repo itself:
- Total stars, forks, watchers
- Shape of the stargazer time series
### Heuristic 1: Low-activity profile share
A stargazer account is flagged as "low-activity" if **all** of the following are true:
| Signal | Threshold |
| --- | --- |
| `created_at` | After 2022-01-01 |
| `followers` | ≤ 1 |
| `following` | ≤ 1 |
| `public_gists` | == 0 |
| `public_repos` | ≤ 4 |
| `bio` | empty |
| `blog` | empty |
| `email` | empty |
| `starred_at` | == `created_at` (same day, when available) |
**Rationale:** This composite signature is from Dagster's [fake-star-detector (2023)](https://dagster.io/blog/fake-stars) and was further validated by StarScout (He et al., ICSE 2026, [arxiv 2412.13459](https://arxiv.org/pdf/2412.13459)). Each individual signal is weak, but their conjunction is statistically anomalous — real users almost always have at least one non-default field.
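
A minimal Rust sketch of the conjunction in the table above. It assumes a hypothetical `Stargazer` struct with dates already parsed into `(year, month, day)` tuples. The field names are illustrative rather than the project's actual types. Treating a missing star date as "check skipped" is our reading of "when available".

```rust
/// Profile fields for one sampled stargazer (hypothetical shape).
#[derive(Clone, Debug)]
pub struct Stargazer {
    /// Account creation date as (year, month, day).
    pub created: (u32, u32, u32),
    /// Star date as (year, month, day), when GraphQL `starredAt` was available.
    pub starred: Option<(u32, u32, u32)>,
    pub followers: u32,
    pub following: u32,
    pub public_gists: u32,
    pub public_repos: u32,
    pub bio: Option<String>,
    pub blog: Option<String>,
    pub email: Option<String>,
}

/// Treats a missing or whitespace-only field as empty.
fn is_blank(field: &Option<String>) -> bool {
    field.as_deref().map_or(true, |s| s.trim().is_empty())
}

/// True only if every signal in the table matches (a conjunction).
pub fn is_low_activity(s: &Stargazer) -> bool {
    // Tuples compare lexicographically, so this means "created after 2022-01-01".
    let created_recently = s.created > (2022, 1, 1);
    // Same-day check is skipped when no star date is available.
    let same_day = s.starred.map_or(true, |d| d == s.created);
    created_recently
        && s.followers <= 1
        && s.following <= 1
        && s.public_gists == 0
        && s.public_repos <= 4
        && is_blank(&s.bio)
        && is_blank(&s.blog)
        && is_blank(&s.email)
        && same_day
}
```
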
**Score thresholds (v1.0):**
| Low-activity share | Sub-score |
| --- | --- |
| < 5% | 100 |
| 5–10% | 85 |
| 10–20% | 65 |
| 20–35% | 40 |
| 35–50% | 20 |
| > 50% | 0 |
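
The band table can be read as a step function over the flagged fraction (`flagged / sampled`). This sketch assumes each band includes its lower bound, so exactly 5% scores 85. It also assumes that exactly 50% stays in the "35–50%" band, because the last row is strictly "> 50%". The function name is hypothetical.

```rust
/// Maps the low-activity share (a fraction in 0.0..=1.0) to the v1.0 sub-score.
pub fn low_activity_subscore(share: f64) -> u32 {
    match share {
        s if s < 0.05 => 100,
        s if s < 0.10 => 85,
        s if s < 0.20 => 65,
        s if s < 0.35 => 40,
        s if s <= 0.50 => 20, // "35–50%" includes 50%, since the next row is "> 50%"
        _ => 0,
    }
}
```
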
**Important caveats:**
- With fewer than 100 sampled stargazers, confidence drops to Medium.
- With fewer than 30 sampled stargazers, confidence drops to Low and we surface a caveat.
- New repos (< 6 months old) often have a higher share of new accounts naturally; we apply a 5pp leniency to repos under 6 months.
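
A sketch of the caveats above, using hypothetical names. It assumes the 5pp leniency works by subtracting five percentage points from the measured share, floored at zero, before the band table is consulted. That is equivalent to raising every band boundary by 5pp. The sketch also approximates "6 months" as 182 days, and it covers only the sample-size contribution to confidence.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

/// Confidence contribution from the stargazer sample size.
pub fn sample_confidence(sample_size: usize) -> Confidence {
    if sample_size < 30 {
        Confidence::Low
    } else if sample_size < 100 {
        Confidence::Medium
    } else {
        Confidence::High
    }
}

/// Applies the young-repo leniency (assumption: subtract 5pp, floor at 0).
pub fn lenient_share(share: f64, repo_age_days: u32) -> f64 {
    const SIX_MONTHS_DAYS: u32 = 182; // hypothetical approximation of "6 months"
    if repo_age_days < SIX_MONTHS_DAYS {
        (share - 0.05).max(0.0)
    } else {
        share
    }
}
```
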
### Heuristic 2: Lockstep timing
**Idea:** Real stars arrive in a relatively smooth distribution shaped by traffic events (HN, Twitter, conference talks). Paid stars often arrive in tight bursts.
**Method (v1.0):**
1. Build a daily count series of stars over the repo's lifetime.
2. Compute the rolling 28-day mean and standard deviation, lagged by 7 days.
3. Compute the z-score of each day's star count vs the lagged baseline.
4. Identify "burst" days where z-score ≥ 5.
5. Compute `lockstep_z_score` = max z-score observed.
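
One way to implement steps 1–5 in Rust, under several stated assumptions:

- "Lagged by 7 days" means a 28-day baseline window that ends seven days before the scored day, covering days t-35 through t-8.
- The standard deviation is the population one.
- On a flat baseline, the standard deviation is floored at 1 star/day so the division is defined. `SD_FLOOR` is a hypothetical constant for this.
- Days without a full baseline get no z-score.

```rust
const WINDOW: usize = 28; // baseline length in days
const LAG: usize = 7; // gap between the baseline window and the scored day
const SD_FLOOR: f64 = 1.0; // hypothetical: keeps z defined on flat baselines

/// Per-day z-scores for a daily star-count series (oldest day first).
/// Days without a full 28-day baseline yield `None`.
pub fn daily_z_scores(daily: &[u32]) -> Vec<Option<f64>> {
    (0..daily.len())
        .map(|t| {
            if t < WINDOW + LAG {
                return None;
            }
            // Baseline covers days t-35 ..= t-8, leaving a 7-day gap before day t.
            let window = &daily[t - LAG - WINDOW..t - LAG];
            let n = WINDOW as f64;
            let mean = window.iter().map(|&c| f64::from(c)).sum::<f64>() / n;
            let var = window
                .iter()
                .map(|&c| (f64::from(c) - mean).powi(2))
                .sum::<f64>()
                / n;
            let sd = var.sqrt().max(SD_FLOOR);
            Some((f64::from(daily[t]) - mean) / sd)
        })
        .collect()
}

/// Indices of "burst" days whose z-score is at least `threshold` (5.0 in v1.0).
pub fn burst_days(z: &[Option<f64>], threshold: f64) -> Vec<usize> {
    z.iter()
        .enumerate()
        .filter_map(|(day, score)| match score {
            Some(v) if *v >= threshold => Some(day),
            _ => None,
        })
        .collect()
}

/// `lockstep_z_score`: the maximum z-score observed, if any day had a baseline.
pub fn lockstep_z_score(z: &[Option<f64>]) -> Option<f64> {
    z.iter().flatten().copied().reduce(f64::max)
}
```
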
**Score thresholds (v1.0):**
| Max daily z-score | Sub-score |
| --- | --- |
| < 3 | 100 |
| 3–5 | 85 |
| 5–8 | 60 |
| 8–12 | 30 |
| > 12 | 10 |
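
The lockstep table maps onto a similar step function. The table leaves exactly 12 unassigned ("8–12" vs "> 12"). This sketch assumes it belongs to the 8–12 band, and that the other bands include their lower bound. The function name is hypothetical.

```rust
/// Maps the maximum daily z-score to the v1.0 lockstep sub-score.
pub fn lockstep_subscore(max_z: f64) -> u32 {
    match max_z {
        z if z < 3.0 => 100,
        z if z < 5.0 => 85,
        z if z < 8.0 => 60,
        z if z <= 12.0 => 30, // assumption: exactly 12 falls in "8–12"
        _ => 10,
    }
}
```
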
**Caveats:**
- A genuine viral moment (HN front page) can produce z-scores in the 8–15 range. High z-scores are therefore not proof of a campaign; they point to one only *in combination with* a high low-activity share.
- We require **both** signals (low-activity share ≥ 20% AND z-score ≥ 5) before lowering the module score to the "Concerning" band.
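
The gating rule is a plain conjunction. Below is a sketch with a hypothetical function name. How the module score is then moved into the "Concerning" band is not specified here, so the sketch only computes the boolean.

```rust
/// True only when both v1.0 signals cross their thresholds together:
/// low-activity share >= 20% AND max daily z-score >= 5.
pub fn campaign_pattern(low_activity_share: f64, max_z: f64) -> bool {
    low_activity_share >= 0.20 && max_z >= 5.0
}
```

For instance, an HN-style spike with an organic audience does not trip the gate on its own. Say the spike has a 3% low-activity share and a z-score of 14: the share is below 20%, so the conjunction fails.
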
### Heuristic 3: Fork / watcher ratios
**Idea:** Healthy projects have stars correlated with forks (people who fork the code to modify or contribute to it) and watchers (people who track it). Star-only popularity is suspicious.
| Signal | Healthy range |
| --- | --- |
| `forks / stars` | ≥ 0.04 |
| `watchers / stars` | ≥ 0.005 |
These ratios shift by ecosystem (TypeScript libraries fork less than Python frameworks). We apply ecosystem-aware multipliers documented in `module-specs.md`.
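
A sketch of the two ratio checks, before any ecosystem multiplier from `module-specs.md` is applied; those multipliers are not modeled here. One practical caveat concerns GitHub's REST API. Its `watchers_count` field mirrors the star count for historical reasons, so the watcher figure intended here corresponds to `subscribers_count`, or to `watchers.totalCount` in GraphQL. The function name is hypothetical.

```rust
/// Returns (forks/stars >= 0.04, watchers/stars >= 0.005), or None when the
/// ratios are undefined because the repo has no stars.
pub fn ratios_healthy(stars: u64, forks: u64, watchers: u64) -> Option<(bool, bool)> {
    if stars == 0 {
        return None;
    }
    let s = stars as f64;
    Some((forks as f64 / s >= 0.04, watchers as f64 / s >= 0.005))
}
```
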
### Final module score
`module_score = 0.55 × low_activity_subscore + 0.30 × lockstep_subscore + 0.15 × ratio_subscore`
This weighting reflects the rigor of each signal: low-activity profile share is the strongest, lockstep is corroborating, ratios are rough.
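
The weighted sum as a one-line Rust function, with a hypothetical name and sub-scores on the 0–100 scale:

```rust
/// v1.0 Star Authenticity weights (they sum to 1.0); inputs are 0–100 sub-scores.
pub fn star_authenticity_score(low_activity: f64, lockstep: f64, ratio: f64) -> f64 {
    0.55 * low_activity + 0.30 * lockstep + 0.15 * ratio
}
```

For example, take sub-scores of 65 (low-activity), 60 (lockstep) and 100 (ratios). They give 0.55 × 65 + 0.30 × 60 + 0.15 × 100 = 35.75 + 18 + 15 = 68.75.
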
---
## Module 2 — Activity Health
### The question
*Is the repository alive and operationally active?*
### Inputs
- Commits in last 30 / 90 / 365 days
- Days since last commit
- Days since last release (if any)
- Median issue first-response time (last 90 days)
- Median PR review time (last 90 days)
- Active contributors in last 90 days
- Variance of monthly commit count over 18 months (continuity)
### Sub-scores
| Sub-signal | Threshold for full credit | Threshold for zero |
| --- | --- | --- |
| Days since last commit | ≤ 14 | ≥ 365 |
| Commits last 90d | ≥ 30 | == 0 |
| Active contributors last 90d | ≥ 4 | ≤ 1 |
| Median issue response (hours) | ≤ 48 | ≥ 720 (30d) |
| Days since last release | ≤ 90 | ≥ 730 (2y) |
Each sub-score is interpolated linearly between the two thresholds.
**Final:** `module_score = mean of sub-scores`.
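
A sketch of the interpolation and the mean, with hypothetical names. Passing the full-credit and zero thresholds explicitly lets one function handle both directions:

- For "days since last commit", a smaller value is better (`full = 14`, `zero = 365`).
- For "commits last 90d", a larger value is better (`full = 30`, `zero = 0`).

Values beyond either threshold are clamped, which is an assumption consistent with the table.

```rust
/// Linear interpolation: 100 at the full-credit threshold, 0 at the zero
/// threshold, clamped outside that range. Works in either direction.
pub fn interpolate(value: f64, full: f64, zero: f64) -> f64 {
    // Fraction of the way from the zero threshold toward the full-credit one.
    let t = (value - zero) / (full - zero);
    100.0 * t.clamp(0.0, 1.0)
}

/// Module score as the plain mean of its sub-scores (None if there are none).
pub fn mean(sub_scores: &[f64]) -> Option<f64> {
    if sub_scores.is_empty() {
        None
    } else {
        Some(sub_scores.iter().sum::<f64>() / sub_scores.len() as f64)
    }
}
```

For instance, 189.5 days since the last commit sits halfway between 14 and 365, so it scores 50. Two active contributors score about 33.3, one third of the way from 1 to 4.
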
### Ecosystem awareness
Long-stable utilities (e.g. a UUID library) legitimately go quiet. We mitigate by:
- Down-weighting "days since last release" if `total_releases >= 3` AND `versioned_consistently == true`.
- Down-weighting "commits last 90d" if `dependents_count >= 1000` (a heavily-depended-on stable library).
These adjustments are documented per ecosystem in `module-specs.md`.
---
## Module 3 — Maintainer Health
### The question
*Is stewardship sustainable, or does this project sit on one person's shoulders?*
### Inputs
- Active maintainers in last 365 days (any commit)
- Gini coefficient of commits-per-author (last 365 days)
- Gini coefficient of PR-review actions (last 365 days)
- Bus factor proxy: minimum number of authors covering 50% of last 365d commits
- Contributor retention rate: % of contributors active in two consecutive 180-day windows
- Median maintainer response time (issues + PRs)
- Presence of `CODEOWNERS`, `MAINTAINERS.md`, governance docs
### Why Gini?
A project where one author makes 95% of commits has a Gini close to 0.95 when the remaining 5% is spread across many occasional contributors; a healthy project where five maintainers each contribute 20% has a Gini of exactly 0. Gini captures concentration in a single number that is comparable across project sizes.
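
A std-only Rust sketch of the Gini coefficient, using the rank-based formula on counts sorted in ascending order. The function name is hypothetical. The same function applies to per-author commits or to PR-review actions.

```rust
/// Gini coefficient of per-author counts: 0.0 means perfectly even,
/// values near 1.0 mean highly concentrated. None for empty or all-zero input.
pub fn gini(counts: &[u64]) -> Option<f64> {
    let total: u64 = counts.iter().sum();
    if counts.is_empty() || total == 0 {
        return None;
    }
    let mut sorted = counts.to_vec();
    sorted.sort_unstable(); // the rank formula needs ascending order
    let n = sorted.len() as f64;
    // G = 2 * Σ(i * x_i) / (n * Σx) - (n + 1) / n, with ranks i starting at 1.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, &x)| (i as f64 + 1.0) * x as f64)
        .sum();
    Some(2.0 * weighted / (n * total as f64) - (n + 1.0) / n)
}
```

Two properties are worth knowing when reading the numbers:

- Suppose one author makes 95% of commits and the remaining 5% is split evenly among the other authors. With n authors in total, the coefficient works out to 0.95 − 1/n: 0.90 with 20 authors, and 0.45 with two.
- A single-author project yields 0, because there is no inequality among one person. This is one reason the bus-factor proxy is reported alongside Gini.
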
### Bus factor proxy
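We explicitly avoid calling this "bus factor" because true bus factor is unmeasurable from outside. Our proxy is the smallest number of authors whose combined commits make up at least 50% of the last 365 days of commits; put differently, how few people would need to leave before the remaining contributors account for at most half of recent commits. It correlates with bus factor but isn't identical.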
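
The proxy can be computed greedily. Sort authors by commit count, largest first, and count how many are needed to reach half of all commits. Taking the largest contributors first is optimal, because no other set of k authors can cover more commits. Below is a sketch with a hypothetical name.

```rust
/// Smallest number of authors whose combined commits reach at least half of
/// all commits in the window. Returns 0 when there are no commits.
pub fn bus_factor_proxy(commits_per_author: &[u64]) -> usize {
    let total: u64 = commits_per_author.iter().sum();
    if total == 0 {
        return 0;
    }
    let mut sorted = commits_per_author.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // largest contributors first
    let mut covered = 0u64;
    for (k, &c) in sorted.iter().enumerate() {
        covered += c;
        // Integer form of covered / total >= 1/2, avoiding rounding.
        if 2 * covered >= total {
            return k + 1;
        }
    }
    sorted.len() // unreachable when total > 0; kept so every path returns
}
```

For example, commit counts of 30, 30, 20 and 20 give a proxy of 2. The illustrative score table in this section maps that to 50.
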
### Score table (illustrative)
| Bus-factor proxy | Sub-score |
| --- | --- |
| ≥ 5 | 100 |
| 4 | 85 |
| 3 | 70 |
| 2 | 50 |
| 1 | 25 |
Full tables in `scoring-model.md`.
---
## Module 4 — Adoption Signals
### The question
*Is this repository actually used in the wild?*
### Inputs (federated where possible)
- GitHub `dependents` count and trend (last 90 days)
- Package-registry weekly downloads (npm, PyPI, crates.io, RubyGems, Maven Central, NuGet) via [deps.dev](https://deps.dev)
- Mentions in well-known awesome-lists (configurable list maintained in `examples/awesome-lists.txt`)
- Documentation maturity: presence and length of `README.md`, `docs/` folder, `examples/` folder
### Federation
We do not collect download statistics ourselves — deps.dev does this at industrial scale and we'd be replicating their work badly. We map `repo → published packages` via deps.dev v3alpha (`/projects/github.com/{owner}/{repo}:packageversions`) and ingest the unique `(system, name)` set as the ecosystem-coverage signal.
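
A sketch of the mapping step. It assumes the public REST base URL `https://api.deps.dev/v3alpha`. It also assumes the project key `github.com/{owner}/{repo}` travels as a single percent-encoded path segment, with slashes encoded as `%2F`. The HTTP call and JSON parsing are left out: `unique_packages` takes already-parsed `(system, name, version)` tuples. Both function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Builds the `:packageversions` URL for a GitHub repo (assumed layout).
pub fn packageversions_url(owner: &str, repo: &str) -> String {
    let key = format!("github.com/{owner}/{repo}").replace('/', "%2F");
    format!("https://api.deps.dev/v3alpha/projects/{key}:packageversions")
}

/// Collapses per-version records into the unique (system, name) set used as
/// the ecosystem-coverage signal.
pub fn unique_packages(versions: &[(&str, &str, &str)]) -> BTreeSet<(String, String)> {
    versions
        .iter()
        .map(|(system, name, _version)| (system.to_string(), name.to_string()))
        .collect()
}
```
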
As of mid-2026, deps.dev v3 no longer surfaces `weeklyDownloads` on the per-package endpoint. The `weekly_downloads` sub-score remains in the model so it lights up automatically if the field comes back, or if a later release federates downloads from another source (e.g. `pypi.org/stats/`, `npmjs.com/downloads`, `crates.io/downloads`); no such source is federated in v0.1.
### Confidence (scoring 1.1.0+)
Confidence in the adoption score depends on the breadth of evidence deps.dev returns for the repository, not on download volume:
- **High**: the repository publishes at least one package on a recognised ecosystem (CARGO / NPM / PYPI / GO / MAVEN), is not archived, and has at least medium documentation maturity (≥ 0.50 on the maturity scale — typically a substantial README, or a shorter README plus a `docs/` directory). Two independent signals tell us the project is in active use.
- **Medium**: packages are present but the repository is archived, or packages are present but documentation is thin, or no packages are detected but documentation is mature (suggests adoption in non-package form, e.g. a manifest or example repository).
- **Low**: no published packages and no documentation depth.
If deps.dev itself is unavailable for the scan (`deps_dev_error == true`), confidence short-circuits to Low — we genuinely don't know what we're missing.
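
The three-tier rule plus the short-circuit can be written as a small decision function. This is a sketch with illustrative field names. `documented` stands in for whichever documentation predicate is in force: the raw ≥ 0.50 maturity floor, or its widened 1.1.1 form. The sketch also assumes that packages published only on ecosystems outside the recognised five cap confidence at Medium.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

/// Inputs to the adoption confidence rule (hypothetical shape).
#[derive(Clone)]
pub struct AdoptionEvidence {
    pub deps_dev_error: bool,
    /// Ecosystems with at least one detected package, e.g. "CARGO".
    pub systems: Vec<&'static str>,
    pub archived: bool,
    /// Whichever "documented" predicate applies (see the text).
    pub documented: bool,
}

const RECOGNISED: [&str; 5] = ["CARGO", "NPM", "PYPI", "GO", "MAVEN"];

pub fn adoption_confidence(e: &AdoptionEvidence) -> Confidence {
    if e.deps_dev_error {
        return Confidence::Low; // we don't know what we're missing
    }
    let recognised = e.systems.iter().any(|s| RECOGNISED.contains(s));
    let has_packages = !e.systems.is_empty();
    if recognised && !e.archived && e.documented {
        Confidence::High
    } else if has_packages || e.documented {
        // Archived, thinly documented, or documented without packages.
        Confidence::Medium
    } else {
        Confidence::Low
    }
}
```
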
We previously gated High on a `weeklyDownloads` floor returned by deps.dev. As of deps.dev v3 (mid-2026), that field was discontinued upstream and no longer appears on the project endpoints. The methodology was updated to reflect what the federated source actually exposes today; see `docs/scoring-model.md` 1.1.0 for the full change-log entry.
**1.1.1 calibration:** The "documented" predicate in the confidence rule was widened beyond the raw 0.50 doc-maturity floor. Repos that keep a short root README but maintain canonical docs in `docs/` or `examples/` (a common library-project layout — clap-rs/clap, serde-rs/serde, tower-rs/tower) now qualify as documented for confidence purposes. The doc-maturity *score* is unchanged — only the boolean used by the confidence rule is more generous. See `is_well_documented` in `src/scoring/adoption.rs`.
**1.1.1 deps.dev relationship filter:** The `:packageversions` endpoint surfaces every package that mentions this repo as `SOURCE_REPO`, including transitive references that simply depend on it. We now filter the response down to first-party publication relationships. A `(system, name)` group is kept only if it passes both checks:

- **First-party relationship**, shown by either a verified `relationProvenance` (currently only `GO_ORIGIN` is emitted by deps.dev, with all other ecosystems coming back as `UNVERIFIED_METADATA`) or an owner-aware name match (CARGO/`tokio` from `tokio-rs/tokio`, NPM/`@octocat/hello-world` from `octocat/Hello-World`, GO/`github.com/owner/repo` from `owner/repo`).
- **At least two distinct versions** in the group. This threshold filters out single-tagged demo/example repos that show up as `GO_ORIGIN` simply because every tagged GitHub repo is reachable as a Go module path.

End result: `package_systems_count` reflects publication, not mention.
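
A sketch of the combined filter. It groups rows by `(system, name)` first, then keeps a group only if it has (verified provenance OR name match) AND (at least two distinct versions). The row shape, the function names and the matcher are all assumptions. In particular, `name_matches` covers only the three example forms quoted in the filter description, compares them case-insensitively, and may be narrower than the real rules.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One already-parsed row of the `:packageversions` response (hypothetical shape).
pub struct VersionRow<'a> {
    pub system: &'a str, // e.g. "CARGO", "NPM", "GO"
    pub name: &'a str,   // package name as published
    pub version: &'a str,
    pub provenance: &'a str, // e.g. "GO_ORIGIN" or "UNVERIFIED_METADATA"
}

/// Owner-aware name match, modeled only on the three example forms.
fn name_matches(system: &str, name: &str, owner: &str, repo: &str) -> bool {
    let (name, owner, repo) = (name.to_lowercase(), owner.to_lowercase(), repo.to_lowercase());
    match system {
        "CARGO" => name == repo,
        "NPM" => name == format!("@{owner}/{repo}"),
        "GO" => name == format!("github.com/{owner}/{repo}"),
        _ => false,
    }
}

/// Keeps (system, name) groups that are first-party AND have >= 2 distinct versions.
pub fn first_party_packages(rows: &[VersionRow<'_>], owner: &str, repo: &str) -> BTreeSet<(String, String)> {
    // (system, name) -> (any verified row?, distinct versions seen)
    let mut groups: BTreeMap<(&str, &str), (bool, BTreeSet<&str>)> = BTreeMap::new();
    for r in rows {
        let entry = groups.entry((r.system, r.name)).or_insert_with(|| (false, BTreeSet::new()));
        // GO_ORIGIN is the only verified provenance the text says deps.dev emits today.
        entry.0 |= r.provenance == "GO_ORIGIN";
        entry.1.insert(r.version);
    }
    groups
        .into_iter()
        .filter(|((system, name), (verified, versions))| {
            (*verified || name_matches(system, name, owner, repo)) && versions.len() >= 2
        })
        .map(|((system, name), _)| (system.to_string(), name.to_string()))
        .collect()
}
```
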
### Graceful degradation
Not every repo publishes a package. A research project on GitHub may have 10,000 stars and zero packages. We don't penalize this — we surface the absence as a Neutral caveat (`no_packages` evidence row), and confidence is graded by whether documentation maturity carries the signal instead.
---
## Module 5 — Security & Readiness
### The question
*Is this repository in a state that supports responsible adoption?*
### Inputs
- OpenSSF Scorecard score (federated from `api.scorecard.dev`)
- OSV vulnerability count for the repo's published packages (federated from `api.osv.dev`)
- Presence and recency of: `SECURITY.md`, `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `LICENSE`, `CODEOWNERS`
- CI workflow files in `.github/workflows/`
- Release-tagging consistency (semver adherence)
- Branch-protection signals (where observable via the API)
### Federation: Scorecard
If Scorecard has a recent score for the repo (≤ 30 days old), we use it directly as 0.40 of this module's score. We do **not** re-run Scorecard's 18 checks ourselves.
If Scorecard has no score, we report `Low confidence` for this module and rely on the document-presence and CI sub-signals.
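
A sketch of the blend, under assumptions the text does not pin down:

- The Scorecard aggregate, published on a 0–10 scale, is rescaled to 0–100.
- A score older than 30 days is treated the same as a missing one.
- The remaining 0.60 of the weight goes to a single, already-combined local sub-score.

The function name is hypothetical. It returns the score together with a low-confidence flag.

```rust
/// Security & Readiness score (0–100) plus a low-confidence flag.
/// `scorecard` is (aggregate score on Scorecard's 0–10 scale, age in days).
pub fn security_score(scorecard: Option<(f64, u32)>, local_subscore: f64) -> (f64, bool) {
    match scorecard {
        // Fresh result: 0.40 Scorecard (rescaled to 0–100) + 0.60 local signals.
        Some((score, age_days)) if age_days <= 30 => {
            (0.40 * score * 10.0 + 0.60 * local_subscore, false)
        }
        // Missing or (by assumption) stale: local signals only, Low confidence.
        _ => (local_subscore, true),
    }
}
```
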
### Federation: OSV
We map `repo → published packages` via deps.dev, then query OSV for each package's published versions. The count of unfixed advisories in the latest published version becomes a sub-signal.
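
The query step can be sketched in two parts. The first maps deps.dev system names to OSV ecosystem strings. The second builds a request body for OSV's `POST https://api.osv.dev/v1/query`, pinned to the latest published version. Every advisory OSV returns for that exact version affects it, so the number of returned advisories can serve as the sub-signal. Function names are hypothetical. The hand-rolled JSON escaping is for illustration only; a real client would use a JSON library.

```rust
/// Maps a deps.dev system name to the OSV ecosystem string.
pub fn osv_ecosystem(system: &str) -> Option<&'static str> {
    match system {
        "NPM" => Some("npm"),
        "PYPI" => Some("PyPI"),
        "CARGO" => Some("crates.io"),
        "GO" => Some("Go"),
        "MAVEN" => Some("Maven"),
        "NUGET" => Some("NuGet"),
        "RUBYGEMS" => Some("RubyGems"),
        _ => None,
    }
}

/// Quotes a string for JSON, escaping backslashes first, then double quotes.
fn json_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Body for OSV's `/v1/query`: {"package":{"name":..,"ecosystem":..},"version":..}
pub fn osv_query_body(system: &str, name: &str, latest_version: &str) -> Option<String> {
    let eco = osv_ecosystem(system)?;
    Some(format!(
        "{{\"package\":{{\"name\":{},\"ecosystem\":{}}},\"version\":{}}}",
        json_str(name),
        json_str(eco),
        json_str(latest_version)
    ))
}
```
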
---
## Aggregation
The overall Trust Score is a confidence-weighted average:
```
trust_score = Σ (w_i × score_i × conf_i) / Σ (w_i × conf_i)
```
where `conf_i` is `0.5` for Low, `0.75` for Medium, `1.0` for High. This means a partially collected module contributes less to the final number, preventing thin-data modules from dominating.
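
The formula in Rust, using the confidence multipliers as stated; the names are hypothetical. When every module has the same confidence, the multipliers cancel and the result reduces to the plain weighted mean.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

impl Confidence {
    /// Multipliers from the text.
    fn factor(self) -> f64 {
        match self {
            Confidence::Low => 0.5,
            Confidence::Medium => 0.75,
            Confidence::High => 1.0,
        }
    }
}

/// trust_score = Σ(w_i · score_i · conf_i) / Σ(w_i · conf_i)
/// `modules` holds (weight, score 0–100, confidence) per module.
pub fn trust_score(modules: &[(f64, f64, Confidence)]) -> Option<f64> {
    let (num, den) = modules
        .iter()
        .fold((0.0_f64, 0.0_f64), |(num, den), &(w, s, c)| {
            let wc = w * c.factor();
            (num + wc * s, den + wc)
        });
    if den > 0.0 { Some(num / den) } else { None }
}
```

Take the default weights with these module scores: Star Authenticity 80 (High), Activity Health 60 (Medium), Maintainer Health 50 (High), Adoption Signals 90 (Low) and Security & Readiness 70 (High). The numerator is 56.75 and the denominator 0.8375, giving about 67.76. The plain weighted mean would be 69.5; the gap arises because the Low-confidence Adoption score of 90 counts for less.
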
Default weights (v1.0):
| Module | Weight |
| --- | --- |
| Star Authenticity | 0.20 |
| Activity Health | 0.25 |
| Maintainer Health | 0.20 |
| Adoption Signals | 0.20 |
| Security & Readiness | 0.15 |
Users can override weights via `--weights weights.toml`.
## Confidence aggregation
The report's overall confidence is the **minimum** confidence across modules whose weight contributes more than 10% of the total. This means one low-confidence module on a meaningful axis lowers the headline confidence — we don't average it away.
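
A sketch of the rule, with hypothetical names. Deriving `Ord` on the enum in declaration order makes `Low < Medium < High`, so the minimum is a plain `min()`. Under the default weights every module exceeds 10% (the smallest is 15%). The cutoff therefore only changes the outcome when custom weights push a module below it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

/// Headline confidence: the minimum over modules whose weight is more than
/// 10% of the total weight. `modules` holds (weight, confidence).
pub fn headline_confidence(modules: &[(f64, Confidence)]) -> Option<Confidence> {
    let total: f64 = modules.iter().map(|&(w, _)| w).sum();
    modules
        .iter()
        .filter(|&&(w, _)| total > 0.0 && w / total > 0.10)
        .map(|&(_, c)| c)
        .min()
}
```
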
## Determinism
See [ADR-0007](adr/0007-deterministic-output.md). Same inputs (`repo`, `mode`, `scoring_version`, `weights`, `rng_seed`) + same upstream API state = byte-identical JSON output (modulo `snapshot_at` and `runtime_seconds`).
The default RNG seed is derived from `(repo, scoring_version)` via a blake3 hash, so users get the same sample for the same repo across runs without specifying `--seed`.
## Citations
- He, H. et al. **"4.5 Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Scams, and Malware."** *ICSE 2026*. [arxiv:2412.13459](https://arxiv.org/pdf/2412.13459).
- Dagster Labs. **"Fake stars on GitHub."** Blog post, March 2023. <https://dagster.io/blog/fake-stars>
- Open Source Security Foundation. **OpenSSF Scorecard.** <https://scorecard.dev>
- Google Open Source Insights. **deps.dev.** <https://deps.dev>
- Open Source Vulnerabilities. **OSV.dev.** <https://osv.dev>