Deterministic classifier and deduplication engine for Pi extension candidates.
This module takes mixed-source research data (GitHub code search, repo search, npm scan, curated lists) and produces a validated, deduplicated candidate set.
Each candidate gets:
- A ValidationStatus (true-extension, mention-only, unknown)
- ValidationEvidence (which signals matched)
- A canonical identity key for deduplication
The classifier is intentionally conservative: a candidate must show clear Pi
extension API usage to be classified as TrueExtension.
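The conservative rule above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the `Evidence` struct, the `collect_evidence` and `classify` functions, and in particular the marker strings in `HYPOTHETICAL_API_MARKERS` are assumptions made up for the example, since this page does not list the real signals.

```rust
/// Hypothetical mirror of the module's ValidationStatus enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValidationStatus {
    TrueExtension,
    MentionOnly,
    Unknown,
}

/// Hypothetical evidence record: which signals matched in the source text.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Evidence {
    /// Marker strings found in the source that suggest extension API usage.
    pub api_signals: Vec<String>,
    /// True if the text merely talks about a Pi extension.
    pub mentions_pi: bool,
}

// Placeholder markers (assumption): the real signal list is not documented here.
const HYPOTHETICAL_API_MARKERS: &[&str] = &["registerTool(", "registerCommand("];

/// Scan raw source text and record which signals matched.
pub fn collect_evidence(source: &str) -> Evidence {
    let api_signals = HYPOTHETICAL_API_MARKERS
        .iter()
        .filter(|marker| source.contains(**marker))
        .map(|marker| (*marker).to_string())
        .collect();
    let mentions_pi = source.to_lowercase().contains("pi extension");
    Evidence { api_signals, mentions_pi }
}

/// Conservative decision: only API usage yields TrueExtension;
/// a bare mention is never promoted.
pub fn classify(evidence: &Evidence) -> ValidationStatus {
    if !evidence.api_signals.is_empty() {
        ValidationStatus::TrueExtension
    } else if evidence.mentions_pi {
        ValidationStatus::MentionOnly
    } else {
        ValidationStatus::Unknown
    }
}
```

Keeping evidence collection separate from the decision makes the classification deterministic and auditable: the same evidence always maps to the same status, and the matched signals can be reported alongside it.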
Structs
- CodeSearchEntry - A candidate from the GitHub code search inventory.
- CodeSearchInventory - Wrapper for code search inventory JSON.
- CuratedListEntry - A candidate from the curated list sweep.
- CuratedListSummary - Wrapper for curated list summary JSON.
- NpmScanEntry - A candidate from the npm scan.
- NpmScanSummary - Wrapper for npm scan summary JSON.
- RepoSearchEntry - A candidate from the GitHub repo search.
- RepoSearchSummary - Wrapper for repo search summary JSON.
- ValidatedCandidate - A fully validated candidate with classification and dedup info.
- ValidationConfig - Configuration for the validation pipeline.
- ValidationEvidence - Evidence supporting a validation decision.
- ValidationReport - Output of the full validation + dedup pipeline.
- ValidationStats - Aggregate statistics.
Enums
- ValidationStatus - Validation status for a candidate.
Functions
- canonical_id_from_npm - Generate a canonical ID from an npm package name. Prefixed with npm: to distinguish from GitHub repos.
- canonical_id_from_repo_slug - Generate a canonical ID from a GitHub repo slug (e.g. “owner/repo”).
- canonical_id_from_repo_url - Extract a canonical ID from a GitHub repository URL. Returns owner/repo in lowercase, or None if not a GitHub URL.
- chrono_now_iso - Simple ISO timestamp (avoids pulling in chrono).
- classify_from_evidence - Classify a candidate based on code-level evidence.
- classify_source_content - Classify extension source content (raw TypeScript/JavaScript).
- normalize_github_repo - Normalize a GitHub repo slug to lowercase owner/repo.
- run_validation_pipeline - Run the full validation + dedup pipeline on all research sources.
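
To make the canonical-ID and deduplication idea concrete, here is a small sketch of how such keys might be derived and used. The function names (`normalize_slug`, `id_from_repo_url`, `id_from_npm`, `dedup_by_id`), their signatures, and details such as stripping a trailing `.git` are assumptions for illustration only; the signatures of the real functions listed above are not shown on this page.

```rust
use std::collections::HashSet;

/// Normalize an "owner/repo" slug to lowercase (hypothetical helper).
/// Returns None unless the slug has exactly two non-empty segments.
pub fn normalize_slug(slug: &str) -> Option<String> {
    let trimmed = slug.trim().trim_end_matches('/');
    // Assumption: a trailing ".git" is not part of the identity.
    let trimmed = trimmed.strip_suffix(".git").unwrap_or(trimmed);
    let mut parts = trimmed.split('/');
    let owner = parts.next()?;
    let repo = parts.next()?;
    if owner.is_empty() || repo.is_empty() || parts.next().is_some() {
        return None;
    }
    Some(format!("{}/{}", owner.to_lowercase(), repo.to_lowercase()))
}

/// Extract lowercase "owner/repo" from a GitHub URL, or None for other hosts.
pub fn id_from_repo_url(url: &str) -> Option<String> {
    let url = url.trim();
    let rest = url
        .strip_prefix("https://github.com/")
        .or_else(|| url.strip_prefix("http://github.com/"))?;
    // Keep only the first two path segments; ignore /tree/main and the like.
    let mut segments = rest.split('/').filter(|s| !s.is_empty());
    let owner = segments.next()?;
    let repo = segments.next()?;
    normalize_slug(&format!("{}/{}", owner, repo))
}

/// Prefix npm package names so they cannot collide with repo slugs.
pub fn id_from_npm(package: &str) -> String {
    format!("npm:{}", package.trim().to_lowercase())
}

/// Drop repeated canonical IDs, keeping the first occurrence of each.
pub fn dedup_by_id(ids: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    ids.into_iter().filter(|id| seen.insert(id.clone())).collect()
}
```

With keys like these, a repository found through both code search and a curated list collapses to a single `owner/repo` entry, while an npm package with a similar-looking scoped name stays distinct because of its `npm:` prefix.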