ARGUS -- The verification layer for AI-generated code
AI generates code at near-zero cost. Human review didn't get faster. The bottleneck inverted: it's no longer generation -- it's verification.
ARGUS is the verification infrastructure. 15 Rust crates, 4 specialists, an audit chain that's BLAKE3-hash-chained and Ed25519-signed -- EU AI Act Art. 12 Level 2 ready by default. MIT licensed. BYOK. Zero SaaS lock-in.
Install · Quickstart · Why · What · Numbers · Pricing · Security
The problem is here. Now.
Open Source is dying in 2026. Community trust is drowning under a 206% increase in Bash scripts in AI projects¹, PR reviews that are 4.6× slower², and 15-18% more vulnerabilities². With 42% of the code committed today being AI-generated or AI-assisted³ and 96% of developers distrusting it³, AI slop -- Word of the Year 2025⁴ -- has forced extreme measures:
| Project | Response | Date |
|---|---|---|
| Ladybird (browser) | Closed its public PRs. "We will no longer accept public pull requests." | Jun 2026⁵ |
| tldraw (whiteboard) | Auto-closes external PRs. "Open source contribution has always been a gift economy held together by proof of work. AI has changed that." | Jan 2026⁶ |
| RPCS3 (PS3 emulator) | Had to revert multiple AI-generated PRs that caused regressions in production. | May 2026⁷ |
| cURL (web infrastructure) | Cancelled its bug bounty because 19 out of every 20 reports were synthetic hallucinations. | Jan 2026⁸ |

Sources: ¹GitHub Octoverse 2025 · ²Opsera 2026 AI Coding Impact Report · ³Sonar State of Code Developer Survey 2026 · ⁴Merriam-Webster Word of the Year 2025 · ⁵Ladybird blog · ⁶tldraw issue #7695 · ⁷RPCS3 commit c0b3580 · ⁸Daniel Stenberg, "The end of the curl bug-bounty"
AI slop is a tragedy of the commons (arXiv:2603.27249): individual productivity gains externalize costs onto reviewers and maintainers. The bottleneck isn't generation. It's verification.
What ARGUS is
ARGUS = AI Review & Governance for Undermining Slop -- the trust layer for AI-generated code.
One product. Three layers. Four specialists. One signed certificate per analysis.
Built for engineering managers, OSS maintainers, and CISOs who need an audit-grade, EU AI Act-ready answer to the verification bottleneck. Pure Rust (15 crates, zero Python, zero Node.js in production). BYOK (your NVIDIA NIM key, never persisted). MIT licensed.
The 3 layers (one worker each)
| Worker | When it runs | What it does | Latency |
|---|---|---|---|
| Aegis Guard | Pre-commit / pre-push | Hybrid scan on the staged diff: deterministic AST pre-flight (5 SLOP rules, regex, <100ms) + LLM semantic. Blocks critical issues. | <2s |
| Aegis Verify | PR review (webhook or one-shot) | 4 specialists in parallel via Tokio `join!` + CordonEnforcer (synthesizer never sees raw code). Emits a `fix_plan.json` for downstream coding agents. | 4-8s |
| Aegis Lens | Weekly digest | Aggregates findings across an org, ranks top offenders, generates an executive briefing (text + optional HeyGen video). | 5-15s |
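
The fan-out inside Aegis Verify can be pictured roughly as follows. This is a dependency-free sketch that swaps Tokio's `join!` for `std::thread::scope`; the `fan_out` function and the `fn(&str) -> usize` specialist signature are assumptions made for illustration, not ARGUS's actual API.

```rust
use std::thread;

/// Hypothetical specialist signature: takes the diff, returns a finding count.
pub type Specialist = fn(&str) -> usize;

/// Runs every specialist on its own scoped thread and waits for all of them,
/// returning `(name, result)` pairs in the order the specialists were given.
pub fn fan_out(diff: &str, specialists: &[(&str, Specialist)]) -> Vec<(String, usize)> {
    thread::scope(|scope| {
        // Spawn everything first so the specialists run concurrently...
        let handles: Vec<_> = specialists
            .iter()
            .map(|(name, run)| (name.to_string(), scope.spawn(move || run(diff))))
            .collect();
        // ...then join in order; total latency tracks the slowest specialist.
        handles
            .into_iter()
            .map(|(name, handle)| (name, handle.join().expect("specialist panicked")))
            .collect()
    })
}
```

Because the threads are scoped, the diff is borrowed rather than cloned, and the function cannot return before every specialist has finished.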
The 4 specialists (run in parallel inside Verify)
| Specialist | Prompt | What it catches | Hybrid? |
|---|---|---|---|
| Aegis Slop | slop-detector | Narrative comments, swallowed errors, oversized fns (>80 LOC), `.unwrap()` outside tests, TODO stubs, unused `pub fn` | ✅ regex + LLM |
| Aegis Security | redteam-security | Hardcoded credentials, injection, unsafe panic, unhandled errors, OWASP Top 10 | LLM |
| Aegis Arch | architecture-fit | Repo coherence, pattern matching, idiom detection, separation of concerns | LLM |
| Aegis Verdict | verdict-synthesizer | Synthesizes the 3 above into Approved/ReviewRequired/Halted + FixPlan | LLM |
CordonEnforcer is the moat: the verdict synthesizer in the pipeline never sees raw code. It only sees the structured outputs of the other three specialists. No competitor (CodeRabbit, Greptile, Qodo) has this constraint.
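
One way to picture the cordon is as a type-level boundary: the synthesizer's signature accepts only structured findings, so raw source text has no path into it. The sketch below is a minimal illustration under stated assumptions. `Finding`, `Severity`, `synthesize`, and the severity-to-verdict policy are hypothetical; only the `Approved`/`ReviewRequired`/`Halted` verdict names come from this README.

```rust
/// Severity levels are an assumption for this sketch, not ARGUS's real schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Warning,
    Critical,
}

/// Structured output of one specialist. There is deliberately no field
/// for the diff itself: only a rule id, a severity, and a short summary.
#[derive(Debug, Clone)]
pub struct Finding {
    pub specialist: &'static str,
    pub rule_id: String,
    pub severity: Severity,
    pub summary: String,
}

/// The three verdicts named in the specialists table.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Approved,
    ReviewRequired,
    Halted,
}

/// The synthesizer only receives `&[Finding]`, so the compiler rejects any
/// attempt to hand it source text. The mapping below (any critical finding
/// halts, any warning asks for review) is a hypothetical policy.
pub fn synthesize(findings: &[Finding]) -> Verdict {
    match findings.iter().map(|f| f.severity).max() {
        Some(Severity::Critical) => Verdict::Halted,
        Some(Severity::Warning) => Verdict::ReviewRequired,
        Some(Severity::Info) | None => Verdict::Approved,
    }
}
```

The design choice worth noting is that the constraint lives in the function signature rather than in a runtime check, so it holds for every caller.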
The 7 things that make ARGUS different
1. Hybrid detection -- cheap + deep

    SLOP-001  oversized fn (size)           ->  regex  < 1ms   catches 40-60% of slop
    SLOP-002  swallowed error arm           ->  regex  < 1ms
    SLOP-003  TODO stub                     ->  regex  < 1ms
    SLOP-004  unwrap/expect outside tests   ->  regex  < 1ms
    SLOP-005  unused pub fn                 ->  regex  < 1ms
    + semantic reasoning                    ->  LLM    2-4s    catches the rest

No competitor has this combination. The result: 60-80% LLM cost reduction on typical PRs. Measured: P=1.000, R=0.818, F1=0.900 on 40-PR benchmark (BENCHMARK.md).
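
To make the cheap half of the hybrid concrete, here is a minimal sketch of a deterministic pre-flight covering two of the five rules (SLOP-003 and SLOP-004). It is an illustration only: plain substring checks stand in for the regexes, and the `SlopHit` type, the `preflight` function, and the path-based definition of "test code" are all assumptions rather than ARGUS's real implementation.

```rust
/// A finding from the cheap deterministic pass (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
pub struct SlopHit {
    pub rule: &'static str,
    pub line_no: usize, // 1-based index into the added lines
}

/// Assumed heuristic for "test code": decided by file path only.
fn is_test_path(path: &str) -> bool {
    path.starts_with("tests/") || path.contains("/tests/") || path.ends_with("_test.rs")
}

/// Scans the added lines of one file and returns deterministic hits.
/// Substring checks stand in for the regexes mentioned above.
pub fn preflight(path: &str, added_lines: &[&str]) -> Vec<SlopHit> {
    let in_tests = is_test_path(path);
    let mut hits = Vec::new();
    for (idx, line) in added_lines.iter().enumerate() {
        let code = line.trim();
        // SLOP-003: a stub left where an implementation should be.
        if code.contains("todo!(") || code.contains("unimplemented!(") || code.starts_with("// TODO") {
            hits.push(SlopHit { rule: "SLOP-003", line_no: idx + 1 });
        }
        // SLOP-004: unwrap/expect outside test code.
        if !in_tests && (code.contains(".unwrap()") || code.contains(".expect(")) {
            hits.push(SlopHit { rule: "SLOP-004", line_no: idx + 1 });
        }
    }
    hits
}
```

A pass like this involves no network call and no model, which is where the sub-millisecond per-rule figures above come from.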
2. EU AI Act Article 12 Level 2 ready by default
The 16-field AuditEvent is automatically emitted on every LLM call:
Verifiable: curl /audit/export?from=2026-01-01&to=2026-12-31 returns NDJSON with a BLAKE3 manifest footer. No cleartext prompts, ever. GDPR derivative-liability-safe by construction. Enforcement starts Aug 2, 2026 -- 51 days from this README.
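
The hash-chaining idea behind the audit trail can be sketched without any crates. In the illustration below, FNV-1a stands in for BLAKE3 purely so the example has no dependencies (it is not cryptographically secure), the Ed25519 signature step is omitted, and `ChainedEvent`, `append`, and `verify` are hypothetical names. The real `AuditEvent` has 16 fields that are not reproduced here.

```rust
/// FNV-1a (64-bit) stands in for BLAKE3 so the sketch needs no crates.
/// It is NOT cryptographically secure; only the chaining logic is the point.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical record; the payload would hold hashes of prompts, not prompts.
#[derive(Debug, Clone)]
pub struct ChainedEvent {
    pub payload: String,
    pub prev_hash: u64, // hash of the previous entry (0 for the first)
    pub hash: u64,      // hash over prev_hash followed by payload
}

fn link_hash(prev_hash: u64, payload: &str) -> u64 {
    let mut buf = prev_hash.to_le_bytes().to_vec();
    buf.extend_from_slice(payload.as_bytes());
    fnv1a(&buf)
}

/// Appends an event whose hash commits to everything before it.
pub fn append(chain: &mut Vec<ChainedEvent>, payload: &str) {
    let prev_hash = chain.last().map_or(0, |e| e.hash);
    let hash = link_hash(prev_hash, payload);
    chain.push(ChainedEvent { payload: payload.to_string(), prev_hash, hash });
}

/// Recomputes every link; returns the index of the first broken entry.
pub fn verify(chain: &[ChainedEvent]) -> Result<(), usize> {
    let mut expected_prev = 0u64;
    for (i, e) in chain.iter().enumerate() {
        if e.prev_hash != expected_prev || e.hash != link_hash(e.prev_hash, &e.payload) {
            return Err(i);
        }
        expected_prev = e.hash;
    }
    Ok(())
}
```

Editing or dropping any entry changes a hash that later entries depend on, so a verifier holding only the exported chain can locate the first inconsistent record.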
3. MCP server for Claude Code / Codex / Cursor
// ~/.config/claude-code/mcp.json
Four tools land in your agent's toolbox:
- `aegis_slop`: AI slop signals
- `aegis_security`: adversarial review
- `aegis_arch`: architectural fit score
- `aegis_verdict`: final verdict + FixPlan
Your coding agent now has ARGUS on tap. It can run a slop check, a security check, and a verdict on its own draft PR -- automatically, before it ever asks for human review.
4. A2A AgentCards -- discoverable to Google's open protocol
GET /.well-known/agent-card.json
GET /a2a/message
Opt-in via ARGUS_A2A_DISABLED=false. Google A2A orchestrators can discover and message our 4 specialists.
5. BYOK economics -- $0.05/dev/month
- User provides the NVIDIA NIM key (`X-LLM-Key` header or `ARGUS_NIM_KEY` env)
- No telemetry, no tracking, no per-seat fees
- We don't see your diffs -- they go directly from your process to NIM
- 100× cheaper than CodeRabbit ($0.10-0.50/PR) at scale
6. Production resilience out of the box
- LLM circuit breaker with full-jitter exponential backoff (rolled our own, no `llm-retry` dep)
- Idempotency-Key support on `POST /analyze` (24h TTL)
- Graceful shutdown on SIGINT/SIGTERM (Axum `with_graceful_shutdown`)
- OpenTelemetry stdout exporter (env-gated via `ARGUS_OTEL_DISABLED`)
- SQLite audit persistence (`InMemoryAuditStore` for ephemeral, `SqliteAuditStore` for durable)
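
Full-jitter exponential backoff, as named in the first bullet, can be sketched in a few lines. This is an illustration under assumptions, not ARGUS's breaker: the function names are hypothetical, and the random value is passed in by the caller so the sketch stays dependency-free and deterministic to test.

```rust
use std::time::Duration;

/// Upper bound for the sleep before retry `attempt` (0-based):
/// min(cap, base * 2^attempt), saturating instead of overflowing.
pub fn backoff_ceiling(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}

/// Full jitter: pick a delay in [0, ceiling] instead of sleeping for the
/// ceiling itself, so clients that fail together do not retry together.
/// `random` is any caller-supplied random u64; the modulo introduces a
/// bias that is negligible at millisecond granularity.
pub fn full_jitter(attempt: u32, base: Duration, cap: Duration, random: u64) -> Duration {
    let ceiling_ms = backoff_ceiling(attempt, base, cap).as_millis() as u64;
    Duration::from_millis(random % (ceiling_ms + 1))
}
```

With a 100 ms base and a 5 s cap, the ceiling for the fourth attempt (`attempt = 3`) is 800 ms, and the actual sleep lands anywhere between 0 and 800 ms.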
7. Pure Rust 100%, MSRV 1.88
- 15 crates, 4 binaries
- 194 tests passing (none flaky)
- `cargo build --release` in 1m 27s
- Zero Python, zero Node.js in the production binary
- `RUSTFLAGS="-D warnings" cargo test` is the CI gate
Architecture

    +-------------------------------------+
    |    GitHub PR / commit / org scan    |
    +------------------+------------------+
                       |
                       v
    +---------------------------------------------------------------+
    |                     ARGUS -- Three Layers                     |
    |                                                               |
    |   Aegis Guard         Aegis Verify        Aegis Lens          |
    |   (pre-commit)  --->  (PR review)  --->   (weekly digest)     |
    |   <2s                 4-8s                5-15s               |
    |                            |                                  |
    |                            v                                  |
    |                4 specialists in parallel                      |
    |             (slop, security, arch, verdict)                   |
    |                            |                                  |
    |                            v                                  |
    |   +-----------------------------------------------------+     |
    |   |  AuditEvent (16 fields) -- EU AI Act L2 ready       |     |
    |   |  BLAKE3 chain + Ed25519 signature + BLAKE3 NDJSON   |     |
    |   |  manifest at /audit/export                          |     |
    |   +-----------------------------------------------------+     |
    |                            |                                  |
    |                            v                                  |
    |   SQLite (in-process) ---> Supabase Postgres (remote, opt.)   |
    +------------------+--------------------------------------------+
                       |
                       v
    +-------------------------------------+
    |  Dashboard (axum + htmx + SSR)      |
    |  Weekly briefings (HeyGen deeplink) |
    |  Cohort view (CodeRabbit-style)     |
    |  + /audit/export for regulators     |
    +-------------------------------------+

    External:
    ---------
    MCP server (apohara-argus-mcp) ---> Claude Code / Codex / Cursor
    A2A AgentCards                 ---> Google A2A orchestrators

Install (30 seconds)
Pick the path that matches your environment. All three ship the same MIT-licensed core.
| Path | Command | What you get |
|---|---|---|
| npm (no Rust needed) | `npx @apohara/argus --help` | The CLI + the MCP server. Downloads the right binary on first run. |
| cargo (Rust toolchain) | `cargo install apohara-argus-cli` | Just the CLI. Faster startup, no download step. |
| Docker | `docker run -e ARGUS_NIM_KEY=$YOUR_NIM_KEY suarezpm/apohara-argus --help` | Full containerized ARGUS, no host dependencies. |
Build from source
Verify the install
# or
# or
Quickstart (90 seconds end-to-end)
# 1. Get a free NVIDIA NIM key at https://build.nvidia.com/
# 2. Build everything (pure Rust, MSRV 1.88, ~1m 27s on a modern laptop)
# 3. Pre-commit guard on a local diff
|
# 4. PR review (one-shot, with the 4 specialists)
# 5. Weekly digest for an org
# 6. Start the dashboard (SSR, port 3000)
# 7. Start the MCP server (for Claude Code / Codex)
# 8. Verify EU AI Act compliance (BLAKE3 chain + manifest)
|
# → { "# manifest: { "count": 47, "b3_hash": "...", ... } }
The numbers
Numbers we measured (BENCHMARK.md), not promised:
| Metric | Value | Why it matters |
|---|---|---|
| Precision | 1.000 on 40-PR dataset | Zero false positives on the deterministic layer |
| Recall | 0.818 on 40-PR dataset | Catches 82% of AI-slop patterns; the 2 FNs are documented as rule-scope gaps |
| F1 score | 0.900 | Above the 0.70 plan target |
| Deterministic slop pass | <100ms on 10k LOC | 60-80% of LLM cost saved |
| `cargo build --release` | 1m 27s | Fast iteration |
| Tests | 194 passing | Boringly reliable |
| Per-dev cost | $0.05/month (BYOK) | 100× cheaper than CodeRabbit at scale |
| EU AI Act Art. 12 | Level 2 ready | Regulators can verify via curl /audit/export |
| Crates | 15 | 4 binaries |
| MSRV | 1.88 | Compatible with stable Rust 2024 |
| Pure Rust | 100% | No Python, no Node.js in production |
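
The three accuracy figures in the table are mutually consistent, which is easy to check. The sketch below recomputes them from confusion counts. Note that the counts used in the usage note (9 true positives, 0 false positives, 2 false negatives) are inferred from the published P, R, and "2 FNs", not copied from BENCHMARK.md, and `prf1` is a hypothetical helper.

```rust
/// Precision, recall, and F1 from raw confusion counts.
/// Returns NaN components if a denominator is zero; the sketch does not guard that.
pub fn prf1(tp: u32, fp: u32, fn_: u32) -> (f64, f64, f64) {
    let tp = f64::from(tp);
    let precision = tp / (tp + f64::from(fp));
    let recall = tp / (tp + f64::from(fn_));
    // F1 is the harmonic mean of precision and recall.
    let f1 = 2.0 * precision * recall / (precision + recall);
    (precision, recall, f1)
}
```

Calling `prf1(9, 0, 2)` gives a precision of 1.0, a recall of 9/11 (about 0.818), and an F1 of 18/20 = 0.9, matching the reported values.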
Comparison
| | ARGUS | CodeRabbit | Greptile | Qodo |
|---|---|---|---|---|
| BYOK | ✅ NVIDIA NIM | ❌ SaaS only | ❌ SaaS only | ❌ SaaS only |
| Per-dev cost | $0.05/mo | $0.10-0.50/PR | $25/mo | $40-60/mo |
| EU AI Act ready | ✅ Art.12 L2 | ❌ | ❌ | ❌ |
| Audit trail signed | ✅ Ed25519 + BLAKE3 | ❌ | ❌ | ❌ |
| MCP server | ✅ 4 tools | ❌ | ❌ | ❌ |
| A2A AgentCards | ✅ | ❌ | ❌ | ❌ |
| CordonEnforcer (synthesizer doesn't see raw code) | ✅ | ❌ | ❌ | ❌ |
| Hybrid detection (deterministic + LLM) | ✅ | ❌ LLM-only | ❌ LLM-only | ❌ LLM-only |
| Measured P/R/F1 | ✅ P=1.0, R=0.82 | ❌ | ❌ | ❌ |
| Open source | ✅ MIT | ❌ | ❌ | ❌ |
| Pure Rust | ✅ | ❌ TS/Node | ❌ TS/Node | ❌ TS/Node |
Who ARGUS is for
For the CISO
EU AI Act Art. 12 compliance is one curl, not a 6-month audit. The audit chain is BLAKE3-hash-chained and Ed25519-signed -- your regulator can verify it offline without trusting ARGUS. BYOK means your diffs never pass through an ARGUS-operated server: they go directly from your process to NIM, so data residency depends only on the NIM endpoint you choose. See docs/for-ciso.md for the full pitch.
For the engineering manager
ARGUS pays for itself in week 1 for any team of more than 3 developers:
- Per dev: 25-40 min/PR saved in review (you only edit the bot's draft) + ~15 min/week of rework avoided
- Per team of 10 devs: 4-7 hrs/week of maintainer time saved + 5-10 AI slop bugs prevented/month
- Per engineering manager: 4-6 hrs/week of manual reporting → 0 with Aegis Lens
For the OSS maintainer
Stop drowning in AI slop. Add ARGUS as a pre-commit hook or a PR webhook. P=1.0, R=0.82 on the deterministic layer means zero false positives for the rules we ship. The LLM semantic layer catches the rest. Triage in 4-8 seconds, not 40 minutes.
Roadmap (what's shipped, what's next)
The 19 features shipped (1 of 20 deliberately not done):
| # | Feature | Status |
|---|---|---|
| 1.1 | Cohort view (dashboard) | ✅ Shipped |
| 1.2 | `fix_plan.json` hand-off | ✅ Shipped |
| 1.3 | aislop CI badge | ✅ Shipped (dogfooding virtuous loop) |
| 2.1 | AuditEvent (16 fields) BLAKE3 + Ed25519 | ✅ Shipped |
| 2.2 | NDJSON audit export | ✅ Shipped (regulator-ready) |
| 2.4 | Retention in `argus health` | ✅ Shipped (warns if <180d per Art. 19) |
| 3.1 | LLM circuit breaker | ✅ Shipped (no retry storms on NIM outage) |
| 3.2 | A2A AgentCards | ✅ Shipped (Google's open protocol) |
| 4 | EU AI Act L2 conformance | ✅ Shipped (default) |
| 4.1 | Per-role model registry | ✅ Shipped (deepseek-v4 / nemotron-3 / glm-5.1) |
| 5 | MCP server | ✅ Shipped (4 tools for Claude Code/Codex/Cursor) |
| 5.1 | Deterministic slop pre-flight | ✅ Shipped (5 SLOP rules, <100ms) |
| 6.1 | Graceful shutdown | ✅ Shipped (Axum `with_graceful_shutdown`) |
| 6.2 | Idempotency-Key | ✅ Shipped (24h TTL, no double-billing) |
| 6.3 | OpenTelemetry stdout | ✅ Shipped (env-gated) |
| 6.4 | SQLite audit persistence | ✅ Shipped (sqlx 0.7) |
| 7.1 | HeyGen deeplink | ✅ Shipped (url_encode, 0% cost) |
| 8.2 | SPIFFE primitives | ✅ Shipped (spiffe 0.16) |
| 7.2 | BYVK opt-in (HeyGen/D-ID video integration) | ❌ Deliberately not done -- the $78-460/yr cost kills the $0.05/dev/month story. 7.1 (deeplink) gives 80% of the value at 0% of the cost. |
What's next (human-action items)
- crates.io publishing -- 13 crates ready; awaiting `CARGO_REGISTRY_TOKEN` repo secret
- OpenSSF Best Practices Silver -- evidence map ready at `docs/best-practices-silver.md`; awaiting form submission at `bestpractices.dev`
- First release on GitHub with SLSA L3 attestation, SHA256 manifest, and distroless Docker image
Use it. Fork it. Ship it.
License: MIT. Self-host, modify, redistribute. No telemetry, no phone-home.
Questions? Open an issue at https://github.com/SuarezPM/apohara-argus/issues.
Read the docs
| Doc | What's in it |
|---|---|
| docs/VERIFICATION.md | The 22-check local verification report |
| docs/CI-VERIFICATION.md | The 4 auto-trigger GitHub Actions workflows |
| docs/HANDS-ON-QA.md | 22/22 hands-on QA checks pass |
| docs/SCOPE-FIDELITY.md | 95/100 scope fidelity, 24/28 sub-tasks delivered |
| docs/best-practices-silver.md | OpenSSF Best Practices Silver evidence map |
| docs/BENCHMARK.md | P/R/F1 on 40 PRs + latency + cost |
| docs/pricing.md | 3 tiers (Free / Team / Enterprise) |
| docs/for-ciso.md | CISO-targeted EU AI Act pitch |
| docs/branch-protection.md | Branch protection policy + gh api snippet |
| SECURITY.md | Threat model (covers / does NOT cover) |
| GOVERNANCE.md | Roles, access continuity, fork-ability |
| CONTRIBUTING.md | DCO + coding standards + testing policy |
| CHANGELOG.md | Keep a Changelog format |
Built for the Platzi Reto AI Academy as 5 projects in one product: System of Prompts · Automate the Flow · Web App · The Agent · MVP with Real Intelligence. 1 Cargo workspace, 15 crates, 194 tests, MIT license. The verification layer for the AI-generated code era.