veritas
veritas is a Tree-sitter testing oracle for AI-written and AI-modified software.
It is a CLI harness for mutation testing, property testing, fuzzing, coverage feedback, corpus replay, differential behavior checks, and evolutionary analysis across Rust, Go, Python, and future Tree-sitter language plugins.
It answers the question ordinary test runs often miss:
Would the current tests catch the subtle mistakes an AI coding agent is likely to make?
veritas maps changed code to verification targets, generates reviewable harnesses, runs scoped tests under budgets, and writes CI-friendly reports plus AI-ready repair prompts.
The default path is deterministic and does not call an LLM. An optional external planner hook can be enabled for AI-assisted planning while veritas still owns execution scope, budgets, and artifact writes.
Project site: Jacobious52.github.io/veritas
Why It Feels Different
- It gives an AI agent a concrete next-test queue instead of a vague "add more tests" warning.
- It keeps generated tests reviewable and removable through `.veritas/artifacts` and `veritas cleanup`.
- It is built around a generic plugin contract: Rust, Go, and Python work today, and future languages can reuse the same reports through Tree-sitter symbols, line ranges, command budgets, mutation campaigns, replay, and scoring.
- It is designed for bigger repos: changed-target selection, package/workspace awareness, command budgets, optional Rust cgroup/systemd limits, phase timing telemetry, CI profiles, benchmark fixtures, and external canaries.
Install
Prebuilt Linux and macOS binaries:
curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | sh
Install a specific release:
| VERSION=v0.1.1
Cargo fallback:
From the Git repository:
For local development:
Optional tools:
# Go verification
# Python verification
# Rust coverage, only used when coverage_enabled = true
Quick Start
Bootstrap a repo:
Use veritas on a changed branch:
Verify a specific target:
Explain and promote findings:
What a useful run looks like:
mutation survived: refund_cents <= available_cents -> refund_cents < available_cents
fuzz seed saved: " 12.34 " reproduced parser drift
replay drift: AuthorizeRefund("support", 500) changed behavior
next agent step: promote assertion candidate, rerun, keep only if the mutant dies
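The first line of that output can be made concrete with a small sketch. This is illustrative Python, not veritas output: `can_refund`, `can_refund_mutant`, and the two suite functions are hypothetical names chosen to mirror the `refund_cents <= available_cents` mutant above. It shows why a suite that never probes the boundary lets the mutant survive, and how one promoted boundary assertion kills it.

```python
def can_refund(refund_cents: int, available_cents: int) -> bool:
    """Original behavior: a refund equal to the available balance is allowed."""
    return refund_cents <= available_cents


def can_refund_mutant(refund_cents: int, available_cents: int) -> bool:
    """The mutant from the report: `<=` replaced by `<`."""
    return refund_cents < available_cents


def weak_suite(fn) -> bool:
    """Handwritten tests that never probe the boundary."""
    return fn(100, 500) is True and fn(900, 500) is False


def strengthened_suite(fn) -> bool:
    """The same tests plus the promoted boundary assertion."""
    return weak_suite(fn) and fn(500, 500) is True


# The weak suite passes for both versions, so the mutant survives.
print(weak_suite(can_refund), weak_suite(can_refund_mutant))
# The boundary assertion passes on the original and fails on the mutant.
print(strengthened_suite(can_refund), strengthened_suite(can_refund_mutant))
```

The "keep only if the mutant dies" step corresponds to the last line: the new assertion is worth keeping because it separates the original from the mutant.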
Documentation
- AI Agent Guide: copy-paste instructions and review loop for coding agents.
- Install Guide: release binary, cargo, git, and GitHub Actions setup.
- AI Verification Loops: tangible Rust, Go, Python, and agent-loop examples.
- Project Site: GitHub Pages landing page and public overview.
- Evolution Demo: real before/candidate/after loop from the Go evolution fixture.
- Production Guide: large-repo Go/Rust operation, budgets, CI policy, and host safety.
- Architecture: workspace layout, plugin contract, artifacts, and planner model.
- Plugin SDK: language plugin contract and the Python plugin path.
- Confidence Guide: fixture tiers, seeded examples, and external canaries.
- Releasing: crates.io publishing through GitHub Actions.
CLI Surface
Capabilities
Language and plugin model:
- Rust, Go, and Python plugins are available today
- Tree-sitter discovery provides symbols, methods, line ranges, and risk surfaces where grammars support them
- each plugin owns language-specific discovery, generated artifacts, command execution, coverage, replay compilation, and mutation operators
- the core owns shared scoring, policy, replay manifests/results, baselines, corpus entries, mutation campaign records, evolution suites, SARIF/JUnit/Markdown rendering, and AI repair prompts
- future language plugins can add their own Tree-sitter grammar and map into the same target/report/artifact contract
Changed-target verification:
- reads git diffs, staged changes, and untracked files
- maps changed lines to discovered Rust/Go/Python symbols when line ranges are available
- scopes package commands to changed packages and selected reverse dependencies where graph data exists
- writes AI review artifacts with change digests and verification guidance
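As a rough illustration of the line-to-symbol mapping described above, the sketch below intersects changed line numbers with symbol line ranges. It is a simplified model, not the veritas implementation: the `Symbol` type, the `changed_symbols` function, and the sample ranges are all hypothetical, and the real tool derives ranges from Tree-sitter and changed lines from git.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive


def changed_symbols(symbols: list[Symbol], changed_lines: set[int]) -> list[str]:
    """Return names of symbols whose line range contains a changed line."""
    return [
        s.name
        for s in symbols
        if any(s.start_line <= line <= s.end_line for line in changed_lines)
    ]


# Hypothetical symbol table for one file.
symbols = [
    Symbol("parse_amount", 1, 12),
    Symbol("authorize_refund", 14, 40),
    Symbol("format_receipt", 42, 60),
]

# Lines 15 and 16 fall inside authorize_refund; line 41 is between symbols.
print(changed_symbols(symbols, {15, 16, 41}))
```

A changed line that falls outside every range (line 41 here) selects no symbol, which is one reason the mapping only applies "when line ranges are available".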
Rust verification:
- detects packages and virtual workspaces through `Cargo.toml`
- discovers public free functions and public methods with Tree-sitter
- writes package-local `proptest` integration harnesses for supported public free functions, including no-panic and deterministic-output properties where signatures allow them
- runs `cargo test --all-targets` with configurable jobs, test threads, command timeouts, and optional systemd scope limits
- runs AST-scoped mutation probes, including comparison, boundary, async/task, synchronization, database, retry, testability, and brittleness domains, then reports correctness survivors separately from behavior-preserving brittleness probes
- collects `cargo llvm-cov --summary-only` when enabled
- writes Rust symbol graph artifacts under `.veritas/symbol_graph/`
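The no-panic and deterministic-output properties named in the list above are language-independent. The sketch below expresses them in Python for brevity; it is not the generated `proptest` harness, and `parse_cents` and `check_properties` are hypothetical names invented for this example.

```python
def parse_cents(text: str) -> int:
    """Hypothetical function under test: parse "12.34" into 1234 cents."""
    dollars, _, cents = text.strip().partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])


def check_properties(fn, inputs):
    """Return (input, reason) pairs that violate either property."""
    failures = []
    for value in inputs:
        try:
            first = fn(value)
            second = fn(value)
        except Exception as exc:
            # No-panic property: the function must not raise on this input.
            failures.append((value, f"raised {type(exc).__name__}"))
            continue
        if first != second:
            # Deterministic-output property: same input, same output.
            failures.append((value, "nondeterministic output"))
    return failures


# The empty string makes int("") raise, so it is reported as a violation.
print(check_properties(parse_cents, ["12.34", " 12.34 ", "7", ""]))
```

A property-testing library would generate the inputs instead of taking a fixed list; the fixed list here only keeps the sketch dependency-free.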
Go verification:
- detects one or more `go.mod` roots
- discovers exported functions and methods with Tree-sitter
- builds package graphs with `go list -json ./...`
- runs scoped `go test` commands for selected packages plus configurable reverse dependencies
- discovers handwritten and generated fuzz targets
- writes `testing.F` fuzz harnesses for exported free functions with supported Go fuzz parameter types and edge-case seed rows
- runs relevant `go test -run=^$ -fuzz=...` targets through a bounded scheduler within caps and timeouts
- applies build tags to Go list, test, fuzz, coverage, and mutation commands
- runs AST-scoped mutation probes for comparisons, nil/error branches, return defaults, boolean connectors, arithmetic and bitwise operators, assignment operators, increment/decrement statements, unary negation, loop control, literal flips, self-assignments, goroutine/defer/context lifecycle, locks, transactions, tenant/idempotency strings, retry/backoff seams, and domain-labeled risk surfaces
- writes package graph, package-awareness, and symbol graph artifacts
Python verification:
- detects Python projects through `pyproject.toml` or Python source roots
- discovers functions with Tree-sitter and emits symbol graph artifacts
- runs `python3 -m pytest -q` when the project prefers pytest and it is installed, otherwise falls back to `python3 -m unittest discover`
- writes reviewable Hypothesis property candidates and executes them when both `hypothesis` and `pytest` are installed, otherwise records a skipped command
- collects coverage through `coverage.py` when enabled
- runs executable source-range mutation checks for supported comparisons, boolean connectors, default returns, database strings, async/testability seams, and brittleness probes
- supports replay cases for primitive single-argument and multi-argument public functions
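To give a feel for what a source-range comparison mutation looks like, here is a minimal sketch built on Python's standard `ast` module. This is an assumption-heavy stand-in: veritas itself works from Tree-sitter, and the `FlipLessEqual` class, the sample source, and the chosen line range are all hypothetical.

```python
import ast


class FlipLessEqual(ast.NodeTransformer):
    """Replace `<=` with `<` for comparisons inside an inclusive line range."""

    def __init__(self, start_line: int, end_line: int) -> None:
        self.start_line = start_line
        self.end_line = end_line

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        if self.start_line <= node.lineno <= self.end_line:
            node.ops = [ast.Lt() if isinstance(op, ast.LtE) else op for op in node.ops]
        return node


SOURCE = '''\
def can_refund(refund_cents, available_cents):
    return refund_cents <= available_cents

def within_limit(value, limit):
    return value <= limit
'''

# Mutate only lines 1-2, so the comparison in within_limit is left untouched.
tree = ast.parse(SOURCE)
mutated = ast.unparse(FlipLessEqual(1, 2).visit(tree))
print(mutated)
```

Scoping the transformer to a line range is what keeps the mutant local to the changed function; a test run against `mutated` would then show whether any assertion notices the flipped operator. `ast.unparse` requires Python 3.9 or newer.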
Reports and artifacts:
- renders Markdown, JSON, SARIF 2.1.0, and compact JUnit XML
- saves the latest report to `.veritas/report.json`
- lists and previews candidate mutants without executing tests through `veritas mutants list`, including JSON output, byte-range spans, diff previews, shard/filter controls, risk notes, and suggested tests
- runs benchmark suites from `veritas-bench.toml` in temporary project copies and scores expected findings, commands, thresholds, and metrics
- reports mutation score attribution/trends, per-mutant campaign records, per-run survivor diffs/logs, assertion candidates, corpus entries/replay, differential replay cases, budget skips/timeouts, property-test strength, fuzz execution, and persisted repro counts in `.veritas/report.json`
- summarizes current confidence and baseline deltas with `veritas score`
- writes API signature baselines and accepted finding baselines
- writes coverage feedback, mutation feedback, assertion candidates, corpus entries, replay manifests/results, budget plans, mutation trend JSON, mutation campaign JSON, tail-able mutation run directories under `.veritas/mutations/runs/`, evolutionary candidate suites and generation outcomes with fitness/selection signals, repro notes, candidate verification patches, regression notes, evolution plans, promoted regression scaffolds, and promotion notes
- `veritas evolve --index <n> --evaluate` and `--all-selected --evaluate` now emit before/after proof artifacts and remove generated candidates that regress or fail evaluation
- `veritas conformance` checks the plugin contract for stable IDs, source-relative paths, function symbols, line ranges, and existing target files
- cleans generated artifacts with `veritas cleanup`
Scale and performance posture:
- changed branches are verified before full-repo sweeps; `--changed` is the default CI profile path
- Go package graphs and Rust workspace discovery keep command scope close to the edited surface
- command budgets, fuzz caps, mutation caps, package caps, and policy filters are configurable per repo
- Rust test and coverage commands can run inside systemd scopes with CPU and memory limits on shared hosts
- target discovery writes `.veritas/cache/<language>_targets.json` and reports cache hits as `target_cache` artifacts so stable large-repo scans can avoid repeated Tree-sitter discovery
- every report records phase timings for discovery, generation, test execution, coverage, replay, synthesis, and total runtime
- benchmark suites and external canaries track whether veritas still works beyond tiny fixtures
- near-term performance goals are plugin-safe concurrency, adaptive mutation sampling, and reusable corpus/baseline data across runs
CI behavior:
- `.github/workflows/ci.yml` runs format, workspace tests, clippy, and Rust/Go/Python fixture scan/verify smoke checks on pull requests and pushes to `main`
- CI also runs `veritas conformance` across the Rust, Go, and Python fixtures
- `veritas verify --profile ci` implies `--changed`
- CI profile disables full coverage, tightens package/fuzz/mutation/time caps, and enables policy-based failure on error severity by default
- policy filters can select severity, language, artifact kind, and target risk
- accepted finding IDs support new-findings-only CI behavior
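The interaction between severity-based failure and accepted finding IDs can be sketched as follows. This is a simplified model under stated assumptions, not the veritas policy engine: the `Finding` type, the `should_fail` function, the finding IDs, and the `info`/`warning` severity names and their ordering are hypothetical (only `error` appears in this README).

```python
from dataclasses import dataclass

# Assumed severity ordering; only "error" is named in the README.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


@dataclass(frozen=True)
class Finding:
    id: str
    severity: str


def should_fail(findings, accepted_ids, fail_on="error"):
    """Fail only on unaccepted findings at or above the threshold severity."""
    threshold = SEVERITY_RANK[fail_on]
    new_findings = [f for f in findings if f.id not in accepted_ids]
    return any(SEVERITY_RANK[f.severity] >= threshold for f in new_findings)


findings = [
    Finding("F-001", "error"),
    Finding("F-002", "warning"),
    Finding("F-003", "error"),
]

# Both errors are already accepted, so only a warning is new: CI passes.
print(should_fail(findings, accepted_ids={"F-001", "F-003"}))
# F-003 is a new error-severity finding: CI fails.
print(should_fail(findings, accepted_ids={"F-001"}))
```

Filtering by accepted IDs before applying the severity threshold is what makes an existing backlog tolerable in CI while still blocking regressions.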
Consumer GitHub Actions starter:
name: Veritas
on:
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
      - run: curl -fsSL https://github.com/Jacobious52/veritas/releases/latest/download/install.sh | sh
      - run: veritas verify --changed --profile ci
      - run: veritas repair-prompt --github-step-summary
        if: always()
Config
Create veritas.toml or .veritas.toml in the target repo:
[]
= 120
= true
= true
= false
[]
= "deterministic"
# mode = "external_llm"
# command = "my-veritas-planner"
# fail_on_error = false
[]
= "error"
= []
= []
= []
= 70
= 70
= 80
[]
# Shared by language plugins. Operator names are intentionally generic so
# future Tree-sitter plugins can map their own AST mutations onto the same
# campaign/report model.
= []
= []
= []
= []
= []
= []
= []
= []
= []
= []
= []
= []
= false
= false
= 8
= false # set true to run the broader verification package set for every mutant
= false # set true to derive mutation timeout metadata from the baseline test duration
= 1 # Rust/Go use isolated temp roots when workers > 1; keep small repos serial by default
= [] # extra names or relative paths to skip in isolated mutation copies
= 1
= 1
= 10
= 120
= 0
= 1
= [] # e.g. ["lived", "not_covered", "timed_out"]
[]
= "proptest"
= 120
= false
= 120
= 1
= 1
= false
= "4G"
= "200%"
[]
= 10
= true
= 2
= true
= 1
= 20
= 120
= 64
= 8
= []
By default mutation runs select the narrowest package-level test commands the plugin can justify. Rust uses symbol/package ownership; Go uses the package graph plus reverse dependencies. Set disable_test_selection = true when a repo has global integration fixtures, hidden build tags, or cross-package side effects that make broad mutation commands safer than local selection.
Mutation filters are evaluated as include filters first, then exclude filters. Patterns accept exact:..., glob:... or * wildcards, and regex:...; legacy unprefixed patterns keep substring matching. Use include_target_ids / exclude_target_ids for lang:path:symbol targets and include_mutant_ids / exclude_mutant_ids for stable per-mutant IDs. Add veritas:skip-mutation inside a Rust, Go, or Python function to suppress local mutants, and set report_filtered = true when filtered candidates should appear as skipped records.
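The filter semantics described above can be sketched in a few lines. This is an illustrative reading of the rules, not the veritas matcher: `matches` and `selected` are hypothetical names, the sample target IDs are invented, and two details are assumptions (an empty include list selects everything, and `regex:` patterns use search rather than full-match semantics).

```python
import fnmatch
import re


def matches(pattern: str, value: str) -> bool:
    """Match one filter pattern against a target or mutant ID."""
    if pattern.startswith("exact:"):
        return value == pattern[len("exact:"):]
    if pattern.startswith("glob:"):
        return fnmatch.fnmatchcase(value, pattern[len("glob:"):])
    if pattern.startswith("regex:"):
        return re.search(pattern[len("regex:"):], value) is not None
    if "*" in pattern:
        # Unprefixed pattern with a wildcard is treated as a glob.
        return fnmatch.fnmatchcase(value, pattern)
    # Legacy unprefixed pattern: substring match.
    return pattern in value


def selected(value: str, include: list[str], exclude: list[str]) -> bool:
    """Apply include filters first, then exclude filters."""
    if include and not any(matches(p, value) for p in include):
        return False
    return not any(matches(p, value) for p in exclude)


target = "go:billing/refund.go:AuthorizeRefund"
print(selected(target, ["glob:go:billing/*"], []))
print(selected(target, ["glob:go:billing/*"], ["regex:Refund$"]))
```

Note that `fnmatch` also treats `?` and `[...]` as special characters, which may be broader than the wildcard syntax veritas accepts.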
For shared machines, keep Rust coverage disabled unless needed and enable systemd scope limits:
[]
= false
= true
= 1
= 1
= "4G"
= "200%"
Development
Run the workspace checks:
Run fixture checks:
Run the richer example beds:
( && )
( && )
( && )
( && )
The example projects intentionally contain hidden assumptions while their handwritten tests pass, so they are useful for validating generated property/fuzz artifacts and report output.
Run the concrete evolution demo:
The seeded fixture starts with 14 evolution candidates, 12 selected candidates, 4 surviving mutants, and a 55 confidence score. Promoting the top ParseInvoiceTotal candidate into owned assertions raises the mutation score from 58% to 91%, removes the surviving mutants, and raises the confidence score to 98. See docs/evolution.md for the exact before/candidate/after commands and artifact paths.
Run external canary smoke checks when you want confidence against real pinned repositories:
The same canaries run weekly in GitHub Actions and can be started manually from the External Canaries workflow. large-smoke adds pinned larger Rust, Go, and Python repositories from canaries/pinned-repos.json while keeping them scan-only by default. Each run writes target/external-fixtures/reports/canary-dashboard.md with scan/verify tiers and trend deltas. Set VERITAS_CANARY_MIN_TIER, VERITAS_CANARY_MIN_CONFIDENCE, or VERITAS_CANARY_MAX_FINDINGS when a canary dashboard should fail CI on a missed threshold.
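A threshold gate of the kind described above might look like the sketch below. It is a hypothetical illustration, not the canary script: `canary_failures` is an invented name, the threshold values are made up, and `VERITAS_CANARY_MIN_TIER` is left out because this README does not list the tier names.

```python
import os


def canary_failures(confidence: int, findings: int, env=None) -> list[str]:
    """Return human-readable threshold misses; an empty list means pass."""
    env = os.environ if env is None else env
    failures = []
    min_confidence = env.get("VERITAS_CANARY_MIN_CONFIDENCE")
    max_findings = env.get("VERITAS_CANARY_MAX_FINDINGS")
    # An unset variable means that threshold is not enforced.
    if min_confidence is not None and confidence < int(min_confidence):
        failures.append(f"confidence {confidence} < {min_confidence}")
    if max_findings is not None and findings > int(max_findings):
        failures.append(f"findings {findings} > {max_findings}")
    return failures


# Example thresholds passed explicitly instead of read from the real environment.
env = {"VERITAS_CANARY_MIN_CONFIDENCE": "70", "VERITAS_CANARY_MAX_FINDINGS": "5"}
print(canary_failures(confidence=82, findings=3, env=env))
print(canary_failures(confidence=55, findings=9, env=env))
```

In CI, a non-empty result would translate into a non-zero exit code so the dashboard run fails on a missed threshold.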