agent-spec
agent-spec is an AI-native BDD/spec verification tool for task execution.
The core idea is simple:
- humans review the contract
- agents implement against the contract
- the machine verifies whether the code satisfies the contract
The primary planning surface is the Task Contract. The older brief view remains available as a compatibility alias, but new workflows should use contract.
Task Contract
A task contract is a structured spec with four core parts:
- Intent: what to do, and why
- Decisions: technical choices that are already fixed
- Boundaries: what may change, and what must not change
- Completion Criteria: BDD scenarios that define deterministic pass/fail behavior
The DSL supports English and Chinese headings and step keywords.
Example
spec: task
name: "User Registration API"
tags: [api, contract]
---
## Intent
Implement a deterministic user registration API contract that an agent can code against
and a verifier can check with explicit test selectors.
## Decisions
- Use `POST /api/v1/users/register` as the only public entrypoint
- Persist a new user only after password hashing succeeds
## Boundaries
### Allowed Changes
- crates/api/**
- tests/integration/register_api.rs
### Forbidden
- Do not change the existing login endpoint contract
- Do not create a session during registration
## Completion Criteria
Scenario: Successful registration
Test: test_register_api_returns_201_for_new_user
Given no user with email "alice@example.com" exists
When client submits the registration request:
| field | value |
| email | alice@example.com |
| password | Str0ng!Pass#2026 |
Then response status should be 201
And response body should contain "user_id"
Chinese authoring is also supported:
## 意图
## 已定决策
## 边界
## 完成条件
场景: 全额退款保持现有返回结构
测试: test_refund_service_keeps_existing_success_payload
假设 存在一笔金额为 "100.00" 元的已完成交易 "TXN-001"
当 用户对 "TXN-001" 发起全额退款
那么 响应状态码为 202
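To make the four-part structure concrete, here is a minimal sketch of how a contract body could be grouped by its `## ` headings, accepting either the English or the Chinese heading. It is an illustration only: `HEADING_ALIASES` and `split_sections` are hypothetical names, and this is not how agent-spec's own parser is implemented.

```python
# Hypothetical sketch: split a task contract into its four core sections.
# The aliases mirror the English and Chinese headings shown above; the
# names used here are illustrative and are not part of agent-spec's API.
HEADING_ALIASES = {
    "Intent": "intent",
    "意图": "intent",
    "Decisions": "decisions",
    "已定决策": "decisions",
    "Boundaries": "boundaries",
    "边界": "boundaries",
    "Completion Criteria": "completion_criteria",
    "完成条件": "completion_criteria",
}


def split_sections(spec_text: str) -> dict[str, list[str]]:
    """Group non-empty body lines under the canonical name of each heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in spec_text.splitlines():
        if line.startswith("## "):
            # Unknown level-2 headings reset `current`, so their body is skipped.
            current = HEADING_ALIASES.get(line[3:].strip())
            if current is not None:
                sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections
```

Sub-headings such as `### Allowed Changes` are kept as ordinary body lines of their parent section in this sketch.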
Workflow
1. Author a task contract
Start from a template:
For rewrite/parity tasks, start from the parity-aware task template:
Or study the examples in examples/.
AI Agent Skills
This repo ships three agent skills under skills/:
- agent-spec-tool-first: the default integration path. Tells the agent to use agent-spec as a CLI tool and drive tasks through contract, lifecycle, and guard.
- agent-spec-authoring: the authoring path. Helps write or revise Task Contracts in the DSL.
- agent-spec-estimate: the estimation path. Maps Task Contract elements (scenarios, decisions, boundaries) to round-based effort estimates.
For rewrite/parity work, the authoring path should explicitly bind observable behavior before coding:
- command x output mode
- local x remote
- warm cache x cold start
- success x partial failure x hard failure
See examples/rewrite-parity-contract.spec for a concrete parity-oriented contract.
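One way to read the list above is as a set of axes whose combinations each deserve an explicit scenario. The sketch below enumerates such a matrix so that unbound combinations are easy to spot. The axis values (`list`, `show`, `text`, `json`, and so on) are placeholders invented for illustration, and treating the axes as a full cross product is an assumption; a real contract may only need a subset.

```python
from itertools import product

# Hypothetical axes for a rewrite/parity task. The concrete values are
# placeholders chosen for illustration, not taken from agent-spec.
AXES = {
    "command": ["list", "show"],
    "output_mode": ["text", "json"],
    "location": ["local", "remote"],
    "cache": ["warm", "cold"],
    "outcome": ["success", "partial_failure", "hard_failure"],
}


def behavior_matrix(axes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return every combination of axis values as a candidate scenario."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


matrix = behavior_matrix(AXES)
print(len(matrix))  # 2 * 2 * 2 * 2 * 3 = 48 candidate combinations
```

Each entry in `matrix` can then be checked against the scenarios already present in the contract; anything missing is a behavior that is still described only in prose, or not at all.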
One-line install (CLI + skills)
This installs the agent-spec CLI via cargo install (if not already present) and copies all three skills to ~/.claude/skills/.
Manual install for Claude Code
# Copy to your global skills directory
Or symlink for auto-updates:
Install for Codex
The equivalent guidance for Codex lives in AGENTS.md. Copy it to your project root:
Install for Cursor
Copy .cursorrules to your project root.
Workflow
1. Use agent-spec-tool-first to inspect the target spec and render agent-spec contract.
2. Implement code against the rendered Task Contract.
3. Run agent-spec lifecycle for the task-level gate.
4. Run agent-spec guard for repo-level validation when needed.
Before step 2, if the task is a rewrite, migration, or parity effort, use the tool-first workflow to review which observable behaviors are still unbound. If stdout/stderr, --json, -o/--output, local/remote, cache state, or fallback order are only described in prose, go back to authoring mode and add scenarios first.
This keeps the main integration mode tool-first. Library embedding remains available for advanced Rust-host integration, but it is not the default path.
2. Render the contract for agent execution
Use --format json if another tool or agent runtime needs structured output.
3. Run the full quality gate
lifecycle runs:
- lint
- verification
- reporting
The run fails if:
- lint emits an error
- any scenario fails
- any scenario is still skip or uncertain
- the quality score is below --min-score
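The four failure conditions can be sketched as a single gate function. This is a simplified model, not agent-spec's implementation: the `GateInput` fields, the severity strings, and the score scale are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class GateInput:
    """Hypothetical summary of one lifecycle run (field names are assumed)."""
    lint_severities: list[str]  # e.g. ["warning", "error"]
    verdicts: list[str]         # each one of "pass", "fail", "skip", "uncertain"
    quality_score: float        # scale is an assumption
    min_score: float            # stands in for the --min-score threshold


def failure_reasons(run: GateInput) -> list[str]:
    """Collect every reason the run would fail; an empty list means it passes."""
    reasons = []
    if "error" in run.lint_severities:
        reasons.append("lint emitted an error")
    if "fail" in run.verdicts:
        reasons.append("a scenario failed")
    if any(v in ("skip", "uncertain") for v in run.verdicts):
        reasons.append("a scenario is still skip or uncertain")
    if run.quality_score < run.min_score:
        reasons.append("quality score is below --min-score")
    return reasons
```

Returning all reasons rather than stopping at the first one mirrors the idea that skip and uncertain are treated as failures in their own right, not as a softer kind of pass.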
4. Use the repo-level guard
guard is intended for pre-commit / CI use. It lints all specs in specs/ and verifies them against the current change set.
5. Contract Acceptance (replaces Code Review)
explain renders a reviewer-friendly summary of the Contract and its verification results. Use --format markdown to produce output that can be pasted directly into a PR description, and --history to include the retry trajectory from run logs.
The reviewer judges two questions: (1) Is the Contract definition correct? (2) Did all verifications pass?
6. Stamp for traceability
Outputs git trailers (Spec-Name, Spec-Passing, Spec-Summary) for the commit message. Currently only --dry-run is supported.
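Git trailers are `Key: value` lines at the end of a commit message. The sketch below shows what a block with the three trailer keys named above might look like. Only the keys come from this document; the value formats (a lowercase boolean, a free-text summary) and the function name `format_trailers` are assumptions, so check the actual --dry-run output before relying on them.

```python
def format_trailers(name: str, passing: bool, summary: str) -> str:
    """Render the three trailers as `Key: value` lines (value formats assumed)."""
    trailers = {
        "Spec-Name": name,
        "Spec-Passing": "true" if passing else "false",
        "Spec-Summary": summary,
    }
    return "\n".join(f"{key}: {value}" for key, value in trailers.items())
```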
Explicit Test Binding
Task-level scenarios should declare an explicit Test: / 测试: selector.
Scenario: Duplicate email is rejected
Test: test_register_api_rejects_duplicate_email
If package scoping matters, use the structured selector block:
Scenario: Duplicate email is rejected
Test:
Package: user-service
Filter: test_register_api_rejects_duplicate_email
场景: 超限退款返回稳定错误码
测试:
包: refund-service
过滤: test_refund_service_rejects_refund_exceeding_original_amount
This is the default quality rule for self-hosting and new task specs. The older // @spec: source annotation is still accepted as a compatibility fallback, but it should not be the primary authoring path.
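The inline and structured selector forms shown above carry the same information, which a small normalizer can make explicit. This is a sketch under assumptions: `parse_test_selector` is a hypothetical helper, and mapping the inline `Test:` value onto the same `filter` key as the structured `Filter:` entry is an interpretation, not documented behavior.

```python
def parse_test_selector(lines: list[str]) -> dict[str, str]:
    """Normalize an inline or structured Test selector into package/filter keys."""
    keys = {
        "Test": "filter", "测试": "filter",      # inline form carries the filter
        "Package": "package", "包": "package",
        "Filter": "filter", "过滤": "filter",
    }
    selector: dict[str, str] = {}
    for raw in lines:
        key, sep, value = raw.strip().partition(":")
        if not sep or key.strip() not in keys:
            continue  # not a selector line
        if value.strip():  # a bare "Test:" only opens the structured block
            selector[keys[key.strip()]] = value.strip()
    return selector
```

With this shape, a caller can treat a missing `package` key as "no package scoping" and pass `filter` to the test runner either way.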
Boundaries And Change Sets
Boundaries can contain both natural-language constraints and path constraints. Path-like entries are mechanically enforced against a change set.
Examples:
## Boundaries
### Allowed Changes
- crates/spec-parser/**
- crates/spec-gateway/src/lifecycle.rs
### Forbidden
- tests/golden/**
- docs/archive/**
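A minimal sketch of how path-like boundary entries could be checked against a change set follows. Two assumptions are baked in: Python's `fnmatchcase` stands in for whatever glob matcher agent-spec actually uses, and a changed file that matches no Allowed Changes entry is treated as a violation. `check_boundaries` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def check_boundaries(changed, allowed, forbidden):
    """Return (path, reason) pairs for every changed path that breaks a boundary.

    Assumption: fnmatchcase is a stand-in matcher. Its `*` also crosses `/`,
    so a pattern like `dir/**` matches everything under `dir/`.
    """
    violations = []
    for path in changed:
        if any(fnmatchcase(path, pat) for pat in forbidden):
            violations.append((path, "forbidden"))
        elif allowed and not any(fnmatchcase(path, pat) for pat in allowed):
            violations.append((path, "outside allowed changes"))
    return violations
```

For the boundaries shown above, a change set containing `crates/spec-parser/src/lib.rs` would pass, while `tests/golden/a.txt` would be reported as forbidden.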
The relevant commands accept repeatable --change flags:
Single-task commands also support optional VCS-backed change discovery:
Available scopes: none (default for verify/lifecycle), staged, worktree, jj.
When a .jj/ directory is detected (even colocated with .git/), use --change-scope jj to discover changes via jj diff --name-only. The stamp command also outputs a Spec-Change: trailer with the jj change ID, and explain --history shows file-level diffs between adjacent runs via jj operation IDs.
AI Verifier Skeleton
agent-spec now includes a minimal AI verifier surface intended to make uncertain results explicit and inspectable before a real model backend is wired in.
The relevant commands accept:
Available modes:
- off: default, preserves the current mechanical-verifier-only behavior
- stub: turns otherwise-uncovered scenarios into uncertain results with AiAnalysis evidence
- caller: the calling Agent acts as the AI verifier (two-step protocol)
caller mode enables the Agent running agent-spec to also serve as the AI verifier. When lifecycle --ai-mode caller finds skipped scenarios, it writes AiRequest objects to .agent-spec/pending-ai-requests.json. The Agent reads the requests, analyzes each scenario, writes ScenarioAiDecision JSON, then calls resolve-ai --decisions <file> to merge decisions back into the report.
stub mode does not claim success. It is only a scaffold for:
- explicit uncertain semantics
- structured AI evidence in reports
- future integration of a real model-backed verifier
Internally, the AI layer now uses a pluggable backend shape:
- AiRequest: structured verifier input
- AiDecision: structured verifier output
- AiBackend: provider abstraction used by AiVerifier
- StubAiBackend: built-in backend for deterministic local behavior
No real model provider is wired in yet. The current value is that the contract/reporting surface is now stable enough to add a real backend later without redesigning the verification pipeline.
Provider selection and configuration are intentionally out of scope for agent-spec itself. The intended embedding model is:
- the host agent owns provider/model/auth/timeout policy
- the host agent injects an AiBackend into spec-gateway
- agent-spec stays focused on contracts, evidence, and verification semantics
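The injection model described above can be illustrated with a small Python analogue. The type names are borrowed from this section, but everything else is assumed: the fields, the `analyze` and `verify` method names, and the verdict strings are hypothetical, and the real types live in the Rust crates and may look quite different.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AiRequest:
    scenario_name: str  # assumed field
    steps: list[str]    # assumed field


@dataclass
class AiDecision:
    scenario_name: str
    verdict: str        # assumed values: "pass", "fail", "uncertain"
    reasoning: str


class AiBackend(Protocol):
    """Provider abstraction: the host decides what sits behind it."""
    def analyze(self, request: AiRequest) -> AiDecision: ...


class StubAiBackend:
    """Deterministic local backend that never claims success."""
    def analyze(self, request: AiRequest) -> AiDecision:
        return AiDecision(request.scenario_name, "uncertain",
                          "stub backend: no model consulted")


class AiVerifier:
    def __init__(self, backend: AiBackend) -> None:
        self.backend = backend  # injected by the host agent

    def verify(self, requests: list[AiRequest]) -> list[AiDecision]:
        return [self.backend.analyze(r) for r in requests]
```

Because the verifier depends only on the backend interface, provider, model, auth, and timeout policy stay in the host, and swapping the stub for a real backend does not touch the verification pipeline.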
guard resolves change paths in this order:
1. explicit --change arguments
2. auto-detected git changes according to --change-scope, if the current workspace is inside a git repo
3. an empty change set, if no git repo is available
guard defaults to --change-scope staged, which keeps pre-commit behavior stable.
If you want stronger boundary checks against the full current workspace, use:
worktree includes:
- staged files
- unstaged tracked changes
- untracked files
This makes guard practical for both pre-commit usage and broader local worktree validation without forcing users to enumerate changed files manually.
For consistency, verify and lifecycle use the same precedence when --change-scope is provided. The default scope for each command is:
- verify: none
- lifecycle: none
- guard: staged
Commands
- parse: parse .spec files and show the AST
- lint: analyze spec quality
- verify: verify code against a single spec
- contract: render the Task Contract view
- lifecycle: run lint + verify + report
- guard: lint all specs and verify them against the current change set
- explain: generate a human-readable contract review summary (for Contract Acceptance)
- stamp: preview git trailers for a verified contract (--dry-run)
- resolve-ai: merge external AI decisions into a verification report (caller mode)
- checkpoint: preview VCS-aware checkpoint status
- install-hooks: install git hooks for automatic checking
- brief: compatibility alias for contract
- measure-determinism: [experimental] measure contract verification variance
Examples
See examples/:
- examples/user-registration-contract.spec
- examples/refactor-payment-service.spec
- examples/refund.spec
- examples/no-unwrap.spec
Current Status
The current system is strongest when the contract can be checked by:
- explicit tests selected from Completion Criteria
- structural checks
- boundary checks against an explicit or staged change set
More advanced verifier layers can still be added, but the current model is already sufficient for self-hosting agent-spec with task contracts.
Contributing
agent-spec is self-bootstrapping: the project uses itself to govern its own development. When you contribute, you follow the same Contract-driven workflow that agent-spec teaches.
The contribution flow
Every change starts with a Task Contract. Before writing code, create a .spec file in specs/ that defines what you're building — the intent, the technical decisions that are already fixed, the files you'll touch, and the BDD scenarios that define "done." Then implement against the Contract and verify with lifecycle.
# 1. Create a task contract for your change
# Edit the generated spec: fill in Intent, Decisions, Boundaries, Completion Criteria
# 2. Check that the contract itself is well-written
# 3. Implement your change
# 4. Verify against the contract
# 5. Run the repo-wide guard before committing
# 6. Generate the PR description
The guard pre-commit hook is installed via agent-spec install-hooks. It checks all specs in specs/ against your staged changes — your commit will be blocked if any contract fails.
Project-level rules
The file specs/project.spec defines constraints that every task spec inherits. Read it before writing your first Contract — it tells you what the project enforces globally (e.g. "all public CLI behavior must have regression tests," "verification results must distinguish pass/fail/skip/uncertain").
Roadmap specs
Future work lives in specs/roadmap/. These are real Task Contracts but they are not checked by the default guard run. When a roadmap spec is ready for implementation, promote it to the top-level specs/ directory. See specs/roadmap/README.md for the promotion rule.
Using AI agents to contribute
If you use Claude Code, Codex, Cursor, or another AI coding agent, install the skills from the skills/ directory (see AI Agent Skills above).
The agent-spec-tool-first skill tells the agent to read the Contract first, implement within its Boundaries, run lifecycle to verify, and retry on failure without modifying the spec. The agent-spec-authoring skill helps the agent draft or revise Task Contracts in the DSL. The agent-spec-estimate skill maps Contract elements to round-based effort estimates for sprint planning.
For agents without skill support, the project includes AGENTS.md (Codex), .cursorrules (Cursor), and .aider.conf.yml (Aider) with the essential command reference.
What we review
Pull requests are evaluated through Contract Acceptance, not line-by-line code review. The reviewer checks two things: is the Contract definition correct (does it capture the right intent and edge cases), and did all verifications pass (lifecycle reports all-green). If both are yes, the PR is approved.
This means the quality of your Contract matters as much as the quality of your code. A well-written Contract with thorough exception-path scenarios is a stronger contribution than clever code with a thin spec.