ReasonKit Think MCP
Crate & binary name on crates.io: reasonkit-think-mcp (install adds the reasonkit-think-mcp executable).
What this is
reasonkit-think is a Rust MCP server designed to make agent reasoning:
- more structured
- more auditable
- less hallucination-prone
- easier to inspect and reuse
It ships a stable v1 sequential-thinking surface and a fully functional v2 deep deliberation workflow (ToT + verification + governance).
Why it matters
Most agent stacks either:
- reason shallowly and move fast, or
- reason deeply but become hard to debug and trust.
ReasonKit Think aims to combine both:
- Quick mode for speed and clarity
- Deliberate mode for branch-and-verify reasoning
- Governed mode for high-stakes route decisions
Reasoning Model Vision
flowchart LR
A[Quick CoT] --> B[Deliberate ToT]
B --> C[Verification Layer]
C --> D[ReasonKit Governance]
D --> E[Calibrated Route Decision]
Current MCP Surface (v1)
Tools
- sequentialthinking_tools
- get_thinking_history
- clear_thinking_history
Prompt
sequential-thinking-guidance
Behavior
- session-scoped bounded history (MAX_HISTORY_SIZE, default 1000)
- recommendation validation against available tools
- prompt-injection pattern scan and redaction
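The bounded-history and redaction behavior listed above can be pictured with a small sketch. It is written in Python for brevity and is not the server's Rust implementation; the `SessionHistory` class, the `redact` helper, and the two regex patterns are hypothetical stand-ins, and only `MAX_HISTORY_SIZE` and its default of 1000 come from this README.

```python
import re
from collections import deque

MAX_HISTORY_SIZE = 1000  # default named in this README

# Hypothetical patterns; the server's real pattern list is not shown here.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal (the )?system prompt", re.IGNORECASE),
]


def redact(text: str) -> str:
    """Replace every matched injection pattern with a fixed marker."""
    for pattern in INJECTION_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


class SessionHistory:
    """Keeps one bounded list of thoughts per session id."""

    def __init__(self, max_size: int = MAX_HISTORY_SIZE) -> None:
        self.max_size = max_size
        self._sessions = {}

    def add(self, session_id: str, thought: str) -> None:
        history = self._sessions.setdefault(session_id, deque(maxlen=self.max_size))
        history.append(redact(thought))  # the oldest entry is dropped once full

    def get(self, session_id: str) -> list:
        return list(self._sessions.get(session_id, ()))

    def clear(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

A `deque` with `maxlen` gives the "bounded" property for free: appending to a full history silently evicts the oldest thought instead of growing without limit.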
Deep MCP Surface (v2)
Available tools
- start_deliberation
- reasoning_intent_router (maps normal wording like "use CoT/ToT/verify/audit" to concrete MCP calls)
- reasoning_autopilot (safe default auto-execution of routed deep reasoning flow)
- set_verification_policy
- set_reasoning_aliases (customize natural-language trigger vocabulary without code changes)
- expand_thoughts
- score_thoughts
- prune_thoughts
- verify_thoughts
- consensus_answer
- run_reasonkit_pipeline
- export_reasoning_audit
Available prompts
- tot-planner-guidance
- diverse-lens-pack
- verification-pack
- triangulation-pack
- decision-routing-pack
Available resources
- reasoning://session/{id}/graph
- reasoning://session/{id}/frontier
- reasoning://session/{id}/verification-matrix
- reasoning://session/{id}/route-decision
- reasoning://config/aliases
- reasoning://schemas/{name}
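The resource URIs above are templates in which `{id}` and `{name}` are filled in per session or schema. A minimal Python sketch of that expansion follows; the helper names and the `demo-session` id are hypothetical, and only the URI shapes come from the list above.

```python
# The four per-session views named in the resource list.
SESSION_VIEWS = ("graph", "frontier", "verification-matrix", "route-decision")


def session_resource_uri(session_id: str, view: str) -> str:
    """Expand reasoning://session/{id}/<view> for one session."""
    if view not in SESSION_VIEWS:
        raise ValueError(f"unknown session view: {view}")
    return f"reasoning://session/{session_id}/{view}"


def all_session_uris(session_id: str) -> list:
    """Return the URIs of every per-session view, in list order."""
    return [session_resource_uri(session_id, view) for view in SESSION_VIEWS]


def schema_uri(name: str) -> str:
    """Expand reasoning://schemas/{name}."""
    return f"reasoning://schemas/{name}"


if __name__ == "__main__":
    for uri in all_session_uris("demo-session"):  # hypothetical session id
        print(uri)
```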
Install & verify (quick path)
Requirements
- Rust 1.95+ (see Cargo.toml rust-version; match with rustc --version).
- rmcp-compatible MCP host (stdio transport).
- Optional: just for the mcp-refresh recipe, Python 3 for scripts/smoke_test.py.
Binary path
Replace /ABSOLUTE/PATH/TO/reasonkit-think-mcp in the snippets below with either:
- From source: /absolute/path/to/reasonkit-think/target/release/reasonkit-think-mcp after cargo build --release, or
- From crates.io: ~/.cargo/bin/reasonkit-think-mcp if that directory is on your PATH, or the path reported by cargo install reasonkit-think-mcp.
Build from source
The server speaks MCP over stdio; when run without a connected client it will sit waiting on stdin (this is expected).
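To see why the process appears to hang, it helps to look at what a client would write to stdin. The Python sketch below frames an MCP `initialize` request as one newline-terminated JSON-RPC line, which is how the MCP stdio transport delimits messages. The `protocolVersion` value and the `clientInfo` fields are assumptions chosen for illustration, and `probe` is a hypothetical helper that only works once you pass it the real binary path.

```python
import json
import subprocess


def frame(message: dict) -> bytes:
    """Encode one JSON-RPC message as a single newline-terminated line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(request_id: int = 1) -> dict:
    """Build an MCP initialize request (field values are illustrative)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-06-18",  # assumption: use what your host negotiates
            "capabilities": {},
            "clientInfo": {"name": "probe", "version": "0.0.0"},  # hypothetical client
        },
    }


def probe(binary_path: str) -> dict:
    """Start the server, send initialize, and return its first reply line as JSON."""
    proc = subprocess.Popen(
        [binary_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    try:
        proc.stdin.write(frame(initialize_request()))
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())
    finally:
        proc.kill()
        proc.wait()
```

Calling `probe("/ABSOLUTE/PATH/TO/reasonkit-think-mcp")` with the placeholder replaced should return the server's `initialize` result; without any such input the server simply keeps waiting.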
Install from crates.io
cargo install reasonkit-think-mcp  # binary: reasonkit-think-mcp
Quick health check (recommended)
You should see "status": "ok" and lists of v1 + v2 tools and prompts.
Rebuild + MCP wiring validation (Cursor / umbrella workspace)
This rebuilds the release binary, validates a workspace .mcp.json when present, checks the executable, and runs the smoke tests. If just is not installed, run the same steps manually: cargo build --release, then ./scripts/smoke_test.py.
Configuration audit (this repository vs your machine)
These hosts use local stdio MCP: the config must point at a real executable path and the process must be allowed to start. We validated the protocol surface with scripts/smoke_test.py (tools list, representative v2 flow, policy hardening). Per-host file locations differ; use the collapsible sections below.
| Host | Typical config file | Transport |
|---|---|---|
| Cursor | ~/.cursor/mcp.json or project .cursor/mcp.json | stdio (mcpServers) |
| Claude Code | ~/.claude.json → mcpServers | stdio |
| Gemini CLI | ~/.gemini/settings.json → mcpServers | stdio |
| Codex CLI | ~/.codex/config.toml → [mcp_servers.*] | stdio |
| Copilot / VS Code | User mcp.json or ~/.copilot/mcp-config.json (varies by channel) | stdio |
| OpenCode | ~/.opencode/opencode.json, project opencode.json, optional ~/opencode.json + OPENCODE_CONFIG | stdio (mcp → type: local) |
Longer references and links: docs/clients/*.md.
MCP client configuration (copy & paste)
Use an absolute path to the binary. On Windows, use a full C:\... path inside JSON strings.
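Most wiring failures come down to a relative or non-executable `command`. The following Python sketch checks a parsed config for exactly that. It assumes the `mcpServers` layout with a string `command` per server, which matches several hosts in the table above but not all of them (OpenCode and Codex use different shapes); `check_stdio_servers` is a hypothetical helper, not part of this repository.

```python
import os


def check_stdio_servers(config: dict) -> list:
    """Return human-readable problems found under the mcpServers key."""
    problems = []
    for name, entry in config.get("mcpServers", {}).items():
        command = entry.get("command")
        if not isinstance(command, str) or not command:
            problems.append(f"{name}: missing command")
        elif not os.path.isabs(command):
            problems.append(f"{name}: command is not an absolute path")
        elif not (os.path.isfile(command) and os.access(command, os.X_OK)):
            problems.append(f"{name}: command is not an executable file")
    return problems
```

To use it, load the host's JSON file with `json.load` and pass the resulting dict in; an empty list means every server entry points at an existing executable.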
Official docs: Cursor MCP, Cursor CLI MCP.
Add under mcpServers in ~/.cursor/mcp.json (user) or .cursor/mcp.json (project):
Reload MCP from Cursor settings. Confirm reasonkit-think appears and tools such as sequentialthinking_tools and start_deliberation are listed.
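As an illustration of the entry's shape, the Python sketch below merges a `reasonkit-think` server into an existing `mcpServers` object without disturbing other servers. The `command` plus empty `args` layout is an assumption based on the common `mcpServers` convention and on the path placeholder used in this README; `with_reasonkit_think` and the `other-server` entry are hypothetical.

```python
import json

BINARY = "/ABSOLUTE/PATH/TO/reasonkit-think-mcp"  # placeholder from this README


def with_reasonkit_think(config: dict, binary: str = BINARY) -> dict:
    """Return a copy of config with a reasonkit-think stdio entry added."""
    servers = dict(config.get("mcpServers", {}))
    servers["reasonkit-think"] = {"command": binary, "args": []}
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = {"mcpServers": {"other-server": {"command": "/usr/bin/other", "args": []}}}
    print(json.dumps(with_reasonkit_think(existing), indent=2))
```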
Official docs: MCP connector.
In the root mcpServers object inside ~/.claude.json (global) or the per-project entry your Claude Code build uses:
"reasonkit-think": { "command": "/ABSOLUTE/PATH/TO/reasonkit-think-mcp", "args": [] }
Restart the Claude Code session so the new server is discovered.
Official docs: Gemini CLI — MCP.
In ~/.gemini/settings.json, under mcpServers:
"reasonkit-think": { "command": "/ABSOLUTE/PATH/TO/reasonkit-think-mcp", "args": [] }
Run Gemini’s MCP diagnostics for your CLI version once after saving.
Official docs: Codex configuration.
In ~/.codex/config.toml:
[mcp_servers.reasonkit-think]
command = "/ABSOLUTE/PATH/TO/reasonkit-think-mcp"
args = []
[]
= "/tmp"
Restart Codex CLI so config is re-read.
Official docs: VS Code MCP for Copilot, Add MCP servers to Copilot CLI.
Many setups use a top-level mcpServers object with the executable path + optional tools allow list. Example (adapt path to Copilot CLI’s ~/.copilot/mcp-config.json or your VS Code mcp.json):
Copilot MCP schemas evolve. If "type":"local" is rejected, omit type or follow the VS Code MCP JSON schema version your editor shows in diagnostics.
Trust the workspace/host if prompted, then reload the window once.
Official docs: OpenCode configuration.
OpenCode merges global, optional OPENCODE_CONFIG, and project opencode.json; project configs that define their own mcp tree may omit servers that exist only in ~/opencode.json. Prefer ~/.opencode/opencode.json and your repo-root opencode.json when working inside a tracked project.
Note: command is a single-element array here (OpenCode local servers). Restart OpenCode after edits.
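The difference from the `mcpServers` hosts is easiest to see side by side. This Python sketch builds an OpenCode-style entry using only the details stated in this README: an `mcp` tree, `type: local`, and `command` as a single-element array. Any further fields your OpenCode version expects are not shown, and `opencode_entry` is a hypothetical helper.

```python
import json


def opencode_entry(binary: str) -> dict:
    """Build an OpenCode-style local server entry for reasonkit-think."""
    return {
        "mcp": {
            "reasonkit-think": {
                "type": "local",
                "command": [binary],  # an array with one element, not a bare string
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(opencode_entry("/ABSOLUTE/PATH/TO/reasonkit-think-mcp"), indent=2))
```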
If the product supports MCP over stdio, register a server whose command is /ABSOLUTE/PATH/TO/reasonkit-think-mcp (or another absolute path from cargo install) and args are empty unless the host requires a wrapper. The protocol and discovery flow are defined by the Model Context Protocol and your client’s own MCP settings UI.
Client documentation index
| Client | Markdown guide |
|---|---|
| Cursor | docs/clients/cursor.md |
| Claude Code | docs/clients/claude-code.md |
| Gemini CLI | docs/clients/gemini-cli.md |
| Codex CLI | docs/clients/codex-cli.md |
| Copilot / VS Code | docs/clients/copilot-vscode.md |
| OpenCode | docs/clients/opencode.md |
Documentation
Start here:
- docs/README.md
- docs/IMPLEMENTATION_MASTER_PLAN.md
- docs/ARCHITECTURE.md
- docs/research/SOURCES.md
Showcase Prompt (Full MCP Walkthrough)
Use this prompt in Cursor Agent to demonstrate the complete reasonkit-think surface.
You are a senior AI operations planner using the `reasonkit-think` MCP server.
Scenario (randomized quality case):
An e-commerce startup ("Northstar Market") has a sudden 38% checkout drop after a Friday release.
You must propose a 72-hour recovery plan that balances speed, risk, customer trust, and engineering capacity.
Constraints:
- No database schema rollback during peak traffic.
- Max 2 emergency deployments per day.
- Legal risk is high if payment failures are misreported.
- Team: 3 backend, 1 SRE, 1 PM.
Your task is to run a complete reasoning workflow using ALL `reasonkit-think` capabilities.
Execution requirements:
1) Use prompt `sequential-thinking-guidance` first, then call `sequentialthinking_tools` for 2-3 quick CoT thoughts.
2) Call `get_thinking_history` and summarize key assumptions.
3) Start deep reasoning:
- call `start_deliberation` (mode=`reasonkit`, profile=`paranoid`, clear goal and constraints).
4) Use all advanced prompts to frame reasoning:
- `tot-planner-guidance`
- `diverse-lens-pack`
- `verification-pack`
- `triangulation-pack`
- `decision-routing-pack`
5) Build branches:
- call `expand_thoughts` from frontier (at least 4 branches).
- call `score_thoughts` on created nodes.
- call `prune_thoughts` with beam/diversity settings.
6) Verify critical claims:
- call `verify_thoughts` with explicit `claims` and structured evidence.
- include at least:
- one claim that should verify cleanly,
- one claim with data deficit,
- one claim with source conflict.
7) Demonstrate policy control:
- call `set_verification_policy` with strict settings (fail closed on critical unresolved).
- call `consensus_answer` and report whether blocked.
- then set a relaxed policy and call `consensus_answer` again.
8) Run governance pipeline:
- call `run_reasonkit_pipeline` and explain route decision.
9) Export and inspect auditability:
- call `export_reasoning_audit`.
- call `list_reasoning_resources`.
- call `read_reasoning_resource` for:
- reasoning://session/{id}/graph
- reasoning://session/{id}/frontier
- reasoning://session/{id}/verification-matrix
- reasoning://session/{id}/route-decision
10) Close quick-mode loop:
- call `clear_thinking_history` for the initial quick CoT session.
Output format:
- Section A: 72-hour recovery plan (actions, owners, sequence).
- Section B: Verified vs unresolved claims table.
- Section C: Strict vs relaxed policy outcome.
- Section D: Final route decision with confidence and caveats.
- Section E: Audit artifact highlights (what makes this traceable).
Compact Demo Prompt + Example Output
Use the reasonkit-think reasoning system to think this through deeply and transparently.
I need your help with a live operations problem:
Our micromobility app (e-scooter rentals) suddenly has many users who can’t unlock rides after a recent backend/firmware release. Complaints are rising, support is overwhelmed, and city partners are asking for immediate action. We cannot do a full rollback during peak commute hours, we can only push two production changes in the next 24 hours, and we must not weaken geofence/safety controls.
Please approach this in a way that is both practical and highly auditable:
- Start with a quick chain-of-thought style first pass to clarify assumptions, unknowns, and immediate priorities.
- Then switch into deeper tree-of-thought reasoning: generate multiple plausible recovery strategies, compare them, and keep only the strongest options.
- Use graph-like reasoning where useful: merge insights across branches, revisit weak assumptions, and refine the best path.
- Apply diverse lenses (engineering risk, customer trust, safety/compliance, operational feasibility, and speed).
- Explicitly verify the most critical claims with evidence quality in mind (flag weak evidence or conflicting evidence clearly).
- Use a strict decision gate mindset first, then test whether relaxing evidence thresholds actually changes the outcome.
- Produce a final recommendation only after governance-style calibration: should we proceed now, proceed with caveats, gather more evidence, or defer.
- Export/show a traceable reasoning trail so this can be reviewed in a postmortem.
Deliverables I need:
1) Top assumptions and unknowns
2) A concrete 24-hour mitigation plan (owner + action + timing)
3) A 7-day stabilization plan
4) Verified vs uncertain vs conflicting claim summary
5) Strict-vs-relaxed policy comparison and what changed (or didn’t)
6) Final route decision with confidence and caveats
7) Audit highlights showing why this reasoning is trustworthy and reviewable
At the very end, reset/clean temporary scratch reasoning state from the quick first-pass session.
1) Top assumptions and unknowns
Top assumptions (first-pass + deep reasoning):
- Release-coupled backend/firmware interaction is the most likely failure locus (unlock command handshake/protocol path), not a pure demand spike.
- Fastest safe recovery is surgical mitigation (feature-gated path control + targeted fix), not broad rollback.
- Safety/geofence controls are hard constraints and must remain intact while improving unlock success.
Critical unknowns:
- Exact cohort concentration: firmware version, city/geofence, scooter hardware batch, account/auth state.
- Primary technical fault class: protocol mismatch vs timeout regression vs auth signing issue.
- True geographic concentration and partner-impact distribution (currently weak evidence quality).
2) Concrete 24-hour mitigation plan (owner + action + timing)
- 0-1h | Incident Commander (Eng On-call Lead) — Open incident bridge, freeze non-essential changes, enforce constraints: max 2 changes/24h, no full rollback in peak, no safety/geofence relaxation.
- 0-2h | SRE + Data — Launch live unlock failure board segmented by firmware/backend release marker, city, geofence zone, scooter model, and error code.
- 1-3h | Backend Lead — Add/activate feature flag to route unlock flow to known-stable command formatting path where possible (without touching geofence checks).
- 1-3h | Support Ops Lead — Deploy support playbook: known issue macro, user retry guidance, ride credit policy guardrails, escalation labels keyed to telemetry error codes.
- 2-4h | Partner Ops Manager — Send city-partner status bulletin (impact, controls preserved, ETA for update cadence every 4 hours).
- 3-6h | Firmware + Backend Joint Taskforce — Validate handshake incompatibility hypothesis via protocol negotiation logs and targeted device repro.
- Change #1 (6-8h) | Backend Lead + SRE — Ship low-risk containment patch: robust handshake fallback for compatible versions, timeout tuning, enhanced structured telemetry; preserve all geofence/safety gating logic.
- 8-14h | QA + Reliability — Measure post-change unlock success by cohort; confirm no safety regressions and no geofence bypass patterns.
- 14-18h | Incident Commander — Go/no-go checkpoint for Change #2 using measured cohort outcomes and support backlog trend.
- Change #2 (18-22h) | Firmware/Backend Owners — Ship targeted compatibility fix (version parser / command signing alignment) behind kill switch, staged rollout by city cohorts.
- 22-24h | Comms + Support — Publish user update, partner update, and internal status with current unlock success delta and remaining caveats.
3) 7-day stabilization plan
- Day 1-2: Full cohort RCA with release timeline overlay; identify exact blast radius by firmware + backend version combinations.
- Day 2-3: Add automated canary checks for unlock handshake compatibility pre-prod and pre-rollout.
- Day 3-4: Introduce “release guardrail gate” requiring unlock success SLO pass per city cohort before expansion.
- Day 4-5: Build permanent support tooling: one-click impacted-user lookup by error signature + recommended remediation.
- Day 5-6: Joint post-incident session with city partners; share corrective controls and confidence metrics.
- Day 6-7: Complete postmortem with preventive actions: protocol contract tests, rollback-safe compatibility mode, alerting thresholds for unlock failure spikes.
4) Verified vs uncertain vs conflicting claim summary
| Claim | Status | Notes |
|---|---|---|
| Unlock failures rose post-release and concentrate in release-affected cohorts | Verified | Tier1 backend metrics + tier1 firmware telemetry + tier2 support timeline |
| Backend/firmware handshake mismatch is primary root cause | Verified (moderate confidence) | Tier1 protocol logs + tier2 limited staging repro |
| >70% impacts from only 3 geofences | Uncertain / Data deficit | Only one tier2 partial support sheet |
| Payment latency is dominant unlock-failure driver | Conflicting | Tier1 payment latency signal vs contradictory tier1 BLE timeout code distribution |
5) Strict-vs-relaxed policy comparison (what changed)
- Strict gate result: GATHER_MORE_EVIDENCE (confidence 0.75)
- Relaxed policy tested:
  - min_independent_groups: 1
  - allow_tier3_for_independence: true
  - require_tier1_unambiguous_for_critical: false
  - fail_closed_on_critical_unresolved: false
- Relaxed gate result: unchanged, still GATHER_MORE_EVIDENCE (confidence 0.75)
Interpretation: even with looser evidence thresholds, unresolved/weak/conflicting claims still prevent a stronger proceed decision.
6) Final route decision with confidence and caveats
- Route decision: GATHER_MORE_EVIDENCE
- Confidence: 0.75
- Operational interpretation: proceed with safe containment and targeted fixes, but treat root-cause certainty as incomplete until geofence concentration and payment-vs-handshake attribution are fully resolved.
- Caveats: do not overfit to payment-latency narrative; do not assume city concentration from partial support samples; maintain hard safety/geofence invariants.
7) Audit highlights (trustworthy + reviewable)
- Deliberation executed in reasonkit mode with the paranoid profile, with explicit constraints and governance stages.
- Multi-branch exploration performed (diverse + contrarian expansion), then score/prune to strongest frontier.
- Evidence-quality verification explicitly captured verified, data_deficit, and source_conflict outcomes.
- Strict and relaxed policy gates were both run; outcome stability was tested, not assumed.
- Full audit artifacts exported and inspectable (deliberation_id: delib-1778167835217-9, audit_id: audit-1778167894920-26, resources read for graph/frontier/verification/route).
Quick first-pass scratch reasoning session was reset at the end (scooter-quick-1 cleared).
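The strict-versus-relaxed result in the example output (the route stays at GATHER_MORE_EVIDENCE) can be made concrete with a toy gate. This Python sketch is not the server's gating logic: the resolution rule, the strict threshold of 2, the per-claim group counts, and the `PROCEED` label are all assumptions for illustration. Only `min_independent_groups` and the four claims come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    min_independent_groups: int


@dataclass(frozen=True)
class Claim:
    label: str
    independent_groups: int  # assumed: one group per cited evidence source
    conflicting: bool = False


# Group counts are read off the "Notes" column of the claim table (an assumption).
CLAIMS = [
    Claim("post-release failure rise", 3),
    Claim("handshake mismatch", 2),
    Claim("geofence concentration", 1),
    Claim("payment latency", 2, conflicting=True),
]


def unresolved(claims, policy: Policy) -> list:
    """A claim is unresolved if its evidence conflicts or is too thin."""
    return [
        c.label
        for c in claims
        if c.conflicting or c.independent_groups < policy.min_independent_groups
    ]


def route(claims, policy: Policy) -> str:
    """Toy rule: any unresolved claim keeps the route at GATHER_MORE_EVIDENCE."""
    return "GATHER_MORE_EVIDENCE" if unresolved(claims, policy) else "PROCEED"
```

Under these assumptions, lowering the threshold from 2 to 1 resolves the thinly sourced geofence claim, but the conflicting payment-latency claim remains unresolved, so the route does not change.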
Natural Wording Triggers
reasoning_intent_router and reasoning_autopilot support normal phrasing so agents do not need to remember exact MCP names.
Examples that now trigger routing:
- CoT/quick: step by step, quick think, first pass, fast pass, linear reasoning
- ToT/deep branching: explore options, compare options, multiple paths, what are our options
- GoT/graph style: join branches, revisit branches, graph reasoning, merge paths
- Verification: fact-check, cross-check, confidence check, source quality, validate claims
- Governance: go/no-go, decision gate, should we proceed, policy gate
- Auditability: show your work, prove reasoning, traceable artifact, postmortem export
- Cleanup: start clean, flush session, fresh session, reset
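One simple way to picture this kind of routing is case-insensitive phrase matching against the trigger lists above. The Python sketch below does only that; the intent keys (`cot_quick`, `tot_deep`, and so on) and the `match_intents` function are hypothetical, and the real `reasoning_intent_router` may match and rank phrases differently.

```python
# Trigger phrases copied from the list above; intent keys are illustrative names.
TRIGGERS = {
    "cot_quick": ["step by step", "quick think", "first pass", "fast pass", "linear reasoning"],
    "tot_deep": ["explore options", "compare options", "multiple paths", "what are our options"],
    "got_graph": ["join branches", "revisit branches", "graph reasoning", "merge paths"],
    "verification": ["fact-check", "cross-check", "confidence check", "source quality", "validate claims"],
    "governance": ["go/no-go", "decision gate", "should we proceed", "policy gate"],
    "audit": ["show your work", "prove reasoning", "traceable artifact", "postmortem export"],
    "cleanup": ["start clean", "flush session", "fresh session", "reset"],
}


def match_intents(text: str) -> list:
    """Return every intent whose trigger phrase occurs in the text."""
    lowered = text.lower()
    return [
        intent
        for intent, phrases in TRIGGERS.items()
        if any(phrase in lowered for phrase in phrases)
    ]


if __name__ == "__main__":
    print(match_intents("Do a quick think, then fact-check and give a go/no-go"))
```

Plain substring matching is deliberately naive: a short trigger such as "reset" would also fire inside unrelated words, which is one reason a configurable vocabulary like `set_reasoning_aliases` is useful.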
Roadmap Overview
gantt
title ReasonKit Think MCP v2 Delivery Plan
dateFormat YYYY-MM-DD
section Contracts
Phase 0: schemas and compatibility :a1, 2026-05-08, 7d
section Deliberation
Phase 1: ToT core :a2, after a1, 10d
section Verification
Phase 2: CoVe and matrix :a3, after a2, 10d
section Governance
Phase 3: ReasonKit pipeline :a4, after a3, 12d
section Graph Operators
Phase 4: GoT transforms :a5, after a4, 10d
section Docs and Release
Phase 5: client matrix + release :a6, after a5, 7d
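Because each phase starts `after` the previous one, the calendar dates follow from the start date and the durations alone. The Python sketch below works them out, assuming plain calendar days with no excluded weekends or holidays; the `schedule` helper is hypothetical.

```python
from datetime import date, timedelta

# Phase names and durations (in days) from the gantt chart above.
PHASES = [
    ("Phase 0: schemas and compatibility", 7),
    ("Phase 1: ToT core", 10),
    ("Phase 2: CoVe and matrix", 10),
    ("Phase 3: ReasonKit pipeline", 12),
    ("Phase 4: GoT transforms", 10),
    ("Phase 5: client matrix + release", 7),
]


def schedule(start: date, phases) -> list:
    """Chain phases back to back and return (name, start, end) rows."""
    rows = []
    cursor = start
    for name, days in phases:
        end = cursor + timedelta(days=days)
        rows.append((name, cursor, end))
        cursor = end  # the next phase starts where this one ends
    return rows


if __name__ == "__main__":
    for name, begin, end in schedule(date(2026, 5, 8), PHASES):
        print(f"{begin} -> {end}  {name}")
```

The six durations sum to 56 days, so under these assumptions the plan that starts on 2026-05-08 ends on 2026-07-03.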
Research Backbone
The v2 design is grounded in known reasoning and verification work:
- Tree of Thoughts
- Self-Consistency
- Least-to-Most
- ReAct
- Chain-of-Verification
- RARR
- SelfCheckGPT
- Graph of Thoughts
See docs/research/SOURCES.md for links and traceability.
Project Status
- v1: implemented and smoke-tested
- v2: implemented and smoke-tested (policy gating, audit export, resource reads)
- current focus: iterative hardening and developer UX improvements
License
Apache-2.0