consilium
Multi-model deliberation CLI. 5 frontier LLMs debate your question, then Claude Opus 4.6 judges and synthesizes a recommendation.
How it works
- Blind phase — Each model answers independently (no herding)
- Cross-pollination — Models read all blind claims and investigate gaps
- Debate — Structured rounds with a rotating challenger ensuring sustained disagreement
- Judge — Claude Opus synthesizes using Analysis of Competing Hypotheses
- CollabEval — Gemini critiques the judge's synthesis; judge revises
- Extraction — Structured Do Now / Consider Later / Skip recommendations
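The phase order and the debate's rotating challenger can be pictured with a small Rust sketch. `Phase`, `PIPELINE`, and `challenger_for_round` are hypothetical names. The round-robin rule (round `r` goes to panelist `r % n`) is an assumption about how rotation works, not consilium's confirmed implementation.

```rust
/// Hypothetical: the phases in the order listed above. Some phases may be
/// optional in practice (e.g. cross-pollination, given the `--xpol` flag).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Blind,
    CrossPollination,
    Debate,
    Judge,
    CollabEval,
    Extraction,
}

const PIPELINE: [Phase; 6] = [
    Phase::Blind,
    Phase::CrossPollination,
    Phase::Debate,
    Phase::Judge,
    Phase::CollabEval,
    Phase::Extraction,
];

/// Assumed rotation rule: debate round `round` (0-based) gives the challenger
/// role to panelist `round % n`, so each panelist takes a turn pushing back
/// against the emerging consensus. Returns `None` for an empty panel.
fn challenger_for_round<'a>(panelists: &[&'a str], round: usize) -> Option<&'a str> {
    if panelists.is_empty() {
        None
    } else {
        Some(panelists[round % panelists.len()])
    }
}
```

With a five-model panel, round 5 wraps back to the first panelist, so no single model is stuck as the permanent contrarian.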
Auto mode routes by difficulty: simple questions get quick parallel queries (~$0.10), while complex ones get full council deliberation (~$0.50).
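A minimal sketch of that two-way routing decision, assuming difficulty has already been classified (in consilium, Opus does the classifying, and Auto can also pick the other modes in the modes table). `Difficulty`, `Mode`, `route`, and `approx_cost_cents` are hypothetical; the costs are the README's estimates.

```rust
/// Hypothetical classifier output; consilium obtains this from Opus.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Difficulty {
    Simple,
    Complex,
}

/// Only the two modes this paragraph mentions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Quick,
    Council,
}

/// Simple questions go to parallel queries; complex ones to the full council.
fn route(difficulty: Difficulty) -> Mode {
    match difficulty {
        Difficulty::Simple => Mode::Quick,
        Difficulty::Complex => Mode::Council,
    }
}

/// Approximate cost per run in US cents, from the README's estimates.
fn approx_cost_cents(mode: Mode) -> u32 {
    match mode {
        Mode::Quick => 10,   // ~$0.10
        Mode::Council => 50, // ~$0.50
    }
}
```

Here `route(Difficulty::Complex)` yields `Mode::Council`, estimated at about 50 cents.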
Why multi-model?
Feng et al. (2025) report that multi-model collaboration produces 18.5% better outcomes than any single model working alone, through a mechanism called collaborative emergence: models surface insights that no individual model would reach. consilium's structured deliberation (blind → debate → judge) is designed to maximize this effect.
Paper: Model Collaboration (Feng et al., 2025)
Models
| Role | Model |
|---|---|
| Panelist | GPT-5.2 Pro |
| Panelist | Gemini 3.1 Pro |
| Panelist | Grok 4 |
| Panelist | DeepSeek-R1 |
| Panelist | GLM-5 |
| Judge | Claude Opus 4.6 |
| Critique | Gemini 3.1 Pro |
Install
# Build from source
# Binary at target/release/consilium
# Optionally symlink:
Requires an OpenRouter API key:
# optional, Gemini fallback
Usage
# Auto-route (Opus picks the best mode)
# Quick parallel — independent opinions, no debate (~$0.10)
# Full council with JSON output (~$0.50)
# Deep — auto-decompose + 2 debate rounds (~$0.90)
# Oxford debate — binary for/against + verdict (~$0.40)
# Red team — adversarial stress-test (~$0.20)
# Roundtable discussion (~$0.30)
# Socratic examination (~$0.30)
Modes
| Mode | Flag | Cost | Description |
|---|---|---|---|
| Auto | (default) | varies | Opus classifies and picks the best mode |
| Quick | --quick | ~$0.10 | Parallel queries, no debate |
| Council | --council | ~$0.50 | Full multi-round deliberation + judge |
| Deep | --deep | ~$0.90 | Council + decompose + 2 rounds |
| Oxford | --oxford | ~$0.40 | Binary for/against debate |
| Red Team | --redteam | ~$0.20 | Adversarial stress-test |
| Discuss | --discuss | ~$0.30 | Hosted roundtable |
| Socratic | --socratic | ~$0.30 | Assumption-probing examination |
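Each mode flag in the table maps to one mode, with Auto as the default when none is given. This sketch assumes, without confirmation from the source, that passing two different mode flags is an error; `Mode`, `mode_for_flag`, and `select_mode` are hypothetical names.

```rust
/// Hypothetical: one variant per row of the modes table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Auto,
    Quick,
    Council,
    Deep,
    Oxford,
    RedTeam,
    Discuss,
    Socratic,
}

/// Map a flag from the table to its mode; any other argument yields `None`.
fn mode_for_flag(arg: &str) -> Option<Mode> {
    match arg {
        "--quick" => Some(Mode::Quick),
        "--council" => Some(Mode::Council),
        "--deep" => Some(Mode::Deep),
        "--oxford" => Some(Mode::Oxford),
        "--redteam" => Some(Mode::RedTeam),
        "--discuss" => Some(Mode::Discuss),
        "--socratic" => Some(Mode::Socratic),
        _ => None,
    }
}

/// No mode flag means Auto; two *different* mode flags are rejected
/// (an assumed rule). Repeating the same flag is tolerated.
fn select_mode<'a, I>(args: I) -> Result<Mode, String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut chosen: Option<Mode> = None;
    for arg in args {
        if let Some(mode) = mode_for_flag(arg) {
            match chosen {
                Some(prev) if prev != mode => {
                    return Err(format!("conflicting mode flags: {:?} vs {:?}", prev, mode));
                }
                _ => chosen = Some(mode),
            }
        }
    }
    Ok(chosen.unwrap_or(Mode::Auto))
}
```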
Key flags
--persona "context" Personal context injected into prompts
--domain banking|healthcare Domain-specific regulatory context
--challenger gemini Assign contrarian role (council mode)
--decompose Break question into sub-questions first
--xpol Cross-pollination phase (council mode)
--followup Interactive drill-down after synthesis (council mode)
--rounds N Rounds for discuss/socratic (0 = unlimited)
--thorough Disable early consensus exit
--output file.md Save transcript to file
--format json|yaml|prose Output format
--share Upload to secret GitHub gist
--quiet Suppress live output (auto-enabled when not a TTY)
--no-save Don't auto-save session
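`--rounds`, `--thorough`, and `--quiet` all shape the run loop. The sketch below works under stated assumptions: `round_limit`, `run_rounds`, and `effective_quiet` are hypothetical helpers, and "consensus" is abstracted as a callback the caller supplies rather than consilium's actual consensus check.

```rust
/// `--rounds N`: 0 means unlimited, per the flag description.
fn round_limit(rounds: u32) -> Option<u32> {
    if rounds == 0 {
        None
    } else {
        Some(rounds)
    }
}

/// Hypothetical run loop. `round_reached_consensus(r)` runs round `r`
/// (0-based) and reports whether the models agreed. Without `--thorough`
/// the loop exits early on consensus; with it, every allowed round runs.
/// With `rounds == 0` and no exit condition met, it loops until interrupted.
/// Returns the number of rounds actually run.
fn run_rounds<F>(rounds: u32, thorough: bool, mut round_reached_consensus: F) -> u32
where
    F: FnMut(u32) -> bool,
{
    let limit = round_limit(rounds);
    let mut completed = 0;
    loop {
        if let Some(max) = limit {
            if completed >= max {
                break;
            }
        }
        let consensus = round_reached_consensus(completed);
        completed += 1;
        if consensus && !thorough {
            break;
        }
    }
    completed
}

/// `--quiet` is auto-enabled when stdout is not a TTY. On Rust 1.70+,
/// `std::io::IsTerminal` provides `std::io::stdout().is_terminal()` for the
/// second argument; it is a parameter here so the logic stays testable.
fn effective_quiet(quiet_flag: bool, stdout_is_tty: bool) -> bool {
    quiet_flag || !stdout_is_tty
}
```

For example, with `--rounds 5` and consensus in the second round, `run_rounds` stops after 2 rounds; adding `--thorough` makes it run all 5.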
Session management
Architecture
~6,100 lines of Rust. Single 4.7MB binary, ~50ms cold start.
- Single tokio runtime, async throughout
- SSE streaming with `<think>` block filtering (DeepSeek-R1, OpenAI reasoning)
- CostTracker via AtomicU64 (micro-dollars, lock-free across tasks)
- Output trait enables TeeOutput (stdout + live file) for watch/TUI
- LiveWriter with PID-based file management and stale cleanup
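The `CostTracker` and `Output`/`TeeOutput` bullets can be sketched in std-only Rust. The type names come from the list above, but every method (`add_micros`, `total_micros`, `total_usd`, `emit`) and the `WriterOutput` adapter are hypothetical, and the real implementation runs on tokio, which this sketch omits.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicU64, Ordering};

/// Costs are kept as integer micro-dollars (1 USD = 1_000_000) in an
/// AtomicU64, so concurrent tasks can add to the total without a lock.
#[derive(Default)]
struct CostTracker {
    micro_usd: AtomicU64,
}

impl CostTracker {
    /// Add a cost in micro-dollars. `Relaxed` suffices for a pure counter:
    /// no other memory is synchronized through this value.
    fn add_micros(&self, micros: u64) {
        self.micro_usd.fetch_add(micros, Ordering::Relaxed);
    }

    fn total_micros(&self) -> u64 {
        self.micro_usd.load(Ordering::Relaxed)
    }

    /// Convert to dollars only for display.
    fn total_usd(&self) -> f64 {
        self.total_micros() as f64 / 1_000_000.0
    }
}

/// Hypothetical shape of the Output trait: a sink for live text.
trait Output {
    fn emit(&mut self, text: &str) -> io::Result<()>;
}

/// Adapter so any `Write` (stdout, a file, a `Vec<u8>`) can act as an Output.
struct WriterOutput<W: Write>(W);

impl<W: Write> Output for WriterOutput<W> {
    fn emit(&mut self, text: &str) -> io::Result<()> {
        self.0.write_all(text.as_bytes())?;
        // Flush per chunk so a watcher tailing a live file sees text promptly.
        self.0.flush()
    }
}

/// Forward every chunk to two outputs, e.g. stdout plus a live file.
struct TeeOutput<A: Output, B: Output> {
    primary: A,
    secondary: B,
}

impl<A: Output, B: Output> Output for TeeOutput<A, B> {
    fn emit(&mut self, text: &str) -> io::Result<()> {
        self.primary.emit(text)?;
        self.secondary.emit(text)
    }
}
```

Integer micro-dollars avoid floating-point drift when many small per-request costs are summed, and a single `fetch_add` keeps each update atomic without a mutex.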
Prior art
Rewritten from consilium-py (Python, 5,091 lines). Same CLI interface, same modes, same output format, same ~/.consilium/ session directory. Both versions can coexist and read each other's sessions.
License
MIT