Scitadel
Programmable, reproducible scientific literature retrieval.
Scitadel is a CLI and TUI tool that runs federated searches across multiple academic databases, deduplicates results, scores relevance with LLMs, chains citations via snowballing, and exports in structured formats. Every search is persisted and reproducible.
Why
Scientific literature retrieval is fragmented, manual, and non-reproducible. Researchers must search PubMed, arXiv, INSPIRE-HEP, and OpenAlex independently — each with different query syntax, metadata schemas, and export formats. There is no unified, scriptable interface.
Scitadel fixes this: one query, all sources, deterministic results, full audit trail.
Recommended workflow: TUI + agent in adjacent pane
Scitadel's primary workflow pairs scitadel tui in one terminal pane with an MCP-aware agent (Claude Code, Cursor, Cline, …) in an adjacent pane. The agent talks to scitadel through the MCP server (40+ tools); the TUI redraws live as the agent writes.
┌─────────────────────────────┬──────────────────────────────┐
│ $ scitadel tui │ $ claude │
│ ┌── Papers ───────────────┐ │ │
│ │ ✓ Attention Is All You │ │ > score the open paper │
│ │ ⊘ Recurrent Neural... │ │ against my CRISPR question │
│ │ ★ ✓ Transformer Survey │ │ │
│ └─────────────────────────┘ │ [agent calls │
│ Annotations (3) │ get_current_selection, │
│ • "self-attention..." ─ok │ prepare_assessment, │
│ • "the Transformer" ─ ok │ save_assessment] │
│ │ │
│ Esc: back · n: new · … │ Done. Score: 0.87, reasoning │
└─────────────────────────────┴──────────────────────────────┘
The agent owns conversational reasoning (drafting questions, summarizing papers, suggesting related terms). The TUI owns reading and quick navigation (browse, star, annotate, scroll). They share the SQLite database via WAL mode — every MCP write surfaces in the open TUI within ~100 ms.
You don't need any LLM CLI installed to use scitadel — every TUI/CLI feature works standalone. But once you set up the MCP server (one line, see below), the agent + TUI combo is what scitadel was designed for.
Install
Scitadel is a Rust workspace. You need a Rust toolchain (rustup, stable channel).
From source (recommended today)
Running cargo install --path crates/scitadel-cli from a checkout of the repository drops a single scitadel binary into ~/.cargo/bin (make sure that's on your PATH). CLI, TUI, and MCP server are all subcommands of the same binary.
As a Claude MCP server
User scope (available in every session, everywhere): claude mcp add --scope user scitadel -- scitadel mcp
Project scope (committed to the repo, available when cwd is this project):
The repo ships a .mcp.json that registers the scitadel binary from PATH. Just run cargo install --path crates/scitadel-cli once and Claude Code will pick it up automatically.
Local/session scope (no commit, just this machine):
Verify with claude mcp list.
Quick start
# Initialize the database (creates ./.scitadel/scitadel.db)
scitadel init
# Store credentials in your OS keychain (one-time, per source)
scitadel auth login <source>
# Run a federated search
scitadel search "<query>"
# View past searches / show a paper / export
scitadel history
scitadel export <id>
# Download a paper by DOI (OA PDF via Unpaywall, else publisher HTML)
scitadel download <doi>
# Launch the interactive TUI
scitadel tui
Usage patterns
1. Search, review, export
The core loop: one query hits all sources, results are deduplicated and persisted.
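# Search across all four sources
scitadel search "<query>"
# Check what came back
scitadel history
# Export to BibTeX for your paper
scitadel export <id> -f bibtex -o <file>
# Or get structured JSON for downstream processing
scitadel export <id> -f json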
Search IDs support prefix matching — a3f will match if unambiguous.
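Prefix resolution of this kind can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not scitadel's actual code: the PrefixMatch type and the resolve_prefix function are hypothetical names, and IDs are assumed to be plain strings.

```rust
/// Outcome of resolving a user-supplied ID prefix (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum PrefixMatch<'a> {
    /// Exactly one stored ID starts with the prefix.
    Unique(&'a str),
    /// No stored ID starts with the prefix.
    NotFound,
    /// Several IDs match; the caller should ask for a longer prefix.
    Ambiguous(Vec<&'a str>),
}

/// Resolve `prefix` against the known search IDs.
pub fn resolve_prefix<'a>(ids: &'a [String], prefix: &str) -> PrefixMatch<'a> {
    // Collect every ID that begins with the prefix, in storage order.
    let hits: Vec<&'a str> = ids
        .iter()
        .map(String::as_str)
        .filter(|id| id.starts_with(prefix))
        .collect();

    match hits.len() {
        0 => PrefixMatch::NotFound,
        1 => PrefixMatch::Unique(hits[0]),
        _ => PrefixMatch::Ambiguous(hits),
    }
}
```

Returning the ambiguous candidates, rather than only an error, lets a caller print them so the user can pick a longer prefix.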
Re-running the same query creates a new search record. Both runs are persisted and can be compared to see what changed as the underlying corpus evolves.
2. Question-driven search and scoring
Link search terms to a research question, then let scitadel build the query automatically:
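# 1. Define what you're looking for
scitadel question create "<text>"
# 2. Add search term groups
scitadel question add-terms <qid> ...
# 3. Search using linked terms (auto-builds query with OR)
scitadel search -q <qid>
# Or combine with an explicit query
scitadel search "<query>" -q <qid>
# 4. Score every paper against your question
scitadel assess <search-id> -q <qid>
# 5. Export the scored results
scitadel export <search-id>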
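Step 3 builds the query from the linked term groups. One plausible reading, sketched below in Rust, is that each group contributes either its custom query string or its terms joined by spaces, and the groups are then combined with OR. The TermGroup struct, its fields, and the parenthesization are assumptions made for illustration; the real query builder may differ.

```rust
/// One group of linked search terms (hypothetical shape).
pub struct TermGroup {
    pub terms: Vec<String>,
    /// Optional custom query string that overrides the joined terms.
    pub custom_query: Option<String>,
}

impl TermGroup {
    /// Query for this group: the custom string if set, else terms joined by spaces.
    pub fn query(&self) -> String {
        match &self.custom_query {
            Some(q) => q.clone(),
            None => self.terms.join(" "),
        }
    }
}

/// Combine all non-empty group queries with OR, parenthesizing each group.
pub fn build_query(groups: &[TermGroup]) -> String {
    groups
        .iter()
        .map(TermGroup::query)
        .filter(|q| !q.trim().is_empty())
        .map(|q| format!("({})", q))
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

Parenthesizing each group keeps a multi-word group from being split apart by the OR operator in sources that give OR lower precedence than adjacency.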
The assess command sends each paper's title, authors, year, and abstract to Claude with a structured scoring rubric (0.0-1.0). Each assessment stores the full provenance: score, reasoning, exact prompt, model name, and temperature — so results are auditable and re-runnable with different models.
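The provenance fields listed above map naturally onto a single record. The following Rust sketch is illustrative only: the Assessment struct, its field names, and the validate method are hypothetical and are not taken from scitadel's source.

```rust
/// Relevance assessment with full provenance (field names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Assessment {
    pub paper_id: String,
    pub question_id: String,
    /// Relevance score on the 0.0-1.0 rubric scale.
    pub score: f64,
    pub reasoning: String,
    /// Exact prompt sent to the model, kept so the run can be repeated.
    pub prompt: String,
    pub model: String,
    pub temperature: f64,
}

impl Assessment {
    /// Reject records that could not be audited or re-run later.
    pub fn validate(&self) -> Result<(), String> {
        // `contains` is false for NaN, so NaN scores are rejected as well.
        if !(0.0..=1.0).contains(&self.score) {
            return Err(format!("score {} is outside 0.0-1.0", self.score));
        }
        if self.prompt.is_empty() || self.model.is_empty() {
            return Err("prompt and model are required for provenance".to_string());
        }
        Ok(())
    }
}
```

Because the prompt, model, and temperature travel with the score, re-scoring with a different model only needs a new record that reuses the stored prompt.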
# Tune scoring parameters
scitadel assess <search-id> -q <qid> -m <model> -t <temperature>
3. Citation chaining (snowballing)
Expand your corpus by following citation chains from discovered papers. Scitadel fetches references (backward) and citing papers (forward) from OpenAlex, scores each against your research question, and only follows leads that pass a relevance threshold.
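# Snowball from a search's papers
scitadel snowball <search-id> -q <qid>
# Control depth, direction, and threshold
scitadel snowball <search-id> -q <qid> --depth 2 --direction references --threshold 0.7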
Options:
- --depth: how many levels to chase (1-3, default 1)
- --direction: references (backward), cited_by (forward), or both (default)
- --threshold: minimum relevance score to continue expanding (default 0.6)
New papers discovered via snowballing are deduplicated against the existing database, and all citation edges are persisted for later exploration.
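The relevance-gated traversal described in this section can be approximated with a breadth-first walk. The Rust sketch below is a simplified model, not scitadel's implementation: the in-memory CitationGraph stands in for OpenAlex lookups, the score closure stands in for LLM scoring, and the snowball function name and signature are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Paper ID -> IDs of neighbouring papers (references and/or citing papers).
pub type CitationGraph = HashMap<String, Vec<String>>;

/// Breadth-first snowball from `seeds`.
/// Returns every newly discovered paper as (id, depth, score). Only papers
/// whose score reaches `threshold` are expanded further, up to `max_depth`.
pub fn snowball<F>(
    seeds: &[String],
    graph: &CitationGraph,
    score: F,
    max_depth: usize,
    threshold: f64,
) -> Vec<(String, usize, f64)>
where
    F: Fn(&str) -> f64,
{
    // Seeds count as already known, so they are never rediscovered.
    let mut seen: HashSet<String> = seeds.iter().cloned().collect();
    let mut queue: VecDeque<(String, usize)> =
        seeds.iter().cloned().map(|id| (id, 0)).collect();
    let mut discovered = Vec::new();

    while let Some((id, depth)) = queue.pop_front() {
        if depth >= max_depth {
            continue; // depth limit reached: do not expand this paper
        }
        if let Some(neighbours) = graph.get(&id) {
            for next in neighbours {
                // Deduplicate against everything seen so far.
                if !seen.insert(next.clone()) {
                    continue;
                }
                let relevance = score(next.as_str());
                discovered.push((next.clone(), depth + 1, relevance));
                // Gate: only sufficiently relevant papers are followed.
                if relevance >= threshold {
                    queue.push_back((next.clone(), depth + 1));
                }
            }
        }
    }
    discovered
}
```

In this model a low-scoring paper is still recorded, but its own references and citations are never fetched, which is what keeps the expansion from growing without bound.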
4. Interactive TUI
Browse searches, papers, annotations, and citation trees in an interactive terminal dashboard. Best experienced with an MCP-aware agent in an adjacent terminal pane (see Recommended workflow).
The TUI has three tabs:
- Searches: browse past search runs, drill into papers, view full metadata and assessments
- Papers: every paper across every search, with a state column (✓ downloaded, ⊘ paywalled, ✗ failed, ↻ in flight) and a star toggle (s)
- Questions: research questions with their linked search terms
Per-paper overlay (Enter on a paper): full metadata, annotation list, and:
- R: two-pane reader (left: full text with colored highlights, right: annotation threads)
- n / e / r / d: create / edit / reply / delete annotation
- J / K: hop between highlights in reader mode
- D: download via Unpaywall / publisher
The TUI re-queries the DB every redraw, so any write through the MCP server (in an adjacent pane) shows up live with no refresh keypress.
5. Agent-driven workflow via MCP
The MCP server is the seam between scitadel and any agent that speaks MCP. See Recommended workflow above for the 2-pane setup that's the primary use case.
# Run the server manually (stdio transport)
scitadel mcp
Setup snippets:
| Agent | Setup |
|---|---|
| Claude Code | claude mcp add --scope user scitadel -- scitadel mcp |
| Claude Desktop | Add to claude_desktop_config.json: {"mcpServers": {"scitadel": {"command": "scitadel", "args": ["mcp"]}}} |
| Cursor / Cline / Continue | Same JSON shape — point at the scitadel mcp subcommand |
40+ MCP tools spanning the full pipeline:
- Search & retrieval: search, list_searches, get_papers, get_paper, get_annotated_paper, find_similar_searches, summarize_search, download_paper, read_paper, list_sources
- Research questions: create_question, list_questions, add_search_terms, get_rubric
- Scoring: prepare_assessment, prepare_batch_assessments, assess_paper, save_assessment, get_assessments
- Annotations (#49): create_annotation, reply_annotation, update_annotation, delete_annotation, list_annotations, mark_seen, mark_thread_seen, list_unread
- Citation graph (#59): get_references, get_citations
- Stars (#120): toggle_star, set_star, list_starred
- TUI awareness (#122): get_current_selection ("what is the user looking at right now?")
- Export: export_search
Every long-running tool (search / batch scoring / download) emits MCP notifications/progress frames if the caller supplies a progressToken (#58). Annotation writes are audit-logged. Identity is trust-on-first-use (real auth ships with the Phase-5 Dolt sync layer).
The full pipeline — agent formulates a question, generates search terms, runs a search, scores each paper, snowballs relevant citations, writes structured assessments — runs through tool calls with no manual intervention. With #122, the agent can also act on whatever the user is looking at in the TUI without the user pasting IDs.
Workflow coverage
Where scitadel stands against the full envisioned pipeline:
| Step | Status | Detail |
|---|---|---|
| Research question formulation | Done | CLI + MCP |
| Search term management | Done | CLI question add-terms + MCP, search --question auto-builds queries |
| Federated search (4 sources) | Done | PubMed, arXiv, OpenAlex, INSPIRE-HEP in parallel |
| Deduplication and merge | Done | DOI exact + title similarity, cross-source metadata fill |
| LLM relevance scoring | Done | CLI assess + MCP assess_paper, full provenance |
| Citation chaining | Done | Forward/backward snowballing via OpenAlex, relevance-gated |
| Structured export | Done | BibTeX, JSON, CSV |
| Reproducible audit trail | Done | Every search, assessment, citation edge, and scoring prompt persisted |
| Interactive TUI | Done | ratatui dashboard with search/paper/assessment/citation browsing |
| Full-text retrieval | Planned | OA papers via Unpaywall, PDF extraction |
| Chat with papers | Planned | RAG over extracted text |
| Knowledge graph | Planned | Citation network visualization |
Sources
| Source | API | Notes |
|---|---|---|
| PubMed | E-utilities (esearch + efetch) | Set SCITADEL_PUBMED_API_KEY for higher rate limits |
| arXiv | Atom feed | No key required |
| OpenAlex | REST API | Set SCITADEL_OPENALEX_EMAIL for polite pool |
| INSPIRE-HEP | REST API | No key required |
CLI reference
scitadel search [QUERY]                 Run a federated search (query optional with -q)
scitadel history                        Show past search runs
scitadel export <id>                    Export results (bibtex, json, csv)
scitadel question create <text>         Create a research question
scitadel question list                  List research questions
scitadel question add-terms <qid> ...   Link search terms to a question
scitadel assess <search-id>             Score papers against a question with Claude
scitadel snowball <search-id>           Run citation chaining from a search
scitadel tui                            Launch the interactive TUI
scitadel mcp                            Start the MCP server (stdio)
scitadel download <doi>                 Fetch PDF (Unpaywall) or publisher HTML
scitadel auth login <source>            Store credentials in OS keychain
scitadel auth status                    List configured credentials
scitadel init                           Initialize the database
Search options
-q, --question        Research question ID (auto-builds query from linked terms)
-s, --sources         Comma-separated sources (default: pubmed,arxiv,openalex,inspire)
-n, --max-results     Maximum results per source (default: 50)
Export options
-f, --format          Output format: bibtex, json, csv (default: json)
-o, --output          Write to file instead of stdout
Assess options
-q, --question        Research question ID (required)
-m, --model           Model for scoring (default: claude-sonnet-4-6)
-t, --temperature     Temperature for scoring (default: 0.0)
Snowball options
-q, --question        Research question ID (required)
-d, --depth           Max chaining depth, 1-3 (default: 1)
    --direction       references, cited_by, or both (default: both)
    --threshold       Min relevance score to expand (default: 0.6)
-m, --model           Model for scoring (default: claude-sonnet-4-6)
Question add-terms options
-q, --query           Custom query string (default: terms joined by spaces)
Configuration
Credentials resolve in this order: OS keychain → environment variable → .scitadel/config.toml → empty. For most users the keychain path is best — scitadel auth login <source> prompts you and stores the secret securely.
| Source | Keychain key | Env var | Notes |
|---|---|---|---|
| PubMed | pubmed.api_key | SCITADEL_PUBMED_API_KEY | Optional, higher rate limits |
| OpenAlex | openalex.email | SCITADEL_OPENALEX_EMAIL | Polite pool |
| PatentsView | patentsview.api_key | SCITADEL_PATENTSVIEW_KEY | Free registration |
| Lens | lens.api_token | SCITADEL_LENS_TOKEN | Free tier |
| EPO OPS | epo.consumer_key + epo.consumer_secret | SCITADEL_EPO_KEY, SCITADEL_EPO_SECRET | Registered app |
| Anthropic | (not stored) | ANTHROPIC_API_KEY | Required for assess, snowball, MCP scoring |
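The resolution order for credentials (keychain, then environment variable, then config file, then empty) amounts to a chain of fallbacks. The Rust sketch below models each layer as an in-memory map so it can run without an OS keychain. The Layer alias and resolve_credential function are hypothetical, and two details are assumptions: that the config file uses the same key as the keychain, and that an empty value counts as unset.

```rust
use std::collections::HashMap;

/// Stand-in for one credential layer (keychain, environment, or config file).
pub type Layer = HashMap<String, String>;

/// Resolve a credential by checking each layer in priority order:
/// keychain, then environment variable, then config file, then empty.
pub fn resolve_credential(
    keychain_key: &str,
    env_var: &str,
    keychain: &Layer,
    env: &Layer,
    config: &Layer,
) -> String {
    // Treat missing and empty values alike (an assumption of this sketch).
    fn non_empty(value: Option<&String>) -> Option<String> {
        value.filter(|s| !s.is_empty()).cloned()
    }

    non_empty(keychain.get(keychain_key))
        .or_else(|| non_empty(env.get(env_var)))
        .or_else(|| non_empty(config.get(keychain_key)))
        .unwrap_or_default()
}
```

The lazy or_else calls mean a lower-priority layer is only consulted when every higher-priority layer came back empty.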
Other knobs:
| Variable | Default | Description |
|---|---|---|
| SCITADEL_DB | ./.scitadel/scitadel.db | Database path |
| SCITADEL_CHAT_MODEL | claude-sonnet-4-6 | Model used for scoring |
| SCITADEL_CHAT_MAX_TOKENS | 4096 | Max completion tokens |
| SCITADEL_SCORING_CONCURRENCY | 5 | Parallel scoring requests |
Architecture
Hexagonal (ports and adapters), implemented as a Rust workspace:
scitadel-cli (clap) / scitadel-mcp (rmcp) / scitadel-tui (ratatui)
-> scitadel-core (services, domain, ports)
-> scitadel-db (rusqlite adapters)
-> scitadel-adapters (PubMed, arXiv, OpenAlex, INSPIRE-HEP,
PatentsView, Lens, EPO OPS, Unpaywall)
-> scitadel-scoring (Anthropic SDK)
-> scitadel-export (BibTeX, JSON, CSV)
- Domain models define Paper, Search, ResearchQuestion, Assessment, Citation, SnowballRun
- Repository ports are traits: SQLite today, swap for Postgres without touching services
- Source adapters run in parallel with retry/backoff; partial failures don't abort the search
- Dedup engine validates and normalizes DOIs, merges by DOI (exact) then title similarity (Jaccard), filling metadata gaps across sources
- Snowball service chains citations with relevance-gated traversal, depth limiting, and deduplication
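The dedup rule in the list above (DOI match first, then title similarity) can be illustrated with a small Rust sketch. It makes several assumptions that are not confirmed by the source: the set of DOI prefixes stripped during normalization, word-level tokenization for the Jaccard score, the caller-supplied min_similarity threshold, and falling back to title comparison whenever the DOIs do not match. All function names are hypothetical.

```rust
use std::collections::HashSet;

/// URL and scheme prefixes stripped before comparing DOIs (assumed list).
const DOI_PREFIXES: [&str; 5] = [
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
];

/// Lowercase and strip prefixes; None if the result does not look like "10.<registrant>/<suffix>".
pub fn normalize_doi(raw: &str) -> Option<String> {
    let lower = raw.trim().to_lowercase();
    let stripped = DOI_PREFIXES
        .iter()
        .find_map(|p| lower.strip_prefix(*p))
        .unwrap_or(lower.as_str());
    let doi = stripped.trim();
    match doi.split_once('/') {
        Some((registrant, suffix))
            if registrant.starts_with("10.") && registrant.len() > 3 && !suffix.is_empty() =>
        {
            Some(doi.to_string())
        }
        _ => None,
    }
}

/// Lowercased alphanumeric words of a title.
fn title_tokens(title: &str) -> HashSet<String> {
    title
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(String::from)
        .collect()
}

/// Jaccard similarity of the two titles' word sets: |A ∩ B| / |A ∪ B|.
pub fn title_similarity(a: &str, b: &str) -> f64 {
    let (ta, tb) = (title_tokens(a), title_tokens(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Two records are duplicates if their normalized DOIs match exactly,
/// or otherwise if their titles are similar enough.
pub fn is_duplicate(
    doi_a: Option<&str>,
    title_a: &str,
    doi_b: Option<&str>,
    title_b: &str,
    min_similarity: f64,
) -> bool {
    let (da, db) = (doi_a.and_then(normalize_doi), doi_b.and_then(normalize_doi));
    match (da, db) {
        (Some(x), Some(y)) if x == y => true,
        _ => title_similarity(title_a, title_b) >= min_similarity,
    }
}
```

Normalizing before comparing matters because sources report the same DOI in different forms, for example as a bare identifier in one and as a doi.org URL in another.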
Data model
A paper exists once, regardless of how many searches found it. A paper's relevance is not intrinsic — it's relative to a research question, captured as an assessment with score, reasoning, and provenance (human vs. model). Citations are directed edges between papers, discovered via snowball runs.
ResearchQuestion -< SearchTerm
                 -< Assessment >- Paper
                 -< SnowballRun
Search -< SearchResult >- Paper
Search -< SourceOutcome
Paper -< Citation >- Paper
Development
Requires a stable Rust toolchain.
# Build
# Run the binary without installing
# Run tests (workspace-wide)
# Lint
Prebuilt binaries
Every tagged release attaches tarballs for Linux x86_64, macOS x86_64, and
macOS arm64 to the Releases page.
Download the one for your platform, extract, and put the scitadel binary on
your $PATH:
# Example for macOS arm64 (Apple Silicon)
Each release also ships a .sha256 file next to each tarball — verify with
shasum -a 256 -c <file>.sha256.
License
Dual-licensed under either of
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.