papers-rag 0.3.0

Local vector RAG system for academic papers using LanceDB and FastEmbed

papers

License: MIT

Search, manage, and explore academic papers from the terminal, or run papers as an MCP server to query your papers with an LLM. It queries 240M+ works via OpenAlex, integrates with your Zotero library, and builds a local vector index over your papers with LanceDB so you can semantically search across sections and figures. Embedding is accelerated with DirectML on Windows and CoreML on macOS.

Note: Even the best analytical PDF-extraction methods mangle LaTeX and tables in technical papers. This project uses vision-model-based OCR via Datalab (requires an API key) to produce clean markdown with math and tables preserved. Extracted results (JSON, markdown, images) sync back to your Zotero library.

You can also run marker locally if you meet its license requirements; place its output in the cache directory.

Install

Download a prebuilt binary from releases, or build from source:

cargo install --path crates/papers-cli

Commands

Command                                                                                Description
work, author, source, institution, topic, publisher, funder, domain, field, subfield   Query the OpenAlex catalog
zotero                                                                                 Access your Zotero library
rag                                                                                    Semantic search over locally indexed papers
selection                                                                              Manage named groups of papers
mcp                                                                                    MCP server integration

Commands accept --json for machine-readable output.

OpenAlex

Search and filter

papers work list -s "attention is all you need" -n 3
papers work list --author "Yann LeCun" --year 2020-2024 --open
papers work list --topic "deep learning" --citations ">100" --sort cited_by_count:desc
papers author list --institution harvard --country US --h-index ">50"

Filter aliases (--author, --year, --topic, --citations, etc.) expand to OpenAlex filter expressions; entity aliases such as --author and --topic resolve names to OpenAlex IDs automatically. You can also pass raw OpenAlex filter syntax via --filter.
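OpenAlex combines several filters in one `filter` parameter, separated by commas (logical AND); a `|` inside a single filter means OR, as in the `--publisher "acm|ieee"` example in the alias table. The minimal Rust sketch below joins already-resolved alias values into such an expression. `build_filter` is a hypothetical helper, not part of papers' code, and the keys used in `main` are taken from the alias table.

```rust
/// Joins (key, value) pairs into an OpenAlex filter expression.
/// Hypothetical helper: commas mean AND between filters.
fn build_filter(pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .map(|(key, value)| format!("{key}:{value}"))
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    // Roughly what `--year 2020-2024 --open --citations ">100"` could expand to.
    let filter = build_filter(&[
        ("publication_year", "2020-2024"),
        ("is_oa", "true"),
        ("cited_by_count", ">100"),
    ]);
    println!("{filter}");
}
```

Keeping the joining step separate from alias resolution means raw `--filter` input and alias-derived pairs could, in principle, share the same final formatting path.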

Get by ID or search

The get subcommand accepts OpenAlex IDs, DOIs, ORCIDs, ROR IDs, PubMed IDs, ISSNs, or plain search queries:

papers work get "attention is all you need"
papers work get https://doi.org/10.7717/peerj.4375
papers author get "yoshua bengio"
papers institution get "MIT"
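Because `get` accepts many identifier types as well as free text, it has to guess what it was given. The sketch below shows one heuristic way to do that. The `IdKind` enum, the `classify` function, and the `pmid:` prefix convention are assumptions for illustration; papers' real detection logic may differ (for example, bare ROR IDs without `ror.org/` fall through to a search here).

```rust
/// The kinds of input `get` is documented to accept.
#[derive(Debug, PartialEq)]
enum IdKind {
    OpenAlex,
    Doi,
    Orcid,
    Ror,
    Pmid,
    Issn,
    Search,
}

/// Hypothetical heuristic classifier; order matters (most specific first).
fn classify(input: &str) -> IdKind {
    let s = input.trim();
    let lower = s.to_ascii_lowercase();
    if lower.starts_with("10.") || lower.contains("doi.org/") {
        IdKind::Doi
    } else if lower.contains("orcid.org/") || is_orcid(s) {
        IdKind::Orcid
    } else if lower.contains("ror.org/") {
        IdKind::Ror
    } else if lower.starts_with("pmid:") || lower.contains("pubmed.ncbi.nlm.nih.gov/") {
        IdKind::Pmid
    } else if is_issn(s) {
        IdKind::Issn
    } else if is_openalex(s) {
        IdKind::OpenAlex
    } else {
        IdKind::Search
    }
}

/// ORCID iDs look like 0000-0002-1825-0097; the final character may be X.
fn is_orcid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    groups.len() == 4
        && groups.iter().enumerate().all(|(i, g)| {
            g.len() == 4
                && g.chars().enumerate().all(|(j, c)| {
                    c.is_ascii_digit() || (i == 3 && j == 3 && matches!(c, 'X' | 'x'))
                })
        })
}

/// ISSNs look like 0028-0836; the check digit may be X.
fn is_issn(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 9
        && b[4] == b'-'
        && b[..4].iter().all(u8::is_ascii_digit)
        && b[5..8].iter().all(u8::is_ascii_digit)
        && (b[8].is_ascii_digit() || b[8] == b'X' || b[8] == b'x')
}

/// OpenAlex IDs: an entity letter followed by digits, optionally as a URL.
fn is_openalex(s: &str) -> bool {
    let id = s.rsplit('/').next().unwrap_or(s);
    let mut chars = id.chars();
    match chars.next() {
        Some(c) if "WASITPF".contains(c.to_ascii_uppercase()) => {
            let rest = chars.as_str();
            !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit())
        }
        _ => false,
    }
}

fn main() {
    for input in ["attention is all you need", "https://doi.org/10.7717/peerj.4375", "0000-0002-1825-0097", "W2741809807"] {
        println!("{input:?} -> {:?}", classify(input));
    }
}
```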

Zotero

Requires ZOTERO_USER_ID and ZOTERO_API_KEY environment variables (zotero.org/settings/keys).

papers zotero work list --tag Starred --sort dateModified --direction desc
papers zotero work list --search "rendering" --type conferencePaper -n 5
papers zotero work annotations <key>
papers zotero attachment file <key> --output paper.pdf
papers zotero collection list --top

Entities: work, attachment, annotation, note, collection, tag, search, group.
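The two environment variables map directly onto Zotero Web API v3 requests: the user ID goes into the URL path and the key into a `Zotero-API-Key` header. The minimal sketch below assumes a personal (not group) library. `ZoteroConfig` and its methods are hypothetical names, and no HTTP request is made.

```rust
use std::env;

/// Credentials read from the environment variables this README names.
#[derive(Debug)]
struct ZoteroConfig {
    user_id: String,
    api_key: String,
}

impl ZoteroConfig {
    /// `lookup` abstracts over `std::env::var` so the logic is testable.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        // Treat empty values the same as missing ones.
        let get = |name: &str| {
            lookup(name)
                .filter(|v| !v.is_empty())
                .ok_or_else(|| format!("{name} is not set"))
        };
        Ok(Self {
            user_id: get("ZOTERO_USER_ID")?,
            api_key: get("ZOTERO_API_KEY")?,
        })
    }

    fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| env::var(name).ok())
    }

    /// Items endpoint for the user's personal library.
    fn items_url(&self) -> String {
        format!("https://api.zotero.org/users/{}/items", self.user_id)
    }

    /// Header pair used for API-key authentication.
    fn auth_header(&self) -> (&'static str, &str) {
        ("Zotero-API-Key", &self.api_key)
    }
}

fn main() {
    match ZoteroConfig::from_env() {
        Ok(cfg) => {
            let (header, _key) = cfg.auth_header();
            println!("GET {} with header {header}", cfg.items_url());
        }
        Err(e) => eprintln!("{e}"),
    }
}
```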

RAG

Local semantic search over your papers using LanceDB and EmbeddingGemma 300M (via FastEmbed + ONNX Runtime). Embedding runs on-device, with hardware acceleration via DirectML on Windows and CoreML on macOS.

papers rag ingest                        # Index papers from the marker cache
papers rag search "differentiable rendering" -n 5
papers rag search-figures "neural radiance field architecture"
papers rag get-section <paper> <section>
papers rag outline <paper>
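Conceptually, `rag search` embeds the query and ranks indexed chunks by vector similarity. LanceDB does this with its own index and distance metrics, so the brute-force cosine ranking below only illustrates the idea; it is not how papers or LanceDB implement search. `cosine` and `top_k` are hypothetical helpers, and the vectors are toy values.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is zero).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force top-k over (chunk_id, embedding) pairs, highest similarity first.
fn top_k<'a>(query: &[f32], chunks: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = chunks
        .iter()
        .map(|(id, emb)| (*id, cosine(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    // Toy 3-dimensional vectors; real EmbeddingGemma vectors are much longer.
    let chunks = vec![
        ("paper-a/intro", vec![0.9, 0.1, 0.0]),
        ("paper-a/method", vec![0.1, 0.9, 0.2]),
        ("paper-b/results", vec![0.7, 0.6, 0.1]),
    ];
    for (id, score) in top_k(&[1.0, 0.2, 0.0], &chunks, 2) {
        println!("{id}: {score:.3}");
    }
}
```

An approximate index exists precisely to avoid this linear scan once the number of chunks grows large.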

MCP server

Exposes CLI commands as MCP tools for LLMs. Currently only --stdio is supported.

Claude Code:

claude mcp add papers -- papers mcp start --stdio

.mcp.json (Claude Desktop, Cursor, etc.):

{
  "mcpServers": {
    "papers": {
      "command": "papers",
      "args": ["mcp", "start", "--stdio"]
    }
  }
}
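With the stdio transport, the MCP client launches `papers mcp start --stdio` and exchanges newline-delimited JSON-RPC 2.0 messages over the process's stdin and stdout. The sketch below builds one `tools/call` request by hand. The tool name `rag_search` and its `query` argument are hypothetical; the real tool names and schemas come from the server's `tools/list` response, and a real session must complete the `initialize` handshake first.

```rust
/// Escapes a string for embedding in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// One newline-terminated JSON-RPC 2.0 `tools/call` request.
/// Messages on the stdio transport must not contain embedded newlines.
fn tools_call(id: u64, tool: &str, query: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{id},\"method\":\"tools/call\",\"params\":{{\"name\":\"{}\",\"arguments\":{{\"query\":\"{}\"}}}}}}\n",
        json_escape(tool),
        json_escape(query)
    )
}

fn main() {
    // Hypothetical tool name and argument; written to the server's stdin.
    print!("{}", tools_call(1, "rag_search", "differentiable rendering"));
}
```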

Filter aliases

Shorthand flags that resolve to OpenAlex filter expressions. Entity-based aliases accept an OpenAlex ID or a search string (resolved to the top result by citation count).

work list

Flag           Example                           Resolves to
--author       "einstein", A5108093963           authorships.author.id:<id>
--topic        "deep learning", T10320           topics.id:<id>
--domain       "physical sciences", 3            topics.domain.id:<id>
--field        "computer science", 17            topics.field.id:<id>
--subfield     "artificial intelligence", 1702   topics.subfield.id:<id>
--publisher    "acm", "acm|ieee"                 primary_location.source.publisher_lineage:<id>
--source       "nature", S137773608              primary_location.source.id:<id>
--institution  "mit", I136199984                 authorships.institutions.lineage:<id>
--year         2024, >2008, 2008-2024            publication_year:<value>
--citations    ">100", "10-50"                   cited_by_count:<value>
--country      US, GB                            authorships.countries:<value>
--continent    europe, asia                      authorships.continents:<value>
--type         article, preprint                 type:<value>
--open         (flag)                            is_oa:true
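For entity aliases, the resolution rule reads as: pass an OpenAlex ID through unchanged, otherwise search and take the most-cited hit. The sketch below applies that rule with the OpenAlex search step left out (candidates are passed in). `Candidate`, `filter_key`, `resolve`, and `looks_like_id` are hypothetical names, only four flags from the table are mapped, and OR values such as `"acm|ieee"` are not handled.

```rust
/// A search hit, reduced to the two fields the rule needs.
struct Candidate<'a> {
    id: &'a str,
    cited_by_count: u64,
}

/// Filter key for a subset of the work-list entity aliases.
fn filter_key(flag: &str) -> Option<&'static str> {
    Some(match flag {
        "--author" => "authorships.author.id",
        "--topic" => "topics.id",
        "--source" => "primary_location.source.id",
        "--institution" => "authorships.institutions.lineage",
        _ => return None,
    })
}

/// Entity letter followed by digits, e.g. A5108093963 or T10320.
fn looks_like_id(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some('A' | 'T' | 'S' | 'I' | 'W' | 'P' | 'F'))
        && !chars.as_str().is_empty()
        && chars.as_str().chars().all(|c| c.is_ascii_digit())
}

/// Uses the value as-is if it is an ID; otherwise picks the most-cited candidate.
fn resolve<'a>(flag: &str, value: &'a str, candidates: &[Candidate<'a>]) -> Option<String> {
    let key = filter_key(flag)?;
    let id = if looks_like_id(value) {
        value
    } else {
        candidates.iter().max_by_key(|c| c.cited_by_count)?.id
    };
    Some(format!("{key}:{id}"))
}

fn main() {
    // Made-up IDs standing in for an OpenAlex author search for "einstein".
    let hits = [
        Candidate { id: "A0000000001", cited_by_count: 12 },
        Candidate { id: "A0000000002", cited_by_count: 90_000 },
    ];
    println!("{:?}", resolve("--author", "einstein", &hits));
}
```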

author list

Flag           Example
--institution  "harvard", I136199984
--country      US, GB
--continent    europe, asia
--citations    ">1000", "100-500"
--works        ">500", "100-200"
--h-index      ">50", "10-20"
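Numeric aliases such as --citations, --works, and --h-index take an exact value, a `>` or `<` bound, or a `lo-hi` range, which is the range syntax OpenAlex filters accept. The sketch below validates such values before they are placed in a filter. `Range` and `parse_range` are hypothetical names, and the filter keys these values attach to are not shown because the author table does not list them.

```rust
#[derive(Debug, PartialEq)]
enum Range {
    Exact(u64),
    Greater(u64),
    Less(u64),
    Between(u64, u64),
}

/// Parses values like "2024", ">50", "<10", or "10-20".
fn parse_range(s: &str) -> Result<Range, String> {
    let s = s.trim();
    let num = |t: &str| t.trim().parse::<u64>().map_err(|_| format!("not a number: {t:?}"));
    if let Some(rest) = s.strip_prefix('>') {
        Ok(Range::Greater(num(rest)?))
    } else if let Some(rest) = s.strip_prefix('<') {
        Ok(Range::Less(num(rest)?))
    } else if let Some((lo, hi)) = s.split_once('-') {
        let (lo, hi) = (num(lo)?, num(hi)?);
        if lo > hi {
            return Err(format!("empty range: {s}"));
        }
        Ok(Range::Between(lo, hi))
    } else {
        Ok(Range::Exact(num(s)?))
    }
}

impl Range {
    /// Renders the value back in OpenAlex filter syntax.
    fn to_filter_value(&self) -> String {
        match self {
            Range::Exact(n) => n.to_string(),
            Range::Greater(n) => format!(">{n}"),
            Range::Less(n) => format!("<{n}"),
            Range::Between(lo, hi) => format!("{lo}-{hi}"),
        }
    }
}

fn main() {
    for v in [">1000", "100-500", "50-10", "lots"] {
        match parse_range(v) {
            Ok(r) => println!("{v:?} -> {}", r.to_filter_value()),
            Err(e) => println!("{v:?} rejected: {e}"),
        }
    }
}
```

Rejecting malformed or empty ranges locally gives a clearer error than letting the API respond with an invalid-filter message.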