cli-pdf-extract-0.1.3 is not a library.
cli-pdf-extract
A fast Rust CLI wrapper around pdf_oxide that lets LLMs peek into PDFs without paying the cost of loading and synthesizing whole documents. It extracts page text as Markdown for quick ingestion and can also pull only highlights (plus notes), which is ideal when you want the essence without the full context and need higher LLM throughput.
Features
- Extract pages as Markdown (single page, range, or all pages by default)
- Extract only highlight annotations and their notes (`--highlight`)
- Write to stdout (for piping) or a file (`--output`)
- Designed for low-latency LLM workflows (fast “peek” into large PDFs)
Prerequisites
- Rust and Cargo installed (`rustup` recommended)
Installation
Option 1: Build and run locally
Option 2: Install as a local CLI binary
From the repository root:
Then run:
Usage
Show help
Single page extraction
Page range extraction (inclusive)
Extract highlights only (fastest LLM pass)
Pipe directly to another command / LLM tool
Notes
- Pages are zero-indexed.
- `--start-page` and `--end-page` must be provided together.
- `--page` cannot be combined with range flags.
- Pro-tip: add standardized tags to annotation notes (e.g., `<problem-simulations>`, `<paper-idea>`) to enable downstream clustering, trend discovery, and routing.
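
The page-selection rules above can be sketched as a small validation function. This is an illustrative model only, not the tool's actual source: the `Selection` enum and `resolve_selection` function are hypothetical names, and the check that the start page does not exceed the end page is an assumption that the notes do not state.

```rust
/// Hypothetical model of what the page flags select (not the tool's real types).
#[derive(Debug, PartialEq)]
enum Selection {
    /// No page flags given: all pages (the documented default).
    All,
    /// `--page N`, zero-indexed, so 0 is the first page.
    Single(usize),
    /// `--start-page S --end-page E`, zero-indexed and inclusive on both ends.
    Range { start: usize, end: usize },
}

/// Applies the rules from the notes to the three optional flag values.
fn resolve_selection(
    page: Option<usize>,
    start_page: Option<usize>,
    end_page: Option<usize>,
) -> Result<Selection, String> {
    match (page, start_page, end_page) {
        (None, None, None) => Ok(Selection::All),
        (Some(p), None, None) => Ok(Selection::Single(p)),
        // --page together with either range flag is rejected.
        (Some(_), _, _) => Err("--page cannot be combined with range flags".to_string()),
        (None, Some(s), Some(e)) if s <= e => Ok(Selection::Range { start: s, end: e }),
        // Assumption: an inverted range is treated as an error.
        (None, Some(_), Some(_)) => Err("--start-page must not exceed --end-page".to_string()),
        // Only one of the two range flags was given.
        (None, _, _) => Err("--start-page and --end-page must be provided together".to_string()),
    }
}

fn main() {
    // First three pages of a document: zero-indexed, inclusive range 0..=2.
    println!("{:?}", resolve_selection(None, Some(0), Some(2)));
    // Mixing --page with a range flag is an error.
    println!("{:?}", resolve_selection(Some(1), Some(0), None));
}
```

Under these rules, the first page of a PDF is page 0, and a range from 0 to 2 covers three pages.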
License
MIT. See LICENSE.
Author
Edgar Torres (edgar.torres@ki.uni-stuttgart.de)