PDF-CLI
A Rust library and CLI tool for reading, writing, and manipulating PDF files, including conversion to and from Markdown. It is implemented entirely in Rust without external PDF libraries.
Features
Library API
- In-memory PDF generation: `generate_pdf_bytes()`, no filesystem needed
- PDF validation: `validate_pdf()` / `validate_pdf_bytes()`, structural integrity checks
- Rich element model: 17 `Element` variants for document modeling
- Accessibility: `StructureType` enum (35 types), `StructureElement` tree, `AccessibilityOptions`
PDF Generation
- From scratch: Create PDFs with custom fonts and text content
- From Markdown: Rich formatting (headers, lists, task lists, blockquotes, tables, code blocks, definition lists, footnotes, images, links, page breaks)
- Text color: `Color` struct (RGB), code blocks in gray, links in blue
- Text alignment: H1 centered, configurable `TextAlign` enum
- Page orientation: Landscape/portrait with `--landscape` CLI flag
- Page numbering: Automatic footer page numbers
- Watermarks: Diagonal text with configurable opacity/size
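As an illustration of how diagonal watermark text can be expressed in a PDF content stream, here is a minimal sketch. It is not this crate's implementation: the function name `watermark_stream` and the resource names `/F1` and `/GS1` are hypothetical, and the page's resource dictionary would have to define them (with `/GS1` as a graphics state carrying the fill opacity). The rotation is written as a text matrix `cos sin -sin cos x y Tm`.

```rust
/// Builds a content-stream fragment that draws `text` rotated by `angle_deg`
/// degrees, with the text origin at (`x`, `y`).
/// Hypothetical helper: `/F1` (font) and `/GS1` (graphics state holding the
/// opacity) are assumed to exist in the page resources.
fn watermark_stream(text: &str, font_size: f64, angle_deg: f64, x: f64, y: f64) -> String {
    let (sin, cos) = angle_deg.to_radians().sin_cos();

    // Escape the characters that are special inside a PDF literal string.
    let escaped: String = text
        .chars()
        .flat_map(|c| match c {
            '(' | ')' | '\\' => vec!['\\', c],
            other => vec![other],
        })
        .collect();

    format!(
        "q\n/GS1 gs\nBT\n/F1 {font_size} Tf\n0.75 g\n\
         {cos:.4} {sin:.4} {nsin:.4} {cos:.4} {x} {y} Tm\n\
         ({escaped}) Tj\nET\nQ\n",
        nsin = -sin
    )
}

fn main() {
    // A 45-degree watermark in light gray, anchored at (150, 300).
    let stream = watermark_stream("DRAFT (v1)", 48.0, 45.0, 150.0, 300.0);
    print!("{stream}");
}
```

Wrapping the operators in `q` ... `Q` keeps the gray fill and the opacity from leaking into the rest of the page content.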
PDF Parsing
- Text extraction: Tj, TJ operators, font encodings (WinAnsi, MacRoman)
- Cross-reference streams: PDF 1.5+ xref stream parsing
- Object streams: Compressed object stream handling
- Validation: Header, xref, trailer, catalog, pages, object pairing checks
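The validation checks listed above can be approximated with simple byte searches. The sketch below is a heuristic illustration only, not the logic behind `validate_pdf()`; `StructureReport` and `check_structure` are hypothetical names. It will miss objects stored inside compressed object streams and can be fooled by matching bytes inside stream data.

```rust
/// Outcome of each heuristic structural check (hypothetical type).
#[derive(Debug)]
struct StructureReport {
    header: bool,
    xref: bool,
    trailer: bool,
    catalog: bool,
    pages: bool,
    objects_paired: bool,
}

impl StructureReport {
    fn is_valid(&self) -> bool {
        self.header && self.xref && self.trailer && self.catalog && self.pages && self.objects_paired
    }
}

/// Counts occurrences of `needle` in `haystack` (needle must be non-empty).
fn count(haystack: &[u8], needle: &[u8]) -> usize {
    haystack.windows(needle.len()).filter(|w| *w == needle).count()
}

fn has(haystack: &[u8], needle: &[u8]) -> bool {
    count(haystack, needle) > 0
}

fn check_structure(bytes: &[u8]) -> StructureReport {
    StructureReport {
        // Every PDF starts with a "%PDF-x.y" header.
        header: bytes.starts_with(b"%PDF-"),
        // "startxref" points at the xref table or xref stream.
        xref: has(bytes, b"startxref"),
        // Files using xref streams (PDF 1.5+) carry the trailer keys in the
        // stream dictionary instead of a "trailer" keyword.
        trailer: has(bytes, b"trailer") || has(bytes, b"/Type /XRef") || has(bytes, b"/Type/XRef"),
        catalog: has(bytes, b"/Type /Catalog") || has(bytes, b"/Type/Catalog"),
        pages: has(bytes, b"/Type /Pages") || has(bytes, b"/Type/Pages"),
        // Each "N G obj" should be closed by an "endobj".
        objects_paired: count(bytes, b" obj") == count(bytes, b"endobj"),
    }
}

fn main() {
    // Toy input: the xref table is omitted, so this is not a complete PDF.
    let sample = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n\
2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n\
trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n0\n%%EOF\n";
    let report = check_structure(sample);
    println!("{report:?} valid = {}", report.is_valid());
}
```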
PDF Manipulation
- Merge: Combine multiple PDFs
- Split: Extract page ranges
- Rotate: 0/90/180/270°
- Reorder: Arbitrary page ordering
- Watermark: Diagonal text overlay
- Metadata: Title, author, subject, keywords
- Annotations: Text, link, and highlight annotations
- Images: JPEG embedding with aspect-ratio scaling
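Aspect-ratio scaling for an embedded image comes down to picking a single scale factor for both axes. The following is a minimal sketch under stated assumptions: `fit_within` is a hypothetical helper, not part of this crate's API, and the choice never to enlarge small images is an assumption rather than documented behavior.

```rust
/// Scales an image of `img_w` x `img_h` to fit inside a `max_w` x `max_h` box
/// while preserving its aspect ratio. Images that already fit are left as-is.
/// Hypothetical helper for illustration only.
fn fit_within(img_w: f64, img_h: f64, max_w: f64, max_h: f64) -> (f64, f64) {
    if img_w <= 0.0 || img_h <= 0.0 {
        return (0.0, 0.0);
    }
    // Use the tighter of the two constraints, and never scale above 1.0.
    let scale = (max_w / img_w).min(max_h / img_h).min(1.0);
    (img_w * scale, img_h * scale)
}

fn main() {
    // A 1600x1200 image placed in a 400x400 box is limited by its width.
    let (w, h) = fit_within(1600.0, 1200.0, 400.0, 400.0);
    println!("{w} x {h}");
}
```

Taking the minimum of the two ratios guarantees that neither dimension exceeds the box, which is why one side typically ends up smaller than the available space.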
Installation
From Source
After a release build (for example, `cargo build --release`), the binary will be available at `target/release/pdf-cli`.
Usage
Basic Commands
Create a Simple PDF
Create PDF with Custom Font and Size
Convert Markdown to PDF
Convert Markdown to PDF with Custom Styling
Extract Text from PDF
Convert PDF to Markdown
Add Image to PDF
Landscape PDF
Merge PDFs
Split PDF (extract pages 2-5)
Rotate PDF
Create PDF with Metadata
Supported Fonts
- Helvetica
- Times-Roman
- Courier
- And other standard PDF Type 1 fonts
Examples
Creating a Multi-page Document
Converting Complex Markdown
Create a sample Markdown file named `sample.md`, then convert it to PDF:
`pdf-cli md-to-pdf sample.md sample.pdf --font "Times-Roman" --font-size 12`
## Library Usage
```rust
use pdfrs::{elements, pdf, pdf_generator};

fn main() {
    // Parse Markdown into elements
    let elements = elements::parse_markdown("# Hello\n\nWorld");

    // Generate PDF bytes in memory (no filesystem access needed)
    let layout = pdf_generator::PageLayout::portrait();
    let pdf_bytes = pdf_generator::generate_pdf_bytes(&elements, "Helvetica", 12.0, layout)
        .unwrap();

    // Validate the generated PDF
    let validation = pdf::validate_pdf_bytes(&pdf_bytes);
    assert!(validation.valid);
    assert!(validation.page_count >= 1);
}
```
Architecture
This tool is built with a modular architecture:
- PDF Parser (`src/pdf.rs`): PDF parsing, text extraction, validation, xref/object stream parsing
- PDF Generator (`src/pdf_generator.rs`): Creates PDFs with layout, color, alignment, accessibility
- Elements (`src/elements.rs`): 17 structured element types and Markdown parser
- Markdown (`src/markdown.rs`): Markdown-to-PDF pipeline with rich formatting
- PDF Operations (`src/pdf_ops.rs`): Merge, split, rotate, reorder, watermark, metadata, annotations
- Image Handler (`src/image.rs`): JPEG/PNG/BMP embedding with dimension parsing
- Compression (`src/compression.rs`): PDF stream compression (deflate)
- Security (`src/security.rs`): Password protection, permissions
See ARCHITECTURE.md for detailed module documentation.
Testing
251 tests across 4 test suites:
- 115 lib tests: Unit tests for all modules
- 112 bin tests: CLI command tests
- 13 integration tests: End-to-end roundtrip, merge, split, rotate, watermark, reorder
- 11 bench tests: Property-based and benchmark tests
Round-trip validation tests verify that every element type survives the full cycle: generate → validate → parse → verify.
Limitations
- Text extraction works best with PDFs generated by this tool or simple Type 1 font PDFs
- Font support is limited to standard Type 1 fonts (Helvetica, Times-Roman, Courier)
- Image embedding is JPEG-focused (PNG/BMP dimension parsing available)
- Full tagged PDF output not yet implemented (structure types defined)
Contributing
Contributions are welcome! Please read our Contributing Guidelines for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built entirely in Rust without external PDF dependencies
- Implements core PDF specifications from scratch
- Inspired by the need for a lightweight PDF toolchain