# pdf-dump
A CLI tool for inspecting and debugging the internal structure of PDF files.
`pdf-dump` shows you what's actually inside a PDF — objects, streams, fonts, images, form fields, bookmarks, annotations, tagged structure, and more. Useful for debugging PDF generation, understanding why a PDF looks wrong, or exploring the format.
## Installation
```bash
cargo install pdf-dump
```
Requires a Rust toolchain that supports edition 2024.
## Quick Start
```bash
# Overview: metadata, validation summary, stream stats, feature indicators
pdf-dump file.pdf
# Extract text
pdf-dump file.pdf --text
pdf-dump file.pdf --text --page 3
# Search for text across pages
pdf-dump file.pdf --find-text "invoice"
# Page info: dimensions, resources, fonts, annotations, text preview
pdf-dump file.pdf --page 3
# List fonts or images
pdf-dump file.pdf --fonts
pdf-dump file.pdf --images
# Explain a specific object
pdf-dump file.pdf --inspect 5
# Find all font objects
pdf-dump file.pdf --search Type=Font
# Structural validation
pdf-dump file.pdf --validate
# One-line listing of every object
pdf-dump file.pdf --list
```
## Modes
### Document-level modes (combinable)
These can be used together — output gets section headers automatically:
| `--text` | Extract readable text from content streams |
| `--operators` | Show content stream operators |
| `--find-text "pattern"` | Case-insensitive text search with context |
| `--fonts` | List all fonts with encoding and embedding details |
| `--images` | List all images with dimensions, color space, filters |
| `--forms` | List AcroForm fields with names, types, values |
| `--bookmarks` | Show the document outline tree |
| `--annotations` | Show annotations with link targets |
| `--tags` | Show tagged PDF structure tree (accessibility) |
| `--tree` | Show the object graph as an indented reference tree |
| `--validate` | Structural checks: broken refs, unreachable objects, required keys |
| `--list` | One-line-per-object table |
| `--detail <view>` | Detail views: `security`, `embedded`, `labels`, `layers` |
```bash
# Combine freely
pdf-dump file.pdf --fonts --images --validate
```
### Standalone modes (one at a time)
| `--object N` | Print object(s) by number (`5`, `1,5,12`, `3-7`) |
| `--inspect N` | Full explanation of an object's role and relationships |
| `--search <expr>` | Find objects matching criteria (`Type=Font`, `key=MediaBox`, `stream=text`) |
| `--extract-stream N --output file` | Extract a decoded stream to a file |
### Modifiers
| `--page N` or `--page N-M` | Filter to specific pages; shows page info when used alone |
| `--json` | Structured JSON output (works with every mode) |
| `--decode` | Decompress stream contents |
| `--deref` | Inline-expand references (with `--object`) |
| `--depth N` | Limit traversal depth (with `--tree`, `--tags`, `--json`) |
| `--hex` | Hex dump for binary streams |
| `--raw` | Raw undecoded stream bytes (with `--object`) |
| `--truncate N` | Limit binary output to N bytes |
| `--dot` | GraphViz DOT output (with `--tree`) |
## JSON Output
Every mode supports `--json` for structured output:
```bash
pdf-dump file.pdf --json # Overview as JSON
pdf-dump file.pdf --fonts --json # Font list as JSON
pdf-dump file.pdf --fonts --images --json # Combined modes wrapped in a JSON object
pdf-dump file.pdf --validate --json # Validation results as JSON
```
## Supported Stream Filters
FlateDecode, ASCII85Decode, ASCIIHexDecode, LZWDecode, RunLengthDecode — applied sequentially for multi-filter pipelines.
## Acknowledgments
Built on [lopdf](https://github.com/nickel-org/lopdf), a pure-Rust PDF parsing library.
## Related Projects
- [medpdf](https://github.com/ComposerChrisF/medpdf) — Medium-level PDF API over lopdf (includes medpdf-image for image embedding)
- [pdf-maker](https://github.com/ComposerChrisF/pdf-maker) — CLI tool for merging, watermarking, and manipulating PDF files
## License
Licensed under either of
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or <http://opensource.org/licenses/MIT>)
at your option.
## Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.