hayro_syntax/lib.rs
/*!
A low-level library for reading PDF files.

This crate implements the `Syntax` chapter of the PDF reference, and therefore
serves as a solid basis for building various abstractions on top of it, without having to
reimplement the PDF parsing logic.

This crate does not provide higher-level functionality, such as parsing fonts or color spaces.
Such functionality is out of scope for `hayro-syntax`, since this crate is supposed to be
as *light-weight* and *application-agnostic* as possible.

Functionality-wise, this crate is therefore close to feature-complete. The main missing features
are support for password-protected documents and improved support for JPEG2000
images. In addition to that, more low-level APIs might be added in the future.

# Example
This short example shows you how to load a PDF file and iterate over the content streams of all
pages.
```rust
use hayro_syntax::Pdf;
use std::path::PathBuf;
use std::sync::Arc;

// First load the data that constitutes the PDF file.
let data = std::fs::read(
    PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../hayro-tests/pdfs/custom/text_with_rise.pdf"),
)
.unwrap();

// Then create a new `Pdf` from it.
//
// Here we simply unwrap the result, which panics if loading the PDF failed, but you
// might instead want to apply proper error handling.
let pdf = Pdf::new(Arc::new(data)).unwrap();

// First access all pages, and then iterate over the operators of each page's
// content stream and print them.
let pages = pdf.pages();
for page in pages.iter() {
    for op in page.typed_operations() {
        println!("{op:?}");
    }
}
```
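
In the example above, `Pdf::new` is handed an `Arc`. The crate root declares `PdfData` as
`Arc<dyn AsRef<[u8]> + Send + Sync>`, so, assuming `Pdf::new` accepts that alias, any owned or
static byte container should be usable as input. The following is a minimal sketch that does not
depend on this crate: it redeclares the alias locally, and `from_vec`, `from_static` and
`starts_with_pdf_marker` are hypothetical helpers for illustration, not part of this crate's API.

```rust
use std::sync::Arc;

// Local copy of the alias declared at the crate root, so that this sketch
// compiles on its own.
type PdfData = Arc<dyn AsRef<[u8]> + Send + Sync>;

// Hypothetical helper: wrap an owned buffer, e.g. the result of `std::fs::read`.
// The `Arc<Vec<u8>>` is coerced to the trait-object alias at the return site.
fn from_vec(bytes: Vec<u8>) -> PdfData {
    Arc::new(bytes)
}

// Hypothetical helper: wrap a static slice, e.g. the result of `include_bytes!`.
fn from_static(bytes: &'static [u8]) -> PdfData {
    Arc::new(bytes)
}

// Hypothetical helper: check whether the data begins with the `%PDF-` marker.
// This only illustrates how to get at the bytes; it is not a validity check.
fn starts_with_pdf_marker(data: &PdfData) -> bool {
    // Dereference the `Arc` to the trait object, then borrow the byte slice.
    let bytes: &[u8] = (**data).as_ref();
    bytes.starts_with(b"%PDF-")
}
```

Cloning a `PdfData` only increments the reference count of the `Arc`, so the underlying bytes
are shared rather than copied.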

# Safety
This crate contains a single use of `unsafe`, which is needed to implement caching with a
self-referential struct. There is no other `unsafe` code; in particular, _none_ of the parser code
uses it.

# Features
The supported features include:
- Parsing xref tables in all their possible formats, including xref streams.
- Best-effort attempt at repairing PDF files with broken xref tables.
- Parsing of all object types (also in object streams).
- Parsing and evaluating PDF functions.
- Parsing and decoding PDF streams.
- Iterating over pages as well as their content streams in a typed or untyped fashion.
- The crate is very lightweight, especially in comparison to other PDF crates, assuming you don't
  enable the `jpeg2000` feature (see further below for more information).

# Limitations
- There are still a few features missing, for example, support for
  password-protected PDFs. In addition to that, many properties (like page annotations) are
  currently not exposed.
- This crate is for read-only processing; you cannot directly use it to manipulate PDF files.
  If you need to do that, there are other crates in the Rust ecosystem that are suitable for this.

# Cargo features
This crate has one feature, `jpeg2000`. PDF allows embedding JPEG2000 images. Unfortunately,
JPEG2000 is a very complicated format. There exists a Rust
[jpeg2k](https://github.com/Neopallium/jpeg2k) crate that allows decoding such images. However, it is a
relatively heavy dependency, has a lot of unsafe code (due to having been ported with
[c2rust](https://c2rust.com/)), and also has a dependency on libc, meaning that you might be
restricted in the targets you can build for. Because of this, I recommend not enabling this feature
unless you absolutely need to be able to support such images.
*/

#![deny(missing_docs)]

use std::sync::Arc;

pub(crate) mod data;
pub(crate) mod filter;
pub(crate) mod pdf;
pub(crate) mod trivia;
pub(crate) mod util;

pub mod bit_reader;
pub mod content;
mod crypto;
pub mod object;
pub mod page;
pub mod xref;

// This module should only be used by hayro crates and is considered an implementation detail.
#[doc(hidden)]
pub mod reader;

pub use pdf::*;

/// A container for the bytes of a PDF file.
pub type PdfData = Arc<dyn AsRef<[u8]> + Send + Sync>;