hayro_syntax/lib.rs
/*!
A low-level library for reading PDF files.

This crate implements the `Syntax` chapter of the PDF reference and therefore
serves as a solid basis for building higher-level abstractions on top of it, without
having to reimplement the PDF parsing logic.

This crate does not provide higher-level functionality, such as parsing fonts or color spaces.
Such functionality is out of scope for `hayro-syntax`, since this crate is meant to be
as *lightweight* and *application-agnostic* as possible.

Functionality-wise, this crate is therefore close to feature-complete. The main missing feature
is support for password-protected documents. In addition, more low-level APIs might be
added in the future.

# Example
This short example shows you how to load a PDF file and iterate over the content streams of all
pages.
```rust
use hayro_syntax::Pdf;
use std::path::PathBuf;
use std::sync::Arc;

// First load the data that constitutes the PDF file.
let data = std::fs::read(
    PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../hayro-tests/pdfs/custom/text_with_rise.pdf"),
)
.unwrap();

// Then create a new PDF file from it.
//
// Here we simply unwrap, which panics if the PDF could not be loaded. In real
// code you might instead want to apply proper error handling.
let pdf = Pdf::new(Arc::new(data)).unwrap();

// Access all pages, then iterate over the operators of each page's
// content stream and print them.
let pages = pdf.pages();
for page in pages.iter() {
    for op in page.typed_operations() {
        println!("{op:?}");
    }
}
```
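The example above unwraps both the file read and the `Pdf::new` call. As a minimal sketch of the error-handling alternative for the first step, the helper below reads a file and returns its bytes in a shared container instead of panicking. It uses only the standard library; `SharedBytes` and `load_shared_bytes` are hypothetical names chosen for illustration, and `SharedBytes` merely mirrors the shape of this crate's `PdfData` alias.

```rust
use std::io;
use std::path::Path;
use std::sync::Arc;

// Hypothetical local stand-in with the same shape as the crate's `PdfData` alias.
type SharedBytes = Arc<dyn AsRef<[u8]> + Send + Sync>;

// Hypothetical helper: reads the file at `path` and wraps its bytes in a
// shared container, returning the I/O error to the caller instead of panicking.
fn load_shared_bytes(path: &Path) -> io::Result<SharedBytes> {
    let bytes = std::fs::read(path)?;
    // `Vec<u8>` implements `AsRef<[u8]>`, so the `Arc` coerces to the trait object.
    let shared: SharedBytes = Arc::new(bytes);
    Ok(shared)
}
```

Assuming `Pdf::new` accepts a `PdfData` value, as the `Arc::new(data)` call in the example suggests, the returned container could be passed to it directly, leaving only the parsing step to handle.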

# Safety
There is a single use of `unsafe`, which is needed to implement caching with a self-referential
struct. Apart from that, the crate contains no `unsafe` code; in particular, _none_ of the parser
code uses it.

# Features
The supported features include:
- Parsing xref tables in all their possible formats, including xref streams.
- Best-effort attempt at repairing PDF files with broken xref tables.
- Parsing of all object types (also in object streams).
- Parsing and decoding PDF streams.
- Iterating over pages as well as their content streams in a typed or untyped fashion.
- The crate is very lightweight, especially in comparison to other PDF crates.
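To give a feel for what locating an xref table involves, here is a minimal, standard-library-only sketch. It is not this crate's implementation. A PDF file normally ends with a `startxref` keyword followed by the byte offset of the last cross-reference section, and the hypothetical helper `find_startxref` below extracts that offset.

```rust
// Hypothetical helper: returns the byte offset that follows the last
// `startxref` keyword in `data`, or `None` if no such offset is found.
fn find_startxref(data: &[u8]) -> Option<usize> {
    const KEYWORD: &[u8] = b"startxref";

    // Search from the end, since the keyword lives near the end of the file
    // and incrementally updated files may contain it more than once.
    let pos = data
        .windows(KEYWORD.len())
        .rposition(|window| window == KEYWORD)?;
    let rest = &data[pos + KEYWORD.len()..];

    // Skip whitespace after the keyword, then collect the decimal digits.
    // Simplification: PDF also treats NUL as whitespace, which is ignored here.
    let digits: Vec<u8> = rest
        .iter()
        .copied()
        .skip_while(|b| b.is_ascii_whitespace())
        .take_while(|b| b.is_ascii_digit())
        .collect();

    // An empty digit run fails to parse and yields `None`.
    std::str::from_utf8(&digits).ok()?.parse::<usize>().ok()
}
```

Real-world files can lack the keyword or carry a wrong offset, which is where a best-effort repair step becomes necessary; that part is not shown here.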

# Limitations
- A few features are still missing, for example support for
  password-protected PDFs. In addition, many properties (like page annotations) are
  currently not exposed.
- This crate is for read-only processing; you cannot use it directly to manipulate PDF files.
  If you need to do that, there are other crates in the Rust ecosystem that are suitable for this.
*/

#![deny(missing_docs)]

use std::sync::Arc;

pub(crate) mod data;
pub(crate) mod filter;
pub(crate) mod pdf;
pub(crate) mod trivia;
pub(crate) mod util;

pub mod content;
mod crypto;
pub mod metadata;
pub mod object;
pub mod page;
pub mod xref;

// These modules are only exposed so that hayro-interpret can use them; they are
// not intended to be used by other crates.
#[doc(hidden)]
pub mod bit_reader;
#[doc(hidden)]
pub mod byte_reader;
#[doc(hidden)]
pub mod reader;

pub use filter::*;
pub use pdf::*;

/// A container for the bytes of a PDF file.
pub type PdfData = Arc<dyn AsRef<[u8]> + Send + Sync>;