pdf_syntax/lib.rs
/*!
A low-level library for reading PDF files.

This crate implements the `Syntax` chapter of the PDF reference and therefore
serves as a solid basis for building higher-level abstractions without having to
reimplement the PDF parsing logic.

This crate does not provide higher-level functionality, such as parsing fonts or color spaces.
Such functionality is out of scope for `pdf-syntax`, since this crate is supposed to be
as *lightweight* and *application-agnostic* as possible.

Within that scope, this crate is close to feature-complete. The main missing feature
is support for password-protected documents. In addition, more low-level APIs might be
added in the future.

The crate is `no_std`-compatible but requires an allocator to be available.

# Example
This short example shows how to load a PDF file and iterate over the content streams of all
pages.
```rust,no_run
use pdf_syntax::Pdf;
use std::path::PathBuf;

// First load the data that constitutes the PDF file.
let data = std::fs::read(
    PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../hayro-tests/pdfs/custom/text_with_rise.pdf"),
)
.unwrap();

// Then parse the data into a `Pdf`.
//
// Here we are just unwrapping in case parsing the data failed, but you
// might instead want to apply proper error handling.
let pdf = Pdf::new(data).unwrap();

// First access all pages, and then iterate over the operators of each page's
// content stream and print them.
let pages = pdf.pages();
for page in pages.iter() {
    for op in page.typed_operations() {
        println!("{op:?}");
    }
}
```

# Safety
There is one usage of `unsafe`, needed to implement caching using a self-referential struct. Apart
from that, the crate contains no `unsafe` code; in particular, _none_ of the parser code uses it.

# Features
The supported features include:
- Parsing xref tables in all their possible formats, including xref streams.
- Best-effort repair of PDF files with broken xref tables.
- Parsing of all object types (including those in object streams).
- Parsing and decoding PDF streams.
- Iterating over pages as well as their content streams in a typed or untyped fashion.
- The crate is very lightweight, especially in comparison to other PDF crates.

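To give a feel for what the classic xref table format looks like, the following standalone
sketch parses a single table entry. It assumes the fixed 20-byte layout from the PDF
specification (a 10-digit number, a space, a 5-digit generation number, a space, the keyword
`n` or `f`, and a two-byte line ending). It is not this crate's implementation: `XrefEntry`,
`parse_xref_entry` and `parse_digits` are hypothetical names used only for illustration, and
the sketch does none of the repair work mentioned above.

```rust
/// One entry of a classic (non-stream) cross-reference table.
/// Hypothetical type for illustration; not part of this crate's API.
#[derive(Debug, PartialEq, Eq)]
enum XrefEntry {
    /// An object stored `offset` bytes from the start of the file.
    InUse { offset: u64, generation: u16 },
    /// A free slot; `next_free` is the object number of the next free object.
    Free { next_free: u64, generation: u16 },
}

/// Parses a run of ASCII digits into a number, rejecting any other byte.
fn parse_digits(digits: &[u8]) -> Option<u64> {
    digits.iter().try_fold(0u64, |acc, &b| {
        b.is_ascii_digit().then(|| acc * 10 + u64::from(b - b'0'))
    })
}

/// Parses one 20-byte entry of the form `nnnnnnnnnn ggggg k` plus a
/// two-byte line ending (space + CR, space + LF, or CR + LF).
fn parse_xref_entry(entry: &[u8]) -> Option<XrefEntry> {
    if entry.len() != 20 || entry[10] != b' ' || entry[16] != b' ' {
        return None;
    }
    let valid_eol = matches!(
        &entry[18..20],
        [b' ', b'\r'] | [b' ', b'\n'] | [b'\r', b'\n']
    );
    if !valid_eol {
        return None;
    }

    let number = parse_digits(&entry[0..10])?;
    let generation = u16::try_from(parse_digits(&entry[11..16])?).ok()?;

    // The keyword decides how the first number is interpreted.
    match entry[17] {
        b'n' => Some(XrefEntry::InUse { offset: number, generation }),
        b'f' => Some(XrefEntry::Free { next_free: number, generation }),
        _ => None,
    }
}
```

In this sketch, `parse_xref_entry(b"0000000017 00000 n \n")` yields an in-use entry at offset 17,
while an entry with a wrong length or an unknown keyword yields `None`. A real parser also has to
cope with files that violate this layout, which is where best-effort repair comes in.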
# Limitations
- A few features are still missing, for example support for
  password-protected PDFs. In addition, many properties (like page annotations) are
  currently not exposed.
- This crate is for read-only processing; you cannot directly use it to manipulate PDF files.
  If you need to do that, there are other crates in the Rust ecosystem that are suitable.
*/

#![cfg_attr(not(feature = "std"), no_std)]
#![deny(missing_docs)]

extern crate alloc;

pub(crate) mod math;
pub(crate) mod sync;

mod data;
pub(crate) mod filter;
pub(crate) mod pdf;
pub(crate) mod trivia;
pub(crate) mod util;

pub mod content;
mod crypto;
pub mod metadata;
pub mod object;
pub mod page;
pub mod xref;

// We only expose them so hayro-interpret can use them, but they are not intended
// to be used by others
#[doc(hidden)]
pub mod bit_reader;
#[doc(hidden)]
pub mod byte_reader;
#[doc(hidden)]
pub mod reader;

pub use data::PdfData;
pub use filter::*;
pub use pdf::*;