/*!
A low-level library for reading PDF files.

This crate implements the `Syntax` chapter of the PDF reference, and therefore
serves as a very good basis for building various abstractions on top of it, without having to reimplement
the PDF parsing logic.

This crate does not provide more high-level functionality, such as parsing fonts or color spaces.
Such functionality is out-of-scope for `hayro-syntax`, since this crate is supposed to be
as *light-weight* and *application-agnostic* as possible.

Functionality-wise, this crate is therefore close to feature-complete. The main missing feature
is support for password-protected documents. In addition to that, more low-level APIs might be
added in the future.

The crate is `no_std` compatible but requires an allocator to be available.

# Example
This short example shows you how to load a PDF file and iterate over the content streams of all
pages.
```rust
use hayro_syntax::Pdf;
use std::path::PathBuf;

// First load the data that constitutes the PDF file.
let data = std::fs::read(
    PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../hayro-tests/pdfs/custom/text_with_rise.pdf"),
)
.unwrap();

// Then create a new PDF file from it.
//
// Here we are just unwrapping in case reading the file failed, but you
// might instead want to apply proper error handling.
let pdf = Pdf::new(data).unwrap();

// First access all pages, and then iterate over the operators of each page's
// content stream and print them.
let pages = pdf.pages();
for page in pages.iter() {
    let mut ops = page.typed_operations();

    while let Some(op) = ops.next() {
        println!("{op:?}");
    }
}
```
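
The example above unwraps both the file read and the parse step. As a minimal sketch of
what "proper error handling" could look like, the helper below reads a file and hands the
bytes to a caller-supplied parse closure, returning an error instead of panicking. The
names `LoadError` and `load_with` are hypothetical and not part of `hayro-syntax`, and the
sketch assumes only that the parser's error type implements `Debug`, which `unwrap()`
already requires.

```rust
use std::fmt::Debug;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error type for this sketch; it is not part of `hayro-syntax`.
#[derive(Debug)]
enum LoadError {
    /// The file could not be read from disk.
    Io(io::Error),
    /// The bytes were read, but the parser rejected them.
    Parse(String),
}

/// Reads `path` and passes the bytes to `parse`, mapping both failure
/// modes into `LoadError` instead of panicking.
fn load_with<T, E: Debug>(
    path: &Path,
    parse: impl FnOnce(Vec<u8>) -> Result<T, E>,
) -> Result<T, LoadError> {
    let data = fs::read(path).map_err(LoadError::Io)?;
    // Only `Debug` is assumed of the parser's error type, which is the
    // same requirement `unwrap()` has.
    parse(data).map_err(|err| LoadError::Parse(format!("{err:?}")))
}
```

With `hayro_syntax::Pdf` in scope, a call might look like
`load_with(path, |bytes| Pdf::new(bytes))`, after which the caller can match on the
returned `Result` and report `LoadError::Io` and `LoadError::Parse` separately.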

# Safety
When the `unsafe` feature is disabled, there is only minimal use of `unsafe` in `hayro-syntax`.
We also unconditionally use the `smallvec` crate, which uses unsafe internally, but that's
it.

For better performance, it is strongly recommended to enable the `unsafe` feature.
This results in slightly more unsafe code being used, but enables the following optimizations:
- The `flate2` crate will be used for decoding flate streams.
- SIMD will be enabled to accelerate decoding of images.
- `memchr` will be used to accelerate some of the fallback parsing code.

# Features
The supported features include:
- Parsing xref tables in all their possible formats, including xref streams.
- Best-effort attempt at repairing PDF files with broken xref tables.
- Parsing of all object types (also in object streams).
- Parsing and decoding PDF streams.
- Iterating over pages as well as their content streams in a typed or untyped fashion.
- The crate is very lightweight, especially in comparison to other PDF crates.

# Limitations
- There are still a few features missing, for example, support for
  password-protected PDFs. In addition to that, many properties (like page annotations) are
  currently not exposed.
- This crate is for read-only processing; you cannot directly use it to manipulate PDF files.
  If you need to do that, there are other crates in the Rust ecosystem that are suitable for this.
*/

#![cfg_attr(not(feature = "std"), no_std)]
#![deny(missing_docs)]

extern crate alloc;

#[macro_use]
mod log;

pub(crate) mod math;
pub(crate) mod sync;

mod data;
pub(crate) mod filter;
pub(crate) mod pdf;
pub(crate) mod trivia;
pub(crate) mod util;

pub mod content;
mod crypto;
pub mod metadata;
pub mod object;
pub mod page;
pub mod transform;
pub mod xref;

// We only expose them so hayro-interpret can use them, but they are not intended
// to be used by others
#[doc(hidden)]
pub mod bit_reader;
#[doc(hidden)]
pub mod byte_reader;
#[doc(hidden)]
pub mod reader;

pub use data::PdfData;
pub use filter::*;
pub use pdf::*;