/*!
A low-level library for reading PDF files.

This crate implements most of what is described in the `Syntax` chapter of the PDF reference, and
therefore serves as a solid basis for building various abstractions on top of it without having to
reimplement the PDF parsing logic.

This crate does not provide higher-level functionality, such as parsing fonts or color spaces.
Such functionality is out of scope for `hayro-syntax`, since this crate is meant to be
as lightweight and application-agnostic as possible. Functionality-wise, this crate is therefore
essentially feature-complete, though more low-level APIs might be added in the future.

# Example
This short example shows how you can iterate over the operations in the content streams of all pages
of a PDF file.
```rust
use std::path::PathBuf;
use std::sync::Arc;
use hayro_syntax::pdf::Pdf;

// Read a sample PDF from the sibling `hayro-render` crate.
let data = std::fs::read(PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../hayro-render/pdfs/text_with_rise.pdf")).unwrap();
// Wrap the raw bytes in an `Arc` and load the PDF.
let pdf = Pdf::new(Arc::new(data)).unwrap();
let pages = pdf.pages().unwrap();

for page in pages {
    // Print every operation of the page's content stream in its typed form.
    for op in page.typed_operations() {
        println!("{:?}", op);
    }
}
```
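
The example above hands `Arc::new(data)` to `Pdf::new`. Assuming `Pdf::new` takes a `PdfData` (the
`Arc<dyn AsRef<[u8]> + Send + Sync>` alias defined at the bottom of this file), any thread-safe byte
container can be used, not just a `Vec<u8>`. The standalone sketch below copies that alias locally
and shows the coercion for a `Vec<u8>` and a `&'static [u8]`; the helper functions `from_vec`,
`from_static` and `first_bytes` are hypothetical and exist only for illustration.

```rust
use std::sync::Arc;

// Local copy of this crate's `PdfData` alias, so the sketch compiles on its own.
type PdfData = Arc<dyn AsRef<[u8]> + Send + Sync>;

// Hypothetical helper: an owned buffer, e.g. the result of `std::fs::read`.
// `Arc<Vec<u8>>` is coerced to the trait object on return.
fn from_vec(bytes: Vec<u8>) -> PdfData {
    Arc::new(bytes)
}

// Hypothetical helper: a borrowed static buffer, e.g. from `include_bytes!`.
// `&[u8]` implements `AsRef<[u8]>`, so no copy of the data is made.
fn from_static(bytes: &'static [u8]) -> PdfData {
    Arc::new(bytes)
}

// Hypothetical helper: views the shared data as a byte slice and returns
// at most its first `n` bytes.
fn first_bytes(data: &PdfData, n: usize) -> &[u8] {
    let bytes: &[u8] = AsRef::<[u8]>::as_ref(&**data);
    &bytes[..n.min(bytes.len())]
}
```

Using a `&'static [u8]` avoids copying when the PDF is embedded in the binary.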

# Features
The supported features include:
- Parsing the xref table in all of its possible formats, including xref streams.
- Parsing all objects, including those stored in object streams.
- Parsing and evaluating PDF functions.
- Parsing and decoding PDF streams.
- Iterating over the pages of a PDF and their content streams in a typed fashion.
- The crate is lightweight compared to other PDF crates, at least as long as you don't enable
  the `jpeg2000` feature.
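
To give a flavor of what parsing a classic (non-stream) xref table involves: per the PDF
specification, every entry in an xref subsection is exactly 20 bytes long, consisting of a 10-digit
number, a space, a 5-digit generation number, a space, the keyword `n` (in use) or `f` (free), and a
two-byte end-of-line marker. The following is a minimal, standalone sketch of decoding one such entry.
`XrefEntry` and `parse_xref_entry` are hypothetical names and not part of this crate's API; a real
implementation also has to handle subsection headers and xref streams.

```rust
/// One decoded entry of a classic cross-reference table (hypothetical type).
#[derive(Debug, PartialEq)]
enum XrefEntry {
    /// `n`: an in-use object stored at the given byte offset.
    InUse { offset: u64, generation: u16 },
    /// `f`: a free entry; the first field is the number of the next free object.
    Free { next_free: u64, generation: u16 },
}

/// Parses a single 20-byte xref entry such as `b"0000000017 00000 n\r\n"`.
/// Returns `None` if the entry is not well-formed.
fn parse_xref_entry(entry: &[u8]) -> Option<XrefEntry> {
    if entry.len() != 20 || entry[10] != b' ' || entry[16] != b' ' {
        return None;
    }

    // Parses a fixed-width run of ASCII digits.
    fn digits(bytes: &[u8]) -> Option<u64> {
        if !bytes.iter().all(u8::is_ascii_digit) {
            return None;
        }
        std::str::from_utf8(bytes).ok()?.parse().ok()
    }

    let first = digits(&entry[0..10])?;
    let generation = u16::try_from(digits(&entry[11..16])?).ok()?;

    // The last two bytes must be one of the allowed EOL markers.
    match &entry[18..20] {
        b" \r" | b" \n" | b"\r\n" => {}
        _ => return None,
    }

    match entry[17] {
        b'n' => Some(XrefEntry::InUse { offset: first, generation }),
        b'f' => Some(XrefEntry::Free { next_free: first, generation }),
        _ => None,
    }
}
```

The fixed width is what allows a reader to locate the entry for object `k` in a subsection starting
at object `s` by skipping `(k - s) * 20` bytes past the subsection header line.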

# Limitations
I would like to highlight the following limitations:

- Some features are still missing, for example support for encrypted PDFs. In addition,
  many properties (like page annotations) are currently not exposed.
- This crate is intended for read-only processing; you cannot use it to directly manipulate PDF files.
  If you need to do that, there are other crates in the Rust ecosystem that are suitable for this.
- The main reason this crate exists is to serve as a foundation for
  `hayro-render`. Therefore, I am not planning on adding many other features that aren't needed
  to rasterize PDFs. But I am open to feedback, and if the crate covers everything
  you need, you are more than welcome to use it directly!

# Cargo features
This crate has one feature, `jpeg2000`. PDF files may embed JPEG2000 images. Unfortunately,
JPEG2000 is a very complicated format. There is a Rust crate that can decode such images (and
which `hayro-syntax` uses for this purpose), but it is a very heavy dependency, contains a lot of
unsafe code (due to having been ported from C with `c2rust`), and also depends on libc, meaning that
you might be restricted in the targets you can build for. Because of this, I recommend not enabling
this feature unless you absolutely need to support such images.
*/

#![forbid(unsafe_code)]
#![deny(missing_docs)]

use std::sync::Arc;

pub mod bit_reader;
pub mod content;
pub(crate) mod data;
pub mod document;
pub mod filter;
pub mod function;
pub mod object;
pub mod pdf;
pub mod reader;
pub mod trivia;
pub(crate) mod util;
pub mod xref;

const NUM_SLOTS: usize = 10000;

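/// A container for the bytes of a PDF file.
///
/// Any byte buffer that can be shared across threads can be used, for example
/// a `Vec<u8>` or a `&'static [u8]` wrapped in an [`Arc`].
pub type PdfData = Arc<dyn AsRef<[u8]> + Send + Sync>;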