/*!
A low-level library for reading PDF files.

This crate implements most of the `Syntax` chapter of the PDF reference and therefore
serves as a solid basis for building higher-level abstractions without having to reimplement
the PDF parsing logic.

This crate does not provide higher-level functionality, such as parsing fonts or color spaces.
Such functionality is out of scope for `hayro-syntax`, since this crate is meant to be
as *lightweight* and *application-agnostic* as possible. Functionality-wise, this crate is therefore
close to feature-complete (the main missing feature is support for encrypted documents), though more
low-level APIs might be added in the future.

# Example
This short example shows how to load a PDF file and iterate over the typed operations in the
content streams of all pages.
```rust
use std::path::PathBuf;
use std::sync::Arc;
use hayro_syntax::Pdf;

// Resolve the test PDF relative to this crate's manifest directory.
let path = PathBuf::from(env!("CARGO_MANIFEST_DIR"))
    .join("../hayro-tests/pdfs/text_with_rise.pdf");
let data = std::fs::read(path).unwrap();

// The file's bytes are shared behind an `Arc`.
let pdf = Pdf::new(Arc::new(data)).unwrap();
let pages = pdf.pages();

// Print every typed content stream operation of every page.
for page in pages.iter() {
    for op in page.typed_operations() {
        println!("{:?}", op);
    }
}
```

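The example above passes the file's bytes wrapped in an `Arc`. The crate root defines the alias
`PdfData` as `Arc<dyn AsRef<[u8]> + Send + Sync>`, so any owned, thread-safe byte container should
coerce into it. The following is a minimal sketch of that coercion using only the standard library.
It assumes that `Pdf::new` accepts this alias (inferred from the example, not verified here). The
alias is redeclared locally so the snippet stands alone, and `from_vec`, `from_static` and
`looks_like_pdf` are hypothetical helpers for illustration, not part of `hayro-syntax`.

```rust
use std::sync::Arc;

// Local copy of the crate's `PdfData` alias, redeclared so this sketch is self-contained.
type PdfData = Arc<dyn AsRef<[u8]> + Send + Sync>;

// Hypothetical helper: wrap an owned buffer, e.g. the result of `std::fs::read`.
fn from_vec(bytes: Vec<u8>) -> PdfData {
    // `Arc<Vec<u8>>` coerces to the trait object because
    // `Vec<u8>` implements `AsRef<[u8]> + Send + Sync`.
    Arc::new(bytes)
}

// Hypothetical helper: wrap bytes with a `'static` lifetime, e.g. from `include_bytes!`.
fn from_static(bytes: &'static [u8]) -> PdfData {
    Arc::new(bytes)
}

// Hypothetical helper: check for the `%PDF-` marker that a PDF header starts with.
fn looks_like_pdf(data: &PdfData) -> bool {
    // Dereference through the `Arc` to the trait object, then borrow the bytes.
    let bytes: &[u8] = (**data).as_ref();
    bytes.starts_with(b"%PDF-")
}
```

Because the container is reference-counted, cloning a `PdfData` value only bumps the counter and
never copies the underlying bytes.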
# Safety
There is one usage of `unsafe`, needed to implement caching using a self-referential struct. Other
than that, the crate contains no `unsafe` code; in particular, there is none in _any_ of the parser code.

# Features
The supported features include:
- Parsing the xref table in all of its possible formats, including xref streams.
- Best-effort repair of broken PDF files.
- Parsing of all object types (including those stored in object streams).
- Parsing and evaluating PDF functions.
- Parsing and decoding PDF streams.
- Iterating over pages as well as their content streams in a typed fashion.
- The crate is very lightweight, especially in comparison to other PDF crates, as long as you don't
  enable the `jpeg2000` feature (see further below).

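As a rough illustration of where xref parsing starts: in the PDF file structure, a file ends with
the `startxref` keyword, followed by the byte offset of the last cross-reference section and the
`%%EOF` marker. The sketch below locates that offset using only the standard library.
`find_startxref` is a hypothetical helper written for this illustration. It is not this crate's
implementation, which additionally has to cope with malformed files.

```rust
// Hypothetical helper, not part of `hayro-syntax`: return the byte offset that
// follows the last `startxref` keyword in a PDF file, if there is one.
fn find_startxref(data: &[u8]) -> Option<usize> {
    const KEYWORD: &[u8] = b"startxref";

    // Search backwards so that the last occurrence of the keyword wins.
    let pos = data.windows(KEYWORD.len()).rposition(|w| w == KEYWORD)?;
    let rest = &data[pos + KEYWORD.len()..];

    // Skip ASCII whitespace (a simplification of PDF's whitespace rules),
    // then take the run of decimal digits that forms the offset.
    let start = rest.iter().position(|b| !b.is_ascii_whitespace())?;
    let digits = &rest[start..];
    let len = digits.iter().take_while(|b| b.is_ascii_digit()).count();

    // An empty or overflowing digit run fails to parse and yields `None`.
    std::str::from_utf8(&digits[..len]).ok()?.parse().ok()
}
```

A reader would then seek to the returned offset and parse either a classic xref table or an xref
stream from there.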
# Limitations
- There are still a few features missing, for example, support for encrypted PDFs. In addition,
  many properties (like page annotations) are currently not exposed.
- This crate is for read-only processing; you cannot use it directly to manipulate PDF files.
  If you need to do that, there are other crates in the Rust ecosystem that are suitable for this.

# Cargo features
This crate has one feature, `jpeg2000`. PDF allows JPEG2000 images to be embedded. Unfortunately,
JPEG2000 is a very complicated format. There is a Rust
[jpeg2k](https://github.com/Neopallium/jpeg2k) crate that can decode such images. However, it is a
relatively heavy dependency, contains a lot of unsafe code (due to having been ported with `c2rust`),
and depends on libc, meaning that you might be restricted in the targets you can build for.
Because of this, I recommend not enabling this feature unless you absolutely need to be able to
support such images.
*/

#![deny(missing_docs)]

use std::sync::Arc;

pub(crate) mod data;
pub(crate) mod filter;
pub(crate) mod pdf;
pub(crate) mod reader;
pub(crate) mod trivia;
pub(crate) mod util;

pub mod bit_reader;
pub mod content;
pub mod function;
pub mod object;
pub mod page;
pub mod xref;

pub use pdf::*;

/// A container for the bytes of a PDF file.
pub type PdfData = Arc<dyn AsRef<[u8]> + Send + Sync>;