/*!
A low-level library for reading PDF files.

This crate implements the `Syntax` chapter of the PDF reference, and therefore
serves as a very good basis for building various abstractions on top of it, without having to reimplement
the PDF parsing logic.

This crate does not provide more high-level functionality, such as parsing fonts or color spaces.
Such functionality is out-of-scope for `hayro-syntax`, since this crate is supposed to be
as *light-weight* and *application-agnostic* as possible.

Functionality-wise, this crate is therefore close to feature-complete. The main missing feature
is support for password-protected documents. In addition to that, more low-level APIs might be
added in the future.

The crate is `no_std` compatible but requires an allocator to be available.
# Example
This short example shows you how to load a PDF file and iterate over the content streams of all
pages.
```rust
use hayro_syntax::Pdf;
use std::path::PathBuf;
// First load the data that constitutes the PDF file.
let data = std::fs::read(
    PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../hayro-tests/pdfs/custom/text_with_rise.pdf"),
)
.unwrap();

// Then create a new PDF file from it.
//
// Here we are just unwrapping in case reading the file failed, but you
// might instead want to apply proper error handling.
let pdf = Pdf::new(data).unwrap();

// First access all pages, and then iterate over the operators of each page's
// content stream and print them.
let pages = pdf.pages();
for page in pages.iter() {
    let mut ops = page.typed_operations();
    while let Some(op) = ops.next() {
        println!("{op:?}");
    }
}
```
# Safety
When the `unsafe` feature is disabled, there is only one use of `unsafe` in `hayro-syntax`.
We also unconditionally use the `smallvec` crate, which uses unsafe code internally, but that's
it.
For better performance, it is strongly recommended to enable the `unsafe` feature.
This results in slightly more unsafe code being used, but enables the following optimizations:
- We will use the `flate2` crate for decoding flate streams.
- SIMD will be enabled to accelerate decoding of images.
- `memchr` will be used to accelerate some of the fallback parsing code.
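
As a minimal sketch, the feature could be enabled in a downstream `Cargo.toml` as shown
below. The feature name `unsafe` comes from the text above; the version requirement is
only a placeholder and should be replaced with the release you actually depend on.

```toml
[dependencies]
# "*" is a placeholder version requirement, not a recommendation.
hayro-syntax = { version = "*", features = ["unsafe"] }
```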
# Features
The supported features include:
- Parsing xref tables in all their possible formats, including xref streams.
- Best-effort attempt at repairing PDF files with broken xref tables.
- Parsing of all object types (also in object streams).
- Parsing and decoding PDF streams.
- Iterating over pages as well as their content streams in a typed or untyped fashion.
- The crate is very lightweight, especially in comparison to other PDF crates.
# Limitations
- There are still a few features missing, for example, support for
password-protected PDFs. In addition to that, many properties (like page annotations) are
currently not exposed.
- This crate is for read-only processing; you cannot directly use it to manipulate PDF files.
If you need to do that, there are other crates in the Rust ecosystem that are suitable for this.
*/
extern crate alloc;
pub
pub
pub
pub
pub
pub
// We only expose them so hayro-interpret can use them, but they are not intended
// to be used by others
pub use PdfData;
pub use *;
pub use *;