pub struct InterpreterSettings {
    pub font_resolver: FontResolverFn,
    pub cmap_resolver: CMapResolverFn,
    pub warning_sink: WarningSinkFn,
    pub render_annotations: bool,
    pub skip_signature_widgets: bool,
}
Settings that should be applied during the interpretation process.
Fields
font_resolver: FontResolverFn
Nearly every PDF contains text. In most cases, PDF files embed the fonts they use, and pdf-interpret can therefore read the font files and do all the processing needed. However, there are two problems:
- Fonts don’t have to be embedded. A PDF file may define only the basic metadata of a font, like its name, and rely on the PDF processor to find that font in its environment.
- The PDF specification requires a list of 14 fonts that should always be available to a PDF processor. These include:
  - Times New Roman (Normal, Bold, Italic, BoldItalic)
  - Courier (Normal, Bold, Italic, BoldItalic)
  - Helvetica (Normal, Bold, Italic, BoldItalic)
  - ZapfDingBats
  - Symbol
When either of these situations occurs, this callback is called and is expected to return the data of an appropriate font, if one is available. If no such font is provided, the text will most likely fail to render.
For the font data, there are two different formats that are accepted:
- Any valid TTF/OTF font.
- A valid CFF font program.
The following recommendations are given for the implementation of this callback function.
For the standard fonts, if the original fonts are available on the system, you should simply return those. Otherwise, for Helvetica, Courier and Times New Roman, the best alternatives are the corresponding fonts of the Liberation font family. If you prefer smaller fonts, you can use the Foxit CFF fonts, which are much smaller but are missing glyphs for certain scripts.
For the Symbol and ZapfDingBats fonts, you should also prefer the system fonts. If they are not available, you can, as above, use the corresponding fonts from Foxit.
If you don’t want to deal with this, you can enable the embed-fonts feature and use the default implementation of the callback.
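The "system fonts first, bundled fallback second" recommendation could be sketched roughly as follows. The actual signature of `FontResolverFn` is not shown on this page, so `StandardFont`, `FontBytes`, and `FontStore` are hypothetical stand-ins for illustration, not part of pdf-interpret.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the font families listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum StandardFont {
    Helvetica,
    Courier,
    TimesNewRoman,
    Symbol,
    ZapfDingBats,
}

/// Font data (TTF/OTF or CFF bytes) shared cheaply between callers.
pub type FontBytes = Arc<Vec<u8>>;

/// Hypothetical store: `system` holds the original fonts found on the
/// machine, `fallback` holds substitutes such as Liberation or Foxit fonts.
#[derive(Default)]
pub struct FontStore {
    pub system: HashMap<StandardFont, FontBytes>,
    pub fallback: HashMap<StandardFont, FontBytes>,
}

impl FontStore {
    /// Returns the system font if present, otherwise the fallback,
    /// otherwise `None` (in which case text will likely fail to render).
    pub fn resolve(&self, font: StandardFont) -> Option<FontBytes> {
        self.system
            .get(&font)
            .or_else(|| self.fallback.get(&font))
            .cloned()
    }
}
```

A real resolver would wrap a lookup like `FontStore::resolve` in whatever closure type `FontResolverFn` expects and would populate the two maps from disk or from bundled data.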
cmap_resolver: CMapResolverFn
A callback for resolving cmaps that aren’t embedded.
When the PDF requires using a cmap that is not directly embedded in the PDF, this callback will be called to attempt fetching the data of the file.
When the embed-cmaps feature is enabled, this uses the load_embedded method from pdf-interpret-cmap by default, which embeds the cmap files for all 61 predefined cmaps that the PDF specification requires to be readily available on a system.
Otherwise, you can implement your own logic for lazily fetching the data. If you are fine with not supporting such PDFs, you can simply pass a closure that always returns None.
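The two custom options mentioned above (a closure that always returns None, and lazy fetching) might look like the sketch below. The real `CMapResolverFn` type is not shown on this page; `CMapLookup`, `no_cmaps`, and `lazy_cmaps` are hypothetical names, and the "name in, bytes out" shape is an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical callback shape: cmap name in, cmap file bytes out.
pub type CMapLookup = Arc<dyn Fn(&str) -> Option<Vec<u8>> + Send + Sync>;

/// A resolver for callers that do not need PDFs with non-embedded cmaps.
pub fn no_cmaps() -> CMapLookup {
    Arc::new(|_name: &str| -> Option<Vec<u8>> { None })
}

/// Wraps a slow loader (disk, network, ...) so that each cmap name is
/// looked up at most once; misses are cached as well.
pub fn lazy_cmaps<F>(load: F) -> CMapLookup
where
    F: Fn(&str) -> Option<Vec<u8>> + Send + Sync + 'static,
{
    let cache: Mutex<HashMap<String, Option<Vec<u8>>>> = Mutex::new(HashMap::new());
    Arc::new(move |name: &str| -> Option<Vec<u8>> {
        let mut entries = cache.lock().unwrap();
        let result = entries
            .entry(name.to_string())
            .or_insert_with(|| load(name))
            .clone();
        result
    })
}
```

Caching behind a `Mutex` keeps the callback `Send + Sync`, which matters if the interpreter is driven from several threads; whether pdf-interpret requires that is not stated here.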
warning_sink: WarningSinkFn
In certain cases, pdf-interpret emits a warning when it encounters an issue while interpreting the PDF file. Providing a callback allows you to catch those warnings and handle them, if desired.
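One way to catch warnings is to collect them into a shared list and inspect them after interpretation. This is only a sketch: the crate's warning type and the exact shape of `WarningSinkFn` are not shown on this page, so `Warning`, `Sink`, and `collecting_sink` are hypothetical.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical warning type; the crate's real warning type may differ.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Warning {
    MissingFont(String),
    Other(String),
}

/// Hypothetical callback shape for a warning sink.
pub type Sink = Arc<dyn Fn(Warning) + Send + Sync>;

/// Returns a sink together with a shared handle to everything it collects.
pub fn collecting_sink() -> (Sink, Arc<Mutex<Vec<Warning>>>) {
    let collected: Arc<Mutex<Vec<Warning>>> = Arc::new(Mutex::new(Vec::new()));
    let handle = Arc::clone(&collected);
    let sink: Sink = Arc::new(move |warning: Warning| {
        // Append each warning; the caller reads them through `handle`.
        collected.lock().unwrap().push(warning);
    });
    (sink, handle)
}
```

The caller keeps the second element of the tuple and reads it once interpretation has finished, for example to log the warnings or to fail a test when any were emitted.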
render_annotations: bool
Whether annotations should be rendered as well.
Note that this feature is not fully implemented yet, so some annotations might be missing.
skip_signature_widgets: bool
Whether to skip /FT /Sig (signature widget) appearance streams.
Rendering sets this to true to match MuPDF behaviour, but text extraction should set it to false so that signature text is included.
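The difference between the two use cases can be made explicit with a pair of presets. `AnnotationFlags` below is a hypothetical mirror of only the two boolean fields, and setting render_annotations to true for both presets is an assumption made for the example, not something this page prescribes.

```rust
/// Hypothetical mirror of the two boolean fields, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AnnotationFlags {
    pub render_annotations: bool,
    pub skip_signature_widgets: bool,
}

impl AnnotationFlags {
    /// Rendering: skip signature widget appearance streams.
    pub fn for_rendering() -> Self {
        Self {
            render_annotations: true, // assumption for this example
            skip_signature_widgets: true,
        }
    }

    /// Text extraction: keep signature widgets so their text is included.
    pub fn for_text_extraction() -> Self {
        Self {
            skip_signature_widgets: false,
            ..Self::for_rendering()
        }
    }
}
```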
Trait Implementations
impl Clone for InterpreterSettings
fn clone(&self) -> InterpreterSettings
fn clone_from(&mut self, source: &Self)