Crate zqa_pdftools


This crate parses PDFs and is somewhat tailored to academic PDFs. It extracts text and skips images and tables. It also handles commonly used math expressions, though this support is imperfect. Because of kerning, the parsed text may contain spurious spaces.
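To illustrate where kerning-related spaces can come from, here is a minimal sketch, not this crate's actual implementation. In a PDF `TJ` array, text runs are interleaved with numeric position adjustments, and a text extractor often has to guess whether a gap is a word break or just kerning. The `TjItem` type, the `join_with_spaces` function, and the threshold values are all hypothetical and exist only for this example.

```rust
/// One element of a PDF `TJ` array: a run of text, or a position adjustment
/// in thousandths of a text-space unit. A negative adjustment widens the gap
/// before the next run; a positive one tightens it.
/// (Hypothetical type for illustration; not part of zqa_pdftools.)
enum TjItem {
    Text(String),
    Adjust(f32),
}

/// Assumed heuristic: treat any adjustment that widens the gap by more than
/// `threshold` as a word break and emit a space.
fn join_with_spaces(items: &[TjItem], threshold: f32) -> String {
    let mut out = String::new();
    for item in items {
        match item {
            TjItem::Text(s) => out.push_str(s),
            // Widening gap above the threshold: guess that a space was intended.
            TjItem::Adjust(a) if -*a > threshold && !out.ends_with(' ') => out.push(' '),
            // Small or tightening adjustments are treated as plain kerning.
            TjItem::Adjust(_) => {}
        }
    }
    out
}
```

With a threshold that is too low for a given font, an ordinary kerning gap inside a word crosses it and the word comes out split, which is one plausible source of the erroneous spaces mentioned above.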

Modules

parse
The core PDF parsing module. It includes the PdfParser struct, which is somewhat tuned for academic PDFs. In particular, it skips images and tables by default, though this behavior might change later. The parser also recognizes common math symbols and converts them to their LaTeX equivalents.
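The symbol-to-LaTeX conversion can be pictured with a small sketch. This is an assumption about the general technique, not the PdfParser API or its actual symbol table: the function names `symbol_to_latex` and `to_latex` and the handful of symbols covered are hypothetical.

```rust
/// Hypothetical lookup (not part of zqa_pdftools): maps a few Unicode math
/// symbols to LaTeX commands and returns `None` for everything else.
fn symbol_to_latex(c: char) -> Option<&'static str> {
    match c {
        'α' => Some("\\alpha"),
        'β' => Some("\\beta"),
        '≤' => Some("\\leq"),
        '∑' => Some("\\sum"),
        '∞' => Some("\\infty"),
        _ => None,
    }
}

/// Replaces every known symbol in `text` with its LaTeX command and copies
/// all other characters unchanged.
fn to_latex(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        match symbol_to_latex(c) {
            Some(cmd) => {
                out.push_str(cmd);
                // Separate the command from a following ASCII letter so that
                // "αx" becomes "\alpha x" rather than the invalid "\alphax".
                if chars.peek().map_or(false, |n| n.is_ascii_alphabetic()) {
                    out.push(' ');
                }
            }
            None => out.push(c),
        }
    }
    out
}
```

A per-character table like this only covers standalone symbols; structured expressions such as fractions or superscripts would need layout information that this sketch does not model.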