Crate zqa_pdftools

This crate provides PDF parsing functionality, somewhat tailored to academic PDFs. It extracts text and skips images and tables. It also handles commonly used math expressions, though imperfectly. Because of kerning, the parsed text may also contain erroneous spaces.
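To illustrate why kerning can produce stray spaces, here is a minimal sketch of the usual heuristic. It is not this crate's implementation. In a PDF `TJ` array, text runs are interleaved with numeric adjustments in thousandths of a text-space unit, and negative values widen the gap. An extractor has to guess which gaps are word breaks. The names `TjItem` and `join_tj` and the threshold values are assumptions made for this example.

```rust
/// One element of a PDF `TJ` array (hypothetical type for this sketch).
enum TjItem {
    /// A run of already-decoded text.
    Text(String),
    /// A position adjustment in thousandths of a text-space unit.
    /// Negative values move the next glyph further right, widening the gap.
    Adjust(f32),
}

/// Joins `TJ` items, inserting one space wherever a gap is wider than `threshold`.
fn join_tj(items: &[TjItem], threshold: f32) -> String {
    let mut out = String::new();
    for item in items {
        match item {
            TjItem::Text(s) => out.push_str(s),
            TjItem::Adjust(a) => {
                // Treat a sufficiently wide gap as a word break.
                if -*a > threshold && !out.is_empty() && !out.ends_with(' ') {
                    out.push(' ');
                }
            }
        }
    }
    out
}
```

With the items `"Wo"`, `-150`, `"rd"`, `-300`, `"count"`, a threshold of 200 gives `Word count`, while a threshold of 100 gives `Wo rd count`: a threshold set too low turns loose letter spacing into a spurious space.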

Modules§

parse: The core PDF parsing module. It includes the PdfParser struct, which is somewhat tuned for academic PDFs: it skips images and tables by default, though this behavior might change later. The parser also converts common math symbols to their LaTeX equivalents.
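As a rough illustration of converting math symbols to LaTeX, the sketch below maps a few Unicode characters to LaTeX commands. It does not use the PdfParser API, and the crate may work quite differently. The functions `latex_for` and `symbols_to_latex` and the small symbol table are hypothetical.

```rust
/// Returns the LaTeX command for a few common math symbols.
/// The table is a hypothetical, deliberately tiny example.
fn latex_for(c: char) -> Option<&'static str> {
    match c {
        'α' => Some("\\alpha"),
        'β' => Some("\\beta"),
        '≤' => Some("\\leq"),
        '∑' => Some("\\sum"),
        '∈' => Some("\\in"),
        _ => None,
    }
}

/// Replaces known math symbols in `text` with their LaTeX commands.
fn symbols_to_latex(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        match latex_for(c) {
            Some(cmd) => {
                out.push_str(cmd);
                // Separate the command from a following letter,
                // so "αx" becomes "\alpha x" rather than "\alphax".
                if chars.peek().map_or(false, |n| n.is_ascii_alphabetic()) {
                    out.push(' ');
                }
            }
            None => out.push(c),
        }
    }
    out
}
```

For example, `symbols_to_latex("α ≤ β")` yields `\alpha \leq \beta`, and `symbols_to_latex("αx")` yields `\alpha x`.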
