This crate provides PDF parsing functionality that is somewhat tailored to academic PDFs. It extracts text and skips images and tables. It also handles commonly used math expressions, although this support is far from perfect. Note also that, because of kerning, the parsed text may contain spurious spaces.
Modules
- `parse`: The core PDF parsing module. It includes the `PdfParser` struct, which is somewhat tuned for academic PDFs. In particular, it skips images and tables by default; this behavior might change later. The parser also handles common math symbols and converts them to their LaTeX equivalents.
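The crate's own symbol table and API are not shown here, so the following is only a hypothetical, standalone sketch of the general technique of converting extracted math symbols to LaTeX. The functions `latex_for` and `symbols_to_latex` and the small mapping table are illustrative assumptions, not part of the `parse` module.

```rust
/// Hypothetical mapping from a single Unicode math symbol to a LaTeX command.
/// This table is illustrative only; it is not the crate's actual mapping.
fn latex_for(c: char) -> Option<&'static str> {
    match c {
        'α' => Some(r"\alpha"),
        'β' => Some(r"\beta"),
        'π' => Some(r"\pi"),
        '∑' => Some(r"\sum"),
        '∫' => Some(r"\int"),
        '≤' => Some(r"\leq"),
        '≥' => Some(r"\geq"),
        '≠' => Some(r"\neq"),
        '∞' => Some(r"\infty"),
        '→' => Some(r"\to"),
        '×' => Some(r"\times"),
        _ => None,
    }
}

/// Replace every known math symbol in `text` with its LaTeX command,
/// wrapped in inline math delimiters `$...$`; other characters pass through.
fn symbols_to_latex(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match latex_for(c) {
            Some(cmd) => {
                out.push('$');
                out.push_str(cmd);
                out.push('$');
            }
            None => out.push(c),
        }
    }
    out
}

fn main() {
    // Text as it might come out of a PDF text-extraction step.
    let extracted = "for all x ≥ 0, α ≤ π";
    // Prints: for all x $\geq$ 0, $\alpha$ $\leq$ $\pi$
    println!("{}", symbols_to_latex(extracted));
}
```

Wrapping each symbol in its own `$...$` keeps the output valid LaTeX without trying to reconstruct whole expressions; a real parser would likely need to group adjacent symbols and operands into a single math span.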