Crate pdf2image_alt

Source
Expand description

§pdf2image

Provides functions for rendering a single page and one for rendering multiple pages

A simplified port of Python’s pdf2image that wraps pdftoppm and pdftocairo (part of poppler) to convert PDFs to image::DynamicImages.

This library is a fork of https://github.com/styrowolf/pdf2image that replaces the usages of blocking multithreaded (rayon) with tokio async rendering. Which itself is a port of the python pdf2image library.

It wraps pdftoppm and pdftocairo (part of Poppler) under the hood, uses the “pdfinfo” from poppler to determine basic info about the pdf (number of pages and whether its encrypted)

This fork uses async rendering instead and allows the rendering of a single page or multiple pages with separate functions.

[!INFO] You must have poppler installed on your system in order to use this program it depends on the pdfinfo and

§Installation

pdf2image requires poppler to be installed.

cargo add pdf2image_alt

§Windows

Windows users will have to build or download poppler for Windows. Python’s pdf2image maintainer recommends @oschwartz10612 version. You will then have to add the bin/ folder to PATH or use the environment variable PDF2IMAGE_POPPLER_PATH.

§macOS

using homebrew:

brew install poppler

§Linux

Most distros ship with pdftoppm and pdftocairo. If they are not installed, refer to your package manager to install poppler-utils

§Platform-independent (Using conda)

  1. Install poppler: conda install -c conda-forge poppler
  2. Install pdf2image: pip install pdf2image

§Quick Start

use pdf2image_alt::{render_pdf_multi_page, PDF2ImageError, PdfInfo, RenderOptionsBuilder};

#[tokio::main]
async fn main() -> Result<(), PDF2ImageError> {
    let data = std::fs::read("examples/pdfs/ropes.pdf").unwrap();
    let pdf_info = PdfInfo::read(data.as_slice()).await.unwrap();
    let options = RenderOptionsBuilder::default().pdftocairo(true).build()?;
    let pages = render_pdf_multi_page(
        &data,
        &pdf_info,
        pdf2image_alt::Pages::Range(1..=8),
        &options,
    )
    .await
    .unwrap();
    println!("{:?}", pages.len());

    Ok(())
}

§License

pdf2image includes code derived from Edouard Belval’s pdf2image Python module, which is MIT licensed. Similarly, pdf2image is also licensed under the MIT License.

Re-exports§

pub use image;

Structs§

Crop
Crop a specific section of the page
PdfInfo
RenderOptions
Options for rendering PDFs
RenderOptionsBuilder
Builder for RenderOptions.

Enums§

DPI
Specifies resolution in terms of dots per inch
PDF2ImageError
pdf2img error variants
Pages
Specifies which pages to render
Password
Password to unlock encrypted PDFs
Scale
Scales pages to a certain number of pixels

Functions§

pdftext_all_pages
Extracts the text contents of a pdf file from all pages as one big string, use pdftext_multi_page to get a separate string for each page
pdftext_multi_page
Extracts the text contents of a pdf file from multiple page
pdftext_single_page
Extracts the text contents of a pdf file from a single page
render_pdf_multi_page
Renders the PDF to images.
render_pdf_single_page
Renders the PDF to images.

Type Aliases§

Result
A Result type alias using PDF2ImgError instances as the error variant.