pdf-extract 0.10.0

A library to extract content from pdfs

pdf-extract

A rust library to extract content from PDF files.

let bytes = std::fs::read("tests/docs/simple.pdf").unwrap();
let out = pdf_extract::extract_text_from_mem(&bytes).unwrap();
assert!(out.contains("This is a small demonstration"));

See also

https://github.com/elacin/PDFExtract/
https://github.com/euske/pdfminer / https://github.com/pdfminer/pdfminer.six
https://gitlab.com/crossref/pdfextract
https://github.com/VikParuchuri/marker
https://github.com/kermitt2/pdfalto used by grobid
https://github.com/opendatalab/MinerU (uses PyMuPDF and pdfminer.six)

Not PDF specific

https://github.com/Layout-Parser/layout-parser