pdf-extract 0.3.0

A library to extract content from PDFs
Documentation
https://github.com/euske/pdfminer

Cermine uses the Java iText library in its character extractor.

Grobid uses xpdf (via pdf2xml); it is written in Java, though.

https://www.crossref.org/labs/pdfextract/
Written in Ruby; recommends Cermine.

https://github.com/elifesciences/sciencebeam
Uses Grobid and Apache Beam.


ContentMine

https://github.com/ContentMine/norma