Crate pdf_extract

Functions

decode_text_string
Decodes a text string. The encoding is chosen based on the byte order mark (BOM) at the start of the string. All encodings specified in PDF 2.0 are supported (PDFDocEncoding, UTF-16BE, and UTF-8).
encode_utf8
Encodes the given str to UTF-8. This way of encoding text strings was first specified in PDF 2.0, and reader support is still lacking (notably, Adobe Acrobat Reader doesn't support it at the time of writing). Using it is therefore NOT RECOMMENDED.
encode_utf16_be
Encodes the given str to UTF-16BE. This is the recommended way to encode text strings: it covers all of Unicode, and all major PDF readers support it.
extract_text
Extract the text from a PDF at path and return a String with the results.
extract_text_by_pages
Extract the text from a PDF at path and return a Vec<String> with the results, one entry per page.
extract_text_by_pages_encrypted
extract_text_encrypted
extract_text_from_mem
extract_text_from_mem_by_pages
extract_text_from_mem_by_pages_encrypted
extract_text_from_mem_encrypted
output_doc
Parse a given document and write it to the output device passed as output.
output_doc_encrypted
output_doc_page
print_metadata
text_string
Creates a text string. If the input only contains ASCII characters, the string is encoded in PDFDocEncoding, otherwise in UTF-16BE.
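The BOM-based selection that decode_text_string describes can be sketched as follows. This is a minimal illustration, not the crate's implementation: the function name decode_text_string_sketch is hypothetical, and, as a loudly stated simplification, bytes without a BOM are decoded as Latin-1, which agrees with PDFDocEncoding only for printable ASCII and most of 0xA1 to 0xFF (a real decoder needs the full PDFDocEncoding table).

```rust
/// Hypothetical sketch of BOM-based text-string decoding (not the crate's code).
fn decode_text_string_sketch(bytes: &[u8]) -> Result<String, String> {
    if let Some(rest) = bytes.strip_prefix(b"\xFE\xFF") {
        // UTF-16BE: after the BOM the byte count must be even.
        if rest.len() % 2 != 0 {
            return Err("odd number of bytes in UTF-16BE string".to_string());
        }
        let units: Vec<u16> = rest
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        String::from_utf16(&units).map_err(|e| e.to_string())
    } else if let Some(rest) = bytes.strip_prefix(b"\xEF\xBB\xBF") {
        // UTF-8 text string, introduced in PDF 2.0.
        String::from_utf8(rest.to_vec()).map_err(|e| e.to_string())
    } else {
        // No BOM: PDFDocEncoding. SIMPLIFICATION: approximated as Latin-1,
        // which is wrong for bytes 0x18..=0x1F and 0x80..=0xA0.
        Ok(bytes.iter().map(|&b| b as char).collect())
    }
}
```

Checking the two BOMs before falling back keeps the rule unambiguous: the byte pairs FE FF and EF BB BF are not expected at the start of ordinary PDFDocEncoding text.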
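The three encoders above (encode_utf8, encode_utf16_be, and the choice made by text_string) can be sketched in a few lines. All names here are hypothetical, the functions return raw bytes rather than whatever object type the crate returns, and the sketch assumes each encoder writes the BOM that PDF 2.0 uses to mark UTF-8 (EF BB BF) and UTF-16BE (FE FF) text strings.

```rust
// Hypothetical sketches, not the crate's code.

/// UTF-8 text string: BOM EF BB BF, then the UTF-8 bytes of the text.
fn encode_utf8_sketch(text: &str) -> Vec<u8> {
    let mut bytes = vec![0xEF, 0xBB, 0xBF];
    bytes.extend_from_slice(text.as_bytes()); // a Rust &str is already UTF-8
    bytes
}

/// UTF-16BE text string: BOM FE FF, then each UTF-16 code unit, high byte first.
/// Characters outside the BMP become surrogate pairs via encode_utf16().
fn encode_utf16_be_sketch(text: &str) -> Vec<u8> {
    let mut bytes = vec![0xFE, 0xFF];
    for unit in text.encode_utf16() {
        bytes.extend_from_slice(&unit.to_be_bytes());
    }
    bytes
}

/// Selection rule described for text_string: ASCII input is stored as-is,
/// since PDFDocEncoding agrees with ASCII for printable characters;
/// anything else is encoded as UTF-16BE.
fn text_string_bytes_sketch(text: &str) -> Vec<u8> {
    if text.is_ascii() {
        text.as_bytes().to_vec()
    } else {
        encode_utf16_be_sketch(text)
    }
}
```

Preferring the single-byte form for ASCII keeps common strings compact, while falling back to UTF-16BE rather than UTF-8 follows the reader-support advice given for encode_utf8.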

Type Aliases

ObjectId
An object identifier consists of two parts: the object number and the generation number.
Transform
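To illustrate the two-part ObjectId, the sketch below assumes it is the tuple (object number, generation number), declared in lopdf as type ObjectId = (u32, u16), and shows how such a pair appears in a PDF file as an indirect reference. The helper indirect_reference is hypothetical and not part of the crate.

```rust
// Assumption: ObjectId is (object number, generation number), as in lopdf.
type ObjectId = (u32, u16);

/// Hypothetical helper: formats an object identifier as a PDF indirect
/// reference, e.g. object 12 with generation 0 becomes "12 0 R".
fn indirect_reference(id: ObjectId) -> String {
    let (object_number, generation) = id;
    format!("{} {} R", object_number, generation)
}
```

Most objects in a freshly written file have generation 0; the generation number only grows when an object number is reused after the original object is freed.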