Crate pdf_extract

Modules§

content
encryption
filters
xobject
xref

Macros§

dictionary

Structs§

Bookmark
CalGray
CalRGB
Destination
Dictionary
Dictionary object.
Document
A PDF document.
EncryptionState
HTMLOutput
IncrementalDocument
Lab
MediaBox
ObjectStream
Path
Permissions
PlainTextOutput
Reader
SVGOutput
Separation
Space
Stream
Stream object. Warning: all streams must be indirect objects, while the stream dictionary may be a direct object.
Toc
WriteAdapter

Enums§

AlternateColorSpace
ColorSpace
Encoding
EncryptionVersion
Error
Object
Basic PDF object types defined in an enum.
Outline
OutputError
PathOp
StringFormat
String objects can be written in two formats.

Traits§

ConvertToFmt
OutputDev

Functions§

decode_text_string
Decodes a text string. The encoding is chosen based on the BOM at the start of the string. All encodings specified in PDF 2.0 are supported (PDFDocEncoding, UTF-16BE, and UTF-8).
encode_utf8
Encodes the given str to UTF-8. This method of encoding text strings was first specified in PDF 2.0, and reader support is still lacking (notably, Adobe Acrobat Reader doesn’t support it at the time of writing). Thus, using it is NOT RECOMMENDED.
encode_utf16_be
Encodes the given str to UTF-16BE. This is the recommended way to encode text strings, as it covers all of Unicode and is supported by all major PDF readers.
extract_text
Extracts the text from the PDF at path and returns it as a String.
extract_text_by_pages
Extracts the text from the PDF at path and returns a Vec<String> with one entry per page.
extract_text_by_pages_encrypted
extract_text_encrypted
extract_text_from_mem
extract_text_from_mem_by_pages
extract_text_from_mem_by_pages_encrypted
extract_text_from_mem_encrypted
output_doc
Parses the given document and writes it to output.
output_doc_encrypted
output_doc_page
print_metadata
text_string
Creates a text string. If the input only contains ASCII characters, the string is encoded in PDFDocEncoding, otherwise in UTF-16BE.
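
The encoding rules described for decode_text_string, encode_utf16_be, and text_string can be illustrated with a small standard-library sketch. This is not the crate's implementation: every function name below (prefixed with sketch_) is hypothetical, and the PDFDocEncoding branch is simplified to a byte-to-code-point mapping, which is only exact for the ASCII range.

```rust
// Illustrative sketch only. These helpers are hypothetical stand-ins and do
// not reproduce the crate's actual code.

/// Byte order mark that introduces a UTF-16BE text string.
const UTF16_BE_BOM: [u8; 2] = [0xFE, 0xFF];
/// Byte order mark that introduces a UTF-8 text string (PDF 2.0).
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Encodes `text` as UTF-16BE, prefixed with the BOM.
fn sketch_encode_utf16_be(text: &str) -> Vec<u8> {
    let mut out = UTF16_BE_BOM.to_vec();
    for unit in text.encode_utf16() {
        // Each UTF-16 code unit is written high byte first.
        out.extend_from_slice(&unit.to_be_bytes());
    }
    out
}

/// ASCII-only input is stored byte for byte (where it coincides with
/// PDFDocEncoding for printable characters); anything else falls back
/// to UTF-16BE.
fn sketch_text_string(text: &str) -> Vec<u8> {
    if text.is_ascii() {
        text.as_bytes().to_vec()
    } else {
        sketch_encode_utf16_be(text)
    }
}

/// Picks the decoder by inspecting the BOM at the start of `bytes`.
fn sketch_decode_text_string(bytes: &[u8]) -> String {
    if bytes.starts_with(&UTF16_BE_BOM) {
        let units: Vec<u16> = bytes[UTF16_BE_BOM.len()..]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        String::from_utf16_lossy(&units)
    } else if bytes.starts_with(&UTF8_BOM) {
        String::from_utf8_lossy(&bytes[UTF8_BOM.len()..]).into_owned()
    } else {
        // Simplification (assumption): treat each byte as the code point of
        // the same value. Real PDFDocEncoding differs outside ASCII.
        bytes.iter().map(|&b| b as char).collect()
    }
}
```

With these helpers, sketch_text_string("abc") yields the three ASCII bytes unchanged, while any non-ASCII input starts with the bytes FE FF, and sketch_decode_text_string reverses either form.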

Type Aliases§

ObjectId
An object identifier consists of two parts: an object number and a generation number.
Transform
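
As a rough illustration of the two-part ObjectId, the sketch below models it as a tuple of object number and generation number and formats it as a PDF indirect reference. The tuple layout (u32, u16) is an assumption about the alias rather than something stated on this page, and sketch_indirect_reference is a hypothetical helper.

```rust
// Illustrative sketch only. The tuple layout is an assumption about the
// alias, and the helper below is hypothetical.

/// (object number, generation number)
type ObjectId = (u32, u16);

/// Formats an identifier the way an indirect reference appears in a PDF
/// file: object number, generation number, then the keyword `R`.
fn sketch_indirect_reference(id: ObjectId) -> String {
    format!("{} {} R", id.0, id.1)
}
```

Under that assumption, the identifier (12, 0) corresponds to the reference 12 0 R.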