Module encoding

Character Encoding Detection and Conversion Module

This module detects and converts character encodings so that text can be extracted reliably from the wide range of encodings found in real-world PDF files.

§Overview

Many PDFs contain text encoded in various character sets beyond UTF-8, including:

Latin-1 (ISO 8859-1) - Common in European documents
Windows-1252 - Microsoft’s extension of Latin-1
MacRoman - Apple’s legacy encoding
Various PDF-specific encodings

This module detects the encoding automatically and converts the text, with fallback handling for characters it cannot recognize.
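
The general idea of detection with fallback can be illustrated with a minimal, standard-library-only sketch. It is not this module's implementation and does not use its API: `GuessedEncoding`, `decode_windows_1252`, and `sketch_decode` are hypothetical names introduced only for this example. The sketch assumes a very simple strategy: accept the bytes as UTF-8 if they validate, otherwise decode them as Windows-1252 and substitute U+FFFD for the five byte values that Windows-1252 leaves undefined.

```rust
/// Hypothetical result tag for this sketch; not the crate's `EncodingType`.
#[derive(Debug, PartialEq, Clone, Copy)]
enum GuessedEncoding {
    Utf8,
    Windows1252,
}

/// Windows-1252 mappings for bytes 0x80..=0x9F.
/// The five undefined positions are filled with U+FFFD (replacement character).
const CP1252_HIGH: [char; 32] = [
    '\u{20AC}', '\u{FFFD}', '\u{201A}', '\u{0192}', '\u{201E}', '\u{2026}', '\u{2020}', '\u{2021}',
    '\u{02C6}', '\u{2030}', '\u{0160}', '\u{2039}', '\u{0152}', '\u{FFFD}', '\u{017D}', '\u{FFFD}',
    '\u{FFFD}', '\u{2018}', '\u{2019}', '\u{201C}', '\u{201D}', '\u{2022}', '\u{2013}', '\u{2014}',
    '\u{02DC}', '\u{2122}', '\u{0161}', '\u{203A}', '\u{0153}', '\u{FFFD}', '\u{017E}', '\u{0178}',
];

/// Decode bytes as Windows-1252. Returns the text and the number of
/// bytes that had no mapping and were replaced with U+FFFD.
fn decode_windows_1252(bytes: &[u8]) -> (String, usize) {
    let mut replaced = 0;
    let text: String = bytes
        .iter()
        .map(|&b| match b {
            0x80..=0x9F => {
                let c = CP1252_HIGH[(b - 0x80) as usize];
                if c == '\u{FFFD}' {
                    replaced += 1;
                }
                c
            }
            // Outside 0x80..=0x9F, Windows-1252 agrees with Latin-1,
            // whose bytes map directly to the same Unicode code points.
            _ => b as char,
        })
        .collect();
    (text, replaced)
}

/// Assumed strategy for this sketch: valid UTF-8 wins; anything else
/// is treated as Windows-1252 with replacement-character fallback.
fn sketch_decode(bytes: &[u8]) -> (String, GuessedEncoding, usize) {
    match std::str::from_utf8(bytes) {
        Ok(s) => (s.to_owned(), GuessedEncoding::Utf8, 0),
        Err(_) => {
            let (text, replaced) = decode_windows_1252(bytes);
            (text, GuessedEncoding::Windows1252, replaced)
        }
    }
}
```

Under these assumptions, `sketch_decode(b"caf\xE9")` falls through to the Windows-1252 path because a lone `0xE9` is not valid UTF-8, while an undefined byte such as `0x81` is counted and replaced rather than causing an error. A real detector would need more than this, for example a way to distinguish MacRoman from Windows-1252, which a validity check alone cannot do.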

Structs§

EncodingIssue: Information about encoding issues encountered
EncodingOptions: Configuration for character encoding processing
EncodingResult: Character encoding detection and conversion result
EnhancedDecoder: Enhanced character decoder implementation

Enums§

EncodingType: Supported encoding types for PDF text
IssueResolution

Traits§

CharacterDecoder: Main character decoder trait

Functions§

decode_text: Convenience function to decode bytes with default settings
decode_text_with_encoding: Convenience function to decode bytes with specific encoding
