pub struct PdfString { /* private fields */ }
A PDF string with encoding-aware conversion.
Stores the raw bytes as they appear in the PDF file. The encoding is
detected from the content: bytes starting with 0xFE 0xFF (BOM) are
UTF-16BE, bytes starting with 0xEF 0xBB 0xBF are UTF-8 with a BOM, and
anything else is PDFDocEncoding.
Implementations
impl PdfString
pub fn from_bytes(bytes: Vec<u8>) -> Self
Create a PdfString from raw bytes (as parsed from the PDF).
pub fn from_unicode(s: &str) -> Self
Encode a UTF-8 string as a PDF string.
Uses PDFDocEncoding if every character is representable; otherwise uses
UTF-16BE with a 0xFE 0xFF byte-order mark. This matches the logic of
PDF_EncodeText() in PDFium upstream.
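The selection rule above can be sketched as a standalone function. This is an illustration only, not the crate's implementation: `encode_text` and `is_representable` are hypothetical names, and the representability check is deliberately simplified to printable ASCII plus tab, LF, and CR, where PDFDocEncoding bytes coincide with their Unicode code points. The real PDFDocEncoding table covers more characters than this.

```rust
/// Simplified stand-in for a PDFDocEncoding lookup (an assumption of this
/// sketch): only printable ASCII plus tab, LF, and CR count as representable.
fn is_representable(c: char) -> bool {
    matches!(c, '\t' | '\n' | '\r' | ' '..='~')
}

/// Encode `s` as PDF string bytes: single-byte output when every character
/// is representable, otherwise UTF-16BE prefixed with the 0xFE 0xFF BOM.
pub fn encode_text(s: &str) -> Vec<u8> {
    if s.chars().all(is_representable) {
        // Within this subset the PDFDocEncoding byte equals the ASCII byte.
        s.bytes().collect()
    } else {
        let mut out = vec![0xFE, 0xFF];
        for unit in s.encode_utf16() {
            // Each UTF-16 code unit is written high byte first.
            out.extend_from_slice(&unit.to_be_bytes());
        }
        out
    }
}
```

Characters outside the Basic Multilingual Plane come out of `encode_utf16` as surrogate pairs, so they occupy four bytes after the BOM.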
Examples
let ascii = PdfString::from_unicode("hello");
assert_eq!(ascii.encoding(), PdfStringEncoding::PdfDocEncoding);
let unicode = PdfString::from_unicode("日本語");
assert_eq!(unicode.encoding(), PdfStringEncoding::Utf16Be);
pub fn encoding(&self) -> PdfStringEncoding
Detect encoding from the byte-order mark.
- 0xFE 0xFF → PdfStringEncoding::Utf16Be
- 0xEF 0xBB 0xBF → PdfStringEncoding::Utf8Bom
- Otherwise → PdfStringEncoding::PdfDocEncoding
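The BOM check described above amounts to two prefix comparisons. The following is a minimal self-contained sketch, assuming nothing about the crate's internals: `Encoding` is a hypothetical stand-in for `PdfStringEncoding`, and `detect_encoding` is an illustrative free function, not part of the documented API.

```rust
/// Hypothetical stand-in for the crate's `PdfStringEncoding` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Utf16Be,
    Utf8Bom,
    PdfDocEncoding,
}

/// Classify raw PDF string bytes by their leading byte-order mark.
pub fn detect_encoding(bytes: &[u8]) -> Encoding {
    if bytes.starts_with(&[0xFE, 0xFF]) {
        Encoding::Utf16Be
    } else if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Encoding::Utf8Bom
    } else {
        // No recognised BOM, including empty or one-byte input.
        Encoding::PdfDocEncoding
    }
}
```

Because `starts_with` returns false when the input is shorter than the prefix, empty strings and a lone 0xFE byte fall through to the PDFDocEncoding branch.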
pub fn to_string_lossy(&self) -> String
Decode to a Rust String (UTF-8), handling all PDF string encodings.
- UTF-16BE: decoded with surrogate-pair support; invalid pairs → U+FFFD.
- UTF-8 BOM: decoded as UTF-8 after stripping the BOM.
- PDFDocEncoding: each byte mapped to Unicode per ISO 32000-2 Annex D.
ISO 2022 language-tag escape sequences (U+001B…U+001B) present in
UTF-16BE and UTF-8 BOM strings are stripped, matching the behaviour of
StripLanguageCodes() / PDF_DecodeText() in PDFium upstream.
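The UTF-16BE path and the escape stripping can be sketched with the standard library alone. Both functions below are hypothetical illustrations, not the crate's code. The sketch assumes the BOM has already been removed, that a trailing odd byte is ignored, and that an unterminated escape (an opening U+001B with no closing one) is left in place; the source does not specify those edge cases.

```rust
/// Decode a big-endian UTF-16 payload (BOM already removed). Unpaired
/// surrogates become U+FFFD; a trailing odd byte is ignored (assumption).
pub fn decode_utf16be_lossy(payload: &[u8]) -> String {
    let units = payload
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]));
    char::decode_utf16(units)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

/// Remove every U+001B ... U+001B span, delimiters included.
pub fn strip_language_codes(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(start) = rest.find('\u{1B}') {
        // ESC is a single byte in UTF-8, so `start + 1` is a char boundary.
        match rest[start + 1..].find('\u{1B}') {
            Some(len) => {
                out.push_str(&rest[..start]);
                rest = &rest[start + 1 + len + 1..];
            }
            // Unterminated escape: keep the remainder as-is (assumption).
            None => break,
        }
    }
    out.push_str(rest);
    out
}
```

Decoding first and stripping afterwards keeps the escape handling independent of the byte-level encoding, so the same `strip_language_codes` would serve both the UTF-16BE and the UTF-8 BOM paths.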
pub fn unicode_data(&self) -> String
Deprecated: use to_string_lossy() instead.
Decode to a Rust String (UTF-8), handling both PDF encodings.
pub fn get_unicode_data(&self) -> String
Upstream-aligned alias for to_string_lossy.
Corresponds to ByteString::GetUnicodeData() in PDFium upstream.
pub fn get_raw_string(&self) -> &[u8]
Upstream-aligned alias for as_bytes.
Corresponds to ByteString::GetRawString() in PDFium upstream.