pub struct TextChunk {
    pub value: String,
    pub bbox: BoundingBox,
    pub font_name: String,
    pub font_size: f64,
    pub font_weight: f64,
    pub italic_angle: f64,
    pub font_color: String,
    pub contrast_ratio: f64,
    pub symbol_ends: Vec<f64>,
    pub text_format: TextFormat,
    pub text_type: TextType,
    pub pdf_layer: PdfLayer,
    pub ocg_visible: bool,
    pub index: Option<usize>,
    pub page_number: Option<u32>,
    pub level: Option<String>,
    pub mcid: Option<i64>,
}
Atomic text fragment — one font run in the PDF content stream.
Fields
value: String
Decoded Unicode text content.
bbox: BoundingBox
Bounding box in page coordinates.
font_name: String
Base font name (e.g. “Helvetica”).
font_size: f64
Effective font size in points, after matrix transforms are applied.
font_weight: f64
Font weight, from 100.0 to 900.0.
italic_angle: f64
Italic angle from the font descriptor.
font_color: String
Text color as a hex string (e.g. “#000000”).
contrast_ratio: f64
Contrast ratio against the background, from 1.0 to 21.0.
symbol_ends: Vec<f64>
X coordinate of each glyph's end position.
text_format: TextFormat
Text baseline format (normal, superscript, or subscript).
text_type: TextType
Text type classification.
pdf_layer: PdfLayer
Processing layer that produced this chunk.
ocg_visible: bool
Whether the OCG (Optional Content Group) containing this chunk is visible.
index: Option<usize>
Global index in extraction order.
page_number: Option<u32>
Page number (1-based).
level: Option<String>
Nesting level (from the structure tree).
mcid: Option<i64>
Marked-content identifier (the MCID entry in the property list of a BDC operator in the content stream). Links this chunk to a structure tree node for semantic tagging.
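The documented range of contrast_ratio (1.0 to 21.0) is the range of the WCAG 2.x contrast ratio, which suggests that this is the definition in use. The sketch below assumes the WCAG relative-luminance formula. It also assumes that both the text color and the background color are available as "#RRGGBB" strings, like font_color. This page does not say how the library determines the background color. parse_hex_color, relative_luminance, and contrast_ratio are hypothetical helpers and are not part of this crate's API.

```rust
// Hypothetical helpers, assuming `contrast_ratio` follows the WCAG 2.x
// definition; they are not part of this crate's API.

/// Parses a "#RRGGBB" string (the format shown for `font_color`) into sRGB
/// channel values in 0.0..=1.0. Returns `None` for malformed input.
fn parse_hex_color(hex: &str) -> Option<[f64; 3]> {
    let digits = hex.strip_prefix('#')?;
    if digits.len() != 6 || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let mut rgb = [0.0; 3];
    for (i, channel) in rgb.iter_mut().enumerate() {
        let byte = u8::from_str_radix(&digits[2 * i..2 * i + 2], 16).ok()?;
        *channel = f64::from(byte) / 255.0;
    }
    Some(rgb)
}

/// WCAG 2.x relative luminance of an sRGB color.
fn relative_luminance([r, g, b]: [f64; 3]) -> f64 {
    // Undo the sRGB transfer curve to get linear-light channel values.
    let linear = |c: f64| {
        if c <= 0.03928 {
            c / 12.92
        } else {
            ((c + 0.055) / 1.055).powf(2.4)
        }
    };
    0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b)
}

/// Contrast ratio between two colors: 1.0 for identical colors, 21.0 for black on white.
fn contrast_ratio(foreground: &str, background: &str) -> Option<f64> {
    let l1 = relative_luminance(parse_hex_color(foreground)?);
    let l2 = relative_luminance(parse_hex_color(background)?);
    let (lighter, darker) = if l1 >= l2 { (l1, l2) } else { (l2, l1) };
    Some((lighter + 0.05) / (darker + 0.05))
}
```

Black text on a white background gives the maximum ratio of 21.0, up to floating-point rounding. Identical colors give exactly 1.0.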
Implementations
impl TextChunk
pub fn is_white_space_chunk(&self) -> bool
Whether the entire text value is whitespace.
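Here is a minimal sketch of the whitespace test, written as a hypothetical free function over the chunk's value string. It uses char::is_whitespace, which follows the Unicode White_Space property, so a no-break space (U+00A0, common in PDF text) also counts as whitespace. Two things are assumptions: that the real method uses the same definition, and how it treats an empty value.

```rust
/// Hypothetical stand-in for `TextChunk::is_white_space_chunk`, operating on
/// the chunk's `value` string directly.
fn is_white_space(value: &str) -> bool {
    // `all` returns true for an empty iterator, so "" counts as whitespace here;
    // whether the real method treats an empty chunk the same way is an assumption.
    value.chars().all(char::is_whitespace)
}
```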
pub fn compress_spaces(&mut self)
Collapses consecutive spaces into a single space.
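One way to collapse runs of spaces in place is shown below as a hypothetical free function over the chunk's value. It assumes that only the ASCII space (U+0020) is collapsed and that tabs and other whitespace are left alone. The sketch changes only the string. If the real method also keeps symbol_ends aligned with the characters, it must drop the matching entries as well, which this sketch does not do.

```rust
/// Hypothetical stand-in for `TextChunk::compress_spaces`: collapses each run of
/// ASCII spaces (U+0020) into one. Leaving tabs and other whitespace untouched
/// is an assumption about the real method's scope.
fn compress_spaces(value: &mut String) {
    let mut previous_was_space = false;
    value.retain(|c| {
        // Drop a space only when it directly follows another space.
        let keep = !(c == ' ' && previous_was_space);
        previous_was_space = c == ' ';
        keep
    });
}
```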
pub fn text_length(&self) -> usize
Number of characters in the text.
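The documentation says "number of characters". In Rust this usually means Unicode scalar values (chars), not UTF-8 bytes as returned by String::len. The hypothetical helper below assumes the chars reading. The difference matters for any non-ASCII text, such as accented letters or ligatures decoded from a PDF font.

```rust
/// Hypothetical stand-in for `TextChunk::text_length`, assuming "characters"
/// means Unicode scalar values (`char`s) rather than UTF-8 bytes.
fn text_length(value: &str) -> usize {
    value.chars().count()
}
```

For example, "café" with a precomposed "é" has 4 chars but occupies 5 bytes.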
pub fn average_symbol_width(&self) -> f64
Average width per symbol.
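A plausible definition of the average symbol width is the bounding-box width divided by the number of characters. The sketch below assumes that definition. It takes the box's left and right X coordinates as plain f64 parameters, because the field names of BoundingBox are not shown on this page. Averaging per-glyph widths derived from symbol_ends would be a reasonable alternative, and the real method may use that approach instead.

```rust
/// Hypothetical stand-in for `TextChunk::average_symbol_width`, assuming it equals
/// the bounding-box width divided by the character count. `left_x` and `right_x`
/// are assumed names for the box's horizontal edges.
fn average_symbol_width(left_x: f64, right_x: f64, value: &str) -> f64 {
    let count = value.chars().count();
    if count == 0 {
        // Assumed behavior: avoid dividing by zero for an empty chunk.
        return 0.0;
    }
    (right_x - left_x) / count as f64
}
```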
pub fn symbol_start_coordinate(&self, idx: usize) -> f64
Get the X coordinate where the symbol at idx starts.
pub fn symbol_end_coordinate(&self, idx: usize) -> f64
Get the X coordinate where the symbol at idx ends.
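Since symbol_ends stores the X coordinate where each glyph ends, each start coordinate can be derived from the previous glyph's end. The sketch below assumes that glyph 0 starts at the bounding box's left edge and that glyphs are contiguous. Character or word spacing in the content stream could break the contiguity assumption. Both functions are hypothetical stand-ins and panic if idx is out of range.

```rust
/// Hypothetical stand-in for `symbol_start_coordinate`. Assumes glyph 0 starts at
/// the bounding box's left edge and every later glyph starts where the previous one ends.
fn symbol_start_coordinate(bbox_left_x: f64, symbol_ends: &[f64], idx: usize) -> f64 {
    if idx == 0 {
        bbox_left_x
    } else {
        symbol_ends[idx - 1]
    }
}

/// Hypothetical stand-in for `symbol_end_coordinate`: reads the stored end position.
fn symbol_end_coordinate(symbol_ends: &[f64], idx: usize) -> f64 {
    symbol_ends[idx]
}
```

For a chunk whose box starts at x = 100.0 with symbol_ends = [110.0, 118.0, 125.0], the third glyph (idx 2) spans 118.0 to 125.0.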
Trait Implementations
impl<'de> Deserialize<'de> for TextChunk
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Auto Trait Implementations
impl Freeze for TextChunk
impl RefUnwindSafe for TextChunk
impl Send for TextChunk
impl Sync for TextChunk
impl Unpin for TextChunk
impl UnsafeUnpin for TextChunk
impl UnwindSafe for TextChunk
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.