pub struct PlainTextConfig {
    pub space_threshold: f64,
    pub newline_threshold: f64,
    pub preserve_layout: bool,
    pub line_break_mode: LineBreakMode,
}
Configuration for plain text extraction
Controls how text is extracted and formatted when position information is not required. Thresholds are measured against distances in PDF text space (see each field for its exact unit) and should be tuned to the characteristics of the PDFs you process.
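Because every field is public and the type implements `Default`, a single threshold can be tuned while the rest keep their defaults. The sketch below shows that pattern on a hypothetical stand-in struct, `ConfigSketch`, which mirrors only the three fields whose types are fully documented here (`line_break_mode` is left out because the variants of `LineBreakMode` are not described on this page). It is an illustration of Rust's struct update syntax, not part of oxidize_pdf.

```rust
/// Hypothetical stand-in for `PlainTextConfig` (not the library type).
/// `line_break_mode` is omitted because `LineBreakMode` is not described here.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigSketch {
    pub space_threshold: f64,
    pub newline_threshold: f64,
    pub preserve_layout: bool,
}

impl Default for ConfigSketch {
    // Values copied from the documented defaults of `PlainTextConfig::default()`.
    fn default() -> Self {
        ConfigSketch {
            space_threshold: 0.3,
            newline_threshold: 10.0,
            preserve_layout: false,
        }
    }
}

/// Override only the space threshold; struct update syntax fills in
/// every other field from `ConfigSketch::default()`.
pub fn tuned_for_tight_spacing() -> ConfigSketch {
    ConfigSketch {
        space_threshold: 0.2,
        ..ConfigSketch::default()
    }
}
```

Assuming `PlainTextConfig` is not marked `#[non_exhaustive]` (no such attribute appears on this page), the same pattern should apply to the real type, e.g. `PlainTextConfig { space_threshold: 0.2, ..PlainTextConfig::default() }`.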
§Default Configuration
use oxidize_pdf::text::plaintext::PlainTextConfig;
let config = PlainTextConfig::default();
assert_eq!(config.space_threshold, 0.3);
assert_eq!(config.newline_threshold, 10.0);
assert!(!config.preserve_layout);
Fields§
space_threshold: f64
Space detection threshold (multiple of average character width)
When horizontal displacement between characters exceeds this threshold (expressed as a multiple of the average character width), a space character is inserted.
- Lower values (0.1-0.2): More spaces inserted, good for tightly-spaced text
- Default (0.3): Balanced for most documents
- Higher values (0.4-0.5): Fewer spaces, good for wide-spaced text
Range: 0.05 to 1.0 (typical)
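The space rule described for `space_threshold` can be sketched as a single comparison. This is a minimal sketch, assuming the horizontal gap and the average character width are measured in the same units; `should_insert_space` is a hypothetical helper, not an oxidize_pdf function, and the library's actual implementation may differ.

```rust
/// Hypothetical helper: decide whether a space belongs between two characters.
/// Assumes `gap` and `avg_char_width` share the same (text space) units.
pub fn should_insert_space(gap: f64, avg_char_width: f64, space_threshold: f64) -> bool {
    // Without a meaningful character width there is nothing to scale against.
    if avg_char_width <= 0.0 {
        return false;
    }
    // The threshold is a multiple of the average character width.
    gap > space_threshold * avg_char_width
}
```

With an average width of 5.0, the default threshold of 0.3 puts the cut-off near 1.5 units: a gap of 2.0 yields a space, a gap of 1.0 does not.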
newline_threshold: f64
Newline detection threshold (in text space units)
When vertical displacement between text elements exceeds this threshold (in text space units), a newline character is inserted.
- Lower values (5.0-8.0): More line breaks, preserves paragraph structure
- Default (10.0): Balanced for most documents
- Higher values (15.0-20.0): Fewer line breaks, joins more text
Range: 1.0 to 50.0 (typical)
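A corresponding sketch of the newline rule, assuming the vertical positions of two consecutive text elements are available in text space units. `should_insert_newline` is hypothetical and only illustrates the threshold comparison described above.

```rust
/// Hypothetical helper: decide whether a newline separates two text elements.
/// `prev_y` and `curr_y` are assumed to be vertical positions in text space units.
pub fn should_insert_newline(prev_y: f64, curr_y: f64, newline_threshold: f64) -> bool {
    // PDF y coordinates grow upward; only the size of the jump matters here.
    (prev_y - curr_y).abs() > newline_threshold
}
```

For example, moving from y = 700 to y = 688 (a 12-unit drop) triggers a newline under the default threshold of 10.0 but not under a threshold of 15.0.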
preserve_layout: bool
Preserve original layout whitespace
When true, attempts to maintain the original document’s whitespace
structure (indentation, spacing) by inserting appropriate spaces and
newlines based on position changes in the PDF.
When false, uses minimal whitespace (single spaces between words,
single newlines between paragraphs).
Use true for:
- Documents with tabular data
- Code listings or formatted text
- Documents where indentation matters
Use false for:
- Plain text extraction for search indexing
- Content analysis where layout doesn’t matter
- Maximum performance (less processing)
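One plausible way to realize the difference between the two modes is to vary how much whitespace a detected gap produces. The sketch below is an assumption for illustration only: `spaces_for_gap` is hypothetical, and it approximates a gap by the number of average-width character cells it spans when layout is preserved, falling back to a single space otherwise.

```rust
/// Hypothetical helper: whitespace emitted for a gap already classified as a space.
/// With `preserve_layout`, the gap is filled with roughly as many spaces as
/// average-width characters would occupy; otherwise a single space is used.
pub fn spaces_for_gap(gap: f64, avg_char_width: f64, preserve_layout: bool) -> String {
    if !preserve_layout || avg_char_width <= 0.0 {
        return " ".to_string();
    }
    // Round to the nearest whole character cell, never emitting fewer than one space.
    let count = (gap / avg_char_width).round().max(1.0) as usize;
    " ".repeat(count)
}
```

Under this sketch a 20-unit gap with 5-unit characters becomes four spaces with layout preservation and one space without it, which is the indentation-versus-compactness trade-off listed above.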
line_break_mode: LineBreakMode
Line break handling mode
Controls how line breaks in the PDF are interpreted and processed. Different modes are useful for different document types and use cases.
Implementations§
impl PlainTextConfig
pub fn new() -> Self
Create a new configuration with default values
§Examples
use oxidize_pdf::text::plaintext::PlainTextConfig;
let config = PlainTextConfig::new();
// `new()` produces the same values as `default()`.
assert_eq!(config, PlainTextConfig::default());
pub fn dense() -> Self
Create a configuration optimized for dense text (tight spacing)
Lower thresholds detect spaces more aggressively, useful for PDFs with minimal character spacing.
§Examples
use oxidize_pdf::text::plaintext::PlainTextConfig;
let config = PlainTextConfig::dense();
assert_eq!(config.space_threshold, 0.1);
pub fn loose() -> Self
Create a configuration optimized for loose text (wide spacing)
Higher thresholds avoid false space detection in documents with generous character spacing.
§Examples
use oxidize_pdf::text::plaintext::PlainTextConfig;
let config = PlainTextConfig::loose();
assert_eq!(config.space_threshold, 0.4);
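To see how the preset space thresholds (dense 0.1, default 0.3, loose 0.4) behave on the same input, the following worked comparison applies each one to a single gap. It assumes a "gap exceeds threshold times average character width" rule, which is an assumption about the implementation; `presets_inserting_space` is a hypothetical helper.

```rust
/// Hypothetical helper: names of the presets whose space threshold would
/// insert a space for this gap, assuming the rule `gap > threshold * avg_char_width`.
pub fn presets_inserting_space(gap: f64, avg_char_width: f64) -> Vec<&'static str> {
    // Space thresholds as documented for dense(), default() and loose().
    let presets = [("dense", 0.1), ("default", 0.3), ("loose", 0.4)];
    presets
        .iter()
        .filter(|(_, t)| gap > t * avg_char_width)
        .map(|(name, _)| *name)
        .collect()
}
```

With 5-unit characters, a 1.8-unit gap is a space for the dense and default presets but not for the loose one, while a 1.0-unit gap is a space only for the dense preset.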
pub fn preserve_layout() -> Self
Create a configuration that preserves layout structure
Useful for documents with tabular data, code, or formatted text where whitespace is semantically important.
§Examples
use oxidize_pdf::text::plaintext::PlainTextConfig;
let config = PlainTextConfig::preserve_layout();
assert!(config.preserve_layout);
Trait Implementations§
impl Clone for PlainTextConfig
fn clone(&self) -> PlainTextConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for PlainTextConfig
impl Default for PlainTextConfig
impl PartialEq for PlainTextConfig
impl StructuralPartialEq for PlainTextConfig
Auto Trait Implementations§
impl Freeze for PlainTextConfig
impl RefUnwindSafe for PlainTextConfig
impl Send for PlainTextConfig
impl Sync for PlainTextConfig
impl Unpin for PlainTextConfig
impl UnwindSafe for PlainTextConfig
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> Pointable for T
impl<R, P> ReadPrimitive<R> for P
fn read_from_little_endian(read: &mut R) -> Result<Self, Error>
Same as ReadEndian::read_from_little_endian().