Struct PlainTextConfig

```rust
pub struct PlainTextConfig {
    pub space_threshold: f64,
    pub newline_threshold: f64,
    pub preserve_layout: bool,
    pub line_break_mode: LineBreakMode,
}
```

Configuration for plain text extraction.

Controls how text is extracted and formatted when position information is not required. The space threshold is expressed as a multiple of the average character width and the newline threshold as a vertical distance in text space units; tune both to the characteristics of the PDFs you process.

§Default Configuration

```rust
use oxidize_pdf::text::plaintext::PlainTextConfig;

let config = PlainTextConfig::default();
assert_eq!(config.space_threshold, 0.3);
assert_eq!(config.newline_threshold, 10.0);
assert!(!config.preserve_layout);
```
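Because every field is `pub` and `PlainTextConfig` implements `Default`, a single threshold can be overridden with struct update syntax while the other fields keep their defaults; with the real crate that is `PlainTextConfig { space_threshold: 0.2, ..PlainTextConfig::default() }`. The sketch below is self-contained, so it uses a hypothetical stand-in struct, `ConfigSketch`, that mirrors only the three documented scalar fields and the default values asserted above. `line_break_mode` is omitted because its default is not documented on this page.

```rust
/// Hypothetical stand-in mirroring the documented defaults of
/// `oxidize_pdf::text::plaintext::PlainTextConfig` (without `line_break_mode`).
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigSketch {
    pub space_threshold: f64,
    pub newline_threshold: f64,
    pub preserve_layout: bool,
}

impl Default for ConfigSketch {
    fn default() -> Self {
        // Values taken from the default-configuration example above.
        ConfigSketch {
            space_threshold: 0.3,
            newline_threshold: 10.0,
            preserve_layout: false,
        }
    }
}

/// Overrides only the space threshold; every other field keeps its default.
pub fn tighter_spacing() -> ConfigSketch {
    ConfigSketch {
        space_threshold: 0.2,
        ..ConfigSketch::default()
    }
}
```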

Fields§

§space_threshold: f64

Space detection threshold (multiple of average character width)

When horizontal displacement between characters exceeds this threshold (expressed as a multiple of the average character width), a space character is inserted.

  • Lower values (0.1-0.2): More spaces inserted, good for tightly-spaced text
  • Default (0.3): Balanced for most documents
  • Higher values (0.4-0.5): Fewer spaces, good for wide-spaced text

Typical range: 0.05 to 1.0
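The space rule amounts to comparing each horizontal gap with `space_threshold × average character width`. A minimal sketch, assuming the gap is measured from the end of one fragment to the start of the next (the crate's exact measurement is not documented here); `Fragment` and `join_line` are hypothetical names, not part of the oxidize_pdf API.

```rust
/// A positioned text fragment on one line (hypothetical type for this sketch).
/// Coordinates are in text space units.
pub struct Fragment<'a> {
    pub x_start: f64,
    pub x_end: f64,
    pub text: &'a str,
}

/// Joins fragments left to right, inserting a space only when the gap
/// to the previous fragment exceeds `space_threshold * avg_char_width`.
pub fn join_line(fragments: &[Fragment<'_>], avg_char_width: f64, space_threshold: f64) -> String {
    let mut out = String::new();
    let mut prev_end: Option<f64> = None;
    for frag in fragments {
        if let Some(end) = prev_end {
            let gap = frag.x_start - end;
            if gap > space_threshold * avg_char_width {
                out.push(' ');
            }
        }
        out.push_str(frag.text);
        prev_end = Some(frag.x_end);
    }
    out
}
```

With an average character width of 5.0, a 1.0-unit gap stays below the default cutoff (0.3 × 5.0 = 1.5), so no space is inserted; with the `dense()` value of 0.1 the cutoff drops to 0.5 and the same gap produces a space.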

§newline_threshold: f64

Newline detection threshold (vertical distance in text space units)

When the vertical displacement between consecutive text elements exceeds this threshold, a newline character is inserted.

  • Lower values (5.0-8.0): More line breaks, preserves paragraph structure
  • Default (10.0): Balanced for most documents
  • Higher values (15.0-20.0): Fewer line breaks, joins more text

Typical range: 1.0 to 50.0
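To see why higher values join more text, consider lines about 12 units apart with a wider 24-unit paragraph gap. The sketch below compares each vertical displacement with `newline_threshold`; the names `Run` and `join_runs` are hypothetical, and joining sub-threshold runs with a single space is an assumption of this sketch rather than documented crate behavior.

```rust
/// A text run with its baseline y coordinate in text space units
/// (hypothetical type for this sketch).
pub struct Run<'a> {
    pub y: f64,
    pub text: &'a str,
}

/// Joins runs top to bottom. A newline is emitted when the vertical
/// displacement from the previous run exceeds `newline_threshold`;
/// otherwise the runs are joined with a single space (an assumption).
pub fn join_runs(runs: &[Run<'_>], newline_threshold: f64) -> String {
    let mut out = String::new();
    let mut prev_y: Option<f64> = None;
    for run in runs {
        if let Some(y) = prev_y {
            if (run.y - y).abs() > newline_threshold {
                out.push('\n');
            } else {
                out.push(' ');
            }
        }
        out.push_str(run.text);
        prev_y = Some(run.y);
    }
    out
}
```

With the default 10.0, both the 12-unit line step and the 24-unit paragraph gap produce newlines; at 15.0, the wrapped lines are joined and only the paragraph gap still breaks.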

§preserve_layout: bool

Preserve original layout whitespace

When true, attempts to maintain the original document’s whitespace structure (indentation, spacing) by inserting appropriate spaces and newlines based on position changes in the PDF.

When false, uses minimal whitespace (single spaces between words, single newlines between paragraphs).

Use true for:

  • Documents with tabular data
  • Code listings or formatted text
  • Documents where indentation matters

Use false for:

  • Plain text extraction for search indexing
  • Content analysis where layout doesn’t matter
  • Maximum performance (less processing)
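One plausible way to turn position changes into layout whitespace is to emit as many spaces as the gap spans average character widths, so that table columns stay roughly aligned. This is a hypothetical sketch (`spaces_for_gap` is not an oxidize_pdf function), and it assumes the gap has already been judged large enough to need at least one space.

```rust
/// Number of spaces to emit for a horizontal gap (hypothetical helper).
/// With `preserve_layout`, the gap is approximated in whole average
/// character widths; otherwise a single space is used.
pub fn spaces_for_gap(gap: f64, avg_char_width: f64, preserve_layout: bool) -> usize {
    if preserve_layout {
        // Round to the nearest character width, but never emit fewer than one space.
        ((gap / avg_char_width).round() as usize).max(1)
    } else {
        1
    }
}
```

Rounding keeps column starts close to their original positions; the extra per-gap arithmetic is also why disabling layout preservation is the cheaper option.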

§line_break_mode: LineBreakMode

Line break handling mode

Controls how line breaks in the PDF are interpreted and processed. The available behaviors are defined by the LineBreakMode type; which one fits best depends on the document type and the intended use of the extracted text.

Implementations§

impl PlainTextConfig

pub fn new() -> Self

Create a new configuration with default values

§Examples
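```rust
use oxidize_pdf::text::plaintext::PlainTextConfig;

let config = PlainTextConfig::new();
// `new()` yields the same values as `Default::default()`.
assert_eq!(config, PlainTextConfig::default());
```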

pub fn dense() -> Self

Create a configuration optimized for dense text (tight spacing)

Lower thresholds detect spaces more aggressively, useful for PDFs with minimal character spacing.

§Examples
```rust
use oxidize_pdf::text::plaintext::PlainTextConfig;

let config = PlainTextConfig::dense();
assert_eq!(config.space_threshold, 0.1);
```

pub fn loose() -> Self

Create a configuration optimized for loose text (wide spacing)

Higher thresholds avoid false space detection in documents with generous character spacing.

§Examples
```rust
use oxidize_pdf::text::plaintext::PlainTextConfig;

let config = PlainTextConfig::loose();
assert_eq!(config.space_threshold, 0.4);
```

pub fn preserve_layout() -> Self

Create a configuration that preserves layout structure

Useful for documents with tabular data, code, or formatted text where whitespace is semantically important.

§Examples
```rust
use oxidize_pdf::text::plaintext::PlainTextConfig;

let config = PlainTextConfig::preserve_layout();
assert!(config.preserve_layout);
```

Trait Implementations§

impl Clone for PlainTextConfig

fn clone(&self) -> PlainTextConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for PlainTextConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for PlainTextConfig

fn default() -> Self

Returns the “default value” for a type.

impl PartialEq for PlainTextConfig

fn eq(&self, other: &PlainTextConfig) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for PlainTextConfig

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<R, P> ReadPrimitive<R> for P
where R: Read + ReadEndian<P>, P: Default,

fn read_from_little_endian(read: &mut R) -> Result<Self, Error>

Read this value from the supplied reader. Same as ReadEndian::read_from_little_endian().

fn read_from_big_endian(read: &mut R) -> Result<Self, Error>

Read this value from the supplied reader. Same as ReadEndian::read_from_big_endian().

fn read_from_native_endian(read: &mut R) -> Result<Self, Error>

Read this value from the supplied reader. Same as ReadEndian::read_from_native_endian().

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.