pub struct ParallelExtractor;
Available on crate feature parallel only.
Parallel extractor that processes PDF pages across multiple threads.
Uses a batched strategy: divides pages into chunks and each rayon worker
opens a single PdfDocument instance to process its chunk sequentially.
This amortizes the cost of document opening, xref parsing, page tree walks,
and font loading across many pages instead of paying it per-page.
All results are returned in page order regardless of which thread processed each page.
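The batched strategy described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: `std::thread::scope` stands in for the rayon worker pool so the block has no external dependencies, and `FakeDocument`, `extract_page`, and `extract_batched` are hypothetical names that do not come from pdf_oxide.

```rust
use std::thread;

/// Hypothetical stand-in for an opened PDF document (not pdf_oxide's PdfDocument).
struct FakeDocument {
    page_count: usize,
}

impl FakeDocument {
    /// Pretend to open a document: the expensive step paid once per batch.
    fn open(page_count: usize) -> Self {
        FakeDocument { page_count }
    }

    /// Pretend to extract the text of one zero-based page.
    fn extract_page(&self, index: usize) -> Result<String, String> {
        if index < self.page_count {
            Ok(format!("text of page {index}"))
        } else {
            Err(format!("page {index} out of range"))
        }
    }
}

/// Split `0..page_count` into contiguous batches, process each batch on its
/// own thread with a single document handle, and return results in page order.
fn extract_batched(page_count: usize, workers: usize) -> Result<Vec<String>, String> {
    if page_count == 0 {
        return Ok(Vec::new());
    }
    let workers = workers.max(1);
    // Ceiling division so every page lands in exactly one batch.
    let batch_size = (page_count + workers - 1) / workers;
    let indices: Vec<usize> = (0..page_count).collect();

    let per_batch: Vec<Result<Vec<String>, String>> = thread::scope(|scope| {
        let handles: Vec<_> = indices
            .chunks(batch_size)
            .map(|batch| {
                scope.spawn(move || {
                    // One open per batch, reused for every page in it.
                    let doc = FakeDocument::open(page_count);
                    batch
                        .iter()
                        .map(|&i| doc.extract_page(i))
                        .collect::<Result<Vec<String>, String>>()
                })
            })
            .collect();
        // Joining in spawn order keeps batches, and therefore pages, ordered.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    // Flatten the batches; a failing batch turns the whole call into an error.
    let mut pages = Vec::with_capacity(page_count);
    for batch in per_batch {
        pages.extend(batch?);
    }
    Ok(pages)
}
```

Because each batch is a contiguous page range and batches are joined in the order they were spawned, the output order does not depend on which thread finishes first.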
Implementations
impl ParallelExtractor
pub fn extract_all_text(path: &Path) -> Result<Vec<String>>
Extract plain text from every page of a PDF in parallel.
Opens the document once on the calling thread to determine the page count,
then divides pages into batches distributed across rayon worker threads.
Each worker opens a single PdfDocument and extracts all pages in its batch.
Returns a Vec<String> with one entry per page, in page order.
Errors
Returns the first error encountered by any worker. If multiple workers fail, only one error is propagated (rayon semantics).
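The "only one error is propagated" behaviour can be illustrated with a sequential analogue. This is a minimal sketch: `BatchResult` and `merge_batches` are hypothetical names, and the crate's actual error type is not shown in this page, so `String` is used as a placeholder. In this sequential version the surviving error is always the earliest one; with a parallel collect, which worker's error survives may differ.

```rust
/// Hypothetical per-batch outcome: page texts on success, a message on failure.
type BatchResult = Result<Vec<String>, String>;

/// Merge per-batch results into one page list, surfacing a single error.
fn merge_batches(batches: Vec<BatchResult>) -> Result<Vec<String>, String> {
    // Collecting into `Result` stops at the first `Err`; later errors are dropped.
    let ok_batches = batches
        .into_iter()
        .collect::<Result<Vec<Vec<String>>, String>>()?;
    // All batches succeeded, so flatten them into one page-ordered list.
    Ok(ok_batches.into_iter().flatten().collect())
}
```

A caller therefore sees at most one failure per call, even when several batches fail; no partial page list is returned in that case.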
Example
use std::path::Path;
use pdf_oxide::parallel::ParallelExtractor;
let pages = ParallelExtractor::extract_all_text(Path::new("report.pdf"))?;
assert_eq!(pages.len(), 42);
pub fn extract_all_markdown(path: &Path, options: &ConversionOptions) -> Result<Vec<String>>
Extract Markdown from every page of a PDF in parallel.
Behaves like extract_all_text but converts
each page to Markdown using the supplied ConversionOptions.
Returns a Vec<String> with one entry per page, in page order.
Errors
Returns the first error encountered by any worker.
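Since the result is one String per page in page order, a caller may want to stitch the pages back into a single Markdown document. The helper below is a sketch of one way to do that; `join_pages` and the HTML-comment page marker are illustrative choices made here, not part of pdf_oxide.

```rust
/// Join per-page Markdown into one document, labelling each page (1-based).
fn join_pages(pages: &[String]) -> String {
    pages
        .iter()
        .enumerate()
        // Trim trailing whitespace so the page separator is always one blank line.
        .map(|(i, page)| format!("<!-- page {} -->\n{}", i + 1, page.trim_end()))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

The returned Vec index is zero-based, so the marker adds one to match the page numbers a reader would expect.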
Example
use std::path::Path;
use pdf_oxide::parallel::ParallelExtractor;
use pdf_oxide::converters::ConversionOptions;
let opts = ConversionOptions::default();
let pages = ParallelExtractor::extract_all_markdown(
Path::new("report.pdf"),
&opts,
)?;
Auto Trait Implementations
impl Freeze for ParallelExtractor
impl RefUnwindSafe for ParallelExtractor
impl Send for ParallelExtractor
impl Sync for ParallelExtractor
impl Unpin for ParallelExtractor
impl UnsafeUnpin for ParallelExtractor
impl UnwindSafe for ParallelExtractor