Struct PdfLoader 

pub struct PdfLoader { /* private fields */ }

Loads documents from a PDF file.

Uses pdf_extract to extract text content from PDF files. Supports two modes of operation:

  • Single document (default): All pages are combined into one Document.
  • Split pages: Each page becomes a separate Document, split on form feed characters (\x0c) that pdf_extract inserts between pages.

§Examples

use synaptic_pdf::{PdfLoader, Loader};

// Load entire PDF as one document
let loader = PdfLoader::new("document.pdf");
let docs = loader.load().await?;
assert_eq!(docs.len(), 1);

// Load with one document per page
let loader = PdfLoader::with_split_pages("document.pdf");
let docs = loader.load().await?;
// docs.len() == number of pages

Implementations§

impl PdfLoader

pub fn new(path: impl Into<PathBuf>) -> Self

Create a new PdfLoader that extracts all text as a single document.
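
The `impl Into<PathBuf>` parameter means the constructor accepts a `&str`, a `String`, a `&Path`, or a `PathBuf`. The sketch below illustrates that pattern with a hypothetical `PathHolder` type; it is not the actual `PdfLoader` implementation, whose fields are private.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for a loader that stores a file path.
/// This only illustrates the `impl Into<PathBuf>` constructor pattern.
struct PathHolder {
    path: PathBuf,
}

impl PathHolder {
    /// Accepts anything convertible into a `PathBuf`.
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }
}
```

With a constructor of this shape, `PathHolder::new("document.pdf")` and `PathHolder::new(String::from("document.pdf"))` both yield the same stored path.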

pub fn with_split_pages(path: impl Into<PathBuf>) -> Self

Create a new PdfLoader that splits text into one document per page.

Page boundaries are detected by form feed characters (\x0c) inserted by the PDF extraction library.
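
As a rough sketch of form-feed page splitting, the hypothetical helper below splits extracted text on `\x0c`. Trimming whitespace and skipping empty pages are assumptions made for illustration; the crate's actual handling of blank pages is not documented here.

```rust
/// Splits extracted PDF text into per-page strings on form feed (`\x0c`).
/// Hypothetical helper, not part of `synaptic_pdf`.
fn split_pages(text: &str) -> Vec<String> {
    text.split('\x0c')
        // Assumption: surrounding whitespace is not significant.
        .map(str::trim)
        // Assumption: pages with no text are dropped.
        .filter(|page| !page.is_empty())
        .map(str::to_owned)
        .collect()
}
```

Under these assumptions, text with no form feed produces a single page, which matches the single-document behavior for one-page input.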

Trait Implementations§

impl Loader for PdfLoader

fn load<'life0, 'async_trait>(&'life0 self) -> Pin<Box<dyn Future<Output = Result<Vec<Document>, SynapticError>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,

Load all documents from this source.

fn lazy_load(&self) -> Pin<Box<dyn Stream<Item = Result<Document, SynapticError>> + Send + '_>>

Stream documents lazily. Default implementation wraps load().
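
The idea of a default lazy method that wraps the eager one can be sketched with a synchronous analogue. Everything below (`SyncLoader`, `FixedLoader`, and the local `Document` and `LoadError` types) is hypothetical and uses `Iterator` in place of `Stream`; the real trait is async and its internals may differ.

```rust
/// Hypothetical document type for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Document {
    content: String,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct LoadError(String);

/// Synchronous analogue of a loader trait: `load` is required,
/// `lazy_load` has a default body that wraps `load`.
trait SyncLoader {
    fn load(&self) -> Result<Vec<Document>, LoadError>;

    fn lazy_load(&self) -> Box<dyn Iterator<Item = Result<Document, LoadError>> + '_> {
        match self.load() {
            // Everything is loaded eagerly, then handed out one item at a time.
            Ok(docs) => Box::new(docs.into_iter().map(Ok::<Document, LoadError>)),
            // A failed load becomes a single error item.
            Err(err) => Box::new(std::iter::once(Err::<Document, LoadError>(err))),
        }
    }
}

/// Hypothetical loader that returns fixed in-memory pages.
struct FixedLoader {
    pages: Vec<String>,
}

impl SyncLoader for FixedLoader {
    fn load(&self) -> Result<Vec<Document>, LoadError> {
        Ok(self
            .pages
            .iter()
            .map(|p| Document { content: p.clone() })
            .collect())
    }
}
```

A default of this kind gives callers a uniform streaming interface, but it does not reduce peak memory, since the full result is built before the first item is yielded.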

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.