Module streaming

Streaming support for incremental PDF processing

This module provides streaming support for processing PDFs without loading the entire document into memory. It is intended for very large PDFs and for memory-constrained environments.

§Features

  • Incremental Parsing: Parse PDF objects as they’re needed
  • Page Streaming: Process pages one at a time
  • Content Stream Processing: Handle content streams in chunks
  • Progressive Text Extraction: Extract text as it’s encountered
  • Memory Bounds: Configurable memory limits for buffering
  • Async Support: Future-ready for async I/O operations
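The chunked, memory-bounded style of processing listed above can be illustrated with the standard library alone. The sketch below is not taken from oxidize_pdf: `for_each_chunk` is a hypothetical helper name, and it only shows the general idea of reading from any `std::io::Read` source through a fixed-size buffer, so that peak buffer memory stays at `buf_size` regardless of input length.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of oxidize_pdf): feeds `reader` to
/// `on_chunk` in pieces of at most `buf_size` bytes and returns the
/// total number of bytes read.
pub fn for_each_chunk<R: Read, F: FnMut(&[u8])>(
    mut reader: R,
    buf_size: usize,
    mut on_chunk: F,
) -> io::Result<usize> {
    assert!(buf_size > 0, "buffer size must be non-zero");

    // The only buffer allocation: its size is the memory bound.
    let mut buf = vec![0u8; buf_size];
    let mut total = 0;

    loop {
        match reader.read(&mut buf) {
            // A zero-length read signals end of input.
            Ok(0) => return Ok(total),
            Ok(n) => {
                on_chunk(&buf[..n]);
                total += n;
            }
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

A caller could pass a `File` as `reader` and accumulate statistics or scan for markers inside the closure. Note that a reader may return fewer bytes than the buffer holds, so chunk boundaries should not be assumed to align with any PDF structure.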

§Example

use oxidize_pdf::streaming::{StreamingDocument, StreamingOptions};
use std::fs::File;

// Wrapped in `main` so that `?` compiles; the boxed error type assumes
// the crate's errors implement `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("large_document.pdf")?;
    let options = StreamingOptions::default()
        .with_buffer_size(1024 * 1024) // 1 MiB buffer
        .with_page_cache_size(5);      // Keep 5 pages in memory

    let mut doc = StreamingDocument::new(file, options)?;

    // Process pages one at a time
    while let Some(page) = doc.next_page()? {
        println!("Processing page {}", page.number());

        // Extract this page's text
        let text = page.extract_text_streaming()?;
        println!("Text: {}", text);
    }

    Ok(())
}
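To make the page cache setting in the example more concrete, here is a minimal sketch of a bounded cache that keeps only the most recently inserted pages. It is an illustration under stated assumptions, not the crate's implementation: `BoundedPageCache` is a hypothetical type, and the first-in, first-out eviction policy is an assumption, since the page does not say how oxidize_pdf evicts cached pages.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded cache (not oxidize_pdf's implementation).
/// Holds at most `capacity` pages and evicts the oldest one first.
pub struct BoundedPageCache<T> {
    capacity: usize,
    pages: VecDeque<(u32, T)>,
}

impl<T> BoundedPageCache<T> {
    /// Creates an empty cache that holds at most `capacity` pages.
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            pages: VecDeque::with_capacity(capacity),
        }
    }

    /// Inserts a page. If the cache is full, the oldest page is dropped
    /// and its page number is returned.
    pub fn insert(&mut self, number: u32, page: T) -> Option<u32> {
        let evicted = if self.pages.len() == self.capacity {
            self.pages.pop_front().map(|(n, _)| n)
        } else {
            None
        };
        self.pages.push_back((number, page));
        evicted
    }

    /// Looks up a cached page by number with a linear scan, which is
    /// adequate for the small capacities this sketch targets.
    pub fn get(&self, number: u32) -> Option<&T> {
        self.pages
            .iter()
            .find(|(n, _)| *n == number)
            .map(|(_, p)| p)
    }

    /// Number of pages currently held.
    pub fn len(&self) -> usize {
        self.pages.len()
    }
}
```

With a capacity of 5, inserting a sixth page drops the first, so memory use is bounded by the five largest pages held at once rather than by the size of the whole document.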

Re-exports§

pub use chunk_processor::process_in_chunks;
pub use chunk_processor::ChunkOptions;
pub use chunk_processor::ChunkProcessor;
pub use chunk_processor::ChunkType;
pub use chunk_processor::ContentChunk;
pub use incremental_parser::process_incrementally;
pub use incremental_parser::IncrementalParser;
pub use incremental_parser::ParseEvent;
pub use page_streamer::PageStreamer;
pub use page_streamer::StreamingPage;
pub use text_streamer::stream_text;
pub use text_streamer::TextChunk;
pub use text_streamer::TextStreamOptions;
pub use text_streamer::TextStreamer;

Modules§

chunk_processor
Chunk-based content processing for streaming operations
incremental_parser
Incremental PDF parser for streaming operations
page_streamer
Page streaming for incremental page processing
text_streamer
Text streaming for incremental text extraction

Structs§

StreamingDocument
A PDF document that supports streaming operations
StreamingOptions
Options for streaming operations
StreamingStats
Statistics for streaming operations