Streaming support for incremental PDF processing
This module provides streaming support for processing PDFs without loading the entire document into memory. It is designed for very large PDFs and for memory-constrained environments.
§Features
- Incremental Parsing: Parse PDF objects as they’re needed
- Page Streaming: Process pages one at a time
- Content Stream Processing: Handle content streams in chunks
- Progressive Text Extraction: Extract text as it’s encountered
- Memory Bounds: Configurable memory limits for buffering
- Async Support: Future-ready for async I/O operations
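The chunked, memory-bounded processing listed above can be illustrated with the standard library alone. The following is a minimal sketch, not this crate's implementation: `for_each_chunk` is a hypothetical helper name, and it assumes only that the input implements `std::io::Read`. It shows how a fixed-size buffer keeps memory use constant regardless of input size.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of oxidize_pdf): reads `reader` in pieces
/// of at most `chunk_size` bytes and hands each piece to `on_chunk`.
/// Only one chunk is held in memory at a time.
/// Returns the total number of bytes read.
fn for_each_chunk<R, F>(mut reader: R, chunk_size: usize, mut on_chunk: F) -> io::Result<u64>
where
    R: Read,
    F: FnMut(&[u8]),
{
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    // The single reusable buffer is the memory bound.
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        on_chunk(&buf[..n]);
        total += n as u64;
    }
    Ok(total)
}
```

Passing a `std::fs::File` as `reader` processes a file on disk with at most `chunk_size` bytes buffered by this helper; a `&[u8]` slice works the same way for in-memory data.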
§Example
use oxidize_pdf::streaming::{StreamingDocument, StreamingOptions};
use std::fs::File;

// `?` needs an enclosing function that returns a Result; a boxed error is
// used here as a catch-all.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("large_document.pdf")?;
    let options = StreamingOptions::default()
        .with_buffer_size(1024 * 1024) // 1 MB buffer
        .with_page_cache_size(5); // Keep 5 pages in memory
    let mut doc = StreamingDocument::new(file, options)?;

    // Process pages incrementally
    while let Some(page) = doc.next_page()? {
        println!("Processing page {}", page.number());
        // Extract text incrementally
        let text = page.extract_text_streaming()?;
        println!("Text: {}", text);
    }
    Ok(())
}
Re-exports§
pub use chunk_processor::process_in_chunks;
pub use chunk_processor::ChunkOptions;
pub use chunk_processor::ChunkProcessor;
pub use chunk_processor::ChunkType;
pub use chunk_processor::ContentChunk;
pub use incremental_parser::process_incrementally;
pub use incremental_parser::IncrementalParser;
pub use incremental_parser::ParseEvent;
pub use page_streamer::PageStreamer;
pub use page_streamer::StreamingPage;
pub use text_streamer::stream_text;
pub use text_streamer::TextChunk;
pub use text_streamer::TextStreamOptions;
pub use text_streamer::TextStreamer;
Modules§
- chunk_processor - Chunk-based content processing for streaming operations
- incremental_parser - Incremental PDF parser for streaming operations
- page_streamer - Page streaming for incremental page processing
- text_streamer - Text streaming for incremental text extraction
Structs§
- StreamingDocument - A PDF document that supports streaming operations
- StreamingOptions - Options for streaming operations
- StreamingStats - Statistics for streaming operations
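The `with_page_cache_size(5)` call in the example suggests that only a bounded number of pages is kept in memory at once. How this crate implements that is not stated here; the following is a minimal sketch of one possible approach, assuming a first-in, first-out eviction policy. `BoundedPageCache` and its methods are hypothetical names for illustration, not part of this crate.

```rust
use std::collections::VecDeque;

/// Hypothetical illustration (not an oxidize_pdf type): a cache that keeps
/// at most `capacity` pages and evicts the oldest inserted page first.
/// `P` stands in for whatever data a page holds.
struct BoundedPageCache<P> {
    capacity: usize,
    // (page number, page data), oldest entry at the front.
    entries: VecDeque<(u32, P)>,
}

impl<P> BoundedPageCache<P> {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            entries: VecDeque::with_capacity(capacity),
        }
    }

    /// Stores a page, dropping the oldest one if the cache is full.
    fn insert(&mut self, number: u32, page: P) {
        if self.capacity == 0 {
            return; // a zero-sized cache stores nothing
        }
        // Replace any existing entry for the same page number.
        self.entries.retain(|(n, _)| *n != number);
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back((number, page));
    }

    /// Looks up a cached page by number.
    fn get(&self, number: u32) -> Option<&P> {
        self.entries
            .iter()
            .find(|(n, _)| *n == number)
            .map(|(_, p)| p)
    }

    /// Number of pages currently cached.
    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With a capacity of 2, inserting pages 1, 2, and 3 in order leaves pages 2 and 3 cached and page 1 evicted. A least-recently-used policy would be a natural alternative when pages are revisited out of order.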