pdf-rs

⚠️ Warning: This project is currently under active development. APIs may change frequently and are not yet stable.

A PDF parsing library written in Rust.

Overview

pdf-rs is a Rust library for parsing PDF files. It aims to parse the core structures of a PDF document, including:

PDF version identification
Cross-reference table (xref) parsing
Object parsing (dictionaries, arrays, strings, etc.)
Basic access to PDF structures
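Version identification usually starts from the file header, which begins with %PDF- followed by a version such as 1.7. The sketch below is a standalone illustration, not pdf-rs code. parse_pdf_version is a hypothetical name. The sketch also assumes the header sits at byte 0; a lenient parser might scan past leading junk bytes that some real-world files contain.

```rust
/// Hypothetical helper (not part of pdf-rs): extracts the version from a
/// header such as "%PDF-1.7" and returns it as (major, minor).
fn parse_pdf_version(data: &[u8]) -> Option<(u8, u8)> {
    // Assumption: the "%PDF-" marker appears at the very start of the data.
    let rest = data.strip_prefix(b"%PDF-")?;
    // Expect "<digit>.<digit>", e.g. "1.7" or "2.0"; anything after is ignored.
    match rest {
        &[major @ b'0'..=b'9', b'.', minor @ b'0'..=b'9', ..] => {
            Some((major - b'0', minor - b'0'))
        }
        _ => None,
    }
}

fn main() {
    // A typical header line followed by the binary-marker comment line.
    let header = b"%PDF-1.7\n%\xE2\xE3\xCF\xD3\n";
    println!("{:?}", parse_pdf_version(header));
}
```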

Features

PDF Version Support: Supports PDF versions from 1.0 to 2.0
Object Parsing: Parses PDF object types such as dictionaries, arrays, and strings
Cross-reference Table Parsing: Parses the PDF cross-reference (xref) table to locate objects
Stream Reading: Uses the Sequence trait for efficient streaming reads of the file
Memory Efficiency: Designed to minimize memory usage during parsing
Error Handling: Comprehensive error handling with detailed error messages
Type Safety: Fully utilizes Rust's type system for safety guarantees
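A classic cross-reference table is a series of subsections. Each subsection starts with a header line giving the first object number and the entry count. One entry per object follows: a 10-digit byte offset, a 5-digit generation number, and n (in use) or f (free). For free entries, the first field is the number of the next free object.

The following is a minimal sketch of parsing that text form. It assumes the whole section is already in memory as a string. XrefEntry, parse_xref_entry, and parse_xref_section are hypothetical names, not the crate's API. Cross-reference streams (PDF 1.5 and later) are not handled.

```rust
/// One entry of a classic (non-stream) cross-reference table.
#[derive(Debug, PartialEq)]
enum XrefEntry {
    /// "n" entry: byte offset of the object in the file.
    InUse { offset: u64, generation: u16 },
    /// "f" entry: object number of the next free object.
    Free { next_free: u32, generation: u16 },
}

/// Parses one entry line such as "0000000017 00000 n".
fn parse_xref_entry(line: &str) -> Option<XrefEntry> {
    let mut fields = line.split_ascii_whitespace();
    let first = fields.next()?;
    let generation: u16 = fields.next()?.parse().ok()?;
    let kind = fields.next()?;
    if fields.next().is_some() {
        return None; // unexpected trailing data on the line
    }
    match kind {
        "n" => Some(XrefEntry::InUse { offset: first.parse().ok()?, generation }),
        "f" => Some(XrefEntry::Free { next_free: first.parse().ok()?, generation }),
        _ => None,
    }
}

/// Parses an "xref" section of one or more subsections, stopping at "trailer".
/// Returns (object number, entry) pairs.
fn parse_xref_section(text: &str) -> Option<Vec<(u32, XrefEntry)>> {
    let mut lines = text.lines().map(str::trim).filter(|l| !l.is_empty());
    if lines.next()? != "xref" {
        return None;
    }
    let mut entries = Vec::new();
    while let Some(header) = lines.next() {
        if header == "trailer" {
            break;
        }
        // Subsection header: first object number and entry count, e.g. "0 3".
        let mut nums = header.split_ascii_whitespace();
        let start: u32 = nums.next()?.parse().ok()?;
        let count: u32 = nums.next()?.parse().ok()?;
        for i in 0..count {
            let entry = parse_xref_entry(lines.next()?)?;
            entries.push((start.checked_add(i)?, entry));
        }
    }
    Some(entries)
}

fn main() {
    let text = "xref\n0 3\n0000000000 65535 f\r\n0000000017 00000 n\r\n0000000081 00000 n\r\ntrailer\n";
    for (num, entry) in parse_xref_section(text).expect("valid xref section") {
        println!("object {}: {:?}", num, entry);
    }
}
```

A production parser would more likely read the fixed 20-byte entries directly by offset. Splitting on whitespace is used here only to keep the sketch short.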

Installation

Add this to your Cargo.toml:

[dependencies]
pdf-rs = "0.1"

Usage Example

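use std::path::PathBuf;
use pdf_rs::document::PDFDocument;

// `?` requires a function that returns Result. This assumes the crate's error
// type implements std::error::Error so that it converts into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create PDF document parser
    let path = PathBuf::from("example.pdf");
    let document = PDFDocument::open(path)?;

    // Access PDF version
    println!("PDF Version: {}", document.get_version());

    // Get cross-reference table
    let xrefs = document.get_xref();
    println!("XRef entries: {}", xrefs.len());

    Ok(())
}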

API Documentation

For detailed API documentation, please refer to the crate documentation. You can also build it locally with cargo doc --open.

Module Structure

document: Main PDF document parsing functionality
objects: PDF object representations (dictionaries, arrays, strings, etc.)
parser: Core parsing logic for PDF objects
sequence: Streaming file reading utilities
tokenizer: Tokenization of PDF content
error: Error types and handling
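PDF syntax defines six whitespace characters: NUL, tab, line feed, form feed, carriage return, and space. It also defines the delimiters ( ) < > [ ] { } / and %. These are what a tokenizer for PDF content has to deal with.

The sketch below is a hypothetical, simplified tokenizer, not the implementation in the crate's tokenizer module. It handles:

- array and dictionary brackets
- names
- comments
- runs of regular characters, such as numbers and keywords like obj or R

It returns an error for strings and other constructs it does not cover. Name escapes such as #20 are not decoded.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    ArrayStart,      // [
    ArrayEnd,        // ]
    DictStart,       // <<
    DictEnd,         // >>
    Name(String),    // /Type, stored without the leading slash
    Regular(String), // numbers and keywords such as 12, obj, R, true
}

/// The six PDF whitespace characters.
fn is_whitespace(b: u8) -> bool {
    matches!(b, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

/// The PDF delimiter characters.
fn is_delimiter(b: u8) -> bool {
    matches!(b, b'(' | b')' | b'<' | b'>' | b'[' | b']' | b'{' | b'}' | b'/' | b'%')
}

/// Hypothetical simplified tokenizer; strings and hex strings are rejected.
fn tokenize(input: &[u8]) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        if is_whitespace(b) {
            i += 1;
            continue;
        }
        match b {
            b'%' => {
                // Comment: skip everything up to the end of the line.
                while i < input.len() && input[i] != b'\n' && input[i] != b'\r' {
                    i += 1;
                }
            }
            b'[' => { tokens.push(Token::ArrayStart); i += 1; }
            b']' => { tokens.push(Token::ArrayEnd); i += 1; }
            b'<' if input.get(i + 1) == Some(&b'<') => { tokens.push(Token::DictStart); i += 2; }
            b'>' if input.get(i + 1) == Some(&b'>') => { tokens.push(Token::DictEnd); i += 2; }
            b'/' => {
                // Name: regular characters following the slash.
                let start = i + 1;
                i = start;
                while i < input.len() && !is_whitespace(input[i]) && !is_delimiter(input[i]) {
                    i += 1;
                }
                tokens.push(Token::Name(String::from_utf8_lossy(&input[start..i]).into_owned()));
            }
            _ if is_delimiter(b) => {
                return Err(format!("unsupported delimiter '{}' at offset {}", b as char, i));
            }
            _ => {
                // Run of regular characters (number or keyword).
                let start = i;
                while i < input.len() && !is_whitespace(input[i]) && !is_delimiter(input[i]) {
                    i += 1;
                }
                tokens.push(Token::Regular(String::from_utf8_lossy(&input[start..i]).into_owned()));
            }
        }
    }
    Ok(tokens)
}

fn main() {
    match tokenize(b"<< /Type /Catalog /Pages 2 0 R >>") {
        Ok(tokens) => println!("{:?}", tokens),
        Err(e) => eprintln!("error: {}", e),
    }
}
```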

Design Highlights

Modular Design: Functionality is split into separate modules for easier maintenance and extension
Error Handling: A comprehensive error type system that provides detailed error information
Memory Efficiency: Streaming reads avoid loading entire files into memory
Type Safety: Fully utilizes Rust's type system for safety guarantees
Extensibility: Designed with future enhancements in mind
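Streaming access pays off when locating the cross-reference data. A reader finds the last cross-reference section through the startxref keyword near the end of the file, so it only needs the file's tail.

The sketch below is generic over std::io::Read + Seek and reads just the final tail_len bytes. find_startxref is a hypothetical helper, not the crate's Sequence trait. The 1024-byte window in the usage is an assumed size.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: returns the byte offset written after the last
/// "startxref" keyword, reading only the final `tail_len` bytes of `reader`.
fn find_startxref<R: Read + Seek>(reader: &mut R, tail_len: u64) -> std::io::Result<Option<u64>> {
    // Seek to the end to learn the total length without reading the body.
    let file_len = reader.seek(SeekFrom::End(0))?;
    let start = file_len.saturating_sub(tail_len);
    reader.seek(SeekFrom::Start(start))?;

    // Read only the tail into memory.
    let mut tail = vec![0u8; (file_len - start) as usize];
    reader.read_exact(&mut tail)?;

    // Find the last occurrence of the keyword.
    let needle: &[u8] = b"startxref";
    let pos = match tail.windows(needle.len()).rposition(|w| w == needle) {
        Some(p) => p,
        None => return Ok(None),
    };

    // Skip whitespace after the keyword, then collect the decimal offset.
    let digits: String = tail[pos + needle.len()..]
        .iter()
        .skip_while(|b| b.is_ascii_whitespace())
        .take_while(|b| b.is_ascii_digit())
        .map(|&b| b as char)
        .collect();
    Ok(digits.parse().ok())
}

fn main() -> std::io::Result<()> {
    // In-memory stand-in for a file.
    let data = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\nstartxref\n123\n%%EOF\n".to_vec();
    let mut cursor = Cursor::new(data);
    println!("{:?}", find_startxref(&mut cursor, 1024)?);
    Ok(())
}
```

With a real file, std::fs::File can be passed the same way, since it also implements Read and Seek.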

Current Status

The project is at an early stage of development. Basic PDF parsing functionality has been implemented, including version detection, xref table parsing, and basic object parsing.
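The PDF specification defines eight basic object types:

- Booleans
- Integer and real numbers
- Strings
- Names
- Arrays
- Dictionaries
- Streams
- The null object

It also defines indirect references such as 2 0 R. The sketch below is a hypothetical Rust model of these types, not the types defined in the crate's objects module. It shows how an object parser's output might be represented and queried.

```rust
use std::collections::HashMap;

/// Hypothetical object model for illustration; not pdf-rs's own types.
#[derive(Debug, Clone, PartialEq)]
enum PdfObject {
    Null,
    Boolean(bool),
    Integer(i64),
    Real(f64),
    String(Vec<u8>),                    // literal or hex string, kept as raw bytes
    Name(String),                       // e.g. /Type, stored without the slash
    Array(Vec<PdfObject>),
    Dictionary(HashMap<String, PdfObject>),
    Stream { dict: HashMap<String, PdfObject>, data: Vec<u8> },
    Reference { number: u32, generation: u16 }, // e.g. "2 0 R"
}

impl PdfObject {
    /// Returns the name if this object is a Name.
    fn as_name(&self) -> Option<&str> {
        match self {
            PdfObject::Name(n) => Some(n.as_str()),
            _ => None,
        }
    }

    /// Looks up a key in a dictionary or in a stream's dictionary.
    fn dict_get(&self, key: &str) -> Option<&PdfObject> {
        match self {
            PdfObject::Dictionary(d) | PdfObject::Stream { dict: d, .. } => d.get(key),
            _ => None,
        }
    }
}

fn main() {
    // Builds the equivalent of: << /Type /Catalog /Pages 2 0 R >>
    let mut entries = HashMap::new();
    entries.insert("Type".to_string(), PdfObject::Name("Catalog".to_string()));
    entries.insert("Pages".to_string(), PdfObject::Reference { number: 2, generation: 0 });
    let catalog = PdfObject::Dictionary(entries);

    let kind = catalog.dict_get("Type").and_then(PdfObject::as_name);
    println!("{:?}", kind);
}
```

HashMap is a simplifying choice here. It does not preserve key order, which would matter if objects were ever written back out.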

Future Plans

Improve PDF object parsing functionality
Add encrypted PDF support
Implement advanced PDF content extraction features
Provide a more user-friendly API
Add comprehensive documentation with examples
Improve performance and memory usage

Build Requirements

Rust 1.5+ (latest stable version recommended)

Build Steps

cargo build

Run Tests

cargo test

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

Thanks to the Rust community for providing excellent tools and libraries
Inspired by other PDF parsing libraries in different languages

pdf-rs 0.1.6-dev